Try NVIDIA NIM APIs

24 results for

audio2face-3d

Converts streamed audio to facial blendshapes for realtime lipsyncing and facial performances.

Model

Speech-to-Animation

8mo

NVIDIA

canary-1b-asr

Multilingual model supporting speech-to-text recognition and translation.

Model

Automatic Speech Recognition

1.58K

11mo

NVIDIA

gliner-pii

GLiNER PII detects Personally Identifiable Information in text.

Model

PII Detection

36.16K

NVIDIA

magpie-tts-flow

Expressive and engaging text-to-speech, generated from a short audio sample.

Model

TTS

829

8mo

NVIDIA

magpie-tts-multilingual

Natural and expressive voices in multiple languages. For voice agents and brand ambassadors.

Model

TTS

33.06K

8mo

NVIDIA

magpie-tts-zeroshot

Expressive and engaging text-to-speech, generated from a short audio sample.

Model

TTS

1.22K

8mo

NVIDIA

maisi

MAISI is a pre-trained volumetric (3D) CT Latent Diffusion Generative Model.

Model

Image Generation

773

11mo

NVIDIA

megatron-1b-nmt

Enable smooth global interactions in 36 languages.

Model

Text Translation

11mo

Mistral AI

mistral-7b-instruct-v0.2

This LLM follows instructions, completes requests, and generates creative text.

Model

chat

486K

9mo

NVIDIA

nv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

Model

Image-to-Embedding

1.01M

11mo

NVIDIA

nv-grounding-dino

Grounding DINO is an open-vocabulary, zero-shot object detection model.

Model

Object Detection

3.45K

11mo

NVIDIA

parakeet-1.1b-rnnt-multilingual-asr

High accuracy and optimized performance for transcription in 25 languages.

Model

Automatic Speech Recognition

41.55K

10mo

NVIDIA

parakeet-ctc-0.6b-asr

State-of-the-art accuracy and speed for English transcriptions.

Model

ASR

8.25K

8mo

NVIDIA

parakeet-ctc-0.6b-es

Accurate and optimized Spanish-English transcriptions with punctuation and word timestamps.

Model

ASR

5mo

NVIDIA

parakeet-ctc-0.6b-vi

Accurate and optimized Vietnamese-English transcriptions with punctuation and word timestamps.

Model

ASR

469

5mo

NVIDIA

parakeet-ctc-0.6b-zh-cn

Record-setting accuracy and performance for Mandarin-English transcriptions.

Model

ASR

6.97K

5mo

NVIDIA

parakeet-ctc-0.6b-zh-tw

Record-setting accuracy and performance for Taiwanese Mandarin-English transcriptions.

Model

ASR

330

4mo

NVIDIA

parakeet-ctc-1.1b-asr

Record-setting accuracy and performance for English transcription.

Model

ASR

25.8K

8mo

NVIDIA

parakeet-tdt-0.6b-v2

Accurate and optimized English transcriptions with punctuation and word timestamps.

Model

ASR

2.82K

7mo

NVIDIA

retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

Model

Object Detection

794

NVIDIA

riva-translate-1.6b

Enable smooth global interactions in 36 languages.

Model

Text Translation

632K

8mo

NVIDIA

riva-translate-4b-instruct-v1_1

Translation model supporting 12 languages, with few-shot example prompting.

Model

nvidia nim

461K

2mo

NVIDIA

visual-changenet

Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask.

Model

image

615

OpenAI

whisper-large-v3

Robust Speech Recognition via Large-Scale Weak Supervision.

Model

ASR

45.35K

11mo
