Try NVIDIA NIM APIs


15 results for

Filters

Free Endpoint

Partner Endpoint

Download Available

Use Case

Speech-to-Text

Retrieval Augmented Generation

Text-to-Embedding

Inference Providers

Deep Infra

Together AI

Publisher

NVIDIA

conformer-ctc-asr

Automatic speech recognition model that transcribes speech in lowercase Spanish with record-setting accuracy and performance.

Model

ASR


NVIDIA

Downloadable

parakeet-ctc-1.1b-asr

Record-setting accuracy and performance for English transcription.

Model

ASR

71.47K

11mo

NVIDIA

Downloadable

parakeet-ctc-0.6b-asr

State-of-the-art accuracy and speed for English transcriptions.

Model

ASR

2.84K

11mo

NVIDIA

Downloadable

canary-1b-asr

Multi-lingual model supporting speech-to-text recognition and translation.

Model

Automatic Speech Recognition

28.65K

NVIDIA

Downloadable

nemotron-asr-streaming

Real-time speech recognition for English.

Model

Automatic Speech Recognition

16.6K

2mo

NVIDIA

Downloadable

parakeet-1.1b-rnnt-multilingual-asr

High accuracy and optimized performance for transcription in 25 languages.

Model

Automatic Speech Recognition

26.6K

NVIDIA

Downloadable

parakeet-ctc-0.6b-es

Accurate and optimized Spanish-English transcriptions with punctuation and word timestamps.

Model

ASR

1.39K

8mo

NVIDIA

Downloadable

parakeet-ctc-0.6b-vi

Accurate and optimized Vietnamese-English transcriptions with punctuation and word timestamps.

Model

ASR

130

8mo

NVIDIA

Downloadable

parakeet-ctc-0.6b-zh-cn

Record-setting accuracy and performance for Mandarin-English transcriptions.

Model

ASR

7.28K

8mo

NVIDIA

Downloadable

parakeet-ctc-0.6b-zh-tw

Record-setting accuracy and performance for Taiwanese Mandarin-English transcriptions.

Model

ASR

1.57K

7mo

OpenAI

Downloadable

whisper-large-v3

Robust Speech Recognition via Large-Scale Weak Supervision.

Model

ASR

77.68K

NVIDIA

Downloadable

parakeet-tdt-0.6b-v2

Accurate and optimized English transcriptions with punctuation and word timestamps.

Model

ASR

49.29K

9mo

llama-guard-4-12b

Multimodal model that classifies the safety of input prompts as well as output responses.

Model

LLM Multimodal Safety

138K

10mo

NVIDIA

Downloadable

llama-nemotron-embed-vl-1b-v2

Multimodal question-answer retrieval representing user queries as text and documents as images.

Model

NeMo Retriever

7.15M

3mo

Baidu

Downloadable

paddleocr

Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.

Model

Optical Character Recognition

2.2M

10mo