Explore Speech Models | Try NVIDIA NIM APIs

Speech

Copyright © 2026 NVIDIA Corporation

Deploy Models Now with NVIDIA NIM

Optimized inference for the world’s leading models

Free serverless APIs for development

Accelerated by DGX Cloud

Self-Host on your GPU infrastructure

Continuous vulnerability fixes

Speech to Speech (S2S)

Ultra-low-latency, end-to-end, full-duplex models for real-time voice-to-voice interactions.

Free Endpoint

nemotron-voicechat

Nemotron 3 Voicechat

NVIDIA NIM voice chat

2K API calls in the last 30 days

Last updated on March 16, 2026

Convert Text to Speech (TTS)

Convert written text to spoken audio in multiple languages with NVIDIA Nemotron Speech models.

Downloadable

chatterbox-multilingual-tts

Natural and expressive voices in 23 languages. For voice agents and brand ambassadors.

Speech Generation TTS multilingual

7K API calls in the last 30 days

Last updated on June 3, 2026

Downloadable

magpie-tts-multilingual

Natural and expressive voices in multiple languages. For voice agents and brand ambassadors.

NVIDIA Riva TTS multilingual

Last updated on June 26, 2025

Free Endpoint

magpie-tts-zeroshot

Expressive and engaging text-to-speech, generated from a short audio sample.

NVIDIA Riva TTS

Last updated on June 12, 2025

Automatic Speech Recognition (ASR)

Low-latency NVIDIA Nemotron Speech transcription models for your agentic AI workflows.

Downloadable

nemotron-asr-streaming

Real-time speech recognition for English

Automatic Speech Recognition

NVIDIA NIM NVIDIA Riva

Last updated on March 14, 2026

Downloadable

parakeet-1.1b-rnnt-multilingual-asr

High accuracy and optimized performance for transcription in 25 languages

Automatic Speech Recognition

NVIDIA NIM NVIDIA Riva

19K API calls in the last 30 days

Last updated on April 30, 2025

Downloadable

parakeet-ctc-0.6b-zh-tw

Record-setting accuracy and performance for Mandarin-Taiwanese-English transcriptions.

NVIDIA NIM Streaming Taiwanese

1K API calls in the last 30 days

Last updated on October 16, 2025

Downloadable

parakeet-tdt-0.6b-v2

Accurate and optimized English transcriptions with punctuation and word timestamps

English NVIDIA NIM NVIDIA Riva speech-to-text

146K API calls in the last 30 days

Last updated on July 30, 2025

Downloadable

parakeet-ctc-1.1b-asr

Record-setting accuracy and performance for English transcription.

English NVIDIA NIM Streaming batch

Last updated on June 26, 2025

Downloadable

parakeet-ctc-0.6b-asr

State-of-the-art accuracy and speed for English transcriptions.

Batch English Fast NVIDIA NIM Run-on-RTX Streaming

1K API calls in the last 30 days

Last updated on June 13, 2025

Downloadable

canary-1b-asr

Multilingual model supporting speech-to-text recognition and translation.

Automatic Speech Recognition

Automatic Speech Translation NVIDIA NIM NVIDIA Riva

52K API calls in the last 30 days

Last updated on April 10, 2025

Downloadable

whisper-large-v3

Robust Speech Recognition via Large-Scale Weak Supervision.

AST Multilingual NVIDIA NIM NVIDIA Riva OpenAI batch whisper

Last updated on April 10, 2025

Downloadable

parakeet-ctc-0.6b-vi

Accurate and optimized Vietnamese-English transcriptions with punctuation and word timestamps.

NVIDIA NIM Streaming Vietnamese

123 API calls in the last 30 days

Last updated on September 8, 2025

Downloadable

parakeet-ctc-0.6b-zh-cn

Record-setting accuracy and performance for Mandarin-English transcriptions.

Mandarin NVIDIA NIM Streaming

13K API calls in the last 30 days

Last updated on September 9, 2025

Downloadable

parakeet-ctc-0.6b-es

Accurate and optimized Spanish-English transcriptions with punctuation and word timestamps.

NVIDIA NIM Spanish Streaming

1K API calls in the last 30 days

Last updated on September 9, 2025

Neural Machine Translation (NMT) & Audio Speech Translation (AST)

Enable seamless multilingual global communication across dozens of languages with NVIDIA Nemotron Speech models.

Free Endpoint

riva-translate-4b-instruct-v1_1

Translation model for 12 languages with support for few-shot example prompts.

neural machine translation

282K API calls in the last 30 days

Last updated on December 12, 2025

Downloadable

riva-translate-1.6b

Enable smooth global interactions in 36 languages.

Neural machine translation

42K API calls in the last 30 days

Last updated on June 26, 2025

Downloadable

canary-1b-asr

Multilingual model supporting speech-to-text recognition and translation.

Automatic Speech Recognition

Automatic Speech Translation NVIDIA NIM NVIDIA Riva

52K API calls in the last 30 days

Last updated on April 10, 2025

Downloadable

whisper-large-v3

Robust Speech Recognition via Large-Scale Weak Supervision.

AST Multilingual NVIDIA NIM NVIDIA Riva OpenAI batch whisper

Last updated on April 10, 2025