Deploy Models Now with NVIDIA NIM

Optimized inference for the world’s leading models
Free serverless APIs for development
Accelerated by DGX Cloud
Self-Host on your GPU infrastructure
Continuous vulnerability fixes

Speech

Copyright © 2026 NVIDIA Corporation

Speech to Speech (S2S)

Ultra-low latency, end-to-end, full duplex models for real-time voice-to-voice interactions.

NVIDIA
Free Endpoint

nemotron-voicechat

Nemotron 3 Voicechat
English
3d

Automatic Speech Recognition (ASR)

Low-latency NVIDIA Nemotron Speech transcription models for your agentic AI workflows.

NVIDIA
Downloadable

nemotron-asr-streaming

Real-time speech recognition for English
Automatic Speech Recognition
5d
NVIDIA
Downloadable

parakeet-1.1b-rnnt-multilingual-asr

High accuracy and optimized performance for transcription in 25 languages
Automatic Speech Recognition
10mo
NVIDIA
Downloadable

parakeet-ctc-0.6b-zh-tw

Record-setting accuracy and performance for Taiwanese Mandarin and English transcriptions.
ASR
5mo
NVIDIA
Downloadable

parakeet-tdt-0.6b-v2

Accurate and optimized English transcriptions with punctuation and word timestamps
ASR
7mo
NVIDIA
Downloadable

parakeet-ctc-1.1b-asr

Record-setting accuracy and performance for English transcription.
ASR
8mo
NVIDIA
Downloadable

parakeet-ctc-0.6b-asr

State-of-the-art accuracy and speed for English transcriptions.
ASR
9mo
NVIDIA
Downloadable

canary-1b-asr

Multilingual model supporting speech-to-text recognition and translation.
Automatic Speech Recognition
11mo
OpenAI
Downloadable

whisper-large-v3

Robust Speech Recognition via Large-Scale Weak Supervision.
ASR
11mo
NVIDIA
Downloadable

parakeet-ctc-0.6b-vi

Accurate and optimized Vietnamese-English transcriptions with punctuation and word timestamps.
ASR
6mo
NVIDIA
Downloadable

parakeet-ctc-0.6b-zh-cn

Record-setting accuracy and performance for Mandarin-English transcriptions.
ASR
6mo
NVIDIA
Downloadable

parakeet-ctc-0.6b-es

Accurate and optimized Spanish-English transcriptions with punctuation and word timestamps.
ASR
6mo

Convert Text to Speech (TTS)

Convert written text to spoken audio in multiple languages with NVIDIA Nemotron Speech models.

NVIDIA
Downloadable

magpie-tts-multilingual

Natural and expressive voices in multiple languages. For voice agents and brand ambassadors.
NVIDIA NIM
8mo
NVIDIA
Free Endpoint

magpie-tts-zeroshot

Expressive and engaging text-to-speech, generated from a short audio sample.
NVIDIA NIM
9mo
NVIDIA
Free Endpoint

magpie-tts-flow

Expressive and engaging text-to-speech, generated from a short audio sample.
NVIDIA NIM
8mo

Neural Machine Translation (NMT) & Audio Speech Translation (AST)

Enable seamless multilingual global communication across dozens of languages with NVIDIA Nemotron Speech models.

NVIDIA
Free Endpoint

riva-translate-4b-instruct-v1_1

Translation model for 12 languages that supports few-shot example prompts.
neural machine translation
3mo
NVIDIA
Downloadable

riva-translate-1.6b

Enable smooth global interactions in 36 languages.
NVIDIA NIM
8mo
NVIDIA
Downloadable

canary-1b-asr

Multilingual model supporting speech-to-text recognition and translation.
Automatic Speech Recognition
11mo
OpenAI
Downloadable

whisper-large-v3

Robust Speech Recognition via Large-Scale Weak Supervision.
ASR
11mo