NVIDIA

Deploy Models Now with NVIDIA NIM

Optimized inference for the world’s leading models
Free serverless APIs for development
Accelerated by DGX Cloud
Self-Host on your GPU infrastructure
Continuous vulnerability fixes
Speech

Copyright © 2025 NVIDIA Corporation

Speech Enhancement

Speech enhancing AI models for common voice degradations.

Run Anywhere

nvidia / studiovoice

Enhance speech by correcting common audio degradations to create studio-quality speech output.

digital human, nvidia maxine, run-on-rtx, speech enhancement, speech-to-speech

Run Anywhere

nvidia / Background Noise Removal

Removes unwanted noise from audio, improving speech intelligibility.

digital human, nvidia maxine, speech enhancement, speech-to-speech

Automatic Speech Recognition (ASR)

Connect generative AI models to speech by transcribing spoken audio to text.

Run Anywhere

nvidia / parakeet-tdt-0.6b-v2

Accurate and optimized English transcriptions with punctuation and word timestamps.

asr, english, nvidia nim, nvidia riva, speech-to-text

Run Anywhere

nvidia / parakeet-ctc-1.1b-asr

Record-setting accuracy and performance for English transcription.

asr, english, nvidia nim, streaming, batch, speech-to-text

Run Anywhere

nvidia / parakeet-ctc-0.6b-asr

State-of-the-art accuracy and speed for English transcriptions.

asr, batch, english, fast, nvidia nim, run-on-rtx, streaming, speech-to-text

Run Anywhere

nvidia / canary-1b-asr

Multilingual model supporting speech-to-text recognition and translation.

asr, ast, multilingual, nvidia nim, nvidia riva, spanish, batch, streaming, speech-to-text

Run Anywhere

nvidia / parakeet-1.1b-rnnt-multilingual-asr

High accuracy and optimized performance for transcription in 25 languages.

asr, multilingual, nvidia nim, streaming, speech-to-text

Run Anywhere

nvidia / conformer-ctc-asr

Automatic speech recognition model that transcribes Spanish speech into lowercase text with record-setting accuracy and performance.

asr, nvidia nim, nvidia riva, spanish, streaming, speech-to-text

Run Anywhere

openai / whisper-large-v3

Robust Speech Recognition via Large-Scale Weak Supervision.

asr, ast, multilingual, nvidia nim, nvidia riva, openai, batch, speech-to-text, whisper

Run Anywhere

nvidia / canary-0.6b-turbo-asr

Multilingual model supporting speech-to-text recognition and translation.

asr, ast, multilingual, nvidia nim, nvidia riva, batch, fast, speech-to-text

Neural Machine Translation (NMT) & Audio Speech Translation (AST)

Create multilingual generative AI models by translating speech and text between languages.

PREVIEW

nvidia / riva-translate-4b-instruct

Translation model for 12 languages with support for few-shot example prompts.

text translation

Run Anywhere

nvidia / riva-translate-1.6b

Enable smooth global interactions in 36 languages.

nvidia nim, neural machine translation, text translation

Run Anywhere

nvidia / canary-1b-asr

Multilingual model supporting speech-to-text recognition and translation.

asr, ast, multilingual, nvidia nim, nvidia riva, spanish, batch, streaming, speech-to-text

Run Anywhere

nvidia / canary-0.6b-turbo-asr

Multilingual model supporting speech-to-text recognition and translation.

asr, ast, multilingual, nvidia nim, nvidia riva, batch, fast, speech-to-text

Run Anywhere

openai / whisper-large-v3

Robust Speech Recognition via Large-Scale Weak Supervision.

asr, ast, multilingual, nvidia nim, nvidia riva, openai, batch, speech-to-text, whisper

Convert Text to Speech (TTS)

Give generative AI models a voice by converting written text to spoken audio.

Run Anywhere

nvidia / magpie-tts-multilingual

Natural and expressive voices in multiple languages for voice agents and brand ambassadors.

nvidia nim, nvidia riva, tts, multilingual, text-to-speech

PREVIEW

nvidia / magpie-tts-zeroshot

Expressive and engaging text-to-speech, generated from a short audio sample.

nvidia nim, nvidia riva, tts, text-to-speech

PREVIEW

nvidia / magpie-tts-flow

Expressive and engaging text-to-speech, generated from a short audio sample.

nvidia nim, nvidia riva, tts, text-to-speech