Speech-enhancing AI models for common voice degradations.

Enhance speech by correcting common audio degradations to create studio-quality speech output.

Removes unwanted noise from audio, improving speech intelligibility.
Low-latency NVIDIA Nemotron Speech transcription models for your agentic AI workflows.

Record-setting accuracy and performance for Mandarin-Taiwanese-English transcriptions.

Accurate and optimized English transcriptions with punctuation and word timestamps.

Record-setting accuracy and performance for English transcription.

State-of-the-art accuracy and speed for English transcriptions.

Multilingual model supporting speech-to-text recognition and translation.

High accuracy and optimized performance for transcription in 25 languages.

Robust Speech Recognition via Large-Scale Weak Supervision.

Accurate and optimized Vietnamese-English transcriptions with punctuation and word timestamps.

Record-setting accuracy and performance for Mandarin-English transcriptions.

Accurate and optimized Spanish-English transcriptions with punctuation and word timestamps.
Enable seamless multilingual global communication across dozens of languages with NVIDIA Nemotron Speech models.

Translation model for 12 languages that supports few-shot example prompts.

Enable smooth global interactions in 36 languages.

Multilingual model supporting speech-to-text recognition and translation.

Robust Speech Recognition via Large-Scale Weak Supervision.
Convert written text to spoken audio in multiple languages with NVIDIA Nemotron Speech models.

Natural and expressive voices in multiple languages, for voice agents and brand ambassadors.

Expressive and engaging text-to-speech, generated from a short audio sample.

Expressive and engaging text-to-speech, generated from a short audio sample.