
Accurate and optimized English transcriptions with punctuation and word-level timestamps.

Expressive and engaging text-to-speech, generated from a short audio sample.

Translation model covering 12 languages, with support for few-shot example prompts.

Enable smooth global interactions in 36 languages.

Natural and expressive voices in multiple languages, for voice agents and brand ambassadors.

Robust Speech Recognition via Large-Scale Weak Supervision.

Multilingual model supporting speech-to-text transcription and translation.