Cutting-edge vision-language model exceling in retrieving text and metadata from images.
Cutting-edge open multimodal model exceling in high-quality reasoning from image and audio inputs.
Robust Speech Recognition via Large-Scale Weak Supervision.
Multi-lingual model supporting speech-to-text recognition and translation.
Multi-lingual model supporting speech-to-text recognition and translation.
Transform PDFs into AI podcasts for engaging on-the-go audio content.
Automatic speech recognition model that transcribes speech in lower case English with record-setting accuracy and performance
Enhance speech by correcting common audio degradations to create studio quality speech output.
Create intelligent, interactive avatars for customer service across industries
Natural, high-fidelity, English voices for personalizing text-to-speech services and voiceovers