Cutting-edge vision-language model exceling in retrieving text and metadata from images.
Cutting-edge open multimodal model exceling in high-quality reasoning from image and audio inputs.
Robust Speech Recognition via Large-Scale Weak Supervision.
Multi-lingual model supporting speech-to-text recognition and translation.
Multi-lingual model supporting speech-to-text recognition and translation.
Automatic speech recognition model that transcribes speech in lower case English with record-setting accuracy and performance
Generates consistent characters across a series of images without requiring additional training.
Creates diverse synthetic data that mimics the characteristics of real-world data.