Cutting-edge open multimodal model exceling in high-quality reasoning from image and audio inputs.
Multi-lingual model supporting speech-to-text recognition and translation.
Multi-lingual model supporting speech-to-text recognition and translation.
Transform PDFs into AI podcasts for engaging on-the-go audio content.
Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.
Automatic speech recognition model that transcribes speech in lower case English with record-setting accuracy and performance
Enhance speech by correcting common audio degradations to create studio quality speech output.
Create intelligent, interactive avatars for customer service across industries
Natural, high-fidelity, English voices for personalizing text-to-speech services and voiceovers
OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition respectively.