Accurate, optimized English transcription with punctuation and word-level timestamps.
Expressive and engaging text-to-speech, generated from a short audio sample.
Translation model covering 12 languages, with support for few-shot example prompts.
Enable smooth global interactions in 36 languages.
An edge-computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments.
Removes unwanted noise from audio, improving speech intelligibility.
Expressive and engaging text-to-speech, generated from a short audio sample.
A lightweight, multilingual, advanced small language model (SLM) for text, suited to edge computing and resource-constrained applications.
Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.
Robust Speech Recognition via Large-Scale Weak Supervision.
Multilingual model supporting speech-to-text recognition and translation.
Transform PDFs into AI podcasts for engaging on-the-go audio content.
Enhance speech by correcting common audio degradations to create studio-quality speech output.
Enable smooth global interactions in 36 languages.
Supports Chinese and English for tasks including chatbot conversations, content generation, coding, and translation.