Multimodal model that classifies the safety of input prompts as well as output responses.
Multimodal question-answer retrieval representing user queries as text and documents as images.
State-of-the-art model for Polish language processing tasks such as text generation, Q&A, and chatbots.
High accuracy and optimized performance for transcription in 25 languages.
Generalist model that generates future world states as videos from text and image prompts, used to create synthetic training data for robots and autonomous vehicles.
Robust Speech Recognition via Large-Scale Weak Supervision.
Multi-lingual model supporting speech-to-text recognition and translation.
Multi-lingual model supporting speech-to-text recognition and translation.
Automatic speech recognition model that transcribes speech in lowercase English with record-setting accuracy and performance.
Record-setting accuracy and performance for English transcription.
State-of-the-art accuracy and speed for English transcriptions.
Stable Video Diffusion (SVD) is a generative diffusion model that leverages a single image as a conditioning frame to synthesize video sequences.