Robust Speech Recognition via Large-Scale Weak Supervision.
Multi-lingual model supporting speech-to-text recognition and translation.
Automatic speech recognition model that transcribes speech into lowercase English text with record-setting accuracy and performance.
Record-setting accuracy and performance for English transcription.
State-of-the-art accuracy and speed for English transcriptions.
Stable Video Diffusion (SVD) is a generative diffusion model that leverages a single image as a conditioning frame to synthesize video sequences.