NVIDIA
Copyright © 2025 NVIDIA Corporation

Search Results

Searching for: ASR
Sorting by Most Recent

meta / llama-guard-4-12b

Multimodal model that classifies the safety of input prompts as well as output responses.

llm multimodal safety, content safety, guardrail, content moderator, meta

nvidia / llama-3.2-nemoretriever-1b-vlm-embed-v1

Multimodal question-answer retrieval representing user queries as text and documents as images.

nemo retriever, embedding, retrieval augmented generation, text-to-embedding, nvidia

speakleash / bielik-11b-v2.3-instruct

State-of-the-art model for Polish language processing tasks such as text generation, Q&A, and chatbots.

polish, sovereign ai, chat, chatbots, summarization, speakleash

nvidia / parakeet-1.1b-rnnt-multilingual-asr

High accuracy and optimized performance for transcription in 25 languages.

asr, streaming, speech-to-text, multilingual, nvidia nim, nvidia

nvidia / cosmos-predict1-7b

Generalist model to generate future world state as videos from text and image prompts to create synthetic training data for robots and autonomous vehicles.

synthetic data generation, autonomous vehicles, physical ai, robotics, text-to-world, image-to-world, nvidia

openai / whisper-large-v3

Robust Speech Recognition via Large-Scale Weak Supervision.

asr, ast, speech-to-text, batch, whisper, openai, multilingual, nvidia nim, nvidia riva, openai

nvidia / canary-1b-asr

Multilingual model supporting speech-to-text recognition and translation.

asr, ast, streaming, speech-to-text, batch, spanish, multilingual, nvidia nim, nvidia riva, nvidia

nvidia / canary-0.6b-turbo-asr

Multilingual model supporting speech-to-text recognition and translation.

asr, ast, fast, speech-to-text, batch, multilingual, nvidia nim, nvidia riva, nvidia

baidu / paddleocr

Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.

optical character recognition, table extraction, optical character detection, nemo retriever, data ingestion, run-on-rtx, extraction, baidu

nvidia / conformer-ctc-asr

Automatic speech recognition model that transcribes speech into lowercase English text with record-setting accuracy and performance.

asr, streaming, speech-to-text, spanish, nvidia nim, nvidia riva, nvidia

nvidia / parakeet-ctc-1.1b-asr

Record-setting accuracy and performance for English transcription.

asr, streaming, english, speech-to-text, batch, nvidia nim, nvidia

nvidia / parakeet-ctc-0.6b-asr

State-of-the-art accuracy and speed for English transcriptions.

asr, streaming, english, batch, speech-to-text, fast, nvidia nim, run-on-rtx, nvidia

stabilityai / stable-video-diffusion

Stable Video Diffusion (SVD) is a generative diffusion model that leverages a single image as a conditioning frame to synthesize video sequences.

image generation, text-to-image, stabilityai