Copyright © 2025 NVIDIA Corporation

Search Results

Searching for: ASR
Sorting by Most Recent

nvidia / Ambient Healthcare Agents

Build advanced AI agents for providers and patients using this developer example powered by NeMo Microservices, NVIDIA Nemotron, Riva ASR and TTS, and NVIDIA LLM NIM.

agent blueprint, blueprint, nim, Launchable, nemo, llm, NVIDIA AI

nvidia / parakeet-ctc-0.6b-zh-tw

Record-setting accuracy and performance for Taiwanese Mandarin-English transcriptions.

ASR, Streaming, Taiwanese, Speech-to-Text, NVIDIA NIM

speakleash / bielik-11b-v2.6-instruct

State-of-the-art model for Polish language processing tasks such as text generation, Q&A, and chatbots.

Polish, Sovereign AI, chat, Chatbots, Summarization

nvidia / parakeet-ctc-0.6b-zh-cn

Record-setting accuracy and performance for Mandarin-English transcriptions.

ASR, Streaming, Speech-to-Text, Mandarin, NVIDIA NIM

nvidia / parakeet-ctc-0.6b-es

Accurate and optimized Spanish-English transcriptions with punctuation and word timestamps.

ASR, Streaming, Speech-to-Text, Spanish, NVIDIA NIM

nvidia / parakeet-ctc-0.6b-vi

Accurate and optimized Vietnamese-English transcriptions with punctuation and word timestamps.

ASR, Streaming, Speech-to-Text, Vietnamese, NVIDIA NIM

nvidia / parakeet-tdt-0.6b-v2

Accurate and optimized English transcriptions with punctuation and word timestamps.

ASR, English, NVIDIA NIM, NVIDIA Riva, speech-to-text

meta / llama-guard-4-12b

Multimodal model that classifies the safety of input prompts as well as output responses.

LLM Multimodal Safety, Content Safety, Guardrail, Content Moderator

nvidia / llama-3.2-nemoretriever-1b-vlm-embed-v1

Multimodal question-answer retrieval representing user queries as text and documents as images.

nemo retriever, embedding, Retrieval Augmented Generation, Text-to-Embedding

nvidia / parakeet-1.1b-rnnt-multilingual-asr

High accuracy and optimized performance for transcription in 25 languages.

ASR, Streaming, Speech-to-Text, Multilingual, NVIDIA NIM

openai / whisper-large-v3

Robust Speech Recognition via Large-Scale Weak Supervision.

ASR, AST, Speech-to-Text, batch, whisper, OpenAI, Multilingual, NVIDIA NIM, NVIDIA Riva

nvidia / canary-1b-asr

Multilingual model supporting speech-to-text recognition and translation.

Automatic Speech Recognition, Automatic Speech Translation, NVIDIA NIM, NVIDIA Riva

baidu / paddleocr

Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.

Optical Character Recognition, Table Extraction, Optical Character Detection, nemo retriever, data ingestion, run-on-rtx, extraction

nvidia / parakeet-ctc-1.1b-asr

Record-setting accuracy and performance for English transcription.

ASR, Streaming, English, Speech-to-Text, batch, NVIDIA NIM

nvidia / parakeet-ctc-0.6b-asr

State-of-the-art accuracy and speed for English transcriptions.

ASR, Streaming, English, Batch, Speech-to-Text, Fast, NVIDIA NIM, Run-on-RTX