NVIDIA
Copyright © 2025 NVIDIA Corporation

nvidia/streampetr

StreamPETR offers efficient 3D object detection for autonomous driving by propagating sparse object queries temporally.

autonomous vehicles, bev, AV Stack, automotive

nvidia/nemotron-parse

Cutting-edge vision-language model that excels at retrieving text and metadata from images.

text and table extraction, document parsing, supported language - english

nvidia/nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

language generation, chat, Image-to-Text, vision assistant, visual question answering

nvidia/llama-3.1-nemotron-safety-guard-8b-v3

Leading multilingual content safety model for enhancing the safety and moderation capabilities of LLMs.

content moderation, llm safety, multilingual guard model, multilingual content safety, nemoguard

nvidia/parakeet-ctc-0.6b-zh-tw

Record-setting accuracy and performance for Taiwanese Mandarin-English transcriptions.

ASR, Streaming, Taiwanese, Speech-to-Text, NVIDIA NIM

nvidia/llama-3_2-nemoretriever-300m-embed-v2

Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.

Retrieval Augmented Generation, Text-to-Embedding, NeMo Retriever

nvidia/parakeet-ctc-0.6b-zh-cn

Record-setting accuracy and performance for Mandarin-English transcriptions.

ASR, Streaming, Speech-to-Text, Mandarin, NVIDIA NIM

nvidia/parakeet-ctc-0.6b-es

Accurate and optimized Spanish-English transcriptions with punctuation and word timestamps.

ASR, Streaming, Speech-to-Text, Spanish, NVIDIA NIM

nvidia/parakeet-ctc-0.6b-vi

Accurate and optimized Vietnamese-English transcriptions with punctuation and word timestamps.

ASR, Streaming, Speech-to-Text, Vietnamese, NVIDIA NIM

nvidia/nvidia-nemotron-nano-9b-v2

High-efficiency LLM with a hybrid Transformer-Mamba design, excelling in reasoning and agentic tasks.

thinking budget, chat, reasoning

nvidia/cosmos-reason1-7b

Reasoning vision language model (VLM) for physical AI and robotics.

video understanding, Synthetic Data Generation, autonomous vehicles, industrial, Physical AI, vision language model, reasoning, robotics, smart cities

nvidia/nemoretriever-ocr-v1

Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.

Optical Character Recognition, Table Extraction, nemo retriever, data ingestion, extraction

nvidia/parakeet-tdt-0.6b-v2

Accurate and optimized English transcriptions with punctuation and word timestamps.

ASR, English, NVIDIA NIM, NVIDIA Riva, speech-to-text

nvidia/llama-3.3-nemotron-super-49b-v1.5

High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

chat, math, advanced reasoning, instruction following, function calling

nvidia/llama-3_2-nemoretriever-300m-embed-v1

Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.

Retrieval Augmented Generation, Text-to-Embedding, NeMo Retriever

nvidia/nemoretriever-ocr

Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.

Optical Character Recognition, Table Extraction, nemo retriever, data ingestion, extraction

nvidia/magpie-tts-flow

Expressive and engaging text-to-speech, generated from a short audio sample.

TTS, Text-to-Speech, NVIDIA NIM, NVIDIA Riva

nvidia/riva-translate-4b-instruct

Translation model for 12 languages with few-shot example prompting capability.

Text Translation, Neural machine translation, NVIDIA NIM

nvidia/riva-translate-1.6b

Enable smooth global interactions in 36 languages.

Text Translation, Neural machine translation, NVIDIA NIM

nvidia/llama-3.2-nemoretriever-500m-rerank-v2

GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.

nemo retriever, Retrieval Augmented Generation, reranking

nvidia/cosmos-transfer1-7b

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

Synthetic Data Generation, Autonomous Vehicles, Physical AI, robotics, video-to-world

nvidia/Background Noise Removal

Removes unwanted noise from audio, improving speech intelligibility.

Nvidia Maxine, Speech-to-speech, Digital Human, Speech Enhancement

nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1

Multimodal question-answer retrieval representing user queries as text and documents as images.

nemo retriever, embedding, Retrieval Augmented Generation, Text-to-Embedding

nvidia/llama-3.1-nemotron-nano-vl-8b-v1

Multimodal vision-language model that understands text and images and creates informative responses.

doc intelligence, chat, multiple image understanding, OCR

nvidia/llama-3.1-nemotron-nano-4b-v1.1

State-of-the-art open model for reasoning, code, math, and tool calling, suitable for edge agents.

edge, tool calling, chat, reasoning, math

nvidia/magpie-tts-zeroshot

Expressive and engaging text-to-speech, generated from a short audio sample.

TTS, Text-to-Speech, NVIDIA NIM, NVIDIA Riva

nvidia/parakeet-1.1b-rnnt-multilingual-asr

High accuracy and optimized performance for transcription in 25 languages.

ASR, Streaming, Speech-to-Text, Multilingual, NVIDIA NIM

nvidia/llama-3.1-nemotron-ultra-253b-v1

Superior inference efficiency with highest accuracy for scientific and complex math reasoning, coding, tool calling, and instruction following.

chat, math, advanced reasoning, instruction following, function calling

nvidia/cosmos-predict1-5b

Generates future frames of a physics-aware world state from just an image or a short video prompt for physical AI development.

Synthetic Data Generation, Physical AI, policy evaluation, robotics, video-to-world

nvidia/sparsedrive

End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.

autonomous vehicles, bev, av stack, automotive

nvidia/bevformer

Advanced transformer for multi-frame bird's-eye-view 3D perception in autonomous driving.

autonomous vehicles, bev, automotive, perception

nvidia/llama-3.3-nemotron-super-49b-v1

High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

chat, math, advanced reasoning, instruction following, function calling

nvidia/llama-3.1-nemotron-nano-8b-v1

Model with leading reasoning and agentic AI accuracy for PC and edge.

chat, math, advanced reasoning, instruction following, function calling

nvidia/magpie-tts-multilingual

Natural and expressive voices in multiple languages. For voice agents and brand ambassadors.

TTS, Text-to-Speech, NVIDIA NIM, NVIDIA Riva, multilingual

nvidia/nv-embedcode-7b-v1

The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.

nemo retriever, Embedding, Retrieval Augmented Generation

nvidia/nemoretriever-table-structure-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection, Chart Detection, nemo retriever, Table Detection, data ingestion, run-on-rtx

nvidia/nemoretriever-graphic-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection, Chart Detection, nemo retriever, Table Detection, data ingestion, run-on-rtx

nvidia/nemoretriever-page-elements-v2

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection, Chart Detection, nemo retriever, Table Detection, data ingestion, run-on-rtx

nvidia/nemoretriever-parse

Cutting-edge vision-language model that excels at retrieving text and metadata from images.

optical character recognition, nemo retriever, data ingestion, table extraction, supported language - english

nvidia/canary-1b-asr

Multilingual model supporting speech-to-text recognition and translation.

Automatic Speech Recognition, Automatic Speech Translation, NVIDIA NIM, NVIDIA Riva

nvidia/llama-3.1-nemoguard-8b-topic-control

Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.

Dialogue Safety, LLM safety, Guard Model, Content safety

nvidia/nemoguard-jailbreak-detect

Industry-leading jailbreak classification model for protection against adversarial attempts.

LLM Security, Jailbreak Detection, Prompt Injection, NVIDIA NIM

nvidia/llama-3.1-nemoguard-8b-content-safety

Leading content safety model for enhancing the safety and moderation capabilities of LLMs.

LLM safety, content moderation, Guard model, Content safety

nvidia/genmol

Fragment-Based Molecular Generation by Discrete Diffusion.

Chemistry, nim, BioNemo, Molecule Generation, Drug Discovery

nvidia/cosmos-nemotron-34b

Multimodal vision-language model that understands text, images, and video and creates informative responses.

VLM, Vision language model, image caption, image to text

nvidia/llama-3.2-nv-embedqa-1b-v2

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

nemo retriever, run-on-rtx, embedding, Retrieval Augmented Generation, Text-to-Embedding

nvidia/llama-3.2-nv-rerankqa-1b-v2

Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.

nemo retriever, run-on-rtx, Retrieval Augmented Generation, reranking

nvidia/usdcode

State-of-the-art LLM that answers OpenUSD knowledge queries and generates USD-Python code.

OpenUSD, Synthetic Data Generation, Digital Twin, chat, Code Generation

nvidia/nv-yolox-page-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection, Data ingestion, Chart Detection, nemo retriever, Table Detection, run-on-rtx, extraction

nvidia/audio2face-3d

Converts streamed audio to facial blendshapes for real-time lip-syncing and facial performances.

Speech-to-Animation, Digital Humans, Audio-to-Face, NVIDIA NIM

nvidia/corrdiff

Generative downscaling model that produces high-resolution, regional-scale weather fields.

AI Weather prediction, Weather Simulation, Earth-2

nvidia/fourcastnet

FourCastNet predicts global atmospheric dynamics for a range of weather and climate variables.

Weather Simulation, AI Weather Prediction, Climate science, Earth-2

nvidia/nemotron-4-mini-hindi-4b-instruct

A bilingual Hindi-English SLM for on-device inference, tailored specifically for the Hindi language.

Indic, chat, Text-to-Text, Language Generation

nvidia/studiovoice

Enhances speech by correcting common audio degradations to produce studio-quality output.

Nvidia Maxine, Speech-to-speech, Digital Human, Run-on-RTX, Speech Enhancement

nvidia/llama-3.1-nemotron-70b-reward

Leaderboard-topping reward model supporting RLHF for better alignment with human preferences.

Text-to-text, Reward Model, RLHF

nvidia/vila

Multimodal vision-language model that understands text, images, and video and creates informative responses.

VLM, Vision language model, image caption, image to text

nvidia/nemotron-mini-4b-instruct

SLM optimized for on-device inference and fine-tuned for roleplay, RAG, and function calling.

chat, Text-to-Text, Language Generation

nvidia/mistral-nemo-minitron-8b-base

State-of-the-art small language model delivering superior accuracy for chatbots, virtual assistants, and content generation.

language generation, text-to-text, chat, small language model

nvidia/nv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

Image-to-Embedding, computer vision, deepstream, NVIDIA NIM, object Classification

nvidia/nv-grounding-dino

Grounding DINO is an open-vocabulary, zero-shot object detection model.

Object Detection, computer vision, deepstream, NVIDIA NIM

nvidia/megatron-1b-nmt

Enable smooth global interactions in 36 languages.

Text Translation, Neural machine translation, NVIDIA NIM

nvidia/parakeet-ctc-1.1b-asr

Record-setting accuracy and performance for English transcription.

ASR, Streaming, English, Speech-to-Text, batch, NVIDIA NIM

nvidia/parakeet-ctc-0.6b-asr

State-of-the-art accuracy and speed for English transcriptions.

ASR, Streaming, English, Batch, Speech-to-Text, Fast, NVIDIA NIM, Run-on-RTX

nvidia/usdsearch

AI-powered search for OpenUSD data, 3D models, images, and assets using text or image-based inputs.

OpenUSD, Synthetic Data Generation, Digital Twin, USD, Text-to-3D

nvidia/eyecontact

Estimates the gaze angles of a person in a video and redirects the gaze to make it frontal.

telepresence, Nvidia Maxine, Digital Human

nvidia/usdvalidate

Verifies the compatibility of OpenUSD assets with instant RTX rendering and rule-based validation.

Validation, OpenUSD, Synthetic Data Generation, Digital Twin, USD, Visualization 3D

nvidia/nv-rerankqa-mistral-4b-v3

Multilingual text reranking model.

nemo retriever, Reranking, Retrieval Augmented Generation

nvidia/nv-embedqa-e5-v5

English text embedding model for question-answering retrieval.

Embedding, run-on-rtx, Retrieval Augmented Generation, Nemo retriever, Text-to-Embedding

nvidia/nv-embedqa-mistral-7b-v2

Multilingual text question-answering retrieval, transforming textual information into dense vector representations.

nemo retriever, Embedding, Retrieval Augmented Generation

nvidia/maisi

MAISI is a pre-trained volumetric (3D) CT Latent Diffusion Generative Model.

Image Generation, Medical Imaging, NVIDIA NIM

nvidia/llama3-chatqa-1.5-8b

Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines.

text-to-text, chat, Non-Commercial Use Only

nvidia/nvclip

NV-CLIP is a multimodal embeddings model for image and text.

Computer vision, multimodal embeddings, text and image, Run-on-rtx

nvidia/ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition, respectively.

Optical Character Recognition, image, Optical Character Detection, cv, vlm, computer vision, TAO Toolkit, video

nvidia/nv-embed-v1

Generates high-quality numerical embeddings from text inputs.

Non-Commercial Use Only, Retrieval Augmented Generation, Text-to-Embedding

nvidia/visual-changenet

Visual Changenet detects pixel-level change maps between two images and outputs a semantic change segmentation mask.

image, Image Generation, cv, Image Segmentation, vlm, computer vision, TAO Toolkit, video, NVIDIA NIM

nvidia/retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

Object Detection, image, cv, vlm, computer vision, TAO Toolkit, video, NVIDIA NIM

nvidia/rerank-qa-mistral-4b

GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.

Ranking, Retrieval Augmented Generation

nvidia/vista-3d

VISTA-3D is a specialized interactive foundation model for segmenting and annotating human anatomy.

Interactive Annotation, Image Segmentation, Non-Commercial Use Only, Medical Imaging

nvidia/molmim

MolMIM performs controlled generation, finding molecules with the right properties.

Chemistry, nim, BioNemo, Molecule Generation, Drug Discovery

nvidia/cuopt

World-record accuracy and performance for complex route optimization.

Route Optimization