NVIDIA
Copyright © 2025 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices.

nvidia/parakeet-ctc-0.6b-zh-cn

Record-setting accuracy and performance for Mandarin-English transcriptions.

Tags: asr, streaming, speech-to-text, mandarin, nvidia nim

nvidia/parakeet-ctc-0.6b-es

Accurate and optimized Spanish-English transcriptions with punctuation and word timestamps.

Tags: asr, streaming, speech-to-text, spanish, nvidia nim

nvidia/parakeet-ctc-0.6b-vi

Accurate and optimized Vietnamese-English transcriptions with punctuation and word timestamps.

Tags: asr, streaming, speech-to-text, vietnamese, nvidia nim

nvidia/nvidia-nemotron-nano-9b-v2

High-efficiency LLM with hybrid Transformer-Mamba design, excelling in reasoning and agentic tasks.

Tags: mamba, thinking budget, slm, chat, nano, reasoning, throughput, agentic

nvidia/cosmos-reason1-7b

Reasoning vision language model (VLM) for physical AI and robotics.

Tags: video understanding, synthetic data generation, autonomous vehicles, industrial, physical ai, vision language model, reasoning, robotics, smart cities

nvidia/nemoretriever-ocr-v1

Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.

Tags: optical character recognition, table extraction, nemo retriever, data ingestion, extraction

nvidia/parakeet-tdt-0.6b-v2

Accurate and optimized English transcriptions with punctuation and word timestamps.

Tags: asr, english, nvidia nim, nvidia riva, speech-to-text

nvidia/llama-3.3-nemotron-super-49b-v1.5

High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

Tags: chat, math, advanced reasoning, instruction following, function calling

nvidia/llama-3_2-nemoretriever-300m-embed-v1

Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.

Tags: retrieval augmented generation, text-to-embedding, nemo retriever

nvidia/nemoretriever-ocr

Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.

Tags: optical character recognition, table extraction, nemo retriever, data ingestion, extraction

nvidia/magpie-tts-flow

Expressive and engaging text-to-speech, generated from a short audio sample.

Tags: tts, text-to-speech, nvidia nim, nvidia riva

nvidia/riva-translate-4b-instruct

Translation model supporting 12 languages, with few-shot example prompting.

Tags: text translation, chat

nvidia/riva-translate-1.6b

Enable smooth global interactions in 36 languages.

Tags: text translation, neural machine translation, nvidia nim

nvidia/llama-3.2-nemoretriever-500m-rerank-v2

GPU-accelerated model that outputs a probability score that a given passage contains the information needed to answer a question.

Tags: nemo retriever, retrieval augmented generation, reranking

nvidia/cosmos-transfer1-7b

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

Tags: synthetic data generation, autonomous vehicles, physical ai, robotics, video-to-world

nvidia/Background Noise Removal

Removes unwanted noise from audio, improving speech intelligibility.

Tags: nvidia maxine, speech-to-speech, digital human, speech enhancement

nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1

Multimodal question-answer retrieval representing user queries as text and documents as images.

Tags: nemo retriever, embedding, retrieval augmented generation, text-to-embedding

nvidia/llama-3.1-nemotron-nano-vl-8b-v1

Multimodal vision-language model that understands text and images and generates informative responses.

Tags: doc intelligence, multiple image understanding, ocr

nvidia/llama-3.1-nemotron-nano-4b-v1.1

State-of-the-art open model for reasoning, code, math, and tool calling, suitable for edge agents.

Tags: edge, tool calling, chat, reasoning, math

nvidia/magpie-tts-zeroshot

Expressive and engaging text-to-speech, generated from a short audio sample.

Tags: tts, text-to-speech, nvidia nim, nvidia riva

nvidia/parakeet-1.1b-rnnt-multilingual-asr

High accuracy and optimized performance for transcription in 25 languages.

Tags: asr, streaming, speech-to-text, multilingual, nvidia nim

nvidia/llama-3.1-nemotron-ultra-253b-v1

Superior inference efficiency with highest accuracy for scientific and complex math reasoning, coding, tool calling, and instruction following.

Tags: chat, math, advanced reasoning, instruction following, function calling

nvidia/cosmos-predict1-7b

Generalist model to generate future world state as videos from text and image prompts to create synthetic training data for robots and autonomous vehicles.

Tags: synthetic data generation, autonomous vehicles, physical ai, robotics, text-to-world, image-to-world

nvidia/cosmos-predict1-5b

Generates future frames of a physics-aware world state from just an image or a short video prompt, for physical AI development.

Tags: synthetic data generation, physical ai, policy evaluation, robotics, video-to-world

nvidia/sparsedrive

End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.

Tags: autonomous vehicles, bev, av stack, automotive

nvidia/bevformer

Advanced transformer for multi-frame bird's-eye-view 3D perception in autonomous driving.

Tags: autonomous vehicles, bev, automotive, perception

nvidia/llama-3.3-nemotron-super-49b-v1

High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

Tags: chat, math, advanced reasoning, instruction following, function calling

nvidia/llama-3.1-nemotron-nano-8b-v1

Model with leading accuracy in reasoning and agentic AI, for PC and edge.

Tags: math, advanced reasoning, instruction following, function calling

nvidia/magpie-tts-multilingual

Natural and expressive voices in multiple languages. For voice agents and brand ambassadors.

Tags: tts, text-to-speech, nvidia nim, nvidia riva, multilingual

nvidia/nv-embedcode-7b-v1

The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.

Tags: nemo retriever, embedding, retrieval augmented generation

nvidia/nemoretriever-table-structure-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Tags: object detection, chart detection, nemo retriever, table detection, data ingestion, run-on-rtx

nvidia/nemoretriever-graphic-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Tags: object detection, chart detection, nemo retriever, table detection, data ingestion, run-on-rtx

nvidia/nemoretriever-page-elements-v2

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Tags: object detection, chart detection, nemo retriever, table detection, data ingestion, run-on-rtx

nvidia/nemoretriever-parse

Cutting-edge vision-language model excelling in retrieving text and metadata from images.

Tags: optical character recognition, nemo retriever, data ingestion, table extraction, supported language - english

openai/whisper-large-v3

Robust Speech Recognition via Large-Scale Weak Supervision.

Tags: asr, ast, speech-to-text, batch, whisper, openai, multilingual, nvidia nim, nvidia riva

nvidia/canary-1b-asr

Multilingual model supporting speech-to-text recognition and translation.

Tags: asr, ast, streaming, speech-to-text, batch, spanish, multilingual, nvidia nim, nvidia riva

nvidia/canary-0.6b-turbo-asr

Multilingual model supporting speech-to-text recognition and translation.

Tags: asr, ast, fast, speech-to-text, batch, multilingual, nvidia nim, nvidia riva

nvidia/llama-3.1-nemoguard-8b-topic-control

Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.

Tags: dialogue safety, llm safety, guard model, content safety

nvidia/nemoguard-jailbreak-detect

Industry-leading jailbreak classification model for protection from adversarial attempts.

Tags: llm security, jailbreak detection, prompt injection, nvidia nim

nvidia/llama-3.1-nemoguard-8b-content-safety

Leading content safety model for enhancing the safety and moderation capabilities of LLMs.

Tags: llm safety, content moderation, guard model, content safety

igenius/colosseum_355b_instruct_16k

Multilingual LLM trained on NVIDIA DGX Cloud, designed for mission-critical use cases in regulated industries, including financial services, government, and heavy industry.

Tags: heavy industry, government, chat, highly regulated use case support, financial services

nvidia/genmol

Fragment-Based Molecular Generation by Discrete Diffusion.

Tags: chemistry, nim, bionemo, molecule generation, drug discovery

nvidia/cosmos-nemotron-34b

Multimodal vision-language model that understands text, images, and video and generates informative responses.

Tags: vlm, vision language model, image caption, image to text

nvidia/llama-3.2-nv-embedqa-1b-v2

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

Tags: nemo retriever, run-on-rtx, embedding, retrieval augmented generation, text-to-embedding

nvidia/llama-3.2-nv-rerankqa-1b-v2

Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.

Tags: nemo retriever, run-on-rtx, retrieval augmented generation, reranking

nvidia/usdcode

State-of-the-art LLM that answers OpenUSD knowledge queries and generates USD-Python code.

Tags: openusd, synthetic data generation, digital twin, code generation, chat, nvidia nim

nvidia/nv-yolox-page-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Tags: object detection, data ingestion, chart detection, nemo retriever, table detection, run-on-rtx, extraction

nvidia/audio2face-3d

Converts streamed audio to facial blendshapes for real-time lip-syncing and facial performances.

Tags: speech-to-animation, digital humans, audio-to-face, nvidia nim

nvidia/conformer-ctc-asr

Automatic speech recognition model that transcribes speech in lowercase Spanish with record-setting accuracy and performance.

Tags: asr, streaming, speech-to-text, spanish, nvidia nim, nvidia riva

nvidia/corrdiff

Generative downscaling model that produces high-resolution, regional-scale weather fields.

Tags: ai weather prediction, weather simulation, earth-2

nvidia/fourcastnet

FourCastNet predicts global atmospheric dynamics of various weather and climate variables.

Tags: weather simulation, ai weather prediction, climate science, earth-2

nvidia/nemotron-4-mini-hindi-4b-instruct

A bilingual Hindi-English SLM for on-device inference, tailored specifically for the Hindi language.

Tags: indic, chat, text-to-text, language generation

nvidia/llama-3.1-nemotron-70b-instruct

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM-generated responses.

Tags: chat, code generation, text-to-text, language generation

nvidia/studiovoice

Enhances speech by correcting common audio degradations to create studio-quality speech output.

Tags: nvidia maxine, speech-to-speech, digital human, run-on-rtx, speech enhancement

nvidia/llama-3.1-nemotron-70b-reward

Leaderboard-topping reward model supporting RLHF for better alignment with human preferences.

Tags: text-to-text, reward model, rlhf

nvidia/llama-3.1-nemotron-51b-instruct

Unique language model that delivers unmatched accuracy-efficiency performance.

Tags: chat, language generation, text-to-text

nvidia/vila

Multimodal vision-language model that understands text, images, and video and generates informative responses.

Tags: vlm, vision language model, image caption, image to text

nvidia/nemotron-mini-4b-instruct

Optimized SLM for on-device inference, fine-tuned for roleplay, RAG, and function calling.

Tags: chat, text-to-text, language generation

nvidia/mistral-nemo-minitron-8b-base

State-of-the-art small language model delivering superior accuracy for chatbots, virtual assistants, and content generation.

Tags: language generation, text-to-text, chat, small language model

nvidia/nv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

Tags: image-to-embedding, computer vision, deepstream, nvidia nim, object classification

nvidia/nv-grounding-dino

Grounding DINO is an open-vocabulary, zero-shot object detection model.

Tags: object detection, computer vision, deepstream, nvidia nim

nvidia/megatron-1b-nmt

Enable smooth global interactions in 36 languages.

Tags: text translation, neural machine translation, nvidia nim

nvidia/parakeet-ctc-1.1b-asr

Record-setting accuracy and performance for English transcription.

Tags: asr, streaming, english, speech-to-text, batch, nvidia nim

nvidia/parakeet-ctc-0.6b-asr

State-of-the-art accuracy and speed for English transcriptions.

Tags: asr, streaming, english, batch, speech-to-text, fast, nvidia nim, run-on-rtx

nvidia/usdsearch

AI-powered search for OpenUSD data, 3D models, images, and assets using text or image-based inputs.

Tags: openusd, synthetic data generation, digital twin, usd, text-to-3d, nvidia nim

nvidia/eyecontact

Estimates the gaze angles of a person in a video and redirects the gaze to make it frontal.

Tags: telepresence, nvidia maxine, digital human

nvidia/usdvalidate

Verifies the compatibility of OpenUSD assets with instant RTX rendering and rule-based validation.

Tags: validation, openusd, synthetic data generation, digital twin, usd, visualization, 3d

nvidia/nv-rerankqa-mistral-4b-v3

Multilingual text reranking model.

Tags: nemo retriever, reranking, retrieval augmented generation

nvidia/nv-embedqa-e5-v5

English text embedding model for question-answering retrieval.

Tags: embedding, run-on-rtx, retrieval augmented generation, nemo retriever, text-to-embedding

nvidia/nv-embedqa-mistral-7b-v2

Multilingual text question-answering retrieval, transforming textual information into dense vector representations.

Tags: nemo retriever, embedding, retrieval augmented generation

nvidia/maisi

MAISI is a pre-trained volumetric (3D) CT latent diffusion generative model.

Tags: image generation, medical imaging, nvidia nim

nvidia/llama3-chatqa-1.5-70b

Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines.

Tags: text-to-text, chat, non-commercial use only

nvidia/llama3-chatqa-1.5-8b

Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines.

Tags: text-to-text, chat, non-commercial use only

nvidia/nvclip

NV-CLIP is a multimodal embeddings model for image and text.

Tags: computer vision, multimodal embeddings, text and image, nvidia nim, run-on-rtx

nvidia/ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition respectively.

Tags: optical character recognition, image, optical character detection, cv, vlm, computer vision, tao toolkit, video

nvidia/nv-embed-v1

Generates high-quality numerical embeddings from text inputs.

Tags: non-commercial use only, retrieval augmented generation, text-to-embedding

nvidia/visual-changenet

Visual Changenet detects pixel-level changes between two images and outputs a semantic change segmentation mask.

Tags: image, image generation, cv, image segmentation, vlm, computer vision, tao toolkit, video, nvidia nim

nvidia/retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

Tags: object detection, image, cv, vlm, computer vision, tao toolkit, video, nvidia nim

nvidia/rerank-qa-mistral-4b

GPU-accelerated model that outputs a probability score that a given passage contains the information needed to answer a question.

Tags: ranking, retrieval augmented generation

nvidia/neva-22b

Multimodal vision-language model that understands text and images and generates informative responses.

Tags: image, cv, vision assistant, non-commercial use only, vlm, visual question answering, computer vision, image-to-text, video

nvidia/vista-3d

VISTA-3D is a specialized interactive foundation model for segmenting and annotating human anatomy.

Tags: interactive annotation, image segmentation, non-commercial use only, medical imaging

mistralai/mistral-7b-instruct-v0.2

This LLM follows instructions, completes requests, and generates creative text.

Tags: chat, text-to-text, language generation, nvidia nim

nvidia/molmim

MolMIM performs controlled generation, finding molecules with the right properties.

Tags: chemistry, nim, bionemo, molecule generation, drug discovery

nvidia/cuopt

World-record accuracy and performance for complex route optimization.

Tags: route optimization