Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
nvidia/nemoretriever-parse
Cutting-edge vision-language model excelling in retrieving text and metadata from images.

deepseek-ai/deepseek-r1-distill-qwen-32b
Distilled version of Qwen 2.5 32B using reasoning data generated by DeepSeek R1 for enhanced performance.

deepseek-ai/deepseek-r1-distill-qwen-14b
Distilled version of Qwen 2.5 14B using reasoning data generated by DeepSeek R1 for enhanced performance.

deepseek-ai/deepseek-r1-distill-qwen-7b
Distilled version of Qwen 2.5 7B using reasoning data generated by DeepSeek R1 for enhanced performance.

microsoft/phi-4-mini-instruct
Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments

microsoft/phi-4-multimodal-instruct
Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.

openai/whisper-large-v3
Robust Speech Recognition via Large-Scale Weak Supervision.

nvidia/canary-1b-asr
Multilingual model supporting speech-to-text recognition and translation.

nvidia/canary-0.6b-turbo-asr
Multilingual model supporting speech-to-text recognition and translation.

mistralai/mistral-small-24b-instruct
Latency-optimized language model excelling in code, math, general knowledge, and instruction-following.

deepseek-ai/deepseek-r1
State-of-the-art, high-efficiency LLM excelling in reasoning, math, and coding.

nvidia/llama-3.1-nemoguard-8b-topic-control
Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.

nvidia/nemoguard-jailbreak-detect
Industry-leading jailbreak classification model for protection from adversarial attempts

nvidia/llama-3.1-nemoguard-8b-content-safety
Leading content safety model for enhancing the safety and moderation capabilities of LLMs

igenius/colosseum_355b_instruct_16k
NVIDIA DGX Cloud trained multilingual LLM designed for mission-critical use cases in regulated industries, including financial services, government, and heavy industry

tiiuae/falcon3-7b-instruct
Instruction-tuned LLM achieving state-of-the-art performance on reasoning, math, and general knowledge

igenius/italia_10b_instruct_16k
Multilingual LLM with emphasis on European languages, supporting regulated use cases including financial services, government, and heavy industry

qwen/qwen2.5-7b-instruct
Chinese and English LLM targeting language, coding, mathematics, and reasoning tasks.


nvidia/cosmos-nemotron-34b
Multimodal vision-language model that understands text, images, and video and creates informative responses

nvidia/cosmos-1.0-diffusion-7b
Generates physics-aware video world states from text and image prompts for physical AI development.

nvidia/cosmos-1.0-autoregressive-5b
Generates future frames of a physics-aware world state from just an image or short video prompt for physical AI development.

qwen/qwen2.5-coder-32b-instruct
Advanced LLM for code generation, reasoning, and fixing across popular programming languages.

qwen/qwen2.5-coder-7b-instruct
Powerful mid-size code model with a 32K context length, excelling in coding in multiple languages.

writer/palmyra-creative-122b
Powerful LLM designed for creative thinking and writing.

nvidia/llama-3.2-nv-embedqa-1b-v2
Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

nvidia/llama-3.2-nv-rerankqa-1b-v2
Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.

meta/llama-3.3-70b-instruct
Advanced LLM for reasoning, math, general knowledge, and function calling

university-at-buffalo/cached
Context-aware chart extraction that can detect 18 classes for chart basic elements, excluding plot elements.

nvidia/nv-yolox-page-elements-v1
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

nvidia/audio2face-3d
Converts streamed audio to facial blendshapes for real-time lip-syncing and facial performances.

nvidia/conformer-ctc-asr
Automatic speech recognition model that transcribes speech in lowercase English with record-setting accuracy and performance

nvidia/fourcastnet
FourCastNet predicts global atmospheric dynamics of various weather and climate variables.

hive/deepfake-image-detection
Advanced AI model that detects faces and identifies deepfake images.

nvidia/llama-3.2-nv-rerankqa-1b-v1
Efficiently refine retrieval results over multiple sources and languages.

nvidia/llama-3.2-nv-embedqa-1b-v1
World-class multilingual and cross-lingual question-answering retrieval.

nvidia/nemotron-4-mini-hindi-4b-instruct
A bilingual Hindi-English SLM for on-device inference, tailored specifically for the Hindi language.

ibm/granite-guardian-3.0-8b
Detects jailbreaking, bias, violence, profanity, sexual content, and unethical behavior

ibm/granite-3.0-8b-instruct
Advanced Small Language Model supporting RAG, summarization, classification, code, and agentic AI

ibm/granite-3.0-3b-a800m-instruct
Highly efficient Mixture of Experts model for RAG, summarization, entity extraction, and classification

shutterstock/edify-360-hdri
Shutterstock Generative 3D service for 360 HDRi generation. Trained on NVIDIA Edify using Shutterstock’s licensed creative libraries.

nvidia/llama-3.1-nemotron-70b-instruct
Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM-generated responses.

zyphra/zamba2-7b-instruct
Efficient hybrid state-space model designed for conversational and reasoning tasks.

institute-of-science-tokyo/llama-3.1-swallow-70b-instruct-v0.1
Sovereign AI model trained on Japanese language that understands regional nuances.

institute-of-science-tokyo/llama-3.1-swallow-8b-instruct-v0.1
Sovereign AI model trained on Japanese language that understands regional nuances.

nvidia/studiovoice
Enhance speech by correcting common audio degradations to create studio quality speech output.

nvidia/mistral-nemo-minitron-8b-8k-instruct
State-of-the-art small language model delivering superior accuracy for chatbot, virtual assistants, and content generation.

nvidia/llama-3.1-nemotron-70b-reward
Leaderboard-topping reward model supporting RLHF for better alignment with human preferences.

meta/llama-3.2-3b-instruct
Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

meta/llama-3.2-11b-vision-instruct
Cutting-edge vision-language model excelling in high-quality reasoning from images.

meta/llama-3.2-90b-vision-instruct
Cutting-edge vision-language model excelling in high-quality reasoning from images.

meta/llama-3.2-1b-instruct
Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

nvidia/llama-3.1-nemotron-51b-instruct
Unique language model that delivers an unmatched balance of accuracy and efficiency.

qwen/qwen2-7b-instruct
Chinese and English LLM targeting language, coding, mathematics, and reasoning tasks.

abacusai/dracarys-llama-3.1-70b-instruct
Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.

deepmind/alphafold2-multimer
Predicts the 3D structure of a protein from its amino acid sequence.

nvidia/consistory
Generates consistent characters across a series of images without requiring additional training.

hive/ai-generated-image-detection
Robust image classification model for detecting and managing AI-generated content.


deepmind/alphafold2
Predicts the 3D structure of a protein from its amino acid sequence.

yentinglin/llama-3-taiwan-70b-instruct
Sovereign AI model fine-tuned on Traditional Mandarin and English data using the Llama-3 architecture.

tokyotech-llm/llama-3-swallow-70b-instruct-v0.1
Sovereign AI model trained on Japanese language that understands regional nuances.

nvidia/Build A Generative Virtual Screening Pipeline
This blueprint shows how generative AI and accelerated NIM microservices can design optimized small molecules smarter and faster.

microsoft/phi-3.5-vision-instruct
Cutting-edge open multimodal model excelling in high-quality reasoning from images.

ai21labs/jamba-1.5-mini-instruct
Cutting-edge MoE-based LLM designed to excel in a wide array of generative AI tasks.

ai21labs/jamba-1.5-large-instruct
Cutting-edge MoE-based LLM designed to excel in a wide array of generative AI tasks.

nvidia/nemotron-mini-4b-instruct
Optimized SLM for on-device inference and fine-tuned for roleplay, RAG and function calling

nvidia/mistral-nemo-minitron-8b-base
State-of-the-art small language model delivering superior accuracy for chatbot, virtual assistants, and content generation.

microsoft/phi-3.5-moe-instruct
Advanced LLM based on Mixture of Experts architecture to deliver compute-efficient content generation

microsoft/phi-3.5-mini-instruct
Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments

rakuten/rakutenai-7b-instruct
Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.

rakuten/rakutenai-7b-chat
Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.

nvidia/nv-grounding-dino
Grounding DINO is an open-vocabulary zero-shot object detection model.

nvidia/radtts-hifigan-tts
Natural, high-fidelity, English voices for personalizing text-to-speech services and voiceovers

nvidia/megatron-1b-nmt
Enable smooth global interactions in 36 languages.

nvidia/fastpitch-hifigan-tts
Expressive and engaging English voices for Q&A assistants, brand ambassadors, and service robots

nvidia/parakeet-ctc-1.1b-asr
Record-setting accuracy and performance for English transcription.

nvidia/parakeet-ctc-0.6b-asr
State-of-the-art accuracy and speed for English transcriptions.

ipd/proteinmpnn
ProteinMPNN is a deep learning model for predicting amino acid sequences for protein backbones.

microsoft/florence-2
Vision foundation model capable of performing diverse computer vision and vision language tasks.

writer/palmyra-fin-70b-32k
Specialized LLM for financial analysis, reporting, and data processing

google/shieldgemma-9b
Guardrail model to ensure that responses from LLMs are appropriate and safe

google/gemma-2-2b-it
Advanced small language generative AI model for edge applications

Shutterstock/edify-3d
Shutterstock Generative 3D service for 3D asset generation. Trained on NVIDIA Edify using Shutterstock’s licensed creative libraries

GettyImages/edify-image
Getty Images’ API service for 4K image generation. Trained on NVIDIA Edify using Getty Images’ commercially safe creative libraries.

nvidia/eyecontact
Estimate gaze angles of a person in a video and redirect to make it frontal.

nvidia/audio2face-2d
Create facial animations using a portrait photo and synchronize mouth movement with audio.

nvidia/usdvalidate
Verify compatibility of OpenUSD assets with instant RTX render and rule-based validation.

thudm/chatglm3-6b
Supports Chinese and English languages to handle tasks including chatbot, content generation, coding, and translation.

mistralai/mamba-codestral-7b-v0.1
Model for writing and interacting with code across a wide range of programming languages and tasks.

baichuan-inc/baichuan2-13b-chat
Supports Chinese and English chat, coding, math, instruction following, and quiz solving

meta/llama-3.1-405b-instruct
Advanced LLM for synthetic data generation, distillation, and inference for chatbots, coding, and domain-specific tasks.

meta/llama-3.1-70b-instruct
Powers complex conversations with superior contextual understanding, reasoning and text generation.

meta/llama-3.1-8b-instruct
Advanced state-of-the-art model with language understanding, superior reasoning, and text generation.

nv-mistralai/mistral-nemo-12b-instruct
Most advanced language model for reasoning, code, multilingual tasks; runs on a single GPU.

nvidia/nv-rerankqa-mistral-4b-v3
Multilingual text reranking model.

nvidia/nv-embedqa-e5-v5
English text embedding model for question-answering retrieval.

nvidia/nv-embedqa-mistral-7b-v2
Multilingual text question-answering retrieval, transforming textual information into dense vector representations.


microsoft/phi-3-medium-128k-instruct
Cutting-edge lightweight open language model excelling in high-quality reasoning.

bigcode/starcoder2-7b
Advanced programming model for code completion, summarization, and generation

bigcode/starcoder2-15b
Advanced programming model for code completion, summarization, and generation

google/gemma-2-27b-it
Cutting-edge text generation model for text understanding, transformation, and code generation.

google/gemma-2-9b-it
Cutting-edge text generation model for text understanding, transformation, and code generation.

nvidia/llama3-chatqa-1.5-70b
Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines.

nvidia/llama3-chatqa-1.5-8b
Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines.

nvidia/nemotron-4-340b-reward
Grades responses on five attributes: helpfulness, correctness, coherence, complexity, and verbosity.

nvidia/nemotron-4-340b-instruct
Creates diverse synthetic data that mimics the characteristics of real-world data.

mistralai/mistral-7b-instruct-v0.3
This LLM follows instructions, completes requests, and generates creative text.


stabilityai/stable-diffusion-3-medium
Advanced text-to-image model for generating high-quality images

writer/palmyra-med-70b-32k
Leading LLM for accurate, contextually relevant responses in the medical domain.

writer/palmyra-med-70b
Leading LLM for accurate, contextually relevant responses in the medical domain.

nvidia/nv-embed-v1
Generates high-quality numerical embeddings from text inputs.

upstage/solar-10.7b-instruct
Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics.

mediatek/breeze-7b-instruct
LLM for improved language comprehension and chatbot-oriented capabilities in Traditional Chinese.

nvidia/visual-changenet
Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask

google/codegemma-1.1-7b
Advanced programming model for code generation, completion, reasoning, and instruction following.

ibm/granite-34b-code-instruct
Software programming LLM for code generation, completion, explanation, and multi-turn conversation.

ibm/granite-8b-code-instruct
Software programming LLM for code generation, completion, explanation, and multi-turn conversation.

nvidia/retail-object-detection
EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

ipd/rfdiffusion
A generative model of protein backbones for protein binder design.

microsoft/phi-3-small-8k-instruct
Cutting-edge lightweight open language model excelling in high-quality reasoning.

microsoft/phi-3-small-128k-instruct
Long-context, cutting-edge lightweight open language model excelling in high-quality reasoning.

microsoft/phi-3-medium-4k-instruct
Cutting-edge lightweight open language model excelling in high-quality reasoning.

microsoft/phi-3-vision-128k-instruct
Cutting-edge open multimodal model excelling in high-quality reasoning from images.

aisingapore/sea-lion-7b-instruct
LLM to represent and serve the linguistic and cultural diversity of Southeast Asia

microsoft/phi-3-mini-4k-instruct
Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills.

databricks/dbrx-instruct
A general-purpose LLM with state-of-the-art performance in language understanding, coding, and RAG.

snowflake/arctic-embed-l
Optimized community model for text embedding.

microsoft/phi-3-mini-128k-instruct
Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills.

mistralai/mixtral-8x22b-instruct-v0.1
An MoE LLM that follows instructions, completes requests, and generates creative text.

meta/llama3-70b-instruct
Powers complex conversations with superior contextual understanding, reasoning and text generation.

meta/llama3-8b-instruct
Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.

google/recurrentgemma-2b
Language model based on a novel recurrent architecture for faster inference when generating long sequences.

google/codegemma-7b
Cutting-edge model built on Google's Gemma-7B specialized for code generation and code completion.

nvidia/embed-qa-4
GPU-accelerated generation of text embeddings used for question-answering retrieval.

nvidia/rerank-qa-mistral-4b
GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.

stabilityai/stable-diffusion-xl
Generate images and stunning visuals with realistic aesthetics.


meta/codellama-70b
LLM capable of generating code from natural language and vice versa.

mistralai/mistral-7b-instruct-v0.2
This LLM follows instructions, completes requests, and generates creative text.

nvidia/deepvariant
Run Google's DeepVariant optimized for GPU. Switch models for high accuracy on all major sequencers.

stabilityai/stable-video-diffusion
Stable Video Diffusion (SVD) is a generative diffusion model that leverages a single image as a conditioning frame to synthesize video sequences.

stabilityai/sdxl-turbo
A fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation



mistralai/mixtral-8x7b-instruct-v0.1
An MoE LLM that follows instructions, completes requests, and generates creative text.
