# Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices.

- [Active Speaker Detection](/qc69jvmznzxy/active-speaker-detection.md) — Detect and track speaker identities across video frames.
- [alphafold2](/qc69jvmznzxy/alphafold2.md) — Predicts the 3D structure of a protein from its amino acid sequence.
- [alphafold2-multimer](/qc69jvmznzxy/alphafold2-multimer.md) — Predicts the 3D structure of a protein from its amino acid sequence.
- [Background Noise Removal](/qc69jvmznzxy/bnr.md) — Removes unwanted noises from audio improving speech intelligibility.
- [bevformer](/qc69jvmznzxy/bevformer.md) — Advanced transformer for multi-frame bird's-eye-view 3D perception in autonomous driving.
- [bge-m3](/qc69jvmznzxy/bge-m3.md) — Embedding model for text retrieval tasks, excelling in dense, multi-vector, and sparse retrieval.
- [Boltz-2](/qc69jvmznzxy/boltz2.md) — Predict complex structures using Boltz-2.
- [canary-1b-asr](/qc69jvmznzxy/canary-1b-asr.md) — Multi-lingual model supporting speech-to-text recognition and translation.
- [chatterbox-multilingual-tts](/qc69jvmznzxy/chatterbox-multilingual-tts.md) — Natural and expressive voices in 23 languages. For voice agents and brand ambassadors.
- [conformer-ctc-asr](/qc69jvmznzxy/conformer-ctc-asr.md) — Automatic speech recognition model that transcribes speech in lower case Spanish with record-setting accuracy and performance
- [cosmos-reason2-8b](/qc69jvmznzxy/cosmos-reason2-8b.md) — Vision language model that excels in understanding the physical world using structured reasoning on videos or images.
- [cosmos-transfer1-7b](/qc69jvmznzxy/cosmos-transfer1-7b.md) — Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.
- [cosmos-transfer2.5-2b](/qc69jvmznzxy/cosmos-transfer2_5-2b.md) — Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.
- [cosmos3-nano](/qc69jvmznzxy/cosmos3-nano.md) — Generates physics-aware videos from text prompts or an image prompt for physical AI development.
- [cosmos3-nano-reasoner](/qc69jvmznzxy/cosmos3-nano-reasoner.md) — Vision language model that excels in understanding the physical world using structured reasoning on videos or images.
- [cuopt](/qc69jvmznzxy/nvidia-cuopt.md) — World-record accuracy and performance for complex route optimization.
- [deepseek-v4-flash](/qc69jvmznzxy/deepseek-v4-flash.md) — DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.
- [deepseek-v4-pro](/qc69jvmznzxy/deepseek-v4-pro.md) — DeepSeek V4 scales to 1M-token context windows with efficient MoE architecture for coding tasks.
- [diffdock](/qc69jvmznzxy/diffdock.md) — Predicts the 3D structure of how a molecule interacts with a protein.
- [diffusiongemma-26b-a4b-it](/qc69jvmznzxy/diffusiongemma-26b-a4b-it.md) — Diffusion-based 26B parameter LLM enabling parallel token generation for real-time text apps
- [dracarys-llama-3.1-70b-instruct](/qc69jvmznzxy/dracarys-llama-3_1-70b-instruct.md) — Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.
- [esm2-650m](/qc69jvmznzxy/esm2-650m.md) — Generates embeddings of proteins from their amino acid sequences.
- [esmfold](/qc69jvmznzxy/esmfold.md) — Predicts the 3D structure of a protein from its amino acid sequence.
- [evo2-40b](/qc69jvmznzxy/evo2-40b.md) — Evo 2 is a biological foundation model that is able to integrate information over long genomic sequences while retaining sensitivity to single-nucleotide changes.
- [eyecontact](/qc69jvmznzxy/eyecontact.md) — Estimate gaze angles of a person in a video and redirect to make it frontal.
- [fidelity](/qc69jvmznzxy/fidelity.md) — Run computational fluid dynamics (CFD) simulations.
- [fluent](/qc69jvmznzxy/fluent.md) — Run computational fluid dynamics (CFD) simulations.
- [FLUX.1-dev](/qc69jvmznzxy/flux_1-dev.md) — FLUX.1 is a state-of-the-art suite of image generation models
- [FLUX.1-Kontext-dev](/qc69jvmznzxy/flux_1-kontext-dev.md) — FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.
- [FLUX.1-schnell](/qc69jvmznzxy/flux_1-schnell.md) — FLUX.1-schnell is a distilled image generation model, producing high quality images at fast speeds
- [flux.2-klein-4b](/qc69jvmznzxy/flux_2-klein-4b.md) — FLUX.2-klein-4B is a distilled image generation and editing model, producing outputs at lightning speed.
- [fourcastnet](/qc69jvmznzxy/fourcastnet.md) — FourCastNet predicts global atmospheric dynamics of various weather / climate variables.
- [gemma-2-2b-it](/qc69jvmznzxy/gemma-2-2b-it.md) — Advanced small language generative AI model for edge applications
- [gemma-3n-e2b-it](/qc69jvmznzxy/gemma-3n-e2b-it.md) — An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments
- [gemma-3n-e4b-it](/qc69jvmznzxy/gemma-3n-e4b-it.md) — An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments
- [gemma-4-31b-it](/qc69jvmznzxy/gemma-4-31b-it.md) — Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.
- [genmol](/qc69jvmznzxy/genmol-generate.md) — Fragment-Based Molecular Generation by Discrete Diffusion.
- [gliner-pii](/qc69jvmznzxy/gliner-pii.md) — GLiNER PII detects Personally Identifiable Information in text.
- [glm-5.1](/qc69jvmznzxy/glm-5.1.md) — GLM-5.1 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.
- [gpt-oss-120b](/qc69jvmznzxy/gpt-oss-120b.md) — Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within 80GB GPU.
- [gpt-oss-20b](/qc69jvmznzxy/gpt-oss-20b.md) — Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math
- [ising-calibration-1-35b-a3b](/qc69jvmznzxy/ising-calibration-1-35b-a3b.md) — Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.
- [kimi-k2.6](/qc69jvmznzxy/kimi-k2.6.md) — 1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.
- [LipSync](/qc69jvmznzxy/lipsync.md) — Generative lip dubbing that syncs lips in a video to input audio.
- [llama-3.1-70b-instruct](/qc69jvmznzxy/llama-3_1-70b-instruct.md) — Powers complex conversations with superior contextual understanding, reasoning and text generation.
- [llama-3.1-8b-instruct](/qc69jvmznzxy/llama-3_1-8b-instruct.md) — Advanced state-of-the-art model with language understanding, superior reasoning, and text generation.
- [llama-3.1-nemoguard-8b-content-safety](/qc69jvmznzxy/llama-3_1-nemoguard-8b-content-safety.md) — Leading content safety model for enhancing the safety and moderation capabilities of LLMs
- [llama-3.1-nemoguard-8b-topic-control](/qc69jvmznzxy/llama-3_1-nemoguard-8b-topic-control.md) — Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.
- [llama-3.1-nemotron-nano-8b-v1](/qc69jvmznzxy/llama-3_1-nemotron-nano-8b-v1.md) — Leading reasoning and agentic AI accuracy model for PC and edge.
- [llama-3.1-nemotron-nano-vl-8b-v1](/qc69jvmznzxy/llama-3.1-nemotron-nano-vl-8b-v1.md) — Multi-modal vision-language model that understands text and images and creates informative responses.
- [llama-3.1-nemotron-safety-guard-8b-v3](/qc69jvmznzxy/llama-3_1-nemotron-safety-guard-8b-v3.md) — Leading multilingual content safety model for enhancing the safety and moderation capabilities of LLMs
- [llama-3.2-11b-vision-instruct](/qc69jvmznzxy/llama-3.2-11b-vision-instruct.md) — Cutting-edge vision-language model excelling in high-quality reasoning from images.
- [llama-3.2-1b-instruct](/qc69jvmznzxy/llama-3.2-1b-instruct.md) — Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.
- [llama-3.2-3b-instruct](/qc69jvmznzxy/llama-3.2-3b-instruct.md) — Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.
- [llama-3.2-90b-vision-instruct](/qc69jvmznzxy/llama-3.2-90b-vision-instruct.md) — Cutting-edge vision-language model excelling in high-quality reasoning from images.
- [llama-3.3-70b-instruct](/qc69jvmznzxy/llama-3_3-70b-instruct.md) — Advanced LLM for reasoning, math, general knowledge, and function calling
- [llama-3.3-nemotron-super-49b-v1](/qc69jvmznzxy/llama-3_3-nemotron-super-49b-v1.md) — High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.
- [llama-3.3-nemotron-super-49b-v1.5](/qc69jvmznzxy/llama-3_3-nemotron-super-49b-v1_5.md) — High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.
- [llama-4-maverick-17b-128e-instruct](/qc69jvmznzxy/llama-4-maverick-17b-128e-instruct.md) — A general purpose multimodal, multilingual 128-expert MoE model with 17B parameters.
- [llama-guard-4-12b](/qc69jvmznzxy/llama-guard-4-12b.md) — Multi-modal model to classify safety for input prompts as well as output responses.
- [llama-nemotron-embed-1b-v2](/qc69jvmznzxy/llama-nemotron-embed-1b-v2.md) — Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.
- [llama-nemotron-embed-vl-1b-v2](/qc69jvmznzxy/llama-nemotron-embed-vl-1b-v2.md) — Multimodal question-answer retrieval representing user queries as text and documents as images.
- [llama-nemotron-rerank-1b-v2](/qc69jvmznzxy/llama-nemotron-rerank-1b-v2.md) — GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.
- [llama-nemotron-rerank-vl-1b-v2](/qc69jvmznzxy/llama-nemotron-rerank-vl-1b-v2.md) — GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.
- [magpie-tts-multilingual](/qc69jvmznzxy/magpie-tts-multilingual.md) — Natural and expressive voices in multiple languages. For voice agents and brand ambassadors.
- [magpie-tts-zeroshot](/qc69jvmznzxy/magpie-tts-zeroshot.md) — Expressive and engaging text-to-speech, generated from a short audio sample.
- [megatron-1b-nmt](/qc69jvmznzxy/megatron-1b-nmt.md) — Enable smooth global interactions in 36 languages.
- [minimax-m2.7](/qc69jvmznzxy/minimax-m2.7.md) — MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.
- [minimax-m3](/qc69jvmznzxy/minimax-m3.md) — MiniMax M3 Preview is a multimodal MoE vision-language model with strong reasoning, coding, and tool-calling capabilities.
- [ministral-14b-instruct-2512](/qc69jvmznzxy/ministral-14b-instruct-2512.md) — A general purpose VLM ideal for chat and instruction based use cases
- [mistral-large-3-675b-instruct-2512](/qc69jvmznzxy/mistral-large-3-675b-instruct-2512.md) — A state-of-the-art general purpose MoE VLM ideal for chat, agentic and instruction based use cases.
- [mistral-medium-3.5-128b](/qc69jvmznzxy/mistral-medium-3.5-128b.md) — A high performing model for text generation, coding and agentic use cases
- [mistral-nemotron](/qc69jvmznzxy/mistral-nemotron.md) — Built for agentic workflows, this model excels in coding, instruction following, and function calling
- [mistral-small-4-119b-2603](/qc69jvmznzxy/mistral-small-4-119b-2603.md) — Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context
- [mixtral-8x7b-instruct-v0.1](/qc69jvmznzxy/mixtral-8x7b-instruct.md) — An MoE LLM that follows instructions, completes requests, and generates creative text.
- [molmim](/qc69jvmznzxy/molmim-generate.md) — MolMIM performs controlled generation, finding molecules with the right properties.
- [msa-search](/qc69jvmznzxy/msa-search.md) — Generates a multiple sequence alignment from a query sequence and a protein sequence database search.
- [nemoguard-jailbreak-detect](/qc69jvmznzxy/nemoguard-jailbreak-detect.md) — Industry leading jailbreak classification model for protection from adversarial attempts
- [nemoretriever-ocr](/qc69jvmznzxy/nemoretriever-ocr.md) — Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
- [nemoretriever-page-elements-v2](/qc69jvmznzxy/nemoretriever-page-elements-v2.md) — Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
- [nemoretriever-parse](/qc69jvmznzxy/nemoretriever-parse.md) — Cutting-edge vision-language model excelling in retrieving text and metadata from images.
- [nemotron-3-content-safety](/qc69jvmznzxy/nemotron-3-content-safety.md) — Multilingual, multimodal model for detecting unsafe and toxic content.
- [nemotron-3-nano-30b-a3b](/qc69jvmznzxy/nemotron-3-nano-30b-a3b.md) — Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more
- [nemotron-3-nano-omni-30b-a3b-reasoning](/qc69jvmznzxy/nemotron-3-nano-omni-30b-a3b-reasoning.md) — Nemotron 3 Nano Omni is an omni-modal reasoning model that understands images, video, speech, text.
- [nemotron-3-super-120b-a12b](/qc69jvmznzxy/nemotron-3-super-120b-a12b.md) — Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more
- [nemotron-3-ultra-550b-a55b](/qc69jvmznzxy/nemotron-3-ultra-550b-a55b.md) — Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more
- [nemotron-3.5-content-safety](/qc69jvmznzxy/nemotron-3.5-content-safety.md) — Multilingual, multimodal model for detecting unsafe and toxic content.
- [nemotron-asr-streaming](/qc69jvmznzxy/nemotron-asr-streaming.md) — Real-time speech recognition for English
- [nemotron-content-safety-reasoning-4b](/qc69jvmznzxy/nemotron-content-safety-reasoning-4b.md) — A context‑aware safety model that applies reasoning to enforce domain‑specific policies.
- [nemotron-graphic-elements-v1](/qc69jvmznzxy/nemotron-graphic-elements-v1.md) — Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
- [nemotron-mini-4b-instruct](/qc69jvmznzxy/nemotron-mini-4b-instruct.md) — Optimized SLM for on-device inference and fine-tuned for roleplay, RAG and function calling
- [nemotron-nano-12b-v2-vl](/qc69jvmznzxy/nemotron-nano-12b-v2-vl.md) — Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.
- [nemotron-ocr-v1](/qc69jvmznzxy/nemotron-ocr-v1.md) — Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
- [nemotron-ocr-v2](/qc69jvmznzxy/nemotron-ocr-v2.md) — Nemotron OCR v2 is a state-of-the-art multilingual text recognition model designed for robust end-to-end optical character recognition (OCR) on complex real-world images.
- [nemotron-page-elements-v3](/qc69jvmznzxy/nemotron-page-elements-v3.md) — Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
- [nemotron-parse](/qc69jvmznzxy/nemotron-parse.md) — Cutting-edge vision-language model excelling in retrieving text and metadata from images.
- [nemotron-table-structure-v1](/qc69jvmznzxy/nemotron-table-structure-v1.md) — Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
- [nemotron-voicechat](/qc69jvmznzxy/nemotron-voicechat.md) — Nemotron 3 Voicechat
- [nv-embed-v1](/qc69jvmznzxy/nv-embed-v1.md) — Generates high-quality numerical embeddings from text inputs.
- [nv-embedcode-7b-v1](/qc69jvmznzxy/nv-embedcode-7b-v1.md) — The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.

_100 of 144 shown. Fetch /models.md?page=2 for the next 100._
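
For readers who consume this index programmatically, the following is a minimal sketch of parsing one page of it. It assumes the entry and footer formats seen on this page (a Markdown link, a dash separator, a one-line summary, and an italic footer naming the next page). The helper name `parse_index` and both regular expressions are hypothetical, not part of any published API, and the sketch does no network access.

```python
import re

# Hypothetical patterns inferred from the layout of this page, not a documented format.
# "\u2014" is the dash that separates the link from the summary.
ENTRY_RE = re.compile(
    r"^- \[(?P<name>[^\]]+)\]\((?P<path>[^)]+)\) \u2014 (?P<summary>.*)$"
)
FOOTER_RE = re.compile(
    r"^_(?P<shown>\d+) of (?P<total>\d+) shown\. Fetch (?P<next>\S+) for the next \d+\._$"
)


def parse_index(text):
    """Return (entries, next_page) for one page of the index.

    entries is a list of (name, path, summary) tuples in page order,
    with repeated paths dropped. next_page is the path named in the
    footer, or None when no footer is present.
    """
    entries = []
    seen_paths = set()
    next_page = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        entry = ENTRY_RE.match(line)
        if entry:
            if entry["path"] not in seen_paths:
                seen_paths.add(entry["path"])
                entries.append((entry["name"], entry["path"], entry["summary"]))
            continue
        footer = FOOTER_RE.match(line)
        if footer:
            next_page = footer["next"]
    return entries, next_page
```

A caller could loop while `next_page` is not `None`, fetching each page with whatever HTTP client it already uses and passing the body to `parse_index`.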