Try NVIDIA NIM APIs

An AI-powered, multi-agent system designed to optimize warehouse operations through intelligent automation, real-time monitoring, and natural language interaction.

blueprint nemo retriever nim Launchable Retrieval-Augmented Generation NVIDIA AI

nvidia Retail Catalog Enrichment

A GenAI system that enhances and localizes product catalogs with rich text content and imagery.

blueprint nim Launchable NVIDIA AI

nvidia cosmos-reason2-8b

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

video understanding Synthetic Data Generation autonomous vehicles industrial Physical AI vision language model reasoning robotics smart cities

nvidia nemoretriever-page-elements-v3

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection Chart Detection nemo retriever Table Detection data ingestion

nvidia nemotron-3-nano-30b-a3b

Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more

MoE Reasoning chat Long Context Instruction Following

nvidia riva-translate-4b-instruct-v1_1

Translation model covering 12 languages, with support for few-shot example prompts.

nvidia nim Text Translation neural machine translation

nvidia AI Model Distillation for Financial Data

Distill and deploy domain-specific AI models from unstructured financial data to generate market signals efficiently—scaling your workflow with the NVIDIA Data Flywheel Blueprint for high-performance, cost-efficient experimentation.

blueprint developer example nim nvidia ai Launchable Nemotron algorithmic trading llm financial services data flywheel

nvidia Quantitative Portfolio Optimization

Enable fast, scalable, and real-time portfolio optimization for financial institutions.

developer example Launchable Blueprint cuopt portfolio optimization algorithmic trading financial services

nvidia streampetr

StreamPETR offers efficient 3D object detection for autonomous driving by propagating sparse object queries temporally.

autonomous vehicles bev AV Stack automotive

nvidia Cosmos Dataset Search

Accelerate post-training of end-to-end autonomous vehicle stacks with vector search and retrieval for large video datasets.

blueprint Autonomous Vehicles data Physical AI Search Enterprise Cosmos NVIDIA AI

nvidia Ambient Healthcare Agents

Build advanced AI agents for providers and patients using this developer example powered by NeMo Microservices, NVIDIA Nemotron, Riva ASR and TTS, and NVIDIA LLM NIM

agent blueprint blueprint nim Launchable nemo llm NVIDIA AI

nvidia nemotron-parse

Cutting-edge vision-language model excelling in retrieving text and metadata from images.

text and table extraction document parsing supported language - english

nvidia nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

language generation chat Image-to-Text vision assistant visual question answering

nvidia llama-3.1-nemotron-safety-guard-8b-v3

Leading multilingual content safety model for enhancing the safety and moderation capabilities of LLMs

content moderation llm safety multilingual guard model multilingual content safety nemoguard

nvidia parakeet-ctc-0.6b-zh-tw

Record-setting accuracy and performance for Taiwanese Mandarin-English transcriptions.

ASR Streaming Taiwanese Speech-to-Text NVIDIA NIM

nvidia llama-3_2-nemoretriever-300m-embed-v2

Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.

Retrieval Augmented Generation Text-to-Embedding NeMo Retriever

nvidia parakeet-ctc-0.6b-zh-cn

Record-setting accuracy and performance for Mandarin-English transcriptions.

ASR Streaming Speech-to-Text Mandarin NVIDIA NIM

nvidia parakeet-ctc-0.6b-es

Accurate and optimized Spanish-English transcriptions with punctuation and word timestamps.

ASR Streaming Speech-to-Text Spanish NVIDIA NIM

nvidia parakeet-ctc-0.6b-vi

Accurate and optimized Vietnamese-English transcriptions with punctuation and word timestamps.

ASR Streaming Speech-to-Text Vietnamese NVIDIA NIM

nvidia 3D Object Generation

Transform your scene idea into ready-to-use 3D assets using Llama 3.1 8B, NV SANA, and Microsoft TRELLIS

Blueprint Run-on-RTX NVIDIA AI

nvidia Retail Shopping Assistant

Elevate Shopping Experiences Online and In Stores.

blueprint nemo retriever nim Launchable Retrieval-Augmented Generation NVIDIA AI

nvidia nvidia-nemotron-nano-9b-v2

High‑efficiency LLM with hybrid Transformer‑Mamba design, excelling in reasoning and agentic tasks.

thinking budget chat reasoning

nvidia cosmos-reason1-7b

Reasoning vision language model (VLM) for physical AI and robotics.

video understanding Synthetic Data Generation autonomous vehicles industrial Physical AI vision language model reasoning robotics smart cities

nvidia nemoretriever-ocr-v1

Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.

Optical Character Recognition Table Extraction nemo retriever data ingestion extraction

nvidia parakeet-tdt-0.6b-v2

Accurate and optimized English transcriptions with punctuation and word timestamps

ASR English NVIDIA NIM NVIDIA Riva speech-to-text

nvidia Streaming Data to RAG

Sensor-captured radio data enables real-time awareness and AI-driven analytics for actionable, searchable insights.

blueprint NIM Riva Launchable RAG NVIDIA AI NeMo Retriever

nvidia llama-3.3-nemotron-super-49b-v1.5

High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

chat math advanced reasoning instruction following function calling

nvidia llama-3_2-nemoretriever-300m-embed-v1

Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.

Retrieval Augmented Generation Text-to-Embedding NeMo Retriever

nvidia nemoretriever-ocr

Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.

Optical Character Recognition Table Extraction nemo retriever data ingestion extraction

nvidia nv-embed-v1

Generates high-quality numerical embeddings from text inputs.

Non-Commercial Use Only Retrieval Augmented Generation Text-to-Embedding

nvidia Vulnerability Analysis for Container Security

Rapidly identify and mitigate container security vulnerabilities with generative AI.

generative ai Launchable nv-embedqa-e5-v5 Blueprint llama-3_1-70b-instruct cybersecurity NVIDIA AI

nvidia Build a Video Search and Summarization (VSS) Agent

Ingest massive volumes of live or archived videos and extract insights for summarization and interactive Q&A

vision video-to-text generative AI Launchable Blueprint chat Enterprise NVIDIA AI

nvidia llama-3.2-nv-rerankqa-1b-v2

Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.

nemo retriever Retrieval Augmented Generation reranking

nvidia nv-embedqa-e5-v5

English text embedding model for question-answering retrieval.

Embedding run-on-rtx Retrieval Augmented Generation Nemo retriever Text-to-Embedding

nvidia Build A Generative Virtual Screening Pipeline

This blueprint shows how generative AI and accelerated NIM microservices can design optimized small molecules smarter and faster.

Chemistry NIM NVIDIA BioNemo Blueprint Enterprise BioNemo Docking Drug Discovery

nvidia genmol

Fragment-Based Molecular Generation by Discrete Diffusion.

Chemistry nim BioNemo Molecule Generation Drug Discovery

nvidia llama-3.2-nv-embedqa-1b-v2

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

nemo retriever embedding Retrieval Augmented Generation Text-to-Embedding

nvidia Build A Generative Protein Binder Design Pipeline

This blueprint shows how generative AI and accelerated NIM microservices can design protein binders smarter and faster.

NVIDIA BioNemo Blueprint Enterprise BioNemo Biology Drug Discovery Protein Generation

nvidia Evo 2 Protein Design

This workflow shows how generative AI can generate DNA sequences that can be translated into proteins for bioengineering.

blueprint NIM biology BioNemo Drug Discovery Protein Generation

nvidia llama-3.3-nemotron-super-49b-v1

High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

chat math advanced reasoning instruction following function calling

nvidia sparsedrive

End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.

autonomous vehicles bev av stack automotive

nvidia bevformer

Advanced transformer for multi-frame bird's-eye-view 3D perception in autonomous driving.

autonomous vehicles bev automotive perception

nvidia molmim

MolMIM performs controlled generation, finding molecules with the right properties.

Chemistry nim BioNemo Molecule Generation Drug Discovery

nvidia Safety for Agentic AI

Improve safety, security, and privacy of AI systems at build, deploy and run stages.

security Launchable Blueprint safety privacy Nemo Guardrails open models NVIDIA AI

nvidia Build an AI Agent for Enterprise Research

Build a custom enterprise research assistant powered by state-of-the-art models that process and synthesize multimodal data, enabling reasoning, planning, and refinement to generate comprehensive reports.

NIM Launchable Llama Nemotron Reasoning Blueprint Enterprise Retrieval-Augmented Generation NVIDIA AI NeMo Retriever

nvidia Build an AI Virtual Assistant

Create intelligent virtual assistants for customer service across every industry

Customer Service Launchable Blueprint Retrieval-augmented generation llm contact center NVIDIA AI

nvidia llama-3.1-nemotron-ultra-253b-v1

Superior inference efficiency with highest accuracy for scientific and complex math reasoning, coding, tool calling, and instruction following.

chat math advanced reasoning instruction following function calling

nvidia Multi-LLM NIM

Use the multi-LLM compatible NIM container to deploy a broad range of LLMs from Hugging Face.

Blueprint NVIDIA AI

nvidia magpie-tts-flow

Expressive and engaging text-to-speech, generated from a short audio sample.

TTS Text-to-Speech NVIDIA NIM NVIDIA Riva

nvidia Build an Enterprise RAG Pipeline Blueprint

Power fast, accurate semantic search across multimodal enterprise data with NVIDIA’s RAG Blueprint—built on NeMo Retriever and Nemotron models—to connect your agents to trusted, authoritative sources of knowledge.

NIM Launchable Nemotron Blueprint Enterprise Retrieval-Augmented Generation NVIDIA AI NeMo Retriever

nvidia nv-yolox-page-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection Data ingestion Chart Detection nemo retriever Table Detection run-on-rtx extraction

nvidia Build Digital Twins for AI Factory Design and Operations

Design, test, and optimize a new generation of intelligence manufacturing data centers using digital twins.

AI Factory Industrial NVIDIA Omniverse Blueprint simulation Enterprise

nvidia usdcode

State-of-the-art LLM that answers OpenUSD knowledge queries and generates USD-Python code.

OpenUSD Synthetic Data Generation Digital Twin chat Code Generation

iguazio AI Orchestration for Data Flywheel

Orchestrate AI agents for data flywheel with MLRun and NVIDIA NeMo microservices.

Orchestration Launchable AI Agents Data Flywheel Blueprint Partner NVIDIA AI

nvidia nemoguard-jailbreak-detect

Industry-leading jailbreak classification model for protection from adversarial attempts

nemo guardrails llm security NIM Prompt Injection Safety and Moderation LLM Safety nemotron

nvidia llama-3.1-nemotron-nano-vl-8b-v1

Multi-modal vision-language model that understands text and images and creates informative responses

doc intelligence chat multiple image understanding OCR

nvidia Refine AI Agents through Continuous Model Distillation with Data Flywheels

Build a data flywheel, with NVIDIA NeMo microservices, that continuously optimizes AI agents for latency and cost — while maintaining accuracy targets.

NIM Launchable Data Flywheel Blueprint Enterprise NeMo microservices NVIDIA AI

nvidia llama-3.1-nemotron-nano-4b-v1.1

State-of-the-art open model for reasoning, code, math, and tool calling - suitable for edge agents

edge tool calling chat reasoning math

nvidia cosmos-transfer1-7b

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

Synthetic Data Generation Autonomous Vehicles Physical AI robotics video-to-world

nvidia llama-3.1-nemotron-nano-8b-v1

Leading reasoning and agentic AI accuracy model for PC and edge.

chat math advanced reasoning instruction following function calling

nvidia parakeet-ctc-1.1b-asr

Record-setting accuracy and performance for English transcription.

ASR Streaming English Speech-to-Text batch NVIDIA NIM

nvidia magpie-tts-multilingual

Natural and expressive voices in multiple languages. For voice agents and brand ambassadors.

TTS Text-to-Speech NVIDIA NIM NVIDIA Riva multilingual

nvidia riva-translate-1.6b

Enable smooth global interactions in 36 languages.

Text Translation Neural machine translation NVIDIA NIM

nvidia llama-3.2-nemoretriever-500m-rerank-v2

GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.

nemo retriever Retrieval Augmented Generation reranking

nvidia llama-3.2-nemoretriever-1b-vlm-embed-v1

Multimodal question-answer retrieval representing user queries as text and documents as images.

nemo retriever embedding Retrieval Augmented Generation Text-to-Embedding

nvidia AI Agent for Telecom Network Configuration Planning

Automate and optimize the configuration of radio access network (RAN) parameters using agentic AI and a large language model (LLM)-driven framework.

nim Launchable Blueprint simulation Telecommunications NVIDIA AI

nvidia audio2face-3d

Converts streamed audio to facial blendshapes for realtime lipsyncing and facial performances.

Speech-to-Animation Digital Humans Audio-to-Face NVIDIA NIM

nvidia Background Noise Removal

Removes unwanted noises from audio improving speech intelligibility.

Nvidia Maxine Speech-to-speech Digital Human Speech Enhancement

nvidia nvclip

NV-CLIP is a multimodal embeddings model for image and text.

Computer vision multimodal embeddings text and image Run-on-rtx

nvidia parakeet-ctc-0.6b-asr

State-of-the-art accuracy and speed for English transcriptions.

ASR Streaming English Batch Speech-to-Text Fast NVIDIA NIM Run-on-RTX

nvidia studiovoice

Enhance speech by correcting common audio degradations to create studio quality speech output.

Nvidia Maxine Speech-to-speech Digital Human Run-on-RTX Speech Enhancement

nvidia 3D Guided Generative AI

Create high quality images using Flux.1 in ComfyUI, guided by 3D.

Blueprint Run-on-RTX NVIDIA AI

nvidia magpie-tts-zeroshot

Expressive and engaging text-to-speech, generated from a short audio sample.

TTS Text-to-Speech NVIDIA NIM NVIDIA Riva

nvidia Single Cell Analysis

Investigate, understand, and interpret single cell data in minutes, not days, by leveraging RAPIDS-singlecell, powered by NVIDIA RAPIDS

RAPIDS RNA Sequencing Launchable Blueprint Genomics Single Cell Biology NVIDIA AI

nvidia Biomedical AI-Q Research Agent Blueprint

Build advanced AI agents within the biomedical domain using the AI-Q Blueprint and the BioNeMo Virtual Screening Blueprint

Launchable Agent Blueprint Blueprint Retrieval-augmented generation llm

nvidia nemoretriever-parse

Cutting-edge vision-language model excelling in retrieving text and metadata from images.

optical character recognition nemo retriever data ingestion table extraction supported language - english

nvidia Financial Fraud Detection

Detect and prevent sophisticated fraudulent activities for financial services with high accuracy.

Financial Services Launchable Blueprint GNN Payments NVIDIA AI Fraud Detection

nvidia Test Multi-Robot Fleets for Industrial Automation

Simulate, test, and optimize physical AI and robotic fleets at scale in industrial digital twins before real-world deployment.

industrial NVIDIA Omniverse Blueprint simulation Enterprise omniverse blueprint

nvidia nv-embedcode-7b-v1

The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.

nemo retriever Embedding Retrieval Augmented Generation

nvidia LLM Router

Route LLM requests to the best model for the task at hand.

Launchable Blueprint LLM Router NVIDIA AI

nvidia Synthetic Manipulation Motion Generation for Robotics

Generate exponentially large amounts of synthetic motion trajectories for robot manipulation from just a few human demonstrations.

NVIDIA Omniverse Blueprint synthetic data Enterprise robotics physical ai robot learning Humanoids NVIDIA Isaac GR00T text-to-world image-to-world teleop

nvidia 3D Conditioning for Precise Visual Generative AI

Enhance and modify high-quality compositions using real-time rendering and generative AI output without affecting a hero product asset.

visual design NVIDIA Omniverse Blueprint simulation Enterprise

nvidia llama3-chatqa-1.5-8b

Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines.

text-to-text chat Non-Commercial Use Only

igenius colosseum_355b_instruct_16k

NVIDIA DGX Cloud-trained multilingual LLM designed for mission-critical use cases in regulated industries, including financial services, government, and heavy industry

Heavy industry Government chat Highly regulated use case support Financial services

nvidia nemotron-4-mini-hindi-4b-instruct

A bilingual Hindi-English SLM for on-device inference, tailored specifically for the Hindi language.

Indic chat Text-to-Text Language Generation

mistralai mistral-7b-instruct-v0.2

This LLM follows instructions, completes requests, and generates creative text.

chat Text-to-Text Language Generation NVIDIA NIM

nvidia cuopt

World-record accuracy and performance for complex route optimization.

Route Optimization

nvidia Genomics Analysis

Save time by easily running essential genomics workflows with Parabricks

Parabricks Launchable Blueprint Genomics Biology DNA Sequencing NVIDIA AI

nvidia AI Weather Analytics with Earth-2

Develop an AI-powered weather analysis and forecasting application that visualizes multi-layered geospatial data.

Blueprint Climate Science Enterprise Weather Simulation AI Weather Prediction NVIDIA AI Earth-2

nvidia Build a Digital Twin for Interactive Fluid Simulation

This NVIDIA Omniverse™ Blueprint demonstrates how commercial software vendors can create interactive digital twins.

NVIDIA Omniverse Blueprint CAE simulation External Aerodynamics Enterprise Computer-aided-engineering

pipecat Voice Agent Framework for Conversational AI

Automate voice AI agents with NVIDIA NIM microservices and Pipecat.

Pipecat Launchable AI Agents Blueprint Conversational AI Partner NVIDIA AI

nvidia PDF to Podcast

Transform PDFs into AI podcasts for engaging on-the-go audio content.

blueprint Multi-modal Launchable Text-to-Speech Conversational AI PDF-to-Podcast NVIDIA AI AI Agent

nvidia parakeet-1.1b-rnnt-multilingual-asr

High accuracy and optimized performance for transcription in 25 languages

Automatic Speech Recognition Speech-to-Text NVIDIA NIM NVIDIA Riva

nvidia vista-3d

VISTA-3D is a specialized interactive foundation model for segmenting and annotating human anatomies.

Interactive Annotation Image Segmentation Non-Commercial Use Only Medical Imaging

nvidia megatron-1b-nmt

Enable smooth global interactions in 36 languages.

Text Translation Neural machine translation NVIDIA NIM

openai whisper-large-v3

Robust Speech Recognition via Large-Scale Weak Supervision.

ASR AST Speech-to-Text batch whisper OpenAI Multilingual NVIDIA NIM NVIDIA Riva

nvidia canary-1b-asr

Multi-lingual model supporting speech-to-text recognition and translation.

Automatic Speech Recognition Automatic Speech Translation NVIDIA NIM NVIDIA Riva

nvidia nv-grounding-dino

Grounding DINO is an open-vocabulary, zero-shot object detection model.

Object Detection computer vision deepstream NVIDIA NIM

nvidia nv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

Image-to-Embedding computer vision deepstream NVIDIA NIM object Classification

nvidia llama-3.1-nemoguard-8b-content-safety

Leading content safety model for enhancing the safety and moderation capabilities of LLMs

nemo guardrails LLM safety Safety and moderation dialogue safety nemotron

nvidia llama-3.1-nemoguard-8b-topic-control

Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.

nemo guardrails LLM safety Safety and moderation dialogue safety nemotron

nvidia corrdiff

Generative downscaling model for generating high resolution regional scale weather fields.

AI Weather prediction Weather Simulation Earth-2

nvidia fourcastnet

FourCastNet predicts the global atmospheric dynamics of various weather and climate variables.

Weather Simulation AI Weather Prediction Climate science Earth-2

nvidia eyecontact

Estimates the gaze angles of a person in a video and redirects the gaze to be frontal.

telepresence Nvidia Maxine Digital Human

nvidia maisi

MAISI is a pre-trained volumetric (3D) CT Latent Diffusion Generative Model.

Image Generation Medical Imaging NVIDIA NIM

nvidia cosmos-predict1-5b

Generates future frames of a physics-aware world state from just an image or short video prompt for physical AI development.

Synthetic Data Generation Physical AI policy evaluation robotics video-to-world

nvidia nemoretriever-graphic-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection Chart Detection nemo retriever Table Detection data ingestion

nvidia nemoretriever-table-structure-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection Chart Detection nemo retriever Table Detection data ingestion

nvidia nemoretriever-page-elements-v2

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection Chart Detection nemo retriever Table Detection data ingestion run-on-rtx

nvidia cosmos-nemotron-34b

Multi-modal vision-language model that understands text, images, and video and creates informative responses

VLM Vision language model image caption image to text

nvidia rerank-qa-mistral-4b

GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.

Ranking Retrieval Augmented Generation

nvidia usdvalidate

Verify compatibility of OpenUSD assets with instant RTX render and rule-based validation.

Validation OpenUSD Synthetic Data Generation Digital Twin USD Visualization 3D

nvidia llama-3.1-nemotron-70b-reward

Leaderboard-topping reward model supporting RLHF for better alignment with human preferences.

Text-to-text Reward Model RLHF

nvidia usdsearch

AI-powered search for OpenUSD data, 3D models, images, and assets using text or image-based inputs.

OpenUSD Synthetic Data Generation Digital Twin USD Text-to-3D

nvidia visual-changenet

Visual Changenet detects pixel-level change maps between two images and outputs a semantic change segmentation mask

image Image Generation cv Image Segmentation vlm computer vision TAO Toolkit video NVIDIA NIM

nvidia retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

Object Detection image cv vlm computer vision TAO Toolkit video NVIDIA NIM

nvidia ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition, respectively.

Optical Character Recognition image Optical Character Detection cv vlm computer vision TAO Toolkit video

nvidia nemotron-mini-4b-instruct

Optimized SLM for on-device inference and fine-tuned for roleplay, RAG and function calling

chat Text-to-Text Language Generation

nvidia mistral-nemo-minitron-8b-base

State-of-the-art small language model delivering superior accuracy for chatbots, virtual assistants, and content generation.

language generation text-to-text chat small language model