Try NVIDIA NIM APIs

Deploy Models Now with NVIDIA NIM

Optimized inference for the world’s leading models

Free serverless APIs for development

Self-host on your own GPU infrastructure

Continuous vulnerability fixes

Build with gpt-oss: OpenAI's Latest Open-Weight Reasoning Model

The larger model, gpt-oss-120b, achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running efficiently on a single 80 GB GPU.

Featured Models

The leading open models built by the community, optimized and accelerated by NVIDIA's enterprise-ready inference runtime.

qwen/qwen3-next-80b-a3b-thinking (Run Anywhere)

An 80B-parameter model with hybrid reasoning, a Mixture-of-Experts (MoE) architecture, and support for 119 languages.

Tags: reasoning, text-to-text

qwen/qwen3-coder-480b-a35b-instruct (Preview)

Excels at agentic coding and browser use, supports a 256K-token context window, and delivers top results.

Tags: agentic coding, browser use, long context, moe

openai/gpt-oss-20b (Run Anywhere)

A smaller, text-only Mixture-of-Experts (MoE) LLM for efficient reasoning and math.

Tags: math, chat, reasoning, text-to-text

nvidia/nvidia-nemotron-nano-9b-v2 (Run Anywhere)

A high-efficiency LLM with a hybrid Transformer-Mamba design that excels at reasoning and agentic tasks.

Tags: mamba, agentic, nano, reasoning, slm, thinking budget, throughput

Customize a Blueprint

Get started with workflows and code samples to build AI applications from the ground up.

NVIDIA: Build an AI Agent for Enterprise Research

Build a custom deep researcher powered by state-of-the-art models that continuously process and synthesize multimodal enterprise data, enabling reasoning, planning, and refinement to generate comprehensive reports.

Tags: blueprint, llama nemotron, nim, nemo retriever, reasoning, retrieval-augmented generation, enterprise, launchable, nvidia ai

NVIDIA: Build a Video Search and Summarization (VSS) Agent

Ingest large volumes of live or archived video and extract insights for summarization and interactive Q&A.

Tags: blueprint, enterprise, launchable, nvidia ai, chat, generative ai, video-to-text, vision

NVIDIA: Build an Enterprise RAG Pipeline

Continuously extract, embed, and index multimodal data for fast, accurate semantic search. Built on world-class NeMo Retriever models, the RAG blueprint connects AI applications to multimodal enterprise data wherever it resides.

Tags: blueprint, nim, nemo retriever, retrieval-augmented generation, enterprise, launchable, nvidia ai

NVIDIA: Safety for Agentic AI

Improve the safety, security, and privacy of AI systems at the build, deploy, and run stages.

Tags: blueprint, nemo guardrails, launchable, nvidia ai, open models, privacy, safety, security