Try NVIDIA NIM APIs

Deploy Models Now with NVIDIA NIM

Optimized inference for the world’s leading models

Free serverless APIs for development

Self-host on your GPU infrastructure

Continuous vulnerability fixes

Discover

Build with gpt-oss: OpenAI's Latest Open-Weight Reasoning Model

Achieves near-parity with o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU.

Featured Models

The leading open models built by the community, optimized and accelerated by NVIDIA's enterprise-ready inference runtime.

Run Anywhere

openai gpt-oss-120b

Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit on a single 80 GB GPU.

math, chat, reasoning, text-to-text

Run Anywhere

openai gpt-oss-20b

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math.

math, chat, reasoning, text-to-text

PREVIEW

nvidia llama-3.3-nemotron-super-49b-v1.5

High-efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

advanced reasoning, function calling, instruction following, math

PREVIEW

moonshotai kimi-k2-instruct

State-of-the-art open mixture-of-experts model with strong reasoning, coding, and agentic capabilities.

advanced reasoning, agentic, coding, chat

Customize a Blueprint

Get started with workflows and code samples to build AI applications from the ground up.

nvidia Build an AI Agent for Enterprise Research

Build AI agents that continuously process and synthesize multimodal enterprise data, using reasoning, planning, and refinement to generate comprehensive reports.

blueprint, llama nemotron, nim, nemo retriever, reasoning, retrieval-augmented generation, enterprise, launchable, nvidia ai

nvidia Build a Video Search and Summarization (VSS) Agent

Ingest massive volumes of live or archived video and extract insights for summarization and interactive Q&A.

blueprint, enterprise, launchable, nvidia ai, chat, generative ai, video-to-text, vision

nvidia Build an Enterprise RAG Pipeline

Continuously extract, embed, and index multimodal data for fast, accurate semantic search. Built on world-class NeMo Retriever models, the RAG blueprint connects AI applications to multimodal enterprise data wherever it resides.

blueprint, nim, nemo retriever, retrieval-augmented generation, enterprise, launchable, nvidia ai

nvidia Safety for Agentic AI

Improve the safety, security, and privacy of AI systems at the build, deploy, and run stages.

blueprint, nemo guardrails, launchable, nvidia ai, open models, privacy, safety, security