Deploy Models Now with NVIDIA NIM
Optimized inference for the world's leading models
Free serverless APIs for development
Self-host on your GPU infrastructure
Continuous vulnerability fixes
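The hosted NIM endpoints expose an OpenAI-compatible chat completions API. The sketch below, assuming the documented base URL `https://integrate.api.nvidia.com/v1` and a hypothetical model name, shows how a development request might be constructed; the API key is a placeholder you would obtain from build.nvidia.com, and the request is built but not sent.

```python
import json
import urllib.request

# Base URL for NVIDIA's hosted, OpenAI-compatible NIM APIs.
API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
API_KEY = "nvapi-..."  # placeholder; use a real key from build.nvidia.com


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Construct (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({
        "model": model,  # hypothetical model identifier for illustration
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("meta/llama-3.1-8b-instruct", "Hello!")
print(json.loads(req.data)["model"])
```

Because the endpoint follows the OpenAI wire format, existing OpenAI client libraries can usually be pointed at it by swapping the base URL and key.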
The top large language models for your enterprise AI

Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within an 80 GB GPU.

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math.

High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

State-of-the-art open mixture-of-experts model with strong reasoning, coding, and agentic capabilities.
The latest innovations in AI models