Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Vision-language model that excels in understanding the physical world using structured reasoning on videos or images.

Object detection model fine-tuned to detect charts, tables, and titles in documents.

Open, efficient MoE model with a 1M-token context, excelling in coding, reasoning, instruction following, tool calling, and more.

Cutting-edge vision-language model excelling in retrieving text and metadata from images.

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

Record-setting accuracy and performance for Taiwanese Mandarin and English transcription.

Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.

Japanese-specialized large language model that helps enterprises read and understand complex business documents.

Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.