Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
16 models
Qwen
qwen3.5-122b-a10b
122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.
tool calling
+4
Today
Moonshotai
kimi-k2.5
1T multimodal MoE for high‑capacity video and image understanding with efficient inference.
Multimodal
+4
19.52M
1mo
Mistral AI
mistral-large-3-675b-instruct-2512
A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.
language generation
+4
4.89M
3mo
Mistral AI
ministral-14b-instruct-2512
A general-purpose VLM ideal for chat and instruction-based use cases.
language generation
+4
3.6M
3mo
NVIDIA
nemotron-nano-12b-v2-vl
Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.
language generation
+4
1.55M
4mo
Mistral AI
mistral-medium-3-instruct
Powerful, multimodal language model designed for enterprise applications, including software development, data analysis, and reasoning.
language generation
+4
3.69M
7mo
Meta
llama-4-maverick-17b-128e-instruct
A general-purpose multimodal, multilingual MoE model with 128 experts and 17B active parameters.
language generation
+4
2.59M
7mo
Meta
llama-4-scout-17b-16e-instruct
A multimodal, multilingual MoE model with 16 experts and 17B active parameters.
language generation
+4
295K
7mo
Google
gemma-3-27b-it
Cutting-edge open multimodal model excelling in high-quality reasoning from images.
Vision Assistant
+4
5.3M
9mo
Microsoft
phi-4-multimodal-instruct
Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.
Speech Recognition
+5
385K
9mo
NVIDIA
cosmos-nemotron-34b
Multimodal vision-language model that understands text, images, and video and creates informative responses.
VLM
+3
6
1y
University at Buffalo
cached
Context-aware chart extraction that can detect 18 classes of basic chart elements, excluding plot elements.
nemo retriever
+2
924
1y
Meta
llama-3.2-11b-vision-instruct
Cutting-edge vision-language model excelling in high-quality reasoning from images.
Image-Text Retrieval
+5
592K
9mo
Meta
llama-3.2-90b-vision-instruct
Cutting-edge vision-language model excelling in high-quality reasoning from images.
Image-Text Retrieval
+5
559K
9mo
Microsoft
phi-3.5-vision-instruct
Cutting-edge open multimodal model excelling in high-quality reasoning from images.
Vision Assistant
+3
451K
1y
Google
paligemma
Vision-language model adept at comprehending text and visual inputs to produce informative responses.
image
+8
325K
1y