Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
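As an illustrative sketch only: models in this catalog are typically served behind an OpenAI-compatible chat-completions API. The endpoint URL (`integrate.api.nvidia.com`) and the model identifier (`zai/glm-5.1`) below are assumptions for illustration, not values taken from this page — check a model card's own deploy instructions for the real ones.

```python
import json
import os

# Assumed hosted-endpoint URL for NIM models; verify against the model card.
NIM_ENDPOINT = "https://integrate.api.nvidia.com/v1/chat/completions"

def build_chat_request(model: str, prompt: str, api_key: str) -> dict:
    """Assemble the URL, headers, and JSON body for one chat-completion call."""
    return {
        "url": NIM_ENDPOINT,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
        "body": {
            "model": model,  # assumed identifier; the catalog shows "glm-5.1"
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
        },
    }

req = build_chat_request(
    "zai/glm-5.1",
    "Summarize MoE routing in two sentences.",
    os.environ.get("NVIDIA_API_KEY", "nvapi-..."),
)
print(json.dumps(req["body"], indent=2))
# Send with e.g. requests.post(req["url"], headers=req["headers"], json=req["body"])
```

The request is only assembled here, not sent, so the sketch runs without a key; swapping in any other model id from the list below follows the same shape.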
64 models, sorted by most recent
nemotron-3-nano-omni-30b-a3b-reasoning (NVIDIA · Downloadable)
Nemotron 3 Nano Omni is an omni-modal reasoning model that understands images, video, speech, and text.
Image-to-Text +4 · Today
deepseek-v4-flash (DeepSeek AI · Downloadable)
DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.
coding +3 · 361K · 4d

deepseek-v4-pro (DeepSeek AI · Downloadable)
DeepSeek V4 scales to 1M-token context windows with efficient MoE architecture for coding tasks.
coding +3 · 781K · 4d
glm-5.1 (Z.ai · Downloadable)
GLM-5.1 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.
Agentic AI +3 · 2.53M · 1w

glm-4.7 (Z.ai · Free Endpoint)
GLM-4.7 is a multilingual agentic coding partner with stronger reasoning, tool use, and UI skills.
Tool Calling +3 · 4.57M · 1w
ising-calibration-1-35b-a3b (NVIDIA · Downloadable)
Open VLM for understanding quantum-computer calibration charts across a range of qubit modalities.
Quantum +3 · 93.01K · 2w

minimax-m2.7 (Minimaxai · Free Endpoint)
MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.
coding +2 · 4.15M · 2w
gemma-4-31b-it (Google · Downloadable)
Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.
coding +3 · 3.46M · 3w

mistral-small-4-119b-2603 (Mistral AI · Downloadable)
Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context.
code generation +2 · 8.24M · 1mo
nemotron-voicechat (NVIDIA · Free Endpoint)
Nemotron 3 Voicechat
English +2 · 3.33K · 1mo

nemotron-3-super-120b-a12b (NVIDIA · Downloadable)
Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more.
MoE +4 · 43.63M · 1mo
qwen3.5-122b-a10b (Qwen · Downloadable)
122B MoE LLM (10B active) for coding, reasoning, and multimodal chat. Agent-ready.
tool calling +3 · 7.65M · 1mo

minimax-m2.5 (Minimaxai · Downloadable · deprecation in 15d)
MiniMax M2.5 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.
reasoning +2 · 9.85M · 2mo
qwen3.5-397b-a17b (Qwen · Downloadable)
Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
MoE +3 · 9.6M · 2mo

step-3.5-flash (Stepfun-ai · Free Endpoint)
200B open-source reasoning engine with sparse MoE powering frontier agentic AI.
Agentic +2 · 9.07M · 2mo
kimi-k2.5 (Moonshotai · Downloadable · deprecation in 3d)
1T multimodal MoE for high-capacity video and image understanding with efficient inference.
Multimodal +3 · 35.21M · 3mo

deepseek-v3.2 (DeepSeek AI · Free Endpoint · deprecation in 7d)
State-of-the-art 685B reasoning LLM with sparse attention, long context, and integrated agentic tools.
long context +2 · 9.51M · 4mo
nemotron-3-nano-30b-a3b (NVIDIA · Downloadable)
Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more.
MoE +3 · 9.61M · 4mo

devstral-2-123b-instruct-2512 (Mistral AI · Free Endpoint · deprecation in 14d)
State-of-the-art open code model with deep reasoning, 256k context, and unmatched efficiency.
coding +3 · 2.81M · 4mo
kimi-k2-thinking (Moonshotai · Free Endpoint)
Open reasoning model with a 256K context window, native INT4 quantization, and enhanced tool use.
Conversational +3 · 3.37M · 4mo

mistral-large-3-675b-instruct-2512 (Mistral AI · Free Endpoint)
A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.
language generation +3 · 4.15M · 4mo
ministral-14b-instruct-2512 (Mistral AI · Downloadable)
A general-purpose VLM ideal for chat and instruction-based use cases.
language generation +3 · 1.6M · 4mo

nemotron-nano-12b-v2-vl (NVIDIA · Downloadable)
Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.
language generation +3 · 4.63M · 6mo

deepseek-v3.1-terminus (DeepSeek AI · Free Endpoint · deprecation in 7d)
DeepSeek-V3.1: hybrid inference LLM with Think/Non-Think modes, stronger agents, 128K context, and strict function calling.
tool calling +3 · 7.29M · 6mo