Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
6 models, most recent first
DeepSeek AI
Downloadable
Free Endpoint
deepseek-v4-flash
DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.
B200
+6
13.19M
1mo
DeepSeek AI
Downloadable
deepseek-v4-pro
DeepSeek V4 scales to 1M-token context windows with efficient MoE architecture for coding tasks.
B200
+4
8.1M
1mo
NVIDIA
Downloadable
Free Endpoint
nemotron-3-super-120b-a12b
Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more
MoE
+4
57.3M
2mo
Qwen
Downloadable
Free Endpoint
qwen3.5-397b-a17b
Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
MoE
+3
11.04M
3mo
NVIDIA
Downloadable
Free Endpoint
nemotron-3-nano-30b-a3b
Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more
MoE
+3
11.25M
5mo
Qwen
Free Endpoint
qwen3-coder-480b-a35b-instruct
Excels at agentic coding and browser use, with support for a 256K-token context.
agentic coding
+3
5.13M
9mo