Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
10 models
Sorted by: Most Recent
Mistral AI
Downloadable
Free Endpoint
mistral-medium-3.5-128b
A high-performing model for text generation, coding, and agentic use cases
coding
+3
3.14M
1mo
DeepSeek AI
Downloadable
Free Endpoint
deepseek-v4-flash
DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.
B200
+6
13.22M
1mo
DeepSeek AI
Downloadable
deepseek-v4-pro
DeepSeek V4 scales to 1M-token context windows with efficient MoE architecture for coding tasks.
B200
+4
8.11M
1mo
Z.ai
Downloadable
Free Endpoint
glm-5.1
GLM-5.1 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.
B200
+5
25.15M
1mo
Google
Downloadable
Free Endpoint
gemma-4-31b-it
Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.
B200
+6
5.64M
2mo
Qwen
Downloadable
Free Endpoint
qwen3.5-397b-a17b
Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
MoE
+3
11.16M
3mo
Stepfun-ai
Free Endpoint
step-3.5-flash
200B open-source reasoning engine with sparse MoE powering frontier agentic AI.
Agentic
+2
11.53M
4mo
Mistral AI
Free Endpoint
mistral-large-3-675b-instruct-2512
A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.
language generation
+3
3.33M
6mo
Qwen
Downloadable
Free Endpoint
qwen3-next-80b-a3b-instruct
Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.
B200
+4
22.67M
8mo
Qwen
Free Endpoint
qwen3-coder-480b-a35b-instruct
Excels at agentic coding and browser use, and supports a 256K context.
agentic coding
+3
5.03M
9mo