Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
Optimized by NVIDIA
Launch from Hugging Face
Beta
3 models
Qwen
Downloadable
qwen3-next-80b-a3b-instruct
Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.
text-generation
22.35M
6mo
ByteDance
Free Endpoint
seed-oss-36b-instruct
ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.
thinking budget
1.33M
7mo
Microsoft
Deprecation in 1d
Free Endpoint
phi-4-mini-flash-reasoning
Lightweight reasoning model for applications in latency-bound, memory- and compute-constrained environments.
edge
158K
8mo