Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
22 models
Sorted by most recent
Google
Downloadable
gemma-4-31b-it
Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.
coding
+4
15.72K
1d
NVIDIA
Downloadable
nemotron-3-super-120b-a12b
Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more.
chat
+5
33.52M
3w
Qwen
Downloadable
qwen3.5-122b-a10b
122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.
chat
+4
7.81M
4w
Qwen
Downloadable
qwen3.5-397b-a17b
Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
chat
+4
13.54M
1mo
Z.ai
Downloadable
glm-5
GLM-5 744B MoE enables efficient reasoning for complex systems and long-horizon agentic tasks.
MoE
+3
36.94M
1mo
Stepfun-ai
Free Endpoint
step-3.5-flash
200B open-source reasoning engine with sparse MoE powering frontier agentic AI.
chat
+3
8.99M
2mo
Z.ai
Free Endpoint
glm-4.7
GLM-4.7 is a multilingual agentic coding partner with stronger reasoning, tool use, and UI skills.
Tool Calling
+4
14.46M
2mo
DeepSeek AI
Free Endpoint
deepseek-v3.2
State-of-the-art 685B reasoning LLM with sparse attention, long context, and integrated agentic tools.
chat
+3
15.8M
3mo
Mistral AI
Free Endpoint
devstral-2-123b-instruct-2512
State-of-the-art open code model with deep reasoning, 256k context, and unmatched efficiency.
coding
+4
4.39M
3mo
Mistral AI
Free Endpoint
mistral-large-3-675b-instruct-2512
A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.
chat
+4
6.07M
4mo
DeepSeek AI
Free Endpoint
deepseek-v3.1-terminus
DeepSeek-V3.1: hybrid inference LLM with Think/Non-Think modes, stronger agents, 128K context, strict function calling.
chat
+4
13.17M
5mo
Qwen
Downloadable
qwen3-next-80b-a3b-instruct
Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.
chat
+2
21.08M
6mo
Moonshotai
Free Endpoint
kimi-k2-instruct-0905
Follow-on version of Kimi-K2-Instruct with longer context window and enhanced reasoning capabilities.
long-context
+4
14.28M
6mo
ByteDance
Free Endpoint
seed-oss-36b-instruct
ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.
chat
+3
1.99M
7mo
Qwen
Free Endpoint
qwen3-coder-480b-a35b-instruct
Excels in agentic coding and browser use and supports 256K context, delivering top results.
agentic coding
+4
3.59M
7mo
NVIDIA
Downloadable
nvidia-nemotron-nano-9b-v2
High-efficiency LLM with hybrid Transformer-Mamba design, excelling in reasoning and agentic tasks.
chat
+2
386K
7mo
Moonshotai
Free Endpoint
kimi-k2-instruct
State-of-the-art open mixture-of-experts model with strong reasoning, coding, and agentic capabilities.
coding
+4
20.82M
8mo
Mistral AI
Free Endpoint
mistral-nemotron
Built for agentic workflows, this model excels in coding, instruction following, and function calling.
chat
+3
799K
9mo
NVIDIA
Downloadable
llama-3.1-nemotron-nano-4b-v1.1
State-of-the-art open model for reasoning, code, math, and tool calling, suitable for edge agents.
chat
+4
110K
9mo
NVIDIA
Downloadable
llama-3.1-nemotron-nano-8b-v1
Leading reasoning and agentic AI accuracy model for PC and edge.
chat
+4
371K
9mo
NVIDIA
Downloadable
magpie-tts-multilingual
Natural and expressive voices in multiple languages. For voice agents and brand ambassadors.
TTS
+4
42.44K
9mo
Mistral AI
Downloadable
mistral-small-24b-instruct
Latency-optimized language model excelling in code, math, general knowledge, and instruction-following.
chat
+4
275K
9mo