Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
Optimized by NVIDIA
25 models
Sorted by: Most Recent
OpenAI
gpt-oss-20b
Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math
text-to-text
+3
7.97M
7mo
OpenAI
gpt-oss-120b
Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within an 80GB GPU.
text-to-text
+3
36.1M
7mo
NVIDIA
llama-3.3-nemotron-super-49b-v1.5
High-efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.
chat
+4
4.78M
7mo
Sarvamai
sarvam-m
Multilingual, hybrid-reasoning model optimized for Indian-language tasks, programming, and mathematical reasoning.
coding
+6
473K
7mo
Microsoft
phi-4-mini-flash-reasoning
Lightweight reasoning model for applications in latency-bound, memory- and compute-constrained environments.
edge
+4
468K
7mo
Mistral AI
magistral-small-2506
High-performance reasoning model optimized for efficiency and edge deployment.
coding
+4
4.05M
8mo
NVIDIA
llama-3.1-nemotron-nano-4b-v1.1
State-of-the-art open model for reasoning, code, math, and tool calling, suitable for edge agents.
edge
+4
98.31K
8mo
Marin
marin-8b-instruct
State-of-the-art open model trained on open datasets, excelling in reasoning, math, and science.
Reasoning
+4
487K
9mo
NVIDIA
llama-3.1-nemotron-ultra-253b-v1
Superior inference efficiency with highest accuracy for scientific and complex math reasoning, coding, tool calling, and instruction following.
chat
+4
7.73M
8mo
Qwen
qwq-32b
Powerful reasoning model that achieves significantly enhanced performance on downstream tasks, especially hard problems.
coding
+3
4.01M
8mo
NVIDIA
llama-3.3-nemotron-super-49b-v1
High-efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.
chat
+4
1.11M
7mo
NVIDIA
llama-3.1-nemotron-nano-8b-v1
Leading reasoning and agentic AI accuracy model for PC and edge.
chat
+4
606K
8mo
DeepSeek AI
deepseek-r1-distill-llama-8b
Distilled version of Llama 3.1 8B using reasoning data generated by DeepSeek R1 for enhanced performance.
Distillation
+5
4.77M
8mo
DeepSeek AI
deepseek-r1-distill-qwen-32b
Distilled version of Qwen 2.5 32B using reasoning data generated by DeepSeek R1 for enhanced performance.
coding
+4
2.36K
4.82M
9mo
DeepSeek AI
deepseek-r1-distill-qwen-14b
Distilled version of Qwen 2.5 14B using reasoning data generated by DeepSeek R1 for enhanced performance.
coding
+4
1.98K
4.37M
9mo
DeepSeek AI
deepseek-r1-distill-qwen-7b
Distilled version of Qwen 2.5 7B using reasoning data generated by DeepSeek R1 for enhanced performance.
coding
+3
2.16K
4.71M
9mo
Mistral AI
mistral-small-24b-instruct
Latency-optimized language model excelling in code, math, general knowledge, and instruction-following.
code
+4
573K
8mo
Tiiuae
falcon3-7b-instruct
Instruction-tuned LLM achieving SoTA performance on reasoning, math, and general knowledge.
Coding
+6
1.77M
9mo
Qwen
qwen2.5-7b-instruct
Chinese and English LLM targeting language, coding, mathematics, reasoning, and more.
Chinese Language Generation
+3
1.32M
9mo
Meta
llama-3.3-70b-instruct
Advanced LLM for reasoning, math, general knowledge, and function calling
Reasoning
+5
23.53M
9mo
Qwen
qwen2-7b-instruct
Chinese and English LLM targeting language, coding, mathematics, reasoning, and more.
Chinese Language Generation
+3
640K
9mo
Baichuan AI
baichuan2-13b-chat
Supports Chinese and English chat, coding, math, instruction following, and quiz solving.
Chinese Language Generation
+3
524K
9mo
Upstage
solar-10.7b-instruct
Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics.
Non-Commercial Use Only
+4
499K
11mo
Microsoft
phi-3-mini-4k-instruct
Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills.
chat
+4
503K
9mo