Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
Optimized by NVIDIA
53 models
Sorted by: Most Recent
Qwen
Downloadable
qwen3.5-122b-a10b
122B MoE LLM (10B active) for coding, reasoning, and multimodal chat. Agent-ready.
chat
+4
7.95M
1mo
Qwen
Downloadable
qwen3.5-397b-a17b
Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
chat
+4
13.73M
1mo
Z.ai
Downloadable
glm-5
GLM-5 744B MoE enables efficient reasoning for complex systems and long-horizon agentic tasks.
MoE
+3
37.18M
1mo
Z.ai
Free Endpoint
glm-4.7
GLM-4.7 is a multilingual agentic coding partner with stronger reasoning, tool use, and UI skills.
Tool Calling
+4
14.31M
2mo
DeepSeek AI
Free Endpoint
deepseek-v3.2
State-of-the-art 685B reasoning LLM with sparse attention, long context, and integrated agentic tools.
chat
+3
15.2M
3mo
Mistral AI
Free Endpoint
mistral-large-3-675b-instruct-2512
A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.
chat
+4
5.59M
4mo
Mistral AI
Downloadable
ministral-14b-instruct-2512
A general-purpose VLM ideal for chat and instruction-based use cases.
chat
+4
1.65M
4mo
NVIDIA
Free Endpoint
llama-3.1-nemotron-safety-guard-8b-v3
Leading multilingual content safety model for enhancing the safety and moderation capabilities of LLMs
content moderation
+4
191K
5mo
DeepSeek AI
Free Endpoint
deepseek-v3.1-terminus
DeepSeek-V3.1: hybrid inference LLM with Think/Non-Think modes, stronger agents, 128K context, strict function calling.
chat
+4
12.57M
6mo
ByteDance
Free Endpoint
seed-oss-36b-instruct
ByteDance open-source LLM with long-context support, reasoning, and agentic intelligence.
chat
+3
1.61M
7mo
NVIDIA
Downloadable
nvidia-nemotron-nano-9b-v2
High‑efficiency LLM with hybrid Transformer‑Mamba design, excelling in reasoning and agentic tasks.
chat
+2
365K
7mo
OpenAI
Downloadable
gpt-oss-20b
Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math
reasoning
+4
7.83M
8mo
OpenAI
Downloadable
gpt-oss-120b
Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit on a single 80 GB GPU.
reasoning
+4
43.71M
8mo
Opengpt-x
Downloadable
teuken-7b-instruct-commercial-v0.4
Multilingual 7B LLM, instruction-tuned on all 24 EU languages for stable, culturally aligned output.
sovereign ai
+5
119K
8mo
Meta
Free Endpoint
llama-guard-4-12b
Multimodal model that classifies the safety of both input prompts and output responses.
LLM Multimodal Safety
+3
123K
9mo
NVIDIA
Downloadable
llama-3.2-nemoretriever-1b-vlm-embed-v1
Multimodal question-answer retrieval representing user queries as text and documents as images.
nemo retriever
+3
201K
9mo
Gotocompany
Downloadable
gemma-2-9b-cpt-sahabatai-instruct
SOTA LLM pre-trained for instruction following and proficiency in the Indonesian language and its dialects.
chat
+5
114K
9mo
Google
Downloadable
gemma-3-1b-it
A lightweight, multilingual, advanced text SLM for edge computing and resource-constrained applications.
chat
+4
3.76K
321K
10mo
Microsoft
Downloadable
phi-4-mini-instruct
Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.
chat
+4
636K
10mo
NVIDIA
Downloadable
llama-3.1-nemoguard-8b-topic-control
Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.
nemo guardrails
+4
136K
1y
NVIDIA
Downloadable
nemoguard-jailbreak-detect
Industry-leading jailbreak classification model that protects against adversarial attempts.
nemo guardrails
+6
31.5K
9mo
NVIDIA
Downloadable
llama-3.1-nemoguard-8b-content-safety
Leading content safety model for enhancing the safety and moderation capabilities of LLMs
nemo guardrails
+4
161K
1y
Igenius
Free Endpoint
colosseum_355b_instruct_16k
Multilingual LLM trained on NVIDIA DGX Cloud, designed for mission-critical use cases in regulated industries, including financial services, government, and heavy industry.
chat
+4
10mo
Tiiuae
Free Endpoint
falcon3-7b-instruct
Instruction-tuned LLM achieving SoTA performance in reasoning, math, and general knowledge.
chat
+6
1.58M
10mo
Page 1 of 3 (24 items per page)