Explore
Models
Blueprints
GPUs
Docs
⌘K
Ctrl+K
?
Login
Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
Optimized by NVIDIA
Launch from Hugging Face
Beta
Filters
39 models
Sort By
dateCreated:DESC
Most Recent
NVIDIA
Downloadable
llama-nemotron-rerank-1b-v2
GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.
nemo retriever
+2
4.35K
6d
NVIDIA
Downloadable
llama-nemotron-embed-1b-v2
Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.
Text-to-Embedding
+2
366K
1w
NVIDIA
Downloadable
llama-nemotron-embed-vl-1b-v2
Multimodal question-answer retrieval representing user queries as text and documents as images.
nemo retriever
+3
883K
1mo
NVIDIA
API Endpoint
llama-3.1-nemotron-safety-guard-8b-v3
Leading multilingual content safety model for enhancing the safety and moderation capabilities of LLMs
content moderation
+4
607K
4mo
NVIDIA
Downloadable
llama-3_2-nemoretriever-300m-embed-v2
Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.
Text-to-Embedding
+2
119K
5mo
NVIDIA
Downloadable
llama-3.3-nemotron-super-49b-v1.5
High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.
chat
+4
5.03M
7mo
NVIDIA
API Endpoint
llama-3_2-nemoretriever-300m-embed-v1
Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.
Text-to-Embedding
+2
91.16K
7mo
Meta
API Endpoint
llama-guard-4-12b
Multi-modal model to classify safety for input prompts as well output responses.
LLM Multimodal Safety
+3
495K
8mo
NVIDIA
Downloadable
llama-3.2-nemoretriever-500m-rerank-v2
GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.
nemo retriever
+2
1.42K
8mo
NVIDIA
Downloadable
llama-3.2-nemoretriever-1b-vlm-embed-v1
Multimodal question-answer retrieval representing user queries as text and documents as images.
nemo retriever
+3
271K
8mo
NVIDIA
Downloadable
llama-3.1-nemotron-nano-vl-8b-v1
Multi-modal vision-language model that understands text/img and creates informative responses
chat
+3
9.15M
8mo
NVIDIA
Downloadable
llama-3.1-nemotron-nano-4b-v1.1
State-of-the-art open model for reasoning, code, math, and tool calling - suitable for edge agents
chat
+4
134K
8mo
NVIDIA
Downloadable
llama-3.1-nemotron-ultra-253b-v1
Superior inference efficiency with highest accuracy for scientific and complex math reasoning, coding, tool calling, and instruction following.
chat
+4
8.54M
8mo
Meta
API Endpoint
llama-4-maverick-17b-128e-instruct
A general purpose multimodal, multilingual 128 MoE model with 17B parameters.
chat
+4
3.25M
7mo
Meta
Downloadable
API Endpoint
llama-4-scout-17b-16e-instruct
A multimodal, multilingual 16 MoE model with 17B parameters.
chat
+4
156K
8mo
NVIDIA
Downloadable
llama-3.3-nemotron-super-49b-v1
High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.
chat
+4
1.18M
7mo
NVIDIA
Downloadable
llama-3.1-nemotron-nano-8b-v1
Leading reasoning and agentic AI accuracy model for PC and edge.
chat
+4
659K
8mo
DeepSeek AI
Downloadable
deepseek-r1-distill-llama-8b
Distilled version of Llama 3.1 8B using reasoning data generated by DeepSeek R1 for enhanced performance.
Distillation
+5
5M
8mo
NVIDIA
Downloadable
llama-3.1-nemoguard-8b-topic-control
Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.
nemo guardrails
+4
549K
11mo
NVIDIA
Downloadable
llama-3.1-nemoguard-8b-content-safety
Leading content safety model for enhancing the safety and moderation capabilities of LLMs
nemo guardrails
+4
574K
11mo
NVIDIA
Downloadable
llama-3.2-nv-embedqa-1b-v2
Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.
nemo retriever
+3
7.04M
7mo
NVIDIA
Downloadable
llama-3.2-nv-rerankqa-1b-v2
Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.
nemo retriever
+2
157K
7mo
Meta
Downloadable
llama-3.3-70b-instruct
Advanced LLM for reasoning, math, general knowledge, and function calling
chat
+5
23.28M
9mo
Institute of Science Tokyo
Downloadable
llama-3.1-swallow-70b-instruct-v0.1
Sovereign AI model trained on Japanese language that understands regional nuances.
chat
+4
526K
9mo
Items per page
24
1
1
2
2
of 2 pages