Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
6 models
Qwen
Downloadable
qwen3.5-397b-a17b
Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
MoE
+3
9.6M
2mo
NVIDIA
Downloadable
llama-3.1-nemotron-nano-vl-8b-v1
Multimodal vision-language model that understands text and images and creates informative responses
doc intelligence
+2
7.32M
10mo
Meta
Downloadable
llama-3.2-11b-vision-instruct
Cutting-edge vision-language model excelling in high-quality reasoning from images.
Image-Text Retrieval
+4
908K
11mo
Meta
Downloadable
llama-3.2-90b-vision-instruct
Cutting-edge vision-language model excelling in high-quality reasoning from images.
Image-Text Retrieval
+4
1.38M
11mo
NVIDIA
Downloadable
nvclip
NV-CLIP is a multimodal embeddings model for image and text.
Computer vision
+3
57.45K
10mo
Google
Free Endpoint
paligemma
Vision language model adept at comprehending text and visual inputs to produce informative responses
image
+8
28.56K
1y