NVIDIA
Explore
Models
Blueprints
GPUs
Docs
⌘KCtrl+K
DiscoverModelsBlueprintsGPUsDocsForums

workstations

  • Run on RTX
  • Run on Spark
  • Run on Station

models

  • Reasoning
  • Vision
  • Visual Design
  • Retrieval
  • Speech
  • Biology
  • Simulation
  • Climate & Weather
  • Safety & Moderation

industries

  • Automotive
  • Financial Services
  • Gaming
  • Healthcare
  • Industrial
  • Robotics

Vision

Deploy Models Now with NVIDIA NIM

Optimized inference for the world’s leading models
Free serverless APIs for development
Accelerated by DGX Cloud
Self-Host on your GPU infrastructure
Continuous vulnerability fixes
Terms of Use
Privacy Policy
Your Privacy Choices
Contact

Copyright © 2026 NVIDIA Corporation

Explore NVIDIA Blueprints

Comprehensive reference workflows that accelerate application development and deployment, featuring NVIDIA acceleration libraries, APIs, and microservices for AI agents, digital twins, and more.

nvidiaBuild a Video Search and Summarization (VSS) Agent

Ingest massive volumes of live or archived videos and extract insights for summarization and interactive Q&A

chatgenerative AIvideo-to-textvision

Specialized Foundation Models

Computer vision models that excel at particular visual perception tasks

NVIDIA
Downloadable

nvclip

NV-CLIP is a multimodal embeddings model for image and text.
Computer vision
11mo

Vision Language Models (VLM)

Multimodal models that can reason against image and video inputs and perform descriptive language generation​

NVIDIA
Downloadable

cosmos-reason2-8b

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.
Physical AI
4mo
Meta
Downloadable

llama-3.2-90b-vision-instruct

Cutting-edge vision-Language model exceling in high-quality reasoning from images.
Image-Text Retrieval
11mo
Meta
Downloadable

llama-3.2-11b-vision-instruct

Cutting-edge vision-language model exceling in high-quality reasoning from images.
Image Captioning
11mo
Google
Free Endpoint

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses
Language Generation
1y