Deploy Models Now with NVIDIA NIM

Optimized inference for the world’s leading models
Free serverless APIs for development
Accelerated by DGX Cloud
Self-Host on your GPU infrastructure
Continuous vulnerability fixes

Vision

Copyright © 2026 NVIDIA Corporation

Explore NVIDIA Blueprints

Comprehensive reference workflows that accelerate application development and deployment, featuring NVIDIA acceleration libraries, APIs, and microservices for AI agents, digital twins, and more.

NVIDIA
Build a Video Search and Summarization (VSS) Agent

Ingest massive volumes of live or archived videos and extract insights for summarization and interactive Q&A

chat, generative AI, video-to-text, vision

Specialized Foundation Models

Computer vision models that excel at particular visual perception tasks

NVIDIA
Free Endpoint

ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition, respectively.
Optical Character Detection
1y
NVIDIA
Free Endpoint

visual-changenet

Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask.
Image Segmentation
1y
NVIDIA
Downloadable

nvclip

NV-CLIP is a multimodal embeddings model for image and text.
Computer vision
9mo
NVIDIA
Free Endpoint

retail-object-detection

EfficientDet-based object detection network that detects 100 specific retail objects in an input video.
NVIDIA NIM
1y
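
Embedding models such as NV-CLIP, listed above, map images and text into a shared vector space, and matches are commonly scored by cosine similarity. The sketch below illustrates only that scoring step. It does not call any NVIDIA endpoint, the function names `cosine_similarity` and `rank_captions` are hypothetical, and the 4-dimensional vectors are made-up placeholders rather than real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_captions(image_embedding, caption_embeddings):
    """Return (caption, score) pairs sorted from best to worst match."""
    scored = [
        (caption, cosine_similarity(image_embedding, vector))
        for caption, vector in caption_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors (assumption): stand-ins for embeddings that an
# image/text embedding model would return for one image and two captions.
image_embedding = [0.9, 0.1, 0.0, 0.2]
caption_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1, 0.1],
    "a city skyline at night": [0.0, 0.9, 0.4, 0.0],
}

ranking = rank_captions(image_embedding, caption_embeddings)
for caption, score in ranking:
    print(f"{score:.3f}  {caption}")
```

In practice the placeholder vectors would be replaced by embeddings returned from the model, which typically have hundreds of dimensions or more; the ranking logic stays the same.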

Vision Language Models (VLM)

Multimodal models that can reason over image and video inputs and generate descriptive language

NVIDIA
Downloadable

cosmos-reason2-8b

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.
Physical AI
2mo
Meta
Downloadable

llama-3.2-90b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.
Image-Text Retrieval
9mo
Meta
Downloadable

llama-3.2-11b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.
Image Captioning
9mo
NVIDIA
Free Endpoint

nv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.
NVIDIA NIM
1y
NVIDIA
Free Endpoint

nv-grounding-dino

Grounding DINO is an open-vocabulary, zero-shot object detection model.
NVIDIA NIM
12mo
Google
Free Endpoint

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.
Language Generation
1y