
Deploy Models Now with NVIDIA NIM

Optimized inference for the world's leading models

  • Free serverless APIs for development, accelerated by DGX Cloud
  • Self-host on your own GPU infrastructure
  • Continuous vulnerability fixes
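The hosted endpoints are OpenAI-API compatible, so a standard chat-completions client can be pointed at them. A minimal sketch, assuming the integrate.api.nvidia.com base URL, an API key exported as NVIDIA_API_KEY (the variable name is just a convention here), and one of the catalog models listed below:

```python
import os
from openai import OpenAI  # pip install openai

# Point the standard OpenAI client at the NVIDIA NIM serverless endpoint.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],  # assumed env var holding your build.nvidia.com key
)

# Ordinary chat-completions request against a hosted catalog model.
response = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=[{"role": "user", "content": "Summarize what NVIDIA NIM provides."}],
    max_tokens=256,
    temperature=0.2,
)
print(response.choices[0].message.content)
```

When self-hosting the same model as a NIM container, the same client typically works unchanged by pointing base_url at the local service instead of the hosted endpoint.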

Vision


Explore NVIDIA Blueprints

Comprehensive reference workflows that accelerate application development and deployment, featuring NVIDIA acceleration libraries, APIs, and microservices for AI agents, digital twins, and more.

Enterprise

nvidia / Build a Video Search and Summarization (VSS) Agent

Ingest massive volumes of live or archived videos and extract insights for summarization and interactive Q&A.

Specialized Foundation Models

Computer vision models that excel at particular visual perception tasks

nvidia/ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition, respectively.

nvidia/visual-changenet

Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask.

Run Anywhere

nvidia/nvclip

NV-CLIP is a multimodal embeddings model for image and text.
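As an illustration of how a joint image/text embedding model like NV-CLIP is typically used: once an image and a set of candidate captions are embedded into the same vector space, cosine similarity ranks the captions. The vectors below are toy stand-ins for real NV-CLIP outputs; only the scoring step is shown.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_captions(image_vec: np.ndarray, caption_vecs: dict) -> list:
    """Rank candidate captions by similarity to the image embedding."""
    scores = {text: cosine_similarity(image_vec, vec) for text, vec in caption_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy 3-d vectors stand in for real NV-CLIP image/text embeddings.
image_vec = np.array([0.9, 0.1, 0.2])
caption_vecs = {
    "a photo of a cat": np.array([0.85, 0.15, 0.25]),
    "a diagram of a circuit": np.array([0.1, 0.9, 0.3]),
}
for caption, score in rank_captions(image_vec, caption_vecs):
    print(f"{score:.3f}  {caption}")
```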

nvidia/retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

Vision Language Models (VLM)

Multimodal models that can reason over image and video inputs and perform descriptive language generation.

Run Anywhere

nvidia/cosmos-reason1-7b

Reasoning vision language model (VLM) for physical AI and robotics.

Run Anywhere

meta/llama-3.2-90b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Run Anywhere

meta/llama-3.2-11b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.
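As a rough illustration of calling one of these vision-instruct endpoints with an image: the sketch below encodes a local file (street_scene.jpg is just a placeholder) as a base64 data URI and sends it as an OpenAI-style image_url content part. The exact image-passing convention can vary per model, so treat the message format as an assumption and check the model card.

```python
import base64
import os
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # NIM serverless endpoint
    api_key=os.environ["NVIDIA_API_KEY"],  # assumed env var holding your API key
)

# Encode a local image as a base64 data URI (placeholder file name).
with open("street_scene.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Assumed message format: OpenAI-style content parts with an image_url entry.
# Some catalog models expect the image inline in the prompt instead; see the model card.
response = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in two sentences."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)
```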

Deprecation in 23 days

nvidia/vila

Multimodal vision-language model that understands text, image, and video inputs and generates informative responses.

nvidia/nv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.
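A common use of such image embeddings is nearest-neighbour search: embed a gallery of images once, then compare a query embedding against it. The sketch below uses random stand-in vectors in place of real NV-DINOv2 embeddings and shows only the retrieval step.

```python
import numpy as np

def top_k_similar(query: np.ndarray, gallery: np.ndarray, k: int = 3) -> list:
    """Return indices and cosine scores of the k gallery embeddings closest to the query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q
    top = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in top]

# Stand-in data: 5 gallery images and 1 query, each as a random embedding vector.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 8))
query = gallery[2] + 0.05 * rng.normal(size=8)  # query resembles gallery image 2

for idx, score in top_k_similar(query, gallery):
    print(f"gallery image {idx}: similarity {score:.3f}")
```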

nvidia/nv-grounding-dino

Grounding DINO is an open-vocabulary, zero-shot object detection model.

google/paligemma

Vision-language model adept at comprehending text and visual inputs to produce informative responses.