NVIDIA

Deploy Models Now with NVIDIA NIM

Optimized inference for the world’s leading models
Free serverless APIs for development
Accelerated by DGX Cloud
Self-Host on your GPU infrastructure
Continuous vulnerability fixes

Vision

Copyright © 2025 NVIDIA Corporation

Explore NVIDIA Blueprints

Comprehensive reference workflows that accelerate application development and deployment, featuring NVIDIA acceleration libraries, APIs, and microservices for AI agents, digital twins, and more.

Enterprise

nvidia / Build a Video Search and Summarization (VSS) Agent

Ingest massive volumes of live or archived videos and extract insights for summarization and interactive Q&A.

Vision Language Models (VLM)

Multimodal models that can reason over image and video inputs and generate descriptive language.

Run Anywhere

nvidia / cosmos-reason1-7b

Reasoning vision language model (VLM) for physical AI and robotics.

Run Anywhere

meta / llama-3.2-90b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Run Anywhere

meta / llama-3.2-11b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Deprecation in 64 days

nvidia / vila

Multimodal vision-language model that understands text, images, and video and creates informative responses.

PREVIEW

nvidia / nv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

PREVIEW

nvidia / nv-grounding-dino

Grounding DINO is an open-vocabulary, zero-shot object detection model.

PREVIEW

google / paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.

Specialized Foundation Models

Computer vision models that excel at particular visual perception tasks.

PREVIEW

nvidia / ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition, respectively.

PREVIEW

nvidia / visual-changenet

Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask.
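
To make the idea of a pixel-level change mask concrete, here is a deliberately naive sketch. It is not Visual ChangeNet and does not call any NVIDIA API: it only thresholds the absolute difference of two same-sized grayscale images, whereas the model listed above produces a learned, semantic mask. The function name `change_mask`, the `threshold` parameter, and the sample pixel values are all assumptions made for illustration.

```python
def change_mask(before, after, threshold=30):
    """Return a binary mask: 1 where a pixel changed by more than `threshold`.

    `before` and `after` are grayscale images given as lists of rows of
    integer intensities (0-255). This is a naive baseline for illustration,
    not the learned method used by Visual ChangeNet.
    """
    same_height = len(before) == len(after)
    same_widths = all(len(r1) == len(r2) for r1, r2 in zip(before, after))
    if not (same_height and same_widths):
        raise ValueError("images must have identical dimensions")
    return [
        [1 if abs(p - q) > threshold else 0 for p, q in zip(row_b, row_a)]
        for row_b, row_a in zip(before, after)
    ]


# Hypothetical 2x3 images: two pixels differ noticeably between the frames.
before = [[10, 10, 10], [10, 10, 10]]
after = [[10, 200, 10], [10, 10, 90]]
mask = change_mask(before, after)
print(mask)
```

A fixed threshold like this reacts to lighting changes and noise as readily as to real scene changes, which is the kind of limitation a learned change-detection model is meant to address.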

Run Anywhere

nvidia / nvclip

NV-CLIP is a multimodal embeddings model for image and text.
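
A shared image-and-text embedding space is typically used by comparing vectors with cosine similarity. The sketch below shows only that comparison step in plain Python. It does not call NV-CLIP: the three-dimensional vectors are made up, real embeddings have many more dimensions, and the helper names `cosine_similarity` and `rank_captions` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_captions(image_embedding, caption_embeddings):
    """Return caption strings ordered from most to least similar to the image."""
    scored = [
        (cosine_similarity(image_embedding, vector), caption)
        for caption, vector in caption_embeddings.items()
    ]
    scored.sort(reverse=True)
    return [caption for _, caption in scored]


# Made-up embeddings standing in for model output (assumption, not real data).
image_vec = [0.9, 0.1, 0.0]
captions = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a car": [0.0, 0.1, 0.9],
}
ranking = rank_captions(image_vec, captions)
print(ranking[0])
```

In practice the vectors would come from the embedding model itself, with the image and each candidate caption embedded by the same model so that their similarities are comparable.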

PREVIEW

nvidia / retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.