Copyright © 2025 NVIDIA Corporation

Search Results

Searching for: video
Sorting by Most Recent

nvidia / Cosmos Dataset Search

Accelerate post-training of end-to-end autonomous vehicle stacks with vector search and retrieval for large video datasets.

blueprint, Autonomous Vehicles, data, Physical AI, Search, Enterprise, Cosmos, NVIDIA AI

nvidia / nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

language generation, chat, Image-to-Text, vision assistant, visual question answering

nvidia / cosmos-reason1-7b

Reasoning vision language model (VLM) for physical AI and robotics.

video understanding, Synthetic Data Generation, autonomous vehicles, industrial, Physical AI, vision language model, reasoning, robotics, smart cities

nvidia / cosmos-transfer1-7b

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

Synthetic Data Generation, Autonomous Vehicles, Physical AI, robotics, video-to-world

nvidia / cosmos-predict1-5b

Generates future frames of a physics-aware world state from just an image or a short video prompt, for physical AI development.

Synthetic Data Generation, Physical AI, policy evaluation, robotics, video-to-world

nvidia / cosmos-nemotron-34b

Multi-modal vision-language model that understands text, images, and video and generates informative responses.

VLM, Vision language model, image caption, image to text

nvidia / Build a Video Search and Summarization (VSS) Agent

Ingest massive volumes of live or archived video and extract insights for summarization and interactive Q&A.

vision, video-to-text, generative AI, Launchable, Blueprint, chat, Enterprise, NVIDIA AI

nvidia / vila

Multi-modal vision-language model that understands text, images, and video and generates informative responses.

VLM, Vision language model, image caption, image to text

nvidia / eyecontact

Estimates the gaze angles of a person in a video and redirects the gaze so that it is frontal.

telepresence, Nvidia Maxine, Digital Human

nvidia / ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and optical character recognition, respectively.

Optical Character Recognition, image, Optical Character Detection, cv, vlm, computer vision, TAO Toolkit, video

nvidia / visual-changenet

Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask.

image, Image Generation, cv, Image Segmentation, vlm, computer vision, TAO Toolkit, video, NVIDIA NIM

nvidia / retail-object-detection

EfficientDet-based object detection network that detects 100 specific retail objects in an input video.

Object Detection, image, cv, vlm, computer vision, TAO Toolkit, video, NVIDIA NIM

google / paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.

image, cv, Vision Assistant, vlm, Visual Question Answering, computer vision, Language Generation, Image-to-Text, video