NVIDIA
Explore Models Blueprints GPUs Docs
Terms of Use

|

Privacy Policy

|

Manage My Privacy

|

Contact

Copyright © 2025 NVIDIA Corporation

Search Results

Searching for: video
Sorting by Most Recent

nvidiacosmos-reason1-7b

Reasoning vision language model (VLM) for physical AI and robotics.

video understandingsynthetic data generationautonomous vehiclesindustrialphysical aivision language modelreasoningroboticssmart citiesnvidia

nvidiacosmos-transfer1-7b

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

synthetic data generationautonomous vehiclesphysical airoboticsvideo-to-worldnvidia

nvidiacosmos-predict1-7b

Generalist model to generate future world state as videos from text and image prompts to create synthetic training data for robots and autonomous vehicles.

synthetic data generationautonomous vehiclesphysical airoboticstext-to-worldimage-to-worldnvidia

nvidiacosmos-predict1-5b

Generates future frames of a physics-aware world state based on simply an image or short video prompt for physical AI development.

synthetic data generationphysical aipolicy evaluationroboticsvideo-to-worldnvidia

nvidiacosmos-nemotron-34b

Multi-modal vision-language model that understands text/img/video and creates informative responses

vlmvision language modelimage captionimage to textnvidia

nvidiaBuild a Video Search and Summarization (VSS) Agent

Ingest massive volumes of live or archived videos and extract insights for summarization and interactive Q&A

visionvideo-to-textgenerative ailaunchableblueprintchatenterprisenvidia ainvidia

nvidiavila

Multi-modal vision-language model that understands text/img/video and creates informative responses

vlmvision language modelimage captionimage to textnvidia

nvidiaeyecontact

Estimate gaze angles of a person in a video and redirect to make it frontal.

telepresencenvidia maxinedigital humannvidia

nvidiaocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition respectively.

optical character recognitionimageoptical character detectioncvvlmcomputer visiontao toolkitvideonvidia

nvidiavisual-changenet

Visual Changenet detects pixel-level change maps between two images and outputs a semantic change segmentation mask

imageimage generationcvimage segmentationvlmcomputer visiontao toolkitvideonvidia nimnvidia

nvidiaretail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

object detectionimagecvvlmcomputer visiontao toolkitvideonvidia nimnvidia

googlepaligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses

imagecvvision assistantvlmvisual question answeringcomputer visionlanguage generationimage-to-textvideogoogle

microsoftkosmos-2

Groundbreaking multimodal model designed to understand and reason about visual elements in images.

imagecvmultimodalvlmvisual question answeringcomputer visionimage understandingimage-to-textvideomicrosoft

nvidianeva-22b

Multi-modal vision-language model that understands text/images and generates informative responses

imagecvvision assistantnon-commercial use onlyvlmvisual question answeringcomputer visionimage-to-textvideonvidia

adeptfuyu-8b

Multi-modal model for a wide range of tasks, including image understanding and language generation.

imagecvmultimodalvlmcomputer visionimage understandinglanguage generationimage-to-textvideoadept