Try NVIDIA NIM APIs

⌘KCtrl+K

Your Privacy Choices

Contact

Explore

⌘KCtrl+K

Search Results

Searching for: video

Sort By

Publisher

Use Case

NIM Type

Blueprint Type

GPU Types

Launchable

Sorting by Last Updated

moonshotai kimi-k2.5

1T multimodal MoE for high‑capacity video and image understanding with efficient inference.

Multimodal Reasoning chat Mixture-of-Experts Image-to-Text

nvidia cosmos-reason2-8b

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

video understanding Synthetic Data Generation autonomous vehicles industrial Physical AI vision language model reasoning robotics smart cities

nvidia nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

language generation chat Image-to-Text vision assistant visual question answering

nvidia cosmos-reason1-7b

Reasoning vision language model (VLM) for physical AI and robotics.

video understanding Synthetic Data Generation autonomous vehicles industrial Physical AI vision language model reasoning robotics smart cities

nvidia cosmos-transfer1-7b

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

Synthetic Data Generation Autonomous Vehicles Physical AI robotics video-to-world

nvidia eyecontact

Estimate gaze angles of a person in a video and redirect to make it frontal.

telepresence Nvidia Maxine Digital Human

nvidia cosmos-predict1-5b

Generates future frames of a physics-aware world state based on simply an image or short video prompt for physical AI development.

Synthetic Data Generation Physical AI policy evaluation robotics video-to-world

nvidia cosmos-nemotron-34b

Multi-modal vision-language model that understands text/img/video and creates informative responses

VLM Vision language model image caption image to text

nvidia visual-changenet

Visual Changenet detects pixel-level change maps between two images and outputs a semantic change segmentation mask

image Image Generation cv Image Segmentation vlm computer vision TAO Toolkit video NVIDIA NIM

nvidia retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

Object Detection image cv vlm computer vision TAO Toolkit video NVIDIA NIM

nvidia ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition respectively.

Optical Character Recognition image Optical Character Detection cv vlm computer vision TAO Toolkit video

google paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses

image cv Vision Assistant vlm Visual Question Answering computer vision Language Generation Image-to-Text video