Try NVIDIA NIM APIs

8 results for

Build a Video Search and Summarization (VSS) Agent

Ingest massive volumes of live or archived video and extract insights for summarization and interactive Q&A.

Blueprint

vision

NVIDIA

cosmos-reason1-7b

Reasoning vision language model (VLM) for physical AI and robotics.

Model

video understanding

15.93K

6mo

NVIDIA

cosmos-reason2-8b

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

Model

video understanding

194K

2mo

NVIDIA

ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition, respectively.

Model

Optical Character Recognition

798

Google

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.

Model

image

327K

NVIDIA

retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

Model

Object Detection

794

DGX Spark

1 HR

Vision-Language Model Fine-tuning

Fine-tune vision-language models for image and video understanding tasks using Qwen2.5-VL and InternVL3.

Playbook

DGX

5mo

NVIDIA

visual-changenet

Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask.

Model

image

615
