Try NVIDIA NIM APIs

14 results for

NVIDIA

Downloadable, Free Endpoint

synthetic-video-detector

NVIDIA Synthetic Video Detector is an AI-powered microservice for detecting AI-generated (synthetic) videos.

Model

broadcast

90.31K

2mo

NVIDIA

Downloadable, Free Endpoint

nemotron-3-nano-omni-30b-a3b-reasoning

Nemotron 3 Nano Omni is an omni-modal reasoning model that understands images, video, speech, and text.

Model

Image-to-Text

7.54M

1mo

NVIDIA

Free Endpoint

cosmos-transfer1-7b

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

Model

Synthetic Data Generation

250

11mo

NVIDIA

Free Endpoint

cosmos-transfer2.5-2b

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

Model

Synthetic Data Generation

3mo

Google

Free Endpoint

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.

Model

image

10.22K

NVIDIA

Downloadable

cosmos-reason2-8b

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

Model

video understanding

191K

5mo

NVIDIA

Downloadable, Free Endpoint

cosmos3-nano-reasoner

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

Model

video understanding

1.94K

16d

NVIDIA

Downloadable, Free Endpoint

Active Speaker Detection

Detect and track speaker identities across video frames.

Model

broadcast

473

2mo

NVIDIA

Downloadable

eyecontact

Estimate the gaze angles of a person in a video and redirect the gaze to face the camera.

Model

telepresence

1.62K

Moonshotai

Downloadable, Free Endpoint

kimi-k2.6

1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.

Model

Multimodal

7.09M

1mo

NVIDIA

Downloadable

LipSync

Generative lip dubbing that syncs lips in a video to input audio.

Model

broadcast

2mo

NVIDIA

Downloadable

Relighting

Re-illuminate people in video to match target lighting from a 360-degree HDRI environment map.

Model

HDRI

227

2mo

NVIDIA

Downloadable, Free Endpoint

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

Model

language generation

2.47M

7mo

NVIDIA

Free Endpoint

cosmos3-nano

Generates physics-aware videos from text prompts or an image prompt for physical AI development.

Model

autonomous vehicles

1.79K

16d