Try NVIDIA NIM APIs

10 results for

Build a Video Search and Summarization (VSS) Agent

Ingest massive volumes of live or archived videos and extract insights for summarization and interactive Q&A

Blueprint

NVIDIA AI

2mo

NVIDIA

Downloadable

cosmos-reason2-8b

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

Model

video understanding

407K

4mo

Google

Deprecation in 7d

Free Endpoint

gemma-3-27b-it

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

Model

Vision Assistant

4.22M

11mo

NVIDIA

Downloadable

ising-calibration-1-35b-a3b

Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.

Model

Quantum

193K

DGX Spark

20 MIN

Live VLM WebUI

Real-time Vision Language Model interaction with webcam streaming

Playbook

Vision AI

4mo

llama-4-maverick-17b-128e-instruct

A general-purpose multimodal, multilingual 128-expert MoE model with 17B active parameters.

Model

language generation

16.99M

9mo

NVIDIA

Downloadable

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

Model

language generation

2.64M

6mo

NVIDIA

Downloadable

nvclip

NV-CLIP is a multimodal embeddings model for image and text.

Model

Computer vision

31.7K

10mo

Google

Free Endpoint

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.

Model

image

15.91K

DGX Spark

1 HR

Vision-Language Model Fine-tuning

Fine-tune Vision-Language Models for image and video understanding tasks using Qwen2.5-VL and InternVL3

Playbook

DGX

7mo