Try NVIDIA NIM APIs

9 results for

Meta

Downloadable

llama-3.2-11b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Model

Image-Text Retrieval

1.56M

Meta

Downloadable

llama-3.2-90b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Model

Image-Text Retrieval

2.22M

NVIDIA

Downloadable

ising-calibration-1-35b-a3b

Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.

Model

Quantum

352K

1mo

Google

Free Endpoint

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.

Model

image

15.84K

Meta

Free Endpoint

llama-4-maverick-17b-128e-instruct

A general-purpose multimodal, multilingual 128-expert MoE model with 17B active parameters.

Model

language generation

26.82M

10mo

NVIDIA

Downloadable

cosmos-reason2-8b

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

Model

B200

505K

5mo

NVIDIA

Downloadable

nemoretriever-parse

Cutting-edge vision-language model excelling in retrieving text and metadata from images.

Model

optical character recognition

125K

11mo

NVIDIA

Downloadable

nemotron-parse

Cutting-edge vision-language model excelling in retrieving text and metadata from images.

Model

text and table extraction

293K

7mo

NVIDIA

Downloadable

llama-3.1-nemotron-nano-vl-8b-v1

Multimodal vision-language model that understands text and images and creates informative responses.

Model

doc intelligence

8.58M

11mo