Try NVIDIA NIM APIs

Copyright © 2026 NVIDIA Corporation

16 results for

Filters (1)

Free Endpoint (9), Partner Endpoint (11), Download Available (8), Launchable (0), Enterprise (0)

Use Case: Image-to-Text (13), Code Generation (0), Drug Discovery (0), Retrieval Augmented Generation (0), Object Detection (0)

Inference Providers: Fireworks AI (8), Deep Infra (7), Bitdeer AI (6), Together AI (4), GMI Cloud (3)

Publisher: Meta (4), Mistral AI (4), Microsoft (2), Google (2), NVIDIA (1)

Blueprint Type: NVIDIA AI (0), NVIDIA Omniverse (0), NVIDIA BioNemo (0), NVIDIA Isaac GR00T (0)

Labels (1): Image-to-Text
University at Buffalo

Free Endpoint

cached

Context-aware chart extraction that can detect 18 classes for chart basic elements, excluding plot elements.

173

1y

Free Endpoint

gemma-3-27b-it

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

6.34M

10mo

Downloadable

kimi-k2.5

1T multimodal MoE for high‑capacity video and image understanding with efficient inference.

42.89M

2mo

Downloadable

llama-3.2-11b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

976K

10mo

Downloadable

llama-3.2-90b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

849K

10mo

Free Endpoint

llama-4-maverick-17b-128e-instruct

A general-purpose multimodal, multilingual 128-expert MoE model with 17B active parameters.

6.94M

8mo

Downloadable, Free Endpoint

llama-4-scout-17b-16e-instruct

A multimodal, multilingual 16-expert MoE model with 17B active parameters.

language generation

22.43K

8mo

Downloadable

ministral-14b-instruct-2512

A general-purpose VLM ideal for chat and instruction-based use cases.

2.32M

4mo

Free Endpoint

mistral-large-3-675b-instruct-2512

A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.

6.27M

4mo

Free Endpoint

mistral-medium-3-instruct

Powerful, multimodal language model designed for enterprise applications, including software development, data analysis, and reasoning.

3.25M

8mo

Downloadable

mistral-small-4-119b-2603

Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and a 256k context.

2.9M

2w

Downloadable

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

812K

5mo

Free Endpoint

paligemma

Vision-language model adept at comprehending text and visual inputs to produce informative responses.

37.36K

1y

Free Endpoint

phi-3.5-vision-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

Vision Assistant

509K

1y

Free Endpoint

phi-4-multimodal-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.

Speech Recognition

348K

10mo

Downloadable

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

7.38M

4w
