Try NVIDIA NIM APIs

6 results for

phi-3.5-vision-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

Model

Vision Assistant

534K

Google

Free Endpoint

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses

Model

image

110K

Google

Free Endpoint

gemma-3-27b-it

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

Model

chat

6.59M

10mo

llama-4-maverick-17b-128e-instruct

A general-purpose multimodal, multilingual mixture-of-experts (MoE) model with 128 experts and 17B parameters.

Model

chat

6.9M

8mo

NVIDIA

Downloadable

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

Model

chat

981K

5mo

llama-4-scout-17b-16e-instruct

A multimodal, multilingual mixture-of-experts (MoE) model with 16 experts and 17B parameters.

Model

language generation

23.06K

8mo
