Try NVIDIA NIM APIs

Explore

5 results for

phi-3.5-vision-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

Model

Vision Assistant

1.22M

Google

Free Endpoint

paligemma

Vision-language model adept at comprehending text and visual inputs to produce informative responses.

Model

image

44.48K

Google

Free Endpoint

gemma-3-27b-it

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

Model

Vision Assistant

5.92M

10mo

llama-4-maverick-17b-128e-instruct

A general-purpose multimodal, multilingual 128-expert MoE model with 17B active parameters.

Model

language generation

8.25M

9mo

NVIDIA

Downloadable

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

Model

language generation

4.58M

5mo
