6 results for

Publisher

NVIDIA

llama-3.2-11b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Model

Image-Text Retrieval

1.56M

llama-3.2-90b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Model

Image-Text Retrieval

2.22M

Google

Free Endpoint

paligemma

Vision-language model adept at comprehending text and visual inputs to produce informative responses.

Model

image

15.84K

NVIDIA

Downloadable

nvclip

NV-CLIP is a multimodal embeddings model for image and text.

Model

Computer vision

11mo

NVIDIA

Downloadable

llama-3.1-nemotron-nano-vl-8b-v1

Multimodal vision-language model that understands text and images and creates informative responses.

Model

doc intelligence

8.58M

11mo

Qwen

Downloadable

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

Model

MoE

12.01M

3mo