Try NVIDIA NIM APIs

5 results for

Vision-Language Model Fine-tuning

Fine-tune Vision-Language Models for image and video understanding tasks using Qwen2.5-VL and InternVL3

Playbook

DGX

8mo

Google

Free Endpoint

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses

Model

image

10K

Qwen

Downloadable

Free Endpoint

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (397B-parameter MoE, 17B active) with advanced vision, chat, RAG, and agentic capabilities.

Model

MoE

13M

4mo

NVIDIA

Downloadable

Free Endpoint

llama-3.1-nemotron-nano-vl-8b-v1

Multimodal vision-language model that understands text and images and produces informative responses

Model

doc intelligence

10M

Qwen

Downloadable

Free Endpoint

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, and multimodal chat. Agent-ready.

Model

tool calling

10M

3mo