Try NVIDIA NIM APIs

Explore

Models

Skills

Blueprints

5 results for

Filters (1)

Free Endpoint

Partner Endpoint

Download Available

Use Case

Image-to-Text

Image Generation

Text-to-Image

Inference Providers

Deepinfra

OpenRouter

Together AI

Publisher

Google

Microsoft

NVIDIA

Black forest labs

Mistral AI

Labels (1)

language generation

Sort By

Google

Free Endpoint

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses

Model

image

Items per page

of 1 pages

10K

Google

Free Endpoint

gemma-3n-e2b-it

An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments

Model

language generation

34M

11mo

Microsoft

Free Endpoint

phi-4-multimodal-instruct

Cutting-edge open multimodal model exceling in high-quality reasoning from image and audio inputs.

Model

Speech Recognition

244K

Google

Free Endpoint

gemma-3n-e4b-it

An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments

Model

language generation

11mo

NVIDIA

DownloadableFree Endpoint

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

Model

language generation

8mo