Try NVIDIA NIM APIs

⌘KCtrl+K

Explore

Models

Skills

⌘KCtrl+K

5 results for

Filters (1)

Free Endpoint

Partner Endpoint

Download Available

Use Case

Image-to-Text

Image Generation

Text-to-Image

Synthetic Data Generation

Optical Character Recognition

Inference Providers

Together AI

Deep Infra

Bitdeer AI

GMI Cloud

Vultr

Publisher

Google

NVIDIA

Microsoft

Black forest labs

Qwen

NIM Container GPUs

B200

H100 80GB HBM3

H100 NVL

H200

L40S

Labels (1)

language generation

Sort By

Google

Free Endpoint

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses

Model

image

Items per page

of 1 pages

10.29K

Microsoft

Free Endpoint

phi-4-multimodal-instruct

Cutting-edge open multimodal model exceling in high-quality reasoning from image and audio inputs.

Model

Speech Recognition

269K

Google

Free Endpoint

gemma-3n-e2b-it

An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments

Model

language generation

43.86M

10mo

Google

Free Endpoint

gemma-3n-e4b-it

An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments

Model

language generation

3.74M

10mo

NVIDIA

DownloadableFree Endpoint

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

Model

language generation

2.44M

7mo