⌘KCtrl+K

Explore

Models

Skills

Blueprints

⌘KCtrl+K

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA Launch from Hugging FaceBeta

Filters (1)

Free Endpoint

Partner Endpoint

Download Available

Use Case

Image-to-Text

Drug Discovery

Code Generation

Retrieval Augmented Generation

Speech-to-Text

Inference Providers

Deep Infra

Bitdeer AI

GMI Cloud

Together AI

Lightning AI

Publisher

Meta

Mistral AI

NVIDIA

Qwen

Google

NIM Container GPUs

B200

H100 80GB HBM3

L40S

H200

GB200

Labels (1)

image-to-text

12 models

Sort By

Moonshotai

DownloadableFree Endpoint

kimi-k2.6

1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.

B200

Items per page

of 1 pages

6.69M

1mo

NVIDIA

DownloadableFree Endpoint

nemotron-3-nano-omni-30b-a3b-reasoning

Nemotron 3 Nano Omni is an omni-modal reasoning model that understands images, video, speech, text.

Image-to-Text

8.57M

1mo

Mistral AI

DownloadableFree Endpoint

mistral-small-4-119b-2603

Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context

code generation

17.36M

2mo

Qwen

DownloadableFree Endpoint

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

B200

9.95M

3mo

Mistral AI

Free Endpoint

mistral-large-3-675b-instruct-2512

A state-of-the-art general purpose MoE VLM ideal for chat, agentic and instruction based use cases.

language generation

3.3M

6mo

Mistral AI

DownloadableFree Endpoint

ministral-14b-instruct-2512

A general purpose VLM ideal for chat and instruction based use cases

language generation

3.54M

6mo

NVIDIA

DownloadableFree Endpoint

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

language generation

2.31M

7mo

Meta

Free Endpoint

llama-4-maverick-17b-128e-instruct

A general purpose multimodal, multilingual 128 MoE model with 17B parameters.

language generation

23.49M

10mo

Microsoft

Free Endpoint

phi-4-multimodal-instruct

Cutting-edge open multimodal model exceling in high-quality reasoning from image and audio inputs.

Speech Recognition

274K

Meta

DownloadableFree Endpoint

llama-3.2-11b-vision-instruct

Cutting-edge vision-language model exceling in high-quality reasoning from images.

Image-Text Retrieval

1.68M

Meta

DownloadableFree Endpoint

llama-3.2-90b-vision-instruct

Cutting-edge vision-Language model exceling in high-quality reasoning from images.

Image-Text Retrieval

2.76M

Google

Free Endpoint

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses

image

10.68K