Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Image-to-Text: 16 models

Downloadable

mistral-small-4-119b-2603

Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context

code generation

7.15M

1mo

Downloadable

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

8.34M

1mo

Downloadable

kimi-k2.5

1T multimodal MoE for high‑capacity video and image understanding with efficient inference.

51.97M

2mo

Free Endpoint

mistral-large-3-675b-instruct-2512

A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.

language generation

5.4M

4mo

Downloadable

ministral-14b-instruct-2512

A general-purpose VLM ideal for chat and instruction-based use cases.

language generation

1.47M

4mo

Downloadable

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

language generation

4.58M

5mo

Free Endpoint

mistral-medium-3-instruct

Powerful, multimodal language model designed for enterprise applications, including software development, data analysis, and reasoning.

language generation

1.5M

9mo

Free Endpoint

llama-4-maverick-17b-128e-instruct

A general-purpose multimodal, multilingual MoE model with 128 experts and 17B active parameters.

language generation

8.25M

9mo

Downloadable, Free Endpoint

llama-4-scout-17b-16e-instruct

A multimodal, multilingual MoE model with 16 experts and 17B active parameters.

language generation

20.14K

9mo

Free Endpoint

gemma-3-27b-it

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

Vision Assistant

5.92M

10mo

Free Endpoint

phi-4-multimodal-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.

Speech Recognition

324K

10mo

University at Buffalo

Free Endpoint

cached

Context-aware chart extraction model that detects 18 classes of basic chart elements, excluding plot elements.

156

1y

Downloadable

llama-3.2-11b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Image-Text Retrieval

1.01M

10mo

Downloadable

llama-3.2-90b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Image-Text Retrieval

1.17M

10mo

Deprecated, Free Endpoint

phi-3.5-vision-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

Vision Assistant

1.22M

1y

Free Endpoint

paligemma

Vision-language model adept at comprehending text and visual inputs to produce informative responses.

44.48K

1y
