⌘KCtrl+K

Your Privacy Choices

Contact

Explore

Models

⌘KCtrl+K

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA Launch from Hugging FaceBeta

Filters

Free Endpoint

Partner Endpoint

Download Available

Use Case

Image-to-Text

Synthetic Data Generation

Drug Discovery

Code Generation

Retrieval Augmented Generation

Inference Providers

Deep Infra

Together AI

Bitdeer AI

GMI Cloud

CoreWeave

Publisher

NVIDIA

Meta

Qwen

Google

Stepfun ai

NIM Container GPUs

H100 80GB HBM3

H200

L40S

B200

H100 NVL

13 models

Sort By

NVIDIA

DownloadableFree Endpoint

cosmos3-nano-reasoner

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

B200

Items per page

of 1 pages

1.22K

Stepfun-ai

DownloadableFree Endpoint

step-3.7-flash

A sparse MoE multimodal reasoning model good for enterprise, agentic and coding tasks.

B200

2.58M

NVIDIA

DownloadableFree Endpoint

ising-calibration-1-35b-a3b

Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.

Quantum

339K

1mo

Qwen

DownloadableFree Endpoint

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

MoE

12.11M

3mo

NVIDIA

Downloadable

cosmos-reason2-8b

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

B200

300K

5mo

NVIDIA

Downloadable

nemotron-parse

Cutting-edge vision-language model exceling in retrieving text and metadata from images.

text and table extraction

278K

7mo

NVIDIA

DownloadableFree Endpoint

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

language generation

2.29M

7mo

NVIDIA

DownloadableFree Endpoint

llama-3.1-nemotron-nano-vl-8b-v1

Multi-modal vision-language model that understands text/img and creates informative responses

doc intelligence

9.38M

11mo

Meta

Free Endpoint

llama-4-maverick-17b-128e-instruct

A general purpose multimodal, multilingual 128 MoE model with 17B parameters.

language generation

23.78M

10mo

NVIDIA

Downloadable

nemoretriever-parse

Cutting-edge vision-language model exceling in retrieving text and metadata from images.

optical character recognition

98.76K

Meta

DownloadableFree Endpoint

llama-3.2-11b-vision-instruct

Cutting-edge vision-language model exceling in high-quality reasoning from images.

Image-Text Retrieval

1.66M

Meta

DownloadableFree Endpoint

llama-3.2-90b-vision-instruct

Cutting-edge vision-Language model exceling in high-quality reasoning from images.

Image-Text Retrieval

2.71M

Google

Free Endpoint

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses

image

11.24K