Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA | Launch from Hugging Face (Beta)

Filters (1)

Free Endpoint (6)

Partner Endpoint (1)

Download Available (5)

Use Case

Image-to-Text (3), Synthetic Data Generation (2), Drug Discovery (0), Retrieval Augmented Generation (0), Speech-to-Text (0)

Inference Providers

Deepinfra (1), OpenRouter (1), Together AI (0), GMI Cloud (0), Bitdeer (0)

Publisher

NVIDIA (4), Meta (1), Google (1), Stepfun ai (1), Mistral AI (0)

NIM Container GPUs

H100 80GB HBM3 (0), A100 SXM4 80GB (0), L40S (0), A10G (0), B200 (0)

Labels (1)

Vision

7 models

Downloadable, Free Endpoint

cosmos3-nano-reasoner

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

video understanding

autonomous vehicles, industrial, Physical AI, vision language model, reasoning, robotics, smart cities, Synthetic Data Generation

2K API calls in the last 30 days

Last updated on June 1, 2026

Downloadable, Free Endpoint

step-3.7-flash

A sparse MoE multimodal reasoning model suited to enterprise, agentic, and coding tasks.

7M API calls in the last 30 days

Last updated on May 29, 2026

Downloadable, Free Endpoint

ising-calibration-1-35b-a3b

Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.

reasoning, Vision Language Model, calibration

442K API calls in the last 30 days

Last updated on April 14, 2026

Downloadable

cosmos-reason2-8b

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

video understanding

autonomous vehicles, industrial, Physical AI, vision language model, reasoning, robotics, smart cities, Synthetic Data Generation

191K API calls in the last 30 days

Last updated on December 27, 2025

Downloadable, Free Endpoint

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

language generation

vision assistant, visual question answering, Image-to-Text

5M API calls in the last 30 days

Last updated on October 28, 2025

Deprecation in 7 days, Free Endpoint

llama-4-maverick-17b-128e-instruct

A general-purpose multimodal, multilingual mixture-of-experts (MoE) model with 128 experts and 17B active parameters.

language generation

vision assistant, visual question answering, Image-to-Text

16M API calls in the last 30 days

Last updated on July 17, 2025

Free Endpoint

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.

cv, Vision Assistant, vlm, Visual Question Answering, computer vision, Language Generation, video, Image-to-Text

12K API calls in the last 30 days

Last updated on August 26, 2024