Skip to main content

Your Privacy Choices

Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA Launch from Hugging FaceBeta

Filters

Free Endpoint

15

Partner Endpoint

16

Download Available

27

Use Case

Image-to-Text

7

Image Generation

7

Text-to-Image

7

Synthetic Data Generation

3

Optical Character Recognition

3

Inference Providers

Deepinfra

14

Together AI

7

OpenRouter

6

GMI Cloud

3

Vultr

3

Publisher

NVIDIA

13

Qwen

4

Black forest labs

4

Google

3

Meta

2

NIM Container GPUs

B200

3

H100 80GB HBM3

2

H200

1

L40S

1

A100 SXM4 80GB

1

32 models

Sort By

Downloadable

nemotron-ocr-v2

Nemotron OCR v2 is a state-of-the-art multilingual text recognition model designed for robust end-to-end optical character recognition (OCR) on complex real-world images.

Table Extraction

151

2d

Items per page

of 2 pages

Free Endpoint

cosmos3-nano

Generates physics-aware videos from text prompts or an image prompt for physical AI development.

autonomous vehicles

2K

26d

DownloadableFree Endpoint

cosmos3-nano-reasoner

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

video understanding

2K

26d

DownloadableFree Endpoint

kimi-k2.6

1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.

7M

1mo

Downloadable

qwen-image

Qwen-Image is a text-to-image foundation model with advanced multilingual text rendering.

1mo

Downloadable

qwen-image-edit

Qwen-Image-Edit is an image editing model with multilingual text editing and strong subject consistency.

1mo

DownloadableFree Endpoint

nemotron-3-nano-omni-30b-a3b-reasoning

Nemotron 3 Nano Omni is an omni-modal reasoning model that understands images, video, speech, text.

8M

1mo

DownloadableFree Endpoint

mistral-small-4-119b-2603

Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context

code generation

13M

3mo

Black-forest-labs

Downloadable

flux.2-klein-4b

FLUX.2-klein-4B is a distilled image generation and editing model, producing outputs at lighting speed

271K

3mo

Downloadable

nemotron-ocr-v1

Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.

Table Extraction

341K

3mo

DownloadableFree Endpoint

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

10M

3mo

DownloadableFree Endpoint

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

13M

4mo

Downloadable

llama-nemotron-embed-vl-1b-v2

Multimodal question-answer retrieval representing user queries as text and documents as images.

8M

4mo

Downloadable

cosmos-reason2-8b

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

video understanding

191K

6mo

Downloadable

nemotron-parse

Cutting-edge vision-language model exceling in retrieving text and metadata from images.

text and table extraction

218K

8mo

DownloadableFree Endpoint

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

language generation

2M

8mo

Downloadable

TRELLIS

MSFT TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.

4K

9mo

Downloadable

stable-diffusion-3.5-large

Stable Diffusion 3.5 is a popular text-to-image generation model

10mo

Black-forest-labs

Downloadable

FLUX.1-Kontext-dev

FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.

4K

10mo

Downloadable

nemoretriever-ocr

Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.

Table Extraction

9K

11mo

Free Endpoint

gemma-3n-e4b-it

An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments

language generation

4M

11mo

Free Endpoint

gemma-3n-e2b-it

An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments

language generation

34M

11mo

DownloadableFree Endpoint

llama-3.1-nemotron-nano-vl-8b-v1

Multi-modal vision-language model that understands text/img and creates informative responses

doc intelligence

10M

1y

Black-forest-labs

Downloadable

FLUX.1-schnell

FLUX.1-schnell is a distilled image generation model, producing high quality images at fast speeds

253K

1y