Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

9 models

NVIDIA

Downloadable

nemotron-ocr-v2

Nemotron OCR v2 is a state-of-the-art multilingual text recognition model designed for robust end-to-end optical character recognition (OCR) on complex real-world images.

Table Extraction

NVIDIA

Free Endpoint

cosmos3-nano

Generates physics-aware videos from text prompts or an image prompt for physical AI development.

autonomous vehicles

1mo

Qwen

Downloadable

qwen-image

Qwen-Image is a text-to-image foundation model with advanced multilingual text rendering.

Text-to-Image

2mo

Mistral AI

Downloadable, Free Endpoint

mistral-small-4-119b-2603

Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context.

code generation

13M

3mo

Qwen

Downloadable, Free Endpoint

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

tool calling

10M

3mo

Qwen

Downloadable, Free Endpoint

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

MoE

13M

4mo

Microsoft

Downloadable

TRELLIS

MSFT TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.

text-to-3d

10mo

Stability AI

Downloadable

stable-diffusion-3.5-large

Stable Diffusion 3.5 is a popular text-to-image generation model.

Text-to-Image

10mo

Google

Free Endpoint

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.

image

10K