Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

step-3.7-flash

A sparse MoE multimodal reasoning model good for enterprise, agentic and coding tasks.

B200

497K

Moonshotai

Downloadable

kimi-k2.6

1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.

Multimodal

5.28M

1mo

NVIDIA

Free Endpoint

nemotron-3-content-safety

Multilingual, multimodal model for detecting unsafe and toxic content.

llm safety

126K

1mo

Mistral AI

Downloadable

mistral-small-4-119b-2603

Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context

code generation

19.37M

2mo

Qwen

Downloadable

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

tool calling

9.58M

2mo

NVIDIA

Downloadable

llama-nemotron-embed-vl-1b-v2

Multimodal question-answer retrieval representing user queries as text and documents as images.

nemo retriever

6.36M

3mo

Mistral AI

Free Endpoint

mistral-large-3-675b-instruct-2512

A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.

language generation

3.26M

6mo

Mistral AI

ministral-14b-instruct-2512

A general-purpose VLM ideal for chat and instruction-based use cases.

language generation

2.75M

6mo

Black-forest-labs

Downloadable

FLUX.1-Kontext-dev

FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.

Text-to-Image

3.37K

9mo

llama-guard-4-12b

Multimodal model that classifies the safety of input prompts as well as output responses.

LLM Multimodal Safety

141K

11mo

Mistral AI

Deprecated, Free Endpoint

mistral-medium-3-instruct

Powerful, multimodal language model designed for enterprise applications, including software development, data analysis, and reasoning.

language generation

10mo

llama-4-maverick-17b-128e-instruct

A general-purpose multimodal, multilingual MoE model with 128 experts and 17B active parameters.

language generation

21.26M

10mo

Microsoft

Free Endpoint

phi-4-multimodal-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.

Speech Recognition

353K

NVIDIA

Downloadable

nvclip

NV-CLIP is a multimodal embeddings model for image and text.

Computer vision

11mo