Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

minimax-m3

MiniMax M3 Preview is a multimodal MoE vision-language model with strong reasoning, coding, and tool-calling capabilities.

coding

10M

1mo

NVIDIA

Downloadable, Free Endpoint

nemotron-3.5-content-safety

Multilingual, multimodal model for detecting unsafe and toxic content.

llm safety

1mo

Stepfun-ai

Downloadable, Free Endpoint

step-3.7-flash

A sparse MoE multimodal reasoning model suited to enterprise, agentic, and coding tasks.

Coding

1mo

NVIDIA

Deprecation in 3d, Free Endpoint

nemotron-3-content-safety

Multilingual, multimodal model for detecting unsafe and toxic content.

llm safety

295K

2mo

Mistral AI

Downloadable, Free Endpoint

mistral-small-4-119b-2603

Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and a 256k context window.

code generation

13M

3mo

Qwen

Downloadable, Free Endpoint

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

tool calling

10M

4mo

NVIDIA

Downloadable

llama-nemotron-embed-vl-1b-v2

Multimodal question-answer retrieval representing user queries as text and documents as images.

nemo retriever

5mo

Mistral AI

Free Endpoint

mistral-large-3-675b-instruct-2512

A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.

language generation

7mo

Mistral AI

Downloadable, Free Endpoint

ministral-14b-instruct-2512

A general-purpose VLM ideal for chat and instruction-based use cases.

language generation

7mo

Black-forest-labs

Downloadable

FLUX.1-Kontext-dev

FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.

Text-to-Image

11mo

llama-guard-4-12b

Multimodal model that classifies the safety of input prompts as well as output responses.

LLM Multimodal Safety

222K

llama-4-maverick-17b-128e-instruct

A general-purpose multimodal, multilingual MoE model with 128 experts and 17B active parameters.

language generation

20M

Microsoft

Deprecation in 3d, Free Endpoint

phi-4-multimodal-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.

Speech Recognition

173K