Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
19 models, sorted by most recent.
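Models marked "Free Endpoint" below are hosted behind an OpenAI-compatible chat-completions API. The sketch below shows the request shape; the base URL, the namespaced model ID, and the helper names are assumptions drawn from NVIDIA's hosted-API conventions, so check the model card for the exact ID before use.

```python
import json
import os
import urllib.request

# Assumed hosted base URL and namespaced model ID (illustrative; verify
# both on the model card for the endpoint you want to call).
BASE_URL = "https://integrate.api.nvidia.com/v1"
MODEL_ID = "meta/llama-4-scout-17b-16e-instruct"


def build_chat_request(prompt: str, model: str = MODEL_ID) -> dict:
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.2,
    }


def send(payload: dict, api_key: str) -> dict:
    """POST the payload to the chat-completions route; needs a valid API key."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_chat_request("Summarize MoE routing in two sentences.")
print(payload["model"])  # prints "meta/llama-4-scout-17b-16e-instruct"
# send(payload, os.environ["NVIDIA_API_KEY"])  # uncomment with a valid key
```

The same payload works unchanged against any other hosted model in the list by swapping the `model` field.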
mistral-small-4-119b-2603 (Mistral AI, Downloadable)
Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256K context.
Tags: chat +3 · Downloads: 847K · Published: 1w ago

nemotron-3-super-120b-a12b (NVIDIA, Downloadable)
Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more.
Tags: chat +5 · Downloads: 12.57M · Published: 1w ago

qwen3.5-122b-a10b (Qwen, Free Endpoint)
122B MoE LLM (10B active parameters) for coding, reasoning, and multimodal chat; agent-ready.
Tags: chat +4 · Downloads: 3.51M · Published: 2w ago

qwen3.5-397b-a17b (Qwen, Downloadable)
Next-generation Qwen 3.5 VLM (400B MoE) with advanced vision, chat, RAG, and agentic capabilities.
Tags: chat +4 · Downloads: 12.25M · Published: 1mo ago

glm-5 (Z.ai, Downloadable)
GLM-5, a 744B MoE, enables efficient reasoning for complex systems and long-horizon agentic tasks.
Tags: MoE +3 · Downloads: 19.68M · Published: 1mo ago

step-3.5-flash (Stepfun-ai, Free Endpoint)
200B open-source reasoning engine with sparse MoE powering frontier agentic AI.
Tags: chat +3 · Downloads: 8.85M · Published: 1mo ago

kimi-k2.5 (Moonshotai, Downloadable)
1T multimodal MoE for high-capacity video and image understanding with efficient inference.
Tags: Multimodal +4 · Downloads: 29.65M · Published: 1mo ago

nemotron-3-nano-30b-a3b (NVIDIA, Downloadable)
Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more.
Tags: chat +4 · Downloads: 13.77M · Published: 3mo ago

mistral-large-3-675b-instruct-2512 (Mistral AI, Free Endpoint)
A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.
Tags: chat +4 · Downloads: 7.73M · Published: 3mo ago

qwen3-next-80b-a3b-instruct (Qwen, Downloadable)
Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability improvements for ultra-long-context AI.
Tags: chat +2 · Downloads: 14.91M · Published: 6mo ago

qwen3-next-80b-a3b-thinking (Qwen, Downloadable)
80B-parameter model with hybrid reasoning, an MoE architecture, and support for 119 languages.
Tags: chat +2 · Downloads: 4.72M · Published: 6mo ago

qwen3-coder-480b-a35b-instruct (Qwen, Free Endpoint)
Excels in agentic coding and browser use, supports 256K context, and delivers top results.
Tags: agentic coding +4 · Downloads: 3.79M · Published: 6mo ago

gpt-oss-20b (OpenAI, Downloadable)
Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math.
Tags: reasoning +4 · Downloads: 9.21M · Published: 7mo ago

gpt-oss-120b (OpenAI, Downloadable)
Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within an 80 GB GPU.
Tags: reasoning +4 · Downloads: 43.53M · Published: 7mo ago

llama-4-maverick-17b-128e-instruct (Meta, Free Endpoint)
A general-purpose multimodal, multilingual 128-expert MoE model with 17B active parameters.
Tags: chat +4 · Downloads: 4.91M · Published: 8mo ago

llama-4-scout-17b-16e-instruct (Meta, Downloadable, Free Endpoint)
A multimodal, multilingual 16-expert MoE model with 17B active parameters.
Tags: language generation +4 · Downloads: 25.06K · Published: 8mo ago

jamba-1.5-mini-instruct (AI21 Labs, Free Endpoint)
Cutting-edge MoE-based LLM designed to excel in a wide array of generative AI tasks.
Tags: chat +3 · Downloads: 582K · Published: 10mo ago

mixtral-8x22b-instruct-v0.1 (Mistral AI, Downloadable)
An MoE LLM that follows instructions, completes requests, and generates creative text.
Tags: chat +5 · Downloads: 5.31M · Published: 8mo ago

mixtral-8x7b-instruct-v0.1 (Mistral AI, Downloadable)
An MoE LLM that follows instructions, completes requests, and generates creative text.
Tags: chat +5 · Downloads: 748K · Published: 8mo ago