
Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA | Launch from Hugging Face (Beta)

Filters (1)

Free Endpoint: 60
Partner Endpoint: 63
Download Available: 54

Use Case
Code Generation: 27
Image-to-Text: 11
Text Translation: 2
Synthetic Data Generation: 1
Digital Twin: 1

Inference Providers
Fireworks AI: 47
Deep Infra: 39
Together AI: 38
GMI Cloud: 23
Bitdeer AI: 18

Publisher
NVIDIA: 16
Mistral AI: 14
Meta: 12
Microsoft: 10
Qwen: 10

Labels (1)
chat

113 models

Downloadable

mistral-small-4-119b-2603

Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context

26

Today

Free Endpoint

nemotron-voicechat

Nemotron 3 Voicechat

Today

Downloadable

nemotron-3-super-120b-a12b

Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more

1.68M

5d

Free Endpoint

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, and multimodal chat. Agent-ready.

1.58M

1w

Downloadable

minimax-m2.5

MiniMax M2.5 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.

4.74M

2w

Downloadable

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

8.52M

4w

Downloadable

glm-5

GLM-5 744B MoE enables efficient reasoning for complex systems and long-horizon agentic tasks.

10.33M

1mo

Free Endpoint

minimax-m2.1

MiniMax M2.1 excels in multi-language coding, app/web dev, office AI, and agent integration

7.67M

1mo

Free Endpoint

step-3.5-flash

200B open-source reasoning engine with sparse MoE powering frontier agentic AI.

7.75M

1mo

Downloadable

kimi-k2.5

1T multimodal MoE for high‑capacity video and image understanding with efficient inference.

20.17M

1mo

Free Endpoint

glm-4.7

GLM-4.7 is a multilingual agentic coding partner with stronger reasoning, tool use, and UI skills.

15.79M

1mo

Free Endpoint

deepseek-v3.2

State-of-the-art 685B reasoning LLM with sparse attention, long context, and integrated agentic tools.

15.35M

3mo

Downloadable

nemotron-3-nano-30b-a3b

Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more

11.55M

3mo

Free Endpoint

devstral-2-123b-instruct-2512

State-of-the-art open code model with deep reasoning, 256k context, and unmatched efficiency.

5.91M

3mo

Free Endpoint

kimi-k2-thinking

Open reasoning model with 256K context window, native INT4 quantization and enhanced tool use.

2.88M

3mo

Free Endpoint

mistral-large-3-675b-instruct-2512

A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.

6.48M

3mo

Downloadable

ministral-14b-instruct-2512

A general-purpose VLM ideal for chat and instruction-based use cases.

4.68M

3mo

Downloadable

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

1.09M

4mo

Free Endpoint

deepseek-v3.1-terminus

DeepSeek-V3.1: hybrid inference LLM with Think/Non-Think modes, stronger agents, 128K context, strict function calling.

12.33M

5mo

Downloadable

stockmark-2-100b-instruct

Japanese-specialized large language model for enterprises to read and understand complex business documents.

4.5M

5mo

Downloadable

qwen3-next-80b-a3b-instruct

Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.

11.85M

5mo

Free Endpoint

kimi-k2-instruct-0905

Follow-on version of Kimi-K2-Instruct with longer context window and enhanced reasoning capabilities.

9.06M

5mo

Free Endpoint

bielik-11b-v2.6-instruct

State-of-the-art model for Polish language processing tasks such as text generation, Q&A, and chatbots.

582K

5mo

Downloadable

qwen3-next-80b-a3b-thinking

80B-parameter AI model with hybrid reasoning, MoE architecture, and support for 119 languages.

4.31M

6mo
