Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
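As an illustrative sketch only: models in this catalog are typically served behind an OpenAI-compatible chat-completions API. The endpoint URL (`integrate.api.nvidia.com`) and the model identifier (`zai/glm-5.1`) below are assumptions for illustration, not values taken from this page — check a model card's own deploy instructions for the real ones.

```python
import json
import os

# Assumed hosted-endpoint URL for NIM models; verify against the model card.
NIM_ENDPOINT = "https://integrate.api.nvidia.com/v1/chat/completions"

def build_chat_request(model: str, prompt: str, api_key: str) -> dict:
    """Assemble the URL, headers, and JSON body for one chat-completion call."""
    return {
        "url": NIM_ENDPOINT,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
        "body": {
            "model": model,  # assumed identifier; the catalog shows "glm-5.1"
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
        },
    }

req = build_chat_request(
    "zai/glm-5.1",
    "Summarize MoE routing in two sentences.",
    os.environ.get("NVIDIA_API_KEY", "nvapi-..."),
)
print(json.dumps(req["body"], indent=2))
# Send with e.g. requests.post(req["url"], headers=req["headers"], json=req["body"])
```

The request is only assembled here, not sent, so the sketch runs without a key; swapping in any other model id from the list below follows the same shape.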
64 models, sorted by most recent
nemotron-3-nano-omni-30b-a3b-reasoning (NVIDIA · Downloadable)
Nemotron 3 Nano Omni is an omni-modal reasoning model that understands images, video, speech, and text.
Image-to-Text +4 · Today
deepseek-v4-flash (DeepSeek AI · Downloadable)
DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.
coding +3 · 361K · 4d

deepseek-v4-pro (DeepSeek AI · Downloadable)
DeepSeek V4 scales to 1M-token context windows with efficient MoE architecture for coding tasks.
coding +3 · 781K · 4d
glm-5.1 (Z.ai · Downloadable)
GLM-5.1 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.
Agentic AI +3 · 2.53M · 1w

glm-4.7 (Z.ai · Free Endpoint)
GLM-4.7 is a multilingual agentic coding partner with stronger reasoning, tool use, and UI skills.
Tool Calling +3 · 4.57M · 1w
ising-calibration-1-35b-a3b (NVIDIA · Downloadable)
Open VLM for understanding quantum-computer calibration charts across a range of qubit modalities.
Quantum +3 · 93.01K · 2w

minimax-m2.7 (Minimaxai · Free Endpoint)
MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.
coding +2 · 4.15M · 2w
gemma-4-31b-it (Google · Downloadable)
Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.
coding +3 · 3.46M · 3w

mistral-small-4-119b-2603 (Mistral AI · Downloadable)
Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context.
code generation +2 · 8.24M · 1mo
nemotron-voicechat (NVIDIA · Free Endpoint)
Nemotron 3 Voicechat
English +2 · 3.33K · 1mo

nemotron-3-super-120b-a12b (NVIDIA · Downloadable)
Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more.
MoE +4 · 43.63M · 1mo
qwen3.5-122b-a10b (Qwen · Downloadable)
122B MoE LLM (10B active) for coding, reasoning, and multimodal chat. Agent-ready.
tool calling +3 · 7.65M · 1mo

minimax-m2.5 (Minimaxai · Downloadable · deprecation in 15d)
MiniMax M2.5 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.
reasoning +2 · 9.85M · 2mo
qwen3.5-397b-a17b (Qwen · Downloadable)
Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
MoE +3 · 9.6M · 2mo

step-3.5-flash (Stepfun-ai · Free Endpoint)
200B open-source reasoning engine with sparse MoE powering frontier agentic AI.
Agentic +2 · 9.07M · 2mo
kimi-k2.5 (Moonshotai · Downloadable · deprecation in 3d)
1T multimodal MoE for high-capacity video and image understanding with efficient inference.
Multimodal +3 · 35.21M · 3mo

deepseek-v3.2 (DeepSeek AI · Free Endpoint · deprecation in 7d)
State-of-the-art 685B reasoning LLM with sparse attention, long context, and integrated agentic tools.
long context +2 · 9.51M · 4mo
nemotron-3-nano-30b-a3b (NVIDIA · Downloadable)
Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more.
MoE +3 · 9.61M · 4mo

devstral-2-123b-instruct-2512 (Mistral AI · Free Endpoint · deprecation in 14d)
State-of-the-art open code model with deep reasoning, 256k context, and unmatched efficiency.
coding +3 · 2.81M · 4mo
kimi-k2-thinking (Moonshotai · Free Endpoint)
Open reasoning model with a 256K context window, native INT4 quantization, and enhanced tool use.
Conversational +3 · 3.37M · 4mo

mistral-large-3-675b-instruct-2512 (Mistral AI · Free Endpoint)
A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.
language generation +3 · 4.15M · 4mo
ministral-14b-instruct-2512 (Mistral AI · Downloadable)
A general-purpose VLM ideal for chat and instruction-based use cases.
language generation +3 · 1.6M · 4mo

nemotron-nano-12b-v2-vl (NVIDIA · Downloadable)
Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.
language generation +3 · 4.63M · 6mo

deepseek-v3.1-terminus (DeepSeek AI · Free Endpoint · deprecation in 7d)
DeepSeek-V3.1: hybrid inference LLM with Think/Non-Think modes, stronger agents, 128K context, and strict function calling.
tool calling +3 · 7.29M · 6mo