Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
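Models tagged "Free Endpoint" in this catalog are typically reachable through NIM's OpenAI-compatible chat completions API. A minimal sketch of building such a request, assuming the common hosted base URL `https://integrate.api.nvidia.com/v1`, an `NVIDIA_API_KEY` environment variable, and an illustrative model name (the exact name to use is shown on each model card):

```python
import json
import os
from urllib import request

# Hosted NIM endpoints expose an OpenAI-compatible API at this base URL
# (an assumption here; confirm on the model card you intend to call).
BASE_URL = "https://integrate.api.nvidia.com/v1"

def build_chat_request(model: str, prompt: str) -> request.Request:
    """Build (but do not send) a chat completion request for a NIM endpoint."""
    payload = {
        "model": model,  # hypothetical name; copy the exact ID from the card
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('NVIDIA_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("deepseek-ai/deepseek-v3.2", "Say hello in one sentence.")
# Actually sending it requires a valid key: urllib.request.urlopen(req)
```

Downloadable models instead ship as containers you run on your own GPUs, so the base URL would point at your local deployment rather than the hosted endpoint.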
45 models, sorted by most recent
NVIDIA · ising-calibration-1-35b-a3b (Free Endpoint)
Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.
Quantum +3 · Today
Minimaxai · minimax-m2.7 (Free Endpoint)
MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.
coding +2 · 567K · 3d
Google · gemma-4-31b-it (Downloadable)
Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.
coding +3 · 1.47M · 1w
Mistral AI · mistral-small-4-119b-2603 (Downloadable)
Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256K context.
code generation +2 · 7.15M · 4w
NVIDIA · nemotron-3-super-120b-a12b (Downloadable)
Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more.
MoE +4 · 48.93M · 1mo
Qwen · qwen3.5-122b-a10b (Downloadable)
122B MoE LLM (10B active) for coding, reasoning, and multimodal chat. Agent-ready.
tool calling +3 · 8.34M · 1mo
Minimaxai · minimax-m2.5 (Downloadable)
MiniMax M2.5 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.
reasoning +2 · 11.5M · 1mo
Z.ai · glm-5 (Downloadable · Deprecation in 5d)
GLM-5 744B MoE enables efficient reasoning for complex systems and long-horizon agentic tasks.
MoE +2 · 41.67M · 2mo
Stepfun-ai · step-3.5-flash (Free Endpoint)
200B open-source reasoning engine with sparse MoE powering frontier agentic AI.
Agentic +2 · 9.4M · 2mo
Moonshotai · kimi-k2.5 (Downloadable)
1T multimodal MoE for high-capacity video and image understanding with efficient inference.
Multimodal +3 · 51.97M · 2mo
Z.ai · glm-4.7 (Free Endpoint)
GLM-4.7 is a multilingual agentic coding partner with stronger reasoning, tool use, and UI skills.
Tool Calling +3 · 16.27M · 2mo
NVIDIA · nemotron-content-safety-reasoning-4b (Free Endpoint)
A context-aware safety model that applies reasoning to enforce domain-specific policies.
NeMo Guardrails +3 · 207K · 2mo
NVIDIA · cosmos-reason2-8b (Downloadable)
Vision language model that excels in understanding the physical world using structured reasoning on videos or images.
video understanding +8 · 50.01K · 3mo
DeepSeek AI · deepseek-v3.2 (Free Endpoint)
State-of-the-art 685B reasoning LLM with sparse attention, long context, and integrated agentic tools.
long context +2 · 14.89M · 4mo
NVIDIA · nemotron-3-nano-30b-a3b (Downloadable)
Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more.
MoE +3 · 12.6M · 4mo
Mistral AI · devstral-2-123b-instruct-2512 (Free Endpoint)
State-of-the-art open code model with deep reasoning, 256K context, and unmatched efficiency.
coding +3 · 3.62M · 4mo
Moonshotai · kimi-k2-thinking (Free Endpoint)
Open reasoning model with a 256K context window, native INT4 quantization, and enhanced tool use.
Conversational +3 · 4.09M · 4mo
DeepSeek AI · deepseek-v3.1-terminus (Free Endpoint)
DeepSeek-V3.1: hybrid inference LLM with Think/Non-Think modes, stronger agents, 128K context, and strict function calling.
tool calling +3 · 12.29M · 6mo
Moonshotai · kimi-k2-instruct-0905 (Free Endpoint)
Follow-on to Kimi-K2-Instruct with a longer context window and enhanced reasoning capabilities.
long-context +3 · 14.63M · 6mo
Qwen · qwen3-next-80b-a3b-thinking (Downloadable)
80B-parameter AI model with hybrid reasoning, MoE architecture, and support for 119 languages.
Reasoning +1 · 1.86M · 7mo
ByteDance · seed-oss-36b-instruct (Free Endpoint)
ByteDance open-source LLM with long context, reasoning, and agentic intelligence.
thinking budget +2 · 1.36M · 7mo
DeepSeek AI · deepseek-v3.1 (Free Endpoint · Deprecated)
DeepSeek V3.1 Instruct is a hybrid AI model with fast reasoning, 128K context, and strong tool use.
Reasoning +1 · 10.99M · 7mo
NVIDIA · nvidia-nemotron-nano-9b-v2 (Downloadable)
High-efficiency LLM with hybrid Transformer-Mamba design, excelling in reasoning and agentic tasks.
thinking budget +1 · 285K · 7mo
OpenAI · gpt-oss-20b (Downloadable)
Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math.
reasoning +3 · 8.45M · 8mo
Page 1 of 2 · 24 items per page