Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA · Launch from Hugging Face (Beta)

Filters (1)

Free Endpoint (7)
Partner Endpoint (18)
Download Available (21)

Use Case

Code Generation (3)
Synthetic Data Generation (2)
Image-to-Text (1)
Drug Discovery (0)
Retrieval Augmented Generation (0)

Inference Providers

Deep Infra (14)
GMI Cloud (10)
Bitdeer AI (8)
Together AI (7)
Lightning AI (6)

Publisher

NVIDIA (10)
Mistral AI (5)
Qwen (2)
OpenAI (2)
Minimaxai (2)

NIM Container GPUs

B200 (5)
H100 80GB HBM3 (4)
H200 (4)
DGX Spark (2)
L40S (1)

Labels (1)

reasoning

29 models

Downloadable · Free Endpoint

cosmos3-nano-reasoner

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

video understanding

Today

Downloadable

kimi-k2.6

1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.

5.28M

1mo

Downloadable

mistral-medium-3.5-128b

A high-performing model for text generation, coding, and agentic use cases.

2.88M

1mo

Downloadable

deepseek-v4-pro

DeepSeek V4 scales to 1M-token context windows with efficient MoE architecture for coding tasks.

7.96M

1mo

Free Endpoint

glm-5.1

GLM-5.1 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.

22.99M

1mo

Downloadable

ising-calibration-1-35b-a3b

Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.

301K

1mo

Free Endpoint

minimax-m2.7

MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.

12.68M

1mo

Downloadable

gemma-4-31b-it

Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.

5.71M

2mo

Downloadable

mistral-small-4-119b-2603

Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context

code generation

19.37M

2mo

Downloadable

nemotron-3-super-120b-a12b

Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more

53.53M

2mo

Downloadable

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

9.58M

2mo

Deprecated · Downloadable

minimax-m2.5

MiniMax M2.5 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.

2.89M

3mo

Free Endpoint

step-3.5-flash

200B open-source reasoning engine with sparse MoE powering frontier agentic AI.

10.97M

3mo

Free Endpoint

nemotron-content-safety-reasoning-4b

A context‑aware safety model that applies reasoning to enforce domain‑specific policies.

NeMo Guardrails

109K

4mo

Downloadable

cosmos-reason2-8b

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

349K

5mo

nemotron-3-nano-30b-a3b

Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more

10.75M

5mo

Deprecated · Downloadable

qwen3-next-80b-a3b-thinking

80B-parameter AI model with hybrid reasoning, an MoE architecture, and support for 119 languages.

1.94M

8mo

Free Endpoint

seed-oss-36b-instruct

ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.

thinking budget

1.14M

8mo

Downloadable

nvidia-nemotron-nano-9b-v2

High‑efficiency LLM with hybrid Transformer‑Mamba design, excelling in reasoning and agentic tasks.

thinking budget

1.05M

9mo

Downloadable

gpt-oss-20b

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math

17.41M

10mo

Downloadable

gpt-oss-120b

Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within a single 80GB GPU.

42.04M

10mo

Downloadable

llama-3.3-nemotron-super-49b-v1.5

High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

2.81M

10mo

Downloadable

sarvam-m

Multilingual hybrid-reasoning model optimized for Indian-language tasks, programming, and mathematical reasoning.

290K

10mo

Deprecated · Free Endpoint

magistral-small-2506

High-performance reasoning model optimized for efficiency and edge deployment.

422K

10mo