Skip to main content

⌘KCtrl+K

Your Privacy Choices

Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA Launch from Hugging FaceBeta

Filters (1)

Free Endpoint

51

Partner Endpoint

31

Download Available

37

Use Case

Image-to-Text

9

Code Generation

9

Synthetic Data Generation

1

Digital Twin

1

Drug Discovery

0

Inference Providers

Deep Infra

23

Bitdeer AI

14

GMI Cloud

14

Together AI

10

CoreWeave

8

Publisher

NVIDIA

14

Meta

8

Mistral AI

6

Qwen

4

Google

4

NIM Container GPUs

B200

18

H200

18

H100 80GB HBM3

16

L40S

13

H100 NVL

11

Labels (1)

Chat

52 models

Sort By

DownloadableFree Endpoint

nemotron-3-ultra-550b-a55b

Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more

Items per page

of 3 pages

2.33M

3d

DownloadableFree Endpoint

step-3.7-flash

A sparse MoE multimodal reasoning model good for enterprise, agentic and coding tasks.

3.07M

9d

DownloadableFree Endpoint

kimi-k2.6

1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.

6.69M

1mo

DownloadableFree Endpoint

mistral-medium-3.5-128b

A high performing model for text generation, coding and agentic use cases

3.54M

1mo

DownloadableFree Endpoint

nemotron-3-nano-omni-30b-a3b-reasoning

Nemotron 3 Nano Omni is an omni-modal reasoning model that understands images, video, speech, text.

8.57M

1mo

DownloadableFree Endpoint

deepseek-v4-flash

DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.

14.6M

1mo

Downloadable

deepseek-v4-pro

DeepSeek V4 scales to 1M-token context windows with efficient MoE architecture for coding tasks.

8.06M

1mo

DownloadableFree Endpoint

glm-5.1

GLM-5.1 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.

27.1M

1mo

DownloadableFree Endpoint

ising-calibration-1-35b-a3b

Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.

336K

1mo

DownloadableFree Endpoint

minimax-m2.7

MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.

14.24M

1mo

DownloadableFree Endpoint

gemma-4-31b-it

Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.

5.93M

2mo

DownloadableFree Endpoint

mistral-small-4-119b-2603

Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context

code generation

17.36M

2mo

Free Endpoint

nemotron-voicechat

Nemotron 3 Voicechat

2.06K

2mo

DownloadableFree Endpoint

nemotron-3-super-120b-a12b

Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more

61.44M

2mo

DownloadableFree Endpoint

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

9.95M

3mo

DownloadableFree Endpoint

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

12.3M

3mo

Free Endpoint

step-3.5-flash

200B open-source reasoning engine with sparse MoE powering frontier agentic AI.

11.92M

4mo

DownloadableFree Endpoint

nemotron-3-nano-30b-a3b

Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more

11.97M

5mo

Free Endpoint

mistral-large-3-675b-instruct-2512

A state-of-the-art general purpose MoE VLM ideal for chat, agentic and instruction based use cases.

language generation

3.3M

6mo

DownloadableFree Endpoint

ministral-14b-instruct-2512

A general purpose VLM ideal for chat and instruction based use cases

language generation

3.54M

6mo

DownloadableFree Endpoint

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

language generation

2.31M

7mo

DownloadableFree Endpoint

stockmark-2-100b-instruct

Japanese-specialized large-language-model for enterprises to read and understand complex business documents.

1.61M

8mo

DownloadableFree Endpoint

qwen3-next-80b-a3b-instruct

Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.

25.32M

8mo

Free Endpoint

seed-oss-36b-instruct

ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.

thinking budget

1.23M

9mo