Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Filters (1)

Free Endpoint: 51
Partner Endpoint: 32
Download Available: 37

Use Case

Image-to-Text: 9
Code Generation: 9
Synthetic Data Generation: 1
Digital Twin: 1
Drug Discovery: 0

Inference Providers

Deep Infra: 24
Bitdeer AI: 13
GMI Cloud: 13
Together AI: 12
CoreWeave: 8

Publisher

NVIDIA: 14
Meta: 8
Mistral AI: 6
Qwen: 4
Google: 4

NIM Container GPUs

B200: 19
H200: 19
H100 80GB HBM3: 17
L40S: 14
H100 NVL: 12

Labels (1)

chat

52 models

Downloadable, Free Endpoint

nemotron-3-ultra-550b-a55b

Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more

Today

Downloadable, Free Endpoint

step-3.7-flash

A sparse MoE multimodal reasoning model good for enterprise, agentic and coding tasks.

1.65M

6d

Downloadable, Free Endpoint

kimi-k2.6

1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.

5.92M

1mo

Downloadable, Free Endpoint

mistral-medium-3.5-128b

A high-performing model for text generation, coding, and agentic use cases.

3.14M

1mo

Downloadable, Free Endpoint

nemotron-3-nano-omni-30b-a3b-reasoning

Nemotron 3 Nano Omni is an omni-modal reasoning model that understands images, video, speech, and text.

8.91M

1mo

Downloadable, Free Endpoint

deepseek-v4-flash

DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.

13.22M

1mo

Downloadable

deepseek-v4-pro

DeepSeek V4 scales to 1M-token context windows with efficient MoE architecture for coding tasks.

8.11M

1mo

Downloadable, Free Endpoint

glm-5.1

GLM-5.1 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.

25.15M

1mo

Downloadable, Free Endpoint

ising-calibration-1-35b-a3b

Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.

328K

1mo

Downloadable, Free Endpoint

minimax-m2.7

MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.

13.49M

1mo

Downloadable, Free Endpoint

gemma-4-31b-it

Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.

5.64M

2mo

Downloadable, Free Endpoint

mistral-small-4-119b-2603

Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context

code generation

18.38M

2mo

Free Endpoint

nemotron-voicechat

Nemotron 3 Voicechat

2.13K

2mo

Downloadable, Free Endpoint

nemotron-3-super-120b-a12b

Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more

57.73M

2mo

Downloadable, Free Endpoint

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

9.5M

3mo

Downloadable, Free Endpoint

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

11.16M

3mo

Free Endpoint

step-3.5-flash

200B open-source reasoning engine with sparse MoE powering frontier agentic AI.

11.53M

4mo

Downloadable, Free Endpoint

nemotron-3-nano-30b-a3b

Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more

11.29M

5mo

Free Endpoint

mistral-large-3-675b-instruct-2512

A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.

language generation

3.33M

6mo

Downloadable, Free Endpoint

ministral-14b-instruct-2512

A general-purpose VLM ideal for chat and instruction-based use cases.

language generation

3.42M

6mo

Downloadable, Free Endpoint

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

2.06M

7mo

Downloadable, Free Endpoint

stockmark-2-100b-instruct

Japanese-specialized large language model for enterprises to read and understand complex business documents.

1.52M

8mo

Downloadable, Free Endpoint

qwen3-next-80b-a3b-instruct

Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.

22.67M

8mo

Free Endpoint

seed-oss-36b-instruct

ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.

thinking budget

1.17M

9mo