⌘KCtrl+K

Your Privacy Choices

Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA Launch from Hugging FaceBeta

Filters

Free Endpoint

11

Partner Endpoint

17

Download Available

11

Use Case

Image-to-Text

1

Text-to-Speech

1

Code Generation

0

Retrieval Augmented Generation

0

Drug Discovery

0

Inference Providers

Fireworks AI

13

Deep Infra

12

Bitdeer AI

9

GMI Cloud

8

Together AI

7

Publisher

NVIDIA

5

Mistral AI

4

Qwen

4

DeepSeek AI

2

Moonshotai

2

API Catalog Type

Enterprise

0

Blueprint Type

NVIDIA BioNemo

0

22 models

Sort By

Downloadable

gemma-4-31b-it

Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.

15.72K

1d

Downloadable

nemotron-3-super-120b-a12b

Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more

33.52M

3w

Downloadable

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

7.81M

4w

Downloadable

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

13.54M

1mo

Downloadable

glm-5

GLM-5 744B MoE enables efficient reasoning for complex systems and long-horizon agentic tasks.

36.94M

1mo

Free Endpoint

step-3.5-flash

200B open-source reasoning engine with sparse MoE powering frontier agentic AI.

8.99M

2mo

Free Endpoint

glm-4.7

GLM-4.7 is a multilingual agentic coding partner with stronger reasoning, tool use, and UI skills.

14.46M

2mo

Free Endpoint

deepseek-v3.2

State-of-the-art 685B reasoning LLM with sparse attention, long context, and integrated agentic tools.

15.8M

3mo

Free Endpoint

devstral-2-123b-instruct-2512

State-of-the-art open code model with deep reasoning, 256k context, and unmatched efficiency.

4.39M

3mo

Free Endpoint

mistral-large-3-675b-instruct-2512

A state-of-the-art general purpose MoE VLM ideal for chat, agentic and instruction based use cases.

6.07M

4mo

Free Endpoint

deepseek-v3.1-terminus

DeepSeek-V3.1: hybrid inference LLM with Think/Non-Think modes, stronger agents, 128K context, strict function calling.

13.17M

5mo

Downloadable

qwen3-next-80b-a3b-instruct

Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.

21.08M

6mo

Free Endpoint

kimi-k2-instruct-0905

Follow-on version of Kimi-K2-Instruct with longer context window and enhanced reasoning capabilities.

14.28M

6mo

Free Endpoint

seed-oss-36b-instruct

ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.

1.99M

7mo

Free Endpoint

qwen3-coder-480b-a35b-instruct

Excels in agentic coding and browser use and supports 256K context, delivering top results.

3.59M

7mo

Downloadable

nvidia-nemotron-nano-9b-v2

High‑efficiency LLM with hybrid Transformer‑Mamba design, excelling in reasoning and agentic tasks.

386K

7mo

Free Endpoint

kimi-k2-instruct

State-of-the-art open mixture-of-experts model with strong reasoning, coding, and agentic capabilities

20.82M

8mo

Free Endpoint

mistral-nemotron

Built for agentic workflows, this model excels in coding, instruction following, and function calling

799K

9mo

Downloadable

llama-3.1-nemotron-nano-4b-v1.1

State-of-the-art open model for reasoning, code, math, and tool calling - suitable for edge agents

110K

9mo

Downloadable

llama-3.1-nemotron-nano-8b-v1

Leading reasoning and agentic AI accuracy model for PC and edge.

371K

9mo

Downloadable

magpie-tts-multilingual

Natural and expressive voices in multiple languages. For voice agents and brand ambassadors.

42.44K

9mo

Downloadable

mistral-small-24b-instruct

Latency-optimized language model excelling in code, math, general knowledge, and instruction-following.

275K

9mo

Items per page

of 1 pages