Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA

Launch from Hugging Face (Beta)

Labels (1)

chat

22 models

Free Endpoint

minimax-m2.7

MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.

8.36M

3w

Downloadable

gemma-4-31b-it

Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.

5.36M

1mo

Downloadable

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

9.24M

2mo

Deprecation in 5d

Downloadable

minimax-m2.5

MiniMax M2.5 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.

7.55M

2mo

Deprecation in 4d

Free Endpoint

devstral-2-123b-instruct-2512

State-of-the-art open code model with deep reasoning, 256k context, and unmatched efficiency.

2.76M

4mo

Downloadable

qwen3-next-80b-a3b-thinking

80B parameter AI model with hybrid reasoning, MoE architecture, support for 119 languages.

1.93M

7mo

Downloadable

gpt-oss-20b

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math

11.98M

9mo

Downloadable

gpt-oss-120b

Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within 80GB GPU.

28.78M

9mo

Downloadable

phi-4-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments

576K

11mo

Deprecation in 5d

Downloadable

qwen2.5-coder-32b-instruct

Advanced LLM for code generation, reasoning, and fixing across popular programming languages.

code completion

2.79M

10mo

Downloadable

llama-3.3-70b-instruct

Advanced LLM for reasoning, math, general knowledge, and function calling

Instruction following

9.92M

10mo

Downloadable

llama-3.2-3b-instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

17.29K

1.02M

11mo

Downloadable

llama-3.2-1b-instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

26.83K

466K

11mo

Free Endpoint

dracarys-llama-3.1-70b-instruct

Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.

Code Generation

504K

11mo

Free Endpoint

nemotron-mini-4b-instruct

Optimized SLM for on-device inference and fine-tuned for roleplay, RAG and function calling

648K

1y

Free Endpoint

gemma-2-2b-it

Advanced small language generative AI model for edge applications

524K

11mo

Downloadable

llama-3.1-70b-instruct

Powers complex conversations with superior contextual understanding, reasoning and text generation.

2.58M

10mo

Downloadable

llama-3.1-8b-instruct

Advanced state-of-the-art model with language understanding, superior reasoning, and text generation.

18.41M

10mo

Downloadable

mistral-7b-instruct-v0.3

This LLM follows instructions, completes requests, and generates creative text.

483K

11mo

Free Endpoint

solar-10.7b-instruct

Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics.

Non-Commercial Use Only

303K

1y

Downloadable

mixtral-8x22b-instruct-v0.1

An MoE LLM that follows instructions, completes requests, and generates creative text.

Advanced Reasoning

2.2M

9mo

Downloadable

mixtral-8x7b-instruct-v0.1

An MoE LLM that follows instructions, completes requests, and generates creative text.

Advanced Reasoning

576K

9mo