⌘KCtrl+K

Your Privacy Choices

Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA Launch from Hugging FaceBeta

Filters (1)

Free Endpoint

21

Partner Endpoint

23

Download Available

20

Use Case

Code Generation

18

Retrieval Augmented Generation

0

Drug Discovery

0

Image-to-Text

0

Object Detection

0

Inference Providers

Deep Infra

17

Together AI

16

GMI Cloud

11

CoreWeave

7

Bitdeer AI

4

Publisher

Microsoft

8

Meta

6

Qwen

6

Mistral AI

5

NVIDIA

3

Labels (1)

chat

41 models

Sort By

Free Endpoint

minimax-m2.7

MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.

Items per page

of 2 pages

3.05M

1w

Downloadable

gemma-4-31b-it

Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.

2.96M

2w

Downloadable

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

9.06M

1mo

Downloadable

minimax-m2.5

MiniMax M2.5 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.

11.14M

1mo

Free Endpoint

deepseek-v3.2

State-of-the-art 685B reasoning LLM with sparse attention, long context, and integrated agentic tools.

12.37M

4mo

Free Endpoint

devstral-2-123b-instruct-2512

State-of-the-art open code model with deep reasoning, 256k context, and unmatched efficiency.

3.36M

4mo

Downloadable

qwen3-next-80b-a3b-thinking

80B parameter AI model with hybrid reasoning, MoE architecture, support for 119 languages.

1.95M

7mo

Downloadable

gpt-oss-20b

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math

9.05M

8mo

Downloadable

gpt-oss-120b

Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within 80GB GPU.

36.85M

8mo

DeprecatedDownloadable

teuken-7b-instruct-commercial-v0.4

Multilingual 7B LLM, instruction-tuned on all 24 EU languages for stable, culturally aligned output.

50.98K

9mo

Downloadable

phi-4-mini-instruct

Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments

374K

11mo

DeprecatedDownloadable

qwen2.5-7b-instruct

Chinese and English LLM targeting for language, coding, mathematics, reasoning, etc.

Chinese Language Generation

5.17M

11mo

Downloadable

qwen2.5-coder-32b-instruct

Advanced LLM for code generation, reasoning, and fixing across popular programming languages.

code completion

2.91M

9mo

DeprecatedFree Endpoint

qwen2.5-coder-7b-instruct

Powerful mid-size code model with a 32K context length, excelling in coding in multiple languages.

code completion

185K

11mo

Downloadable

llama-3.3-70b-instruct

Advanced LLM for reasoning, math, general knowledge, and function calling

Instruction following

11.07M

10mo

DeprecatedFree Endpoint

nemotron-4-mini-hindi-4b-instruct

A bilingual Hindi-English SLM for on-device inference, tailored specifically for Hindi Language.

334K

11mo

Downloadable

llama-3.2-3b-instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

12.93K923K

11mo

Downloadable

llama-3.2-1b-instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

17.08K383K

11mo

DeprecatedFree Endpoint

qwen2-7b-instruct

Chinese and English LLM targeting for language, coding, mathematics, reasoning, etc.

Chinese Language Generation

115K

11mo

Free Endpoint

dracarys-llama-3.1-70b-instruct

Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.

Code Generation

378K

11mo

Free Endpoint

nemotron-mini-4b-instruct

Optimized SLM for on-device inference and fine-tuned for roleplay, RAG and function calling

214K

1y

DeprecatedFree Endpoint

mistral-nemo-minitron-8b-base

State-of-the-art small language model delivering superior accuracy for chatbot, virtual assistants, and content generation.

language generation

2.71K

1y

DeprecatedFree Endpoint

phi-3.5-mini-instruct

Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments

830K

1y

DeprecatedFree Endpoint

rakutenai-7b-instruct

Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.

37.83K

11mo