Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA | Launch from Hugging Face (Beta)

Filters (1)

Free Endpoint (21), Partner Endpoint (24), Download Available (20)

Use Case: Code Generation (18), Retrieval Augmented Generation (0), Drug Discovery (0), Image-to-Text (0), Object Detection (0)

Inference Providers: Deep Infra (17), Together AI (16), GMI Cloud (11), CoreWeave (7), Bitdeer AI (5)

Publisher: Microsoft (8), Meta (6), Qwen (6), Mistral AI (5), NVIDIA (3)

API Catalog Type: Enterprise (0)

Blueprint Type: NVIDIA BioNemo (0)

Labels (1): chat

41 models

Free Endpoint

minimax-m2.7

MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.

2.05M

1w

Downloadable

gemma-4-31b-it

Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.

2.36M

2w

Downloadable

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

8.92M

1mo

Downloadable

minimax-m2.5

MiniMax M2.5 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.

11.39M

1mo

Free Endpoint

deepseek-v3.2

State-of-the-art 685B reasoning LLM with sparse attention, long context, and integrated agentic tools.

13.7M

4mo

Free Endpoint

devstral-2-123b-instruct-2512

State-of-the-art open code model with deep reasoning, 256k context, and unmatched efficiency.

3.5M

4mo

Downloadable

qwen3-next-80b-a3b-thinking

80B-parameter AI model with hybrid reasoning, MoE architecture, and support for 119 languages.

1.94M

7mo

Downloadable

gpt-oss-20b

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math

9.07M

8mo

Downloadable

gpt-oss-120b

Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within an 80GB GPU.

39.37M

8mo

Deprecated, Downloadable

teuken-7b-instruct-commercial-v0.4

Multilingual 7B LLM, instruction-tuned on all 24 EU languages for stable, culturally aligned output.

58.76K

8mo

Downloadable

phi-4-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.

351K

11mo

Deprecated, Downloadable

qwen2.5-7b-instruct

Chinese and English LLM targeting language, coding, mathematics, reasoning, and more.

Chinese Language Generation

6.98M

11mo

Downloadable

qwen2.5-coder-32b-instruct

Advanced LLM for code generation, reasoning, and fixing across popular programming languages.

code completion

2.95M

9mo

Deprecated, Free Endpoint

qwen2.5-coder-7b-instruct

Powerful mid-size code model with a 32K context length, excelling in coding in multiple languages.

code completion

229K

11mo

Downloadable

llama-3.3-70b-instruct

Advanced LLM for reasoning, math, general knowledge, and function calling

Instruction following

12.24M

10mo

Deprecated, Free Endpoint

nemotron-4-mini-hindi-4b-instruct

A bilingual Hindi-English SLM for on-device inference, tailored specifically for the Hindi language.

361K

11mo

Downloadable

llama-3.2-3b-instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

16.14K | 993K

11mo

Downloadable

llama-3.2-1b-instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

15.82K | 361K

11mo

Deprecated, Free Endpoint

qwen2-7b-instruct

Chinese and English LLM targeting language, coding, mathematics, reasoning, and more.

Chinese Language Generation

129K

11mo

Free Endpoint

dracarys-llama-3.1-70b-instruct

Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.

Code Generation

361K

11mo

Free Endpoint

nemotron-mini-4b-instruct

SLM optimized for on-device inference and fine-tuned for roleplay, RAG, and function calling.

197K

1y

Deprecated, Free Endpoint

mistral-nemo-minitron-8b-base

State-of-the-art small language model delivering superior accuracy for chatbots, virtual assistants, and content generation.

language generation

3.16K

1y

Deprecated, Free Endpoint

phi-3.5-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.

1.19M

1y

Deprecated, Free Endpoint

rakutenai-7b-instruct

Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.

44.26K

11mo
