⌘KCtrl+K

Your Privacy Choices

Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA Launch from Hugging FaceBeta

Filters (1)

Free Endpoint

6

Partner Endpoint

21

Download Available

25

Use Case

Code Generation

9

Image-to-Text

2

Drug Discovery

0

Retrieval Augmented Generation

0

Speech-to-Text

0

Inference Providers

Deep Infra

15

Together AI

14

GMI Cloud

11

CoreWeave

7

Bitdeer AI

6

Publisher

Meta

7

Mistral AI

5

NVIDIA

4

Qwen

4

Google

2

NIM Container GPUs

A100 SXM4 80GB

0

B200

0

GB200

0

GH200 144G HBM3e

0

H100 80GB HBM3

0

Labels (1)

text

31 models

Sort By

Downloadable

mistral-medium-3.5-128b

A high performing model for text generation, coding and agentic use cases

2.06M

3w

Items per page

of 2 pages

Free Endpoint

minimax-m2.7

MiniMax M2.7 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.

12.55M

1mo

Downloadable

gemma-4-31b-it

Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.

6.25M

1mo

Downloadable

mistral-small-4-119b-2603

Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context

code generation

20.83M

2mo

Downloadable

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

11.31M

2mo

DeprecatedDownloadable

minimax-m2.5

MiniMax M2.5 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.

5.29M

2mo

Downloadable

nemotron-parse

Cutting-edge vision-language model exceling in retrieving text and metadata from images.

text and table extraction

274K

6mo

Downloadable

qwen3-next-80b-a3b-instruct

Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.

text-generation

21.94M

8mo

Deprecation in 2dDownloadable

qwen3-next-80b-a3b-thinking

80B parameter AI model with hybrid reasoning, MoE architecture, support for 119 languages.

2.29M

8mo

Free Endpoint

seed-oss-36b-instruct

ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.

thinking budget

1.24M

8mo

Downloadable

TRELLIS

MSFT TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.

4.53K

8mo

Downloadable

gpt-oss-20b

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math

18.43M

9mo

Downloadable

gpt-oss-120b

Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within 80GB GPU.

35.6M

9mo

Downloadable

parakeet-tdt-0.6b-v2

Accurate and optimized English transcriptions with punctuation and word timestamps

24.12K

9mo

Downloadable

phi-4-mini-instruct

Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments

555K

12mo

DeprecatedDownloadable

qwen2.5-coder-32b-instruct

Advanced LLM for code generation, reasoning, and fixing across popular programming languages.

code completion

1.72M

10mo

Downloadable

llama-3.3-70b-instruct

Advanced LLM for reasoning, math, general knowledge, and function calling

Instruction following

13.43M

11mo

Downloadable

llama-3.2-3b-instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

22.4K1.19M

12mo

Downloadable

llama-3.2-11b-vision-instruct

Cutting-edge vision-language model exceling in high-quality reasoning from images.

Image-Text Retrieval

1.41M

11mo

Downloadable

llama-3.2-90b-vision-instruct

Cutting-edge vision-Language model exceling in high-quality reasoning from images.

Image-Text Retrieval

1.81M

11mo

Downloadable

llama-3.2-1b-instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

37.9K372K

12mo

Free Endpoint

dracarys-llama-3.1-70b-instruct

Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.

Code Generation

600K

12mo

Free Endpoint

nemotron-mini-4b-instruct

Optimized SLM for on-device inference and fine-tuned for roleplay, RAG and function calling

1.51M

1y

Free Endpoint

gemma-2-2b-it

Advanced small language generative AI model for edge applications

513K

12mo