Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

OpenAI

gpt-oss-20b

Smaller Mixture-of-Experts (MoE), text-only LLM for efficient AI reasoning and math

reasoning

9.05M

8mo

OpenAI

Downloadable

gpt-oss-120b

Mixture-of-Experts (MoE) reasoning LLM (text-only) designed to fit on a single 80 GB GPU.

reasoning

36.85M

8mo

NVIDIA

Downloadable

llama-3.3-nemotron-super-49b-v1.5

High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

math

2.75M

9mo

Sarvamai

Downloadable

sarvam-m

Multilingual hybrid-reasoning model optimized for Indian-language tasks, programming, and mathematical reasoning.

coding

155K

9mo

Microsoft

Deprecated

Free Endpoint

phi-4-mini-flash-reasoning

Lightweight reasoning model for applications in latency-bound, memory- and compute-constrained environments

edge

139K

9mo

Mistral AI

Free Endpoint

magistral-small-2506

High performance reasoning model optimized for efficiency and edge deployment

coding

1.27M

9mo

Marin

Deprecated

Free Endpoint

marin-8b-instruct

State-of-the-art open model trained on open datasets, excelling in reasoning, math, and science.

Reasoning

135K

11mo

NVIDIA

Deprecation in 1d

Downloadable

llama-3.1-nemotron-ultra-253b-v1

Superior inference efficiency with highest accuracy for scientific and complex math reasoning, coding, tool calling, and instruction following.

math

6.64M

9mo

Qwen

Deprecated

Free Endpoint

qwq-32b

Powerful reasoning model that can think through problems, achieving significantly enhanced performance on downstream tasks, especially hard problems.

coding

791K

10mo

NVIDIA

Downloadable

llama-3.3-nemotron-super-49b-v1

High efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

math

1.57M

9mo

NVIDIA

Downloadable

llama-3.1-nemotron-nano-8b-v1

Model with leading reasoning and agentic AI accuracy for PC and edge.

math

863K

9mo

Mistral AI

Deprecated

Downloadable

mistral-small-24b-instruct

Latency-optimized language model excelling in code, math, general knowledge, and instruction-following.

code

171K

9mo

llama-3.3-70b-instruct

Advanced LLM for reasoning, math, general knowledge, and function calling

Instruction following

11.07M

10mo