Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

8 models

NVIDIA

Downloadable

nemotron-3-super-120b-a12b

Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more

chat

1.68M

DeepSeek AI

Free Endpoint

deepseek-v3.1-terminus

DeepSeek-V3.1: hybrid inference LLM with Think/Non-Think modes, stronger agents, 128K context, strict function calling.

chat

12.33M

5mo

OpenAI

Downloadable

gpt-oss-20b

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math

reasoning

8.09M

7mo

OpenAI

Downloadable

gpt-oss-120b

Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within an 80GB GPU.

reasoning

37.64M

7mo

Moonshot AI

Free Endpoint

kimi-k2-instruct

State-of-the-art open mixture-of-experts model with strong reasoning, coding, and agentic capabilities

coding

17.01M

8mo

Qwen

Free Endpoint

qwq-32b

Powerful reasoning model that achieves significantly enhanced performance on downstream tasks, especially hard problems.

coding

4.3M

8mo

Mistral AI

Downloadable

mixtral-8x22b-instruct-v0.1

An MoE LLM that follows instructions, completes requests, and generates creative text.

chat

4.93M

8mo

Mistral AI

Downloadable

mixtral-8x7b-instruct-v0.1

An MoE LLM that follows instructions, completes requests, and generates creative text.

chat

717K

8mo
