Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

6 models (filtered by label: MoE)

NVIDIA

Downloadable, Free Endpoint

nemotron-3-ultra-550b-a55b

Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more

Agent

MoE, Frontier, Reasoning, Long Context

52M API calls in the last 30 days

Last updated on June 4, 2026

DeepSeek AI

Downloadable, Free Endpoint

deepseek-v4-flash

DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.

MoE

coding, fast, agentic

15M API calls in the last 30 days

Last updated on April 24, 2026

DeepSeek AI

Downloadable, Free Endpoint

deepseek-v4-pro

DeepSeek V4 scales to 1M-token context windows with an efficient MoE architecture for coding tasks.

MoE

reasoning, coding, agentic

8M API calls in the last 30 days

Last updated on April 24, 2026

NVIDIA

Downloadable, Free Endpoint

nemotron-3-super-120b-a12b

Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more

MoE

Reasoning, Chat, Long Context, Instruction Following

60M API calls in the last 30 days

Last updated on March 11, 2026

Qwen

Downloadable, Free Endpoint

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

MoE

image-to-image, VLM, agentic

16M API calls in the last 30 days

Last updated on February 16, 2026

NVIDIA

Downloadable, Free Endpoint

nemotron-3-nano-30b-a3b

Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more

MoE

Reasoning, Long Context, Instruction Following

12M API calls in the last 30 days

Last updated on December 15, 2025
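
Several model names in this listing end in a suffix such as 550b-a55b or 30b-a3b. A common convention for MoE models is that the first number is the total parameter count in billions and the number after "a" is the count active per token. The listing does not say this itself, so treat that reading as an assumption. The sketch below parses the suffix and reports the active share. The helper names parse_moe_sizes and active_fraction are hypothetical, introduced only for illustration.

```python
import re
from typing import Optional, Tuple

# Assumed naming pattern: "<total>b-a<active>b" at the end of the model name,
# e.g. "550b-a55b". This convention is not stated in the listing itself.
_SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)b-a(\d+(?:\.\d+)?)b$")

# Model names copied from the listing above.
MODEL_IDS = [
    "nemotron-3-ultra-550b-a55b",
    "deepseek-v4-flash",
    "deepseek-v4-pro",
    "nemotron-3-super-120b-a12b",
    "qwen3.5-397b-a17b",
    "nemotron-3-nano-30b-a3b",
]


def parse_moe_sizes(model_id: str) -> Optional[Tuple[float, float]]:
    """Return (total_b, active_b) in billions, or None if the name has no size suffix."""
    match = _SIZE_RE.search(model_id)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def active_fraction(model_id: str) -> Optional[float]:
    """Return active / total parameters, or None if the sizes cannot be parsed."""
    sizes = parse_moe_sizes(model_id)
    if sizes is None:
        return None
    total_b, active_b = sizes
    return active_b / total_b


if __name__ == "__main__":
    for model_id in MODEL_IDS:
        sizes = parse_moe_sizes(model_id)
        if sizes is None:
            print(f"{model_id}: no size suffix in the name")
        else:
            total_b, active_b = sizes
            print(f"{model_id}: {total_b:g}B total, {active_b:g}B active "
                  f"({active_b / total_b:.1%} active)")
```

Names without the suffix, such as deepseek-v4-flash, return None rather than a guess, so sizes for those models have to come from the model description instead.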