Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

13 models (filtered by label: agentic)

DeepSeek AI

Downloadable

deepseek-v4-flash

DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.

coding

361K

DeepSeek AI

Downloadable

deepseek-v4-pro

DeepSeek V4 scales to 1M-token context windows with efficient MoE architecture for coding tasks.

coding

781K

Z.ai

Downloadable

glm-5.1

GLM-5.1 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.

Agentic AI

2.53M

Google

Downloadable

gemma-4-31b-it

Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.

coding

3.46M

Qwen

Downloadable

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

MoE

9.6M

2mo

Stepfun-ai

Free Endpoint

step-3.5-flash

200B open-source reasoning engine with sparse MoE powering frontier agentic AI.

Agentic

9.07M

2mo

Mistral AI

Deprecation in 14d

Free Endpoint

devstral-2-123b-instruct-2512

State-of-the-art open code model with deep reasoning, 256k context, and unmatched efficiency.

coding

2.81M

4mo

Mistral AI

Free Endpoint

mistral-large-3-675b-instruct-2512

A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.

language generation

4.15M

4mo

DeepSeek AI

Deprecation in 7d

Free Endpoint

deepseek-v3.1-terminus

DeepSeek-V3.1: hybrid inference LLM with Think/Non-Think modes, stronger agents, 128K context, strict function calling.

tool calling

7.29M

6mo

Qwen

Downloadable

qwen3-next-80b-a3b-instruct

Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.

text-generation

18.75M

7mo

Moonshotai

Free Endpoint

kimi-k2-instruct-0905

Follow-on version of Kimi-K2-Instruct with longer context window and enhanced reasoning capabilities.

long-context

9.75M

7mo

Qwen

Free Endpoint

qwen3-coder-480b-a35b-instruct

Excels in agentic coding and browser use and supports 256K context, delivering top results.

agentic coding

3.27M

7mo

Moonshotai

Free Endpoint

kimi-k2-instruct

State-of-the-art open mixture-of-experts model with strong reasoning, coding, and agentic capabilities.

coding

15.32M

9mo