Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices.

Qwen

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

MoE

4.66M

Z.ai

glm5

GLM-5 744B MoE enables efficient reasoning for complex systems and long-horizon agentic tasks.

MoE

5.4M

Minimaxai

minimax-m2.1

MiniMax M2.1 excels in multi-language coding, app/web dev, office AI, and agent integration.

Agentic

7.8M

1mo

Stepfun-ai

step-3.5-flash

200B open-source reasoning engine with sparse MoE powering frontier agentic AI.

Agentic

6.25M

1mo

Mistral AI

devstral-2-123b-instruct-2512

State-of-the-art open code model with deep reasoning, 256K context, and unmatched efficiency.

coding

4.46M

2mo

Mistral AI

mistral-large-3-675b-instruct-2512

A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.

language generation

4.43M

3mo

DeepSeek AI

deepseek-v3.1-terminus

DeepSeek-V3.1: hybrid inference LLM with Think/Non-Think modes, stronger agents, 128K context, strict function calling.

tool calling

11.35M

4mo

Qwen

qwen3-next-80b-a3b-instruct

Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.

chat

8.81M

5mo

Moonshotai

kimi-k2-instruct-0905

Follow-on version of Kimi-K2-Instruct with longer context window and enhanced reasoning capabilities.

long-context

10.27M

5mo

Qwen

qwen3-coder-480b-a35b-instruct

Excels in agentic coding and browser use and supports 256K context, delivering top results.

agentic coding

2.89M

6mo

Moonshotai

kimi-k2-instruct

State-of-the-art open mixture-of-experts model with strong reasoning, coding, and agentic capabilities.

coding

18.28M

7mo
