⌘KCtrl+K

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Sort By

Qwen

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

tool calling

32.38K

Qwen

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

MoE

5.42M

Z.ai

glm5

GLM-5 744B MoE enables efficient reasoning for complex systems and long-horizon agentic tasks.

MoE

6.37M

Z.ai

glm4.7

GLM-4.7 is a multilingual agentic coding partner with stronger reasoning, tool use, and UI skills.

Tool Calling

17.69M

1mo

DeepSeek AI

deepseek-v3.2

State-of-the-art 685B reasoning LLM with sparse attention, long context, and integrated agentic tools.

long context

14.82M

2mo

Mistral AI

mistral-large-3-675b-instruct-2512

A state-of-the-art general purpose MoE VLM ideal for chat, agentic and instruction based use cases.

language generation

5.24M

3mo

Mistral AI

ministral-14b-instruct-2512

A general purpose VLM ideal for chat and instruction based use cases

language generation

3.82M

3mo

NVIDIA

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

language generation

1.59M

4mo

DeepSeek AI

deepseek-v3.1-terminus

DeepSeek-V3.1: hybrid inference LLM with Think/Non-Think modes, stronger agents, 128K context, strict function calling.

tool calling

12.1M

5mo

ByteDance

seed-oss-36b-instruct

ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.

thinking budget

2.8M

6mo

NVIDIA

nvidia-nemotron-nano-9b-v2

High‑efficiency LLM with hybrid Transformer‑Mamba design, excelling in reasoning and agentic tasks.

thinking budget

650K

6mo

OpenAI

gpt-oss-20b

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math

text-to-text

7.06M

7mo

OpenAI

gpt-oss-120b

Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within 80GB GPU.

text-to-text

34.11M

7mo

Opengpt-x

teuken-7b-instruct-commercial-v0.4

Multilingual 7B LLM, instruction-tuned on all 24 EU languages for stable, culturally aligned output.

sovereign ai

426K

7mo

NVIDIA

llama-3.1-nemotron-nano-vl-8b-v1

Multi-modal vision-language model that understands text/img and creates informative responses

doc intelligence

6.69M

8mo

Gotocompany

gemma-2-9b-cpt-sahabatai-instruct

SOTA LLM pre-trained for instruction following and proficiency in Indonesian language and its dialects.

Sovereign AI

426K

8mo

Google

gemma-3-1b-it

A lightweight, multilingual, advanced SLM text model for edge computing, resource constraint applications

Translation

4.72K463K

9mo

Microsoft

phi-4-mini-instruct

Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments

chat

2.12M

9mo

Igenius

colosseum_355b_instruct_16k

NVIDIA DGX Cloud trained multilingual LLM designed for mission critical use cases in regulated industries including financial services, government, heavy industry

Heavy industry

85.81K

9mo

Tiiuae

falcon3-7b-instruct

Instruction tuned LLM achieving SoTA performance on reasoning, math and general knowledge capabilities

Coding

489K

9mo

Igenius

italia_10b_instruct_16k

Multilingual LLM with emphasis on European languages supporting regulated use cases including financial services, government, heavy industry

Heavy industry

423K

9mo

Qwen

qwen2.5-7b-instruct

Chinese and English LLM targeting for language, coding, mathematics, reasoning, etc.

Chinese Language Generation

861K

9mo

Qwen

qwen2.5-coder-32b-instruct

Advanced LLM for code generation, reasoning, and fixing across popular programming languages.

code completion

4.75M

8mo

NVIDIA

usdcode

State-of-the-art LLM that answers OpenUSD knowledge queries and generates USD-Python code.

OpenUSD

326K

8mo

Items per page

of 2 pages