⌘KCtrl+K

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Sort By

Qwen

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

tool calling

Today

Minimaxai

minimax-m2.5

MiniMax M2.5 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.

coding

1.8M

DeepSeek AI

deepseek-v3.2

State-of-the-art 685B reasoning LLM with sparse attention, long context, and integrated agentic tools.

long context

14.58M

2mo

Qwen

qwen3-next-80b-a3b-thinking

80B parameter AI model with hybrid reasoning, MoE architecture, support for 119 languages.

Reasoning

3.05M

5mo

DeepSeek AI

deepseek-v3.1

DeepSeek V3.1 Instruct is a hybrid AI model with fast reasoning, 128K context, and strong tool use.

Reasoning

14.12M

6mo

OpenAI

gpt-oss-20b

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math

text-to-text

6.72M

7mo

OpenAI

gpt-oss-120b

Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within 80GB GPU.

text-to-text

34.08M

7mo

Opengpt-x

teuken-7b-instruct-commercial-v0.4

Multilingual 7B LLM, instruction-tuned on all 24 EU languages for stable, culturally aligned output.

sovereign ai

401K

7mo

Utter-project

eurollm-9b-instruct

State-of-the-art, multilingual model tailored to all 24 official European Union languages.

Sovereign AI

5.52K402K

8mo

Gotocompany

gemma-2-9b-cpt-sahabatai-instruct

SOTA LLM pre-trained for instruction following and proficiency in Indonesian language and its dialects.

Sovereign AI

402K

8mo

Google

gemma-3-1b-it

A lightweight, multilingual, advanced SLM text model for edge computing, resource constraint applications

Translation

4.93K439K

9mo

Microsoft

phi-4-mini-instruct

Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments

chat

1.91M

9mo

Qwen

qwen2.5-7b-instruct

Chinese and English LLM targeting for language, coding, mathematics, reasoning, etc.

Chinese Language Generation

1.41M

9mo

llama-3.3-70b-instruct

Advanced LLM for reasoning, math, general knowledge, and function calling

Reasoning

23.33M

8mo

NVIDIA

nemotron-4-mini-hindi-4b-instruct

A bilingual Hindi-English SLM for on-device inference, tailored specifically for Hindi Language.

Indic

407K

9mo

IBM

granite-guardian-3.0-8b

Detects jailbreaking, bias, violence, profanity, sexual content, and unethical behavior

Guardrail

368K

NVIDIA

llama-3.1-nemotron-70b-reward

Leaderboard topping reward model supporting RLHF for better alignment with human preferences.

Text-to-text

362K

Optimized SLM for on-device inference and fine-tuned for roleplay, RAG and function calling

chat

421K

NVIDIA

mistral-nemo-minitron-8b-base

State-of-the-art small language model delivering superior accuracy for chatbot, virtual assistants, and content generation.

language generation

4.88K

Items per page

of 3 pages