⌘KCtrl+K

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Sort By

OpenAI

gpt-oss-20b

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math

text-to-text

7mo

OpenAI

gpt-oss-120b

Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within 80GB GPU.

text-to-text

7mo

Opengpt-x

teuken-7b-instruct-commercial-v0.4

Multilingual 7B LLM, instruction-tuned on all 24 EU languages for stable, culturally aligned output.

sovereign ai

7mo

Utter-project

eurollm-9b-instruct

State-of-the-art, multilingual model tailored to all 24 official European Union languages.

Sovereign AI

8mo

Gotocompany

gemma-2-9b-cpt-sahabatai-instruct

SOTA LLM pre-trained for instruction following and proficiency in Indonesian language and its dialects.

Sovereign AI

8mo

Google

gemma-3-1b-it

A lightweight, multilingual, advanced SLM text model for edge computing, resource constraint applications

Translation

9mo

Microsoft

phi-4-mini-instruct

Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments

chat

9mo

Qwen

qwen2.5-7b-instruct

Chinese and English LLM targeting for language, coding, mathematics, reasoning, etc.

Chinese Language Generation

9mo

NVIDIA

nemotron-4-mini-hindi-4b-instruct

A bilingual Hindi-English SLM for on-device inference, tailored specifically for Hindi Language.

Indic

9mo

Meta

llama-3.2-3b-instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

chat

9mo

Meta

llama-3.2-1b-instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

chat

9mo

Qwen

qwen2-7b-instruct

Chinese and English LLM targeting for language, coding, mathematics, reasoning, etc.

Chinese Language Generation

9mo

AI21 Labs

jamba-1.5-mini-instruct

Cutting-edge MOE based LLM designed to excel in a wide array of generative AI tasks.

chat

9mo

NVIDIA

nemotron-mini-4b-instruct

Optimized SLM for on-device inference and fine-tuned for roleplay, RAG and function calling

chat

NVIDIA

mistral-nemo-minitron-8b-base

State-of-the-art small language model delivering superior accuracy for chatbot, virtual assistants, and content generation.

language generation

Microsoft

phi-3.5-mini-instruct

Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments

chat

Rakuten

rakutenai-7b-instruct

Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.

chat

9mo

Rakuten

rakutenai-7b-chat

Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.

chat

9mo

Google

gemma-2-2b-it

Advanced small language generative AI model for edge applications

chat

9mo

THUDM

chatglm3-6b

Supports Chinese and English languages to handle tasks including chatbot, content generation, coding, and translation.

Text Translation

7mo

Baichuan AI

baichuan2-13b-chat

Support Chinese and English chat, coding, math, instruction following, solving quizzes

Chinese Language Generation

9mo

Meta

llama-3.1-70b-instruct

Powers complex conversations with superior contextual understanding, reasoning and text generation.

chat

8mo

Meta

llama-3.1-8b-instruct

Advanced state-of-the-art model with language understanding, superior reasoning, and text generation.

chat

7mo

Microsoft

phi-3-medium-128k-instruct

Cutting-edge lightweight open language model exceling in high-quality reasoning.

chat

9mo

Items per page

of 2 pages