Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

41 models

Free Endpoint

ising-calibration-1.5-31b

NVIDIA-Ising-Calibration-1.5 is a dense multimodal vision-language model built on Gemma 4 31B. It analyzes quantum computing calibration experiment plots and generates structured technical text.

Quantum Computing

Calibration
NVIDIA NIM
Vision Language Model

Last updated on July 23, 2026

Thinking Machines

Downloadable, Free Endpoint

inkling

Inkling is a multimodal (text + image) reasoning model from Thinking Machines, built on a Mamba-hybrid, 256-expert Mixture-of-Experts architecture with tool use and switchable reasoning.

text-to-text

reasoning
image-to-text
multimodal

Last updated on July 16, 2026

Free Endpoint

laguna-xs-2.1

Efficient 33B MoE for local, long-horizon agentic coding and terminal tasks

Agentic AI

Coding
Reasoning
Tool Use

Last updated on July 15, 2026

Downloadable, Free Endpoint

glm-5.2

GLM-5.2 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.

Agentic AI

Coding
Reasoning
Tool Use

8M API calls in the last 30 days

Last updated on July 3, 2026

Free Endpoint

minimax-m3

MiniMax M3 Preview is a multimodal MoE vision-language model with strong reasoning, coding, and tool-calling capabilities.

coding

text-to-text
reasoning

10M API calls in the last 30 days

Last updated on June 12, 2026

Downloadable, Free Endpoint

diffusiongemma-26b-a4b-it

Diffusion-based 26B parameter LLM enabling parallel token generation for real-time text apps

diffusion-llm

text-to-text
reasoning

4M API calls in the last 30 days

Last updated on June 10, 2026

Downloadable, Free Endpoint

nemotron-3-ultra-550b-a55b

Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more

Agent

MoE
Frontier
Reasoning
Long Context

52M API calls in the last 30 days

Last updated on June 4, 2026

Downloadable, Free Endpoint

step-3.7-flash

A sparse MoE multimodal reasoning model suited to enterprise, agentic, and coding tasks.

Coding

Vision
Agents

7M API calls in the last 30 days

Last updated on May 29, 2026

Downloadable, Free Endpoint

kimi-k2.6

1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.

Multimodal

Mixture-of-Experts
Reasoning
Image-to-Text

16M API calls in the last 30 days

Last updated on May 1, 2026

Downloadable, Free Endpoint

mistral-medium-3.5-128b

A high-performing model for text generation, coding, and agentic use cases

coding

reasoning
text
agentic

5M API calls in the last 30 days

Last updated on April 29, 2026

Downloadable, Free Endpoint

nemotron-3-nano-omni-30b-a3b-reasoning

Nemotron 3 Nano Omni is an omni-modal reasoning model that understands images, video, speech, and text.

Image-to-Text

VLM
Video
Omni
OCR

8M API calls in the last 30 days

Last updated on April 28, 2026

Downloadable, Free Endpoint

deepseek-v4-flash

DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.

coding

MoE
fast
agentic

17M API calls in the last 30 days

Last updated on April 24, 2026

Downloadable, Free Endpoint

deepseek-v4-pro

DeepSeek V4 scales to 1M-token context windows with efficient MoE architecture for coding tasks.

MoE

reasoning
coding
agentic

7M API calls in the last 30 days

Last updated on April 24, 2026

Downloadable, Free Endpoint

ising-calibration-1-35b-a3b

Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.

Quantum

reasoning
Vision Language Model
calibration

442K API calls in the last 30 days

Last updated on April 14, 2026

Downloadable, Free Endpoint

gemma-4-31b-it

Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.

reasoning

coding
text-to-text
agentic

6M API calls in the last 30 days

Last updated on April 2, 2026

Free Endpoint

nemotron-voicechat

Nemotron 3 Voicechat

English

voice chat
NVIDIA NIM

1K API calls in the last 30 days

Last updated on March 16, 2026

Downloadable, Free Endpoint

nemotron-3-super-120b-a12b

Open, efficient hybrid Mamba-Transformer MoE with 1M context, excelling in agentic reasoning, coding, planning, tool calling, and more

MoE

Reasoning
Chat
Long Context
Instruction Following

65M API calls in the last 30 days

Last updated on March 11, 2026

Deprecated, Free Endpoint

step-3.5-flash

200B open-source reasoning engine with sparse MoE powering frontier agentic AI.

Agentic

Coding
Reasoning

10M API calls in the last 30 days

Last updated on February 2, 2026

Downloadable, Free Endpoint

nemotron-3-nano-30b-a3b

Open, efficient MoE model with 1M context, excelling in coding, reasoning, instruction following, tool calling, and more

MoE

Reasoning
Long Context
Instruction Following

12M API calls in the last 30 days

Last updated on December 15, 2025

Downloadable, Free Endpoint

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

language generation

vision assistant
visual question answering
Image-to-Text

5M API calls in the last 30 days

Last updated on October 28, 2025

Downloadable, Free Endpoint

qwen3-next-80b-a3b-instruct

Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.

text-generation

agentic

25M API calls in the last 30 days

Last updated on September 22, 2025

Deprecated, Free Endpoint

seed-oss-36b-instruct

ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.

thinking budget

reasoning
text-generation

1M API calls in the last 30 days

Last updated on September 5, 2025

Downloadable, Free Endpoint

nvidia-nemotron-nano-9b-v2

High‑efficiency LLM with hybrid Transformer‑Mamba design, excelling in reasoning and agentic tasks.

thinking budget

reasoning

2M API calls in the last 30 days

Last updated on August 18, 2025

Downloadable, Free Endpoint

gpt-oss-20b

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math

reasoning

text-to-text
chat
math

19M API calls in the last 30 days

Last updated on August 5, 2025