⌘KCtrl+K

Your Privacy Choices

Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA Launch from Hugging FaceBeta

Filters (1)

Free Endpoint

28

Partner Endpoint

17

Download Available

12

Use Case

Code Generation

16

Image-to-Text

7

Retrieval Augmented Generation

0

Drug Discovery

0

Object Detection

0

Inference Providers

Deep Infra

11

Together AI

7

Bitdeer AI

4

CoreWeave

3

GMI Cloud

2

Publisher

Microsoft

9

Mistral AI

8

Meta

6

NVIDIA

4

Qwen

4

API Catalog Type

Enterprise

0

Blueprint Type

NVIDIA BioNemo

0

Labels (1)

chat

40 models

Sort By

Free Endpoint

mistral-large-3-675b-instruct-2512

A state-of-the-art general purpose MoE VLM ideal for chat, agentic and instruction based use cases.

language generation

5.56M

4mo

Downloadable

ministral-14b-instruct-2512

A general purpose VLM ideal for chat and instruction based use cases

language generation

1.63M

4mo

Downloadable

nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

language generation

4.73M

5mo

Free Endpoint

gemma-3n-e4b-it

An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments

language generation

429K

9mo

Free Endpoint

gemma-3n-e2b-it

An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments

language generation

302K

9mo

Free Endpoint

mistral-nemotron

Built for agentic workflows, this model excels in coding, instruction following, and function calling

language generation

3.97M

10mo

DeprecatedFree Endpoint

mistral-small-3.1-24b-instruct-2503

Efficient multimodal model excelling at multilingual tasks, image understanding, and fast-responses

language generation

1.33M

11mo

Free Endpoint

mistral-medium-3-instruct

Powerful, multimodal language model designed for enterprise applications, including software development, data analysis, and reasoning.

language generation

1.49M

9mo

Free Endpoint

llama-4-maverick-17b-128e-instruct

A general purpose multimodal, multilingual 128 MoE model with 17B parameters.

language generation

11.03M

9mo

Free Endpoint

gemma-3-27b-it

Cutting-edge open multimodal model exceling in high-quality reasoning from images.

Vision Assistant

5.98M

11mo

Downloadable

phi-4-mini-instruct

Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments

351K

11mo

Free Endpoint

phi-4-multimodal-instruct

Cutting-edge open multimodal model exceling in high-quality reasoning from image and audio inputs.

Speech Recognition

376K

11mo

DeprecatedDownloadable

qwen2.5-7b-instruct

Chinese and English LLM targeting for language, coding, mathematics, reasoning, etc.

Chinese Language Generation

6.98M

11mo

Downloadable

qwen2.5-coder-32b-instruct

Advanced LLM for code generation, reasoning, and fixing across popular programming languages.

code completion

2.95M

9mo

DeprecatedFree Endpoint

qwen2.5-coder-7b-instruct

Powerful mid-size code model with a 32K context length, excelling in coding in multiple languages.

code completion

229K

11mo

DeprecatedFree Endpoint

nemotron-4-mini-hindi-4b-instruct

A bilingual Hindi-English SLM for on-device inference, tailored specifically for Hindi Language.

361K

11mo

Downloadable

llama-3.2-3b-instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

16.14K993K

11mo

Downloadable

llama-3.2-1b-instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

15.82K361K

11mo

DeprecatedFree Endpoint

qwen2-7b-instruct

Chinese and English LLM targeting for language, coding, mathematics, reasoning, etc.

Chinese Language Generation

129K

11mo

Free Endpoint

dracarys-llama-3.1-70b-instruct

Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.

Code Generation

361K

11mo

Free Endpoint

nemotron-mini-4b-instruct

Optimized SLM for on-device inference and fine-tuned for roleplay, RAG and function calling

197K

1y

DeprecatedFree Endpoint

mistral-nemo-minitron-8b-base

State-of-the-art small language model delivering superior accuracy for chatbot, virtual assistants, and content generation.

language generation

3.16K

1y

DeprecatedFree Endpoint

phi-3.5-mini-instruct

Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments

1.19M

1y

DeprecatedFree Endpoint

rakutenai-7b-instruct

Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.

44.26K

11mo

Items per page

of 2 pages