Try NVIDIA NIM APIs

14 results for

phi-4-multimodal-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.

Model

Speech Recognition

244K

Mistral AI

Downloadable

Free Endpoint

ministral-14b-instruct-2512

A general-purpose VLM ideal for chat and instruction-based use cases.

Model

language generation

7mo

Mistral AI

Free Endpoint

mistral-large-3-675b-instruct-2512

A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.

Model

language generation

7mo

llama-guard-4-12b

Multimodal model that classifies the safety of input prompts as well as output responses.

Model

LLM Multimodal Safety

222K

Moonshotai

Downloadable

Free Endpoint

kimi-k2.6

1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.

Model

Multimodal

15M

2mo

NVIDIA

Free Endpoint

nemotron-3-content-safety

Multilingual, multimodal model for detecting unsafe and toxic content.

Model

llm safety

230K

2mo

NVIDIA

Downloadable

Free Endpoint

nemotron-3.5-content-safety

Multilingual, multimodal model for detecting unsafe and toxic content.

Model

llm safety

29d

Black-forest-labs

Downloadable

FLUX.1-Kontext-dev

FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.

Model

Text-to-Image

10mo

llama-4-maverick-17b-128e-instruct

A general-purpose multimodal, multilingual MoE model with 128 experts and 17B active parameters.

Model

language generation

20M

11mo

NVIDIA

Downloadable

llama-nemotron-embed-vl-1b-v2

Multimodal question-answer retrieval representing user queries as text and documents as images.

Model

nemo retriever

4mo

Mistral AI

Downloadable

Free Endpoint

mistral-small-4-119b-2603

Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and a 256k context window.

Model

code generation

13M

3mo

Qwen

Downloadable

Free Endpoint

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, and multimodal chat. Agent-ready.

Model

tool calling

10M

3mo

Stepfun-ai

Downloadable

Free Endpoint

step-3.7-flash

A sparse MoE multimodal reasoning model suited to enterprise, agentic, and coding tasks.

Model

Coding

1mo

Minimaxai

Free Endpoint

minimax-m3

MiniMax M3 Preview is a multimodal MoE vision-language model with strong reasoning, coding, and tool-calling capabilities.

Model

coding

19d