
MiniMax M2.1 excels in multi-language coding, app/web development, office AI, and agent integration.

1T multimodal MoE for high‑capacity video and image understanding with efficient inference.

A state-of-the-art, general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.

A general-purpose VLM ideal for chat and instruction-based use cases.

Securely extract, embed, and index multimodal data with encryption in-use for fast, accurate semantic search.

FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.

A general-purpose multimodal, multilingual MoE model with 128 experts and 17B active parameters.

Build a custom enterprise research assistant powered by state-of-the-art models that process and synthesize multimodal data, enabling reasoning, planning, and refinement to generate comprehensive reports.

A multimodal, multilingual MoE model with 16 experts and 17B active parameters.

Powerful, multimodal language model designed for enterprise applications, including software development, data analysis, and reasoning.

Power fast, accurate semantic search across multimodal enterprise data with NVIDIA’s RAG Blueprint—built on NeMo Retriever and Nemotron models—to connect your agents to trusted, authoritative sources of knowledge.

Multimodal model that classifies the safety of both input prompts and output responses.

Multimodal question-answering retrieval model that represents user queries as text and documents as images.

Cutting-edge open multimodal model excelling at high-quality reasoning from images.

Cutting-edge open multimodal model excelling at high-quality reasoning from image and audio inputs.

Efficient multimodal model excelling at multilingual tasks, image understanding, and fast responses.

Cutting-edge open multimodal model excelling at high-quality reasoning from images.