
MiniMax M2.5 is a 230B-parameter text-to-text AI model excelling in coding, reasoning, and office tasks.
MiniMax-M2.5 is a text generation model trained to perform complex agentic tasks, including software engineering, tool use, search, and office-style workflows. Extensively trained with reinforcement learning in hundreds of thousands of complex real-world environments, M2.5 achieves state-of-the-art results in coding, agentic tool use and search, office work, and a range of other economically valuable tasks, scoring 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp (with context management). Trained to reason efficiently and decompose tasks optimally, M2.5 completes complicated agentic tasks quickly, finishing the SWE-Bench Verified evaluation 37% faster than M2.1 and matching the speed of Claude Opus 4.6.
MiniMax-M2.5 was developed by MiniMaxAI.
This model is ready for commercial/non-commercial use.
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see the Non-NVIDIA MiniMax-M2.5 Model Card.
GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service, and use of this model is governed by the NVIDIA Open Model License. ADDITIONAL INFORMATION: Modified MIT License (MiniMax M2.5).
Global
Enterprises and developers building AI agents, chatbots, and tool-using applications across coding, office work, and information-retrieval tasks. The model is suited for NLP workloads that require advanced reasoning, long-context handling, and agentic tool use.
HuggingFace 02/12/2026 via MiniMaxAI/MiniMax-M2.5
Build.NVIDIA.com 02/26/2026 via link
NGC 02/26/2026 via MiniMax-M2.5 on NGC
Architecture Type: Transformer
Network Architecture: Mixture of Experts (MoE) with Lightning Attention, 8 experts per token (MiniMaxM2ForCausalLM)
Total Parameters: 230B
Active Parameters: Undisclosed
Vocabulary Size: Undisclosed
Base Model: MiniMax M2-series (e.g., MiniMax-M2.1)
Input Types: Text
Input Format: String
Input Parameters: One Dimensional (1D)
Other Input Properties: Context length up to 204,800 tokens.
Output Types: Text
Output Format: String
Output Parameters: One Dimensional (1D)
Other Output Properties: Autoregressive text generation (may include tool-calling structured outputs depending on serving stack).
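As a hedged illustration of what such a tool-calling structured output can look like, the snippet below builds a tool-call object in the OpenAI-style chat-completions convention, which common serving stacks (e.g., vLLM, SGLang) emit for agentic models. The tool name `search_web` and its argument are hypothetical, not part of MiniMax-M2.5's actual tool set; the exact schema depends on the serving stack.

```python
import json

# Hypothetical tool-call structure in the OpenAI chat-completions style.
# Field names follow that convention, not a MiniMax-specific schema.
tool_call = {
    "id": "call_0",
    "type": "function",
    "function": {
        "name": "search_web",  # hypothetical tool name
        # Arguments are conventionally a JSON-encoded string:
        "arguments": json.dumps({"query": "latest CUDA release"}),
    },
}

# An agent runtime decodes the arguments before dispatching the tool:
args = json.loads(tool_call["function"]["arguments"])
```

An agent loop would match `function.name` against its registered tools, execute the call with `args`, and append the result as a tool-role message for the next model turn.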
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Runtime Engines:
Supported Hardware:
Operating Systems: Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
MiniMaxAI/MiniMax-M2.5
Data Modality: Text
Text Training Data Size: Undisclosed
Data Collection Method by dataset: Undisclosed
Labeling Method by dataset: Undisclosed
Properties (Quantity, Dataset Descriptions, Sensor(s)): The model is described as trained across 10+ programming languages and 200,000+ real-world environments, with extensive reinforcement learning over complex environments.
Data Collection Method by dataset: Undisclosed
Labeling Method by dataset: Undisclosed
Properties (Quantity, Dataset Descriptions, Sensor(s)): Undisclosed
Benchmark Score: SWE-Bench Verified (80.2%), Multi-SWE-Bench (51.3%), BrowseComp (76.3%)
Data Collection Method by dataset: Automated
Labeling Method by dataset: Automated
Properties (Quantity, Dataset Descriptions, Sensor(s)): Evaluated on a mix of coding, tool-use, web-browsing, and multi-step reasoning benchmarks such as SWE-Bench, Terminal Bench 2, VIBE-Pro, BrowseComp, Wide Search, RISE, GDPval-MM, MEWC, Finance Modeling, as well as standard academic benchmarks (AIME25, GPQA-D, HLE w/o tools, SciCode, IFBench, AA-LCR).
| Benchmark | MiniMax-M2.5 |
|---|---|
| AIME25 | 86.3 |
| GPQA-D | 85.2 |
| HLE w/o tools | 19.4 |
| SciCode | 44.4 |
| IFBench | 70.0 |
| AA-LCR | 69.5 |
| Benchmark | Description |
|---|---|
| SWE-bench Verified | Coding agent benchmark |
| SWE-bench Multilingual | Multilingual coding benchmark |
| SWE-bench-pro | Professional coding benchmark |
| Multi-SWE-bench | Combined coding benchmark |
| Terminal Bench 2 | Terminal tool-use benchmark |
| VIBE-Pro | Visual-interactive benchmark |
| BrowseComp | Web-browsing benchmark |
| Wide Search | Search benchmark |
| RISE | Multi-step information-retrieval benchmark |
| GDPval-MM | Multi-modal evaluation benchmark |
| MEWC | Excel-world-championship benchmark |
| Finance Modeling | Financial modeling benchmark |
Acceleration Engine: SGLang
Test Hardware:
The model can be integrated via multiple runtimes: Transformers (loading from Hugging Face with trust_remote_code=True), vLLM, SGLang, KTransformers, and other supported engines.
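Once served through one of these engines in OpenAI-compatible mode (which both vLLM and SGLang support), the model can be queried with a standard chat-completions request. The sketch below constructs such a request payload; the model id follows the Hugging Face repository name, while the endpoint URL and generation parameters are assumptions to adjust for your deployment.

```python
import json

# Sketch of a chat-completions request body for an OpenAI-compatible
# endpoint serving MiniMax-M2.5 (e.g., vLLM or SGLang in server mode).
payload = {
    "model": "MiniMaxAI/MiniMax-M2.5",
    "messages": [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a function that reverses a string."},
    ],
    "max_tokens": 512,      # assumed limit; tune per use case
    "temperature": 1.0,     # assumed sampling setting
}

# Serialize for an HTTP POST to the server's /v1/chat/completions route.
body = json.dumps(payload)
```

The same payload works unchanged across any of the OpenAI-compatible runtimes listed above, so the serving engine can be swapped without client-side changes.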
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.