80B parameter AI model with hybrid reasoning, MoE architecture, support for 119 languages.

Qwen3-Next-80B-A3B-Thinking is part of the Qwen3-Next series, which introduces several key enhancements.
For more details, please refer to the Qwen3-Next blog post.
This model is ready for commercial/non-commercial use.
This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case; see the link to the non-NVIDIA model card here: Qwen3-Next-80B-A3B-Thinking.
GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Community Model License. ADDITIONAL INFORMATION: Apache 2.0 License.
Global
Qwen3-Next-80B-A3B-Thinking excels at tool calling and highly complex reasoning tasks.
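For example, a minimal tool-calling sketch against an OpenAI-compatible endpoint (the endpoint URL, model identifier, and `get_weather` tool are illustrative assumptions, not values taken from this card):

```python
# Hypothetical sketch of tool calling via an OpenAI-compatible API.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed endpoint
    api_key=os.environ["NVIDIA_API_KEY"],
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen/qwen3-next-80b-a3b-thinking",  # assumed model id
    messages=[{"role": "user", "content": "What's the weather in Berlin right now?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # the model may respond with a get_weather call
```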
build.nvidia.com: September 11, 2025 via Qwen3-Next-80B-A3B-Thinking
Hugging Face: September 11, 2025 via Qwen3-Next-80B-A3B-Thinking
Architecture Type: Other (Hybrid Transformer-Mamba)
Network Architecture: Qwen3-Next
Total Parameters: 80B
Active Parameters: 3.9B
Vocabulary Size: 151,936
Input Types: Text
Input Formats: String
Input Parameters: One Dimensional (1D)
Other Input Properties: Qwen3-Next natively supports context lengths of up to 262,144 tokens
Qwen3-Next-80B-A3B-Thinking supports only thinking mode. To enforce model thinking, the default chat template automatically includes `<think>`; therefore, it is normal for the model's output to contain only `</think>` without an explicit opening `<think>` tag.
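A minimal sketch of handling this in post-processing, assuming the raw completion is available as a plain string (the helper name is hypothetical):

```python
def split_thinking(raw: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer).

    The default chat template injects the opening <think> tag, so the raw
    output typically contains only the closing </think> marker.
    """
    marker = "</think>"
    if marker in raw:
        reasoning, answer = raw.split(marker, 1)
        return reasoning.strip(), answer.strip()
    return "", raw.strip()  # no marker found: treat the whole output as the answer


reasoning, answer = split_thinking("Consider parity first...</think>The answer is 42.")
print(answer)  # -> "The answer is 42."
```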
Output Types: Text
Output Format: String
Output Parameters: One Dimensional (1D)
Other Output Properties: Qwen3-Next natively supports context lengths of up to 262,144 tokens
Qwen3-Next-80B-A3B-Thinking may generate longer thinking content than its predecessor, and Alibaba strongly recommends it for highly complex reasoning tasks.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Runtime Engines:
Supported Hardware:
Operating Systems: Linux
Qwen3-Next-80B-A3B-Thinking v1.0 (September 11, 2025)
Training Data Collection: Undisclosed
Training Labeling: Undisclosed
Training Properties: Undisclosed
Testing Data Collection: Undisclosed
Testing Labeling: Undisclosed
Testing Properties: Undisclosed
Evaluation Data Collection: Undisclosed
Evaluation Labeling: Undisclosed
Evaluation Properties: Undisclosed
Evaluation Benchmarks:
| Benchmark | Qwen3-30B-A3B-Thinking-2507 | Qwen3-32B Thinking | Qwen3-235B-A22B-Thinking-2507 | Gemini-2.5-Flash Thinking | Qwen3-Next-80B-A3B-Thinking |
|---|---|---|---|---|---|
| **Knowledge** | | | | | |
| MMLU-Pro | 80.9 | 79.1 | 84.4 | 81.9 | 82.7 |
| MMLU-Redux | 91.4 | 90.9 | 93.8 | 92.1 | 92.5 |
| GPQA | 73.4 | 68.4 | 81.1 | 82.8 | 77.2 |
| SuperGPQA | 56.8 | 54.1 | 64.9 | 57.8 | 60.8 |
| **Reasoning** | | | | | |
| AIME25 | 85.0 | 72.9 | 92.3 | 72.0 | 87.8 |
| HMMT25 | 71.4 | 51.5 | 83.9 | 64.2 | 73.9 |
| LiveBench 241125 | 76.8 | 74.9 | 78.4 | 74.3 | 76.6 |
| **Coding** | | | | | |
| LiveCodeBench v6 (25.02-25.05) | 66.0 | 60.6 | 74.1 | 61.2 | 68.7 |
| CFEval | 2044 | 1986 | 2134 | 1995 | 2071 |
| OJBench | 25.1 | 24.1 | 32.5 | 23.5 | 29.7 |
| **Alignment** | | | | | |
| IFEval | 88.9 | 85.0 | 87.8 | 89.8 | 88.9 |
| Arena-Hard v2* | 56.0 | 48.4 | 79.7 | 56.7 | 62.3 |
| WritingBench | 85.0 | 79.0 | 88.3 | 83.9 | 84.6 |
| **Agent** | | | | | |
| BFCL-v3 | 72.4 | 70.3 | 71.9 | 68.6 | 72.0 |
| TAU1-Retail | 67.8 | 52.8 | 67.8 | 65.2 | 69.6 |
| TAU1-Airline | 48.0 | 29.0 | 46.0 | 54.0 | 49.0 |
| TAU2-Retail | 58.8 | 49.7 | 71.9 | 66.7 | 67.8 |
| TAU2-Airline | 58.0 | 45.5 | 58.0 | 52.0 | 60.5 |
| TAU2-Telecom | 26.3 | 27.2 | 45.6 | 31.6 | 43.9 |
| **Multilingualism** | | | | | |
| MultiIF | 76.4 | 73.0 | 80.6 | 74.4 | 77.8 |
| MMLU-ProX | 76.4 | 74.6 | 81.0 | 80.2 | 78.7 |
| INCLUDE | 74.4 | 73.7 | 81.0 | 83.9 | 78.9 |
| PolyMATH | 52.6 | 47.4 | 60.1 | 49.8 | 56.3 |
*For reproducibility, Alibaba reports the win rates evaluated by GPT-4.1.
Acceleration Engine: SGLang
Test Hardware: NVIDIA H100
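As a hedged illustration only (server flags and the local endpoint are assumptions based on common SGLang deployments, not details from this card), serving the model with SGLang and querying its OpenAI-compatible endpoint might look roughly like this:

```python
# Hypothetical sketch: query a locally running SGLang server through its
# OpenAI-compatible endpoint. Assumes the server was launched with something like
#   python -m sglang.launch_server --model-path Qwen/Qwen3-Next-80B-A3B-Thinking \
#       --tp-size 4 --context-length 262144 --port 30000
# (flags are illustrative; consult the SGLang docs for the supported options).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen3-Next-80B-A3B-Thinking",
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    max_tokens=32768,  # thinking models can emit long reasoning traces
)
print(resp.choices[0].message.content)
```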
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.