Distilled version of Qwen 2.5 32B using reasoning data generated by DeepSeek R1 for enhanced performance.
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning tasks. Through RL, numerous powerful and interesting reasoning behaviors emerged naturally in DeepSeek-R1-Zero. However, it encountered challenges such as endless repetition, poor readability, and language mixing. DeepSeek-R1 addresses these issues and further enhances reasoning performance by incorporating cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks, and DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
Description:
DeepSeek-R1-Distill-Qwen-32B is distilled from DeepSeek-R1 based on Qwen2.5-32B. The reasoning patterns of larger models, DeepSeek-R1 in this case, can be distilled into smaller models, yielding better performance than the reasoning patterns discovered through RL directly on small models. Using the reasoning data generated by DeepSeek-R1, dense models that are widely used in the research community can be fine-tuned, and the evaluation results demonstrate that the resulting distilled dense models perform exceptionally well on reasoning benchmarks.
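In other words, the distillation step is plain supervised fine-tuning (SFT) of the smaller base model on teacher-generated reasoning traces; no RL is applied to the student. The sketch below illustrates the idea only; it is not DeepSeek's training code, and the data file, field names, and hyperparameters are placeholders.

```python
# Illustrative sketch of distillation-as-SFT: fine-tune a Qwen2.5 base model
# on reasoning traces generated by a stronger teacher (DeepSeek-R1).
# The JSONL file, its field names, and the hyperparameters are hypothetical.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-32B", torch_dtype=torch.bfloat16, device_map="auto"
)

# Each record pairs a prompt with the teacher's reasoning trace + final answer.
data = load_dataset("json", data_files="r1_traces.jsonl", split="train")

def tokenize(example):
    text = example["prompt"] + example["teacher_response"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=4096)

data = data.map(tokenize, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="distilled-qwen-32b",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=2,
        learning_rate=1e-5,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=data,
    # mlm=False turns this into a standard causal-LM collator: it pads
    # batches and copies input_ids into labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```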
This model is ready for commercial use.
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to the DeepSeek-R1-Distill-Qwen-32B Model Card.
The NIM container is governed by the NVIDIA Software License Agreement and Product-Specific Terms for AI Products; and the use of this model is governed by the NVIDIA Community Model License. ADDITIONAL INFORMATION: MIT License and Apache 2.0 License.
Model Developer: DeepSeek-AI
Model Architecture
Architecture Type: Transformer
Network Architecture: Qwen
Version: 2.5
Input
Input Type: Text
Input Format: String
Input Parameters: 1D
Other Properties Related to Input:
DeepSeek recommends adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:
1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
2. Avoid adding a system prompt; all instructions should be contained within the user prompt.
3. For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
4. When evaluating model performance, it is recommended to conduct multiple tests and average the results.
Additionally, the DeepSeek-R1 series models tend to bypass the thinking pattern (i.e., outputting "<think>\n\n</think>") when responding to certain queries, which can degrade the model's performance. To ensure that the model engages in thorough reasoning, DeepSeek recommends forcing the model to begin every response with "<think>\n".
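With an OpenAI-compatible endpoint such as a local NIM deployment, these recommendations translate to something like the following sketch. The base URL, model name, and the assistant-prefix trick are assumptions about your deployment; some backends expose prefix forcing through a completions endpoint or a chat-template option instead.

```python
# Hypothetical client sketch applying the recommended settings:
# temperature ~0.6, no system prompt, and an assistant turn pre-seeded with
# "<think>\n" so the model does not skip its reasoning phase.
# The endpoint URL and model name below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1-distill-qwen-32b",
    messages=[
        # All instructions go in the user turn; no system prompt.
        {
            "role": "user",
            "content": "Please reason step by step, and put your final "
                       "answer within \\boxed{}. What is 37 * 43?",
        },
        # Pre-filled assistant prefix nudges the model to open with <think>.
        # Whether a trailing assistant turn is honored depends on the server.
        {"role": "assistant", "content": "<think>\n"},
    ],
    temperature=0.6,
    top_p=0.95,  # assumed; matches the model's published generation config
    max_tokens=2048,
)
print(response.choices[0].message.content)
```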
Output
Output Type: Text
Output Format: String
Output Parameters: 1D
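Because each response interleaves a reasoning block with the final answer, downstream code typically separates the two. A minimal sketch, assuming the "<think>...</think>" convention described above:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a model response into (reasoning, answer).

    Assumes the "<think>...</think>" convention; if the tags are absent,
    the whole text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

reasoning, answer = split_reasoning(
    "<think>\n37 * 43 = 37 * 40 + 37 * 3 = 1480 + 111 = 1591\n</think>\n"
    "The answer is \\boxed{1591}."
)
print(answer)  # The answer is \boxed{1591}.
```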
Software Integration:
Runtime Engine: TensorRT-LLM
Supported Hardware Microarchitecture Compatibility: NVIDIA Hopper, NVIDIA Lovelace
Preferred/Supported Operating System(s): Linux
Training Dataset:
Data Collection Method by dataset: Automated
Labeling Method by dataset: Automated
Properties: 800k samples curated with DeepSeek-R1
Testing Dataset:
Data Collection Method by dataset: Automated. Reasoning data generated by DeepSeek-R1.
Labeling Method by dataset: Automated
Evaluation Dataset:
Link: See the Evaluation section of the Hugging Face DeepSeek-R1-Distill-Qwen-32B Model Card
Data Collection Method by dataset: Hybrid: Human, Automated
Labeling Method by dataset: Hybrid: Human, Automated
Inference:
Engine: TensorRT-LLM
Test Hardware: L20, H20
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.
You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.