DeepSeek V3.1 Instruct is a hybrid AI model with fast reasoning, 128K context, and strong tool use.

DeepSeek-V3.1 is a hybrid model that supports both thinking and non-thinking modes. Compared to the previous version, this upgrade brings improvements in multiple aspects:
This model is ready for commercial/non-commercial use.
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to Non-NVIDIA Model Card DeepSeek-V3.1 Model Card.
GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Community Model License. Additional Information: MIT License.
Global
Designed to handle general instruction-following tasks, DeepSeek-V3.1 can be integrated into AI assistants across various domains, including business applications.
Supported Languages: Primarily English and Chinese, with multilingual capabilities.
Extended long-context tasks (up to 128K tokens):
Complex reasoning and problem-solving:
Code generation and software development:
Tool-augmented and agent-based applications:
Build.NVIDIA.com: 08/26/2025
Hugging Face: 08/20/2025
Input Type(s): Text
Input Formats: String
Input Parameters: One-Dimensional (1D)
Other Properties Related to Input: Chat Template for different modes, Tool descriptions. Context Length: Supports up to 128K tokens
Output Type(s): Text
Output Formats: String
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: Special Features: Supports both thinking and non-thinking response modes.
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Runtime Engine(s): SGLang
Supported Hardware Microarchitecture Compatibility:
Preferred/Supported Operating System(s): Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
DeepSeek-V3.1
Please see the Evaluation section of the HuggingFace DeepSeek-V3 Model Card for more information.
| Category | Benchmark (Metric) | DeepSeek V3.1-NonThinking | DeepSeek V3 0324 | DeepSeek V3.1-Thinking | DeepSeek R1 0528 |
|---|---|---|---|---|---|
| General | MMLU-Redux (EM) | 91.8 | 90.5 | 93.7 | 93.4 |
| | MMLU-Pro (EM) | 83.7 | 81.2 | 84.8 | 85.0 |
| | GPQA-Diamond (Pass@1) | 74.9 | 68.4 | 80.1 | 81.0 |
| | Humanity's Last Exam (Pass@1) | - | - | 15.9 | 17.7 |
| Search Agent | BrowseComp | - | - | 30.0 | 8.9 |
| | BrowseComp_zh | - | - | 49.2 | 35.7 |
| | Humanity's Last Exam (Python + Search) | - | - | 29.8 | 24.8 |
| | SimpleQA | - | - | 93.4 | 92.3 |
| Code | LiveCodeBench (2408-2505) (Pass@1) | 56.4 | 43.0 | 74.8 | 73.3 |
| | Codeforces-Div1 (Rating) | - | - | 2091 | 1930 |
| | Aider-Polyglot (Acc.) | 68.4 | 55.1 | 76.3 | 71.6 |
| Code Agent | SWE Verified (Agent mode) | 66.0 | 45.4 | - | 44.6 |
| | SWE-bench Multilingual (Agent mode) | 54.5 | 29.3 | - | 30.5 |
| | Terminal-bench (Terminus 1 framework) | 31.3 | 13.3 | - | 5.7 |
| Math | AIME 2024 (Pass@1) | 66.3 | 59.4 | 93.1 | 91.4 |
| | AIME 2025 (Pass@1) | 49.8 | 51.3 | 88.4 | 87.5 |
| | HMMT 2025 (Pass@1) | 33.5 | 29.2 | 84.2 | 79.4 |
Note:
Search agents are evaluated with our internal search framework, which uses a commercial search API + webpage filter + 128K context window. Search agent results of R1-0528 are evaluated with a pre-defined workflow.
SWE-bench is evaluated with our internal code agent framework.
HLE is evaluated with the text-only subset.
Acceleration Engine: SGLang
Test Hardware: NVIDIA B200
The model uses different chat templates for its operational modes. It also supports tool calls and agent functionality with specific formatting requirements.
The details of our chat template are described in tokenizer_config.json and assets/chat_template.jinja. Here is a brief description.
Prefix:
<|begin▁of▁sentence|>{system prompt}<|User|>{query}<|Assistant|></think>
With the given prefix, DeepSeek V3.1 generates responses to queries in non-thinking mode. Unlike DeepSeek V3, it introduces an additional token </think>.
Context:
<|begin▁of▁sentence|>{system prompt}<|User|>{query}<|Assistant|></think>{response}<|end▁of▁sentence|>...<|User|>{query}<|Assistant|></think>{response}<|end▁of▁sentence|>
Prefix:
<|User|>{query}<|Assistant|></think>
By concatenating the context and the prefix, we obtain the correct prompt for the query.
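The concatenation of context and prefix described above can be sketched as plain string assembly. This is a minimal illustration, not the official implementation: the function name `build_non_thinking_prompt` and the `(query, response)` history layout are hypothetical, and the authoritative template remains assets/chat_template.jinja.

```python
# Special tokens copied from the template shown above.
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"
USER = "<|User|>"
ASSISTANT = "<|Assistant|>"
THINK_CLOSE = "</think>"


def build_non_thinking_prompt(
    system_prompt: str,
    history: list[tuple[str, str]],
    query: str,
) -> str:
    """Assemble a non-thinking prompt (hypothetical helper).

    history holds (user_query, assistant_response) pairs from earlier turns;
    an empty history yields the first-turn prefix.
    """
    # Context: BOS and the system prompt appear once, then one block per past turn.
    parts = [BOS, system_prompt]
    for past_query, past_response in history:
        parts.append(
            f"{USER}{past_query}{ASSISTANT}{THINK_CLOSE}{past_response}{EOS}"
        )
    # Prefix for the new query: ends with </think> so the model answers directly.
    parts.append(f"{USER}{query}{ASSISTANT}{THINK_CLOSE}")
    return "".join(parts)


first_turn = build_non_thinking_prompt("You are a helpful assistant", [], "Who are you?")
second_turn = build_non_thinking_prompt(
    "You are a helpful assistant",
    [("Who are you?", "I am DeepSeek")],
    "1+1=?",
)
```

In practice, `tokenizer.apply_chat_template` should be preferred; a hand-rolled builder like this is mainly useful for checking what the template is expected to produce.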
Prefix:
<|begin▁of▁sentence|>{system prompt}<|User|>{query}<|Assistant|><think>
The prefix of thinking mode is similar to DeepSeek-R1.
Context:
<|begin▁of▁sentence|>{system prompt}<|User|>{query}<|Assistant|></think>{response}<|end▁of▁sentence|>...<|User|>{query}<|Assistant|></think>{response}<|end▁of▁sentence|>
Prefix:
<|User|>{query}<|Assistant|><think>
The multi-turn template is the same as the non-thinking multi-turn chat template. This means the thinking tokens from the previous turn are dropped, but the </think> token is retained in every turn of the context.
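To make the handling of earlier reasoning concrete, here is a small sketch that rebuilds a multi-turn prompt while discarding the reasoning from past assistant turns. The helper names (`drop_reasoning`, `build_prompt`) are hypothetical, and splitting on `</think>` is an assumption about how stored responses are formatted; the real logic lives in assets/chat_template.jinja.

```python
# Special tokens copied from the template shown above.
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"
USER = "<|User|>"
ASSISTANT = "<|Assistant|>"


def drop_reasoning(assistant_text: str) -> str:
    """Keep only the text after the last </think> (assumed response layout)."""
    return assistant_text.split("</think>")[-1]


def build_prompt(
    system_prompt: str,
    history: list[tuple[str, str]],
    query: str,
    thinking: bool,
) -> str:
    """Assemble a prompt for either mode (hypothetical helper)."""
    parts = [BOS + system_prompt]
    for past_query, past_response in history:
        # Past turns always carry </think>, never the reasoning itself.
        parts.append(
            USER + past_query + ASSISTANT + "</think>"
            + drop_reasoning(past_response) + EOS
        )
    # Only the final prefix differs between the two modes.
    parts.append(USER + query + ASSISTANT + ("<think>" if thinking else "</think>"))
    return "".join(parts)


history = [("Who are you?", "<think>Hmm</think>I am DeepSeek")]
thinking_prompt = build_prompt("You are a helpful assistant", history, "1+1=?", thinking=True)
```

With this input, the context portion is identical in both modes and the prompts differ only in their final token.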
Toolcall is supported in non-thinking mode. The format is:
<|begin▁of▁sentence|>{system prompt}{tool_description}<|User|>{query}<|Assistant|></think>
where the tool_description is:
## Tools
You have access to the following tools:
### {tool_name1}
Description: {description}
Parameters: {json.dumps(parameters)}
IMPORTANT: ALWAYS adhere to this exact format for tool use:
<|tool▁calls▁begin|><|tool▁call▁begin|>tool_call_name<|tool▁sep|>tool_call_arguments<|tool▁call▁end|>{{additional_tool_calls}}<|tool▁calls▁end|>
Where:
- `tool_call_name` must be an exact match to one of the available tools
- `tool_call_arguments` must be valid JSON that strictly follows the tool's Parameters Schema
- For multiple tool calls, chain them directly without separators or spaces
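The format above can be exercised with a short sketch that renders the per-tool part of the description and parses tool calls out of a model response. Both helper names (`render_tool_description`, `parse_tool_calls`) and the tool dictionary layout are hypothetical, the line breaks in the rendered description are an assumption, and the trailing format instructions are omitted for brevity.

```python
import json
import re

# Special tokens copied from the tool-call format shown above.
CALLS_BEGIN = "<|tool▁calls▁begin|>"
CALLS_END = "<|tool▁calls▁end|>"
CALL_BEGIN = "<|tool▁call▁begin|>"
CALL_END = "<|tool▁call▁end|>"
SEP = "<|tool▁sep|>"

# One call: begin token, name, separator, JSON arguments, end token.
CALL_PATTERN = re.compile(
    re.escape(CALL_BEGIN) + r"(.*?)" + re.escape(SEP) + r"(.*?)" + re.escape(CALL_END),
    re.DOTALL,
)


def render_tool_description(tools: list[dict]) -> str:
    """Render the tool list (line breaks here are an assumption)."""
    lines = ["## Tools", "You have access to the following tools:"]
    for tool in tools:
        lines.append(f"### {tool['name']}")
        lines.append(f"Description: {tool['description']}")
        lines.append(f"Parameters: {json.dumps(tool['parameters'])}")
    return "\n".join(lines)


def parse_tool_calls(text: str, available_tools: set[str]) -> list[tuple[str, dict]]:
    """Extract (name, arguments) pairs from a model response."""
    start = text.find(CALLS_BEGIN)
    if start == -1:
        return []
    end = text.find(CALLS_END, start)
    if end == -1:
        return []
    body = text[start + len(CALLS_BEGIN):end]
    calls = []
    for name, raw_args in CALL_PATTERN.findall(body):
        # The name must exactly match one of the available tools.
        if name not in available_tools:
            raise ValueError(f"unknown tool: {name!r}")
        # Arguments must be valid JSON; json.loads raises otherwise.
        calls.append((name, json.loads(raw_args)))
    return calls


response = (
    CALLS_BEGIN
    + CALL_BEGIN + "get_weather" + SEP + '{"city": "Paris"}' + CALL_END
    + CALL_BEGIN + "get_time" + SEP + '{"tz": "UTC"}' + CALL_END
    + CALLS_END
)
calls = parse_tool_calls(response, {"get_weather", "get_time"})
```

The parser checks only that arguments are valid JSON; validating them against each tool's parameter schema would need an additional step.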
We support various code agent frameworks. Please refer to the above toolcall format to create your own code agents. An example is shown in assets/code_agent_trajectory.html.
We designed a specific tool-call format for search in thinking mode to support search agents.
For complex questions that require accessing external or up-to-date information, DeepSeek-V3.1 can leverage a user-provided search tool through a multi-turn tool-calling process.
Please refer to the assets/search_tool_trajectory.html and assets/search_python_tool_trajectory.html for the detailed template.
import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "<think>Hmm</think>I am DeepSeek"},
    {"role": "user", "content": "1+1=?"}
]
tokenizer.apply_chat_template(messages, tokenize=False, thinking=True, add_generation_prompt=True)
# '<|begin▁of▁sentence|>You are a helpful assistant<|User|>Who are you?<|Assistant|></think>I am DeepSeek<|end▁of▁sentence|><|User|>1+1=?<|Assistant|><think>'
tokenizer.apply_chat_template(messages, tokenize=False, thinking=False, add_generation_prompt=True)
# '<|begin▁of▁sentence|>You are a helpful assistant<|User|>Who are you?<|Assistant|></think>I am DeepSeek<|end▁of▁sentence|><|User|>1+1=?<|Assistant|></think>'
The model structure of DeepSeek-V3.1 is the same as that of DeepSeek-V3. Please visit the DeepSeek-V3 repo for more information about running this model locally.
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.