---
title: "breeze-7b-instruct"
publisher: "mediatek"
type: "endpoint"
updated: "2025-05-22T19:14:37.861Z"
description: "LLM for improved language comprehension and chatbot-oriented capabilities in Traditional Chinese."
canonical: "https://build.nvidia.com/mediatek/breeze-7b-instruct"
---

# Model Overview

## Description

Breeze-7B-Instruct is derived from the base model Breeze-7B-Base and can be used as-is for common tasks.
The current release version of Breeze-7B is v1.0, which has undergone a more refined training process compared to Breeze-7B-v0_1, resulting in significantly improved performance in both English and Traditional Chinese.

## Third-Party Community Consideration

This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the [Breeze model card](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0).

## License and Terms of use
<b>GOVERNING TERMS</b>: Your use of this API is governed by the <a href="https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf" rel="noreferrer" target="_blank">NVIDIA API Trial Service Terms of Use</a>; and the use of this model is governed by the <a href="https://docs.nvidia.com/ai-foundation-models-community-license.pdf" rel="noreferrer" target="_blank">NVIDIA AI Foundation Models Community License</a>.

**Model Developer:** MediaTek Research<br> 
**Model Release Date:** March 5, 2024.

## Features
- Vocabulary expanded from 32k to 62k tokens to better support Traditional Chinese
- 8k-token context length
- Multi-turn dialogue (without special handling for harmfulness)
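
With an 8k-token context and multi-turn dialogue, a long conversation eventually has to be trimmed before it is sent. The following is a minimal sketch of one way to do that, not a method described by the model authors. It assumes "8k" means 8192 tokens (the card does not give the exact figure). The helper `trim_history`, the `reserve_for_reply` budget, and the default character-count proxy for token counting are all hypothetical; in practice the model's own tokenizer would be passed in as `count_tokens`.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

# Assumption: "8k" is taken to mean 8192 tokens; the card does not state the exact size.
CONTEXT_TOKENS = 8192


def trim_history(
    messages: List[Message],
    reserve_for_reply: int = 1024,  # hypothetical budget kept free for the answer
    count_tokens: Callable[[str], int] = len,  # crude proxy: one token per character
    context_tokens: int = CONTEXT_TOKENS,
) -> List[Message]:
    """Keep the most recent messages whose combined size fits the budget."""
    budget = context_tokens - reserve_for_reply
    kept: List[Message] = []
    used = 0
    # Walk backwards so the newest turns are retained first.
    for message in reversed(messages):
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "a" * 5000},
    {"role": "assistant", "content": "b" * 4000},
    {"role": "user", "content": "c" * 3000},
]
trimmed = trim_history(history)
print([m["role"] for m in trimmed])
```

Dropping the oldest turns first is the simplest policy; summarizing older turns is an alternative when early context matters.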

## Benchmark Performance
The tables below compare Breeze-7B-Instruct-v1_0 with other open-source instruction-tuned language models of similar parameter size that are known for good performance in Chinese.

| Models                                                                                             | #Parameters | ↑ MT-Bench-tw (Score)| TMMLU+ (ACC) | Table (ACC) | MT-Bench (Score) | MMLU (ACC)  | 
|---------------------------------------------------------------------------------------------------------|--------|--------------------|--------------|-------------|------------------|-------------|
|                                                                                                         |        |TC, Chat            |TC, Knowledge |TC, Reasoning|EN, Chat          |EN, Knowledge|
|                                                                                                         |        |0 shot              | 0 shot       | 0 shot      |0 shot            |  0 shot     | 
| [GPT-3.5-Turbo](https://openai.com)                                                                     |        |7.1                 | 43.56        | 45.14       |7.9               |  67.09      |    
| [Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat)                                          | 7B     |6.4                 | 45.65        | 34.72       |7.6               |  61.85      |    
| [**Breeze-7B-Instruct-v1_0**](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0)         | 7B     |6.0                 | 42.67        | 39.58       |7.4               |  61.73     |    
| [Mistral-7B-v0.2-Instruct](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)                   | 7B     |5.6                 | 34.95        | 33.33       |7.6               |    59.97    |                                                  
| [Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)                                                   | 6B     |5.0                 | 44.79        | 25.69       |6.0               |    59.45    |    
| [Taiwan-LLM-13B-v2.0-chat](https://huggingface.co/yentinglin/Taiwan-LLM-13B-v2.0-chat)                  | 13B    |5.0                 | 29.47        | 23.61       |N/A*                |    50.50    |     
| [Taiwan-LLM-7B-v2.1-chat](https://huggingface.co/yentinglin/Taiwan-LLM-7B-v2.1-chat)                    | 7B     |4.2                 | 28.08        | 31.25       |N/A*               |    42.72    |    

\* Taiwan-LLM models respond to multi-turn questions (English) in Traditional Chinese.    

| Details on MT-Bench-tw (0 shot):<br/>Models         | STEM    |Extraction|Reasoning| Math   | Coding  | Roleplay| Writing |Humanities|       AVG    | 
|-----------------------------------------------------|---------|---------|---------|---------|---------|---------|---------|----------|  ---------   | 
| GPT-3.5-Turbo                                       |  7.8    |  6.1    |   5.1   |   6.4   |  6.2    |   8.7   |   7.4   |   9.3    |        7.1   |
| Qwen1.5-7B-Chat                                     |  9      |  5.6    |   4.7   |   2.8   |  3.7    |   8.0   |   8.0   |   9.4    |        6.4   |
| **Breeze-7B-Instruct-v1_0**                         |  7.8    |  5.2    |   4.2   |   4.2   |  4.1    |   7.6   |   5.9   |   9.1    |        6.0   |
| Mistral-7B-v0.2-Instruct                            |  6.9    |  4.6    |   4.3   |   3.3   |  4.4    |   7.2   |   6.2   |   7.8    |        5.6   |                                          
| Yi-6B-Chat                                          |  7.3    |  2.7    |   3.1   |   3.3   |  2.3    |   7.2   |   5.2   |   8.8    |        5.0   |
| Taiwan-LLM-13B-v2.0-chat                            |  6.1    |  3.4    |   4.1   |   2.3   |  3.1    |   7.4   |   6.6   |   6.8    |        5.0   |
| Taiwan-LLM-7B-v2.1-chat                             |  5.2    |  2.6    |   2.3   |   1.2   |  3.4    |   6.6   |   5.7   |   6.8    |        4.2   |
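
The AVG column in the MT-Bench-tw table above appears consistent with an unweighted mean of the eight category scores, rounded to one decimal place. The card does not state how AVG was computed, so this is an observation, not a documented method. A short check using values copied from the table:

```python
# Category scores copied from the MT-Bench-tw table, in column order:
# STEM, Extraction, Reasoning, Math, Coding, Roleplay, Writing, Humanities.
scores = {
    "GPT-3.5-Turbo": [7.8, 6.1, 5.1, 6.4, 6.2, 8.7, 7.4, 9.3],
    "Qwen1.5-7B-Chat": [9, 5.6, 4.7, 2.8, 3.7, 8.0, 8.0, 9.4],
    "Breeze-7B-Instruct-v1_0": [7.8, 5.2, 4.2, 4.2, 4.1, 7.6, 5.9, 9.1],
    "Mistral-7B-v0.2-Instruct": [6.9, 4.6, 4.3, 3.3, 4.4, 7.2, 6.2, 7.8],
    "Yi-6B-Chat": [7.3, 2.7, 3.1, 3.3, 2.3, 7.2, 5.2, 8.8],
    "Taiwan-LLM-13B-v2.0-chat": [6.1, 3.4, 4.1, 2.3, 3.1, 7.4, 6.6, 6.8],
    "Taiwan-LLM-7B-v2.1-chat": [5.2, 2.6, 2.3, 1.2, 3.4, 6.6, 5.7, 6.8],
}

# AVG values as reported in the same table.
reported_avg = {
    "GPT-3.5-Turbo": 7.1,
    "Qwen1.5-7B-Chat": 6.4,
    "Breeze-7B-Instruct-v1_0": 6.0,
    "Mistral-7B-v0.2-Instruct": 5.6,
    "Yi-6B-Chat": 5.0,
    "Taiwan-LLM-13B-v2.0-chat": 5.0,
    "Taiwan-LLM-7B-v2.1-chat": 4.2,
}


def mean_score(values):
    """Unweighted mean of the category scores, rounded to one decimal place."""
    return round(sum(values) / len(values), 1)


for model, values in scores.items():
    print(f"{model}: computed {mean_score(values)}, reported {reported_avg[model]}")
```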

| Details on TMMLU+ (0 shot):<br/>Model               | STEM         | Social Science | Humanities | Other      |   AVG   |
|-----------------------------------------------------|--------------|----------------|------------|------------|---------|
| GPT-3.5-Turbo                                       | 41.58        | 48.52          | 40.96      | 43.18      | 43.56   |
| Qwen1.5-7B-Chat                                     | 41.48        | 51.66          | 44.05      | 45.40      | 45.65   |
| **Breeze-7B-Instruct-v1_0**                         | 36.46        | 48.38          | 45.11      | 40.75      | 42.67   |
| Mistral-7B-v0.2-Instruct                            | 32.79        | 38.05          | 34.89      | 34.04      | 34.94   |
| Yi-6B-Chat                                          | 37.80        | 51.74          | 45.36      | 44.25      | 44.79   |
| Taiwan-LLM-13B-v2.0-chat                            | 27.74        | 33.69          | 27.03      | 29.43      | 29.47   |
| Taiwan-LLM-7B-v2.1-chat                             | 25.58        | 31.76          | 27.36      | 27.61      | 28.08   |

**Model Architecture** 
* Architecture Type: Causal decoder-only transformer language model
* Network Architecture: Mistral-7B

**Input** 
* Input Type: Text
* Input Format: String
* Input Parameters: max_tokens, temperature, top_p, stop, frequency_penalty, presence_penalty, seed

**Output** 
* Output Type: Text
* Output Format: String
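
The input parameters listed above map onto fields in the JSON body of a chat completions request. The sketch below assembles such a body and leaves out any parameter that was not set, so that the service defaults apply. `build_payload` is a hypothetical helper, the model identifier is the one used in this card's curl example, and no value ranges are validated because this card does not specify them.

```python
import json
from typing import Any, Dict, Optional, Sequence


def build_payload(
    prompt: str,
    *,
    model: str = "mediatek/breeze-7b-instruct",
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    stop: Optional[Sequence[str]] = None,
    frequency_penalty: Optional[float] = None,
    presence_penalty: Optional[float] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a request body for a single-turn chat, skipping unset parameters."""
    payload: Dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    optional = {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stop": list(stop) if stop is not None else None,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
        "seed": seed,
    }
    # Only include parameters the caller actually provided.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


# Illustrative values only; tune them for your use case.
payload = build_payload("Hello", max_tokens=256, temperature=0.2, seed=42)
print(json.dumps(payload, indent=2))
```

Fixing `seed` is typically used to make sampling more repeatable; whether this endpoint guarantees identical outputs for a given seed is not stated in this card.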

## Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

## Software Integration
* Supported Hardware Platform(s): Lovelace

**Supported Operating System(s):**
* Linux

## Model Version

Breeze-7B-Instruct-v1_0

## Inference

**Engine:** Triton + TensorRT-LLM <br>
**Test Hardware:** L40

## Prototype

```python
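import os

from openai import OpenAI

# Read the API key from the environment instead of hard-coding it.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

completion = client.chat.completions.create(
    model="mediatek/breeze-7b-instruct",
    # Example prompt; replace with your own.
    messages=[{"role": "user", "content": "Introduce Taiwan's night markets in Traditional Chinese."}],
    # Illustrative sampling settings; tune them for your use case.
    temperature=0.5,
    top_p=1.0,
    max_tokens=1024,
    stream=False,
)

print(completion.choices[0].message.content)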
```

```python
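import os

from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Read the API key from the environment instead of hard-coding it.
client = ChatNVIDIA(
    model="mediatek/breeze-7b-instruct",
    api_key=os.environ["NVIDIA_API_KEY"],
    # Illustrative sampling settings; tune them for your use case.
    temperature=0.5,
    top_p=1.0,
    max_tokens=1024,
)

# Example prompt; replace with your own.
response = client.invoke([{"role": "user", "content": "Introduce Taiwan's night markets in Traditional Chinese."}])
print(response.content)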
```

```javascript
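import OpenAI from 'openai';

// Read the API key from the environment instead of hard-coding it.
const openai = new OpenAI({
  apiKey: process.env.NVIDIA_API_KEY,
  baseURL: 'https://integrate.api.nvidia.com/v1',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'mediatek/breeze-7b-instruct',
    // Example prompt; replace with your own.
    messages: [{ role: 'user', content: "Introduce Taiwan's night markets in Traditional Chinese." }],
    // Illustrative sampling settings; tune them for your use case.
    temperature: 0.5,
    top_p: 1,
    max_tokens: 1024,
    stream: false,
  });

  // Fall back to an empty string if the response has no content.
  process.stdout.write(completion.choices[0]?.message?.content ?? '');
}

main();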
```

```bash
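# The prompt and sampling values below are illustrative; replace them with your own.
curl https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -d '{
    "model": "mediatek/breeze-7b-instruct",
    "messages": [{"role":"user","content":"Introduce the night markets of Taiwan in Traditional Chinese."}],
    "temperature": 0.5,
    "top_p": 1,
    "max_tokens": 1024,
    "stream": false
  }'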
```