---
title: "eurollm-9b-instruct"
publisher: "utter-project"
type: "endpoint"
updated: "2025-06-17T13:21:52.252Z"
description: "State-of-the-art, multilingual model tailored to all 24 official European Union languages."
canonical: "https://build.nvidia.com/utter-project/eurollm-9b-instruct"
---

# EuroLLM-9B-Instruct Overview

## Description:
The EuroLLM project aims to create a suite of LLMs capable of understanding and generating text in all European Union languages as well as some additional relevant languages.
EuroLLM-9B-Instruct is a 9.154 billion parameter multilingual transformer language model developed to understand and generate text across 35 languages, including all 24 official European Union languages and 11 additional languages. It is instruction-tuned on the EuroBlocks dataset, focusing on general instruction-following and machine translation tasks.  

This model is ready for commercial and non-commercial use.

## Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see the [EuroLLM-9B-Instruct Model Card](https://huggingface.co/utter-project/EuroLLM-9B-Instruct).

### License and Terms of Use:
GOVERNING TERMS: The trial service is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf); and the use of this model is governed by the [NVIDIA Community Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-community-models-license/). ADDITIONAL INFORMATION: [Apache License Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md).

### Deployment Geography:
Global

### Use Case:
Designed for multilingual applications such as machine translation, conversational AI, and general-purpose instruction-following tasks across diverse languages.

### Release Date:
- Hugging Face: December 2024 via [link](https://huggingface.co/utter-project/EuroLLM-9B-Instruct)
- Build.NVIDIA.com: 05/14/2025 via [link](https://build.nvidia.com/utter-project/eurollm-9b-instruct)

## Reference(s):
- [arXiv:2202.03799](https://arxiv.org/abs/2202.03799)  
- [arXiv:2402.17733](https://arxiv.org/abs/2402.17733)  
- [arXiv:2506.04079](https://arxiv.org/abs/2506.04079)

## Model Architecture:
- **Architecture Type:** Transformer  
- **Network Architecture:** Dense Transformer with Grouped Query Attention (GQA)  
- **Base Model:** EuroLLM-9B  
- **Model Parameters:** 9.154 billion
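
To make the effect of Grouped Query Attention concrete, the sketch below estimates the key/value cache size for one full-length sequence, using the values from the hyper-parameter table under Additional Details (42 layers, 32 query heads, 8 KV heads, embedding size 4,096, sequence length 4,096). It assumes a head dimension of embedding size / number of heads = 128 and 2 bytes per cached value (BF16). The function name and both assumptions are illustrative and do not come from the model authors; real inference engines may allocate memory differently.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate KV-cache size for one sequence: keys plus values in every layer."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Values taken from the hyper-parameter table in this card.
EMBEDDING_SIZE = 4096
NUM_HEADS = 32
NUM_KV_HEADS = 8
NUM_LAYERS = 42
SEQ_LEN = 4096

# Assumption: each head has embedding_size / num_heads dimensions.
HEAD_DIM = EMBEDDING_SIZE // NUM_HEADS

gqa_bytes = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)
mha_bytes = kv_cache_bytes(NUM_LAYERS, NUM_HEADS, HEAD_DIM, SEQ_LEN)

print(f"GQA (8 KV heads):  {gqa_bytes / 2**20:.0f} MiB")
print(f"MHA (32 KV heads): {mha_bytes / 2**20:.0f} MiB")
print(f"Reduction factor:  {mha_bytes // gqa_bytes}x")
```

Under these assumptions, sharing each KV head across four query heads cuts the cache to a quarter of what standard multi-head attention would need.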

## Input:
- **Input Type(s):** Text  
- **Input Format(s):** String  
- **Input Parameters:** 1D
- **Other Properties Related to Input:** Maximum sequence length of 4,096 tokens; tokenized using a custom tokenizer designed for multilingual support.

## Output:
- **Output Type(s):** Text  
- **Output Format:** String  
- **Output Parameters:** 1D
- **Other Properties Related to Output:** NA

## Software Integration:
## Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere
- NVIDIA Blackwell
- NVIDIA Hopper
- NVIDIA Lovelace
- NVIDIA Pascal

## Operating System(s):
- Linux

## Model Version(s):
- EuroLLM-9B-Instruct v1.0

# Training, Testing, and Evaluation Datasets:

## Training Dataset:
- **Data Collection Method by dataset:** Hybrid: Automated, Human, Synthetic 
- **Labeling Method by dataset:** Hybrid: Automated, Human 
- **Properties:** Trained on 4 trillion tokens across 35 languages.  

## Testing Dataset: 
- **Data Collection Method by dataset:** Hybrid: Automated, Human, Synthetic   
- **Labeling Method by dataset:** Hybrid: Automated, Human  
- **Properties:** Undisclosed

## Evaluation Dataset:
- **Benchmark Score:** EuroLLM-9B-Instruct demonstrates competitive performance on multilingual benchmarks, surpassing many European-developed models and matching the performance of models like Mistral-7B.  
- **Data Collection Method by dataset:** Undisclosed

## Inference:
- **Engine:** TensorRT-LLM  
- **Test Hardware:** NVIDIA Lovelace L40S

## Additional Details:
For pre-training, we use 400 NVIDIA H100 GPUs of the MareNostrum 5 supercomputer, training the model with a constant batch size of 2,800 sequences (approximately 12 million tokens), using the Adam optimizer and BF16 precision.
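
The batch-size figure can be checked with a few lines of arithmetic. This is a rough sketch that assumes every sequence is filled to the full 4,096-token length and that all 4 trillion training tokens (from the Training Dataset section) are consumed at this constant batch size; the variable names are illustrative.

```python
SEQUENCES_PER_BATCH = 2_800
SEQUENCE_LENGTH = 4_096               # tokens per sequence
TRAINING_TOKENS = 4_000_000_000_000   # 4 trillion tokens

# Tokens processed in one batch if every sequence is full length.
tokens_per_batch = SEQUENCES_PER_BATCH * SEQUENCE_LENGTH

# Rough number of optimizer steps needed to see all training tokens once.
approx_steps = TRAINING_TOKENS / tokens_per_batch

print(f"{tokens_per_batch:,} tokens per batch")
print(f"about {approx_steps:,.0f} optimizer steps")
```

The per-batch total comes to roughly 11.5 million tokens, in line with the "approximately 12 million" quoted above.
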
Here is a summary of the model hyper-parameters:

| Hyper-parameter                      | Value                |
|--------------------------------------|----------------------|
| Sequence Length                      |      4,096           |
| Number of Layers                     |         42           |
| Embedding Size                       |           4,096      |
| FFN Hidden Size                      |            12,288    |
| Number of Heads                      |        32            |
| Number of KV Heads (GQA)             |         8            |
| Activation Function                  | SwiGLU               |
| Position Encodings                   | RoPE (\Theta=10,000) |
| Layer Norm                           | RMSNorm              |
| Tied Embeddings                      | No                   |
| Embedding Parameters                 | 0.524B               |
| LM Head Parameters                   | 0.524B               |
| Non-embedding Parameters             | 8.105B               |
| Total Parameters                     | 9.154B               |
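
As a sanity check, the non-embedding parameter count can be approximated from the table. The sketch below assumes a head dimension of 4,096 / 32 = 128, no bias terms, three weight matrices per SwiGLU feed-forward block (gate, up, down), and two RMSNorm weight vectors per layer plus one final norm. These layout details are assumptions for illustration, not statements from the model authors.

```python
EMBEDDING_SIZE = 4096
FFN_HIDDEN = 12288
NUM_LAYERS = 42
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = EMBEDDING_SIZE // NUM_HEADS  # assumption: 128


def layer_params():
    """Estimated weights in one transformer layer (no biases assumed)."""
    q_and_o = 2 * EMBEDDING_SIZE * NUM_HEADS * HEAD_DIM     # query and output projections
    k_and_v = 2 * EMBEDDING_SIZE * NUM_KV_HEADS * HEAD_DIM  # smaller under GQA
    ffn = 3 * EMBEDDING_SIZE * FFN_HIDDEN                   # SwiGLU: gate, up, down
    norms = 2 * EMBEDDING_SIZE                              # two RMSNorm weight vectors
    return q_and_o + k_and_v + ffn + norms


# All layers plus one final RMSNorm.
non_embedding = NUM_LAYERS * layer_params() + EMBEDDING_SIZE

# Embedding and LM head sizes are taken from the table (0.524B each, untied).
embedding_and_head = 2 * 0.524e9
total = non_embedding + embedding_and_head

print(f"non-embedding: {non_embedding / 1e9:.3f}B")
print(f"total:         {total / 1e9:.3f}B")
```

The estimate lands close to, but not exactly on, the 8.105B and 9.154B figures in the table; the small gap may come from rounding or from details this sketch does not capture.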

### Run the model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-9B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": "You are EuroLLM --- an AI assistant specialized in European languages that provides safe, educational and helpful answers.",
    },
    {
        "role": "user",
        "content": "What is the capital of Portugal? How would you describe it?",
    },
]

inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
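
The generation call above requests up to 1,024 new tokens, while the Input section lists a maximum sequence length of 4,096 tokens. A minimal sketch of a budget check follows, assuming that the 4,096-token limit covers the prompt and the generated tokens together; the helper names are hypothetical and are not part of `transformers`.

```python
MAX_SEQUENCE_LENGTH = 4096  # from the Input section of this card


def max_prompt_tokens(max_new_tokens, context_limit=MAX_SEQUENCE_LENGTH):
    """Largest prompt (in tokens) that still leaves room for max_new_tokens."""
    if not 0 < max_new_tokens < context_limit:
        raise ValueError("max_new_tokens must be between 1 and context_limit - 1")
    return context_limit - max_new_tokens


def fits_in_context(prompt_token_count, max_new_tokens, context_limit=MAX_SEQUENCE_LENGTH):
    """True if the prompt plus the requested generation stays within the limit."""
    return prompt_token_count + max_new_tokens <= context_limit


print(max_prompt_tokens(1024))
print(fits_in_context(3500, 1024))
```

If `inputs` is a tensor, as in the snippet above, `inputs.shape[-1]` gives the prompt token count to pass to `fits_in_context`.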

### Results

#### EU Languages

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63f33ecc0be81bdc5d903466/ob_1sLM8c7dxuwpv6AAHA.png)
**Table 1:** Comparison of open-weight LLMs on multilingual benchmarks. The Borda count corresponds to the average ranking of the models (see [Colombo et al., 2022](https://arxiv.org/abs/2202.03799)). For Arc-challenge, Hellaswag, and MMLU we are using Okapi datasets ([Lai et al., 2023](https://aclanthology.org/2023.emnlp-demo.28/)) which include 11 languages. For MMLU-Pro and MUSR we translate the English version with Tower ([Alves et al., 2024](https://arxiv.org/abs/2402.17733)) to 6 EU languages.  
\* As there are no public versions of the pre-trained models, we evaluated them using the post-trained versions.

The results in Table 1 highlight EuroLLM-9B's superior performance on multilingual tasks compared to other European-developed models (as shown by the Borda count of 1.0), as well as its strong competitiveness with non-European models, achieving results comparable to Gemma-2-9B and outperforming the rest on most benchmarks.
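
The Borda count described in the Table 1 caption, an average ranking across benchmarks, can be illustrated with a short sketch. The model names and scores below are made-up toy values, not results from Table 1, and the function is a simplified reading of the caption: it does not handle ties, and the exact procedure in Colombo et al. (2022) may differ.

```python
def average_rank(scores):
    """Mean rank per model across benchmarks (rank 1 = best, higher score = better).

    scores: {benchmark_name: {model_name: score}}
    """
    models = sorted(next(iter(scores.values())))
    totals = {model: 0.0 for model in models}
    for per_model in scores.values():
        # Order models from best to worst on this benchmark.
        ordered = sorted(per_model, key=per_model.get, reverse=True)
        for rank, model in enumerate(ordered, start=1):
            totals[model] += rank
    return {model: totals[model] / len(scores) for model in models}


# Toy data for illustration only.
toy_scores = {
    "benchmark_1": {"model_a": 0.71, "model_b": 0.64, "model_c": 0.58},
    "benchmark_2": {"model_a": 0.55, "model_b": 0.49, "model_c": 0.52},
    "benchmark_3": {"model_a": 0.80, "model_b": 0.77, "model_c": 0.61},
}

ranks = average_rank(toy_scores)
for model, rank in ranks.items():
    print(f"{model}: {rank:.2f}")
```

Under this reading, an average rank of 1.0 means the model placed first on every benchmark in the comparison group.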

#### English

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63f33ecc0be81bdc5d903466/EfilsW_p-JA13mV2ilPkm.png)

**Table 2:** Comparison of open-weight LLMs on English general benchmarks.    
\*  As there are no public versions of the pre-trained models, we evaluated them using the post-trained versions.

The results in Table 2 demonstrate EuroLLM's strong performance on English tasks, surpassing most European-developed models and matching the performance of Mistral-7B (obtaining the same Borda count).

### Bias, Risks, and Limitations

This model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text. It may also produce socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive. Developers should implement appropriate safety measures and conduct thorough evaluations before deploying the model in production environments.

## Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications.  When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.  

Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

## Prototype

```python
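import os

from openai import OpenAI

# The API key is read from the NVIDIA_API_KEY environment variable.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

completion = client.chat.completions.create(
    # Model ID taken from this page's publisher and title; confirm it in the API catalog.
    model="utter-project/eurollm-9b-instruct",
    messages=[{"role": "user", "content": "What is the capital of Portugal? How would you describe it?"}],
    # Illustrative sampling values, not recommendations from the source; tune for your use case.
    temperature=0.2,
    top_p=0.7,
    max_tokens=1024,
    stream=False,
)

print(completion.choices[0].message.content)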
```

```python
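import os

from langchain_nvidia_ai_endpoints import ChatNVIDIA

client = ChatNVIDIA(
    # Model ID taken from this page's publisher and title; confirm it in the API catalog.
    model="utter-project/eurollm-9b-instruct",
    # The API key is read from the NVIDIA_API_KEY environment variable.
    api_key=os.environ["NVIDIA_API_KEY"],
    # Illustrative sampling values, not recommendations from the source.
    temperature=0.2,
    top_p=0.7,
    max_tokens=1024,
)

response = client.invoke([{"role": "user", "content": "What is the capital of Portugal? How would you describe it?"}])
print(response.content)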
```

```javascript
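import OpenAI from 'openai';

// The API key is read from the NVIDIA_API_KEY environment variable.
const openai = new OpenAI({
  apiKey: process.env.NVIDIA_API_KEY,
  baseURL: 'https://integrate.api.nvidia.com/v1',
});

async function main() {
  const completion = await openai.chat.completions.create({
    // Model ID taken from this page's publisher and title; confirm it in the API catalog.
    model: 'utter-project/eurollm-9b-instruct',
    messages: [{ role: 'user', content: 'What is the capital of Portugal? How would you describe it?' }],
    // Illustrative sampling values, not recommendations from the source.
    temperature: 0.2,
    top_p: 0.7,
    max_tokens: 1024,
    stream: false,
  });

  // Fall back to an empty string if the response has no message content.
  process.stdout.write(completion.choices[0]?.message?.content ?? '');
}

main();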
```

```bash
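invoke_url='https://integrate.api.nvidia.com/v1/chat/completions'

# The API key is read from the NVIDIA_API_KEY environment variable.
authorization_header="Authorization: Bearer $NVIDIA_API_KEY"
accept_header='Accept: application/json'
content_type_header='Content-Type: application/json'

# Model ID taken from this page's publisher and title; confirm it in the API catalog.
data=$'{
  "model": "utter-project/eurollm-9b-instruct",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of Portugal? How would you describe it?"
    }
  ]
}'

# -i includes response headers; -w appends the HTTP status code on a final line.
response=$(curl --silent -i -w "\n%{http_code}" --request POST \
  --url "$invoke_url" \
  --header "$authorization_header" \
  --header "$accept_header" \
  --header "$content_type_header" \
  --data "$data"
)

echo "$response"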
```