
Multilingual 7B LLM, instruction-tuned on all 24 EU languages for stable, culturally aligned output.
Teuken-7B-instruct-commercial-v0.4 generates text as an instruction-tuned 7B-parameter multilingual large language model (LLM) pre-trained on 4 trillion tokens across all 24 official European languages. This model is specifically designed to provide more stable and culturally relevant results across these languages compared to models primarily focused on English.
This model is ready for commercial use.
This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case. See the Non-NVIDIA openGPT-X/Teuken-7B-instruct-commercial-v0.4 model card.
GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. The model is governed by the NVIDIA Community Model License Agreement; ADDITIONAL INFORMATION: Apache License Version 2.0.
You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.
Deployment Geography: Global
Teuken-7B-instruct-commercial-v0.4 is designed for a range of multilingual natural language processing tasks, with a strong emphasis on serving both commercial and research needs across the 24 official languages of the European Union. Its primary use case is to provide a powerful and culturally-aware language model that performs reliably across European languages, which are often underrepresented in other large language models.
Specific use cases include multilingual chat and instruction following across the EU's official languages.
Limitations
The developers explicitly state that this model is not intended for tasks that require mathematical reasoning or code generation. Its core strength lies in its linguistic capabilities across its trained languages.
Hugging Face: 10/25/2024 via openGPT-X/Teuken-7B-instruct-commercial-v0.4
Build.NVIDIA.com: 07/25/2025 via link
Architecture Type: Transformer
Network Architecture: Teuken-7B-Instruct
This model was developed based on openGPT-X/Teuken-7B-base (see openGPT-X/Teuken-7B-instruct-commercial-v0.4 on Hugging Face). It is an instruction-tuned version of its base model, fine-tuned on a high-quality multilingual dataset using the axolotl framework. Key design choices in the training process focused on optimizing multilingual performance.
Input Type(s): Text
Input Format(s): Strings
Input Parameters: One-Dimensional (1D)
Other Properties Related to Input: Max Input Tokens: 4,096
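Since the 4,096-token limit applies to token counts rather than characters, prompts may need to be clipped before generation. A minimal sketch of such clipping, using a plain token-id list in place of the real tokenizer output (the helper name is illustrative, not part of the model's API):

```python
MAX_INPUT_TOKENS = 4096

def truncate_to_limit(token_ids, limit=MAX_INPUT_TOKENS):
    """Keep only the most recent `limit` tokens so the prompt fits the context window."""
    if len(token_ids) <= limit:
        return token_ids
    # Drop the oldest tokens; the end of the prompt is usually the most relevant.
    return token_ids[-limit:]
```

In practice you would apply this to the ids returned by the tokenizer before calling `model.generate`.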
Output Type(s): Text
Output Format(s): Strings
Output parameters: One-Dimensional (1D)
Other Properties Related to Output: Max Output Tokens: 4,096
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Runtime Engine: vLLM, TensorRT
Linux
Teuken-7B-instruct-commercial-v0.4
The model requires a few libraries that can be installed in your Python environment:

```shell
python -m pip install numpy torch huggingface_hub transformers sentencepiece
```

After installation, here's an example of how to use the model:
As this model is a fine-tuned model, it must be used with the provided prompt template. Using the model without the prompt template is not intended and is not recommended. The prompt template is defined as follows:
```python
user = "Hi!"
lang_code = "DE"
system_messages = {
    "EN": "A chat between a human and an artificial intelligence assistant."
    " The assistant gives helpful and polite answers to the human's questions.",
    "DE": "Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz."
    " Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen.",
}
prompt = f"System: {system_messages[lang_code]}\nUser: {user}\nAssistant:"
```

The prompt template is also directly integrated into the tokenizer and can be used as follows:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_name = "openGPT-X/Teuken-7B-instruct-commercial-v0.4"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
model = model.to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    use_fast=False,
    trust_remote_code=True,
)
messages = [{"role": "User", "content": "Wer bist du?"}]
prompt_ids = tokenizer.apply_chat_template(
    messages,
    chat_template="DE",
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)
prediction = model.generate(
    prompt_ids.to(model.device),
    max_length=512,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    temperature=0.7,
    num_return_sequences=1,
)
prediction_text = tokenizer.decode(prediction[0].tolist())
print(prediction_text)
```

This example demonstrates how to load the model and tokenizer, prepare input, generate text, and print the result.
Starting the vLLM Server:
```shell
vllm serve openGPT-X/Teuken-7B-instruct-commercial-v0.4 --trust-remote-code
```

Use the Chat API with vLLM and pass the language of the chat template as extra body:
```python
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
)
completion = client.chat.completions.create(
    model="openGPT-X/Teuken-7B-instruct-commercial-v0.4",
    messages=[{"role": "User", "content": "Hallo"}],
    extra_body={"chat_template": "DE"},
)
print(f"Assistant: {completion.choices[0].message.content}")
```

The default language of the chat template can also be set when starting the vLLM server. To do this, create a new file named lang with the content DE and start the vLLM server as follows:
```shell
vllm serve openGPT-X/Teuken-7B-instruct-commercial-v0.4 --trust-remote-code --chat-template lang
```

Link: Undisclosed
Data Collection Method by dataset: Hybrid: Human, Automated
Labeling Method by dataset: Human
Properties: Undisclosed
Results on multilingual benchmarks for 21 European languages with instruction-tuned models:
| Model | Avg. | EU21-ARC | EU21-HeSw | EU21-TQA | EU21-MMLU |
|---|---|---|---|---|---|
| Meta-Llama-3.1-8B-Instruct | 0.563 | 0.563 | 0.579 | 0.532 | 0.576 |
| Mistral-7B-Instruct-v0.3 | 0.527 | 0.530 | 0.538 | 0.548 | 0.491 |
| Salamandra-7B-Instruct | 0.543 | 0.595 | 0.637 | 0.482 | 0.459 |
| Aya-23-8B | 0.485 | 0.475 | 0.535 | 0.476 | 0.455 |
| Occiglot-7B-eu5-Instruct | 0.475 | 0.484 | 0.519 | 0.471 | 0.428 |
| Pharia-1-LLM-7B-C-A | 0.417 | 0.396 | 0.438 | 0.469 | 0.366 |
| Bloomz-7B1 | 0.358 | 0.316 | 0.354 | 0.461 | 0.302 |
| Teuken-7B-instruct-commercial-v0.4 | 0.531 | 0.569 | 0.620 | 0.503 | 0.430 |
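As a quick sanity check, each "Avg." entry is consistent with the mean of the four EU21 benchmark scores, within rounding. A sketch using two rows from the table above:

```python
# Reported averages alongside the four per-benchmark scores
# (EU21-ARC, EU21-HeSw, EU21-TQA, EU21-MMLU).
reported = {
    "Teuken-7B-instruct-commercial-v0.4": (0.531, [0.569, 0.620, 0.503, 0.430]),
    "Meta-Llama-3.1-8B-Instruct": (0.563, [0.563, 0.579, 0.532, 0.576]),
}
for model, (avg, scores) in reported.items():
    mean = sum(scores) / len(scores)
    # The table rounds to three decimals, so allow a small tolerance.
    assert abs(mean - avg) < 0.001, model
    print(f"{model}: mean={mean:.4f}, reported={avg}")
```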
Acceleration Engine: vLLM
Test Hardware:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.