speakleash
bielik-11b-v2.3-instruct
State-of-the-art model for Polish language processing tasks such as text generation, Q&A, and chatbots.

Bielik-11B-v2.3-Instruct Overview
Description:
Bielik-11B-v2.3-Instruct is a generative text model with 11 billion parameters, designed to process and understand the Polish language with high precision. It is a linear merge of the Bielik-11B-v2.0-Instruct, Bielik-11B-v2.1-Instruct, and Bielik-11B-v2.2-Instruct models, which are instruction-tuned versions of Bielik-11B-v2. As a result, the model provides accurate responses and performs a wide range of Polish-language tasks. This model is ready for commercial and non-commercial use.
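A linear merge can be pictured as a weighted average of the parent models' weights. The sketch below is illustrative only: it assumes the three parent checkpoints (named above) share identical architectures and uses equal placeholder weights, not the actual SpeakLeash merge recipe or coefficients.

```python
# Illustrative sketch of a linear (weighted-average) merge of model weights.
# Assumptions: the three parent checkpoints share identical parameter names;
# the equal weights below are placeholders, not the official merge recipe.
import torch
from transformers import AutoModelForCausalLM

parents = {
    "speakleash/Bielik-11B-v2.0-Instruct": 1 / 3,
    "speakleash/Bielik-11B-v2.1-Instruct": 1 / 3,
    "speakleash/Bielik-11B-v2.2-Instruct": 1 / 3,
}

merged_state = None
for name, weight in parents.items():
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
    state = model.state_dict()
    if merged_state is None:
        merged_state = {k: weight * v.float() for k, v in state.items()}
    else:
        for k, v in state.items():
            merged_state[k] += weight * v.float()
    del model, state  # free memory before loading the next parent

# Reload one parent as a skeleton and write the averaged weights into it.
merged = AutoModelForCausalLM.from_pretrained(
    "speakleash/Bielik-11B-v2.0-Instruct", torch_dtype=torch.bfloat16
)
merged.load_state_dict({k: v.to(torch.bfloat16) for k, v in merged_state.items()})
merged.save_pretrained("bielik-11b-linear-merge")
```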
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see link to Non-NVIDIA Bielik-11B-v2.3-Instruct Model Card.
License/Terms of Use:
GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Community Model License. Additional Information: Apache 2.0; Bielik Terms of Use.
Deployment Geography:
Global (primarily Polish language, with English capabilities).
Use Case:
This model can be utilized for a variety of Polish language processing tasks, such as text generation, question answering, language modeling, and chatbots. Its strong performance on benchmarks suggests it could be a valuable resource for natural language processing projects and applications targeting the Polish market or requiring high-quality Polish language understanding and generation. It can also be used for English language tasks.
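As a quick orientation, the snippet below sketches how the hosted endpoint on build.nvidia.com is typically queried through an OpenAI-compatible client. The base URL, model identifier, and sampling parameters are assumptions drawn from the usual NVIDIA API pattern; consult the endpoint's own documentation before relying on them.

```python
# Minimal sketch of querying the hosted model via an OpenAI-compatible client.
# Assumptions: endpoint URL and model id follow the usual build.nvidia.com
# conventions; NVIDIA_API_KEY holds a valid trial key.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

response = client.chat.completions.create(
    model="speakleash/bielik-11b-v2.3-instruct",
    messages=[
        # Polish prompt: "Explain in two sentences what photosynthesis is."
        {"role": "user", "content": "Wyjaśnij w dwóch zdaniach, czym jest fotosynteza."},
    ],
    temperature=0.2,
    max_tokens=256,
)
print(response.choices[0].message.content)
```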
Release Date:
- Build.Nvidia.com 05/28/2025 via https://build.nvidia.com/speakleash/bielik-11b-v2_3-instruct
- Hugging Face 08/30/2024 via https://huggingface.co/speakleash/Bielik-11B-v2.3-Instruct
Reference(s):
- Technical report: https://arxiv.org/abs/2505.02410
- Chat Arena: https://arena.speakleash.org.pl/
- ALLaMo framework: Implemented by Krzysztof Ociepa for training LLaMA and Mistral-like models.
Model Architecture:
- Architecture Type: Causal decoder-only (Transformer-based)
- Network Architecture: Mistral-based (LLaMA-like) transformer.
- This model was developed as a linear merge of Bielik-11B-v2.0-Instruct, Bielik-11B-v2.1-Instruct, and Bielik-11B-v2.2-Instruct, which are fine-tuned versions of Bielik-11B-v2. Bielik-11B-v2 was initialized from Mistral-7B-v0.2.
- This model has 11 billion parameters.
Input:
- Input Type(s): Text
- Input Format(s): String
- Input Parameters: One-Dimensional (1D) for text.
- Other Properties Related to Input:
- Context Length: 32,768 tokens natively.
Output:
- Output Type(s): Text
- Output Format: String.
- Output Parameters: One-Dimensional (1D) for text.
- Other Properties Related to Output:
- Languages: Polish, English.
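To make the text-in/text-out interface concrete, a minimal local-inference sketch with Hugging Face transformers is shown below. It assumes the tokenizer ships a chat template and that enough GPU memory is available for an 11B-parameter model; the generation settings are illustrative.

```python
# Minimal local-inference sketch with Hugging Face transformers.
# Assumptions: sufficient GPU memory for an 11B-parameter model in bfloat16,
# and a chat template bundled with the tokenizer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "speakleash/Bielik-11B-v2.3-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Polish prompt: "Write a short poem about the Vistula river."
messages = [{"role": "user", "content": "Napisz krótki wiersz o Wiśle."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```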
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engine(s):
- TensorRT-LLM 0.17.2
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Lovelace
- NVIDIA Hopper
Supported Operating System(s):
- Linux
- Windows
- macOS (especially with GGUF and llama.cpp support)
Model Version(s):
Bielik-11B-v2.3-Instruct. Other related versions include GGUF, GPTQ, and FP8 quantized variants.
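The GGUF quantizations can be run with llama.cpp or its Python bindings. The sketch below assumes the llama-cpp-python package and uses an illustrative quantized filename; the exact GGUF artifact names in the SpeakLeash repositories may differ.

```python
# Sketch of running a GGUF quantization with the llama-cpp-python bindings.
# Assumptions: the GGUF file has been downloaded locally; the filename below
# is hypothetical, not the exact released artifact name.
from llama_cpp import Llama

llm = Llama(
    model_path="Bielik-11B-v2.3-Instruct.Q4_K_M.gguf",  # hypothetical local path
    n_ctx=8192,        # context window to allocate (model supports up to 32k)
    n_gpu_layers=-1,   # offload all layers to GPU if built with GPU support
)

result = llm.create_chat_completion(
    # Polish prompt: "Summarize the advantages of the Bielik model in three points."
    messages=[{"role": "user", "content": "Streść w trzech punktach zalety modelu Bielik."}],
    max_tokens=256,
    temperature=0.3,
)
print(result["choices"][0]["message"]["content"])
```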
Training, Testing, and Evaluation Datasets:
Training Dataset:
- The model was fine-tuned on a dataset comprising over 20 million instructions (over 10 billion tokens). The DPO-Positive (DPO-P) fine-tuning used a dataset of over 66,000 examples. The base model Bielik-11B-v2 was trained on 400 billion tokens from Polish text corpora (SpeakLeash project) and English texts (SlimPajama dataset).
- Data Collection Method by dataset:
- Hybrid: Automated, Human, Synthetic
- Labeling Method by dataset:
- Hybrid: Automated, Human, Synthetic
- Properties (Quantity, Dataset Descriptions, Sensor(s)):
- Instruction fine-tuning: >20 million instructions, >10 billion tokens.
- DPO-P fine-tuning: >66,000 examples.
- Pre-training (Bielik-11B-v2): 400 billion tokens. For context, the SpeakLeash dataset used for the related Bielik v3 models comprised 237 billion Polish tokens and was supplemented with English texts from SlimPajama, totaling 292 billion tokens from 303 million documents.
Testing Dataset:
- Data Collection Method by dataset: Undisclosed
- Labeling Method by dataset: Undisclosed
- Properties (Quantity, Dataset Descriptions, Sensor(s)): Undisclosed
Evaluation Dataset:
- Benchmark Score:
- Open PL LLM Leaderboard (5-shot): 65.71 (outperforms models <70B parameters, competitive with 70B models).
- Polish MT-Bench: 8.556250 (outperforms GPT-3.5-turbo).
- Polish EQ-Bench: 70.86.
- MixEval: competitive score; the benchmark's methodology correlates highly with Chatbot Arena rankings.
- Data Collection Method by dataset:
- Hybrid: Automated, Human, Undisclosed
- Labeling Method by dataset:
- Hybrid: Automated, Human, Undisclosed
Inference:
- Acceleration Engine: TRT-LLM
- Test Hardware:
- NVIDIA Ada Lovelace
Additional Content:
- Quantized Models: Available in GGUF (Q4_K_M, Q5_K_M, Q6_K, Q8_0, and experimental IQ imatrix versions), GPTQ (4bit), and FP8 (for vLLM, SGLang - Ada Lovelace, Hopper optimized). Quantized models may offer lower quality compared to full-sized variants.
- Training Improvements:
- Weighted token-level loss.
- Adaptive learning rate.
- Masked prompt tokens (see the loss-masking sketch after this list).
- DPO-Positive (DPO-P) methodology, with multi-turn conversations introduced.
- Framework: Trained using an original open-source framework called ALLaMo.
- Contact: SpeakLeash team via Discord or Hugging Face discussion tab.
- Responsible for training the model: SpeakLeash & ACK Cyfronet AGH. Computational grant PLG/2024/016951 on Athena and Helios supercomputers (part of PLGrid environment).
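Of the training improvements listed above, prompt-token masking is the simplest to illustrate: loss is computed only on the assistant's response tokens, while prompt tokens are excluded. The sketch below shows the common Hugging Face convention of setting masked label positions to -100; it is a generic illustration, not the SpeakLeash training code.

```python
# Generic illustration of masking prompt tokens in the training loss
# (not the SpeakLeash training code). In Hugging Face-style training loops,
# label positions set to -100 are ignored by the cross-entropy loss.
import torch

IGNORE_INDEX = -100

def build_labels(prompt_ids: torch.Tensor, response_ids: torch.Tensor) -> dict:
    """Concatenate prompt and response; compute loss only on response tokens."""
    input_ids = torch.cat([prompt_ids, response_ids])
    labels = input_ids.clone()
    labels[: prompt_ids.shape[0]] = IGNORE_INDEX  # mask every prompt position
    return {"input_ids": input_ids, "labels": labels}

# Example: a 5-token prompt followed by a 3-token response.
example = build_labels(torch.tensor([5, 11, 42, 7, 2]), torch.tensor([101, 102, 103]))
print(example["labels"])  # tensor([-100, -100, -100, -100, -100, 101, 102, 103])
```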
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.