A language model based on a novel recurrent architecture, offering faster inference when generating long sequences.
Authors: Google
RecurrentGemma is a family of open language models built on a novel recurrent architecture developed at Google. Both pre-trained and instruction-tuned versions are available in English.
Like Gemma, RecurrentGemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Because of its novel architecture, RecurrentGemma requires less memory than Gemma and achieves faster inference when generating long sequences. This model is ready for commercial use.
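The memory advantage comes from the model's bounded inference state: a transformer's key-value (KV) cache grows linearly with sequence length, while a recurrent model carries a fixed-size state regardless of how many tokens it has generated. The sketch below illustrates this scaling; all hyperparameters are illustrative assumptions, not RecurrentGemma's actual configuration.

```python
def kv_cache_bytes(seq_len, n_layers=26, n_kv_heads=1, head_dim=256,
                   bytes_per_value=2):
    """Transformer KV cache: 2 tensors (K and V) per layer, each of
    shape (n_kv_heads, seq_len, head_dim). Grows linearly in seq_len."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

def recurrent_state_bytes(n_layers=26, state_dim=2560, bytes_per_value=2):
    """Fixed recurrent state: one state vector per layer, whose size is
    independent of how many tokens have been processed."""
    return n_layers * state_dim * bytes_per_value

for seq_len in (1_024, 8_192, 65_536):
    kv = kv_cache_bytes(seq_len)
    rec = recurrent_state_bytes()
    print(f"{seq_len:>6} tokens: KV cache {kv / 2**20:8.1f} MiB, "
          f"recurrent state {rec / 2**20:6.2f} MiB")
```

Under these assumed dimensions, the KV cache doubles every time the sequence length doubles, while the recurrent state stays constant.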
This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the link to the Non-NVIDIA Model Card.
@article{recurrentgemma_2024, title={RecurrentGemma}, url={}, DOI={}, publisher={Kaggle}, author={Griffin Team, Soham De, Samuel L Smith, Anushan Fernando, Alex Botev, George-Christian Muraru, Ruba Haroun, Leonard Berrada et al.}, year={2024} }
Architecture Type: Recurrent Neural Network (hybrid of gated linear recurrences and local attention)
Network Architecture: Real-Gated Linear Recurrent Unit (RG-LRU)
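The RG-LRU at the core of this architecture is a diagonal linear recurrence whose decay is modulated by an input-dependent recurrence gate, as described in the Griffin paper. Below is a toy, per-channel sketch: the real layer uses full linear projections for the gates and trained parameters, whereas the scalar weights here are placeholders chosen purely for brevity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rg_lru_step(x, h, w_a, w_x, base_decay, c=8.0):
    """One step of a toy, per-channel RG-LRU. Diagonal scalar weights
    stand in for the full gate projections of the real layer."""
    new_h = []
    for d, x_d in enumerate(x):
        r = sigmoid(w_a[d] * x_d)       # recurrence gate in (0, 1)
        i = sigmoid(w_x[d] * x_d)       # input gate in (0, 1)
        a_t = base_decay[d] ** (c * r)  # input-dependent decay
        # sqrt(1 - a_t^2) scaling keeps the hidden state's scale bounded
        new_h.append(a_t * h[d] + math.sqrt(1.0 - a_t * a_t) * (i * x_d))
    return new_h

def rg_lru(xs, w_a, w_x, lam, c=8.0):
    """Run the recurrence over a sequence; the state size is fixed at D
    no matter how long the sequence is."""
    h = [0.0] * len(lam)
    base_decay = [sigmoid(l) for l in lam]  # learnable decay, squashed to (0, 1)
    out = []
    for x in xs:
        h = rg_lru_step(x, h, w_a, w_x, base_decay, c)
        out.append(h)
    return out

# Tiny demo with placeholder weights
D = 4
xs = [[0.5, -1.0, 0.25, 2.0], [1.0, 0.0, -0.5, 0.75]]
ys = rg_lru(xs, w_a=[0.1] * D, w_x=[0.1] * D, lam=[-1.0] * D)
print(len(ys), len(ys[0]))  # 2 4
```

Because the per-channel decay `a_t` always lies in (0, 1), the state update is a convex-like blend of the previous state and the gated input, which is what keeps the inference-time state fixed in size and bounded in magnitude.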
Input Type(s): Text
Input Format(s): String
Input Parameters: One-Dimensional (1D)
Other Properties Related to Input: Text can be a question, a prompt, or a document to be summarized.
Output Type(s): Text
Output Format(s): String
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: Generated English-language text in response to the input (e.g., an answer to the question, a summary of the document).
Open Large Language Models (LLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development.
These models have certain limitations that users should be aware of:
RecurrentGemma uses the same training data and data processing as used by the Gemma model family. A full description can be found on the Gemma model card.
Like Gemma, RecurrentGemma was trained on TPUv5e, using JAX and ML Pathways.
These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation:
Benchmark | Metric | RecurrentGemma 2B |
---|---|---|
MMLU | 5-shot, top-1 | 38.4 |
HellaSwag | 0-shot | 71.0 |
PIQA | 0-shot | 78.5 |
SocialIQA | 0-shot | 51.8 |
BoolQ | 0-shot | 71.3 |
WinoGrande | partial score | 67.8 |
CommonsenseQA | 7-shot | 63.7 |
OpenBookQA | | 47.2 |
ARC-e | | 72.9 |
ARC-c | | 42.3 |
TriviaQA | 5-shot | 52.5 |
Natural Questions | 5-shot | 11.5 |
HumanEval | pass@1 | 21.3 |
MBPP | 3-shot | 28.8 |
GSM8K | maj@1 | 13.4 |
MATH | 4-shot | 11.0 |
AGIEval | | 23.8 |
BIG-Bench | | 35.3 |
Average | | 44.6 |
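As a quick consistency check, the reported average matches the unweighted mean of the 18 individual benchmark scores in the table above (rounding to one decimal place is assumed):

```python
# Unweighted mean of the 18 RecurrentGemma 2B benchmark scores listed above.
scores = [38.4, 71.0, 78.5, 51.8, 71.3, 67.8, 63.7, 47.2, 72.9,
          42.3, 52.5, 11.5, 21.3, 28.8, 13.4, 11.0, 23.8, 35.3]
average = round(sum(scores) / len(scores), 1)
print(average)  # 44.6
```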
Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including:
The results of ethics and safety evaluations are within acceptable thresholds for meeting internal policies for categories such as child safety, content safety, representational harms, memorization, and large-scale harms. On top of robust internal evaluations, the results of well-known safety benchmarks like BBQ, Winogender, Winobias, RealToxicity, and TruthfulQA are shown here.
Benchmark | Metric | RecurrentGemma 2B | RecurrentGemma 2B IT |
---|---|---|---|
RealToxicity | avg | 9.8 | 7.6 |
BOLD | | 39.3 | 52.4 |
CrowS-Pairs | top-1 | 41.1 | 43.4 |
BBQ Ambig | top-1 | 62.6 | 71.1 |
BBQ Disambig | top-1 | 58.4 | 50.8 |
Winogender | top-1 | 55.1 | 54.7 |
TruthfulQA | | 35.1 | 42.7 |
Winobias 1_2 | | 58.4 | 56.4 |
Winobias 2_2 | | 90.0 | 75.4 |
Toxigen | | 56.7 | 50.0 |
The development of large language models (LLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following:
Risks Identified and Mitigations:
At the time of release, this family of models provides high-performance open large language model implementations designed from the ground up for Responsible AI development, relative to similarly sized models.
Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably sized open model alternatives.
In particular, RecurrentGemma models achieve comparable performance to Gemma models but are faster during inference and require less memory, especially on long sequences.