
GPU-accelerated model optimized to score how likely a given passage is to contain the information needed to answer a question.
llama-nemotron-rerank-1b-v2 is optimized to produce a logit score representing how relevant a document (or passage) is to a given query. It is fine-tuned for multilingual and cross-lingual text question-answering retrieval, with support for long documents (up to 8192 tokens). The model was evaluated across 26 languages: English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish.
This model is intended to be used as a component of a retrieval system to improve overall accuracy. A text retrieval system typically uses an embedding model (dense) or lexical search (sparse) index to retrieve candidate passages for a query. A reranking model then reranks those candidates into a final order; because it consumes query–passage pairs, it can use cross-attention between tokens. Ranking models are typically deployed in combination with embedding models rather than applied to an entire corpus.
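The retrieve-then-rerank flow described above can be sketched as follows. Both scorers here are toy word-overlap stand-ins, not the real models: in production, stage 1 would be a dense embedding (or BM25) index and stage 2 would be llama-nemotron-rerank-1b-v2 scoring each (query, passage) pair jointly.

```python
import re

def _tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    """Stage 1: cheap candidate retrieval over the whole corpus."""
    overlap = lambda p: len(_tokens(query) & _tokens(p))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def rerank(query: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Stage 2: score each query-passage pair and re-order the candidates.
    A cross-encoder attends across both texts at once, which is why it is
    applied only to the small candidate set, never the full corpus."""
    q = _tokens(query)
    scored = [(p, len(q & _tokens(p)) / max(len(q), 1)) for p in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

corpus = [
    "The Eiffel Tower is in Paris, France.",
    "Llamas are domesticated South American camelids.",
    "Paris is the capital of France.",
]
candidates = retrieve("What is the capital of France?", corpus, k=2)
ranked = rerank("What is the capital of France?", candidates)
```

The key design point is the asymmetry: the first stage must be cheap enough to scan the whole corpus, while the second stage can afford a more expensive pairwise score over only the shortlisted candidates.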
This model is ready for commercial/non-commercial use.
GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.2 Community Model License Agreement.
You are responsible for ensuring that your use of NVIDIA provided models complies with all applicable laws.
Model Developer: NVIDIA
Deployment Geography: Global
This model is most suitable for users who want to improve multilingual retrieval tasks by reranking a set of candidates for a given question.
Release Date: 2/27/2026 via Build.NVIDIA.com (llama-nemotron-rerank-1b-v2)
Architecture Type: Transformer
Network Architecture: Fine-tuned meta-llama/Llama-3.2-1B
Max Sequence Length: 8192
Number of Model Parameters: 1.0 × 10^9
This reranking model is a transformer encoder fine-tuned for ranking. Ranking models for text retrieval are typically trained as cross-encoders for sequence-pair classification, predicting the relevance of a sentence pair (for example, a question and a chunked passage). A cross-entropy loss maximizes the likelihood of positive passages that contain the information needed to answer the question and minimizes it for negative passages that do not.
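One common instantiation of this objective (an assumption for illustration, not NVIDIA's exact training recipe) is a softmax cross-entropy over the logits of one positive passage and several negatives for the same query, which raises the positive's score relative to the negatives. The logit values below are illustrative.

```python
import math

def listwise_ce(logits: list[float], pos_index: int = 0) -> float:
    """Negative log-likelihood of the positive passage under a softmax
    over all candidate logits for one query."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[pos_index] / sum(exps))

well_separated = listwise_ce([4.0, -2.0, -3.0])   # positive scored far above negatives
poorly_separated = listwise_ce([0.1, 0.0, -0.1])  # positive barely ahead
```

Minimizing this loss pushes the positive pair's logit up and the negatives' logits down, which is exactly the behavior a reranker needs at inference time.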
Input Type: Pair of texts (query + passage)
Input Format: List of text pairs / JSON payload (query + passages)
Input Parameters: One Dimensional (1D)
Other Input Properties: Evaluated to work successfully with up to a sequence length of 8192 tokens. Longer texts should be chunked or truncated.
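A minimal sliding-window chunking sketch for texts longer than the 8192-token limit follows. The whitespace split is a stand-in tokenizer (an assumption for illustration); real deployments should count tokens with the model's own tokenizer so chunk sizes match the actual limit.

```python
def chunk_text(text: str, max_tokens: int = 8192, overlap: int = 128) -> list[str]:
    """Split text into overlapping windows of at most max_tokens tokens."""
    toks = text.split()
    if len(toks) <= max_tokens:
        return [text]
    step = max_tokens - overlap  # consecutive chunks share `overlap` tokens
    return [" ".join(toks[i:i + max_tokens]) for i in range(0, len(toks), step)]
```

Each chunk can then be scored against the query independently, with the document's overall score taken as, for example, the maximum over its chunks.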
Output Type: Floats (logits / scores)
Output Format: List of floats (scores per passage)
Output Parameters: One Dimensional (1D)
Other Output Properties: Users may apply a sigmoid activation to the logits to obtain probability-like scores in (0, 1) if desired.
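The input and output shapes above can be illustrated end to end. The payload field names ("query", "passages") are assumptions for illustration, not a documented schema, and the logits are made-up values standing in for the model's per-passage output.

```python
import math

# Illustrative request shape: one query paired against several passages.
payload = {
    "model": "llama-nemotron-rerank-1b-v2",
    "query": {"text": "What causes the seasons on Earth?"},
    "passages": [
        {"text": "Earth's axial tilt causes the seasons."},
        {"text": "The Great Wall of China is thousands of kilometres long."},
        {"text": "Seasonal changes follow the planet's orbit and tilt."},
    ],
}

logits = [4.2, -1.3, 0.7]  # one logit per passage (illustrative values)

def to_probs(logits: list[float]) -> list[float]:
    """Optional sigmoid mapping raw logits into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-l)) for l in logits]

probs = to_probs(logits)
order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
```

Because the sigmoid is monotonic, it changes the scale of the scores but never the ranking, so sorting by logits and sorting by probabilities give the same order.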
Runtime Engines: TensorRT
Supported Hardware Microarchitecture Compatibility:
NVIDIA Ampere
NVIDIA Hopper
NVIDIA Ada Lovelace
Operating Systems: Linux
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
llama-nemotron-rerank-1b-v2
Short Name: llama-nemotron-rerank-1b-v2
Data Modality: Text
Training Data Collection: Hybrid: Automated, Undisclosed
Training Labeling: Automated
Training Properties: Trained on ~800k samples from commercially licensed and publicly available question-answering corpora. NVIDIA used a blend of QA datasets with commercial-use-eligible licenses (avoiding datasets such as MS MARCO that restrict commercial use).
Evaluation Data Collection: Automated
Evaluation Labeling: Automated
Evaluation Properties: Evaluated as part of a pipeline with an embedding retrieval model. Benchmarks include BEIR/TextQA datasets (NQ, HotpotQA, FiQA), TechQA, MIRACL multilingual retrieval, MLQA cross-lingual retrieval, and MLDR long-document retrieval. Metrics reported are primarily Recall@5 (with model performance reported at the pipeline level where applicable).
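The Recall@k metric reported in the tables below can be sketched as follows: the fraction of a query's relevant passages that appear in the top k of the final ranking, averaged over queries. The IDs and relevance judgments here are illustrative.

```python
def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of this query's relevant passages found in the top-k ranking."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for pid in ranked_ids[:k] if pid in relevant_ids)
    return hits / len(relevant_ids)

# one query whose two relevant passages land at ranks 1 and 6:
# only the rank-1 hit falls inside the top 5, so Recall@5 = 1/2
score = recall_at_k(["p1", "p2", "p3", "p4", "p5", "p6"], {"p1", "p6"}, k=5)
```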
Selected results (Recall@5):
| Open & Commercial Reranker Models | Average Recall@5 on NQ, HotpotQA, FiQA, TechQA |
|---|---|
| llama-3.2-nv-embedqa-1b-v2 + llama-3.2-nv-rerankqa-1b-v2 | 73.64% |
| llama-3.2-nv-embedqa-1b-v2 | 68.60% |
| nv-embedqa-e5-v5 + nv-rerankQA-mistral-4b-v3 | 75.45% |
| nv-embedqa-e5-v5 | 62.07% |
| nv-embedqa-e5-v4 | 57.65% |
| e5-large_unsupervised | 48.03% |
| BM25 | 44.67% |
| Open & Commercial Retrieval Models | Average Recall@5 on MIRACL multilingual |
|---|---|
| llama-3.2-nv-embedqa-1b-v2 + llama-3.2-nv-rerankqa-1b-v2 | 65.80% |
| llama-3.2-nv-embedqa-1b-v2 | 60.75% |
| nv-embedqa-mistral-7b-v2 | 50.42% |
| BM25 | 26.51% |
| Open & Commercial Retrieval Models | Average Recall@5 on MLQA (cross-lingual: query and passage in different languages) |
|---|---|
| llama-3.2-nv-embedqa-1b-v2 + llama-3.2-nv-rerankqa-1b-v2 | 86.83% |
| llama-3.2-nv-embedqa-1b-v2 | 79.86% |
| nv-embedqa-mistral-7b-v2 | 68.38% |
| BM25 | 13.01% |
| Open & Commercial Retrieval Models | Average Recall@5 on MLDR |
|---|---|
| llama-3.2-nv-embedqa-1b-v2 + llama-3.2-nv-rerankqa-1b-v2 | 70.69% |
| llama-3.2-nv-embedqa-1b-v2 | 59.55% |
| nv-embedqa-mistral-7b-v2 | 43.24% |
| BM25 | 71.39% |
Acceleration Engine: TensorRT
Test Hardware: NVIDIA A100 PCIe/SXM, NVIDIA A10G
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case, and address unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.