
GPU-accelerated model optimized to score how likely a given passage is to contain the information needed to answer a question.
llama-nemotron-rerank-1b-v2 is optimized to produce a logit score representing how relevant a document (or passage) is to a given query. It is fine-tuned for multilingual and cross-lingual text question-answering retrieval, with support for long documents (up to 8192 tokens). The model was evaluated across 26 languages: English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish.
This model is intended to be used as a component of a retrieval system to improve overall accuracy. A text retrieval system typically uses an embedding model (dense) or lexical search (sparse) index to retrieve candidate passages for a query. A reranking model then reranks those candidates into a final order; because it consumes query–passage pairs, it can use cross-attention between tokens. Ranking models are typically deployed in combination with embedding models rather than applied to an entire corpus.
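The retrieve-then-rerank flow described above can be sketched as follows. Both scorers here are toy word-overlap stand-ins, not the real models: in production, stage 1 would be a dense embedding (or BM25) index and stage 2 would be llama-nemotron-rerank-1b-v2 scoring each (query, passage) pair jointly.

```python
import re

def _tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    """Stage 1: cheap candidate retrieval over the whole corpus."""
    overlap = lambda p: len(_tokens(query) & _tokens(p))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def rerank(query: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Stage 2: score each query-passage pair and re-order the candidates.
    A cross-encoder attends across both texts at once, which is why it is
    applied only to the small candidate set, never the full corpus."""
    q = _tokens(query)
    scored = [(p, len(q & _tokens(p)) / max(len(q), 1)) for p in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

corpus = [
    "The Eiffel Tower is in Paris, France.",
    "Llamas are domesticated South American camelids.",
    "Paris is the capital of France.",
]
candidates = retrieve("What is the capital of France?", corpus, k=2)
ranked = rerank("What is the capital of France?", candidates)
```

The key design point is the asymmetry: the first stage must be cheap enough to scan the whole corpus, while the second stage can afford a more expensive pairwise score over only the shortlisted candidates.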
This model is ready for commercial/non-commercial use.
GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.2 Community Model License Agreement.
You are responsible for ensuring that your use of NVIDIA provided models complies with all applicable laws.
Model Developer: NVIDIA
Deployment Geography: Global
This model is most suitable for users who want to improve multilingual retrieval tasks by reranking a set of candidates for a given question.
Release Date: 2/27/2026 via Build.NVIDIA.com (llama-nemotron-rerank-1b-v2)
Architecture Type: Transformer
Network Architecture: Fine-tuned meta-llama/Llama-3.2-1B
Max Sequence Length: 8192
Number of Model Parameters: 1.0 × 10^9
This reranking model is a transformer encoder fine-tuned for ranking. Ranking models for text retrieval are typically trained as cross-encoders for sequence-pair classification, predicting the relevance of a sentence pair (for example, a question and a chunked passage). A cross-entropy loss maximizes the likelihood of positive passages that contain the information needed to answer the question and minimizes it for negative passages that do not.
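One common instantiation of this objective (an assumption for illustration, not NVIDIA's exact training recipe) is a softmax cross-entropy over the logits of one positive passage and several negatives for the same query, which raises the positive's score relative to the negatives. The logit values below are illustrative.

```python
import math

def listwise_ce(logits: list[float], pos_index: int = 0) -> float:
    """Negative log-likelihood of the positive passage under a softmax
    over all candidate logits for one query."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[pos_index] / sum(exps))

well_separated = listwise_ce([4.0, -2.0, -3.0])   # positive scored far above negatives
poorly_separated = listwise_ce([0.1, 0.0, -0.1])  # positive barely ahead
```

Minimizing this loss pushes the positive pair's logit up and the negatives' logits down, which is exactly the behavior a reranker needs at inference time.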
Input Type: Pair of texts (query + passage)
Input Format: List of text pairs / JSON payload (query + passages)
Input Parameters: One Dimensional (1D)
Other Input Properties: Evaluated to work successfully with up to a sequence length of 8192 tokens. Longer texts should be chunked or truncated.
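A minimal sliding-window chunking sketch for texts longer than the 8192-token limit follows. The whitespace split is a stand-in tokenizer (an assumption for illustration); real deployments should count tokens with the model's own tokenizer so chunk sizes match the actual limit.

```python
def chunk_text(text: str, max_tokens: int = 8192, overlap: int = 128) -> list[str]:
    """Split text into overlapping windows of at most max_tokens tokens."""
    toks = text.split()
    if len(toks) <= max_tokens:
        return [text]
    step = max_tokens - overlap  # consecutive chunks share `overlap` tokens
    return [" ".join(toks[i:i + max_tokens]) for i in range(0, len(toks), step)]
```

Each chunk can then be scored against the query independently, with the document's overall score taken as, for example, the maximum over its chunks.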
Output Type: Floats (logits / scores)
Output Format: List of floats (scores per passage)
Output Parameters: One Dimensional (1D)
Other Output Properties: Users may apply a sigmoid activation to the logits to obtain probability-like scores in (0, 1) if desired.
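The input and output shapes above can be illustrated end to end. The payload field names ("query", "passages") are assumptions for illustration, not a documented schema, and the logits are made-up values standing in for the model's per-passage output.

```python
import math

# Illustrative request shape: one query paired against several passages.
payload = {
    "model": "llama-nemotron-rerank-1b-v2",
    "query": {"text": "What causes the seasons on Earth?"},
    "passages": [
        {"text": "Earth's axial tilt causes the seasons."},
        {"text": "The Great Wall of China is thousands of kilometres long."},
        {"text": "Seasonal changes follow the planet's orbit and tilt."},
    ],
}

logits = [4.2, -1.3, 0.7]  # one logit per passage (illustrative values)

def to_probs(logits: list[float]) -> list[float]:
    """Optional sigmoid mapping raw logits into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-l)) for l in logits]

probs = to_probs(logits)
order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
```

Because the sigmoid is monotonic, it changes the scale of the scores but never the ranking, so sorting by logits and sorting by probabilities give the same order.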
Runtime Engines: TensorRT
Supported Hardware Microarchitecture Compatibility:
NVIDIA Ampere
NVIDIA Hopper
NVIDIA Ada Lovelace
Operating Systems: Linux
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
llama-nemotron-rerank-1b-v2
Short Name: llama-nemotron-rerank-1b-v2
Data Modality: Text
Training Data Collection: Hybrid: Automated, Undisclosed
Training Labeling: Automated
Training Properties: Trained on ~800k samples from commercially licensed and publicly available question-answering corpora. NVIDIA used a blend of QA datasets with commercial-use-eligible licenses (avoiding datasets such as MS MARCO that restrict commercial use).
Evaluation Data Collection: Automated
Evaluation Labeling: Automated
Evaluation Properties: Evaluated as part of a pipeline with an embedding retrieval model. Benchmarks include BEIR/TextQA datasets (NQ, HotpotQA, FiQA), TechQA, MIRACL multilingual retrieval, MLQA cross-lingual retrieval, and MLDR long-document retrieval. Metrics reported are primarily Recall@5 (with model performance reported at the pipeline level where applicable).
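The Recall@k metric reported in the tables below can be sketched as follows: the fraction of a query's relevant passages that appear in the top k of the final ranking, averaged over queries. The IDs and relevance judgments here are illustrative.

```python
def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of this query's relevant passages found in the top-k ranking."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for pid in ranked_ids[:k] if pid in relevant_ids)
    return hits / len(relevant_ids)

# one query whose two relevant passages land at ranks 1 and 6:
# only the rank-1 hit falls inside the top 5, so Recall@5 = 1/2
score = recall_at_k(["p1", "p2", "p3", "p4", "p5", "p6"], {"p1", "p6"}, k=5)
```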
Selected results (Recall@5):
| Open & Commercial Reranker Models | Average Recall@5 on NQ, HotpotQA, FiQA, TechQA |
|---|---|
| llama-3.2-nv-embedqa-1b-v2 + llama-3.2-nv-rerankqa-1b-v2 | 73.64% |
| llama-3.2-nv-embedqa-1b-v2 | 68.60% |
| nv-embedqa-e5-v5 + nv-rerankQA-mistral-4b-v3 | 75.45% |
| nv-embedqa-e5-v5 | 62.07% |
| nv-embedqa-e5-v4 | 57.65% |
| e5-large_unsupervised | 48.03% |
| BM25 | 44.67% |
| Open & Commercial Retrieval Models | Average Recall@5 on MIRACL multilingual |
|---|---|
| llama-3.2-nv-embedqa-1b-v2 + llama-3.2-nv-rerankqa-1b-v2 | 65.80% |
| llama-3.2-nv-embedqa-1b-v2 | 60.75% |
| nv-embedqa-mistral-7b-v2 | 50.42% |
| BM25 | 26.51% |
| Open & Commercial Retrieval Models | Average Recall@5 on MLQA (cross-lingual: query and passage in different languages) |
|---|---|
| llama-3.2-nv-embedqa-1b-v2 + llama-3.2-nv-rerankqa-1b-v2 | 86.83% |
| llama-3.2-nv-embedqa-1b-v2 | 79.86% |
| nv-embedqa-mistral-7b-v2 | 68.38% |
| BM25 | 13.01% |
| Open & Commercial Retrieval Models | Average Recall@5 on MLDR |
|---|---|
| llama-3.2-nv-embedqa-1b-v2 + llama-3.2-nv-rerankqa-1b-v2 | 70.69% |
| llama-3.2-nv-embedqa-1b-v2 | 59.55% |
| nv-embedqa-mistral-7b-v2 | 43.24% |
| BM25 | 71.39% |
Acceleration Engine: TensorRT
Test Hardware: NVIDIA A100 PCIe/SXM, NVIDIA A10G
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case, and address unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.