---
title: "nv-rerankqa-mistral-4b-v3"
publisher: "nvidia"
type: "endpoint"
updated: "2025-03-21T13:13:39.852Z"
description: "Multilingual text reranking model."
canonical: "https://build.nvidia.com/nvidia/nv-rerankqa-mistral-4b-v3"
---

## Model Overview<a id="model-overview"></a>

### Description<a id="description"></a>

The NVIDIA Retrieval QA Mistral 4B Reranking Model is optimized to produce a logit score that represents how relevant each document is to a given query.

The ranking model is a component of a text retrieval system that improves overall accuracy. A text retrieval system often uses an embedding model (dense) or a lexical search (sparse) index to return relevant text passages for a given input. A ranking model can then rerank those candidates into a final order. Because the ranking model takes question-passage pairs as input, it can compute cross-attention between the words of the question and the passage. Applying a ranking model to every document in the knowledge base would not be feasible, so ranking models are usually deployed in combination with embedding models.
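
The two-stage flow described above can be sketched as follows. This is only an illustration: `retrieve_then_rerank`, `word_overlap`, and `toy_reranker` are hypothetical names, and the toy scorers stand in for a real embedding or lexical index and for the reranking model.

```python
from typing import Callable, List, Tuple


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    first_stage_score: Callable[[str, str], float],
    rerank_score: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Shortlist top_k passages with a cheap scorer, then reorder them with a costly one."""
    # Stage 1: score every passage in the corpus with the cheap scorer.
    shortlist = sorted(
        corpus, key=lambda p: first_stage_score(query, p), reverse=True
    )[:top_k]
    # Stage 2: apply the expensive pairwise scorer only to the shortlist.
    reranked = [(p, rerank_score(query, p)) for p in shortlist]
    return sorted(reranked, key=lambda item: item[1], reverse=True)


def word_overlap(query: str, passage: str) -> float:
    """Toy first-stage scorer (assumption): number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))


def toy_reranker(query: str, passage: str) -> float:
    """Toy stand-in for the reranker (assumption): overlap normalised by passage length."""
    words = passage.lower().split()
    return word_overlap(query, passage) / max(len(words), 1)


corpus = [
    "GPU memory is measured in gigabytes",
    "How much memory does the GPU have and how much memory is free on the GPU today",
    "Bananas are yellow",
    "The GPU has 80 gigabytes of memory",
]
results = retrieve_then_rerank(
    "how much memory does the GPU have", corpus, word_overlap, toy_reranker, top_k=2
)
for passage, score in results:
    print(f"{score:.3f}  {passage}")
```

The expensive scorer only ever sees `top_k` passages, which is why the second stage stays affordable even when the corpus is large.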

This model is ready for commercial use.

The NVIDIA Retrieval QA Mistral 4B Reranking Model is part of NVIDIA NeMo Retriever, which provides state-of-the-art, commercially ready models and microservices optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can also readily customize them for their domain-specific use cases, such as Information Technology, Human Resources help assistants, and Research & Development research assistants.

### License/Terms of use<a id="terms-of-use"></a>

The use of this model is governed by the [NVIDIA AI Foundation Models Community License Agreement](https://developer.nvidia.com/downloads/nv-ai-foundation-models-license) and the [Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/).

### Intended use<a id="intended-use"></a>

The NVIDIA Retrieval QA Ranking model is most suitable for users who want to improve their retrieval systems by reranking a set of candidates for a given question.

### Model Architecture: Mistral-4B Ranker<a id="model-architecture"></a>

**Architecture Type:** Transformer
**Network Architecture:** Fine-tuned Mistral 7B foundation model

The NVIDIA Retrieval QA Ranking Model is a transformer encoder: a LoRA fine-tuned version of [Mistral-7B-v0.1 LLM](https://huggingface.co/mistralai/Mistral-7B-v0.1) that uses only the first 16 layers (resulting in a 4B-parameter model) for higher throughput. We employ bi-directional attention during fine-tuning for higher accuracy. The final embeddings output by the decoder model are combined with a mean pooling strategy, and a binary classification head is fine-tuned for the ranking task.
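
As a rough illustration of the pooling and classification head described above, the sketch below mean-pools token embeddings and maps the pooled vector to one logit. It is not the model's implementation: the function names are hypothetical, and both the padding mask and the single linear layer are assumptions made for the example.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of non-padding tokens (mask value 1)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def relevance_logit(pooled: List[float], weights: List[float], bias: float) -> float:
    """Single-output linear head (assumption): one raw logit per query-passage pair."""
    return sum(w * x for w, x in zip(weights, pooled)) + bias


# Hypothetical 3-token sequence with 2-dimensional embeddings; the last token is padding.
embeddings = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(embeddings, mask)
logit = relevance_logit(pooled, weights=[0.5, -0.25], bias=0.1)
print(pooled, logit)
```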

Ranking models for text ranking are typically trained as cross-encoders for sentence classification. This involves predicting the relevancy of a sentence pair (for example, a question and a chunked passage). The cross-entropy loss is used to maximize the likelihood for passages that contain the information needed to answer the question and to minimize the likelihood for (negative) passages that do not.
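
To make the training objective concrete, here is a minimal sketch of a cross-entropy loss over query-passage pairs. It assumes a binary formulation computed from raw logits, with label 1 for a passage that answers the question and 0 for a negative; the exact loss setup used to train this model is not specified here, and the function name is hypothetical.

```python
import math
from typing import List


def binary_cross_entropy_with_logits(logits: List[float], labels: List[int]) -> float:
    """Mean binary cross-entropy over query-passage pairs, computed from raw logits."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


labels = [1, 0]  # first passage answers the question, second is a negative
good_loss = binary_cross_entropy_with_logits([4.0, -3.0], labels)  # scores agree with labels
bad_loss = binary_cross_entropy_with_logits([-4.0, 3.0], labels)   # scores contradict labels
print(good_loss, bad_loss)
```

Logits that agree with the labels give a small loss and logits that contradict them give a large one, which is the signal that pushes positives up and negatives down during fine-tuning.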

The model was trained on public datasets described in the Training Dataset & Evaluation section.

### Model Version(s)<a id="model-versions"></a>

NVIDIA Retrieval QA Text Reranking Mistral 4B v3

Short name: NV-RerankQA-Mistral-4B-v3

## Training Dataset & Evaluation<a id="training-dataset--evaluation"></a>

### Training Dataset<a id="training-dataset"></a>

The development of large-scale public open-QA datasets has enabled tremendous progress in powerful embedding models. However, one popular dataset, [MSMARCO](https://microsoft.github.io/msmarco/), restricts commercial licensing, which limits the use of models trained on it in commercial settings. To address this, we created our own training dataset blend based on public QA datasets, each of which is licensed for commercial applications.

The training dataset details are as follows:

**Use Case**: Information retrieval for question and answering over text documents.

**Data Sources**: Public datasets licensed for commercial use.

**Language**: English (US)\
**Volume**: 300k samples from public datasets

**Data Collection Method by dataset**: Unknown\
**Labeling Method by dataset**: Unknown

### Evaluation Results<a id="evaluation-results"></a>

We evaluated the NVIDIA Retrieval QA Ranking Model against open and commercial retriever models from the literature on academic question-answering benchmarks: [NQ](https://huggingface.co/datasets/BeIR/nq), [HotpotQA](https://huggingface.co/datasets/hotpot_qa) and [FiQA (Finance Q\&A)](https://huggingface.co/datasets/BeIR/fiqa) from the BEIR benchmark, and the [TechQA](https://huggingface.co/datasets/PrimeQA/TechQA/tree/main) dataset. The metric used in this benchmark is Recall@5. As described earlier, the ranking model is applied to the output of an embedding model.

|    **Open & Commercial Retrieval Models**    | **Average Recall\@5 on NQ, HotpotQA, FiQA, TechQA datasets** |
| :------------------------------------------: | :----------------------------------------------------------: |
| NV-EmbedQA-E5-v5 + NV-RerankQA-Mistral-4B-v3 |                           75.45%                            |
|               NV-EmbedQA-E5-v5               |                           62.07%                            |
|               NV-EmbedQA-E5-v4               |                           57.65%                            |
|            E5-large-unsupervised             |                           48.03%                            |
|                     BM25                     |                           44.67%                            |
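
For readers unfamiliar with the metric, the sketch below computes Recall@k under its common definition: the fraction of a query's relevant passages that appear in the top k results, averaged over queries. The exact evaluation script behind the table is not given here, so treat this as an assumption; the function names and document IDs are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of the relevant passages that appear among the top-k ranked results."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def mean_recall_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int = 5) -> float:
    """Average Recall@k over (ranking, relevant set) pairs, one pair per query."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Hypothetical rankings for two queries.
runs = [
    (["d3", "d7", "d1", "d9", "d2", "d4"], {"d1", "d4"}),  # d4 is ranked 6th, outside the top 5
    (["a1", "a2"], {"a2"}),
]
print(mean_recall_at_k(runs, k=5))
```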

**Data Collection Method by dataset**: Unknown

**Labeling Method by dataset**: Unknown

**Properties**:
The evaluation datasets are based on three MTEB/BEIR TextQA datasets and the TechQA dataset, all of which are public. Dataset sizes range from tens of thousands up to 5M, depending on the dataset.

## Technical Details<a id="technical-details"></a>

### Input<a id="input"></a>

**Input Type:** Pair of Texts\
**Input Format:** List of text pairs\
**Other Properties Related to Input:** The model's maximum context length is 512 tokens. Texts longer than maximum length must either be chunked or truncated.
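
The two options for over-length text can be sketched as below. This is an illustration under loud assumptions: whitespace-separated words stand in for tokens, whereas the model's own tokenizer will count differently, and the helper names and the `overlap` parameter are hypothetical. It also ignores how the 512-token budget is shared between the query and the passage.

```python
from typing import List


def truncate_tokens(tokens: List[str], max_tokens: int = 512) -> List[str]:
    """Keep only the first max_tokens tokens and drop the rest."""
    return tokens[:max_tokens]


def chunk_tokens(tokens: List[str], max_tokens: int = 512, overlap: int = 0) -> List[List[str]]:
    """Split tokens into windows of at most max_tokens, optionally overlapping."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("require max_tokens > 0 and 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the current window already reaches the end of the text
    return chunks


# Whitespace "tokens" are a stand-in (assumption) for real tokenizer output.
tokens = ("word " * 1200).split()
chunks = chunk_tokens(tokens, max_tokens=512, overlap=64)
print([len(c) for c in chunks], len(truncate_tokens(tokens)))
```

Chunking keeps all of the text at the cost of scoring several passages per document, while truncation is cheaper but discards everything past the limit.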

### Output<a id="output"></a>

**Output Type:** Floats\
**Output Format:** List of float arrays\
**Other Properties Related to Output:** Each float is a probability score or a raw logit. The user can decide whether a Sigmoid activation function is applied to the logits.
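
The relationship between raw logits and probability scores can be sketched as follows. The helper names are hypothetical and the logits are made-up values, not model output; the point is that the sigmoid maps each logit into the range 0 to 1 without changing the ranking order.

```python
import math
from typing import List, Tuple


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function mapping a logit to a value in (0, 1)."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def rank_passages(logits: List[float], apply_sigmoid: bool = False) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs sorted from most to least relevant."""
    scores = [sigmoid(z) for z in logits] if apply_sigmoid else list(logits)
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Made-up logits for three passages.
logits = [-2.5, 3.1, 0.4]
by_logit = rank_passages(logits)
by_probability = rank_passages(logits, apply_sigmoid=True)
print(by_logit)
print(by_probability)
```

Because the sigmoid is monotonic, applying it matters only when a bounded score is needed, for example to threshold results at a fixed cut-off.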

### Software Integration<a id="software-integration"></a>

**Runtime:** NeMo Retriever Text Reranking NIM\
**Supported Hardware Microarchitecture Compatibility:** NVIDIA Ampere, NVIDIA Hopper, NVIDIA Lovelace\
**Supported Operating System(s):** Linux\
**Engine:** [TensorRT](https://developer.nvidia.com/tensorrt-getting-started)\
**Test Hardware:** See Support Matrix from [NIM documentation](https://docs.nvidia.com/nim/nemo-retriever/text-reranking/latest/overview.html).

We evaluated the models optimized for different hardware on a small sample dataset of 600 queries.

## Ethical Considerations<a id="ethical-considerations"></a>

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ tab for the Explainability, Bias, Safety & Security, and Privacy subcards. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

## Bias

| Field              | Response |
|:-------------------|:---------|
| Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing    | None   |
| Measures taken to mitigate against unwanted bias                                                      | None   |

## Explainability

| Field              | Response |
|:-------------------|:---------|
|Intended Application & Domain: | Passage ranking for question and answer retrieval.    |
|Model Type:                    | Transformer encoder                                            |
|Intended User:                 | Generative AI creators building conversational AI models.  |
|Output:                        | List of Floats (Score/Logit indicating whether a passage is relevant to a question) |
|Describe how the model works:  | Model provides a score about the likelihood the passage contains the information to answer the question.  |
|Performance Metrics:           | Throughput and Latency                                         |
|Potential Known Risks:         | This model is not guaranteed to always retrieve the correct passage(s) for a given query. |
|Licensing:                     | [NVIDIA AI Foundation Models Community License Agreement](https://developer.nvidia.com/downloads/nv-ai-foundation-models-license) and the [Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/).|
|Technical Limitations:         | The model was trained with input length up to 512 tokens, therefore, it may perform poorly on specialized longer text inputs. |

## Privacy

| Field              | Response |
|:-------------------|:---------|
|Generatable or reverse engineerable personally-identifiable information (PII)?    | Neither             |
|Was consent obtained for any personal data used?                                  | Not Applicable       |
|Personal data used to create this model?                                                 | None                 |
|How often is dataset reviewed?                                                    | Before Every Release |
|Is a mechanism in place to honor data subject right of access or deletion of personal data? | No         |
|If personal data was collected for the development of the model, was it collected directly by NVIDIA? | Not Applicable |
|If personal data was collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Not Applicable |
|If personal data was collected for the development of this AI model, was it minimized to only what was required? | Not Applicable |
|Is there provenance for all datasets used in training?                          | Yes                  |
|Does data labeling (annotation, metadata) comply with privacy laws?                             | Yes                  |
|Is data compliant with data subject requests for data correction or removal, if such a request was made? | No, not possible with externally-sourced data|

## Safety & Security

| Field              | Response |
|:-------------------|:---------|
|Verified to have met prescribed quality standards: |  Yes |
|Target Key Performance Indicator(s) (KPI(s)):  |  Accuracy, Latency, Throughput |
|Model Application(s):                                 | Text Reranking for Retrieval   |
|Describe the physical safety impact (if present).     | Not Applicable                  |
|Use Case Restrictions:| Commercial license available from NVIDIA AI Enterprise. |
|Model and dataset restrictions:| The Principle of Least Privilege (PoLP) is applied, limiting access for dataset generation and model development. Dataset access is restricted during training, and dataset license constraints are adhered to. |

## Prototype

```python
import requests

invoke_url = "https://ai.api.nvidia.com/v1/retrieval/nvidia/nv-rerankqa-mistral-4b-v3/reranking"

headers = {
    # Append your API key after "Bearer ".
    "Authorization": "Bearer ",
    "Accept": "application/json",
}

# Placeholder request body; fill in the content before sending.
payload = {
    "messages": [
        {
            "role": "user",
            "content": ""
        }
    ]
}

# Re-use connections across requests.
session = requests.Session()

response = session.post(invoke_url, headers=headers, json=payload)

# Raise an exception on HTTP error status codes.
response.raise_for_status()
response_body = response.json()
print(response_body)
```

```javascript
import fetch from "node-fetch";

const invokeUrl = "https://ai.api.nvidia.com/v1/retrieval/nvidia/nv-rerankqa-mistral-4b-v3/reranking";

const headers = {
  // Append your API key after "Bearer ".
  "Authorization": "Bearer ",
  "Accept": "application/json",
};

// Placeholder request body; fill in the content before sending.
const payload = {
  "messages": [
    {
      "role": "user",
      "content": ""
    }
  ]
};

const response = await fetch(invokeUrl, {
  method: "post",
  body: JSON.stringify(payload),
  headers: { "Content-Type": "application/json", ...headers }
});

const responseBody = await response.json();

console.log(JSON.stringify(responseBody));
```

```bash
invoke_url='https://ai.api.nvidia.com/v1/retrieval/nvidia/nv-rerankqa-mistral-4b-v3/reranking'

authorization_header='Authorization: Bearer '
accept_header='Accept: application/json'
content_type_header='Content-Type: application/json'

data=$'{
  "messages": [
    {
      "role": "user",
      "content": ""
    }
  ]
}'

response=$(curl --silent -i -w "\n%{http_code}" --request POST \
  --url "$invoke_url" \
  --header "$authorization_header" \
  --header "$accept_header" \
  --header "$content_type_header" \
  --data "$data"
)

echo "$response"
```