---
title: "nv-embed-v1"
publisher: "nvidia"
type: "endpoint"
updated: "2025-07-22T18:57:16.656Z"
description: "Generates high-quality numerical embeddings from text inputs."
canonical: "https://build.nvidia.com/nvidia/nv-embed-v1"
---

# Model Overview

### Description

The NV-Embed Model is a generalist embedding model that performs strongly across the 56 tasks of the Massive Text Embedding Benchmark (MTEB), which cover retrieval, reranking, classification, clustering, and semantic textual similarity. NV-Embed achieves the highest score of 59.36 on the 15 retrieval tasks within MTEB.

NV-Embed features several innovative designs, such as latent vectors for improved pooled embedding output and a two-stage instruction tuning method, enhancing the accuracy of both retrieval and non-retrieval tasks. For more technical details, refer to the paper: [NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models](https://arxiv.org/pdf/2405.17428).

### Terms of use

The use of this model is governed by the [CC-BY-NC-4.0 license](https://spdx.org/licenses/CC-BY-NC-4.0).

### Reference(s)

For more details, refer to the [NV-Embed paper](https://arxiv.org/pdf/2405.17428).

### Intended use

The NV-Embed Model is designed for users who need a high-performance generalist embedding model for tasks such as text retrieval, reranking, classification, clustering, and semantic textual similarity.

### Model Architecture

**Architecture Type:** Decoder-only LLM<br>
**Network Architecture:** Mistral-7B-v0.1 with Latent-Attention pooling<br>
**Embedding Dimension:** 4096<br>
**Max Input Tokens:** 32k<br>
**Parameter Count:** 7.1 billion<br>

The NV-Embed Model is based on the Mistral-7B-v0.1 architecture with a unique Latent-Attention pooling mechanism. This allows the model to generate more expressive pooled embeddings by having the LLM attend to latent vectors. It employs a two-stage instruction tuning method to improve performance across various tasks.
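
The pooling idea can be illustrated with a deliberately simplified sketch. It assumes a single attention head, scaled dot-product scores, and random stand-ins for the learned latent vectors, and it leaves out other details of the published design. The function name `latent_attention_pool` and the toy sizes are hypothetical, chosen for illustration only; this is not the model's actual implementation.

```python
import math
import random

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def latent_attention_pool(hidden_states, latents):
    """Pool per-token states (seq_len x d) into a single d-dimensional vector."""
    d = len(latents[0])
    attended = []
    for token_state in hidden_states:
        # The token state acts as the query; the latent vectors act as keys and values.
        weights = softmax([dot(token_state, lat) / math.sqrt(d) for lat in latents])
        attended.append([sum(w * lat[j] for w, lat in zip(weights, latents)) for j in range(d)])
    # Mean-pool the attended token states into one embedding.
    n = len(attended)
    return [sum(row[j] for row in attended) / n for j in range(d)]

random.seed(0)
d, seq_len, num_latents = 8, 5, 3  # toy sizes; the real embedding dimension is 4096
hidden_states = [[random.gauss(0, 1) for _ in range(d)] for _ in range(seq_len)]
latents = [[random.gauss(0, 1) for _ in range(d)] for _ in range(num_latents)]
embedding = latent_attention_pool(hidden_states, latents)
print(len(embedding))  # 8
```

In this reduced form, every pooled coordinate is a weighted average of the corresponding latent coordinates, which shows how the latent vectors shape the final embedding.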

### Input

**Input Type:** text<br>
**Input Format:** list of strings with task-specific instructions

### Output

**Output Type:** floats<br>
**Output Format:** list of float arrays, each array containing the embeddings for the corresponding input string
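
Since the output is one float array per input string, a typical retrieval step compares a query embedding against passage embeddings. The sketch below assumes cosine similarity as the comparison metric and uses toy 3-dimensional vectors in place of real 4096-dimensional output. The helpers `cosine_similarity` and `rank_passages` are illustrative names, not part of any NVIDIA API.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_passages(query_embedding, passage_embeddings):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [
        (i, cosine_similarity(query_embedding, p))
        for i, p in enumerate(passage_embeddings)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for embeddings returned by the endpoint.
query = [1.0, 0.0, 0.0]
passages = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
ranking = rank_passages(query, passages)
print([i for i, _ in ranking])  # [1, 2, 0]
```

In practice, `query` and `passages` would be the arrays returned for the query string and the candidate passages.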

### Model Version(s)

NV-Embed-v1

## Training Dataset & Evaluation

### Training Dataset

The NV-Embed model was trained on a diverse mixture of publicly available datasets, including various retrieval and non-retrieval tasks. The training data did not include any synthetic data from proprietary models like GPT-4, ensuring the model's accessibility and reproducibility.

### Evaluation Results

NV-Embed was evaluated using the Massive Text Embedding Benchmark (MTEB), achieving a record-high score of 69.32 across 56 tasks. It significantly outperforms previous leading embedding models, particularly excelling in retrieval tasks.

**Performance on MTEB benchmark:**
- **Overall Score:** 69.32
- **Score on Retrieval Tasks:** 59.36

## Ethical Considerations

### Bias, Safety & Security, and Privacy

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications.  When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards [here](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/nv-embed-v1/bias). Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

### Special Training Data Considerations

The model was trained on publicly available data, which may contain toxic language and societal biases. Therefore, the model may amplify those biases, such as associating certain genders with specific social stereotypes.

## Bias

| Field | Response |
|:------|:---------|
| Participation considerations from adversely impacted groups ([protected classes](https://calcivilrights.ca.gov/disputeresolution/protected-characteristics/)) in model design and testing: | None |
| Measures taken to mitigate against unwanted bias: | None |

## Explainability

| Field              |Response  |
|:-------------------|:---------|
| Intended Application & Domain:| Passage and query embedding for question and answer retrieval|
| Model Type:                   | Decoder-only LLM|
| Intended User:                | Generative AI creators working with conversational AI models. |
| Output:                       | Text embedding (An array of float numbers, providing a dense vector representation for the input text)                        |
| Describe how the model works: | The decoder-only LLM transforms the tokenized input text into a dense vector representation using a latent-attention pooling mechanism. |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable|
| Verified to have met prescribed NVIDIA quality standards:| Yes|
| Performance Metrics:          | Accuracy, Throughput, and Latency|
| Potential Known Risks:        | The model was trained on data originally crawled from the Internet, which may contain toxic language and societal biases. The model may therefore amplify those biases, for example by associating certain genders with certain social stereotypes. |
| Licensing:                    | [CC-BY-NC-4.0 License](https://spdx.org/licenses/CC-BY-NC-4.0)|
| Technical Limitations:        | The model's maximum context length is 32k tokens. Texts longer than the maximum length must be either chunked or truncated.|
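
One way to handle inputs that exceed the context limit is to split them into overlapping windows before embedding. The sketch below is a minimal example under stated assumptions: whitespace splitting stands in for the model's real tokenizer (the 32k limit counts model tokens, not words), and `chunk_tokens` with its `overlap` parameter is a hypothetical helper, not part of the endpoint.

```python
def chunk_tokens(tokens, max_tokens, overlap=0):
    """Split a token list into windows of at most max_tokens that share `overlap` tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        # Stop once a window reaches the end of the input.
        if start + max_tokens >= len(tokens):
            break
    return chunks

# Whitespace tokens are only a stand-in for the model tokenizer.
tokens = "the quick brown fox jumps over the lazy dog".split()
chunks = chunk_tokens(tokens, max_tokens=4, overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

Each chunk can then be embedded separately. Overlap keeps some shared context between neighboring chunks at the cost of embedding a few tokens twice.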

## Privacy

| Field              | Response |
|:-------------------|:---------|
|Generatable or reverse engineerable personally-identifiable information (PII)?    | None              |
|Was consent obtained for any PII used?                                            | Not Applicable       |
|PII used to create this model?                                                    | None                 |
|How often is dataset reviewed?                                                    | Before Release |
|Is a mechanism in place to honor data subject right of access or deletion of personal data? | No         |
|If PII collected for the development of the model, was it collected directly by NVIDIA? | Not Applicable |
|If PII collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Not Applicable |
|If PII collected for the development of this AI model, was it minimized to only what was required? | Not Applicable |
|Is there provenance for all datasets used in training?                            | Yes                  |
|Are we able to identify and trace source of dataset?                              | Yes                  |
|Does data labeling (annotation, metadata) comply with privacy laws?               | Yes                  | 
|Is data compliant with data subject requests for data correction or removal, if such a request was made? | No, not possible with externally-sourced data|

## Safety & Security

| Field              | Response |
|:-------------------|:---------|
|Model Application(s):                                 | Text Embedding for Retrieval    |
|Describe the life-critical impact (if present).     | Not Applicable                  |
|Use Case Restrictions:| Evaluation license for Non-Commercial Use Only. |
|Model and dataset restrictions:| The principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Dataset access restrictions are enforced during training, and dataset license constraints are adhered to. |

## Prototype

```bash
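# Requires an API key exported as NVIDIA_API_KEY.
invoke_url='https://integrate.api.nvidia.com/v1/embeddings'

authorization_header="Authorization: Bearer $NVIDIA_API_KEY"
accept_header='Accept: application/json'
content_type_header='Content-Type: application/json'

# The embeddings endpoint takes "input" (a list of strings), not chat-style "messages".
# The input text and the "query" input_type are example values.
data=$'{
  "input": ["What is the capital of France?"],
  "model": "nvidia/nv-embed-v1",
  "input_type": "query",
  "encoding_format": "float",
  "truncate": "NONE"
}'

# -i prints the response headers; -w appends the HTTP status code on its own line.
response=$(curl --silent -i -w "\n%{http_code}" --request POST \
  --url "$invoke_url" \
  --header "$authorization_header" \
  --header "$accept_header" \
  --header "$content_type_header" \
  --data "$data"
)

echo "$response"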
```

```python
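import os

from openai import OpenAI

# Read the API key from the environment instead of hard-coding it.
client = OpenAI(
    api_key=os.environ["NVIDIA_API_KEY"],
    base_url="https://integrate.api.nvidia.com/v1",
)

response = client.embeddings.create(
    input=["What is the capital of France?"],  # example text
    model="nvidia/nv-embed-v1",
    encoding_format="float",
    # extra_body fields are sent alongside the standard parameters;
    # "query" is an example input_type value.
    extra_body={"input_type": "query", "truncate": "NONE"},
)

# One embedding is returned per input string.
print(response.data[0].embedding)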
```

```python
import os

from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

client = NVIDIAEmbeddings(
    model="nvidia/nv-embed-v1",
    api_key=os.environ["NVIDIA_API_KEY"],  # read the key from the environment
    truncate="NONE",
)

embedding = client.embed_query("What is the capital of France?")  # example text
print(embedding)
```