
Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.
llama-nemotron-embed-1b-v2 is optimized for multilingual and cross-lingual text question-answering retrieval with support for long documents (up to 8192 tokens) and dynamic embedding size (Matryoshka embeddings). The model was evaluated across 26 languages: English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish.
llama-nemotron-embed-1b-v2 was developed by NVIDIA as part of the Nemotron retriever family.
In addition to enabling multilingual and cross-lingual retrieval, this model reduces the data storage footprint through dynamic embedding sizing, and its longer input sequence length means fewer chunks (and thus fewer vectors) per document, making it feasible to handle large-scale datasets efficiently.
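The storage savings from dynamic embedding sizing are easy to quantify. A minimal back-of-envelope sketch (the 10M-passage corpus size is an illustrative assumption, not from the model card):

```python
# Index size for a hypothetical corpus of 10M passages,
# assuming float32 storage (4 bytes per embedding component).
def index_size_gb(num_vectors: int, dim: int, bytes_per_float: int = 4) -> float:
    return num_vectors * dim * bytes_per_float / 1024**3

full = index_size_gb(10_000_000, 2048)   # full-size embeddings, ≈ 76.3 GB
small = index_size_gb(10_000_000, 384)   # smallest Matryoshka slice, ≈ 14.3 GB
```

Dropping from 2048 to 384 dimensions cuts raw index storage by more than 5x, at the cost of some retrieval accuracy (see the Recall@5 tables below).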
Embedding models are a key component of a retrieval system, transforming text into dense vector representations (embeddings) for indexing and similarity search.
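The indexing-and-search side can be sketched as brute-force cosine similarity over normalized vectors (a stand-in for a real vector index such as FAISS or a vector database; the function name and shapes are illustrative):

```python
import numpy as np

def top_k(query_emb: np.ndarray, passage_embs: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k passages most similar to the query.

    Brute-force cosine similarity: normalize both sides, then a single
    matrix-vector product scores every passage at once.
    """
    q = query_emb / np.linalg.norm(query_emb)
    p = passage_embs / np.linalg.norm(passage_embs, axis=1, keepdims=True)
    scores = p @ q                      # cosine similarity per passage
    return np.argsort(-scores)[:k]     # highest-scoring passages first
```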
This model is ready for commercial/non-commercial use.
GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.2 Community Model License Agreement.
You are responsible for ensuring that your use of NVIDIA provided models complies with all applicable laws.
Model Developer: NVIDIA
Deployment Geography: Global
This model is most suitable for users who want to build a multilingual question-and-answer application over a large text corpus, leveraging dense retrieval technologies.
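A typical QA application calls the model twice: once to embed passages at indexing time and once to embed the user's question at query time. The request-building sketch below assumes an OpenAI-compatible embeddings endpoint; the model identifier, `input_type`, and `dimensions` fields are assumptions to verify against the actual API documentation for your deployment:

```python
def build_embed_request(texts: list[str], input_type: str = "query", dim: int = 2048) -> dict:
    """Build a JSON payload for a hypothetical /v1/embeddings endpoint.

    Bi-encoder models distinguish the two sides of retrieval, so the
    (assumed) input_type field is "query" for questions and "passage"
    for documents being indexed.
    """
    return {
        "model": "nvidia/llama-nemotron-embed-1b-v2",  # assumed identifier
        "input": texts,
        "input_type": input_type,   # "query" or "passage"
        "dimensions": dim,          # requested Matryoshka output size
    }
```

The same payload shape would then be POSTed to the embeddings endpoint of your deployment.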
Release Date: Build.NVIDIA.com 2/27/2026 via llama-nemotron-embed-1b-v2
Architecture Type: Transformer encoder
Network Architecture: Fine-tuned Llama 3.2 1B retriever
Embedding Dimension: 2048
Max Sequence Length: 8192
Number of Model Parameters: 1B
This embedding model is a transformer encoder fine-tuned for contrastive learning using a bi-encoder setup. Query and passage texts are encoded independently, and contrastive learning maximizes similarity for relevant query–passage pairs while minimizing similarity for negative passages. The model supports Matryoshka embeddings to enable dynamic output dimensions.
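The contrastive objective described above is commonly an InfoNCE-style loss. A minimal single-query sketch (the temperature value is illustrative, not the model's actual training hyperparameter):

```python
import numpy as np

def info_nce_loss(q: np.ndarray, pos: np.ndarray, negs: np.ndarray,
                  temperature: float = 0.05) -> float:
    """Contrastive loss for one query: pull the relevant (positive) passage
    close, push negative passages away.

    q: (d,) query embedding; pos: (d,) positive passage; negs: (n, d)
    negative passages. All vectors assumed L2-normalized.
    """
    logits = np.concatenate([[q @ pos], negs @ q]) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))              # cross-entropy, positive at index 0
```

The loss is near zero when the query aligns with its positive passage and large when it aligns with a negative instead.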
Input Type(s): Text
Input Format(s): String / List of strings
Input Parameters: One Dimensional (1D)
Other Properties Related to Input: Text inputs longer than the maximum context length must be truncated or chunked.
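One common chunking strategy is a sliding window with overlap so that no passage is cut mid-context. A sketch over a pre-tokenized sequence (a plain Python list stands in for the model's real tokenizer output; the overlap size is an illustrative choice):

```python
def chunk_tokens(tokens: list, max_len: int = 8192, overlap: int = 256) -> list:
    """Split a token sequence into overlapping chunks of at most max_len.

    Consecutive chunks share `overlap` tokens so context spanning a
    chunk boundary is not lost.
    """
    if len(tokens) <= max_len:
        return [tokens]
    step = max_len - overlap
    return [tokens[i:i + max_len] for i in range(0, len(tokens) - overlap, step)]
```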
Output Type(s): Floats (dense vector embeddings)
Output Format(s): List of floats
Output Parameters: One Dimensional (1D)
Other Properties Related to Output: Embedding vectors can be configured to output dimensions of 384, 512, 768, 1024, or 2048.
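If the server returns the full 2048-dimensional vector, a client can still use a smaller Matryoshka dimension by truncating and re-normalizing, which is the standard way such embeddings are consumed (the helper name is illustrative):

```python
import numpy as np

def matryoshka_slice(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Truncate a full embedding to a smaller Matryoshka dimension and
    re-normalize so cosine similarity remains well-defined."""
    assert dim in (384, 512, 768, 1024, 2048)
    v = np.asarray(embedding, dtype=float)[:dim]
    return v / np.linalg.norm(v)
```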
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Runtime Engines: TensorRT
Supported Hardware Microarchitecture Compatibility:
NVIDIA Ampere
NVIDIA Blackwell
NVIDIA Hopper
NVIDIA Lovelace
Preferred/Supported Operating System(s):
Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
llama-nemotron-embed-1b-v2
Short Name: llama-nemotron-embed-1b-v2
Data Modality: Text
Training Data Collection: Hybrid: Automated, Undisclosed
Training Labeling: Hybrid: Automated, Undisclosed
Training Properties: NVIDIA used a training blend of public QA datasets with commercial-use-eligible licenses (avoiding datasets such as MSMARCO that restrict commercial use). The model underwent semi-supervised pre-training on ~12M samples from public datasets, followed by fine-tuning on ~1M samples from public datasets.
Evaluation Data Collection: Hybrid: Automated, Undisclosed
Evaluation Labeling: Hybrid: Automated, Undisclosed
Evaluation Properties: Evaluated in comparison to open and commercial retriever models on academic question-answering benchmarks (NQ, HotpotQA, FiQA from BEIR, and TechQA). Metric used: Recall@5. Additional evaluations include MIRACL multilingual retrieval, MLQA cross-lingual retrieval, and MLDR long-document retrieval.
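Recall@5 measures the fraction of a query's relevant passages that appear among the top 5 retrieved results. A minimal implementation for one query:

```python
def recall_at_k(retrieved_ids: list, relevant_ids: list, k: int = 5) -> float:
    """Fraction of relevant passages appearing in the top-k retrieved list."""
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)
```

The benchmark numbers below average this per-query score over each dataset.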
Selected results (Recall@5):
| Open & Commercial Retrieval Models | Average Recall@5 on NQ, HotpotQA, FiQA, TechQA |
|---|---|
| llama-3.2-nv-embedqa-1b-v2 (embedding dim 2048) | 68.60% |
| llama-3.2-nv-embedqa-1b-v2 (embedding dim 384) | 64.48% |
| llama-3.2-nv-embedqa-1b-v1 (embedding dim 2048) | 68.97% |
| nv-embedqa-mistral-7b-v2 | 72.97% |
| nv-embedqa-mistral-7b-v1 | 64.93% |
| nv-embedqa-e5-v5 | 62.07% |
| nv-embedqa-e5-v4 | 57.65% |
| e5-large-unsupervised | 48.03% |
| BM25 | 44.67% |
| Open & Commercial Retrieval Models | Average Recall@5 on MIRACL multilingual |
|---|---|
| llama-3.2-nv-embedqa-1b-v2 (embedding dim 2048) | 60.75% |
| llama-3.2-nv-embedqa-1b-v2 (embedding dim 384) | 58.62% |
| llama-3.2-nv-embedqa-1b-v1 | 60.07% |
| nv-embedqa-mistral-7b-v2 | 50.42% |
| BM25 | 26.51% |
| Open & Commercial Retrieval Models | Average Recall@5 on MLQA (different languages) |
|---|---|
| llama-3.2-nv-embedqa-1b-v2 (embedding dim 2048) | 79.86% |
| llama-3.2-nv-embedqa-1b-v2 (embedding dim 384) | 71.61% |
| llama-3.2-nv-embedqa-1b-v1 (embedding dim 2048) | 78.77% |
| nv-embedqa-mistral-7b-v2 | 68.38% |
| BM25 | 13.01% |
| Open & Commercial Retrieval Models | Average Recall@5 on MLDR |
|---|---|
| llama-3.2-nv-embedqa-1b-v2 (embedding dim 2048) | 59.55% |
| llama-3.2-nv-embedqa-1b-v2 (embedding dim 384) | 54.77% |
| llama-3.2-nv-embedqa-1b-v1 (embedding dim 2048) | 60.49% |
| nv-embedqa-mistral-7b-v2 | 43.24% |
| BM25 | 71.39% |
Acceleration Engine: TensorRT
Test Hardware: NVIDIA L40s
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case, and address unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.