
nvidia / llama-3.2-nv-embedqa-1b-v2

Run Anywhere

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

Tags: embedding, NeMo Retriever, run on RTX, retrieval-augmented generation, text-to-embedding

Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on your infrastructure of choice.

Step 1
Generate API Key

Generate an NVIDIA API key if you do not already have one; it is used as the nvcr.io registry password in Step 2 and as the NGC_API_KEY environment variable when running the container.

Step 2
Pull and Run the NIM

Log in to the NVIDIA container registry, using the literal string $oauthtoken as the username and your API key as the password:

$ docker login nvcr.io
Username: $oauthtoken
Password: <PASTE_API_KEY_HERE>

Pull and run the NVIDIA NIM with the command below. This will download the optimized model for your infrastructure.

export NGC_API_KEY=<PASTE_API_KEY_HERE>
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
docker run -it --rm \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:latest
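
The first launch can take several minutes while the container downloads the model. Below is a minimal readiness check, assuming the container exposes the standard NIM health endpoint /v1/health/ready on the mapped port 8000:

# Poll the readiness endpoint until it returns success, then proceed.
until curl -sf http://localhost:8000/v1/health/ready > /dev/null; do
  echo "Waiting for NIM to become ready..."
  sleep 5
done
echo "NIM is ready to serve requests."

Because $LOCAL_NIM_CACHE is mounted into the container, subsequent starts reuse the downloaded model instead of pulling it again.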

Step 3
Test the NIM

You can now make a local API call using this curl command:

curl -X "POST" \
  "http://localhost:8000/v1/embeddings" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "input": ["Hello world"],
    "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
    "input_type": "query"
  }'
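
The call above embeds a search query. When indexing documents for retrieval, the same endpoint is used with input_type set to "passage"; this model embeds queries and passages asymmetrically, so choosing the right input_type matters for retrieval quality. A minimal sketch, identical to the query call except for the input_type field and sample text:

# Embed a document for the retrieval index rather than a query.
curl -X "POST" \
  "http://localhost:8000/v1/embeddings" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "input": ["NVIDIA NIM microservices run on your own infrastructure."],
    "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
    "input_type": "passage"
  }'

The response is a JSON object with the embedding vector under data[0].embedding.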

For more details on getting started with this NIM, visit the NVIDIA NIM Docs.