

This API will be deprecated on 05/18/2026 and will no longer be supported after that date. Please transition to another model to avoid service interruptions. For information on other available models, visit our API Reference.

nvidia / llama-3.2-nv-embedqa-1b-v2

Deprecation in 82 days | Downloadable

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

Tags: embedding, NeMo Retriever, Retrieval Augmented Generation, Text-to-Embedding
Accelerated by DGX Cloud
Deploying your application in production? Get started with a 90-day evaluation of NVIDIA AI Enterprise.

Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on your infrastructure of choice.

Step 1
Generate API Key

Step 2
Pull and Run the NIM

$ docker login nvcr.io
Username: $oauthtoken
Password: <PASTE_API_KEY_HERE>

Pull and run the NVIDIA NIM with the command below. This will download the optimized model for your infrastructure.

# NGC_API_KEY authenticates the model download; LOCAL_NIM_CACHE persists the
# downloaded weights on the host so container restarts skip the download.
export NGC_API_KEY=<PASTE_API_KEY_HERE>
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
# Serves the API on port 8000; -u runs the container as the current user so
# files written to the mounted cache stay writable on the host.
docker run -it --rm \
    --gpus all \
    --shm-size=16GB \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    -p 8000:8000 \
    nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:latest
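Model download and startup can take several minutes on first run. A minimal Python sketch that polls the service before sending traffic; the `/v1/health/ready` path follows the usual NIM convention, and the helper names and retry defaults are illustrative, not part of the official tooling:

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url="http://localhost:8000", timeout=2.0):
    # Returns True once the service answers its readiness probe,
    # False on any connection error or non-200 status.
    try:
        with urllib.request.urlopen(base_url + "/v1/health/ready", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(base_url="http://localhost:8000", retries=60, delay=5.0):
    # Poll until the service is up or the retry budget is exhausted.
    for _ in range(retries):
        if is_ready(base_url):
            return True
        time.sleep(delay)
    return False
```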

Step 3
Test the NIM

You can now make a local API call using this curl command:

curl -X "POST" \
  "http://localhost:8000/v1/embeddings" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "input": ["Hello world"],
    "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
    "input_type": "query"
}'
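The same endpoint can also be called from application code. A minimal standard-library sketch, assuming the NIM is serving locally as started above; the endpoint path and payload fields mirror the curl example, while the helper names are illustrative:

```python
import json
import urllib.request

MODEL = "nvidia/llama-3.2-nv-embedqa-1b-v2"


def build_payload(texts, input_type="query", model=MODEL):
    # Mirrors the JSON body of the curl example above.
    return {"input": list(texts), "model": model, "input_type": input_type}


def embed(texts, input_type="query", base_url="http://localhost:8000"):
    # POST the texts to the embeddings endpoint and return one vector per input.
    req = urllib.request.Request(
        base_url + "/v1/embeddings",
        data=json.dumps(build_payload(texts, input_type)).encode("utf-8"),
        headers={"Content-Type": "application/json", "accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]
```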

For more details on getting started with this NIM, visit the NVIDIA NIM Docs.
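In a retrieval pipeline, embeddings from this model are typically used to rank candidate passages against a query by cosine similarity. A self-contained ranking sketch; the toy vectors stand in for real model output, and in a real deployment you would embed documents with a passage-style `input_type` rather than "query":

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, passage_vecs):
    # Return passage indices sorted by descending similarity to the query.
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(passage_vecs)]
    return [i for _, i in sorted(scored, reverse=True)]


# Toy vectors standing in for embeddings returned by the model.
query = [1.0, 0.0, 0.5]
passages = [[0.9, 0.1, 0.4], [0.0, 1.0, 0.0]]
print(rank(query, passages))  # the first passage is closest to the query
```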