Copyright © 2025 NVIDIA Corporation


llama-3.2-nv-embedqa-1b-v2

Run Anywhere

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

Tags: embedding · nemo retriever · run-on-rtx · Retrieval Augmented Generation · Text-to-Embedding

Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on your infrastructure of choice.

Step 1
Get API Key and Install the NIM Operator

Add the NVIDIA Helm repository and install the NVIDIA NIM Operator (the NVIDIA GPU Operator must already be installed in the cluster):

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
   && helm repo update

helm install nim-operator nvidia/k8s-nim-operator --create-namespace -n nim-operator
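After the chart installs, it is worth confirming that the operator's controller pod is running before moving on:

```shell
# The NIM Operator controller pod should reach Running state
kubectl get pods -n nim-operator
```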

Step 2
Create the Image Pull Secrets

kubectl create ns nim-service

kubectl create secret -n nim-service docker-registry ngc-secret \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password=<PASTE_API_KEY_HERE>

kubectl create secret -n nim-service generic ngc-api-secret \
    --from-literal=NGC_API_KEY=<PASTE_API_KEY_HERE>
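Optionally, verify that both secrets now exist in the nim-service namespace:

```shell
# Expect ngc-secret (type kubernetes.io/dockerconfigjson) and ngc-api-secret (type Opaque)
kubectl get secrets -n nim-service
```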

Step 3
Create a NIM Service

Ensure that a default StorageClass exists in the cluster. If none is present, create an appropriate StorageClass before proceeding.

NOTE:

  • Set the PVC size ("model-size" in the manifest below) based on the model and GPU type, as described in the NIM Operator documentation.
  • Adjust the nvidia.com/gpu: 1 resource limit to match the number of GPUs the model requires.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-32-nv-embedqa-1b-v2
  namespace: nim-service
spec:
  image:
    repository: nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2
    tag: latest
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  storage:
    pvc:
      create: true
      size: "model-size"
      volumeAccessMode: "ReadWriteOnce"
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
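The manifest above is not applied automatically; save it to a file (the name nimservice.yaml below is arbitrary) and create the resource:

```shell
# Create the NIMService resource from the saved manifest
kubectl apply -f nimservice.yaml

# Check status; the first startup downloads the model and can take several minutes
kubectl get nimservice -n nim-service
kubectl get pods -n nim-service
```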

Step 4
Test the Deployed NIM

Start a temporary pod with curl, then query the embeddings endpoint from inside the cluster:

kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash

curl -X "POST" \
  'http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1/embeddings' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
    "input": ["Hello world"],
    "input_type": "query"
  }'
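As an alternative to the throwaway pod, you can expose the ClusterIP service on your workstation with kubectl port-forward and call it locally (the local port 8000 below is an arbitrary choice):

```shell
# Forward the service's port 8000 to localhost (runs in the foreground; use a second terminal for curl)
kubectl port-forward -n nim-service svc/llama-32-nv-embedqa-1b-v2 8000:8000

# In another terminal, request an embedding for a query string
curl -s 'http://localhost:8000/v1/embeddings' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
    "input": ["Hello world"],
    "input_type": "query"
  }'
```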

For more details on getting started with this NIM, visit the NVIDIA NIM Operator Docs.