NVIDIA
Explore
Models
Blueprints
GPUs
Docs
Terms of Use
Privacy Policy
Your Privacy Choices
Contact

Copyright © 2025 NVIDIA Corporation

nvidia

llama-3.2-nv-rerankqa-1b-v2

Run Anywhere

Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.

nemo retrieverrerankingrun-on-rtxRetrieval Augmented Generation
Get API Key
API Reference
Accelerated by DGX Cloud
Deploying your application in production? Get started with a 90-day evaluation of NVIDIA AI Enterprise

Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on your infrastructure of choice.

Step 1
Get API Key and Install the NIM Operator

Install the NVIDIA GPU Operator

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
   && helm repo update

helm install nim-operator nvidia/k8s-nim-operator --create-namespace -n nim-operator

Step 2
Create a ImagePull Secrets

kubectl create ns nim-service

kubectl create secret -n nim-service docker-registry ngc-secret \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password=<PASTE_API_KEY_HERE>

kubectl create secret -n nim-service generic ngc-api-secret \
    --from-literal=NGC_API_KEY=<PASTE_API_KEY_HERE>

Step 3
Create a NIM Service

Ensure that a default StorageClass exists in the cluster. If none is present, create an appropriate StorageClass before proceeding.

NOTE:

  • Select model-size based on the model and GPU type as described here.
  • For example, change the nvidia.com/gpu: 1 based on the model and number of GPU requirements
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-32-nv-rerankqa-1b-v2
  namespace: nim-service
spec:
  image:
    repository: nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2
    tag: latest
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  storage:
    pvc:
      create: true
      size: "model-size"
      volumeAccessMode: "ReadWriteOnce"
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000

Step 4
Test the Deployed NIM

kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash
curl -X "POST" \
 'http://llama-32-nv-rerankqa-1b-v2.nim-service:8000/v1/ranking' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "nvidia/llama-3.2-nv-rerankqa-1b-v2",
  "query": {"text": "which way did the traveler go?"},
  "passages": [
    {"text": "two roads diverged in a yellow wood, and sorry i could not travel both and be one traveler, long i stood and looked down one as far as i could to where it bent in the undergrowth;"},
    {"text": "then took the other, as just as fair, and having perhaps the better claim because it was grassy and wanted wear, though as for that the passing there had worn them really about the same,"},
    {"text": "and both that morning equally lay in leaves no step had trodden black. oh, i marked the first for another day! yet knowing how way leads on to way i doubted if i should ever come back."},
    {"text": "i shall be telling this with a sigh somewhere ages and ages hense: two roads diverged in a wood, and i, i took the one less traveled by, and that has made all the difference."}
  ],
  "truncate": "END"
}'

For more details on getting started with this NIM, visit the NVIDIA NIM Operator Docs.