
meta / llama-3.1-70b-instruct

Run Anywhere

Powers complex conversations with superior contextual understanding, reasoning, and text generation.

Deploying your application in production? Get started with a 90-day evaluation of NVIDIA AI Enterprise.

Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on your infrastructure of choice.

Step 1
Get API Key and Install the NIM Operator

Install the NVIDIA NIM Operator (this assumes the NVIDIA GPU Operator is already installed so that the cluster's GPUs are schedulable):

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
   && helm repo update

helm install nim-operator nvidia/k8s-nim-operator --create-namespace -n nim-operator
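Once the Helm install completes, you can confirm the operator is healthy before moving on (the namespace below matches the install command above):

```shell
# Verify that the NIM Operator controller pod is up;
# it should reach STATUS "Running" before you proceed
kubectl get pods -n nim-operator
```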

Step 2
Create Image Pull Secrets

kubectl create ns nim-service

kubectl create secret -n nim-service docker-registry ngc-secret \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password=<PASTE_API_KEY_HERE>

kubectl create secret -n nim-service generic ngc-api-secret \
    --from-literal=NGC_API_KEY=<PASTE_API_KEY_HERE>
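As a quick sanity check, you can confirm that both secrets exist under the names used above:

```shell
# Both secrets should be listed in the nim-service namespace
kubectl get secrets -n nim-service ngc-secret ngc-api-secret
```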

Step 3
Create a NIM Service

Ensure that a default StorageClass exists in the cluster. If none is present, create an appropriate StorageClass before proceeding.
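To check for a default StorageClass, and optionally promote an existing one to the default (`<storage-class-name>` is a placeholder for a class in your cluster):

```shell
# A default StorageClass is marked "(default)" in this output
kubectl get storageclass

# Optionally mark an existing StorageClass as the cluster default
kubectl patch storageclass <storage-class-name> \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```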

NOTE:

  • Replace the model-size placeholder in the PVC spec below with a size appropriate for the model and GPU type, as described in the NIM Operator documentation.
  • Likewise, adjust nvidia.com/gpu: 1 to match the number of GPUs the model requires.

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-31-70b-instruct
  namespace: nim-service
spec:
  image:
    repository: nvcr.io/nim/meta/llama-3.1-70b-instruct
    tag: latest
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  storage:
    pvc:
      create: true
      size: "model-size"
      volumeAccessMode: "ReadWriteOnce"
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
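The manifest above can then be applied with kubectl; the file name nimservice.yaml is an assumption here:

```shell
# Apply the NIMService manifest
kubectl apply -f nimservice.yaml

# Watch the pod start; pulling the image and downloading a 70B model can take a while
kubectl get pods -n nim-service -w

# Check the NIMService status reported by the operator
kubectl get nimservice -n nim-service llama-31-70b-instruct
```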

Step 4
Test the Deployed NIM

kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash
curl -X "POST" \
 'http://llama-31-70b-instruct.nim-service:8000/v1/chat/completions' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "meta/llama-3.1-70b-instruct",
        "messages": [
        {
          "content":"What should I do for a 4 day vacation at Cape Hatteras National Seashore?",
          "role": "user"
        }],
        "top_p": 1,
        "n": 1,
        "max_tokens": 1024,
        "stream": false,
        "frequency_penalty": 0.0,
        "stop": ["STOP"]
      }'
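Alternatively, you can reach the service from your workstation with a port-forward instead of an in-cluster pod; the service name and port below are the same ones used in the in-cluster URL above:

```shell
# Forward local port 8000 to the NIMService's ClusterIP service
kubectl port-forward -n nim-service svc/llama-31-70b-instruct 8000:8000

# In another terminal, send a request to the forwarded port
curl -s http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "meta/llama-3.1-70b-instruct", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 64}'
```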

For more details on getting started with this NIM, visit the NVIDIA NIM Operator Docs.