Copyright © 2025 NVIDIA Corporation

deepseek-ai

deepseek-r1

Run Anywhere

State-of-the-art, high-efficiency LLM excelling in reasoning, math, and coding.

Deploying your application in production? Get started with a 90-day evaluation of NVIDIA AI Enterprise

Follow the steps below to download and run the NVIDIA NIM inference microservice with the NIM Operator on your infrastructure of choice.

Step 1
Install the NIM Operator

  • Prerequisites
    • Install the NVIDIA GPU Operator
    • Generate API Key
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
helm install nim-operator nvidia/k8s-nim-operator --create-namespace -n nim-operator
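Before moving on, it can help to confirm the operator actually came up; a minimal sanity check, assuming the release name and namespace used in the install command above:

```shell
# List the operator pods; they should reach Running/Ready before Step 2
kubectl get pods -n nim-operator
```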

Step 2
Create the Image Pull and API Key Secrets

kubectl create secret -n nim-service docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=<PASTE_API_KEY_HERE>

kubectl create secret -n nim-service generic ngc-api-secret \
  --from-literal=NGC_API_KEY=<PASTE_API_KEY_HERE>
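The commands above assume the `nim-service` namespace already exists, which no earlier step has created; if it does not, a likely prerequisite is to create it first and then confirm both secrets landed:

```shell
# Create the namespace the secrets (and the later NIMCache/NIMService) live in
kubectl create namespace nim-service
# Both ngc-secret and ngc-api-secret should appear in the listing
kubectl get secrets -n nim-service
```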

Step 3
Create a NIM Cache Using an Available Storage Class on the Cluster

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama-3.1-8b-instruct:1.8.3
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
      model:
        engine: tensorrt_llm
        tensorParallelism: "1"
  storage:
    pvc:
      create: true
      storageClass: ""
      size: "50Gi"
      volumeAccessMode: ReadWriteMany
  resources: {}
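To use the manifest, apply it and wait for the cache to fill; a sketch assuming the manifest above was saved as `nimcache.yaml` (the filename is illustrative, and the model download can take a while depending on bandwidth and storage):

```shell
# Apply the NIMCache manifest into the same namespace as the secrets
kubectl apply -n nim-service -f nimcache.yaml
# Watch the cache status until it reports Ready
kubectl get nimcache -n nim-service -w
```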

Step 4
Create a NIM Service

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
spec:
  image:
    repository: nvcr.io/nim/meta/llama-3.1-8b-instruct
    tag: 1.8.3
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  storage:
    nimCache:
      name: meta-llama3-8b-instruct
      profile: ''
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
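As with the cache, the service manifest needs to be applied and its pod given time to start; a sketch assuming the manifest above was saved as `nimservice.yaml` (again, the filename is illustrative):

```shell
# Apply the NIMService manifest and check that its pod and ClusterIP service exist
kubectl apply -n nim-service -f nimservice.yaml
kubectl get pods,svc -n nim-service
```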

Step 5
Test the Deployed NIM

kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash
curl -X "POST" \
  'http://meta-llama3-8b-instruct.nim-service:8000/v1/chat/completions' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "What should I do for a 4 day vacation at Cape Hatteras National Seashore?"
      }
    ],
    "top_p": 1,
    "n": 1,
    "max_tokens": 1024,
    "stream": false,
    "frequency_penalty": 0.0,
    "stop": ["STOP"]
  }'
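In the OpenAI-compatible schema this endpoint returns, the reply text sits under `choices[0].message.content`. A small sketch for pulling it out of the response; the `response` value here is a stand-in illustrating the shape, not real model output:

```shell
# Stand-in response in the OpenAI-compatible chat completion shape
response='{"choices":[{"message":{"role":"assistant","content":"Pack for wind and sun."}}]}'
# Extract the assistant reply with python3 (jq works equally well)
printf '%s' "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```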

For more details on getting started with this NIM, visit the NVIDIA NIM Operator Docs.