nvidia

llama-3.3-nemotron-super-49b-v1

Run Anywhere

High-efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

Deploying your application in production? Get started with a 90-day evaluation of NVIDIA AI Enterprise

Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on your infrastructure of choice.

Get API Key and Install the NIM Operator

Install the NVIDIA GPU Operator

With the GPU Operator in place, add the NVIDIA Helm repository and install the NIM Operator:

    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
      && helm repo update
    helm install nim-operator nvidia/k8s-nim-operator --create-namespace -n nim-operator

Create the Image Pull Secrets

    kubectl create ns nim-service
    kubectl create secret -n nim-service docker-registry ngc-secret \
      --docker-server=nvcr.io \
      --docker-username='$oauthtoken' \
      --docker-password=<PASTE_API_KEY_HERE>
    kubectl create secret -n nim-service generic ngc-api-secret \
      --from-literal=NGC_API_KEY=<PASTE_API_KEY_HERE>
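The same API key is pasted into both secrets, so it may be convenient to read it once from the environment. The following is a minimal sketch, not part of the original steps: the `create_ngc_secrets` helper and the choice of an `NGC_API_KEY` environment variable are assumptions made for illustration. The single quotes around `$oauthtoken` are kept on purpose so the shell passes it through literally as the registry username.

```shell
# Sketch: create both secrets from an environment variable instead of
# pasting the key into each command. The helper name and the use of an
# NGC_API_KEY environment variable are assumptions for illustration.
create_ngc_secrets() {
  # Stop early with a clear message if the key is missing or empty.
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi

  kubectl create ns nim-service

  # Image pull secret for nvcr.io; '$oauthtoken' is a literal username.
  kubectl create secret -n nim-service docker-registry ngc-secret \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password="$NGC_API_KEY"

  # Generic secret holding the key itself.
  kubectl create secret -n nim-service generic ngc-api-secret \
    --from-literal=NGC_API_KEY="$NGC_API_KEY"
}
```

With the key exported in the current shell, calling `create_ngc_secrets` would run the three commands above in order; without it, the helper returns before touching the cluster.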

Create a NIM Service

Ensure that a default StorageClass exists in the cluster. If none is present, create an appropriate StorageClass before proceeding.
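One way to check for a default StorageClass is to look at the output of `kubectl get storageclass`, where kubectl appends `(default)` to the name of the default class. The sketch below wraps that check in a small helper; the name `has_default_storageclass` is hypothetical and the helper only inspects text piped into it.

```shell
# Sketch: detect a default StorageClass from `kubectl get storageclass` output.
# kubectl marks the default class by appending "(default)" to its name.
has_default_storageclass() {
  # Reads the table on stdin; succeeds if any row is marked as default.
  grep -q '(default)'
}

# Usage against a live cluster (shown as a comment, not run here):
#   kubectl get storageclass | has_default_storageclass \
#     || echo "No default StorageClass found" >&2
```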

NOTE:

  • Replace the "model-size" placeholder in storage.pvc.size with a size chosen for the model and GPU type, as described here.
  • Set nvidia.com/gpu (1 in this example) to the number of GPUs the model requires.

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NIMService
    metadata:
      name: llama-33-nemotron-super-49b-v1
      namespace: nim-service
    spec:
      image:
        repository: nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1
        tag: latest
        pullPolicy: IfNotPresent
        pullSecrets:
          - ngc-secret
      authSecret: ngc-api-secret
      storage:
        pvc:
          create: true
          size: "model-size"
          volumeAccessMode: "ReadWriteOnce"
      replicas: 1
      resources:
        limits:
          nvidia.com/gpu: 1
      expose:
        service:
          type: ClusterIP
          port: 8000
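The two values called out in the note (the PVC size and the GPU count) are the parts of this manifest most likely to change between deployments. As a rough sketch, they could be filled in from the command line before applying. The `render_nimservice` helper and the example values `100Gi` and `2` are hypothetical; real values should come from the sizing guidance for the model and GPU type.

```shell
# Sketch: fill in the PVC size and GPU count, then print the manifest.
# render_nimservice is a hypothetical helper, not part of the NIM Operator.
render_nimservice() {
  pvc_size=$1   # Kubernetes quantity for the model cache, e.g. 100Gi (placeholder)
  gpu_count=$2  # number of GPUs requested per replica
  cat <<EOF
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-33-nemotron-super-49b-v1
  namespace: nim-service
spec:
  image:
    repository: nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1
    tag: latest
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  storage:
    pvc:
      create: true
      size: "${pvc_size}"
      volumeAccessMode: "ReadWriteOnce"
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: ${gpu_count}
  expose:
    service:
      type: ClusterIP
      port: 8000
EOF
}

# Example with placeholder values; prints the manifest to stdout.
render_nimservice 100Gi 2
```

Piping the output to `kubectl apply -f -` would create the resource. Assuming the operator's resource definitions are installed, `kubectl get nimservice -n nim-service` and `kubectl get pods -n nim-service` should then show its progress.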

Test the Deployed NIM

Start a temporary curl pod and open a shell in it:

    kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash

From that shell, send a chat completion request to the service:

    curl -X "POST" \
      'http://llama-33-nemotron-super-49b-v1.nim-service:8000/v1/chat/completions' \
      -H 'Accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
        "model": "nvidia/llama-3.3-nemotron-super-49b-v1",
        "messages": [
          {
            "content": "What should I do for a 4 day vacation at Cape Hatteras National Seashore?",
            "role": "user"
          }
        ],
        "top_p": 1,
        "n": 1,
        "max_tokens": 1024,
        "stream": false,
        "frequency_penalty": 0.0,
        "stop": ["STOP"]
      }'
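The hostname in the request follows the in-cluster DNS form `<service>.<namespace>`. Because the curl pod runs in the `default` namespace while the service lives in `nim-service`, the namespace qualifier is needed for the name to resolve. The sketch below rebuilds the URL from its parts; `nim_url` is a hypothetical helper, and it assumes the Service carries the same name as the NIMService resource, as the request URL suggests.

```shell
# Sketch: build the in-cluster URL for a NIM endpoint from its parts.
# nim_url is a hypothetical helper for illustration.
nim_url() {
  # $1 = Service name, $2 = namespace, $3 = port, $4 = API path
  printf 'http://%s.%s:%s%s\n' "$1" "$2" "$3" "$4"
}

# Values taken from the manifest and request in this guide.
CHAT_URL=$(nim_url llama-33-nemotron-super-49b-v1 nim-service 8000 /v1/chat/completions)
echo "$CHAT_URL"
```

A deployment with a different resource name, namespace, or port would change only the arguments, not the request body.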

For more details on getting started with this NIM, visit the NVIDIA NIM Operator Docs.