nvidia

llama-3.3-nemotron-super-49b-v1

Run Anywhere

A high-efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

Tags: advanced reasoning, function calling, instruction following, math

Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on a Kubernetes cluster, using the NVIDIA NIM Operator.

Step 1
Get API Key and Install the NIM Operator

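Generate an NGC API key (you will need it in Step 2), then install the NVIDIA NIM Operator with Helm. The cluster must already expose its GPUs as nvidia.com/gpu resources, which is typically done by installing the NVIDIA GPU Operator.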


helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
   && helm repo update

helm install nim-operator nvidia/k8s-nim-operator --create-namespace -n nim-operator
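Before continuing, it can help to confirm that the operator installed cleanly. The sketch below is one way to do that under stated assumptions: it checks that the Helm release exists and that the NIMService custom resource definition is registered, then lists the operator pods. The function name check_nim_operator is hypothetical, and the CRD name nimservices.apps.nvidia.com is an assumption inferred from the apps.nvidia.com API group and NIMService kind used in Step 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the NIM Operator installed in Step 1.
# Assumes the release name and namespace "nim-operator" from the helm install above.
check_nim_operator() {
  # The Helm release must exist in the nim-operator namespace.
  if ! helm status nim-operator -n nim-operator >/dev/null 2>&1; then
    echo "Helm release nim-operator not found in namespace nim-operator" >&2
    return 1
  fi
  # Assumed CRD name (<plural>.<group>) for the NIMService kind.
  if ! kubectl get crd nimservices.apps.nvidia.com >/dev/null 2>&1; then
    echo "CRD nimservices.apps.nvidia.com is not registered" >&2
    return 1
  fi
  # List the operator pods so their STATUS column can be inspected.
  kubectl get pods -n nim-operator
}
```

After sourcing the function, run check_nim_operator; the pods it lists should reach the Running status before you move on.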

Step 2
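Create the Image Pull and API Key Secrets

Create the nim-service namespace and two secrets in it: ngc-secret, which lets Kubernetes pull the NIM container image from nvcr.io, and ngc-api-secret, which passes your NGC API key to the NIM so it can download the model. Replace <PASTE_API_KEY_HERE> with your key. The Docker username is the literal string $oauthtoken, so keep the single quotes to stop the shell from expanding it.
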
kubectl create ns nim-service

kubectl create secret -n nim-service docker-registry ngc-secret \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password=<PASTE_API_KEY_HERE>

kubectl create secret -n nim-service generic ngc-api-secret \
    --from-literal=NGC_API_KEY=<PASTE_API_KEY_HERE>
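Pasting the key into each command leaves it in your shell history. A minimal sketch of an alternative, assuming the key is held in a shell variable named NGC_API_KEY: the hypothetical function create_ngc_secrets creates the same two secrets from that variable and refuses to run if it is empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the two Step 2 secrets from $NGC_API_KEY
# instead of pasting the key into each command.
create_ngc_secrets() {
  local ns="${1:-nim-service}"   # target namespace, defaults to nim-service
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  # Image pull secret for nvcr.io; the username is the literal string $oauthtoken,
  # so it stays in single quotes to prevent shell expansion.
  kubectl create secret -n "$ns" docker-registry ngc-secret \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password="$NGC_API_KEY" || return 1
  # Generic secret that the NIMService manifest references as authSecret.
  kubectl create secret -n "$ns" generic ngc-api-secret \
    --from-literal=NGC_API_KEY="$NGC_API_KEY"
}
```

In bash, for example, read -rs NGC_API_KEY reads the key without echoing it; then run kubectl create ns nim-service followed by create_ngc_secrets.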

Step 3
Create a NIM Service

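Ensure that a default StorageClass exists in the cluster, because the manifest below asks the operator to create a PersistentVolumeClaim for the model cache. If none is present, create an appropriate StorageClass before proceeding.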
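One way to check this is kubectl get storageclass, which prints (default) next to the name of the default class. The hypothetical helper below wraps that check so a deployment script can stop early; it relies only on that marker in kubectl's table output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if some StorageClass is marked as the
# cluster default, which the NIMService PVC in this step relies on.
has_default_storageclass() {
  # kubectl appends "(default)" to the name of the default StorageClass.
  kubectl get storageclass --no-headers 2>/dev/null | grep -q '(default)'
}
```

For example: has_default_storageclass || echo "No default StorageClass found; create one first" >&2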

NOTE:

Replace the model-size placeholder with a PVC size suited to the model and GPU type, as described here.
Likewise, change nvidia.com/gpu: 1 to the number of GPUs the model requires.


apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-33-nemotron-super-49b-v1
  namespace: nim-service
spec:
  image:
    repository: nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1
    tag: latest
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  storage:
    pvc:
      create: true
      size: "model-size"          # placeholder: replace with a PVC size suited to this model
      volumeAccessMode: "ReadWriteOnce"
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1           # GPUs per replica; adjust to the model's requirements
  expose:
    service:
      type: ClusterIP
      port: 8000
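Save the manifest to a file and apply it like any other Kubernetes resource, for example with kubectl apply -f nimservice.yaml (the filename is arbitrary). On first start the NIM downloads the model into the PVC, so it can take a while to become ready. The sketch below polls until the NIMService reports Ready. It assumes the operator exposes readiness as the value Ready in the .status.state field; confirm this with kubectl get nimservice -n nim-service -o yaml on your operator version. The function name wait_for_nimservice is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a NIMService until it reports Ready.
# Assumption: readiness is exposed as .status.state == "Ready".
# Arguments: name [namespace] [max_attempts] [delay_seconds]
wait_for_nimservice() {
  local name="$1" ns="${2:-nim-service}" tries="${3:-60}" delay="${4:-10}"
  local i=1 state
  while [ "$i" -le "$tries" ]; do
    state="$(kubectl get nimservice "$name" -n "$ns" \
      -o jsonpath='{.status.state}' 2>/dev/null)"
    if [ "$state" = "Ready" ]; then
      echo "NIMService $name is Ready"
      return 0
    fi
    echo "attempt $i/$tries: state=${state:-unknown}; retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  echo "NIMService $name not Ready after $tries attempts" >&2
  return 1
}
```

For example, wait_for_nimservice llama-33-nemotron-super-49b-v1 nim-service 180 10 waits up to about 30 minutes. If it times out, kubectl get pods -n nim-service and kubectl logs on the NIM pod usually show why.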

Step 4
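Test the Deployed NIM

Start a temporary pod that includes curl and open a shell in it (the pod is deleted when you exit):

kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash

From the pod's shell, send a chat completion request. Inside the cluster, the service is reachable at <service-name>.<namespace>, here llama-33-nemotron-super-49b-v1.nim-service, on port 8000 as set under expose.service in the manifest.

curl -X "POST" \
  'http://llama-33-nemotron-super-49b-v1.nim-service:8000/v1/chat/completions' \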
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "nvidia/llama-3.3-nemotron-super-49b-v1",
        "messages": [
        {
          "content":"What should I do for a 4 day vacation at Cape Hatteras National Seashore?",
          "role": "user"
        }],
        "top_p": 1,
        "n": 1,
        "max_tokens": 1024,
        "stream": false,
        "frequency_penalty": 0.0,
        "stop": ["STOP"]
      }'
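According to the model card, Llama-3.3-Nemotron-Super-49B-v1 switches its reasoning mode with a system prompt of "detailed thinking on" or "detailed thinking off". The sketch below sends a request with that system message from a workstation, assuming the service has been forwarded locally with kubectl port-forward -n nim-service svc/llama-33-nemotron-super-49b-v1 8000:8000. The functions nim_chat and json_escape and the NIM_URL variable are hypothetical; the escaping handles quotes and backslashes but not newlines or other control characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send one chat request to the NIM's OpenAI-compatible
# /v1/chat/completions endpoint with reasoning switched on or off.
# Assumes a port-forward to localhost:8000 unless NIM_URL is already set.
NIM_URL="${NIM_URL:-http://localhost:8000}"

# Escape backslashes, then double quotes, so the text fits in a JSON string.
# Newlines and other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Usage: nim_chat on|off "prompt text"
nim_chat() {
  case "$1" in
    on|off) ;;
    *) echo "first argument must be 'on' or 'off'" >&2; return 1 ;;
  esac
  local prompt
  prompt="$(json_escape "$2")"
  curl -s -X POST "$NIM_URL/v1/chat/completions" \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "{
      \"model\": \"nvidia/llama-3.3-nemotron-super-49b-v1\",
      \"messages\": [
        {\"role\": \"system\", \"content\": \"detailed thinking $1\"},
        {\"role\": \"user\", \"content\": \"$prompt\"}
      ],
      \"max_tokens\": 1024,
      \"stream\": false
    }"
}
```

For example: nim_chat off "What should I do for a 4 day vacation at Cape Hatteras National Seashore?"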

For more details on getting started with this NIM, visit the NVIDIA NIM Operator Docs.
