llama-3.1-nemoguard-8b-topic-control Model by NVIDIA

Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on your infrastructure of choice.

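Step 1
Generate API Key

Generate an NGC API key. You use it in Step 2, both to log in to the NVIDIA container registry (nvcr.io) and to let the container download the model.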

Step 2
Pull and Run the NIM

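Log in to the NVIDIA container registry (nvcr.io). Enter the literal string $oauthtoken as the username and your API key as the password:

$ docker login nvcr.io
Username: $oauthtoken
Password: <PASTE_API_KEY_HERE>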
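For scripted setups such as CI jobs or provisioning scripts, the interactive prompt can be replaced with docker login --password-stdin, which reads the password from standard input and keeps the key off the command line. The function below is a minimal sketch: nim_login is a hypothetical helper name, and it assumes NGC_API_KEY already holds your key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: non-interactive login to nvcr.io.
# Assumes NGC_API_KEY already holds the key generated in Step 1.
nim_login() {
  if [[ -z "${NGC_API_KEY:-}" ]]; then
    echo "NGC_API_KEY is not set; export it first" >&2
    return 1
  fi
  # Single quotes stop the shell from expanding $oauthtoken:
  # the registry expects that literal string as the username.
  printf '%s' "$NGC_API_KEY" |
    docker login nvcr.io --username '$oauthtoken' --password-stdin
}

# Usage:
#   export NGC_API_KEY=<PASTE_API_KEY_HERE>
#   nim_login
```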

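Pull and run the NIM with the following commands. On first start, the container downloads the model optimized for your infrastructure and stores it in the directory given by LOCAL_NIM_CACHE, which is mounted into the container at /opt/nim/.cache, so later runs can reuse it.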

export NGC_API_KEY=<PASTE_API_KEY_HERE>
export LOCAL_NIM_CACHE=~/.cache/llama-nemoguard-topiccontrol
mkdir -p "$LOCAL_NIM_CACHE"
docker run -it --rm \
    --runtime=nvidia \
    --gpus=all \
    --shm-size=16GB \
    -e NGC_API_KEY \
    -e NIM_SERVED_MODEL_NAME="llama-3.1-nemoguard-8b-topic-control" \
    -e NIM_CUSTOM_MODEL_NAME="llama-3.1-nemoguard-8b-topic-control" \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    -p 8000:8000 \
    nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-topic-control:latest
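The first start can take a while because the model is downloaded into the cache. A small polling loop, run from a second terminal since docker run -it keeps the container in the foreground, can block until the server answers. This sketch assumes the container exposes a readiness endpoint at /v1/health/ready, as NVIDIA documents for its LLM NIMs; confirm the path for this image. wait_for_nim is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the NIM readiness endpoint until it returns HTTP 2xx.
# Assumption: the image serves /v1/health/ready like other NVIDIA LLM NIMs.
# $1 = readiness URL, $2 = timeout in seconds (generous by default, since the
# first start may include the model download), $3 = poll interval in seconds.
wait_for_nim() {
  local url=${1:-http://0.0.0.0:8000/v1/health/ready}
  local timeout=${2:-900}
  local interval=${3:-5}
  local start=$SECONDS
  # -f turns HTTP errors (e.g. 503 while the model loads) into a non-zero exit.
  until curl -fs -o /dev/null --max-time 5 "$url"; do
    if (( SECONDS - start >= timeout )); then
      echo "NIM not ready after ${timeout}s: $url" >&2
      return 1
    fi
    sleep "$interval"
  done
  echo "NIM is ready: $url"
}

# Usage, from a second terminal while the container runs:
#   wait_for_nim && echo "ready for requests"
```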

Once the server has started, you can call its OpenAI-compatible chat completions endpoint locally with curl:

curl -X 'POST' \
  'http://0.0.0.0:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama-3.1-nemoguard-8b-topic-control",
    "messages": [
      {
        "role":"user",
        "content":"Hello! How are you?"
      },
      {
        "role":"assistant",
        "content":"Hi! I am quite well, how can I help you today?"
      },
      {
        "role":"user",
        "content":"Can you write me a song?"
      }
    ],
    "top_p": 1,
    "n": 1,
    "max_tokens": 15,
    "stream": true,
    "frequency_penalty": 1.0,
    "stop": ["hello"]
  }'
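Because the request above sets "stream": true, the reply arrives as a series of server-sent event chunks rather than a single JSON object. For scripted topic checks, a non-streaming request is easier to consume. The sketch below builds such a request with a system message holding the topical rules and extracts the model's reply. It assumes, based on the model's purpose, that the reply is a short verdict such as on-topic or off-topic; the exact system-prompt format the model expects is described in the NIM documentation, and the rules text here is only an example. json_escape, topic_payload and nim_verdict are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for scripted topic checks against the local NIM.
# Assumption: the system message holds the topical rules and the model
# replies with a short verdict such as "on-topic" or "off-topic".

# Escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, carriage return and tab only.
json_escape() {
  local s=$1 bs='\'
  s=${s//"$bs"/"$bs$bs"}    # backslash first so later escapes are not doubled
  s=${s//'"'/"$bs\""}       # double quote
  s=${s//$'\n'/"${bs}n"}    # newline
  s=${s//$'\r'/"${bs}r"}    # carriage return
  s=${s//$'\t'/"${bs}t"}    # tab
  printf '%s' "$s"
}

# Build a non-streaming request body: $1 = topical rules, $2 = user message.
topic_payload() {
  printf '{"model":"llama-3.1-nemoguard-8b-topic-control","messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}],"stream":false}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Print the first "content" value from a chat-completion response on stdin.
# A regex is enough for a one-word reply; use a JSON parser such as jq
# for anything richer.
nim_verdict() {
  grep -oE '"content" *: *"([^"\\]|\\.)*"' | head -n 1 |
    sed -E 's/^"content" *: *"//; s/"$//'
}

# Usage (needs the running NIM; the rules text is only an example):
#   RULES='Only discuss cooking. If the user strays from cooking, respond with "off-topic"; otherwise respond with "on-topic".'
#   topic_payload "$RULES" 'Can you write me a song?' |
#     curl -s http://0.0.0.0:8000/v1/chat/completions \
#       -H 'Content-Type: application/json' -d @- | nim_verdict
```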

For more information about getting started with this NIM, refer to the Llama 3.1 NemoGuard 8B TopicControl NIM documentation.