llama-3.1-nemotron-nano-vl-8b-v1 Model by NVIDIA

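Call the hosted endpoint from Python. Export your API key as NVIDIA_API_KEY first.

Python

import os
import requests

invoke_url = "https://integrate.api.nvidia.com/v1/chat/completions"
stream = False

headers = {
    # Read the key from the environment; "$NVIDIA_API_KEY" inside a Python string is not expanded
    "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
    "Accept": "text/event-stream" if stream else "application/json",
}

payload = {
    "model": "nvidia/llama-3.1-nemotron-nano-vl-8b-v1",
    "messages": [
        {
            "role": "user",
            "content": "",  # put your prompt here
        }
    ],
    "temperature": 1,
    "top_p": 0.01,
    "max_tokens": 1024,
    "seed": 50,
    "stream": stream,
}

response = requests.post(invoke_url, headers=headers, json=payload, stream=stream)
response.raise_for_status()  # fail loudly on 4xx/5xx (e.g. a missing or invalid key)

if stream:
    # Streamed responses arrive as server-sent events, one line at a time
    for line in response.iter_lines():
        if line:
            print(line.decode("utf-8"))
else:
    print(response.json())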
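When stream is set to True, the response arrives as server-sent events rather than as one JSON object. The sketch below joins the streamed text into a single string. It assumes the OpenAI-compatible event format (lines of the form data: {...} carrying choices[0].delta.content, terminated by data: [DONE]); collect_stream_text is a hypothetical helper name, not part of any NVIDIA library.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[bytes]) -> str:
    """Join the text deltas from an OpenAI-style server-sent-event stream.

    Assumption: each event line looks like b'data: {...json...}' and the
    stream ends with b'data: [DONE]'.
    """
    parts = []
    for raw in lines:
        if not raw:
            continue  # blank separator / keep-alive lines
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # ignore SSE comments and other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(data)
        for choice in event.get("choices", []):
            # The first event may carry only a role; content can be missing or null
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

With stream = True, print(collect_stream_text(response.iter_lines())) can replace the line-printing loop.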

Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on your infrastructure of choice.

Step 1
Generate API Key

Step 2
Pull and Run the NIM

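Log in to the NVIDIA container registry, nvcr.io. Enter the literal string $oauthtoken as the username and paste your API key as the password.

Bash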

$ docker login nvcr.io
Username: $oauthtoken
Password: <PASTE_API_KEY_HERE>
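For scripts or CI, where nobody is present to type at the prompt, the same login can be done non-interactively with docker login --password-stdin. A minimal Python sketch, assuming the key is available as a string (for example from an environment variable); docker_login_nvcr is a hypothetical helper name:

```python
import subprocess


def docker_login_nvcr(api_key: str, run=subprocess.run) -> int:
    """Log in to nvcr.io without an interactive prompt and return Docker's exit code.

    The username is the literal string "$oauthtoken". Arguments are passed as a
    list, so no shell sees them and "$oauthtoken" is not expanded as a variable.
    The key goes in on stdin (--password-stdin), keeping it out of the process
    list and shell history.
    """
    args = [
        "docker", "login", "nvcr.io",
        "--username", "$oauthtoken",
        "--password-stdin",
    ]
    result = run(args, input=api_key, text=True, check=False)
    return result.returncode
```

For example, docker_login_nvcr(os.environ["NGC_API_KEY"]) returns 0 on success. The run parameter exists only so the function can be exercised without Docker installed.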

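Pull and run the NIM with the command below. On first start, the container downloads the model build optimized for your GPUs and stores it in $LOCAL_NIM_CACHE (mounted at /opt/nim/.cache), so later starts can reuse it.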

Bash

export NGC_API_KEY=<PASTE_API_KEY_HERE>
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
docker run -it --rm \
    --gpus all \
    --shm-size=16GB \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    -p 8000:8000 \
    nvcr.io/nim/nvidia/llama-3.1-nemotron-nano-vl-8b-v1:latest
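The container needs time to download and load the model before it accepts requests, so a call made right after docker run may fail. A minimal polling sketch; it assumes the NIM exposes a readiness endpoint at /v1/health/ready on port 8000 (verify the path against the NIM docs for your version), and is_ready and wait_until_ready are hypothetical helper names:

```python
import time
import urllib.request


def is_ready(url: str = "http://0.0.0.0:8000/v1/health/ready") -> bool:
    """Return True if the (assumed) readiness URL answers HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # URLError/HTTPError and connection errors are all OSError subclasses:
        # the server is not listening yet, or is still loading the model
        return False


def wait_until_ready(check=is_ready, timeout_s: float = 1800, interval_s: float = 10,
                     sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll check() every interval_s seconds until it succeeds or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Calling wait_until_ready() before the test request in Step 3 returns False if the server is still not ready after 30 minutes. The sleep and clock parameters are injectable so the loop can be tested without waiting.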

Step 3
Test the NIM

Once the server is ready (the first start can take a while because the model must be downloaded), send a local test request with this curl command:

Bash

curl -X 'POST' \
'http://0.0.0.0:8000/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
    "temperature": 0.0,
    "top_p": 1.0,
    "model": "nvidia/llama-3.1-nemotron-nano-vl-8b-v1",
    "messages": [
        {
            "role": "user",
            "content": [
                { "type": "image_url", "image_url": { "url": "https://assets.ngc.nvidia.com/products/api-catalog/llama-cosmos-nemotron-8b-instruct/performance.png" } },
                { "type": "text", "text": "For MOE Switch XXL training what is speed of H100 over A100 and H100 with NVLink over A100?" }
            ]
        }
    ]
}'
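The same request can be sent from Python. The sketch below mirrors the curl body and adds a helper for local images encoded as base64 data URLs. Data-URL support and the OpenAI-style response shape (choices[0].message.content) are assumptions here, the helper names are hypothetical, and only the standard library is used:

```python
import base64
import json
import mimetypes
import urllib.request

# Same endpoint and model as the curl example
NIM_URL = "http://0.0.0.0:8000/v1/chat/completions"
MODEL = "nvidia/llama-3.1-nemotron-nano-vl-8b-v1"


def image_file_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL.

    Assumption: the server accepts data URLs in image_url.url, as
    OpenAI-compatible vision endpoints typically do.
    """
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    # Fall back to image/png if the type cannot be guessed from the extension
    return f"data:{mime or 'image/png'};base64,{encoded}"


def build_vision_payload(image_url: str, question: str) -> dict:
    """Build the request body used by the curl example: one image plus one question."""
    return {
        "model": MODEL,
        "temperature": 0.0,
        "top_p": 1.0,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }


def ask(image_url: str, question: str, timeout_s: float = 300) -> str:
    """POST the request to the local NIM and return the reply text."""
    body = json.dumps(build_vision_payload(image_url, question)).encode("utf-8")
    req = urllib.request.Request(
        NIM_URL,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        reply = json.load(resp)
    # Assumption: OpenAI-style response shape
    return reply["choices"][0]["message"]["content"]
```

For example, ask(image_file_to_data_url("chart.png"), "What does this chart show?") sends a local file, where chart.png is a placeholder path.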

For more details on getting started with this NIM, visit the NVIDIA NIM Docs.
