Deploy a NIM on Spark
NVIDIA NIM is containerized software for fast, reliable AI model serving and inference on NVIDIA GPUs. This playbook demonstrates how to run NIM microservices for LLMs on DGX Spark devices, enabling local GPU inference through a simple Docker workflow. You'll authenticate with the NVIDIA NGC registry, launch the NIM inference microservice, and run a basic inference test to verify that the service works.
You'll launch a NIM container on your DGX Spark device to expose a GPU-accelerated HTTP endpoint for text completions. While these instructions use the Llama 3.1 8B NIM, additional NIMs, including the Qwen3-32B NIM, are available for DGX Spark. Before pulling anything, run the prerequisite checks below.
First, verify that the GPU is visible on the host:

nvidia-smi
Next, confirm that Docker can access the GPU from inside a container (--rm discards the test container when it exits):

docker run -it --rm --gpus=all nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvidia-smi
Check that your NGC API key is exported and well formed; if this prints nothing, the key is missing or malformed:

echo $NGC_API_KEY | grep -E '^[a-zA-Z0-9]{86}=='
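With the key validated, authenticate Docker against the NGC registry so it can pull the NIM image. A minimal sketch; note that '$oauthtoken' is a literal username required by NGC, not a variable to expand:

```bash
# Log in to nvcr.io, reading the API key from the environment.
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```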
Finally, make sure your home directory has enough free space for the container image and model weights, which together can run to tens of gigabytes:

df -h ~
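With the prerequisites in place, launch the NIM. This is a sketch of the standard NIM launch pattern; the image tag, the llama-nim container name, and port 8000 are assumptions here, so check the model's catalog entry for the exact values:

```bash
# Cache model weights on the host so restarts skip the download.
mkdir -p ~/.cache/nim

# Launch the Llama 3.1 8B NIM and expose its HTTP API on port 8000.
docker run -it --gpus=all \
  --name llama-nim \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -v ~/.cache/nim:/opt/nim/.cache \
  -u $(id -u) \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
```

The first launch downloads the model weights, so expect it to take a while; subsequent launches reuse the cache mounted at ~/.cache/nim.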
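Once the container logs show the server is up, test the endpoint from a second terminal. NIM exposes an OpenAI-compatible API; the model name below matches the Llama 3.1 8B NIM, and the prompt and token count are arbitrary:

```bash
# Returns HTTP 200 once the model is loaded and ready to serve.
curl http://localhost:8000/v1/health/ready

# Request a short text completion to verify inference works end to end.
curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama-3.1-8b-instruct",
        "prompt": "Once upon a time",
        "max_tokens": 64
      }'
```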
When you're finished, stop and remove the container:

docker stop <CONTAINER_NAME> && docker rm <CONTAINER_NAME>

Remove cached models from ~/.cache/nim if you need to reclaim disk space.
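A one-line sketch of that cache cleanup, assuming the ~/.cache/nim mount used above (this deletes the downloaded weights, so the next launch re-downloads them):

```bash
# Reclaim the disk space used by cached model weights.
rm -rf ~/.cache/nim
```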