NIM on Spark

30 MIN

Deploy a NIM on Spark

DGX Spark

Basic idea

NVIDIA NIM is containerized software for fast, reliable AI model serving and inference on NVIDIA GPUs. This playbook demonstrates how to run NIM microservices for LLMs on DGX Spark devices, enabling local GPU inference through a simple Docker workflow. You'll authenticate with NVIDIA's registry, launch the NIM inference microservice, and perform basic inference testing to verify functionality.
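That workflow condenses to three steps: log in to nvcr.io, start the container, and query the endpoint. Here is a minimal sketch, wrapped in a function so nothing runs until you call it; the image tag ("latest"), the container name ("nim-llama"), and the cache mount path are assumptions to verify against the NGC catalog for your chosen NIM.

```shell
# Sketch only; tag, container name, and cache paths are assumptions.
# '$oauthtoken' is the literal username NGC expects, so the single
# quotes matter -- it must not be expanded by the shell.
run_nim() {
  echo "$NGC_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin
  docker run -d --name nim-llama --gpus all \
    -e NGC_API_KEY \
    -v ~/.cache/nim:/opt/nim/.cache \
    -p 8000:8000 \
    nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
}
# Call it on the Spark once the prerequisites below are in place:
# run_nim
```

Mounting `~/.cache/nim` keeps downloaded weights on the host, so restarting or recreating the container does not re-download the model.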

What you'll accomplish

You'll launch a NIM container on your DGX Spark device to expose a GPU-accelerated HTTP endpoint for text completions. While these instructions use the Llama 3.1 8B NIM, additional NIMs, including the Qwen3-32 NIM, are available for DGX Spark (see them here).
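Once the container reports ready, the endpoint speaks an OpenAI-compatible API (port 8000 is the NIM default). A hypothetical first request is shown below; the `model` field must match the NIM you actually launched.

```shell
# Hypothetical request body; "model" must name the served NIM.
BODY='{
  "model": "meta/llama-3.1-8b-instruct",
  "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
  "max_tokens": 64
}'
printf '%s\n' "$BODY"
# With the NIM running on the Spark, post it to the endpoint:
# curl -s http://localhost:8000/v1/chat/completions \
#      -H 'Content-Type: application/json' -d "$BODY"
```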

What to know before starting

  • Working in a terminal environment
  • Using Docker commands and GPU-enabled containers
  • Basic familiarity with REST APIs and curl commands
  • Understanding of NVIDIA GPU environments and CUDA

Prerequisites

  • DGX Spark device with NVIDIA drivers installed
    nvidia-smi
    
  • Docker with NVIDIA Container Toolkit configured, instructions here
    docker run -it --gpus=all nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvidia-smi
    
  • NGC account with API key from here
    echo $NGC_API_KEY | grep -E '^[a-zA-Z0-9]{86}=='
    
  • Sufficient disk space for model caching (varies by model, typically 10-50GB)
    df -h ~
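
The checks above can be folded into one preflight script. This sketch only inspects the local environment (nothing is pulled or started); the key-format branches mirror the grep check above, plus an assumption that newer NGC keys begin with "nvapi-".

```shell
# Preflight sketch: confirm the tools this playbook uses are installed.
for tool in docker nvidia-smi curl; do
  command -v "$tool" >/dev/null 2>&1 && echo "$tool: found" || echo "$tool: MISSING"
done

# Classify the key format: legacy keys end in '==' (as the grep check in
# the prerequisites expects); newer keys start with 'nvapi-' (assumption).
key_kind() {
  case "${1:-}" in
    nvapi-*) echo "new-style key" ;;
    *==)     echo "legacy key" ;;
    "")      echo "not set" ;;
    *)       echo "unrecognized format" ;;
  esac
}
echo "NGC_API_KEY: $(key_kind "${NGC_API_KEY:-}")"
```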
    

Time & risk

  • Estimated time: 15-30 minutes for setup and validation
  • Risks:
    • Large model downloads may take significant time depending on network speed
    • GPU memory requirements vary by model size
    • Container startup time depends on model loading
  • Rollback: Stop and remove containers with docker stop <CONTAINER_NAME> && docker rm <CONTAINER_NAME>. Remove cached models from ~/.cache/nim if disk space recovery is needed.
  • Last Updated: 12/22/2025
    • Update docker container version to cuda:13.0.1-devel-ubuntu24.04
    • Add docker container permission setup instructions

Resources

  • DGX Spark Documentation
  • DGX Spark Forum
  • DGX Spark User Performance Guide

Copyright © 2026 NVIDIA Corporation