Copyright © 2026 NVIDIA Corporation

Nanochat on Dual-Spark

5 days

Set Up Nanochat on Dual-Spark

View on GitHub

Basic idea

This playbook shows you how to run Andrej Karpathy's Nanochat, popularized as "the best ChatGPT that $100 can buy," on DGX Spark. It walks you through training and running Nanochat locally on your dual-Spark setup.

What you'll accomplish

You’ll set up a local, end-to-end ChatGPT-like training pipeline, including pre-training, mid-training, post-training, and optional reinforcement learning. You will also be able to chat with your model through a simple web UI.

What to know before starting

  • Working with Docker containers and GPU passthrough
  • Command-line tools for GPU workloads
  • Basic understanding of training foundation LLMs

Prerequisites

  • Dual-Spark setup with QSFP cable
  • Docker installed and accessible to current user
  • NVIDIA Container Runtime configured
  • Hugging Face token and WandB API key
  • Verify GPU access: nvidia-smi
  • Check Docker GPU integration: docker run --rm --gpus all nvcr.io/nvidia/pytorch:25.11-py3 nvidia-smi
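The two verification commands above can be combined into a quick preflight script. This is a minimal sketch using only the commands listed in the prerequisites, with guards (an assumption, not part of the playbook) so it degrades gracefully on machines where the NVIDIA stack is not yet installed:

```shell
#!/usr/bin/env bash
# Preflight checks for the prerequisites above.

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi                       # verify host GPU access
else
    echo "SKIP: nvidia-smi not found on PATH"
fi

if command -v docker >/dev/null 2>&1; then
    # Verify Docker GPU integration via the NVIDIA Container Runtime
    docker run --rm --gpus all nvcr.io/nvidia/pytorch:25.11-py3 nvidia-smi \
        || echo "WARN: Docker GPU check failed"
else
    echo "SKIP: docker not found on PATH"
fi
```

Both checks should print a GPU table that lists your Spark's GPU before you continue.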

Ancillary files

The reference training scripts can be found in the Nanochat repository on GitHub:

  • Dockerfile - Builds a custom Docker image to set up the environment
  • setup.sh - Sets up the Docker image on both Spark machines
  • speedrun_spark.sh - Modified version of speedrun.sh to support distributed training on dual-Spark
  • launch.sh - Launches the Nanochat training on both Spark machines
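The Dockerfile itself is not reproduced in this playbook. As a rough, hypothetical sketch of its shape (the base image is the one used for the Docker GPU check in the prerequisites; the repository URL, layout, and install step are assumptions, not the actual file):

```dockerfile
# Hypothetical sketch - refer to the actual Dockerfile in the repository.
# Base image matches the container used for the GPU check in Prerequisites.
FROM nvcr.io/nvidia/pytorch:25.11-py3

# Clone Nanochat and install its dependencies (assumed layout)
WORKDIR /workspace
RUN git clone https://github.com/karpathy/nanochat.git nanochat
WORKDIR /workspace/nanochat
RUN pip install -e .
```

setup.sh builds this image on both Spark machines, and launch.sh then starts the distributed training defined in speedrun_spark.sh.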

Time & risk

  • Duration: Up to 5 days, depending on model size and the number of training stages.

  • Risks:

    • Model instantiation and training are memory-intensive
    • Modifying hyperparameters such as batch size, model dimensions, or precision settings can increase memory usage and may result in out-of-memory (OOM) errors
    • Downloading large datasets and storing the trained checkpoints can consume significant disk space
  • Rollback:

    • Delete the downloaded dataset and checkpoints from $HOME/.cache/nanochat
    • Then exit the container environment
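The rollback amounts to two shell steps; the cache path is the one named above:

```shell
#!/usr/bin/env bash
# Delete Nanochat's downloaded dataset and trained checkpoints
rm -rf "$HOME/.cache/nanochat"

# Then leave the container environment:
#   exit
```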

Resources

  • DGX Spark Documentation
  • DGX Spark Forum
  • DGX Spark User Performance Guide