Train a small ChatGPT-style LLM (nanochat) with tokenizer, pretraining, midtraining, and SFT on DGX Station with GB300 Ultra
Ensure your DGX Station has Docker with the NVIDIA runtime and GPU access. Nanochat uses Weights & Biases (W&B) for training visualization and a Hugging Face token for evaluation datasets.
# Verify GPU and Docker
nvidia-smi
docker run --rm --gpus all nvcr.io/nvidia/pytorch:26.04-py3 nvidia-smi
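Before running the two checks above, it can help to confirm that the command-line tools this playbook calls are on your PATH at all. The following is a minimal sketch; `have_cmd` is a hypothetical helper name, not something the playbook ships.

```shell
# Hypothetical helper (not part of the playbook): succeed if a command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Tools used in this playbook: nvidia-smi, docker, and git.
for tool in nvidia-smi docker git; do
  if have_cmd "$tool"; then
    echo "ok: $tool"
  else
    echo "missing: $tool"
  fi
done
```

A tool reported as missing needs to be installed before the later steps will work.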
Create a W&B account and a Hugging Face token if you don't have them. Export both keys in your shell:
export WANDB_API_KEY=<YOUR_WANDB_API_KEY>
export HF_TOKEN=<YOUR_HF_TOKEN>
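The keys must be exported, not just set, so that child processes such as the launch script can see them. A small sketch of a check, assuming `printenv` is available; `missing_keys` is a hypothetical helper name:

```shell
# Hypothetical helper (not part of the playbook): print the names of required
# variables that are not exported or are empty.
missing_keys() {
  missing=""
  for name in WANDB_API_KEY HF_TOKEN; do
    # printenv only sees exported variables, which is what child processes get.
    if [ -z "$(printenv "$name")" ]; then
      missing="$missing $name"
    fi
  done
  # Strip the leading space before printing.
  echo "${missing# }"
}

m="$(missing_keys)"
if [ -n "$m" ]; then
  echo "Not exported: $m" >&2
else
  echo "API keys are exported"
fi
```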
Clone the playbook repository and navigate to the assets directory:
git clone https://github.com/NVIDIA/dgx-spark-playbooks
cd dgx-spark-playbooks/nvidia/station-nanochat/assets
Run the setup script. It clones nanochat, checks out the supported commit, copies the station-adapted speedrun_station.sh, and builds the nanochat Docker image (PyTorch NGC base with dependencies):
./setup.sh
You should see the nanochat image listed if you run docker images. Your directory structure after setup should look like this:
assets/
├── Dockerfile
├── launch.sh
├── setup.sh
├── speedrun_station.sh
└── nanochat/
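The layout above can be checked with a short script instead of by eye. This is a sketch only; `check_layout` is a hypothetical helper, and the entry names are taken from the tree shown above.

```shell
# Hypothetical helper (not part of the playbook): report entries from the
# expected layout that are absent under the given directory.
check_layout() {
  dir="$1"
  status=0
  for entry in Dockerfile launch.sh setup.sh speedrun_station.sh nanochat; do
    if [ ! -e "$dir/$entry" ]; then
      echo "missing: $entry"
      status=1
    fi
  done
  return "$status"
}

# Run from the assets directory.
check_layout . || echo "Layout incomplete; consider re-running ./setup.sh"
```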
Ensure your API keys are exported, then launch:
./launch.sh
The training runs inside the nanochat container and executes the full pipeline automatically:
report.md with metrics and samples
Training on a single GB300 Ultra takes on the order of 12+ hours for the full d24 run.
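Because the run takes many hours, you may want to find the W&B run link again later without scrolling through the console. The sketch below assumes you saved the training output to a file (the name `train.log` is hypothetical) and that the link appears as a plain `https://wandb.ai/...` URL; `wandb_urls` is a hypothetical helper.

```shell
# Hypothetical helper: print each distinct wandb.ai URL found in a log file.
wandb_urls() {
  grep -Eo 'https://wandb\.ai/[^[:space:]]+' "$1" | sort -u
}

# Assumed log file name; adjust to wherever you saved the output.
if [ -f train.log ]; then
  wandb_urls train.log
fi
```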
W&B dashboard:
Track training at wandb.ai under the nanochat project. The exact link to the W&B run is printed in the training logs. Key metrics:
After training, checkpoints are saved under the nanochat_cache/ directory. Run inference inside the container, using either the web UI or the CLI:
Web UI (recommended):
docker run --rm --gpus all --net=host \
-v "$(pwd)/nanochat:/workspace/nanochat" \
-v "$(pwd)/nanochat_cache:/root/.cache/nanochat" \
-w /workspace/nanochat \
nanochat \
python -m scripts.chat_web
Open a browser to http://<STATION_IP>:8000 where <STATION_IP> is your DGX Station’s IP address.
CLI:
docker run --rm -it --gpus all \
-v "$(pwd)/nanochat:/workspace/nanochat" \
-v "$(pwd)/nanochat_cache:/root/.cache/nanochat" \
-w /workspace/nanochat \
nanochat \
python -m scripts.chat_cli -p "Why is the sky blue?"
To stop training early, interrupt the launch script or stop the container:
WARNING: This stops the training run and any in-progress work in the container.
# If launch.sh is running: press Ctrl+C
# Or stop the container directly
docker stop $(docker ps -q --filter ancestor=nanochat)
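If no nanochat container is running, the command substitution above expands to nothing and `docker stop` fails for lack of an argument. A slightly more defensive sketch follows; `stop_nanochat` is a hypothetical helper name.

```shell
# Hypothetical helper: stop running containers created from the nanochat image,
# and do nothing if there are none.
stop_nanochat() {
  ids="$(docker ps -q --filter ancestor=nanochat)"
  if [ -z "$ids" ]; then
    echo "No running nanochat container found."
    return 0
  fi
  # Intentionally unquoted so each container ID becomes its own argument.
  docker stop $ids
}

# Only attempt this where Docker is installed.
if command -v docker >/dev/null 2>&1; then
  stop_nanochat
fi
```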
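To free disk space, delete the cache directories and prune Docker. This removes the checkpoints saved in nanochat_cache/, and docker system prune -a deletes all unused Docker images, not only nanochat: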
rm -rf ./nanochat_cache ./hf_cache
docker system prune -a
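Before deleting anything, you can check how much space the cache directories actually occupy. This is a minimal sketch; `report_sizes` is a hypothetical helper, and the directory names are the ones used in the cleanup commands above.

```shell
# Hypothetical helper: print the disk usage of each directory that exists.
report_sizes() {
  for d in "$@"; do
    if [ -d "$d" ]; then
      du -sh "$d"
    else
      echo "not found: $d"
    fi
  done
}

report_sizes ./nanochat_cache ./hf_cache
# For Docker's own usage (images, containers, build cache), run: docker system df
```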
Smaller/faster run: Edit speedrun_station.sh before running setup to reduce data and model size:
# Fewer data shards (10 instead of default)
python -m nanochat.dataset -n 10 &
# Smaller model (d4 instead of d24), smaller batch size
python -m scripts.base_train --depth=4 --device-batch-size=32
Batch size: The default --device-batch-size=64 is tuned for the GB300's 288 GB of VRAM. Increase it if GPU utilization is low, or decrease it if training runs out of memory (OOM).
After editing speedrun_station.sh, re-run ./setup.sh to rebuild the image with the changes.
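The depth and batch-size edits can also be applied with sed rather than by hand. The sketch below assumes the script spells the flags as `--depth=<number>` and `--device-batch-size=<number>`, as in the example above; `shrink_run` is a hypothetical helper, and it leaves a `.bak` copy of the original script.

```shell
# Hypothetical helper: rewrite the depth and device batch size flags in a script.
# Usage: shrink_run <script> <depth> <device-batch-size>
shrink_run() {
  script="$1"
  depth="$2"
  batch="$3"
  # Keep a backup so the original settings can be restored.
  cp "$script" "$script.bak"
  sed -E \
    -e "s/--depth=[0-9]+/--depth=$depth/" \
    -e "s/--device-batch-size=[0-9]+/--device-batch-size=$batch/" \
    "$script.bak" > "$script"
}

# Example: d4 model with a device batch size of 32.
if [ -f speedrun_station.sh ]; then
  shrink_run speedrun_station.sh 4 32
fi
```

Check the result with a diff against the `.bak` file before re-running setup, since the flag spelling in your copy of the script may differ.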