Basic idea
This playbook demonstrates training nanochat on a DGX Station with the GB300 Ultra Superchip. You run the full pipeline on a single system: custom BPE tokenizer training, base model pretraining, midtraining (conversation format), supervised fine-tuning (SFT), and inference via CLI or web UI.
The project uses the PyTorch NGC container, FineWeb for pretraining, SmolTalk for SFT, and Weights & Biases for logging. The default speedrun configuration trains a 561M-parameter (d20) model suitable for learning and experimentation.
What you'll accomplish
You will have a working nanochat setup that trains a small LLM and serves it for chat.
- Environment: Docker image with PyTorch and nanochat dependencies on your DGX Station.
- Training pipeline: Tokenizer (65K BPE), pretraining (~11.2B tokens), midtraining, SFT, and automated report generation.
- Inference: ChatGPT-style web UI and CLI to chat with the base, mid, or SFT checkpoints.
- Monitoring: W&B dashboards and `nanochat/report.md` with metrics and samples.
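Once a run finishes (or between stages), you can inspect the generated report directly. A minimal sketch, assuming the report lands under the default `~/.cache/nanochat` cache root used in the rollback notes below; adjust the path if your `launch.sh` maps the cache elsewhere:

```shell
# Print the first lines of the nanochat report, if it exists yet.
# The path is an assumption based on the default cache location.
REPORT="$HOME/.cache/nanochat/report.md"
if [ -f "$REPORT" ]; then
  sed -n '1,40p' "$REPORT"
else
  echo "report not generated yet: $REPORT"
fi
```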
What to know before starting
- Basic Linux command line and shell usage.
- Familiarity with Docker and GPU containers (e.g., `docker run --gpus all`).
- Optional: understanding of LLM training (tokenizer, pretraining, fine-tuning).
Prerequisites
Hardware:
- NVIDIA DGX Station with GB300 Ultra Superchip.
- Sufficient GPU memory for the chosen model (the GB300 Ultra provides ample memory for the d20 speedrun).
- Adequate storage for cache (~24GB+ for FineWeb data and checkpoints).
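Before starting a run, it is worth confirming the cache drive actually has room for the FineWeb shards and checkpoints. A quick sketch — the `NANOCHAT_CACHE` variable and the default `~/.cache/nanochat` path are assumptions; point it at whatever directory your `launch.sh` uses:

```shell
# Check free space (in GB) on the filesystem that will hold the cache.
CACHE_DIR="${NANOCHAT_CACHE:-$HOME/.cache/nanochat}"  # assumed default location
mkdir -p "$CACHE_DIR"
free_gb=$(df -BG --output=avail "$CACHE_DIR" | tail -1 | tr -dc '0-9')
echo "free space: ${free_gb} GB"
if [ "$free_gb" -lt 24 ]; then
  echo "WARNING: under 24 GB free; the FineWeb download may fail"
fi
```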
Software:
- Docker with the NVIDIA Container Toolkit (verify with `docker run --rm --gpus all nvcr.io/nvidia/pytorch:26.01-py3 nvidia-smi`).
- Network access to download datasets (Hugging Face, FineWeb) and container images.
- Weights & Biases account and API key.
- Hugging Face token for evaluation datasets.
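Since the launch script exits when credentials are missing, it helps to verify them up front. A minimal pre-flight sketch — the placeholder values are stand-ins, and while `WANDB_API_KEY` and `HF_TOKEN` are the conventional W&B and Hugging Face variable names, confirm they match what your `launch.sh` actually reads:

```shell
# Export credentials (replace the placeholders with real keys).
export WANDB_API_KEY="${WANDB_API_KEY:-your-wandb-key}"
export HF_TOKEN="${HF_TOKEN:-your-hf-token}"

# Fail fast if either is empty.
for var in WANDB_API_KEY HF_TOKEN; do
  if [ -z "$(printenv "$var")" ]; then
    echo "ERROR: $var is not set" >&2
    exit 1
  fi
done
echo "credentials ok"
```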
Ancillary files
All required assets are in the playbook directory `nvidia/station-nanochat/assets` (see the dgx-station-playbooks repository).
- `assets/Dockerfile` – PyTorch NGC image plus nanochat dependencies and venv.
- `assets/setup.sh` – Clones nanochat, checks out the supported commit, and builds the Docker image.
- `assets/launch.sh` – Runs the training container on your DGX Station (full pipeline: tokenizer, pretrain, midtrain, SFT, and report generation).
- `assets/README.md` – Additional detail on training stages, inference, and troubleshooting.
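Put together, the typical flow with these assets looks roughly like the sketch below. The relative checkout path is an assumption — adjust it to wherever you cloned the dgx-station-playbooks repository:

```shell
# Run setup and launch from the playbook's assets directory.
ASSETS="dgx-station-playbooks/nvidia/station-nanochat/assets"  # assumed checkout path
if [ -d "$ASSETS" ]; then
  cd "$ASSETS"
  ./setup.sh    # clone nanochat at the supported commit and build the image
  ./launch.sh   # run the pipeline: tokenizer, pretrain, midtrain, SFT, report
else
  echo "clone the dgx-station-playbooks repository first"
fi
```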
Time & risk
- Estimated time: About 30 minutes for clone, setup, and launching the run. Full d20 speedrun training time depends on your DGX Station configuration (hours to a day or more).
- Risk level: Medium
- Large downloads (FineWeb) can fail or be slow; ensure stable network and disk space.
- API keys (W&B, HF) must be set or the launch script will exit.
- Rollback: Stop containers with `docker stop`, remove caches under `~/.cache/nanochat` (or the paths in `launch.sh`), and run `docker system prune -a` if needed.
- Last Updated: 03/02/2026
- First Publication