Train a small ChatGPT-style LLM (nanochat) with tokenizer, pretraining, midtraining, and SFT on DGX Station with GB300 Ultra
This playbook demonstrates training nanochat on a DGX Station with the GB300 Ultra Superchip. You run the full pipeline on a single system: custom BPE tokenizer training, base-model pretraining, midtraining (conversation format), supervised fine-tuning (SFT), and inference via a CLI or web UI.
The project uses the PyTorch NGC container, FineWeb for pretraining, SmolTalk for SFT, and Weights & Biases for logging. The default speedrun configuration trains a 561M-parameter (d20) model suitable for learning and experimentation.
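The quoted 561M figure can be sanity-checked from the depth-20 ("d20") configuration. The sizing rules in this sketch (model dim = 64 × depth, one layer per unit of depth, a 65,536-token vocabulary, untied embeddings, and a 4× MLP) are assumptions about nanochat's conventions; verify them against the repository before relying on the numbers.

```python
# Rough parameter count for the d20 speedrun model. The sizing rules below
# (dim = 64 * depth, n_layer = depth, vocab = 2**16, untied embeddings,
# 4x MLP) are assumptions about nanochat's config -- check the repo.
def approx_params(depth: int, vocab: int = 65536) -> int:
    dim = 64 * depth
    embed = vocab * dim        # token embedding table
    unembed = vocab * dim      # output projection (untied from the embedding)
    attn = 4 * dim * dim       # Q, K, V, O projections per layer
    mlp = 2 * 4 * dim * dim    # up + down projections with a 4x hidden dim
    return embed + unembed + depth * (attn + mlp)

print(f"{approx_params(20) / 1e6:.0f}M parameters")  # close to the quoted 561M
```

Under these assumptions d20 lands at roughly 561M parameters, matching the playbook's description.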
By the end, you will have a working nanochat setup that trains a small LLM and serves it for chat.
The final artifact is nanochat/report.md with metrics and samples. Docker must be installed with NVIDIA GPU support (containers are launched with docker run --gpus all).

Hardware:
Software:
Verify GPU access inside the container:

docker run --rm --gpus all nvcr.io/nvidia/pytorch:26.01-py3 nvidia-smi

All required assets are in the playbook directory nvidia/station-nanochat/assets (see the dgx-station-playbooks repository).
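Assuming the assets live under nvidia/station-nanochat as described above, a run boils down to two scripts. This sketch only prints the commands unless RUN=1 is set, since an actual run requires Docker and a GPU:

```shell
# Hypothetical wrapper showing the intended order of the helper scripts.
# Run it from the playbook directory; set RUN=1 to actually execute.
RUN="${RUN:-0}"
steps=(
  "bash assets/setup.sh"   # clone nanochat, check out pinned commit, build image
  "bash assets/launch.sh"  # full pipeline: tokenizer, pretrain, midtrain, SFT, report
)
for s in "${steps[@]}"; do
  if [ "$RUN" = "1" ]; then eval "$s"; else echo "WOULD RUN: $s"; fi
done
```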
- assets/Dockerfile – PyTorch NGC image plus nanochat dependencies and venv.
- assets/setup.sh – Clones nanochat, checks out the supported commit, and builds the Docker image.
- assets/launch.sh – Runs the training container on your DGX Station (full pipeline: tokenizer, pretrain, midtrain, SFT, and report generation).
- assets/README.md – Additional detail on training stages, inference, and troubleshooting.

To clean up, stop the container with docker stop, remove the caches under ~/.cache/nanochat (or the paths used in launch.sh), and run docker system prune -a if needed.
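The cleanup steps above can be sketched as a small script. The container name and cache path here are assumptions; check launch.sh for the names and paths it actually uses. The script dry-runs by default and only executes when NANOCHAT_CLEAN_FORCE=1:

```shell
#!/usr/bin/env bash
# Hypothetical cleanup sketch; "nanochat" as the container name and the
# cache path are assumptions -- confirm them against launch.sh.
set -euo pipefail

FORCE="${NANOCHAT_CLEAN_FORCE:-0}"  # set to 1 to actually run the commands
CONTAINER="nanochat"                # assumed container name
CACHE_DIR="$HOME/.cache/nanochat"

planned=()                          # record the plan so a dry run can show it
plan() {
  planned+=("$*")
  if [ "$FORCE" = "1" ]; then "$@"; else echo "DRY RUN: $*"; fi
}

plan docker stop "$CONTAINER"       # stop the training container
plan rm -rf "$CACHE_DIR"            # drop dataset/tokenizer caches
plan docker system prune -a -f      # reclaim Docker disk space (optional)
```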