Nanochat Training

30 MIN

Train a small ChatGPT-style LLM (nanochat) with tokenizer training, pretraining, midtraining, and SFT on a DGX Station with the GB300 Ultra Superchip

Basic idea

This playbook demonstrates how to train nanochat on a DGX Station with the GB300 Ultra Superchip. You run the full pipeline on a single system: custom BPE tokenizer training, base-model pretraining, midtraining (conversation format), supervised fine-tuning (SFT), and inference via CLI or web UI.

The project uses the PyTorch NGC container, FineWeb for pretraining, SmolTalk for SFT, and Weights & Biases for logging. The default speedrun configuration trains a 561M-parameter (d20) model suitable for learning and experimentation.

What you'll accomplish

You will have a working nanochat setup that trains a small LLM and serves it for chat.

  • Environment: Docker image with PyTorch and nanochat dependencies on your DGX Station.
  • Training pipeline: Tokenizer (65K BPE), pretraining (~11.2B tokens), midtraining, SFT, and automated report generation.
  • Inference: ChatGPT-style web UI and CLI to chat with the base, mid, or SFT checkpoints.
  • Monitoring: W&B dashboards and nanochat/report.md with metrics and samples.
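
Once training finishes, you can chat with a checkpoint from inside the container. A hedged sketch; the module names scripts.chat_cli and scripts.chat_web follow the upstream nanochat layout and are assumptions here, so confirm the exact entry points in assets/README.md:

```shell
# Run from the nanochat checkout inside the training container.
# scripts.chat_cli / scripts.chat_web are assumed module names; verify
# them in assets/README.md before relying on this snippet.
if python -c "import scripts.chat_cli" 2>/dev/null; then
  python -m scripts.chat_cli   # terminal chat with the latest checkpoint
  # python -m scripts.chat_web # or serve the ChatGPT-style web UI instead
else
  status="skipped"
  echo "nanochat modules not importable; run this inside the container"
fi
```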

What to know before starting

  • Basic Linux command line and shell usage.
  • Familiarity with Docker and GPU containers (e.g. docker run --gpus all).
  • Optional: understanding of LLM training (tokenizer, pretraining, fine-tuning).

Prerequisites

Hardware:

  • NVIDIA DGX Station with GB300 Ultra Superchip.
  • Sufficient GPU memory for the chosen model (the GB300 Ultra provides ample memory for the d20 speedrun).
  • Adequate storage for cache (~24GB+ for FineWeb data and checkpoints).

Software:

  • Docker with the NVIDIA Container Toolkit; verify GPU access with: docker run --rm --gpus all nvcr.io/nvidia/pytorch:26.01-py3 nvidia-smi
  • Network access to download datasets (Hugging Face, FineWeb) and container images.
  • Weights & Biases account and API key.
  • Hugging Face token for evaluation datasets.
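
The launch script exits if credentials are missing, so export them first. A minimal sketch; the placeholder values are illustrative, and while WANDB_API_KEY and HF_TOKEN are the variable names conventionally read by W&B and Hugging Face tooling, confirm the names launch.sh actually expects:

```shell
# Placeholder values; replace with your real keys before launching.
export WANDB_API_KEY="your-wandb-key"
export HF_TOKEN="your-hf-token"

# Pre-flight check: complain loudly if either is still unset or empty.
for var in WANDB_API_KEY HF_TOKEN; do
  if [ -z "${!var}" ]; then
    echo "ERROR: $var is not set" >&2
  fi
done
```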

Ancillary files

All required assets are in the playbook directory nvidia/station-nanochat/assets (see the dgx-station-playbooks repository).

  • assets/Dockerfile – PyTorch NGC image plus nanochat dependencies and venv.
  • assets/setup.sh – Clones nanochat, checks out the supported commit, and builds the Docker image.
  • assets/launch.sh – Starts the training container on your DGX Station and runs the full pipeline: tokenizer, pretrain, midtrain, SFT, and report generation.
  • assets/README.md – Additional detail on training stages, inference, and troubleshooting.
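
Taken together, the end-to-end flow is short. A sketch, assuming the dgx-station-playbooks repository is already cloned into the current directory; adjust ASSETS_DIR to match your checkout:

```shell
# Path to the playbook assets; adjust to wherever you cloned
# the dgx-station-playbooks repository.
ASSETS_DIR="${ASSETS_DIR:-dgx-station-playbooks/nvidia/station-nanochat/assets}"

if [ -d "$ASSETS_DIR" ]; then
  # setup.sh clones nanochat, pins the supported commit, and builds the
  # image; launch.sh then runs tokenizer, pretrain, midtrain, SFT, report.
  ( cd "$ASSETS_DIR" && bash setup.sh && bash launch.sh )
else
  echo "assets directory not found: $ASSETS_DIR" >&2
fi
```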

Time & risk

  • Estimated time: About 30 minutes to clone the repository, complete setup, and launch the run. The full d20 speedrun itself takes longer, from several hours to a day or more depending on your DGX Station configuration.
  • Risk level: Medium
    • Large downloads (FineWeb) can fail or be slow; ensure stable network and disk space.
    • API keys (W&B, HF) must be set or the launch script will exit.
  • Rollback: Stop containers with docker stop, remove caches under ~/.cache/nanochat (or paths in launch.sh), and run docker system prune -a if needed.
  • Last Updated: 03/02/2026
    • First Publication
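
The rollback steps above can be scripted. A cautious sketch; the "nanochat" container-name filter and the cache path are assumptions, so check launch.sh for the paths it actually uses:

```shell
# Cache location used by the run; confirm against launch.sh.
CACHE_DIR="$HOME/.cache/nanochat"

# Stop running containers (the "nanochat" name filter is an assumption).
if command -v docker >/dev/null 2>&1; then
  docker ps -q --filter "name=nanochat" 2>/dev/null | xargs -r docker stop
fi

# Remove cached datasets and checkpoints.
rm -rf "$CACHE_DIR"

# To also reclaim image/layer space (removes ALL unused Docker data,
# not just nanochat's), uncomment:
# docker system prune -a
```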