Nanochat Training

30 MIN

Train a small ChatGPT-style LLM (nanochat) with tokenizer, pretraining, midtraining, and SFT on DGX Station with GB300 Ultra

DGX Station · Fine-tuning · GB300 · LLM · PyTorch · Training · nanochat
View on GitHub

Basic idea

This playbook demonstrates training nanochat on a DGX Station with the GB300 Ultra Superchip. You run the full pipeline on a single system: custom BPE tokenizer training, base-model pretraining, midtraining (conversation format), supervised fine-tuning (SFT), and inference via CLI or web UI.

The project uses the PyTorch NGC container, FineWeb for pretraining, SmolTalk for SFT, and Weights & Biases for logging. The default speedrun configuration trains a 561M-parameter (d20) model suitable for learning and experimentation.
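
The stages above run sequentially on the single system; as a quick conceptual sketch of the order (the real work happens inside the nanochat scripts):

```shell
# The stages launch.sh runs, in order (a conceptual sketch only):
for stage in "tokenizer (65K BPE)" \
             "pretraining (FineWeb, ~11.2B tokens)" \
             "midtraining (conversation format)" \
             "supervised fine-tuning (SmolTalk)" \
             "report generation"; do
  echo "stage: $stage"
done
```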

What you'll accomplish

You will have a working nanochat setup that trains a small LLM and serves it for chat.

  • Environment: Docker image with PyTorch and nanochat dependencies on your DGX Station.
  • Training pipeline: Tokenizer (65K BPE), pretraining (~11.2B tokens), midtraining, SFT, and automated report generation.
  • Inference: ChatGPT-style web UI and CLI to chat with the base, mid, or SFT checkpoints.
  • Monitoring: W&B dashboards and nanochat/report.md with metrics and samples.
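
Once a run completes, you can talk to any of the checkpoints from inside the container. A minimal sketch, assuming the upstream nanochat script names (chat_cli, chat_web; verify against your checkout). It prints the commands by default; set DRY_RUN= to execute:

```shell
# Chat with a trained checkpoint (script names are assumptions from the
# upstream nanochat repo; verify against your checkout).
DRY_RUN=${DRY_RUN-1}
run() { if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi; }

run python -m scripts.chat_cli -i sft   # terminal chat with the SFT checkpoint
run python -m scripts.chat_web          # ChatGPT-style web UI in the browser
```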

What to know before starting

  • Basic Linux command line and shell usage.
  • Familiarity with Docker and GPU containers (e.g. docker run --gpus all).
  • Optional: understanding of LLM training (tokenizer, pretraining, fine-tuning).

Prerequisites

Hardware:

  • NVIDIA DGX Station with GB300 Ultra Superchip.
  • Sufficient GPU memory for the chosen model (the GB300 Ultra provides ample memory for the d20 speedrun).
  • Adequate storage for the cache (roughly 24 GB or more for FineWeb data and checkpoints).

Software:

  • Docker with the NVIDIA Container Toolkit; verify GPU access with: docker run --rm --gpus all nvcr.io/nvidia/pytorch:26.01-py3 nvidia-smi
  • Network access to download datasets (Hugging Face, FineWeb) and container images.
  • Weights & Biases account and API key.
  • Hugging Face token for evaluation datasets.
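
A quick preflight catches the common failure modes before you launch: missing tools, unset API keys, and insufficient disk space. This is a sketch; the variable names (WANDB_API_KEY, HF_TOKEN) are the usual conventions for these services, but confirm against launch.sh:

```shell
# Preflight sketch: check tools, API keys, and disk space before launching.
# Variable names are assumptions; confirm against launch.sh.
need_cmd() {
  if command -v "$1" >/dev/null 2>&1; then echo "OK:      $1 found"
  else echo "MISSING: $1 not on PATH"; fi
}
need_var() {
  eval "val=\${$1:-}"
  if [ -n "$val" ]; then echo "OK:      $1 is set"
  else echo "MISSING: $1 is unset"; fi
}

need_cmd docker
need_var WANDB_API_KEY    # Weights & Biases logging
need_var HF_TOKEN         # Hugging Face datasets
# ~24 GB+ free space needed for FineWeb shards and checkpoints:
df -h "${HOME:-/tmp}" | tail -n 1
```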

Ancillary files

All required assets are in the playbook directory nvidia/station-nanochat/assets (see the dgx-station-playbooks repository).

  • assets/Dockerfile – PyTorch NGC image plus nanochat dependencies and venv.
  • assets/setup.sh – Clones nanochat, checks out the supported commit, and builds the Docker image.
  • assets/launch.sh – Runs the training container on your DGX Station (runs the full pipeline: tokenizer, pretrain, midtrain, SFT, and report generation).
  • assets/README.md – Additional detail on training stages, inference, and troubleshooting.
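
Tying the assets together, a typical session looks like the following. The repository URL is an assumption inferred from the playbook (only the nvidia/station-nanochat/assets path is stated); the snippet prints commands by default, set DRY_RUN= to execute:

```shell
# End-to-end sketch: clone, build, launch. DRY_RUN=1 (default) only prints.
# The GitHub org/repo URL is an assumption; check the playbook page.
DRY_RUN=${DRY_RUN-1}
run() { if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi; }

run git clone https://github.com/NVIDIA/dgx-station-playbooks.git
run cd dgx-station-playbooks/nvidia/station-nanochat/assets
run ./setup.sh     # clone nanochat, pin the supported commit, build the image
run ./launch.sh    # tokenizer -> pretrain -> midtrain -> SFT -> report
```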

Time & risk

  • Estimated time: About 30 minutes for clone, setup, and launching the run. Full d20 speedrun training time depends on your DGX Station configuration (hours to a day or more).
  • Risk level: Medium
    • Large downloads (FineWeb) can fail or be slow; ensure stable network and disk space.
    • API keys (W&B, HF) must be set or the launch script will exit.
  • Rollback: Stop containers with docker stop, remove caches under ~/.cache/nanochat (or paths in launch.sh), and run docker system prune -a if needed.
  • Last Updated: 03/02/2026
    • First Publication
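
The rollback steps above, as a hedged sketch: the container name and cache path are assumptions, so check launch.sh for the real values. DRY_RUN=1 (the default) prints instead of deleting:

```shell
# Rollback sketch: stop the container, drop caches, reclaim Docker space.
# Container name and cache path are assumptions; verify against launch.sh.
DRY_RUN=${DRY_RUN-1}
run() { if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi; }

run docker stop nanochat                  # container name is an assumption
run rm -rf "$HOME/.cache/nanochat"        # FineWeb shards + checkpoints
run docker system prune -a -f             # destructive: removes unused images
```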

Resources

  • nanochat (GitHub)
  • Weights & Biases
  • Hugging Face (datasets / token)
Copyright © 2026 NVIDIA Corporation