NVFP4 Pretraining with Megatron Bridge

| Symptom | Cause | Fix |
| --- | --- | --- |
| `RuntimeError: NVFP4 is not supported on this GPU` or a similar FP4 error | GPU is not Blackwell architecture | NVFP4 requires Blackwell GPUs (e.g., GB200, GB300). Check the GPU model with `nvidia-smi`. |
| `ModuleNotFoundError: No module named 'megatron.bridge'` | Megatron Bridge is not installed | Run `pip install megatron-bridge` or use the NGC container. |
| `CUDA out of memory` during model init | Insufficient GPU memory for Llama 3.1 8B weights plus optimizer states | Reduce `micro_batch_size`, or shard the model across more GPUs with tensor or pipeline parallelism. Note that `--nproc_per_node` only sets how many processes `torchrun` launches per node; it does not enable model parallelism by itself. |
| `torchrun` hangs or times out | NCCL communication failure between GPUs | Rerun with `NCCL_DEBUG=INFO torchrun ...` and inspect the log for details; verify that all GPUs are visible in `nvidia-smi`. |
| Training loss is NaN | Precision instability | Increase `num_layers_at_end_in_bf16` (e.g., from 4 to 8) or reduce the learning rate. |
| `--disable-fp4` works but NVFP4 crashes | Transformer Engine version mismatch | Ensure the installed Transformer Engine version supports NVFP4; update with `pip install --upgrade transformer-engine`. |
| Slow training throughput | Tensor Cores are not being used efficiently | Ensure batch dimensions are multiples of 8; check that `nvidia-smi` shows high GPU utilization. |
| `Permission denied` on Docker | User is not in the `docker` group | Run `sudo usermod -aG docker $USER && newgrp docker`. |
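
Several of the checks above (GPU architecture, missing packages, GPU visibility) can be run before launching a job. The following is a minimal preflight sketch, not part of Megatron Bridge. The helper names are hypothetical, and the threshold `BLACKWELL_MIN_MAJOR = 10` is an assumption that Blackwell GPUs report a compute capability major version of 10 or higher; verify it against your hardware. It relies on the `compute_cap` query field of `nvidia-smi`, which older drivers may not provide.

```python
import importlib.util
import shutil
import subprocess

# Assumption: Blackwell GPUs report a compute capability major version >= 10.
BLACKWELL_MIN_MAJOR = 10


def module_available(name: str) -> bool:
    """Return True if `name` can be located on the import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. `megatron`) raises instead of returning None.
        return False


def parse_compute_caps(csv_text: str) -> list[tuple[str, int]]:
    """Parse `name, compute_cap` rows into (name, major_version) pairs."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, cap = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, int(cap.split(".")[0])))
    return gpus


def query_gpus() -> list[tuple[str, int]]:
    """Ask nvidia-smi for visible GPUs; return [] if the query is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_compute_caps(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No GPUs reported by nvidia-smi (or compute_cap query unsupported).")
    for gpu_name, major in gpus:
        status = "ok" if major >= BLACKWELL_MIN_MAJOR else "likely not Blackwell"
        print(f"{gpu_name}: compute capability major {major} ({status})")
    for module in ("megatron.bridge", "transformer_engine"):
        print(f"{module}: {'found' if module_available(module) else 'MISSING'}")
```

Running the script inside the training container or environment prints one line per visible GPU and one line per required package, which narrows down which row of the table applies before a full `torchrun` launch.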