NVFP4 Pretraining with Megatron Bridge

30 MIN

Pretrain Llama 3.1 8B with NVFP4 mixed precision on DGX Station using Megatron Bridge

Megatron Bridge, NVFP4, Training
Troubleshooting
| Symptom | Cause | Fix |
| --- | --- | --- |
| `RuntimeError: NVFP4 is not supported on this GPU` or a similar FP4 error | GPU is not Blackwell architecture | NVFP4 requires Blackwell GPUs (GB200, GB300). Check the installed GPU with `nvidia-smi` |
| `ModuleNotFoundError: No module named 'megatron.bridge'` | Megatron Bridge is not installed | Run `pip install megatron-bridge` or use the NGC container |
| CUDA out of memory during model init | Insufficient GPU memory for Llama 3.1 8B plus optimizer states | Reduce `micro_batch_size`, or spread the model across more GPUs by raising torchrun's `--nproc_per_node` together with a model-parallel setting |
| `torchrun` hangs or times out | NCCL communication failure between GPUs | Rerun with `NCCL_DEBUG=INFO torchrun ...` for details; verify that all GPUs are visible |
| Training loss is NaN | Precision instability | Increase `num_layers_at_end_in_bf16` (e.g., from 4 to 8) or reduce the learning rate |
| `--disable-fp4` works but NVFP4 crashes | Transformer Engine version mismatch | Ensure the installed Transformer Engine version supports NVFP4; update with `pip install --upgrade transformer-engine` |
| Slow training throughput | Tensor Cores are not being used efficiently | Ensure batch dimensions are multiples of 8; check that `nvidia-smi` shows high GPU utilization |
| Permission denied on Docker | User is not in the `docker` group | Run `sudo usermod -aG docker $USER && newgrp docker` |
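
Several of the checks above (GPU architecture, missing `megatron.bridge`, Transformer Engine availability) can be run together before launching training. The following is a minimal preflight sketch, not part of the playbook itself. It makes these assumptions: the helper names (`parse_gpu_rows`, `looks_like_blackwell`, `module_available`) are hypothetical; the installed driver's `nvidia-smi` supports the `compute_cap` query field; and "compute capability major version 10 or higher" is used as a heuristic for Blackwell-class hardware rather than an official NVFP4 support test.

```python
import importlib.util
import shutil
import subprocess


def parse_gpu_rows(csv_text):
    """Parse 'name, compute_cap' rows from nvidia-smi CSV output (no header)."""
    rows = []
    for line in csv_text.strip().splitlines():
        if "," not in line:
            continue  # skip blank or unexpected lines
        name, cap = line.rsplit(",", 1)
        rows.append((name.strip(), cap.strip()))
    return rows


def looks_like_blackwell(compute_cap):
    """Heuristic (assumption): Blackwell-class GPUs report capability 10.x or higher."""
    try:
        major = int(compute_cap.split(".")[0])
    except ValueError:
        return False  # e.g. "[N/A]" when the field is unavailable
    return major >= 10


def module_available(dotted_name):
    """Return True if the module can be located, without fully importing it."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package is missing
        return False


def main():
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; cannot check GPU architecture")
    else:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=False,
        )
        for name, cap in parse_gpu_rows(result.stdout):
            status = "OK" if looks_like_blackwell(cap) else "likely no NVFP4 support"
            print(f"{name} (compute capability {cap}): {status}")

    for module in ("megatron.bridge", "transformer_engine"):
        found = "found" if module_available(module) else "MISSING"
        print(f"{module}: {found}")


if __name__ == "__main__":
    main()
```

Running the script inside the training environment (for example, inside the NGC container) prints one line per GPU and one line per module. A `MISSING` line points to the installation fixes in the table; a GPU flagged as lacking support points to the first row.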

Resources

  • Megatron Bridge Documentation
  • Mixed Precision Training Guide
  • Megatron Bridge GitHub

Copyright © 2026 NVIDIA Corporation