NVFP4 Quantization

Symptom	Cause	Fix
"Permission denied" when accessing Hugging Face	Missing or invalid HF token	Run `huggingface-cli login` with a valid token
Container exits with CUDA out of memory	Insufficient GPU memory	Reduce batch size or use a machine with more GPU memory
Model files not found in output directory	Volume mount failed or wrong path	Verify `$(pwd)/output_models` resolves correctly
Git clone fails inside container	Network connectivity issues	Check internet connection and retry
Quantization process hangs	Container resource limits	Increase Docker memory limits or use `--ulimit` flags
"Cannot access gated repo for URL" error	Some Hugging Face models have restricted access	Request access to the gated model in your web browser, then regenerate your Hugging Face token
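
For the "Model files not found in output directory" row, a quick way to confirm that the host path resolves before starting the container is sketched below. The helper name `check_output_dir` is hypothetical and not part of any tool; it only assumes the `$(pwd)/output_models` path mentioned in the table.

```shell
# Hypothetical helper: verify that the host directory to be mounted exists
# and is writable. Defaults to $(pwd)/output_models, the path used in the table.
check_output_dir() {
    dir="${1:-$(pwd)/output_models}"
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "ok: $dir"
    else
        echo "missing or not writable: $dir" >&2
        return 1
    fi
}
```

Run `check_output_dir` from the directory where you launch the container. A non-zero exit status suggests the volume mount would point at a missing or read-only path.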

NOTE

DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. Because many applications are still being updated to take advantage of UMA, you may encounter memory issues even when usage is within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
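
To see whether the flush actually freed memory, you can compare the kernel's `MemAvailable` figure before and after running it. The following is a minimal sketch; the helper name `mem_available_kib` is hypothetical, and it assumes a Linux kernel that reports `MemAvailable` in `/proc/meminfo`.

```shell
# Hypothetical helper: print the MemAvailable value (in KiB) from a
# meminfo-format file. Defaults to /proc/meminfo; the optional argument
# is only there so the helper can be exercised against a sample file.
mem_available_kib() {
    awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Typical use around the flush command (requires sudo, so left commented):
#   before=$(mem_available_kib)
#   sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
#   after=$(mem_available_kib)
#   echo "MemAvailable: ${before} KiB -> ${after} KiB"
```

If the value barely changes after the flush, the buffer cache was probably not what was holding the memory.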
