NVFP4 Quantization

Estimated time: 1 hour

This guide quantizes a model to NVFP4 with TensorRT Model Optimizer so it can run on a DGX Station.
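As a minimal sketch of the workflow the troubleshooting table below assumes, the quantization is run inside a container with the Hugging Face token supplied up front and the results written to a mounted `output_models` directory. The container image name and script path here are placeholders, not the actual values from this guide:

```shell
# Authenticate with Hugging Face first (fixes "Permission denied" / gated-repo errors).
huggingface-cli login

# Run the quantization container with the output directory mounted.
# <modelopt-container-image> and quantize.py are hypothetical placeholders;
# substitute the image and entrypoint from your setup.
docker run --rm --gpus all \
  -v "$(pwd)/output_models:/workspace/output_models" \
  <modelopt-container-image> \
  python quantize.py --qformat nvfp4 --output_dir /workspace/output_models
```

The `$(pwd)/output_models` mount is the path the table's "Model files not found" row refers to; if the host directory does not exist or the shell expands `$(pwd)` unexpectedly (e.g. under `sudo`), the container writes into an anonymous volume instead.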

| Symptom | Cause | Fix |
|---|---|---|
| "Permission denied" when accessing Hugging Face | Missing or invalid HF token | Run `huggingface-cli login` with a valid token |
| Container exits with CUDA out of memory | Insufficient GPU memory | Reduce batch size or use a machine with more GPU memory |
| Model files not found in output directory | Volume mount failed or wrong path | Verify that `$(pwd)/output_models` resolves correctly |
| Git clone fails inside container | Network connectivity issues | Check internet connection and retry |
| Quantization process hangs | Container resource limits | Increase Docker memory limits or use `--ulimit` flags |
| "Cannot access gated repo" error | Some Hugging Face models have restricted access | Regenerate your Hugging Face token and request access to the gated model in your web browser |
| Log ends with an MPI error or `ModuleNotFoundError: No module named 'mpi4py'` | The TensorRT-LLM runner step uses MPI; quantization may have already succeeded | Check that the quantization output (e.g. encoder config, saved model under `output_models/`) was produced. The final runner step can fail with an MPI error even when NVFP4 quantization completed successfully. Install `mpi4py`, or use a container that includes it, if you need the full pipeline. |
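For the last row, a quick check like the following distinguishes a genuine quantization failure from a runner-only MPI crash. The directory name `output_models` matches the mount used above; the exact artifact layout inside it depends on your model and is an assumption here:

```shell
# Check whether quantization artifacts were produced despite a late MPI error.
OUT=output_models  # assumed host-side output directory from the volume mount

# If the directory exists and is non-empty, quantization likely completed;
# only the final (MPI-dependent) runner step failed.
if [ -d "$OUT" ] && [ -n "$(ls -A "$OUT" 2>/dev/null)" ]; then
  status="artifacts-present"
else
  status="artifacts-missing"
fi
echo "$status"
```

If artifacts are present, the quantized model can usually be consumed as-is; `mpi4py` is only needed to exercise the full runner pipeline.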