| "Permission denied" when accessing Hugging Face | Missing or invalid HF token | Run huggingface-cli login with valid token |
| Container exits with CUDA out of memory | Insufficient GPU memory | Reduce batch size or use a machine with more GPU memory |
| Model files not found in output directory | Volume mount failed or wrong path | Verify that `$(pwd)/output_models` resolves correctly |
| Git clone fails inside container | Network connectivity issues | Check internet connection and retry |
| Quantization process hangs | Container resource limits | Increase Docker memory limits or use `--ulimit` flags |
| "Cannot access gated repo" error for a model URL | Some Hugging Face models are gated and require explicit access approval | Request access to the gated model on the Hugging Face website, then regenerate your token and log in again |
| Log ends with an MPI error or `ModuleNotFoundError: No module named 'mpi4py'` | The final TensorRT-LLM runner step uses MPI; quantization may have already succeeded | Check whether the quantization output (e.g. the encoder config and saved model under `output_models/`) was produced. The runner step can fail with an MPI error even after NVFP4 quantization completed successfully. If you need the full pipeline, install `mpi4py` or use a container image that includes it |
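For the MPI failure mode above, a quick sanity check distinguishes a harmless runner-step crash from a genuinely failed quantization. The sketch below assumes the host-side output directory is `output_models/` as used elsewhere in this guide; adjust the path if your volume mount differs.

```shell
# Sketch: confirm quantization artifacts exist before debugging the MPI error.
# Assumes output_models/ is the directory mounted into the container; adjust if yours differs.
check_quant_output() {
  dir="${1:-output_models}"
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "quantization output present in $dir"
    return 0
  fi
  echo "no artifacts found in $dir; quantization likely failed" >&2
  return 1
}
```

If artifacts are present, the MPI error only affected the final runner step; installing `mpi4py` (which itself requires an MPI implementation in the container) is needed only when you want that step to run as well.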