NVFP4 Quantization

Estimated time: 1 hour

This guide quantizes a model to NVFP4 with TensorRT Model Optimizer so it can run on a DGX Station.
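As a minimal sketch of the workflow the troubleshooting table below assumes, the quantization is run inside a container with the Hugging Face token supplied up front and the results written to a mounted `output_models` directory. The container image name and script path here are placeholders, not the actual values from this guide:

```shell
# Authenticate with Hugging Face first (fixes "Permission denied" / gated-repo errors).
huggingface-cli login

# Run the quantization container with the output directory mounted.
# <modelopt-container-image> and quantize.py are hypothetical placeholders;
# substitute the image and entrypoint from your setup.
docker run --rm --gpus all \
  -v "$(pwd)/output_models:/workspace/output_models" \
  <modelopt-container-image> \
  python quantize.py --qformat nvfp4 --output_dir /workspace/output_models
```

The `$(pwd)/output_models` mount is the path the table's "Model files not found" row refers to; if the host directory does not exist or the shell expands `$(pwd)` unexpectedly (e.g. under `sudo`), the container writes into an anonymous volume instead.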

| Symptom | Cause | Fix |
|---|---|---|
| "Permission denied" when accessing Hugging Face | Missing or invalid HF token | Run `huggingface-cli login` with a valid token |
| Container exits with CUDA out of memory | Insufficient GPU memory | Reduce batch size or use a machine with more GPU memory |
| Model files not found in output directory | Volume mount failed or wrong path | Verify that `$(pwd)/output_models` resolves correctly |
| Git clone fails inside container | Network connectivity issues | Check internet connection and retry |
| Quantization process hangs | Container resource limits | Increase Docker memory limits or use `--ulimit` flags |
| "Cannot access gated repo" error | Some Hugging Face models have restricted access | Regenerate your Hugging Face token and request access to the gated model in your web browser |
| Log ends with an MPI error or `ModuleNotFoundError: No module named 'mpi4py'` | The TensorRT-LLM runner step uses MPI; quantization may have already succeeded | Check that the quantization output (e.g. encoder config, saved model under `output_models/`) was produced. The final runner step can fail with an MPI error even when NVFP4 quantization completed successfully. Install `mpi4py`, or use a container that includes it, if you need the full pipeline. |
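For the last row, a quick check like the following distinguishes a genuine quantization failure from a runner-only MPI crash. The directory name `output_models` matches the mount used above; the exact artifact layout inside it depends on your model and is an assumption here:

```shell
# Check whether quantization artifacts were produced despite a late MPI error.
OUT=output_models  # assumed host-side output directory from the volume mount

# If the directory exists and is non-empty, quantization likely completed;
# only the final (MPI-dependent) runner step failed.
if [ -d "$OUT" ] && [ -n "$(ls -A "$OUT" 2>/dev/null)" ]; then
  status="artifacts-present"
else
  status="artifacts-missing"
fi
echo "$status"
```

If artifacts are present, the quantized model can usually be consumed as-is; `mpi4py` is only needed to exercise the full runner pipeline.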