
Copyright © 2026 NVIDIA Corporation

NVFP4 Quantization

1 HR

Quantize a model to NVFP4 to run on DGX Station using TensorRT Model Optimizer

DGXStation
Troubleshooting
| Symptom | Cause | Fix |
| --- | --- | --- |
| "Permission denied" when accessing Hugging Face | Missing or invalid Hugging Face token | Run huggingface-cli login with a valid token |
| Container exits with CUDA out of memory | Insufficient GPU memory | Reduce the batch size or use a machine with more GPU memory |
| Model files not found in output directory | Volume mount failed or the path is wrong | Verify that $(pwd)/output_models resolves to the intended directory |
| Git clone fails inside container | Network connectivity issues | Check the internet connection and retry |
| Quantization process hangs | Container resource limits | Increase Docker memory limits or use --ulimit flags |
| Cannot access gated repo for URL | Certain Hugging Face models have restricted access | Regenerate your Hugging Face token and request access to the gated model in your web browser |
| Log ends with an MPI error or ModuleNotFoundError: No module named 'mpi4py' | The TensorRT-LLM runner step uses MPI; quantization may have already succeeded | Check that the quantization output (e.g. encoder config, saved model under output_models/) was produced. The final runner step can fail with an MPI error even when NVFP4 quantization completed successfully. If you need the full pipeline, install mpi4py or use a container that includes it. |
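
The last row above amounts to two checks: did quantization write anything under output_models/, and is mpi4py importable in the current environment. The Python sketch below shows one way to run both. It is an illustration and not part of the playbook. The file patterns (*.json, *.safetensors) and the helper names are assumptions, so adjust them to whatever your run actually produces.

```python
import importlib.util
from pathlib import Path

# Assumption: the quantized checkpoint consists of JSON config files and
# safetensors weight files. Change these patterns to match your output.
EXPECTED_PATTERNS = ("*.json", "*.safetensors")


def find_quantized_artifacts(output_dir):
    """Return files under output_dir that match the assumed artifact patterns."""
    root = Path(output_dir)
    if not root.is_dir():
        # A missing directory usually points to a failed or mistyped volume mount.
        return []
    found = []
    for pattern in EXPECTED_PATTERNS:
        found.extend(sorted(root.rglob(pattern)))
    return found


def mpi4py_available():
    """Report whether mpi4py can be imported in this Python environment."""
    return importlib.util.find_spec("mpi4py") is not None


def diagnose(output_dir="output_models"):
    """Summarize the two checks as a small dictionary."""
    artifacts = find_quantized_artifacts(output_dir)
    return {
        "output_dir_exists": Path(output_dir).is_dir(),
        "artifact_count": len(artifacts),
        "mpi4py_available": mpi4py_available(),
    }


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

Run it from the directory that contains output_models. A nonzero artifact_count together with mpi4py_available: False is consistent with the case where quantization completed and only the final runner step failed. Run inside the container, the mpi4py result describes the container; run on the host, it describes the host.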

Resources

  • TensorRT Model Optimizer Documentation
  • TensorRT-LLM Documentation