Copyright © 2025 NVIDIA Corporation

NVFP4 Quantization

1 HR

Quantize a model to NVFP4 to run on Spark using TensorRT Model Optimizer

Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| "Permission denied" when accessing Hugging Face | Missing or invalid HF token | Run huggingface-cli login with a valid token |
| Container exits with CUDA out of memory | Insufficient GPU memory | Reduce batch size or use a machine with more GPU memory |
| Model files not found in output directory | Volume mount failed or wrong path | Verify $(pwd)/output_models resolves correctly |
| Git clone fails inside container | Network connectivity issues | Check internet connection and retry |
| Quantization process hangs | Container resource limits | Increase Docker memory limits or use --ulimit flags |
| Cannot access gated repo for URL | Certain Hugging Face models have restricted access | Regenerate your Hugging Face token and request access to the gated model in your web browser |
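Two of the causes above (a missing token and an output path that does not resolve) can be checked on the host before the container starts. The following is a minimal sketch, not part of the playbook: it assumes the token is exported as HF_TOKEN, and the function names check_hf_token and check_output_dir are hypothetical helpers introduced here for illustration.

```shell
#!/usr/bin/env bash
# Pre-flight checks for two causes listed in the troubleshooting table.
# Assumption: the Hugging Face token is exported as HF_TOKEN.
# Both function names are illustrative, not part of any NVIDIA tooling.

# Succeeds if a non-empty token is present in the environment.
check_hf_token() {
    if [ -z "${HF_TOKEN:-}" ]; then
        echo "HF_TOKEN is not set; run 'huggingface-cli login' or export a valid token" >&2
        return 1
    fi
    echo "HF_TOKEN is set"
}

# Succeeds if the given directory exists and is writable.
# Defaults to $(pwd)/output_models, the path named in the table.
check_output_dir() {
    local dir="${1:-$(pwd)/output_models}"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "directory not writable: $dir" >&2
        return 1
    fi
    echo "output directory OK: $dir"
}

# Usage:
#   check_hf_token && check_output_dir "$(pwd)/output_models"
```

These checks only confirm that a token is present and that the path exists; they do not validate the token or confirm access to a gated model.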

NOTE

DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. Because many applications have not yet been updated to take advantage of UMA, you may encounter memory issues even when your workload fits within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
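Dropping the caches returns page-cache memory to the free pool, so comparing the MemFree field of /proc/meminfo before and after the flush shows roughly how much was released. A minimal sketch, assuming a Linux host; meminfo_kb is a hypothetical helper name, and its optional file argument exists only so the function can be tried against a sample file.

```shell
#!/usr/bin/env bash
# Print the value (in kB) of one field from a meminfo-style file.
#   $1: field name without the trailing colon, e.g. MemFree
#   $2: file to read (defaults to /proc/meminfo)
# meminfo_kb is an illustrative helper, not a system command.
meminfo_kb() {
    awk -v key="$1:" '$1 == key {print $2}' "${2:-/proc/meminfo}"
}

# Usage (the flush itself requires root):
#   before=$(meminfo_kb MemFree)
#   sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
#   after=$(meminfo_kb MemFree)
#   echo "released: $(( (after - before) / 1024 )) MiB"
```

The difference is only an estimate, since other processes may allocate or free memory between the two readings.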

Resources

  • DGX Spark Documentation
  • DGX Spark Forum
  • TensorRT Model Optimizer Documentation
  • TensorRT-LLM Documentation