Fine-tune with NeMo

1 HR

Use NVIDIA NeMo to fine-tune models locally

| Symptom | Cause | Fix |
| --- | --- | --- |
| nvcc: command not found | CUDA toolkit not in PATH | Add CUDA toolkit to PATH: export PATH=/usr/local/cuda/bin:$PATH |
| pip install uv permission denied | System-level pip restrictions | Run pip3 install --user uv, then update PATH |
| GPU not detected in training | CUDA driver/runtime mismatch | Verify driver compatibility with nvidia-smi; reinstall CUDA if needed |
| Out of memory during training | Model too large for available GPU memory | Reduce batch size, enable gradient checkpointing, or use model parallelism |
| ARM64 package compatibility issues | Package not available for ARM architecture | Build and install the package from source with ARM64 flags |
| Cannot access gated repo for URL | Certain Hugging Face models have restricted access | Regenerate your Hugging Face token and request access to the gated model in your web browser |
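
The PATH and driver checks from the first rows above can be scripted. The following is a minimal sketch, not part of the official playbook: the helper names check_tool and prepend_path are hypothetical, and /usr/local/cuda/bin is the toolkit location given above, which may differ on your system.

```shell
# check_tool NAME: report whether NAME resolves on PATH; return non-zero if not.
# (Hypothetical helper, written for illustration only.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1 -> $(command -v "$1")"
    else
        echo "missing: $1"
        return 1
    fi
}

# prepend_path DIR: put DIR at the front of PATH unless it is already listed.
# (Hypothetical helper; avoids stacking duplicate entries on repeated runs.)
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

prepend_path /usr/local/cuda/bin   # assumed toolkit location
check_tool nvcc || echo "hint: the CUDA toolkit may be missing or installed elsewhere"
check_tool nvidia-smi || echo "hint: the NVIDIA driver may not be installed"
```

To make the PATH change persistent, the export line would normally go in your shell startup file; the sketch only affects the current shell session.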

NOTE

DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. Because many applications have not yet been updated to take advantage of UMA, you may run into memory issues even when usage is within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
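
To see whether the flush had any effect, you can compare the kernel's memory counters before and after running it. This is a rough sketch that assumes a Linux /proc/meminfo; the helper name meminfo_kb is hypothetical, and the counters are only an approximate indication of what a training job will be able to allocate.

```shell
# meminfo_kb FIELD [FILE]: print the value in kB of FIELD from a meminfo-style file.
# (Hypothetical helper; FILE defaults to /proc/meminfo.)
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Print the counters most affected by drop_caches. Run this once before the
# flush command above and once after it, then compare the two readings.
if [ -r /proc/meminfo ]; then
    echo "MemAvailable: $(meminfo_kb MemAvailable) kB"
    echo "Cached:       $(meminfo_kb Cached) kB"
fi
```

If Cached drops sharply while MemAvailable barely changes, the page cache was probably not what was limiting the job.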