Run cuTile kernel benchmarks, FMHA implementation, and LLM inference on DGX Spark and B300
| Symptom | Cause | Fix |
|---|---|---|
| `docker: permission denied` | User not in `docker` group | `sudo usermod -aG docker $USER && newgrp docker` |
| `401 Client Error: Unauthorized` | Missing HuggingFace token | `export HF_TOKEN=<your_token>` |
| `ModuleNotFoundError: tilegym` | TileGym not installed | `cd TileGym && pip install .` |
| `RuntimeError: CUDA out of memory` | Model too large | Reduce batch size or use a smaller model |
| Killed during model load | Out of system memory | Clear cache: `sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'` |
| Slow first run | JIT compilation | Normal: cuTile compiles kernels on first run |
| `FileNotFoundError: input_prompt_small.txt` | Missing input file | Run from the `modeling/transformers` directory |
| `torch.cuda.OutOfMemoryError` | Insufficient GPU memory | Reduce the `--batch_size` parameter |
| `ImportError: cuda.tile` | Missing Tile IR | Install: `apt-get install cuda-tile-ir-13-1` |
| Benchmark hangs | GPU busy or locked | Check `nvidia-smi` for other processes |
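Several of the fixes above can be verified up front. The following sketch (assuming a Linux shell; `nvidia-smi` is only queried if it is present) checks the most common environment issues before a run:

```shell
# Pre-flight checks for the benchmark environment (a sketch; adapt as needed).

# 1. Is the current user in the docker group?
if id -nG | tr ' ' '\n' | grep -qx docker; then
  echo "docker group: ok"
else
  echo "docker group: missing (run: sudo usermod -aG docker \$USER && newgrp docker)"
fi

# 2. Is a HuggingFace token exported?
if [ -n "${HF_TOKEN:-}" ]; then
  echo "HF_TOKEN: set"
else
  echo "HF_TOKEN: not set (run: export HF_TOKEN=<your_token>)"
fi

# 3. Is the GPU visible, and is anything else using it?
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
else
  echo "nvidia-smi: not found"
fi
```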
NOTE
DGX Spark uses a Unified Memory Architecture (UMA), which lets the GPU and CPU share system memory dynamically. Because many applications have not yet been updated to take full advantage of UMA, you may hit memory errors even when your workload fits within DGX Spark's memory capacity. If that happens, manually flush the buffer cache with:

`sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'`
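Before flushing, you can check how much memory the page cache is actually holding. This is a read-only check that needs no root access; the fields below are standard Linux `/proc/meminfo` entries:

```shell
# "Cached" shows page-cache memory that drop_caches would reclaim;
# "MemAvailable" estimates how much memory is free for new allocations.
grep -E '^(MemTotal|MemAvailable|Cached):' /proc/meminfo
```

If `Cached` is large and `MemAvailable` is low, flushing the cache is likely to help before reloading the model.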
TIP
The first run of cuTile kernels includes JIT compilation overhead; subsequent runs are faster because the compiled kernels are cached.
For the latest known issues, please review the DGX Spark User Guide.