SGLang Inference Server

30 MIN

Install and use SGLang on DGX Spark

Common issues and their resolutions:

| Symptom | Cause | Fix |
|---|---|---|
| Container fails to start with GPU errors | NVIDIA drivers/toolkit missing | Install nvidia-container-toolkit, restart Docker |
| Server responds with 404 or connection refused | Server not fully initialized | Wait 60 seconds, check container logs |
| Out of memory errors during model loading | Insufficient GPU memory | Use smaller model or increase --tp parameter |
| Model download fails | Network connectivity issues | Check internet connection, retry download |
| Permission denied accessing /tmp | Volume mount issues | Use full path: -v /tmp:/tmp or create dedicated directory |
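
The "wait, then check the logs" advice for 404 or connection-refused responses can be automated with a small readiness poll. The sketch below is only an illustration and is not part of SGLang: the helper name wait_for_server, the example address http://localhost:30000/health, and the timeout values are assumptions, so adjust them to the port and endpoint your container actually exposes.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_server(url, timeout_s=60.0, interval_s=2.0):
    """Poll `url` until it returns HTTP 200 or `timeout_s` elapses.

    Returns True if the server answered with 200, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, 404, or a timeout: treat as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A call such as wait_for_server("http://localhost:30000/health") (assumed address) returns False if the server never becomes reachable within the timeout, at which point the container logs are the next place to look.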

NOTE

DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. Because many applications have not yet been updated to take advantage of UMA, you may encounter memory issues even when usage is within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
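
Before flushing, it can help to see how much memory the buffer and page caches are actually holding. The following sketch parses /proc/meminfo, a standard Linux interface, and reports the MemAvailable, Buffers, and Cached fields. The helper names are illustrative, and treating a large Buffers + Cached figure as a sign that a flush is worthwhile is an assumption rather than guidance from the DGX Spark documentation.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Field: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def cache_summary(info):
    """Return (available_gib, cached_gib) from a parsed meminfo dict."""
    kib_per_gib = 1024 * 1024
    available = info.get("MemAvailable", 0) / kib_per_gib
    cached = (info.get("Buffers", 0) + info.get("Cached", 0)) / kib_per_gib
    return available, cached


def read_cache_summary(path="/proc/meminfo"):
    """Read the live meminfo file and summarize it (Linux only)."""
    with open(path) as f:
        return cache_summary(parse_meminfo(f.read()))
```

Calling read_cache_summary() once before and once after the flush shows how much memory the command released.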