| "permission denied" when running `docker` | User not in the `docker` group | Run `sudo usermod -aG docker $USER && newgrp docker` |
| Container fails to start with GPU error | NVIDIA Container Toolkit not configured | Run `nvidia-ctk runtime configure --runtime=docker` and restart Docker |
| "Token is required" or 401 error | Missing HuggingFace token | Export `HF_TOKEN` before running the `docker` command |
| Model download hangs or fails | Network or authentication issue | Check the internet connection and verify that `HF_TOKEN` is valid |
| CUDA out of memory | Context length too large | Reduce `MAX_MODEL_LEN` or lower `--gpu-memory-utilization` |
| Server not responding on port 8000 | Port already in use | Check with `lsof -i :8000`, or map a different host port with `-p 8001:8000` |
| Model runs on wrong GPU | Default GPU selection | Use `--gpus '"device=0"'` to select a specific GPU |
| NGC authentication fails | Invalid or missing credentials | Run `docker login nvcr.io` and authenticate with your NGC API key |
| "EngineCore failed" / FlashInfer "Buffer overflow when allocating memory for batch_prefill_tmp_v" | Known issue with the vLLM 25.10 container on some DGX Station setups during CUDA graph capture | Use the 26.01 container image (`nvcr.io/nvidia/vllm:26.01-py3`) instead of 25.10 |
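
Several of the checks in the table can be run together before launching the container. The following is a minimal sketch, not part of any NVIDIA or vLLM tooling: the function names (`in_docker_group`, `hf_token_set`, `port_in_use`, `preflight`) are hypothetical, and it assumes `id`, `grep`, and `lsof` are available on the host.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers covering three rows of the table above.
# All function names are illustrative assumptions.

# Succeeds if the current user is a member of the docker group.
in_docker_group() {
    id -nG | tr ' ' '\n' | grep -qx docker
}

# Succeeds if HF_TOKEN is set to a non-empty value.
hf_token_set() {
    [ -n "${HF_TOKEN:-}" ]
}

# Succeeds if lsof reports activity on the given TCP port (default 8000).
# A missing lsof binary is treated the same as "port not in use".
port_in_use() {
    lsof -i :"${1:-8000}" >/dev/null 2>&1
}

# Prints one warning per failed check; always returns 0.
preflight() {
    in_docker_group || echo "WARN: user is not in the docker group"
    hf_token_set    || echo "WARN: HF_TOKEN is not exported"
    if port_in_use 8000; then
        echo "WARN: port 8000 is already in use"
    fi
    return 0
}
```

Sourcing the file and calling `preflight` prints nothing when all three checks pass. Each warning corresponds to a row in the table, so the matching solution column gives the fix.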