Generate images and videos with FLUX, Wan 2.1, HunyuanVideo, and Cosmos on DGX Station
| Symptom | Cause | Fix |
|---|---|---|
| "permission denied" when running `docker` commands | User is not in the `docker` group | Run `sudo usermod -aG docker $USER && newgrp docker` |
| Container fails to start with a GPU error | NVIDIA Container Toolkit is not configured for Docker | Run `sudo nvidia-ctk runtime configure --runtime=docker`, then restart Docker with `sudo systemctl restart docker` |
| ComfyUI web UI is not reachable | A firewall is blocking the port, or the IP address is wrong | Check `docker logs comfyui` to confirm the server started, make sure port 8188 is open, and browse to `http://<STATION_IP>:8188` |
| "Model file not found" when running a workflow | Model not downloaded, or wrong path | Check that the model files are under `./models/` and that the volume mount in your `docker run` command points at that directory |
| Hugging Face download fails with HTTP 401 | Missing or invalid Hugging Face token | Confirm that `HF_TOKEN` is exported in your shell and that the token is valid at huggingface.co/settings/tokens |
| CUDA out of memory during video generation | Frame count or resolution is too high | Reduce the frame count or resolution. At 720p with Wan 2.1 14B, start with clips shorter than 5 seconds |
| CUDA out of memory during 1080p HunyuanVideo generation | Model weights plus video tensors exceed GPU memory | Use fewer frames (e.g., 49 instead of 97). HunyuanVideo at 1080p needs roughly 100-120 GB of GPU memory |
| Workflow loads but nodes show red "missing" | Custom node not installed | Use ComfyUI-Manager (click Manager → Install Missing Custom Nodes) or rebuild the Docker image |
| Video output is a black screen | VAE decode problem or wrong model variant | Use the model variant that matches the workflow (T2V for text-to-video, I2V for image-to-video) and confirm that the VAE is loaded |
| Generation is very slow and GPU utilization is low | PyTorch is not using the GPU, or the CUDA version is wrong | Run `docker exec comfyui nvidia-smi` and confirm that the GPU is visible inside the container |
| "No module named ..." error on startup | A custom node's dependency is not installed | Install it inside the container with `docker exec comfyui pip install <module>`, then restart with `docker restart comfyui` |
| Docker build fails on ARM64 with "Could not find a version that satisfies the requirement onnxruntime-gpu" | PyPI has no aarch64 wheel for `onnxruntime-gpu` | The shipped Dockerfile already handles this: before `pip install`, it uses `sed` to replace `onnxruntime-gpu` with `onnxruntime` (the CPU build) in every custom node's `requirements.txt`. If you still see this error, your Dockerfile predates that fix; pull the latest assets and rebuild. |
| Docker build fails on ARM64 (other packages) | Some custom-node dependencies have no aarch64 wheel | Find the failing package in the build log. The custom-node install loop is guarded with `\|\| true`, so the build still completes, but the affected node will be missing modules at runtime. Either skip the node (remove its directory from `custom_nodes/` in the Dockerfile's clone block) or, after launch, install it through ComfyUI-Manager with a manually built wheel. |
| Pulling the NGC image fails with an authentication error | The NGC registry requires a login | Run `docker login nvcr.io`, using `$oauthtoken` as the username and your NGC API key as the password |
| `device >= 0 && device < num_gpus INTERNAL ASSERT FAILED` on startup | `--gpus all` on a multi-GPU system triggers a PyTorch assertion | Use `--gpus '"device=N"'` to expose only the GB300, where N is its index in `nvidia-smi` |
| "No HiDream models available" warning on startup | The HiDream custom node found no model files | This is a warning, not an error. It clears once the HiDream model files are downloaded (Tier 2). |
| Web UI shows "Error: the workflow does not contain any nodes" after Load | The file is in API format (a flat map of node_id → {class_type, inputs}), not a UI workflow | In the Load dialog, use `assets/workflows/<name>.json` (found under `user/default/workflows` inside the container). For curl or the HTTP API, send `assets/workflow_api/<name>.api.json` wrapped as `{"prompt": ...}`. |
| "huggingface-cli: command not found", or the download script errors out | `huggingface-cli` is the deprecated name of the Hugging Face CLI | Install `huggingface_hub` and use `hf download` (the download script does this automatically). |
| Download script exits, but `models/diffusion_models/` is empty | Silent failure in an older version of the script, or a wrong token | Re-run with `bash -x assets/scripts/download-models.sh 1` to trace each step, and confirm that `HF_TOKEN` is set and that you have accepted the model licenses on Hugging Face. Current versions of the script fail fast if a file is missing after `hf download`. |
| Container exits on startup with `ModuleNotFoundError: torchaudio` | The image was built from a Dockerfile that predates the torchaudio shim | Rebuild the image with `docker build -t comfyui-gb300 -f assets/Dockerfile .`. The shipped Dockerfile creates an import-only `torchaudio` stub, because NGC PyTorch's custom NVFP4 ABI is incompatible with PyPI torchaudio wheels. Lightricks audio VAE workflows are not supported in this image; no other workflow needs torchaudio. |
| `OSError: ... undefined symbol: torch_dtype_float4_e2m1fn_x2` from torchaudio | A real torchaudio package was installed on top of NGC PyTorch | Rebuild the image from the shipped Dockerfile, as for the `ModuleNotFoundError: torchaudio` row. Do not `pip install torchaudio` manually inside the container. |
| "DWPose: Onnxruntime not found or doesn't come with acceleration providers, switch to OpenCV with CPU device" | Expected on aarch64: PyPI has no `onnxruntime-gpu` wheel for arm64, so the Dockerfile installs the CPU `onnxruntime` package instead | Informational warning, not an error. DWPose preprocessing runs on the CPU (slower than on the GPU) but produces correct output. |
| "aimdo: ... funchook_prepare(cuMemFree_v2) failed: 8 Failed to allocate memory in unused regions" at startup | NGC PyTorch's CUDA-hooks diagnostic tool (aimdo) cannot install its hooks under the container's default capabilities and falls back to a no-op | Benign. ComfyUI works normally; the message is informational and comes from the NGC base image. No action is required. |
| "RequestsDependencyWarning: urllib3 (...) or charset_normalizer (...) doesn't match a supported version!" at startup | Version skew between `requests` and the NGC-pinned `urllib3` / `charset_normalizer` wheels | Benign. ComfyUI's HTTP traffic still works. To silence it, set `PYTHONWARNINGS=ignore::requests.RequestsDependencyWarning`. |
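
The Load-dialog row above turns on the difference between the two workflow JSON formats. The Python sketch below is a hypothetical helper (`submit_workflow.py`, not part of the playbook assets) that checks which format a file is in and, for API-format files, posts it to ComfyUI's `/prompt` endpoint wrapped as `{"prompt": ...}`. It assumes that UI workflows carry a top-level `nodes` list, that API workflows are a flat map of node IDs to objects with `class_type` and `inputs`, and that ComfyUI listens on port 8188; adjust the base URL for your Station.

```python
import json
import sys
import urllib.request

DEFAULT_URL = "http://127.0.0.1:8188"  # assumption: ComfyUI on its default port 8188


def workflow_format(data) -> str:
    """Classify parsed workflow JSON as "ui", "api", or "unknown".

    Assumption: UI workflows (assets/workflows/*.json) have a top-level
    "nodes" list; API workflows (assets/workflow_api/*.api.json) are a flat
    dict of node_id -> {"class_type": ..., "inputs": ...}.
    """
    if not isinstance(data, dict) or not data:
        return "unknown"
    if isinstance(data.get("nodes"), list):
        return "ui"
    if all(
        isinstance(node, dict) and "class_type" in node and "inputs" in node
        for node in data.values()
    ):
        return "api"
    return "unknown"


def build_prompt_payload(data) -> bytes:
    """Wrap an API-format workflow as {"prompt": ...} for POST /prompt."""
    kind = workflow_format(data)
    if kind != "api":
        raise ValueError(
            f"expected an API-format workflow, got {kind!r}; "
            "UI workflows belong in the web UI's Load dialog"
        )
    return json.dumps({"prompt": data}).encode("utf-8")


def submit(payload: bytes, base_url: str = DEFAULT_URL) -> dict:
    """POST the payload to ComfyUI's /prompt endpoint and return the JSON reply."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python submit_workflow.py <workflow.api.json> [base_url]
    with open(sys.argv[1], encoding="utf-8") as f:
        workflow = json.load(f)
    url = sys.argv[2] if len(sys.argv) > 2 else DEFAULT_URL
    print(submit(build_prompt_payload(workflow), url))
```

For example: `python submit_workflow.py assets/workflow_api/<name>.api.json http://<STATION_IP>:8188`. Checking the format locally fails with an explicit message before anything is sent to the server.
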

NOTE
Follow the ComfyUI logs with `docker logs -f comfyui`. Most errors, such as missing models or node failures, are reported there with clear messages.
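
Because several startup messages in the table are benign while others need a fix, it can help to triage a saved log before reading it line by line. The Python sketch below is a hypothetical `triage_logs.py` helper, not part of the playbook assets. Its patterns are copied from the message text in the table and are assumptions about how those messages appear in your logs; extend them as you hit new errors. Capture the log first, for example with `docker logs comfyui > comfyui.log 2>&1`.

```python
import re
import sys

# Messages the troubleshooting table marks as benign or informational.
BENIGN = [
    r"No HiDream models available",
    r"DWPose: Onnxruntime not found",
    r"aimdo: .*funchook_prepare",
    r"RequestsDependencyWarning",
]

# Messages the table maps to a concrete fix, checked in order
# (the torchaudio pattern must come before the generic "No module named").
ACTIONABLE = [
    (r"ModuleNotFoundError: .*torchaudio", "rebuild the image from the shipped Dockerfile"),
    (r"undefined symbol: torch_dtype_float4_e2m1fn_x2", "rebuild; do not pip install torchaudio"),
    (r"INTERNAL ASSERT FAILED", "use --gpus '\"device=N\"' instead of --gpus all"),
    (r"CUDA out of memory", "reduce frame count or resolution"),
    (r"No module named", "docker exec comfyui pip install <module>, then restart"),
]


def triage(line: str) -> tuple[str, str]:
    """Return (category, hint) for one log line: "benign", "action", or "other"."""
    for pattern in BENIGN:
        if re.search(pattern, line):
            return "benign", ""
    for pattern, hint in ACTIONABLE:
        if re.search(pattern, line):
            return "action", hint
    return "other", ""


def summarize(lines) -> dict:
    """Count benign lines and collect actionable ones with their hints."""
    summary = {"benign": 0, "action": []}
    for line in lines:
        kind, hint = triage(line)
        if kind == "benign":
            summary["benign"] += 1
        elif kind == "action":
            summary["action"].append((line.rstrip(), hint))
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python triage_logs.py comfyui.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        result = summarize(f)
    print(f"{result['benign']} benign line(s) skipped")
    for line, hint in result["action"]:
        print(f"[ACTION] {line}\n    -> {hint}")
```

Lines that match neither list are ignored by this sketch, so read the full log for anything it does not cover.
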