Image & Video Generation with ComfyUI

Common issues

Symptom	Cause	Fix
"permission denied" when running docker	User not in docker group	Run `sudo usermod -aG docker $USER && newgrp docker`
Container fails to start with GPU error	NVIDIA Container Toolkit not configured	Run `sudo nvidia-ctk runtime configure --runtime=docker`, then restart Docker (`sudo systemctl restart docker`)
ComfyUI web UI not accessible	Firewall blocking port or wrong IP	Verify with `docker logs comfyui`, check that port 8188 is open, use `http://<STATION_IP>:8188`
"Model file not found" when running workflow	Model not downloaded or wrong path	Verify models are in `./models/` and the volume mount is correct in the docker run command
HuggingFace download fails with 401	Invalid or missing HF token	Verify `HF_TOKEN` is exported and valid at huggingface.co/settings/tokens
CUDA out of memory during video generation	Frame count or resolution too high	Reduce frame count or resolution. At 720p with Wan 2.1 14B, keep clips under 5 seconds initially
CUDA out of memory during 1080p HunyuanVideo	Model + video tensors exceed GPU memory	Use fewer frames (e.g., 49 instead of 97). HunyuanVideo at 1080p needs ~100-120 GB
Workflow loads but nodes show red "missing"	Custom node not installed	Use ComfyUI-Manager (click Manager → Install Missing Custom Nodes) or rebuild the Docker image
Video output is a black screen	VAE decode issue or wrong model variant	Ensure you are using the correct model variant (T2V vs I2V) and the VAE is loaded
Very slow generation, GPU utilization low	PyTorch not using GPU or wrong CUDA version	Run `nvidia-smi` inside the container (`docker exec comfyui nvidia-smi`) and confirm the GPU is listed
"No module named ..." error on startup	Custom node dependency not installed	Install the module inside the container with `docker exec comfyui pip install <module>`, then restart the container (`docker restart comfyui`)
Docker build fails on ARM64 with `Could not find a version that satisfies the requirement onnxruntime-gpu`	`onnxruntime-gpu` has no aarch64 wheel on PyPI	Already handled by the shipped Dockerfile, which `sed`-substitutes `onnxruntime-gpu` → `onnxruntime` (CPU build) in every custom_node `requirements.txt` before `pip install`. If you see this error, you are building from a Dockerfile predating that fix — pull the latest assets and rebuild.
Docker build fails on ARM64 (other packages)	Some custom-node dependencies have no aarch64 wheel	Find the failing package in the build log. The custom-node install loop is wrapped in `|| true`, so the build still completes but the affected node will be missing modules at runtime. Either skip the node (remove its directory from `custom_nodes/` in the Dockerfile clone block) or install via ComfyUI-Manager after launch with a manually built wheel.
NGC image pull requires authentication	NGC registry needs login	Run `docker login nvcr.io` with your NGC API key
`device >= 0 && device < num_gpus INTERNAL ASSERT FAILED` on startup	Using `--gpus all` on a multi-GPU system causes a PyTorch assertion	Use `--gpus '"device=N"'` to target the GB300 specifically (check index with `nvidia-smi`)
`No HiDream models available` warning on startup	HiDream custom node reports no models found	This is a warning, not an error. It clears once HiDream model files are downloaded (Tier 2)
Web UI: "Error: the workflow does not contain any nodes" when using Load	The file is API format (flat `node_id → {class_type, inputs}`), not a UI workflow	In the playbook, use `assets/workflows/<name>.json` in the Load dialog (under user/default/workflows inside the container). For `curl` / HTTP API, use `assets/workflow_api/<name>.api.json` inside `{"prompt": ...}`.
`huggingface-cli: command not found` or download script errors	Deprecated CLI name	Install `huggingface_hub` and use `hf download` (the script does this automatically).
Download script exits but `models/diffusion_models/` is empty	Silent failure in older scripts or wrong token	Re-run with `bash -x assets/scripts/download-models.sh 1`; confirm `HF_TOKEN` and license acceptance on Hugging Face. The script now fails fast if a file is missing after `hf download`.
Container exits on startup with `ModuleNotFoundError: torchaudio`	Container was built from a Dockerfile predating the torchaudio shim	Rebuild the image: `docker build -t comfyui-gb300 -f assets/Dockerfile .`. The shipped Dockerfile creates an import-only `torchaudio` stub (NGC PyTorch's custom NVFP4 ABI is incompatible with PyPI torchaudio wheels). Lightricks audio VAE workflows are not supported in this image; no other workflow needs torchaudio.
`OSError: ... undefined symbol: torch_dtype_float4_e2m1fn_x2` from torchaudio	Real torchaudio installed on top of NGC PyTorch	Same fix as above — rebuild from the shipped Dockerfile. Do not `pip install torchaudio` manually inside the container.
`DWPose: Onnxruntime not found or doesn't come with acceleration providers, switch to OpenCV with CPU device`	Expected on aarch64. PyPI has no `onnxruntime-gpu` wheel for arm64; the Dockerfile substitutes the CPU `onnxruntime` package	Informational warning, not an error. DWPose preprocessing runs on CPU (slower than GPU) but produces correct output.
`aimdo: ... funchook_prepare(cuMemFree_v2) failed: 8 Failed to allocate memory in unused regions` at startup	NGC PyTorch's CUDA-hooks diagnostic tool (`aimdo`) cannot install hooks under default container caps and falls back to no-op	Benign. ComfyUI works normally; the message is informational from the NGC base image. No action required.
`RequestsDependencyWarning: urllib3 (...) or charset_normalizer (...) doesn't match a supported version!` at startup	Version skew between `requests` and the NGC-pinned `urllib3` / `charset_normalizer` wheels	Benign. ComfyUI's HTTP traffic still works. Suppress with `PYTHONWARNINGS=ignore::requests.RequestsDependencyWarning` if it bothers you.
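
For the multi-GPU assertion row above, the index to pass in `--gpus '"device=N"'` can be looked up rather than read off by eye. The following is a minimal sketch, not part of the playbook: `gpu_index_for` is a hypothetical helper name, and it assumes the GPU's name as reported by `nvidia-smi` contains the substring you pass (for example `GB300`, which may differ on your system).

```shell
# Hypothetical helper: print the index of the first GPU whose name contains
# the given substring. Reads "index, name" CSV lines on stdin, which is the
# shape produced by:
#   nvidia-smi --query-gpu=index,name --format=csv,noheader
gpu_index_for() {
  pattern="$1"
  # Split on ", " so $1 is the index and $2 is the GPU name.
  # index() does a plain substring match, so no regex escaping is needed.
  awk -F', ' -v pat="$pattern" 'index($2, pat) { print $1; exit }'
}

# Example usage (assumes the name contains "GB300"; adjust to your output):
#   idx="$(nvidia-smi --query-gpu=index,name --format=csv,noheader | gpu_index_for GB300)"
#   echo "--gpus '\"device=${idx}\"'"
```

If the helper prints nothing, no GPU name matched; run `nvidia-smi` directly and pick the index by hand.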

NOTE

ComfyUI logs are visible with `docker logs -f comfyui`. Most errors (missing models, node failures) are reported in these logs with clear messages.
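
Because several startup messages in the table above are benign, it can help to hide them when scanning the logs for real errors. The sketch below is one possible approach, not something shipped with the playbook: `filter_known_benign` is a hypothetical name, and the patterns are copied from the HiDream, DWPose, `aimdo`, and `RequestsDependencyWarning` rows, so they only match if your log lines contain the same text.

```shell
# Hypothetical helper: drop log lines that match the warnings the table
# describes as benign, and pass everything else through unchanged.
filter_known_benign() {
  grep -v -E \
    'No HiDream models available|DWPose: Onnxruntime not found|funchook_prepare\(cuMemFree_v2\) failed|RequestsDependencyWarning'
}

# Example usage. 2>&1 merges the container's stderr stream into the pipe,
# since `docker logs` replays stderr output on stderr:
#   docker logs -f comfyui 2>&1 | filter_known_benign
```

Note that `grep -v` exits with a non-zero status when every line is filtered out, which matters only if the pipeline runs under `set -e` or `pipefail`.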
