Image & Video Generation with ComfyUI

Basic idea

ComfyUI is a node-based visual interface for building image and video generation workflows using diffusion models. Instead of a single text box, you connect processing nodes — model loaders, text encoders, samplers, decoders — into a graph that gives full control over every generation step.

Node-based workflows let you build, modify, and share complex generation pipelines visually.
Multi-model support covers the latest architectures: FLUX for images, Wan 2.1 and HunyuanVideo for video, and NVIDIA Cosmos for world generation.
Unquantized bf16 on GB300 — with 252 GB of HBM3e, you can run 12–17B image models and 13–14B video models in bf16 without quantization or offloading, which consumer GPUs generally cannot do.
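
To make the memory claim concrete, the following minimal sketch estimates the bf16 weight footprint for the parameter counts quoted in this playbook. It assumes 2 bytes per parameter and counts weights only; text encoders, the VAE, and activations come on top, which is why actual VRAM use during video generation is much higher. The function name is illustrative and not part of the playbook assets.

```python
BYTES_PER_PARAM_BF16 = 2  # bf16 stores each parameter in 16 bits


def bf16_weight_gb(num_params: float) -> float:
    """Return the approximate weight footprint in decimal GB for bf16 storage."""
    return num_params * BYTES_PER_PARAM_BF16 / 1e9


# Parameter counts taken from the model list in this playbook.
MODEL_SIZES = {
    "12B image model": 12e9,
    "17B image model": 17e9,
    "13B video model": 13e9,
    "14B video model": 14e9,
}

for name, params in MODEL_SIZES.items():
    # Weights only: encoders, VAE, and activations are not included.
    print(f"{name}: ~{bf16_weight_gb(params):.0f} GB of bf16 weights")
```

Even the largest entry needs roughly 34 GB for weights alone, so the 252 GB budget leaves most of the memory for activations and auxiliary models.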

What you'll accomplish

Deploy ComfyUI on DGX Station and run image and video generation workflows using six state-of-the-art models (the Wan 2.1 T2V and I2V checkpoints count separately):

FLUX.1 [dev] (12B) — high-quality text-to-image generation
HiDream-I1 Full (17B) — one of the largest open image models, with four text encoders including Llama-3.1-8B
Wan 2.1 T2V/I2V 14B — text-to-video and image-to-video at 720p
HunyuanVideo (13B) — 1080p video generation leveraging the full GB300 memory (~100–120 GB VRAM)
NVIDIA Cosmos-Predict2 (14B) — NVIDIA's world foundation model for video-to-world generation

You will also learn advanced techniques including ControlNet-guided generation and combined image-to-video pipelines.

What to know before starting

Basic Docker container usage
Familiarity with generative AI concepts (prompts, diffusion models) is helpful but not required

Prerequisites

NVIDIA DGX Station with GB300 GPU
Docker installed: docker --version
NVIDIA Container Toolkit configured: nvidia-smi should show the GB300
Hugging Face account with access token: https://huggingface.co/settings/tokens
At least 200 GB free disk space for model weights
Network access to Hugging Face and GitHub
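
The checklist above can be partly automated. The sketch below is an assumption-laden helper, not part of the playbook assets: it only confirms that the docker and nvidia-smi executables are on PATH and that the chosen directory has the 200 GB of free space listed above. It does not verify that the NVIDIA Container Toolkit is configured or that nvidia-smi actually reports the GB300, so those still need a manual look.

```python
import shutil

MIN_FREE_GB = 200  # disk headroom for model weights, from the prerequisites list


def free_disk_gb(path: str = ".") -> float:
    """Return free space at `path` in decimal GB."""
    return shutil.disk_usage(path).free / 1e9


def check_prerequisites(path: str = ".", min_free_gb: float = MIN_FREE_GB) -> dict:
    """Return a name -> bool map of the checks that can be done locally."""
    return {
        # Presence on PATH only; this does not run the tools.
        "docker": shutil.which("docker") is not None,
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
        "disk": free_disk_gb(path) >= min_free_gb,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Pass the directory that will hold the model weights as `path` if it lives on a different volume than the current working directory.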

Ancillary files

All required assets can be found in the ComfyUI playbook repository.

assets/Dockerfile — Builds the ComfyUI container image from NGC PyTorch base (ARM64)
assets/scripts/download-models.sh — Downloads all model weights from Hugging Face using the hf CLI (huggingface-hub package)
assets/workflows/*.json — Eight workflows in UI format (ComfyUI 0.4 graph JSON with nodes and links), to be opened with Load in the web UI
assets/workflow_api/*.api.json — The same eight graphs in API format for /prompt and automation (curl, scripts)
assets/scripts/api_to_ui_workflow.py — Regenerates UI JSON from API JSON if you edit a graph programmatically
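
As a rough illustration of how the API-format files relate to the /prompt endpoint, the sketch below builds the POST request that ComfyUI expects: a JSON body with the graph under the "prompt" key. It assumes ComfyUI is listening on 127.0.0.1:8188 and that an API-format graph is a mapping of node ids to objects with a "class_type" field, while UI-format graphs carry top-level "nodes" and "links". The helper names are hypothetical and not part of the playbook scripts.

```python
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"  # assumption: local ComfyUI on the default port


def is_api_format(graph: dict) -> bool:
    """Heuristic: API graphs map node ids to {"class_type", "inputs"} objects."""
    if "nodes" in graph and "links" in graph:
        return False  # looks like a UI-format workflow
    return bool(graph) and all(
        isinstance(node, dict) and "class_type" in node for node in graph.values()
    )


def build_prompt_request(graph: dict, base_url: str = COMFYUI_URL) -> urllib.request.Request:
    """Build (but do not send) the POST /prompt request for an API-format graph."""
    if not is_api_format(graph):
        raise ValueError("expected an API-format graph, not a UI workflow")
    body = json.dumps({"prompt": graph}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_workflow(path: str, base_url: str = COMFYUI_URL) -> dict:
    """Load an .api.json file and queue it; returns the server's JSON reply."""
    with open(path, encoding="utf-8") as f:
        graph = json.load(f)
    with urllib.request.urlopen(build_prompt_request(graph, base_url)) as resp:
        return json.load(resp)
```

With the container running, calling `submit_workflow` on one of the files under assets/workflow_api/ should queue the job. Passing a file from assets/workflows/ instead raises ValueError, because the UI format is not accepted by /prompt.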

Time & risk

Duration: 45 minutes (excluding model downloads, which may take 30–60 minutes depending on network speed)
Risks:
- Model downloads require Hugging Face authentication and transfer about 150 GB in total, so a fast connection is needed
- Port 8188 must be accessible for the ComfyUI web interface
Rollback: Stop and remove the Docker container. Delete the models/ directory to reclaim disk space.
Last Updated: 05/26/2026
- First Publication
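
The two risks listed above can be sanity-checked with a short script. This is a minimal sketch under stated assumptions: the download estimate ignores protocol overhead and server-side throttling, and the port check only tests whether a TCP connection to the given host succeeds. Both function names are illustrative.

```python
import socket


def download_minutes(total_gb: float, mbit_per_s: float) -> float:
    """Estimate transfer time in minutes for `total_gb` decimal GB at a given line rate."""
    total_mbit = total_gb * 8000  # 1 GB = 8000 Mbit
    return total_mbit / mbit_per_s / 60


def port_is_open(host: str, port: int = 8188, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# ~150 GB of weights at a sustained 500 Mbit/s
print(f"Estimated download time: {download_minutes(150, 500):.0f} minutes")
print("ComfyUI port reachable:", port_is_open("127.0.0.1"))
```

At a sustained 500 Mbit/s the 150 GB download takes about 40 minutes, which is consistent with the 30–60 minute range quoted above. Slower links will take proportionally longer.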