Spark & Reachy Photo Booth

2 HOURS

AI-augmented photo booth using the DGX Spark and Reachy Mini.

Basic idea

Spark & Reachy Photo Booth is an interactive and event-driven photo booth demo that combines the DGX Spark™ with the Reachy Mini robot to create an engaging multimodal AI experience. The system showcases:

  • A multimodal agent built with the NeMo Agent Toolkit
  • A ReAct loop driven by the openai/gpt-oss-20b LLM powered by TensorRT-LLM
  • Voice interaction based on nvidia/riva-parakeet-ctc-1.1B and hexgrad/Kokoro-82M
  • Image generation with black-forest-labs/FLUX.1-Kontext-dev for image-to-image restyling
  • User position tracking built with facebookresearch/detectron2 and FoundationVision/ByteTrack
  • MinIO for storing captured/generated images as well as sharing them via QR-code
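
To make the ReAct loop in the list above concrete, here is a minimal Python sketch of the pattern: the model picks an action, a tool runs, and the observation is fed back until the model decides to finish. This is an illustration only, not the playbook's implementation. The tool names (take_photo, restyle_image) and the scripted_llm stand-in are hypothetical; in the demo the decisions come from the LLM and the tools are provided by the agent.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (kind, text), e.g. ("observation", "photo_001.jpg")

# Hypothetical tools; the real agent's tool names are not given in this playbook.
TOOLS: Dict[str, Callable[[str], str]] = {
    "take_photo": lambda _arg: "photo_001.jpg",
    "restyle_image": lambda path: f"restyled({path})",
}


def scripted_llm(history: List[Step]) -> Tuple[str, str]:
    """Stand-in for the LLM: picks the next (action, argument) from past observations."""
    observations = [text for kind, text in history if kind == "observation"]
    if not observations:
        return "take_photo", ""
    if len(observations) == 1:
        return "restyle_image", observations[-1]
    return "finish", observations[-1]


def react_loop(
    request: str,
    llm: Callable[[List[Step]], Tuple[str, str]],
    max_steps: int = 5,
) -> str:
    """Alternate between choosing an action and observing its result until 'finish'."""
    history: List[Step] = [("user", request)]
    for _ in range(max_steps):
        action, argument = llm(history)
        if action == "finish":
            return argument
        observation = TOOLS[action](argument)
        history.append(("action", f"{action}({argument})"))
        history.append(("observation", observation))
    raise RuntimeError("agent did not finish within max_steps")


result = react_loop("Make me look like a cartoon", scripted_llm)
print(result)
```

The max_steps bound is a design choice of this sketch: it keeps a misbehaving model from looping forever.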

The demo is built from several services that communicate through a message bus.
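
As a rough illustration of this event-driven style, the sketch below wires two handlers to an in-memory publish/subscribe bus. It is a teaching aid under stated assumptions: the InMemoryBus class, the topic names (photo.captured, image.generated), and the event fields are all hypothetical and do not reflect the repository's actual topics or client library.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class InMemoryBus:
    """Toy stand-in for a message bus: delivers each event to all topic subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Copy the list so handlers may subscribe during delivery.
        for handler in list(self._subscribers[topic]):
            handler(event)


bus = InMemoryBus()
log: List[str] = []


def on_photo_captured(event: Event) -> None:
    # Hypothetical image service: reacts to a capture and announces its result.
    log.append(f"restyling {event['image']}")
    bus.publish("image.generated", {"image": "styled_" + event["image"]})


def on_image_generated(event: Event) -> None:
    # Hypothetical storage service: reacts to a finished image.
    log.append(f"uploading {event['image']}")


bus.subscribe("photo.captured", on_photo_captured)
bus.subscribe("image.generated", on_image_generated)
bus.publish("photo.captured", {"image": "guest.jpg"})
print(log)
```

Neither handler calls the other directly; each only knows the topics it reads and writes, which is what lets services be developed and restarted independently.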

NOTE

This playbook applies to both the Reachy Mini and Reachy Mini Lite robots. For simplicity, we’ll refer to the robot as Reachy throughout this playbook.

What you'll accomplish

You'll deploy a complete photo booth system on DGX Spark running multiple inference models locally — LLM, image generation, speech recognition, speech generation, and computer vision — all without cloud dependencies. The Reachy robot interacts with users through natural conversation, captures photos, and generates custom images based on prompts, demonstrating real-time multimodal AI processing on edge hardware.

What to know before starting

  • Basic Docker and Docker Compose knowledge
  • Basic network configuration skills

Prerequisites

Hardware Requirements:

TIP

Make sure your Reachy robot firmware is up to date. You can find instructions to update it here.

Software Requirements:

  • The official DGX Spark OS image including all required utilities such as Git, Docker, NVIDIA drivers, and the NVIDIA Container Toolkit
  • An internet connection for the DGX Spark
  • NVIDIA NGC Personal API Key (NVIDIA_API_KEY). Create a key if necessary. Make sure to enable the NGC Catalog scope when creating the key.
  • Hugging Face access token (HF_TOKEN). Create a token if necessary. Make sure the token has the "Read access to contents of all public gated repos you can access" permission.
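
Before starting the deployment, it can help to confirm that both credentials are actually set. The following Python sketch checks for the two variable names listed above. It assumes the values are exported in the current shell environment; how the repository itself loads them (for example, from an environment file) is not specified here, and the helper name missing_credentials is made up for this example.

```python
import os
from typing import List, Mapping

# Variable names taken from the software requirements above.
REQUIRED_VARS = ("NVIDIA_API_KEY", "HF_TOKEN")


def missing_credentials(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = missing_credentials(os.environ)
if missing:
    print("Missing credentials: " + ", ".join(missing))
else:
    print("All required credentials are set.")
```

The check only verifies presence. It does not validate the key scope or token permissions, which still need to be confirmed when the credentials are created.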

Ancillary files

All required assets can be found in the Spark & Reachy Photo Booth repository.

  • The Docker Compose application
  • Various configuration files
  • Source code for all the services
  • Detailed documentation

Time & risk

  • Estimated time: 2 hours including hardware setup, container building, and model downloads
  • Risk level: Medium
  • Rollback: Docker containers can be stopped and removed to free resources. Downloaded models can be deleted from cache directories. Robot and peripheral connections can be safely disconnected. Network configurations can be reverted by removing custom settings.
  • Last Updated: 01/27/2026
    • 1.0.0 First Publication

Governing terms

Your use of the Spark Playbook scripts is governed by Apache License, Version 2.0 and enables use of separate open source and proprietary software governed by their respective licenses: Flux.1-Kontext NIM, Parakeet 1.1b CTC en-US ASR NIM, TensorRT-LLM, minio/minio, arizephoenix/phoenix, grafana/otel-lgtm, Python, Node.js, nginx, busybox, UV Python Packager, Redpanda, Redpanda Console, gpt-oss-20b, FLUX.1-Kontext-dev, FLUX.1-Kontext-dev-onnx.

NOTE

FLUX.1-Kontext-dev and FLUX.1-Kontext-dev-onnx are models released for non-commercial use. Contact sales@blackforestlabs.ai for commercial terms. You are responsible for accepting the applicable License Agreements and Acceptable Use Policies, and for ensuring your HF token has the correct permissions.