Spark & Reachy Photo Booth

Basic idea

Spark & Reachy Photo Booth is an interactive and event-driven photo booth demo that combines the DGX Spark™ with the Reachy Mini robot to create an engaging multimodal AI experience. The system showcases:

A multi-modal agent built with the NeMo Agent Toolkit
A ReAct loop driven by the openai/gpt-oss-20b LLM powered by TensorRT-LLM
Voice interaction based on nvidia/riva-parakeet-ctc-1.1B and hexgrad/Kokoro-82M
Image generation with black-forest-labs/FLUX.1-Kontext-dev for image-to-image restyling
User position tracking built with facebookresearch/detectron2 and FoundationVision/ByteTrack
MinIO for storing captured and generated images and sharing them via QR code
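To make the ReAct pattern named in the list above more concrete, here is a minimal sketch of a reason-act-observe loop. It is not the demo's agent: the LLM is replaced by a scripted stand-in (`scripted_llm`), and the tool names `capture_photo` and `restyle_image` are hypothetical placeholders, not names taken from the repository or from the NeMo Agent Toolkit.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; the real demo's tool names and signatures are not
# given in this playbook.
def capture_photo(_: str) -> str:
    return "photo_001.png"

def restyle_image(style: str) -> str:
    return f"restyled({style})"

TOOLS: Dict[str, Callable[[str], str]] = {
    "capture_photo": capture_photo,
    "restyle_image": restyle_image,
}

def scripted_llm(history: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: picks the next (action, argument) pair
    from the number of observations collected so far."""
    observations = [h for h in history if h.startswith("Observation:")]
    if len(observations) == 0:
        return "capture_photo", ""
    if len(observations) == 1:
        return "restyle_image", "watercolor"
    return "finish", "Here is your watercolor portrait!"

def react_loop(user_request: str,
               llm: Callable[[List[str]], Tuple[str, str]] = scripted_llm,
               max_steps: int = 5) -> Tuple[str, List[str]]:
    """Alternate between choosing an action and recording its observation
    until the model decides to finish or the step limit is reached."""
    history = [f"User: {user_request}"]
    for _ in range(max_steps):
        action, arg = llm(history)
        if action == "finish":
            return arg, history
        observation = TOOLS[action](arg)
        history.append(f"Action: {action}({arg!r})")
        history.append(f"Observation: {observation}")
    return "Step limit reached.", history

answer, trace = react_loop("Take my picture and make it a watercolor")
```

In a real agent the `llm` callable would send the history to the model and parse its reply; the step limit guards against a model that never decides to finish.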

The demo is built from several services that communicate through a message bus.
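The sketch below illustrates the general idea of services reacting to events on a shared bus. It uses a tiny in-memory publish/subscribe class rather than a real broker, and the topic names (`vision.user_detected`, `speech.say`) and message fields are invented for illustration; the actual topics and payloads are defined in the repository.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]

class InMemoryBus:
    """Toy stand-in for a message bus: delivers each published message
    synchronously to every handler subscribed to the topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Dict[str, Any]) -> None:
        # Iterate over a copy so handlers may subscribe while we deliver.
        for handler in list(self._subscribers[topic]):
            handler(message)

bus = InMemoryBus()
log: List[str] = []

# Hypothetical "tracking" service: reacts to a detection by asking the
# speech service to greet the user.
def on_user_detected(msg: Dict[str, Any]) -> None:
    log.append(f"tracker saw user {msg['user_id']}")
    bus.publish("speech.say", {"text": "Hello! Want a photo?"})

# Hypothetical "speech" service: would synthesize audio in a real system.
def on_say(msg: Dict[str, Any]) -> None:
    log.append(f"tts: {msg['text']}")

bus.subscribe("vision.user_detected", on_user_detected)
bus.subscribe("speech.say", on_say)
bus.publish("vision.user_detected", {"user_id": 7})
```

The point of the pattern is that neither service calls the other directly: each one knows only the topics it publishes to and subscribes to, which lets services be started, stopped, or replaced independently.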

NOTE

This playbook applies to both the Reachy Mini and Reachy Mini Lite robots. For simplicity, we’ll refer to the robot as Reachy throughout this playbook.

What you'll accomplish

You'll deploy a complete photo booth system on DGX Spark running multiple inference models locally — LLM, image generation, speech recognition, speech generation, and computer vision — all without cloud dependencies. The Reachy robot interacts with users through natural conversation, captures photos, and generates custom images based on prompts, demonstrating real-time multimodal AI processing on edge hardware.

What to know before starting

Basic Docker and Docker Compose knowledge
Basic network configuration skills

Prerequisites

Hardware Requirements:

NVIDIA DGX Spark
A monitor, a keyboard, and a mouse to run this playbook directly on the DGX Spark.
Reachy Mini or Reachy Mini Lite robot

TIP

Make sure your Reachy robot firmware is up to date. You can find instructions to update it here.

Software Requirements:

The official DGX Spark OS image including all required utilities such as Git, Docker, NVIDIA drivers, and the NVIDIA Container Toolkit
An internet connection for the DGX Spark
NVIDIA NGC Personal API Key (NVIDIA_API_KEY). Create a key if necessary. Make sure to enable the NGC Catalog scope when creating the key.
Hugging Face access token (HF_TOKEN). Create a token if necessary. Make sure the token has the "Read access to contents of all public gated repos you can access" permission.
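Before starting, it can help to confirm that both credentials are available. The following is a small sketch that checks for the two variable names given above; it assumes they are exported as environment variables in the current shell, which may differ from how the repository expects them to be supplied (for example, through an env file).

```python
import os
from typing import List, Mapping

# Variable names taken from the requirements list above.
REQUIRED_VARS = ("NVIDIA_API_KEY", "HF_TOKEN")

def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]

if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    else:
        print("Both NVIDIA_API_KEY and HF_TOKEN are set.")
```

This only checks that the variables are present; it does not verify that the key has the NGC Catalog scope or that the token has the required read permission.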

Ancillary files

All required assets can be found in the Spark & Reachy Photo Booth repository.

The Docker Compose application
Various configuration files
Source code for all the services
Detailed documentation

Time & risk

Estimated time: 2 hours including hardware setup, container building, and model downloads
Risk level: Medium
Rollback: Docker containers can be stopped and removed to free resources. Downloaded models can be deleted from cache directories. Robot and peripheral connections can be safely disconnected. Network configurations can be reverted by removing custom settings.
Last Updated: 01/27/2026
- 1.0.0 First Publication

Governing terms

Your use of the Spark Playbook scripts is governed by Apache License, Version 2.0 and enables use of separate open source and proprietary software governed by their respective licenses: Flux.1-Kontext NIM, Parakeet 1.1b CTC en-US ASR NIM, TensorRT-LLM, minio/minio, arizephoenix/phoenix, grafana/otel-lgtm, Python, Node.js, nginx, busybox, UV Python Packager, Redpanda, Redpanda Console, gpt-oss-20b, FLUX.1-Kontext-dev, FLUX.1-Kontext-dev-onnx.