Connect two Spark devices and set them up for inference and fine-tuning
Configure two DGX Spark systems for high-speed inter-node communication using 200GbE direct QSFP connections. This setup enables distributed workloads across multiple DGX Spark nodes by establishing network connectivity and configuring SSH authentication.
You will physically connect two DGX Spark devices with a QSFP cable, configure network interfaces for cluster communication, and establish passwordless SSH between nodes to create a functional distributed computing environment.
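The passwordless SSH step can be sketched roughly as follows. This is a minimal illustration, not the playbook's own script: the peer hostname `spark-2.local`, the key path, and the `run` and `setup_passwordless_ssh` helpers are hypothetical placeholders. By default the script only prints the commands it would run (`DRY_RUN=1`).

```shell
#!/usr/bin/env bash
# Sketch: passwordless SSH from this node to its peer.
# ASSUMPTIONS (not from the playbook): PEER_HOST, PEER_USER and KEY_FILE
# are placeholders; replace them with the values for your own systems.
PEER_USER="${PEER_USER:-$(id -un)}"            # assumed: same user name on both nodes
PEER_HOST="${PEER_HOST:-spark-2.local}"        # hypothetical peer hostname
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_ed25519}"  # assumed key location
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print commands only

# Print the command in dry-run mode, otherwise execute it.
# (An empty argument such as the passphrase prints as a blank.)
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_passwordless_ssh() {
  # Generate a key pair with an empty passphrase only if none exists yet.
  if [ ! -f "$KEY_FILE" ]; then
    run ssh-keygen -t ed25519 -N "" -f "$KEY_FILE"
  fi
  # Append the public key to the peer's authorized_keys.
  run ssh-copy-id -i "$KEY_FILE.pub" "$PEER_USER@$PEER_HOST"
  # Verify: BatchMode=yes makes ssh fail instead of prompting for a password.
  run ssh -o BatchMode=yes "$PEER_USER@$PEER_HOST" hostname
}

setup_passwordless_ssh
```

To cover both directions, the same steps would be run on each node with the other node as the peer. Set `DRY_RUN=0` only after checking the printed commands.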
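Before you begin, confirm that you have sudo access on both systems. The following command should print root:
sudo whoami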
All required files for this playbook are available on GitHub.
Duration: 1 hour including validation
Risk level: Medium - involves network reconfiguration
Rollback: Reverse the network changes by removing the netplan configs you added or the IP addresses you assigned
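As a rough illustration of the rollback, the sketch below shows both variants. The interface name, address, and netplan file path are hypothetical placeholders, not values from this playbook, and the `run`, `rollback_ip`, and `rollback_netplan` helpers are introduced here only for the example. By default the script only prints the commands (`DRY_RUN=1`).

```shell
#!/usr/bin/env bash
# Sketch: undo the cluster network configuration.
# ASSUMPTIONS (not from the playbook): IFACE, ADDR and NETPLAN_FILE are
# placeholders; substitute the values you actually used during setup.
IFACE="${IFACE:-enp1s0f0np0}"                               # hypothetical QSFP interface
ADDR="${ADDR:-192.168.100.10/24}"                           # hypothetical assigned address
NETPLAN_FILE="${NETPLAN_FILE:-/etc/netplan/40-cluster.yaml}" # hypothetical config file
DRY_RUN="${DRY_RUN:-1}"                                     # 1 = print commands only

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Variant 1: the address was assigned manually with `ip addr add`.
rollback_ip() {
  run sudo ip addr del "$ADDR" dev "$IFACE"
}

# Variant 2: the address was configured through a netplan file.
rollback_netplan() {
  run sudo rm -f "$NETPLAN_FILE"
  run sudo netplan apply
}

# In dry-run mode, print both variants. When running for real, call only
# the one that matches how the link was configured.
rollback_ip
rollback_netplan
```

Run the rollback on both nodes, since each one carries its own interface configuration.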