Copyright © 2025 NVIDIA Corporation

NCCL for Two Sparks

30 MIN

Install and test NCCL on two Sparks


Common issues when running on two Sparks

Issue: mpirun hangs or times out
Cause: SSH connectivity issues
Solution:
  1. Test basic SSH connectivity: ssh <remote_ip> should work without password prompts
  2. Try a simple mpirun test: mpirun -np 2 -H <IP for Node 1>:1,<IP for Node 2>:1 hostname
  3. Verify that SSH keys are set up correctly on all nodes

Issue: Network interface not found
Cause: The interface name is wrong or the interface is down
Solution: Check the interface status with ibdev2netdev and verify the IP configuration

Issue: NCCL build fails
Cause: Missing dependencies such as OpenMPI, or an incorrect CUDA version
Solution: Verify that CUDA is installed and the required libraries are present
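
The checks listed above can be scripted so they run in a fixed order before a multi-node job. The following is a minimal sketch, not part of the playbook: the function names (check_ssh, check_tools, preflight) are hypothetical, and the 5-second timeout is an arbitrary choice. It assumes OpenSSH, whose BatchMode option makes ssh fail instead of prompting for a password, so a broken key setup shows up as an error rather than a hang.

```shell
#!/bin/sh
# Pre-flight checks for a two-node run.
# Function names are illustrative only; they do not belong to any NVIDIA tool.

# Returns 0 if key-based SSH to host $1 works without a password prompt.
# BatchMode=yes disables interactive prompts; ConnectTimeout bounds the wait.
check_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Returns 0 if every command named in the arguments can be found;
# prints each missing one to stderr and returns 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: preflight <local_ip> <remote_ip>
# Stops at the first failing check.
preflight() {
    local_ip=$1
    remote_ip=$2
    check_tools ssh mpirun ibdev2netdev || return 1
    check_ssh "$remote_ip" || { echo "SSH to $remote_ip failed" >&2; return 1; }
    # Show the device-to-interface mapping and link state for manual review.
    ibdev2netdev
    # The simple launch test from the list above: one process per node.
    mpirun -np 2 -H "$local_ip:1,$remote_ip:1" hostname
}
```

After sourcing the file, run preflight <IP for Node 1> <IP for Node 2> from the first node. If the mpirun step succeeds, it should print one hostname per node. For the build-failure case, check_tools mpirun nvcc is a rough first check, assuming nvcc is on PATH; it does not confirm that the CUDA version is the right one.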

Resources

  • NCCL Documentation
  • DGX Spark Documentation
  • DGX Spark Forum