| Symptom | Cause | Fix |
|---|---|---|
| nemoclaw: command not found after install | Shell PATH not updated | Run source ~/.bashrc (or source ~/.zshrc for zsh), or open a new terminal window. |
| Installer fails with Node.js version error | Node.js version below 20 | Install Node.js 20+: curl -fsSL https://deb.nodesource.com/setup_22.x \| sudo -E bash - && sudo apt-get install -y nodejs, then re-run the installer. |
| npm install fails with EACCES permission error | npm global directory not writable | mkdir -p ~/.npm-global && npm config set prefix ~/.npm-global && export PATH=~/.npm-global/bin:$PATH then re-run the installer. Add the export line to ~/.bashrc to make it permanent. |
| Docker permission denied | User not in docker group | sudo usermod -aG docker $USER, then log out and back in. |
| Gateway fails with cgroup / "Failed to start ContainerManager" errors | Docker not configured for host cgroup namespace on DGX Spark | Run the cgroup fix: sudo python3 -c "import json, os; path='/etc/docker/daemon.json'; d=json.load(open(path)) if os.path.exists(path) else {}; d['default-cgroupns-mode']='host'; json.dump(d, open(path,'w'), indent=2)" then sudo systemctl restart docker. Alternatively, run sudo nemoclaw setup-spark which applies this fix automatically. |
| Gateway fails with "port 8080 is held by container..." | Another OpenShell gateway or container is using port 8080 | Stop the conflicting container: openshell gateway destroy -g <old-gateway-name> or docker stop <container-name> && docker rm <container-name>, then retry nemoclaw onboard. |
| Sandbox creation fails | Stale gateway state or DNS not propagated | Run openshell gateway destroy && openshell gateway start, then re-run the installer or nemoclaw onboard. |
| CoreDNS crash loop | Known issue on some DGX Spark configurations | Run sudo ./scripts/fix-coredns.sh from the NemoClaw repo directory. |
| "No GPU detected" during onboard | DGX Spark GB10 reports unified memory differently | Expected on DGX Spark. The wizard still works and uses Ollama for inference. |
| Inference timeout or hangs | Ollama not running or not reachable | Check Ollama: curl http://localhost:11434. If not running: ollama serve &. If running but unreachable from sandbox, ensure Ollama is configured to listen on 0.0.0.0 (see Step 2 in Instructions). |
| Agent gives no response or is very slow | Normal for 120B model running locally | Nemotron 3 Super 120B can take 30 to 90 seconds per response. Verify the inference route: nemoclaw my-assistant status. |
| Port 18789 already in use | Another process is bound to the port | lsof -i :18789 then kill <PID>. If needed, kill -9 <PID> to force-terminate. |
| Web UI port forward dies or dashboard unreachable | Port forward not active | openshell forward stop 18789 my-assistant then openshell forward start 18789 my-assistant --background. |
| Web UI shows origin not allowed | Accessing via localhost instead of 127.0.0.1 | Use http://127.0.0.1:18789/#token=... in the browser. The gateway origin check requires 127.0.0.1 exactly. |
| Telegram bridge does not start | Missing environment variables | Ensure TELEGRAM_BOT_TOKEN and SANDBOX_NAME are set on the host. SANDBOX_NAME must match the sandbox name from onboarding. |
| Telegram bridge needs restart but nemoclaw stop does not work | Known bug in nemoclaw stop | Find the PID from the nemoclaw start output, force-kill with kill -9 <PID>, then run nemoclaw start again. |
| Telegram bot receives messages but does not reply | Telegram policy not added to sandbox | Run nemoclaw my-assistant policy-add, type telegram, hit Y. Then restart the bridge with nemoclaw start. |
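
Several rows in the table above come down to a quick host-side check. The following is a minimal preflight sketch, not part of NemoClaw: the function names (`node_major`, `node_ok`, `in_docker_group`, `report`) are hypothetical helpers, while the commands they wrap (`node --version`, `lsof -i`, `curl http://localhost:11434`) are the ones named in the table. It only reports; it does not change anything on the system.

```shell
#!/bin/sh
# Preflight sketch: checks a few causes listed in the troubleshooting table.
# All function names here are hypothetical helpers, not NemoClaw commands.

# Extract the major version from a string such as "v22.3.0".
node_major() {
  local v="${1#v}"      # drop the leading "v"
  echo "${v%%.*}"       # keep everything before the first dot
}

# Succeed if the given version string meets the Node.js 20+ requirement.
node_ok() {
  [ "$(node_major "$1")" -ge 20 ] 2>/dev/null
}

# Succeed if "docker" appears in a space-separated list of group names.
in_docker_group() {
  case " $1 " in
    *" docker "*) return 0 ;;
    *) return 1 ;;
  esac
}

report() { printf '%-5s %s\n' "$1" "$2"; }

# Node.js version (row: "Installer fails with Node.js version error").
if command -v node >/dev/null 2>&1; then
  if node_ok "$(node --version)"; then
    report OK "Node.js $(node --version)"
  else
    report FIX "Node.js $(node --version) is below 20"
  fi
else
  report FIX "node not found on PATH"
fi

# Docker group membership (row: "Docker permission denied").
if in_docker_group "$(id -nG)"; then
  report OK "user is in the docker group"
else
  report FIX "user is not in the docker group"
fi

# Ports used by the gateway and the web UI. Without sudo, lsof may not
# show processes owned by other users, so "free" is only a hint.
for port in 8080 18789; do
  if ! command -v lsof >/dev/null 2>&1; then
    report SKIP "lsof not installed; cannot check port $port"
  elif lsof -i ":$port" >/dev/null 2>&1; then
    report BUSY "port $port is in use"
  else
    report OK "port $port looks free"
  fi
done

# Ollama reachability (row: "Inference timeout or hangs").
if curl -fsS --max-time 3 http://localhost:11434 >/dev/null 2>&1; then
  report OK "Ollama answers on localhost:11434"
else
  report FIX "Ollama is not reachable on localhost:11434"
fi
```

Each `FIX` or `BUSY` line corresponds to a row in the table, where the actual remedy is listed.
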
NOTE
DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. Because many applications are still being updated to take advantage of UMA, you may encounter memory issues even when usage is within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
For the latest known issues, please review the DGX Spark User Guide.
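
To see whether the cache flush in the note above has an effect, it can help to compare memory figures before and after running it. The sketch below reads the standard Linux `/proc/meminfo` fields `MemAvailable` and `Cached`. The helper names `meminfo_kb` and `kb_to_gib` are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Sketch: print available and cached memory so the effect of dropping
# caches can be compared before and after. Helper names are hypothetical.

# Read one field (in kB) from a meminfo-style file; defaults to /proc/meminfo.
meminfo_kb() {
  awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Convert kB to whole GiB, rounding down; an empty argument counts as 0.
kb_to_gib() {
  echo $(( ${1:-0} / 1024 / 1024 ))
}

if [ -r /proc/meminfo ]; then
  echo "MemAvailable: $(kb_to_gib "$(meminfo_kb MemAvailable)") GiB"
  echo "Cached:       $(kb_to_gib "$(meminfo_kb Cached)") GiB"
fi

# To flush, run the command from the note, then run this script again:
#   sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```

If `Cached` drops and `MemAvailable` rises after the flush, the buffer cache was holding the memory.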