NemoClaw with Nemotron-3-Super and Telegram on DGX Spark

30 MINS

Install NemoClaw on DGX Spark with local Ollama inference and Telegram bot integration

| Symptom | Cause | Fix |
| --- | --- | --- |
| `nemoclaw: command not found` after install | Shell PATH not updated | Run `source ~/.bashrc` (or `source ~/.zshrc` for zsh), or open a new terminal window. |
| Installer fails with Node.js version error | Node.js version below 20 | Install Node.js 20+: `curl -fsSL https://deb.nodesource.com/setup_22.x \| sudo -E bash - && sudo apt-get install -y nodejs`, then re-run the installer. |
| npm install fails with `EACCES` permission error | npm global directory not writable | `mkdir -p ~/.npm-global && npm config set prefix ~/.npm-global && export PATH=~/.npm-global/bin:$PATH`, then re-run the installer. Add the `export` line to `~/.bashrc` to make it permanent. |
| Docker permission denied | User not in docker group | `sudo usermod -aG docker $USER`, then log out and back in. |
| Gateway fails with cgroup / "Failed to start ContainerManager" errors | Docker not configured for host cgroup namespace on DGX Spark | Run the cgroup fix: `sudo python3 -c "import json, os; path='/etc/docker/daemon.json'; d=json.load(open(path)) if os.path.exists(path) else {}; d['default-cgroupns-mode']='host'; json.dump(d, open(path,'w'), indent=2)"`, then `sudo systemctl restart docker`. Alternatively, run `sudo nemoclaw setup-spark`, which applies this fix automatically. |
| Gateway fails with "port 8080 is held by container..." | Another OpenShell gateway or container is using port 8080 | Stop the conflicting container: `openshell gateway destroy -g <old-gateway-name>` or `docker stop <container-name> && docker rm <container-name>`, then retry `nemoclaw onboard`. |
| Sandbox creation fails | Stale gateway state or DNS not propagated | Run `openshell gateway destroy && openshell gateway start`, then re-run the installer or `nemoclaw onboard`. |
| CoreDNS crash loop | Known issue on some DGX Spark configurations | Run `sudo ./scripts/fix-coredns.sh` from the NemoClaw repo directory. |
| "No GPU detected" during onboard | DGX Spark GB10 reports unified memory differently | Expected on DGX Spark. The wizard still works and uses Ollama for inference. |
| Inference timeout or hangs | Ollama not running or not reachable | Check Ollama: `curl http://localhost:11434`. If not running: `ollama serve &`. If running but unreachable from the sandbox, ensure Ollama is configured to listen on `0.0.0.0` (see Step 2 in Instructions). |
| Agent gives no response or is very slow | Normal for 120B model running locally | Nemotron 3 Super 120B can take 30 to 90 seconds per response. Verify the inference route: `nemoclaw my-assistant status`. |
| Port 18789 already in use | Another process is bound to the port | `lsof -i :18789`, then `kill <PID>`. If needed, `kill -9 <PID>` to force-terminate. |
| Web UI port forward dies or dashboard unreachable | Port forward not active | `openshell forward stop 18789 my-assistant`, then `openshell forward start 18789 my-assistant --background`. |
| Web UI shows `origin not allowed` | Accessing via `localhost` instead of `127.0.0.1` | Use `http://127.0.0.1:18789/#token=...` in the browser. The gateway origin check requires `127.0.0.1` exactly. |
| Telegram bridge does not start | Missing environment variables | Ensure `TELEGRAM_BOT_TOKEN` and `SANDBOX_NAME` are set on the host. `SANDBOX_NAME` must match the sandbox name from onboarding. |
| Telegram bridge needs restart but `nemoclaw stop` does not work | Known bug in `nemoclaw stop` | Find the PID from the `nemoclaw start` output, force-kill with `kill -9 <PID>`, then run `nemoclaw start` again. |
| Telegram bot receives messages but does not reply | Telegram policy not added to sandbox | Run `nemoclaw my-assistant policy-add`, type `telegram`, and confirm with `Y`. Then restart the bridge with `nemoclaw start`. |
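
The cgroup fix above packs a read-modify-write of `/etc/docker/daemon.json` into a single `python3 -c` command, which is hard to review before running it with `sudo`. The sketch below spells out the same steps as a small function. The name `set_host_cgroupns` is hypothetical and not part of NemoClaw; the default path and the `default-cgroupns-mode` key are taken from the one-liner.

```python
import json
import os


def set_host_cgroupns(path="/etc/docker/daemon.json"):
    """Merge "default-cgroupns-mode": "host" into a Docker daemon config.

    Hypothetical helper mirroring the one-liner: existing keys are kept,
    and the file is created if it does not exist yet.
    Returns the resulting configuration as a dict.
    """
    config = {}
    if os.path.exists(path):
        # Load the current settings so they are preserved in the rewrite.
        with open(path) as f:
            config = json.load(f)

    config["default-cgroupns-mode"] = "host"

    # Write the merged configuration back, indented for readability.
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return config
```

Calling `set_host_cgroupns()` with the default path requires root, and Docker still needs `sudo systemctl restart docker` afterwards, as in the fix above. Passing a different path makes it possible to try the merge on a copy of the file first.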

NOTE

DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. Because many applications have not yet been updated to take advantage of UMA, you may encounter memory issues even when usage is within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
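
Before flushing, it can help to see how much memory the kernel is currently holding as buffers and page cache. The following is a minimal sketch, assuming the standard Linux `/proc/meminfo` fields `Buffers`, `Cached`, and `MemAvailable`. The function names are hypothetical and are not part of any DGX Spark tooling.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer.

    Most values are reported in kB; a few fields (for example
    HugePages_Total) are plain counts with no unit.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_summary_gib(fields):
    """Return buffer/page-cache and available memory in GiB."""
    kib_per_gib = 1024 * 1024
    cached_kib = fields.get("Buffers", 0) + fields.get("Cached", 0)
    return {
        "buffers_and_cache_gib": cached_kib / kib_per_gib,
        "available_gib": fields.get("MemAvailable", 0) / kib_per_gib,
    }


# Usage on a Linux host:
# with open("/proc/meminfo") as f:
#     print(cache_summary_gib(parse_meminfo(f.read())))
```

Running the summary before and after the flush command shows how much memory the flush actually released.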

For the latest known issues, please review the DGX Spark User Guide.