Run NemoClaw with a Local LLM

Symptom	Cause	Fix
`nemoclaw: command not found` after install	Shell PATH not updated	Run `source ~/.bashrc` (or `source ~/.zshrc` for zsh), or open a new terminal window.
Installer fails with Node.js version error	Node.js version below 22.16	Install Node.js 22.16+: `curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash - && sudo apt-get install -y nodejs`, then re-run the installer.
npm install fails with `EACCES` permission error	npm global directory not writable	`mkdir -p ~/.npm-global && npm config set prefix ~/.npm-global && export PATH=~/.npm-global/bin:$PATH` then re-run the installer. Add the `export` line to `~/.bashrc` to make it permanent.
Docker permission denied	User not in docker group	`sudo usermod -aG docker $USER`, then log out and back in.
Gateway fails with cgroup / "Failed to start ContainerManager" errors	An older OpenShell build or Docker still uses a private cgroup namespace for the gateway, so kubelet cannot see the cgroup v2 controllers	First upgrade OpenShell (re-run the Phase 1 `nemoclaw.sh` install so you get a build that sets host cgroupns on the gateway container). If it still fails, force Docker's default to host mode by running the daemon.json cgroup fix below, then run `sudo systemctl restart docker`.
Gateway fails with "port 8080 is held by container..."	Another OpenShell gateway or container is using port 8080	Run `nemoclaw onboard` (or `nemoclaw onboard --resume`) again. NemoClaw probes the existing managed gateway, reuses it if healthy, and recreates stale gateway state when it can do so safely. See the NemoClaw commands documentation.
Sandbox creation fails	Stale gateway state or DNS not propagated	Run `nemoclaw onboard` (or `nemoclaw onboard --resume`) again. NemoClaw probes the existing managed gateway, reuses it if healthy, and recreates stale gateway state when it can do so safely. See the NemoClaw commands documentation.
CoreDNS crash loop	Known issue on some DGX Spark configurations	Re-run the NemoClaw installer (`curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash`), which includes the CoreDNS fix. If the issue persists, see NemoClaw troubleshooting.
"No GPU detected" during onboard	DGX Spark GB10 reports unified memory differently	Expected on DGX Spark. The wizard still works and uses vLLM for inference.
Inference timeout or hangs	vLLM not running or not reachable	Check the vLLM server: `curl http://127.0.0.1:8000/v1/models` should list `nvidia/Qwen3.6-35B-A3B-NVFP4`. If it hangs, the model may still be loading — wait for `Application startup complete`. Then check `nemoclaw my-assistant status` for the Inference health line.
Agent gives no response or is very slow	First response can be slow, especially with larger models	Response time depends on model size (30B: a few seconds, 120B: 30–90 seconds). Verify inference route: `nemoclaw my-assistant status`.
Port 18789 already in use	Another process is bound to the port	`lsof -i :18789` then `kill <PID>`. If needed, `kill -9 <PID>` to force-terminate.
Web UI port forward dies or dashboard unreachable	Port forward not active	`openshell forward stop 18789 my-assistant` then `openshell forward start 18789 my-assistant --background`.
Web UI shows `origin not allowed`	Accessing via `localhost` instead of `127.0.0.1`	Use `http://127.0.0.1:18789/#token=...` in the browser. The gateway origin check requires `127.0.0.1` exactly.
Telegram bridge does not start	Telegram channel is not configured, the sandbox gateway is unhealthy, or Telegram startup/config failed	Run `nemoclaw <name> status` and `nemoclaw <name> logs` to confirm the failure. If the sandbox gateway is unhealthy, run `nemoclaw <name> recover`. If Telegram is not configured, rerun `nemoclaw onboard` and enable Telegram during onboarding. See the NemoClaw Troubleshooting guide.
Telegram stops responding after sandbox rebuild	Duplicate bot-token consumer, missing DM allowlist, BotFather group privacy mode, inference failure, policy denial, or rebuilt channel config issue	Run `nemoclaw <name> status` and `nemoclaw <name> logs`. Look for Telegram 409 Conflict, allowlist warnings, privacy-mode issues, inference errors, or policy denials. If configuration needs to change, rerun `nemoclaw onboard`. See the NemoClaw Troubleshooting guide.
Telegram bot receives messages but does not reply	Inbound Telegram delivery works, but the agent turn, inference call, policy check, allowlist/mention gate, or outbound reply failed	Run `nemoclaw <name> status` and `nemoclaw <name> logs`. Check for inbound Telegram update, outbound send, inference, and policy-denial messages. Fix the logged cause; run `nemoclaw <name> recover` only if the sandbox gateway is unhealthy. For Telegram configuration changes, rerun `nemoclaw onboard`. See the NemoClaw Troubleshooting guide.
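
The inference-timeout and port-conflict rows above can be checked from a script as well as by hand. The following is a minimal sketch, not part of NemoClaw: it assumes the vLLM server exposes the OpenAI-style `/v1/models` response (a JSON object with a `data` list of entries that carry an `id`), and it uses the URL, model name, and port quoted in the table. The function names are hypothetical.

```python
import json
import socket
import urllib.request

# Values taken from the troubleshooting table above.
VLLM_MODELS_URL = "http://127.0.0.1:8000/v1/models"
EXPECTED_MODEL = "nvidia/Qwen3.6-35B-A3B-NVFP4"
DASHBOARD_PORT = 18789


def model_ids(payload):
    """Extract model IDs from an OpenAI-style /v1/models response body.

    Assumption: the body looks like {"data": [{"id": "..."}, ...]}.
    Anything else yields an empty list.
    """
    if not isinstance(payload, dict):
        return []
    entries = payload.get("data", [])
    if not isinstance(entries, list):
        return []
    return [m["id"] for m in entries if isinstance(m, dict) and "id" in m]


def fetch_model_ids(url=VLLM_MODELS_URL, timeout=5.0):
    """Query the models endpoint and return the listed model IDs."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return model_ids(json.load(resp))


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    try:
        ids = fetch_model_ids()
    except (OSError, ValueError) as e:
        # Connection refused, timeout, or a non-JSON body all land here.
        print(f"vLLM not reachable at {VLLM_MODELS_URL}: {e}")
    else:
        state = "listed" if EXPECTED_MODEL in ids else "NOT listed"
        print(f"{EXPECTED_MODEL} is {state}; server reports {ids}")
    print(f"port {DASHBOARD_PORT} in use: {port_in_use(DASHBOARD_PORT)}")
```

The port check only detects a listener reachable on `127.0.0.1`; `lsof -i :18789` remains the way to find the owning process.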

daemon.json cgroup fix

Use this script as the fallback for the cgroup / "Failed to start ContainerManager" row above. It validates any existing `/etc/docker/daemon.json`, writes a `.bak` backup, sets `default-cgroupns-mode` to `host`, and atomically replaces the file. If anything fails, it exits non-zero with an error on stderr and leaves the original `daemon.json` untouched.

sudo python3 - <<'PY'
import json, os, shutil, sys, tempfile

path = '/etc/docker/daemon.json'

# Load and validate the existing config; start from an empty object if the file is absent.
try:
    if os.path.exists(path):
        with open(path) as f:
            data = json.load(f)
        if not isinstance(data, dict):
            raise ValueError(f'{path} is not a JSON object')
    else:
        data = {}
except (json.JSONDecodeError, ValueError, OSError) as e:
    print(f'error: failed to read {path}: {e}', file=sys.stderr)
    sys.exit(1)

# Keep a .bak copy of the current file before changing anything.
if os.path.exists(path):
    try:
        shutil.copy2(path, path + '.bak')
    except OSError as e:
        print(f'error: failed to back up {path}: {e}', file=sys.stderr)
        sys.exit(1)

# Make new containers default to the host cgroup namespace.
data['default-cgroupns-mode'] = 'host'

# Write to a temp file in the same directory, then rename it over the
# original so the replacement is atomic.
target_dir = os.path.dirname(path) or '/'
fd, tmp = tempfile.mkstemp(prefix='daemon.json.', dir=target_dir)
try:
    with os.fdopen(fd, 'w') as f:
        json.dump(data, f, indent=2)
        f.write('\n')
    os.chmod(tmp, 0o644)
    os.replace(tmp, path)
except OSError as e:
    # Remove the temp file; the original daemon.json has not been replaced.
    if os.path.exists(tmp):
        try:
            os.unlink(tmp)
        except OSError:
            pass
    print(f'error: failed to write {path}: {e}', file=sys.stderr)
    sys.exit(1)
PY
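
After the script above finishes, it can be useful to confirm what ended up in the file before restarting Docker. The sketch below is one way to do that, under the assumption that `/etc/docker/daemon.json` is readable by the current user. The helper name `cgroupns_mode` is hypothetical, and the check only inspects the file, not the running daemon.

```python
import json

DAEMON_JSON = "/etc/docker/daemon.json"  # same path the fix script edits


def cgroupns_mode(text):
    """Return the default-cgroupns-mode value from daemon.json text.

    Returns None when the key is absent. Raises ValueError when the text
    is not valid JSON or the top level is not a JSON object.
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("daemon.json is not a JSON object")
    return data.get("default-cgroupns-mode")


if __name__ == "__main__":
    try:
        with open(DAEMON_JSON) as f:
            mode = cgroupns_mode(f.read())
    except (OSError, ValueError) as e:
        print(f"could not check {DAEMON_JSON}: {e}")
    else:
        if mode == "host":
            print("default-cgroupns-mode is host; restart Docker to apply it")
        else:
            print(f"default-cgroupns-mode is {mode!r}; the fix has not been applied")
```

If the value is `host` and the gateway still fails, the daemon may not have been restarted since the file changed; the table row above calls for `sudo systemctl restart docker` after the fix.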

NOTE

DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. Because many applications have not yet been updated to take advantage of UMA, you may encounter memory issues even when usage is within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'

For the latest known issues, please review the DGX Spark User Guide.
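
Before flushing the buffer cache as described in the note above, it may help to see how much memory the cache currently holds. The following is a rough sketch that parses `/proc/meminfo`. The helper names are hypothetical, and treating `Buffers` plus `Cached` as the cache size is an approximation, since not all of it is necessarily released by `drop_caches`.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: integer value}.

    Size fields are reported in kB; a few fields (for example the
    HugePages counters) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def cache_kb(info):
    """Approximate cache size as Buffers + Cached, in kB."""
    return info.get("Buffers", 0) + info.get("Cached", 0)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError as e:
        print(f"could not read /proc/meminfo: {e}")
    else:
        gib = 1024 * 1024  # kB per GiB
        print(f"MemAvailable: {info.get('MemAvailable', 0) / gib:.1f} GiB")
        print(f"Buffers + Cached: {cache_kb(info) / gib:.1f} GiB")
```

Running it before and after the `drop_caches` command gives a quick view of how much memory the flush returned.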
