Install NemoClaw on DGX Station with local vLLM inference and Telegram bot integration
| Symptom | Cause | Fix |
|---|---|---|
| openclaw agent --local fails or is blocked inside the sandbox | --local bypasses the NemoClaw gateway and is disallowed in the OpenShell sandbox | Use gateway mode: openclaw agent --agent main -m "hello" --session-id test (no --local). |
| Onboard fails with “K8s namespace not ready” (or similar) with no clear reason | Often low disk space on / or Docker’s data root; image push / k3s need headroom | Run df -h / /var/lib/docker. Free at least ~40 GB (see NemoClaw quickstart prerequisites); prune Docker (docker system prune) or expand disk, then retry onboard. |
| vLLM warns about mixed devices or loads on an unexpected GPU | Multiple GPUs visible; default visibility does not match intent | Pin one GPU: --gpus '"device=0"' and -e CUDA_VISIBLE_DEVICES=0 with --tensor-parallel-size 1, or use two GPUs explicitly with --tensor-parallel-size 2 and -e CUDA_VISIBLE_DEVICES=0,1 (see Step 3 in instructions). |
| nemoclaw: command not found after install | Shell PATH not updated | Run source ~/.bashrc (or source ~/.zshrc for zsh), or open a new terminal window. |
| pip: command not found | pip not installed on DGX Station by default | Install pip: sudo apt install -y python3-pip. Then use pip3 install --break-system-packages huggingface-hub. |
| huggingface-cli is deprecated | Hugging Face CLI was renamed | Use hf download instead of huggingface-cli download. |
| vLLM container won't start or crashes | GPU memory issue or wrong image | Check logs: docker logs vllm-nemotron. If CUDA OOM, reduce context: recreate the container with --max-model-len 8192. Ensure you are using the NVIDIA container image (nvcr.io/nvidia/vllm:26.03-py3), not the community vllm/vllm-openai image. |
| vLLM logs show Application startup complete. but curl times out | vLLM still compiling CUDA graphs after startup | Wait 1–2 minutes after Application startup complete. before sending requests. The first request compiles CUDA graphs and may take 30–90 seconds. |
| NemoClaw onboard fails with "endpoint validation failed" | vLLM model not warmed up or validation timeout too short | Warm up the model first: curl -s --max-time 120 http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4","messages":[{"role":"user","content":"hello"}],"max_tokens":10}'. Then re-run with NEMOCLAW_EXPERIMENTAL=1 NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300 nemoclaw onboard. A combined readiness check is sketched after this table. |
| NemoClaw reports "provider 'vllm' is not available" | Missing experimental flag | Set NEMOCLAW_EXPERIMENTAL=1 before running the installer or nemoclaw onboard. The vLLM provider is currently an experimental feature. |
| Docker permission denied | User not in docker group | sudo usermod -aG docker $USER, then log out and back in. |
| Gateway fails with cgroup / "Failed to start ContainerManager" errors | Docker not configured for host cgroup namespace on DGX Station | Run the cgroup fix: sudo python3 -c "import json, os; path='/etc/docker/daemon.json'; d=json.load(open(path)) if os.path.exists(path) else {}; d['default-cgroupns-mode']='host'; json.dump(d, open(path,'w'), indent=2)" then sudo systemctl restart docker. A more readable version of this one-liner appears after this table. |
| Gateway fails with "port 8080 is held by container..." | Another OpenShell gateway or container is using port 8080 | Stop the conflicting container: openshell gateway destroy -g <old-gateway-name> or docker stop <container-name> && docker rm <container-name>, then retry nemoclaw onboard. |
| Sandbox cannot reach the inference server | Using localhost instead of host.openshell.internal in endpoint URL | Inside the sandbox, localhost refers to the sandbox container, not the host. The onboard wizard configures host.openshell.internal automatically. Verify from inside the sandbox: curl -sf https://inference.local/v1/models. If this fails, check that vLLM is reachable from the host: curl -s http://localhost:8000/v1/models. |
| Agent gives no response or is very slow | Normal for a 120B model running locally | Nemotron 3 Super 120B can take 30–90 seconds per response. Verify the inference route: nemoclaw my-assistant status. |
| vLLM API returns empty or errors on tool calls | Missing tool-call flags | Verify that --enable-auto-tool-choice and --tool-call-parser qwen3_xml are set: docker inspect vllm-nemotron --format '{{.Config.Cmd}}'. |
| Port 18789 already in use | Another process is bound to the port | lsof -i :18789 then kill <PID>. If needed, kill -9 <PID> to force-terminate. |
| Web UI port forward dies or dashboard unreachable | Port forward not active | openshell forward stop 18789 my-assistant then openshell forward start 18789 my-assistant --background. Always pass port and sandbox name to openshell forward stop. |
| Web UI shows origin not allowed | Browser origin does not match what the gateway expects | On the DGX Station local desktop, open http://127.0.0.1:18789/#token=... (not localhost). When connecting through an SSH tunnel from another machine, either localhost or 127.0.0.1 usually works in the client browser, because the check applies to how you reach the forwarded port locally. |
| Telegram does not work after install; nemoclaw start does nothing for Telegram | nemoclaw start starts optional host services (e.g. cloudflared), not the Telegram bridge | Configure Telegram during onboard, or on the host run nemoclaw my-assistant channels add telegram (and rebuild) after running policy-add for the telegram preset. See Set up Telegram bridge. |
| Telegram bot receives messages but does not reply | Telegram policy not added to sandbox | Run nemoclaw my-assistant policy-add, type telegram, hit Y. Ensure the channel was added with nemoclaw my-assistant channels add telegram so the image includes Telegram. |
| docker: Error response from daemon: Conflict. The container name "/vllm-nemotron" is already in use | Previous cleanup used docker stop only | docker rm -f vllm-nemotron (or docker update --restart=no then docker stop and docker rm). The playbook uses --restart unless-stopped; stopping alone leaves a restart policy and reserved name. |
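
The warm-up and connectivity fixes above can be run as one host-side check. The following is a minimal sketch that assumes the defaults used in this guide (container vllm-nemotron, vLLM on http://localhost:8000, model nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4); adjust the names if your setup differs.

```bash
#!/usr/bin/env bash
# Sketch: verify local vLLM is up and warm it before re-running onboard.
# Assumes the defaults from this guide (container name, port, model ID).
set -eu

MODEL="nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4"
BASE="http://localhost:8000"

# 1. Container running and past "Application startup complete."?
docker ps --filter name=vllm-nemotron --format '{{.Names}}: {{.Status}}'
docker logs vllm-nemotron 2>&1 | grep -q "Application startup complete" || {
  echo "vLLM is still starting; watch: docker logs -f vllm-nemotron"; exit 1; }

# 2. OpenAI-compatible API reachable from the host?
curl -sf "$BASE/v1/models" > /dev/null || {
  echo "vLLM API not reachable at $BASE"; exit 1; }

# 3. Warm-up request: the first completion compiles CUDA graphs (30-90 s).
curl -s --max-time 120 "$BASE/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"$MODEL\",\"messages\":[{\"role\":\"user\",\"content\":\"hello\"}],\"max_tokens\":10}"
echo
echo "Warmed up. Re-run: NEMOCLAW_EXPERIMENTAL=1 NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300 nemoclaw onboard"
```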
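
The cgroup fix in the table is a dense one-liner; here is the same change written out so it is easier to audit. It sets default-cgroupns-mode to host in /etc/docker/daemon.json (creating the file if it does not exist) and restarts Docker.

```bash
# Same effect as the one-liner in the table: set Docker's default cgroup
# namespace mode to "host" on the DGX Station, then restart Docker.
sudo python3 - <<'EOF'
import json, os

path = "/etc/docker/daemon.json"
config = json.load(open(path)) if os.path.exists(path) else {}
config["default-cgroupns-mode"] = "host"
with open(path, "w") as f:
    json.dump(config, f, indent=2)
EOF
sudo systemctl restart docker
```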
Model variant guidance:
| Variant | Size | VRAM Required | When to Use |
|---|---|---|---|
| NVFP4 | ~60 GB | ~80 GB | Default for DGX Station (GB300). Fits on a single GPU with room for a large KV cache. |
| FP8 | ~120 GB | ~140 GB | Higher accuracy, still fits on GB300. Add --kv-cache-dtype fp8 to the vLLM command. |
| BF16 | ~240 GB | ~260 GB | Highest accuracy. Fits on GB300 but leaves little room for KV cache. Reduce --max-model-len. |
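
Before choosing a variant, it can help to confirm how much GPU memory is actually free. The sketch below uses nvidia-smi; the thresholds simply restate the VRAM column above in MiB and are approximate.

```bash
#!/usr/bin/env bash
# Rough check: compare free VRAM on the least-free GPU against the table above.
free_mib=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits | sort -n | head -1)
echo "Free VRAM on the least-free GPU: ${free_mib} MiB"
if   [ "$free_mib" -ge 266240 ]; then echo "BF16 (~260 GB) may fit; reduce --max-model-len."
elif [ "$free_mib" -ge 143360 ]; then echo "FP8 (~140 GB) fits; add --kv-cache-dtype fp8."
elif [ "$free_mib" -ge 81920 ]; then echo "NVFP4 (~80 GB) fits; default for DGX Station."
else echo "Under ~80 GB free; free GPU memory before starting vLLM."
fi
```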
For the latest known issues, see the DGX Station documentation.