Install NemoClaw on DGX Station with local vLLM inference and Telegram bot integration
| Symptom | Cause | Fix |
|---|---|---|
| openclaw agent --local fails or is blocked inside the sandbox | --local bypasses the NemoClaw gateway and is disallowed in the OpenShell sandbox | Use gateway mode: openclaw agent --agent main -m "hello" --session-id test (no --local). |
| Onboard fails with “K8s namespace not ready” (or similar) with no clear reason | Often low disk space on / or Docker’s data root; image push / k3s need headroom | Run df -h / /var/lib/docker. Free at least ~40 GB (see NemoClaw quickstart prerequisites); prune Docker (docker system prune) or expand disk, then retry onboard. |
| vLLM warns about mixed devices or loads on an unexpected GPU | Multiple GPUs visible; default visibility does not match intent | Pin one GPU: --gpus '"device=0"' and -e CUDA_VISIBLE_DEVICES=0 with --tensor-parallel-size 1, or use two GPUs explicitly with --tensor-parallel-size 2 and -e CUDA_VISIBLE_DEVICES=0,1 (see Step 3 in instructions). |
| nemoclaw: command not found after install | Shell PATH not updated | Run source ~/.bashrc (or source ~/.zshrc for zsh), or open a new terminal window. |
| pip: command not found | pip not installed on DGX Station by default | Install pip: sudo apt install -y python3-pip. Then use pip3 install --break-system-packages huggingface-hub. |
| huggingface-cli is deprecated | Hugging Face CLI was renamed | Use hf download instead of huggingface-cli download. |
| vLLM container won't start or crashes | GPU memory issue or wrong image | Check logs: docker logs vllm-nemotron. If CUDA OOM, reduce context: recreate the container with --max-model-len 8192. Ensure you are using the NVIDIA container image (nvcr.io/nvidia/vllm:26.03-py3), not the community vllm/vllm-openai image. |
| vLLM logs show Application startup complete. but curl times out | vLLM still compiling CUDA graphs after startup | Wait 1–2 minutes after Application startup complete. before sending requests. The first request compiles CUDA graphs and may take 30–90 seconds. |
| NemoClaw onboard fails with "endpoint validation failed" | vLLM model not warmed up or validation timeout too short | Warm up the model first: curl -s --max-time 120 http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4","messages":[{"role":"user","content":"hello"}],"max_tokens":10}'. Then re-run with NEMOCLAW_EXPERIMENTAL=1 NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300 nemoclaw onboard. A combined readiness check is sketched after this table. |
| NemoClaw reports "provider 'vllm' is not available" | Missing experimental flag | Set NEMOCLAW_EXPERIMENTAL=1 before running the installer or nemoclaw onboard. The vLLM provider is currently an experimental feature. |
| Docker permission denied | User not in docker group | sudo usermod -aG docker $USER, then log out and back in. |
| Gateway fails with cgroup / "Failed to start ContainerManager" errors | Docker not configured for host cgroup namespace on DGX Station | Run the cgroup fix: sudo python3 -c "import json, os; path='/etc/docker/daemon.json'; d=json.load(open(path)) if os.path.exists(path) else {}; d['default-cgroupns-mode']='host'; json.dump(d, open(path,'w'), indent=2)" then sudo systemctl restart docker. A more readable version of this one-liner appears after this table. |
| Gateway fails with "port 8080 is held by container..." | Another OpenShell gateway or container is using port 8080 | Stop the conflicting container: openshell gateway destroy -g <old-gateway-name> or docker stop <container-name> && docker rm <container-name>, then retry nemoclaw onboard. |
| Sandbox cannot reach the inference server | Using localhost instead of host.openshell.internal in endpoint URL | Inside the sandbox, localhost refers to the sandbox container, not the host. The onboard wizard configures host.openshell.internal automatically. Verify from inside the sandbox: curl -sf https://inference.local/v1/models. If this fails, check that vLLM is reachable from the host: curl -s http://localhost:8000/v1/models. |
| Agent gives no response or is very slow | Normal for a 120B model running locally | Nemotron 3 Super 120B can take 30–90 seconds per response. Verify the inference route: nemoclaw my-assistant status. |
| vLLM API returns empty or errors on tool calls | Missing tool-call flags | Verify that --enable-auto-tool-choice and --tool-call-parser qwen3_xml are set: docker inspect vllm-nemotron --format '{{.Config.Cmd}}'. |
| Port 18789 already in use | Another process is bound to the port | lsof -i :18789 then kill <PID>. If needed, kill -9 <PID> to force-terminate. |
| Web UI port forward dies or dashboard unreachable | Port forward not active | openshell forward stop 18789 my-assistant then openshell forward start 18789 my-assistant --background. Always pass port and sandbox name to openshell forward stop. |
| Web UI shows origin not allowed | Browser origin does not match what the gateway expects | On the DGX Station local desktop, open http://127.0.0.1:18789/#token=... (not localhost). When connecting through an SSH tunnel from another machine, either localhost or 127.0.0.1 usually works in the client browser, because the check applies to how you reach the forwarded port locally. |
| Telegram does not work after install; nemoclaw start does nothing for Telegram | nemoclaw start starts optional host services (e.g. cloudflared), not the Telegram bridge | Configure Telegram during onboard, or on the host run nemoclaw my-assistant channels add telegram (and rebuild) after running policy-add for the telegram preset. See Set up Telegram bridge. |
| Telegram bot receives messages but does not reply | Telegram policy not added to sandbox | Run nemoclaw my-assistant policy-add, type telegram, hit Y. Ensure the channel was added with nemoclaw my-assistant channels add telegram so the image includes Telegram. |
| docker: Error response from daemon: Conflict. The container name "/vllm-nemotron" is already in use | Previous cleanup used docker stop only | docker rm -f vllm-nemotron (or docker update --restart=no then docker stop and docker rm). The playbook uses --restart unless-stopped; stopping alone leaves a restart policy and reserved name. |
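
The warm-up and connectivity fixes above can be run as one host-side check. The following is a minimal sketch that assumes the defaults used in this guide (container vllm-nemotron, vLLM on http://localhost:8000, model nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4); adjust the names if your setup differs.

```bash
#!/usr/bin/env bash
# Sketch: verify local vLLM is up and warm it before re-running onboard.
# Assumes the defaults from this guide (container name, port, model ID).
set -eu

MODEL="nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4"
BASE="http://localhost:8000"

# 1. Container running and past "Application startup complete."?
docker ps --filter name=vllm-nemotron --format '{{.Names}}: {{.Status}}'
docker logs vllm-nemotron 2>&1 | grep -q "Application startup complete" || {
  echo "vLLM is still starting; watch: docker logs -f vllm-nemotron"; exit 1; }

# 2. OpenAI-compatible API reachable from the host?
curl -sf "$BASE/v1/models" > /dev/null || {
  echo "vLLM API not reachable at $BASE"; exit 1; }

# 3. Warm-up request: the first completion compiles CUDA graphs (30-90 s).
curl -s --max-time 120 "$BASE/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"$MODEL\",\"messages\":[{\"role\":\"user\",\"content\":\"hello\"}],\"max_tokens\":10}"
echo
echo "Warmed up. Re-run: NEMOCLAW_EXPERIMENTAL=1 NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300 nemoclaw onboard"
```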
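
The cgroup fix in the table is a dense one-liner; here is the same change written out so it is easier to audit. It sets default-cgroupns-mode to host in /etc/docker/daemon.json (creating the file if it does not exist) and restarts Docker.

```bash
# Same effect as the one-liner in the table: set Docker's default cgroup
# namespace mode to "host" on the DGX Station, then restart Docker.
sudo python3 - <<'EOF'
import json, os

path = "/etc/docker/daemon.json"
config = json.load(open(path)) if os.path.exists(path) else {}
config["default-cgroupns-mode"] = "host"
with open(path, "w") as f:
    json.dump(config, f, indent=2)
EOF
sudo systemctl restart docker
```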
Model variant guidance:
| Variant | Size | VRAM Required | When to Use |
|---|---|---|---|
| NVFP4 | ~60 GB | ~80 GB | Default for DGX Station (GB300). Fits on a single GPU with room for a large KV cache. |
| FP8 | ~120 GB | ~140 GB | Higher accuracy, still fits on GB300. Add --kv-cache-dtype fp8 to the vLLM command. |
| BF16 | ~240 GB | ~260 GB | Highest accuracy. Fits on GB300 but leaves little room for KV cache. Reduce --max-model-len. |
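
Before choosing a variant, it can help to confirm how much GPU memory is actually free. The sketch below uses nvidia-smi; the thresholds simply restate the VRAM column above in MiB and are approximate.

```bash
#!/usr/bin/env bash
# Rough check: compare free VRAM on the least-free GPU against the table above.
free_mib=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits | sort -n | head -1)
echo "Free VRAM on the least-free GPU: ${free_mib} MiB"
if   [ "$free_mib" -ge 266240 ]; then echo "BF16 (~260 GB) may fit; reduce --max-model-len."
elif [ "$free_mib" -ge 143360 ]; then echo "FP8 (~140 GB) fits; add --kv-cache-dtype fp8."
elif [ "$free_mib" -ge 81920 ]; then echo "NVFP4 (~80 GB) fits; default for DGX Station."
else echo "Under ~80 GB free; free GPU memory before starting vLLM."
fi
```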
For the latest known issues, see the DGX Station documentation.