NemoClaw with Nemotron-3-Super and vLLM on DGX Station

30 MINS

Install NemoClaw on DGX Station with local vLLM inference and Telegram bot integration

AI Agent, DGX, DGX Station, GB300, NemoClaw, Nemotron-3-Super, OpenShell, Telegram, vLLM
NemoClaw on GitHub
Troubleshooting
| Symptom | Cause | Fix |
| --- | --- | --- |
| openclaw agent --local fails or is blocked inside the sandbox | --local bypasses the NemoClaw gateway and is disallowed in the OpenShell sandbox | Use gateway mode: openclaw agent --agent main -m "hello" --session-id test (no --local). |
| Onboard fails with "K8s namespace not ready" (or similar) with no clear reason | Often low disk space on / or Docker's data root; image pushes and k3s need headroom | Run df -h / /var/lib/docker. Free at least ~40 GB (see the NemoClaw quickstart prerequisites); prune Docker (docker system prune) or expand the disk, then retry onboard. |
| vLLM warns about mixed devices or loads on an unexpected GPU | Multiple GPUs are visible; the default visibility does not match intent | Pin one GPU: --gpus '"device=0"' and -e CUDA_VISIBLE_DEVICES=0 with --tensor-parallel-size 1, or use two GPUs explicitly with --tensor-parallel-size 2 and -e CUDA_VISIBLE_DEVICES=0,1 (see Step 3 in the instructions). |
| nemoclaw: command not found after install | Shell PATH not updated | Run source ~/.bashrc (or source ~/.zshrc for zsh), or open a new terminal window. |
| pip: command not found | pip is not installed on DGX Station by default | Install pip: sudo apt install -y python3-pip. Then use pip3 install --break-system-packages huggingface-hub. |
| huggingface-cli is deprecated | The Hugging Face CLI was renamed | Use hf download instead of huggingface-cli download. |
| vLLM container won't start or crashes | GPU memory issue or wrong image | Check logs: docker logs vllm-nemotron. If CUDA OOM, reduce the context: recreate the container with --max-model-len 8192. Ensure you are using the NVIDIA container image (nvcr.io/nvidia/vllm:26.03-py3), not the community vllm/vllm-openai image. |
| vLLM logs show Application startup complete. but curl times out | vLLM is still compiling CUDA graphs after startup | Wait 1-2 minutes after Application startup complete. before sending requests. The first request compiles CUDA graphs and may take 30-90 seconds. |
| NemoClaw onboard fails with "endpoint validation failed" | vLLM model not warmed up, or the validation timeout is too short | Warm up the model first: curl -s --max-time 120 http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4","messages":[{"role":"user","content":"hello"}],"max_tokens":10}'. Then re-run with NEMOCLAW_EXPERIMENTAL=1 NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300 nemoclaw onboard. |
| NemoClaw reports "provider 'vllm' is not available" | Missing experimental flag | Set NEMOCLAW_EXPERIMENTAL=1 before running the installer or nemoclaw onboard. The vLLM provider is currently an experimental feature. |
| Docker permission denied | User not in the docker group | sudo usermod -aG docker $USER, then log out and back in. |
| Gateway fails with cgroup / "Failed to start ContainerManager" errors | Docker is not configured for the host cgroup namespace on DGX Station | Run the cgroup fix: sudo python3 -c "import json, os; path='/etc/docker/daemon.json'; d=json.load(open(path)) if os.path.exists(path) else {}; d['default-cgroupns-mode']='host'; json.dump(d, open(path,'w'), indent=2)" then sudo systemctl restart docker. |
| Gateway fails with "port 8080 is held by container..." | Another OpenShell gateway or container is using port 8080 | Stop the conflicting container: openshell gateway destroy -g <old-gateway-name> or docker stop <container-name> && docker rm <container-name>, then retry nemoclaw onboard. |
| Sandbox cannot reach the inference server | Using localhost instead of host.openshell.internal in the endpoint URL | Inside the sandbox, localhost refers to the sandbox container, not the host. The onboard wizard configures host.openshell.internal automatically. Verify from inside the sandbox: curl -sf https://inference.local/v1/models. If this fails, check that vLLM is reachable from the host: curl -s http://localhost:8000/v1/models. |
| Agent gives no response or is very slow | Normal for a 120B model running locally | Nemotron 3 Super 120B can take 30-90 seconds per response. Verify the inference route: nemoclaw my-assistant status. |
| vLLM API returns empty or errors on tool calls | Missing tool-call flags | Verify that --enable-auto-tool-choice and --tool-call-parser qwen3_xml are set: docker inspect vllm-nemotron --format '{{.Config.Cmd}}'. |
| Port 18789 already in use | Another process is bound to the port | lsof -i :18789 then kill <PID>. If needed, kill -9 <PID> to force-terminate. |
| Web UI port forward dies or dashboard unreachable | Port forward not active | openshell forward stop 18789 my-assistant then openshell forward start 18789 my-assistant --background. Always pass both the port and the sandbox name to openshell forward stop. |
| Web UI shows origin not allowed | The browser origin does not match what the gateway expects | On the DGX Station's local desktop, open http://127.0.0.1:18789/#token=... (not localhost). Through an SSH tunnel from another machine, either localhost or 127.0.0.1 usually works in the client browser, because the check applies to how you reach the forwarded port locally. |
| Telegram does not work after install; nemoclaw start does nothing for Telegram | nemoclaw start starts optional host services (e.g. cloudflared), not the Telegram bridge | Configure Telegram during onboard, or on the host run nemoclaw my-assistant channels add telegram (and rebuild) after policy-add for the telegram preset. See Set up Telegram bridge. |
| Telegram bot receives messages but does not reply | Telegram policy not added to the sandbox | Run nemoclaw my-assistant policy-add, type telegram, and confirm with Y. Ensure the channel was added with nemoclaw my-assistant channels add telegram so the image includes Telegram. |
| docker: Error response from daemon: Conflict. The container name "/vllm-nemotron" is already in use | A previous cleanup used docker stop only | docker rm -f vllm-nemotron (or docker update --restart=no, then docker stop and docker rm). The playbook uses --restart unless-stopped; stopping alone leaves the restart policy and the reserved name in place. |
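The endpoint warm-up curl in the "endpoint validation failed" row can also be scripted. A minimal sketch using only the Python standard library, with the endpoint and model name taken from that fix (the function names are illustrative, and the server must already be running for the request itself to succeed):

```python
import json
import urllib.request

# Values from the warm-up curl command in the troubleshooting table.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4"

def warmup_payload(prompt="hello", max_tokens=10):
    """Build the same minimal chat-completions body the curl warm-up sends."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def warm_up(timeout=120):
    """POST one tiny completion so vLLM finishes compiling CUDA graphs."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(warmup_payload()).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A generous timeout matters here: as the table notes, the first request after startup may take 30-90 seconds while CUDA graphs compile.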
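The cgroup fix above edits /etc/docker/daemon.json in place with a shell one-liner. The merge it performs is expanded below for readability (the function name and the demo path are illustrative; the real fix must run as root against /etc/docker/daemon.json, followed by a Docker restart):

```python
import json
import os

def set_host_cgroupns(path):
    """Merge default-cgroupns-mode=host into a Docker daemon.json,
    preserving any settings already in the file (same logic as the
    one-liner fix in the troubleshooting table)."""
    config = {}
    if os.path.exists(path):
        with open(path) as f:
            config = json.load(f)
    config["default-cgroupns-mode"] = "host"
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return config
```

Because the existing file is loaded first, options such as a configured log driver or registry mirrors survive the edit; only the cgroup namespace mode is added or overwritten.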

Model variant guidance:

| Variant | Size | VRAM Required | When to Use |
| --- | --- | --- | --- |
| NVFP4 | ~60 GB | ~80 GB | Default for DGX Station (GB300). Fits on a single GPU with room for a large KV cache. |
| FP8 | ~120 GB | ~140 GB | Higher accuracy, still fits on GB300. Add --kv-cache-dtype fp8 to the vLLM command. |
| BF16 | ~240 GB | ~260 GB | Highest accuracy. Fits on GB300 but leaves little room for the KV cache. Reduce --max-model-len. |
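As a rough companion to the table above, picking the most accurate variant that fits in available GPU memory can be sketched as follows (the thresholds are the approximate VRAM figures from the table, and pick_variant is an illustrative helper, not part of the playbook tooling):

```python
# Approximate VRAM requirements (GB) from the variant table above.
VRAM_REQUIRED = {"NVFP4": 80, "FP8": 140, "BF16": 260}

def pick_variant(free_vram_gb):
    """Return the most accurate variant that fits in free_vram_gb,
    preferring BF16 over FP8 over NVFP4; None if nothing fits."""
    for variant in ("BF16", "FP8", "NVFP4"):
        if free_vram_gb >= VRAM_REQUIRED[variant]:
            return variant
    return None
```

Note this is accuracy-first: even when BF16 technically fits, the table recommends NVFP4 as the DGX Station default because the extra headroom goes to a larger KV cache and longer context.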

For the latest known issues, see the DGX Station documentation.

Resources

  • NemoClaw
  • NemoClaw Documentation
  • OpenClaw Documentation
  • vLLM Documentation
  • Nemotron-3-Super on Hugging Face
  • DGX Station Documentation
  • DGX Station Forum

Copyright © 2026 NVIDIA Corporation