Run OpenClaw in an NVIDIA OpenShell sandbox on DGX Station
| Symptom | Cause | Fix |
|---|---|---|
| `openshell gateway start` fails with "connection refused" or Docker errors | Docker is not running | Start Docker with `sudo systemctl start docker` (or launch Docker Desktop), then retry `openshell gateway start` |
| `openshell status` shows the gateway as unhealthy | The gateway container crashed or failed to initialize | Run `openshell gateway destroy` and then `openshell gateway start` to recreate it. For details, find the container ID with `docker ps -a` and inspect its output with `docker logs <container-id>` |
| `openshell sandbox create --from openclaw` fails to build | A network issue while pulling the community sandbox, or a Dockerfile build failure | Check internet connectivity and retry the command. If the build fails on a specific package, check whether the base image is compatible with your Docker version |
| Sandbox is in the `Error` phase after creation | Policy validation failed or the container crashed on startup | Run `openshell logs <sandbox-name>` to see error details. Common causes: invalid policy YAML, missing provider credentials, or port conflicts |
| Agent cannot reach `inference.local` inside the sandbox | Inference routing is not configured, or the provider is unreachable | Run `openshell inference get` to verify that the provider and model are set. Test that Ollama is reachable from the host: `curl http://localhost:11434/api/tags`. Ensure the provider URL uses `host.docker.internal` instead of `localhost` (inside a container, `localhost` refers to the container itself, not the host) |
| "503 verification failed" or a timeout when OpenClaw (in the sandbox) accesses Ollama on the host | Ollama is bound only to localhost, or the host firewall is blocking the gateway container | 1. Make Ollama listen on all interfaces so the host accepts connections from the Docker network (for example, a gateway on 172.17.x.x): `OLLAMA_HOST=0.0.0.0 ollama serve &`. If Ollama already runs as a systemd service, set `OLLAMA_HOST=0.0.0.0` in the service environment and restart the service instead. 2. If you use UFW, allow port 11434 so the gateway container can reach Ollama: `sudo ufw allow 11434/tcp comment 'Ollama for OpenShell Gateway'`, then `sudo ufw reload` if needed. |
| Cannot access the dashboard URL (e.g. `http://127.0.0.1:18789`) from a remote system or the host | Port 18789 is forwarded into the sandbox only; nothing forwards it to the client | Create an SSH tunnel that forwards your local port 18789 to the sandbox: use `openshell ssh-proxy` with your gateway URL, sandbox ID, token, and gateway name as the SSH `ProxyCommand`, then add `-L 18789:127.0.0.1:18789` to forward the port. See Step 9, "Accessing the dashboard from the host", in the Instructions. |
| All of the agent's outbound connections are denied | The default policy does not include the required endpoints | Monitor denials with `openshell logs <sandbox-name> --tail --source sandbox`. Pull the current policy with `openshell policy get <sandbox-name> --full`, add the needed host and port under `network_policies`, and push it with `openshell policy set <sandbox-name> --policy <file> --wait` |
| "Permission denied" or Landlock errors inside the sandbox | The agent is trying to access a path that is not listed in the `read_only` or `read_write` filesystem policy | Pull the current policy and add the path to `read_write` (or to `read_only` if read access is sufficient), then push the updated policy. Note: the filesystem policy is static, so the change takes effect only after you recreate the sandbox |
| Ollama runs out of memory (OOM) or inference is very slow | The model is too large for GPU memory, or other workloads are contending for the GPU | On DGX Station with GB300: free GPU memory by closing other GPU workloads, try a smaller model (e.g. `nemotron-3-nano` or `gpt-oss:20b`), or reduce the context length. Monitor with `nvidia-smi` |
| `openshell sandbox connect` hangs or times out | The sandbox is not in the `Ready` phase | Run `openshell sandbox get <sandbox-name>` to check the phase. If it is stuck in `Provisioning`, wait or check the logs. If it is in `Error`, delete and recreate the sandbox |
| Policy push returns exit code 1 (validation failed) | Malformed YAML or invalid policy fields | Check the YAML syntax. Common issues: paths that do not start with `/`, `..` traversal in paths, `root` as `run_as_user`, or endpoints missing the required `host`/`port` fields. Fix the file and push it again |
| `openshell gateway start` fails with "K8s namespace not ready" or "timed out waiting for namespace" | The k3s cluster inside the Docker container takes longer to bootstrap than the CLI timeout allows. Internal components (TLS secrets, Helm chart, namespace creation) may need extra time, especially on the first run, when images are pulled inside the container. | First, check whether the container is still running and making progress: `docker ps --filter name=openshell` (look for `health: starting`). Inspect the k3s state inside the container with `docker exec <container> sh -c "KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get ns"` and `docker exec <container> sh -c "KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get pods -A"`. If pods are in `ContainerCreating` and the TLS secrets (`navigator-server-tls`, `openshell-server-tls`) are missing, the cluster is still bootstrapping: wait a few minutes and run `openshell status` again. If it does not recover, destroy the gateway with `openshell gateway destroy` (and `docker rm -f <container>` if needed) and retry `openshell gateway start`. Ensure Docker has enough memory and disk for the k3s cluster. |
| `openshell status` reports "No gateway configured" even though the Docker container is running | `openshell gateway start` failed or timed out before it could save the gateway configuration to the local config store | The container may still be healthy; check with `docker ps --filter name=openshell`. If the container is running and healthy, run `openshell gateway start` again (it should detect the existing container). If the container is unhealthy or stuck, remove it with `docker rm -f <container>`, then run `openshell gateway destroy` followed by `openshell gateway start`. |
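When the agent cannot reach `inference.local` or OpenClaw gets a 503, it helps to separate "Ollama is unreachable" from "Ollama is reachable but the model is not pulled". The sketch below queries Ollama's `/api/tags` endpoint (the same URL as the `curl` check in the table) using only the Python standard library. The helper names `fetch_tags`, `has_model`, and `check` are hypothetical, and matching a bare model name as `<name>:latest` is an assumption about how the model was pulled.

```python
import json
import urllib.request
from typing import Any

# Hypothetical helpers for checking an Ollama server before blaming the sandbox.


def fetch_tags(base_url: str, timeout: float = 5.0) -> dict[str, Any]:
    """GET <base_url>/api/tags and return the decoded JSON body."""
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def has_model(tags: dict[str, Any], model: str) -> bool:
    """True if `model` is listed; a bare name is also matched as '<name>:latest'."""
    names = {m.get("name") for m in tags.get("models", [])}
    return model in names or (":" not in model and f"{model}:latest" in names)


def check(base_url: str, model: str) -> str:
    """Summarize reachability of `base_url` and availability of `model`."""
    try:
        tags = fetch_tags(base_url)
    except OSError as exc:  # URLError, connection refused, and timeouts are OSErrors
        return f"unreachable: {exc}"
    if has_model(tags, model):
        return "ok"
    return f"reachable, but {model} is not in /api/tags"
```

Run `check("http://localhost:11434", "gpt-oss:20b")` on the host first, then repeat it against the host's Docker bridge address (typically `http://172.17.0.1:11434` on the default bridge). If the first call succeeds and the second reports `unreachable`, Ollama is most likely bound only to localhost. A UFW block usually does not show up in this host-side test, because the host's traffic to its own addresses bypasses the incoming rules.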
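The validation rules listed for a failed policy push (absolute paths, no `..` traversal, no `root` for `run_as_user`, `host` and `port` on every endpoint) can be checked locally before running `openshell policy set`. The sketch below is a hypothetical pre-flight helper, not part of the OpenShell CLI. It assumes the policy has already been parsed into a Python dict with a `filesystem_policy` section holding `read_only`/`read_write` lists, a `process.run_as_user` field, and `network_policies` entries that each contain an `endpoints` list; adjust the key names to match the policy you pulled with `openshell policy get <sandbox-name> --full`.

```python
from typing import Any

# Hypothetical pre-flight check. The key names below (filesystem_policy,
# read_only, read_write, process.run_as_user, network_policies, endpoints)
# are assumptions; match them to the policy returned by `openshell policy get`.


def validate_policy(policy: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means no rule below failed."""
    problems: list[str] = []

    # Filesystem paths must be absolute and must not contain ".." segments.
    fs = policy.get("filesystem_policy") or {}
    for key in ("read_only", "read_write"):
        for path in fs.get(key) or []:
            if not str(path).startswith("/"):
                problems.append(f"{key}: {path!r} does not start with '/'")
            if ".." in str(path).split("/"):
                problems.append(f"{key}: {path!r} contains '..' traversal")

    # The sandbox process must not run as root.
    user = (policy.get("process") or {}).get("run_as_user")
    if user in ("root", 0, "0"):
        problems.append("process.run_as_user must not be root")

    # Every network endpoint needs both a host and a port.
    for name, rule in (policy.get("network_policies") or {}).items():
        for i, endpoint in enumerate((rule or {}).get("endpoints") or []):
            for field in ("host", "port"):
                if field not in endpoint:
                    problems.append(
                        f"network_policies.{name}.endpoints[{i}]: missing '{field}'"
                    )
    return problems
```

With PyYAML installed, `validate_policy(yaml.safe_load(open("policy.yaml")))` lists the problems it finds. An empty list only means these particular rules pass; the gateway's own validator remains authoritative.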
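For the slow-bootstrap and "No gateway configured" rows, the deciding signal is the container's health status in `docker ps`. The sketch below runs `docker ps -a --filter name=openshell --format '{{.Names}}\t{{.Status}}'` and maps Docker's status text (`(health: starting)`, `(healthy)`, `(unhealthy)`, `Exited ...`) to a simple state, so you can wait out a slow first bootstrap instead of rerunning `openshell gateway start` too early. The function names, the 10-minute timeout, and the 15-second interval are hypothetical choices, and the `openshell` name filter assumes the gateway container's name contains that string, as the table's commands do.

```python
import re
import subprocess
import time

# Docker appends one of these to the Status column when a healthcheck is defined.
_HEALTH = re.compile(r"\((healthy|unhealthy|health: starting)\)")


def parse_health(status: str) -> str:
    """Map a `docker ps` Status string to healthy/unhealthy/starting/exited/unknown."""
    if status.startswith("Exited"):
        return "exited"
    match = _HEALTH.search(status)
    if match is None:
        return "unknown"  # e.g. no healthcheck, "Created", or "Restarting"
    return "starting" if match.group(1) == "health: starting" else match.group(1)


def gateway_states(name_filter: str = "openshell") -> dict[str, str]:
    """Return {container_name: state} for containers whose name matches the filter."""
    # -a so that a crashed (Exited) gateway is reported instead of disappearing.
    out = subprocess.run(
        ["docker", "ps", "-a", "--filter", f"name={name_filter}",
         "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    states = {}
    for line in out.splitlines():
        name, _, status = line.partition("\t")
        states[name] = parse_health(status)
    return states


def wait_for_healthy(timeout_s: float = 600, interval_s: float = 15) -> bool:
    """Poll until every matching container is healthy; give up on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        states = gateway_states()
        if states and all(s == "healthy" for s in states.values()):
            return True
        if any(s in ("unhealthy", "exited") for s in states.values()):
            return False
        time.sleep(interval_s)
    return False
```

`wait_for_healthy()` returns `True` once every matching container reports healthy, and `False` as soon as one is unhealthy or exited, or when the timeout expires. In the `False` case, fall back to the destroy-and-recreate steps in the table; a leftover exited container from an earlier attempt also shows up here and is a candidate for `docker rm -f`.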
NOTE
On DGX Station with GB300, ensure there is enough free GPU VRAM for your chosen model; use `nvidia-smi` to check memory usage. The default recommended model is `nemotron-3-super`; for larger models, make sure no other GPU workloads are consuming VRAM.
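To put numbers on "enough free VRAM", the sketch below reads per-GPU memory from `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits` (values in MiB) and compares the free memory against a model footprint that you supply. The helper names are hypothetical, and both the model size you pass in and the 10% default headroom for context and KV cache are placeholders, not measured requirements for `nemotron-3-super` or any other model.

```python
import subprocess

# nvidia-smi query; with "nounits" the memory columns are plain MiB integers.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> list[tuple[int, int, int]]:
    """Parse 'index, used, total' lines into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        try:
            idx, used, total = (int(field.strip()) for field in line.split(","))
        except ValueError:
            continue  # e.g. "[N/A]" or an unexpected number of fields
        rows.append((idx, used, total))
    return rows


def free_mib(rows: list[tuple[int, int, int]]) -> dict[int, int]:
    """Free memory per GPU index, in MiB."""
    return {idx: total - used for idx, used, total in rows}


def fits(rows: list[tuple[int, int, int]], model_gib: float,
         gpu: int = 0, headroom: float = 0.10) -> bool:
    """True if `gpu` has room for a model of `model_gib` GiB plus fractional headroom."""
    needed_mib = model_gib * 1024 * (1 + headroom)
    return free_mib(rows).get(gpu, 0) >= needed_mib


def report() -> None:
    """Print free memory for each GPU (requires nvidia-smi on PATH)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    for idx, free in free_mib(parse_gpu_memory(out)).items():
        print(f"GPU {idx}: {free} MiB free")
```

Call `report()` to print free memory per GPU, or pass the parsed rows to `fits()` with your model's approximate size in GiB. Rows that report `[N/A]` are skipped rather than treated as zero, so an empty result means `nvidia-smi` did not report usable memory figures.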