OpenClaw with OpenShell

30 MINS

Run OpenClaw in an NVIDIA OpenShell sandbox on DGX Spark

| Symptom | Cause | Fix |
| --- | --- | --- |
| `openshell gateway start` fails with "connection refused" or Docker errors | Docker is not running | Start Docker with `sudo systemctl start docker` or launch Docker Desktop, then retry `openshell gateway start` |
| `openshell status` shows gateway as unhealthy | Gateway container crashed or failed to initialize | Run `openshell gateway destroy` and then `openshell gateway start` to recreate it. Check Docker logs with `docker ps -a` and `docker logs <container-id>` for details |
| `openshell sandbox create --from openclaw` fails to build | Network issue pulling the community sandbox or Dockerfile build failure | Check internet connectivity. Retry the command. If the build fails on a specific package, check if the base image is compatible with your Docker version |
| Sandbox is in Error phase after creation | Policy validation failed or container startup crashed | Run `openshell logs <sandbox-name>` to see error details. Common causes: invalid policy YAML, missing provider credentials, or port conflicts |
| Agent cannot reach `inference.local` inside the sandbox | Inference routing not configured or provider unreachable | Run `openshell inference get` to verify the provider and model are set. Test that Ollama is accessible from the host: `curl http://localhost:11434/api/tags`. Ensure the provider URL uses `host.docker.internal` instead of `localhost` |
| 503 verification failed or timeout when gateway/sandbox accesses Ollama on the host | Ollama bound only to localhost, or host firewall blocking port 11434 | Make Ollama listen on all interfaces so the gateway container (e.g. on Docker network 172.17.x.x) can reach it: `OLLAMA_HOST=0.0.0.0 ollama serve &`. Allow port 11434 through the host firewall: `sudo ufw allow 11434/tcp comment 'Ollama for OpenShell Gateway'` (then `sudo ufw reload` if needed) |
| Agent's outbound connections are all denied | Default policy does not include the required endpoints | Monitor denials with `openshell logs <sandbox-name> --tail --source sandbox`. Pull the current policy with `openshell policy get <sandbox-name> --full`, add the needed host/port under `network_policies`, and push with `openshell policy set <sandbox-name> --policy <file> --wait` |
| "Permission denied" or Landlock errors inside the sandbox | Agent trying to access a path not in the `read_only` or `read_write` filesystem policy | Pull the current policy and add the path to `read_write` (or `read_only` if read access is sufficient). Push the updated policy. Note: filesystem policy is static and requires sandbox recreation |
| Ollama OOM or very slow inference | Model too large for available memory or GPU contention | Free GPU memory (close other GPU workloads), try a smaller model (e.g., `gpt-oss:20b`), or reduce context length. Monitor with `nvidia-smi` |
| `openshell sandbox connect` hangs or times out | Sandbox not in Ready phase | Run `openshell sandbox get <sandbox-name>` to check the phase. If stuck in Provisioning, wait or check logs. If in Error, delete and recreate the sandbox |
| Policy push returns exit code 1 (validation failed) | Malformed YAML or invalid policy fields | Check the YAML syntax. Common issues: paths not starting with `/`, `..` traversal in paths, `root` as `run_as_user`, or endpoints missing required host/port fields. Fix and re-push |
| `openshell gateway start` fails with "K8s namespace not ready" / timed out waiting for namespace | The k3s cluster inside the Docker container takes longer to bootstrap than the CLI timeout allows. The internal components (TLS secrets, Helm chart, namespace creation) may need extra time, especially on first run when images are pulled inside the container | First, check whether the container is still running and progressing: `docker ps --filter name=openshell` (look for `health: starting`). Inspect k3s state inside the container: `docker exec <container> sh -c "KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get ns"` and `kubectl get pods -A`. If pods are in ContainerCreating and TLS secrets are missing (`navigator-server-tls`, `openshell-server-tls`), the cluster is still bootstrapping; wait a few minutes and run `openshell status` again. If it does not recover, destroy with `openshell gateway destroy` (and `docker rm -f <container>` if needed) and retry `openshell gateway start`. Ensure Docker has enough resources (memory and disk) for the k3s cluster |
| `openshell status` says "No gateway configured" even though the Docker container is running | The `gateway start` command failed or timed out before it could save the gateway configuration to the local config store | The container may still be healthy; check with `docker ps --filter name=openshell`. If the container is running and healthy, try `openshell gateway start` again (it should detect the existing container). If the container is unhealthy or stuck, remove it with `docker rm -f <container>` and then run `openshell gateway destroy` followed by `openshell gateway start` |
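Several of the checks in the table can be run in one pass before chasing a specific symptom. The following is a minimal sketch, not part of OpenShell: the function names (`to_container_url`, `valid_policy_path`, `triage`) are hypothetical. `triage` only calls commands already listed in the table, plus `docker info` to test whether the Docker daemon responds. `valid_policy_path` mirrors two of the common validation issues named above and is not OpenShell's own validator.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for first-pass triage; all names are illustrative.

# Rewrite a provider URL so a container can reach a service on the host:
# localhost or 127.0.0.1 becomes host.docker.internal.
# Assumes the URL includes a scheme such as http://.
to_container_url() {
  local url=$1 scheme rest
  scheme=${url%%://*}
  rest=${url#*://}
  case $rest in
    localhost|localhost:*|localhost/*)
      rest="host.docker.internal${rest#localhost}" ;;
    127.0.0.1|127.0.0.1:*|127.0.0.1/*)
      rest="host.docker.internal${rest#127.0.0.1}" ;;
  esac
  printf '%s://%s\n' "$scheme" "$rest"
}

# Succeed only for paths that are absolute and contain no ".." component.
valid_policy_path() {
  case $1 in
    /*) ;;                 # absolute path: keep checking
    *) return 1 ;;         # relative path: rejected
  esac
  case "/$1/" in
    */../*) return 1 ;;    # ".." traversal: rejected
  esac
  return 0
}

# Walk the checks in the order the table lists them; stop at the first failure.
triage() {
  if ! docker info >/dev/null 2>&1; then
    echo "Docker is not running: start it, then retry 'openshell gateway start'"
    return 1
  fi
  docker ps --filter name=openshell   # is the gateway container up and healthy?
  openshell status                    # does the CLI see a configured gateway?
  if ! curl -fsS --max-time 5 http://localhost:11434/api/tags >/dev/null; then
    echo "Ollama is not answering on port 11434 on the host"
    return 1
  fi
  openshell inference get             # are provider and model set?
}
```

Run `triage` on the host after sourcing the file. The two helper functions are pure string checks, so they can be used on a policy file's paths or a provider URL without a running gateway.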

NOTE

DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. Because many applications are still being updated to take advantage of UMA, you may encounter memory issues even when usage is within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'

For the latest known issues, please review the DGX Spark User Guide.
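Before running the flush from the note above, it can help to check how much memory the kernel reports as available. The following is a minimal sketch, assuming a Linux `/proc/meminfo`. The function names and the 8 GiB threshold in the example are hypothetical choices for illustration, not NVIDIA recommendations.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and thresholds are illustrative only.

# meminfo_kib FIELD [FILE]: print the value of one meminfo field
# (the kernel reports these values in kB).
meminfo_kib() {
  awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# should_flush THRESHOLD_KIB [FILE]: succeed when MemAvailable is below
# the threshold. Fails if the field cannot be read.
should_flush() {
  local avail
  avail=$(meminfo_kib MemAvailable "${2:-/proc/meminfo}")
  [ -n "$avail" ] && [ "$avail" -lt "$1" ]
}

# flush_if_low THRESHOLD_KIB: run the documented flush only when memory is low.
flush_if_low() {
  if should_flush "$1"; then
    echo "MemAvailable is below $1 kB; flushing buffer cache"
    sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
  else
    echo "MemAvailable is not below $1 kB; skipping flush"
  fi
}

# Example (8 GiB expressed in kB; an arbitrary threshold):
#   flush_if_low $((8 * 1024 * 1024))
```

Comparing `meminfo_kib Cached` before and after a flush shows how much page cache was released.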