| Symptom | Cause | Fix |
|---|---|---|
| `openshell gateway start` fails with "connection refused" or Docker errors | Docker is not running | Start Docker with `sudo systemctl start docker` or launch Docker Desktop, then retry `openshell gateway start` |
| `openshell status` shows the gateway as unhealthy | Gateway container crashed or failed to initialize | Run `openshell gateway destroy` and then `openshell gateway start` to recreate it. Check Docker logs with `docker ps -a` and `docker logs <container-id>` for details |
| `openshell sandbox create --from openclaw` fails to build | Network issue pulling the community sandbox, or a Dockerfile build failure | Check internet connectivity and retry the command. If the build fails on a specific package, check whether the base image is compatible with your Docker version |
| Sandbox is in `Error` phase after creation | Policy validation failed or container startup crashed | Run `openshell logs <sandbox-name>` to see error details. Common causes: invalid policy YAML, missing provider credentials, or port conflicts |
| Agent cannot reach `inference.local` inside the sandbox | Inference routing not configured or provider unreachable | Run `openshell inference get` to verify that the provider and model are set. Test that Ollama is reachable from the host: `curl http://localhost:11434/api/tags`. Ensure the provider URL uses `host.docker.internal` instead of `localhost` |
| 503 "verification failed" or timeout when the gateway/sandbox accesses Ollama on the host | Ollama bound only to localhost, or host firewall blocking port 11434 | Make Ollama listen on all interfaces so the gateway container (e.g. on Docker network 172.17.x.x) can reach it: `OLLAMA_HOST=0.0.0.0 ollama serve &`. Allow port 11434 through the host firewall: `sudo ufw allow 11434/tcp comment 'Ollama for OpenShell Gateway'` (then `sudo ufw reload` if needed) |
| Agent's outbound connections are all denied | Default policy does not include the required endpoints | Monitor denials with `openshell logs <sandbox-name> --tail --source sandbox`. Pull the current policy with `openshell policy get <sandbox-name> --full`, add the needed host/port under `network_policies`, and push with `openshell policy set <sandbox-name> --policy <file> --wait` |
| "Permission denied" or Landlock errors inside the sandbox | Agent trying to access a path not in the `read_only` or `read_write` filesystem policy | Pull the current policy and add the path to `read_write` (or `read_only` if read access is sufficient), then push the updated policy. Note: filesystem policy is static and requires sandbox recreation |
| Ollama OOM or very slow inference | Model too large for available memory, or GPU contention | Free GPU memory (close other GPU workloads), try a smaller model (e.g. `gpt-oss:20b`), or reduce the context length. Monitor with `nvidia-smi` |
| `openshell sandbox connect` hangs or times out | Sandbox not in `Ready` phase | Run `openshell sandbox get <sandbox-name>` to check the phase. If stuck in `Provisioning`, wait or check the logs. If in `Error`, delete and recreate the sandbox |
| Policy push returns exit code 1 (validation failed) | Malformed YAML or invalid policy fields | Check the YAML syntax. Common issues: paths not starting with `/`, `..` traversal in paths, `root` as `run_as_user`, or endpoints missing the required `host`/`port` fields. Fix and re-push |
| `openshell gateway start` fails with "K8s namespace not ready" / timed out waiting for namespace | The k3s cluster inside the Docker container takes longer to bootstrap than the CLI timeout allows. Internal components (TLS secrets, Helm chart, namespace creation) may need extra time, especially on first run when images are pulled inside the container | First, check whether the container is still running and progressing: `docker ps --filter name=openshell` (look for `health: starting`). Inspect the k3s state inside the container: `docker exec <container> sh -c "KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get ns"` and `kubectl get pods -A`. If pods are in `ContainerCreating` and the TLS secrets (`navigator-server-tls`, `openshell-server-tls`) are missing, the cluster is still bootstrapping: wait a few minutes and run `openshell status` again. If it does not recover, destroy with `openshell gateway destroy` (and `docker rm -f <container>` if needed) and retry `openshell gateway start`. Ensure Docker has enough memory and disk for the k3s cluster |
| `openshell status` says "No gateway configured" even though the Docker container is running | The `gateway start` command failed or timed out before it could save the gateway configuration to the local config store | The container may still be healthy: check with `docker ps --filter name=openshell`. If the container is running and healthy, run `openshell gateway start` again (it should detect the existing container). If it is unhealthy or stuck, remove it with `docker rm -f <container>`, then run `openshell gateway destroy` followed by `openshell gateway start` |
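Several of the fixes above involve pulling, editing, and re-pushing the sandbox policy. As a sketch only (the authoritative schema is whatever `openshell policy get <sandbox-name> --full` returns; the structure below is inferred from the field names and validation errors mentioned in this table, and the host and paths are placeholders), an edited policy fragment might look like:

```yaml
# Hypothetical policy fragment -- structure inferred from the errors above;
# verify against the output of `openshell policy get <sandbox-name> --full`.
network_policies:
  endpoints:
    - host: api.example.com   # host and port are the required endpoint fields
      port: 443
filesystem:
  read_only:
    - /usr/share/data         # absolute paths only; no ".." traversal
  read_write:
    - /workspace
run_as_user: agent            # must not be root
```

Push it with `openshell policy set <sandbox-name> --policy <file> --wait`. Network policy changes take effect on push, but as noted above, the filesystem policy is static and only takes effect after the sandbox is recreated.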
NOTE

DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. Because many applications have not yet been updated to take full advantage of UMA, you may encounter memory issues even while within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

```shell
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
For the latest known issues, please review the DGX Spark User Guide.
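One follow-up to the Ollama binding fix in the table above: `OLLAMA_HOST=0.0.0.0 ollama serve &` only affects that one foreground process. If Ollama was installed as a systemd service (the default for the Linux installer), the persistent equivalent is a drop-in override, as described in the Ollama FAQ:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# (create with `sudo systemctl edit ollama`)
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
```

After saving, run `sudo systemctl daemon-reload && sudo systemctl restart ollama`, then confirm the binding from a container with `curl http://host.docker.internal:11434/api/tags`.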