
Secure Long Running AI Agents with OpenShell on DGX Station

30 MINS

Run OpenClaw with local models in an NVIDIA OpenShell sandbox on DGX Station

AI Agent · DGX Station · GB300 · OpenShell · Security
OpenShell on GitHub
Troubleshooting
Symptom: openshell gateway start fails with "connection refused" or Docker errors.
Cause: Docker is not running.
Fix: Start Docker with sudo systemctl start docker (or launch Docker Desktop), then retry openshell gateway start.

Symptom: openshell status shows the gateway as unhealthy.
Cause: The gateway container crashed or failed to initialize.
Fix: Run openshell gateway destroy and then openshell gateway start to recreate it. Check Docker logs with docker ps -a and docker logs <container-id> for details.

Symptom: openshell sandbox create --from openclaw fails to build.
Cause: A network issue while pulling the community sandbox, or a Dockerfile build failure.
Fix: Check internet connectivity and retry the command. If the build fails on a specific package, verify that the base image is compatible with your Docker version.

Symptom: The sandbox is in the Error phase after creation.
Cause: Policy validation failed or the container crashed on startup.
Fix: Run openshell logs <sandbox-name> to see error details. Common causes: invalid policy YAML, missing provider credentials, or port conflicts.

Symptom: The agent cannot reach inference.local inside the sandbox.
Cause: Inference routing is not configured, or the provider is unreachable.
Fix: Run openshell inference get to verify that the provider and model are set. From the host, test vLLM: curl -s http://localhost:8000/v1/models. The provider base URL must use the host's real IP (not 127.0.0.1 or localhost) so the gateway container can reach vLLM (see instructions.md, Step 6).

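The host-side check in this fix can be sketched as follows. This is a hedged sketch, not part of the openshell CLI: it assumes a typical Linux host where hostname -I lists non-loopback addresses, and port 8000 follows the vLLM setup in this guide.

```shell
# Sketch: compose a provider base URL the gateway container can reach.
# Assumes `hostname -I` returns the host's routable addresses (Linux).
HOST_IP=$(hostname -I | awk '{print $1}')   # first routable address
BASE_URL="http://${HOST_IP}:8000/v1"
echo "Base URL for the gateway: ${BASE_URL}"

# From the host, confirm vLLM answers before pointing the gateway at it:
# curl -s "${BASE_URL}/models"
```

The point of deriving the IP this way is that 127.0.0.1 inside the gateway container refers to the container itself, not the host running vLLM.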
Symptom: 503 "verification failed" or a timeout when the gateway validates vLLM.
Cause: vLLM is not listening on all interfaces, a firewall is blocking port 8000, the model is still loading, or the first request is triggering a CUDA graph compile.
Fix: Ensure the vLLM server was started with --host 0.0.0.0 and port 8000 mapped (see Step 5). Warm up with a chat completion request before running openshell inference set. If you use a host firewall, allow port 8000: sudo ufw allow 8000/tcp comment 'vLLM for OpenShell Gateway' (then sudo ufw reload if needed). For very large models, try openshell inference set ... --no-verify after confirming vLLM works from the host.

Symptom: The agent's outbound connections are all denied.
Cause: The default policy does not include the required endpoints.
Fix: Monitor denials with openshell logs <sandbox-name> --tail --source sandbox. Pull the current policy with openshell policy get <sandbox-name> --full, add the needed host/port under network_policies, and push it with openshell policy set <sandbox-name> --policy <file> --wait.

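An allow-list entry for the step above might look like the following. Treat this as a sketch: only network_policies and the host/port fields are named in this guide, so the surrounding nesting, and the example hosts, are assumptions — adapt it to what openshell policy get actually returns.

```yaml
# Hypothetical fragment: allow the agent to reach two extra endpoints.
# Only network_policies, host, and port are named in this guide;
# the list structure and example hosts are assumptions.
network_policies:
  - host: pypi.org
    port: 443
  - host: files.pythonhosted.org
    port: 443
```

Push the edited file with openshell policy set <sandbox-name> --policy <file> --wait, as in the fix above.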
Symptom: "Permission denied" or Landlock errors inside the sandbox.
Cause: The agent is trying to access a path not covered by the read_only or read_write filesystem policy.
Fix: Pull the current policy and add the path to read_write (or read_only if read access is sufficient), then push the updated policy. Note: the filesystem policy is static and requires sandbox recreation.

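A filesystem entry for this fix might look like the following sketch. Only the read_only and read_write field names come from this guide; the top-level filesystem key, the nesting, and the example paths are assumptions to adapt against your pulled policy.

```yaml
# Hypothetical fragment: grant the agent write access to a workspace path.
# read_only / read_write are named in this guide; the top-level key,
# nesting, and example paths are assumptions.
filesystem:
  read_only:
    - /usr/lib
  read_write:
    - /home/agent/workspace
```

Because the filesystem policy is static, apply the change when recreating the sandbox rather than expecting a live update.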
Symptom: vLLM OOM or very slow inference.
Cause: The model is too large for the available VRAM, --max-model-len is set too high, or another workload is contending for the GPU.
Fix: Free GPU memory (close other GPU workloads), use a smaller Hugging Face model or a quantized variant, or lower --max-model-len. Check the vLLM container's output with docker logs. Monitor GPU usage with nvidia-smi.

Symptom: openshell sandbox connect hangs or times out.
Cause: The sandbox is not in the Ready phase.
Fix: Run openshell sandbox get <sandbox-name> to check the phase. If it is stuck in Provisioning, wait or check the logs. If it is in Error, delete and recreate the sandbox.

Symptom: A policy push returns exit code 1 (validation failed).
Cause: Malformed YAML or invalid policy fields.
Fix: Check the YAML syntax. Common issues: paths not starting with /, .. traversal in paths, root as run_as_user, or endpoints missing the required host/port fields. Fix and re-push.

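Before re-pushing, a quick local lint can catch the common mistakes listed above. This is a hedged sketch, not part of the openshell CLI: it greps for the specific pitfalls this guide names (.. traversal, root as run_as_user, relative paths under read_only/read_write) and assumes simple, flat YAML formatting.

```shell
# Hypothetical pre-push lint for the common policy mistakes named above.
# Not an openshell command; a local sanity check on the policy file only.
lint_policy() {
  f="$1"; status=0
  # ".." traversal anywhere in the file
  grep -q '\.\.' "$f" && { echo "FAIL: '..' traversal in a path"; status=1; }
  # run_as_user set to root
  grep -qE 'run_as_user: *root' "$f" && { echo "FAIL: run_as_user is root"; status=1; }
  # read_only / read_write entries must be absolute paths
  awk '/read_(only|write):/ {inlist=1; next}
       /^[^ -]/            {inlist=0}
       inlist && /^ *- / && $2 !~ /^\// {bad=1; print "FAIL: relative path:", $2}
       END {exit bad}' "$f" || status=1
  [ "$status" -eq 0 ] && echo "policy looks OK"
  return "$status"
}
```

Run it as lint_policy policy.yaml before openshell policy set; a non-zero exit points at the lines to fix.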
Symptom: openshell gateway start fails with "K8s namespace not ready" or times out waiting for the namespace.
Cause: The k3s cluster inside the Docker container takes longer to bootstrap than the CLI timeout allows. The internal components (TLS secrets, Helm chart, namespace creation) may need extra time, especially on the first run, when images are pulled inside the container.
Fix: First, check whether the container is still running and making progress: docker ps --filter name=openshell (look for health: starting). Inspect the k3s state inside the container with docker exec <container> sh -c "KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get ns" and kubectl get pods -A. If pods are in ContainerCreating and the TLS secrets (navigator-server-tls, openshell-server-tls) are missing, the cluster is still bootstrapping: wait a few minutes and run openshell status again. If it does not recover, destroy with openshell gateway destroy (plus docker rm -f <container> if needed) and retry openshell gateway start. Ensure Docker has enough memory and disk for the k3s cluster.

Symptom: openshell status says "No gateway configured" even though the Docker container is running.
Cause: The gateway start command failed or timed out before it could save the gateway configuration to the local config store.
Fix: The container may still be healthy; check with docker ps --filter name=openshell. If it is running and healthy, run openshell gateway start again (it should detect the existing container). If it is unhealthy or stuck, remove it with docker rm -f <container>, then run openshell gateway destroy followed by openshell gateway start.

Resources

  • NVIDIA OpenShell Documentation
  • OpenShell PyPI
  • OpenClaw Documentation
  • OpenClaw Gateway Security
  • DGX Station Documentation
  • DGX Station and GB300

Copyright © 2026 NVIDIA Corporation