Run OpenClaw with local models in an NVIDIA OpenShell sandbox on DGX Station
Verify the OS, GPU, Docker, and Python are available before installing anything.
head -n 2 /etc/os-release
nvidia-smi
docker info --format '{{.ServerVersion}}'
python3 --version
Expected output should show Ubuntu 24.04 (DGX OS), a detected GPU (e.g. NVIDIA GB300 on DGX Station), a Docker server version, and Python 3.12+. If you access the DGX Station remotely, ensure port 18789 is available for the OpenClaw dashboard.
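If you plan to reach the dashboard remotely, you can optionally confirm that nothing is already listening on that port (18789 is the default OpenClaw dashboard port used throughout this guide):
ss -ltn | grep ':18789' || echo "port 18789 is free"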
First, verify that the local user has Docker permissions using the following command.
docker ps
If you get a permission denied error (permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock), add your user to the system's Docker group. This will enable you to run Docker commands without requiring sudo. The commands to do so are as follows:
sudo usermod -aG docker $USER
newgrp docker
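To confirm the group change took effect in your current shell, a quick optional check is:
id -nG | grep -qw docker && echo "docker group active" || echo "log out and back in, then retry"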
Now that we have verified the user's Docker permissions, we must configure Docker so that it can use the NVIDIA Container Runtime.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Run a sample workload to verify the setup:
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Create a virtual environment and install the openshell CLI.
cd ~
uv venv openshell-env && source openshell-env/bin/activate
uv pip install openshell
openshell --help
If you don't have uv installed yet:
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"
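After installing, you can confirm uv is on your PATH before re-running the steps above:
command -v uv && uv --version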
Expected output should show the openshell command tree with subcommands like gateway, sandbox, provider, and inference.
The gateway is the control plane that manages sandboxes. Since you are running directly on the DGX Station, it deploys locally inside Docker.
openshell gateway start
openshell status
openshell status should report the gateway as Connected. The first run may take a few minutes while Docker pulls the required images and the internal k3s cluster bootstraps.
TIP
If you want to manage the DGX Station gateway from a separate workstation, run openshell gateway start --remote <username>@<dgx-station-ip-or-hostname> from that workstation instead. All subsequent commands will route through the SSH tunnel.
NOTE
Remote gateway deployment requires passwordless SSH access. Ensure your SSH public key is added to ~/.ssh/authorized_keys on the DGX Station before using the --remote flag.
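If passwordless SSH is not yet configured for the remote workflow, one common way to install your public key on the DGX Station is ssh-copy-id (shown with placeholder values; adjust the key path and host for your environment):
ssh-copy-id -i ~/.ssh/id_ed25519.pub <username>@<dgx-station-ip-or-hostname>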
This step applies to the vLLM path only. Pull the vLLM container image:
docker pull nvcr.io/nvidia/vllm:26.03-py3
The OpenShell gateway must reach this service using the host’s real IP address (not localhost from inside other containers). Binding --host 0.0.0.0 and publishing -p 8000:8000 makes the API available on all interfaces.
The Nemotron weights may require a Hugging Face account and token. Create your own read token at huggingface.co/settings/tokens, keep it private (do not paste real tokens into shared docs, tickets, or git), then export it in your shell before docker run so the command below only references the variable:
export HF_TOKEN=your_actual_token_here
Replace your_actual_token_here with your real token value. If you do not need Hugging Face authentication for this model, skip the export and remove the -e HF_TOKEN="$HF_TOKEN" line from the docker run command.
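To keep the token out of your shell history, one optional pattern is to read it interactively into the variable instead of typing it on the command line:
read -rs HF_TOKEN && export HF_TOKEN && echo "HF_TOKEN set"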
We are going to use the nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 model, as it fits in DGX Station VRAM with KV headroom at --max-model-len 32768.
WARNING
The --trust-remote-code flag in the following docker run command allows execution of arbitrary code from the model repository. Only use this with trusted models.
docker run -d --name vllm-openshell \
--runtime nvidia --gpus all \
-e HF_TOKEN="$HF_TOKEN" \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-p 8000:8000 \
--restart unless-stopped \
nvcr.io/nvidia/vllm:26.03-py3 \
python3 -m vllm.entrypoints.openai.api_server \
--model nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 1 \
--trust-remote-code \
--max-model-len 32768 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_xml \
--reasoning-parser nemotron_v3
Watch the logs until the server is ready; the first start can take several minutes while weights load. In a new terminal window, run:
docker logs -f vllm-openshell
Wait for the logs to show Application startup complete., then verify the API:
curl -s http://localhost:8000/v1/models
You should see JSON listing nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4.
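For scripted setups, you can also poll the models endpoint until it responds instead of watching the logs (a minimal sketch; adjust the sleep interval to taste):
until curl -sf http://localhost:8000/v1/models > /dev/null; do sleep 10; done; echo "vLLM is ready"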
Warm up with a short completion so CUDA graphs compile before OpenClaw validates the route (first request may take 30–90 seconds):
curl -s --max-time 120 http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4","messages":[{"role":"user","content":"Say hello."}],"max_tokens":16}'
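If jq is installed, you can run the same warm-up request and print just the assistant text to confirm the model is generating (optional; assumes jq is available on the DGX Station):
curl -s --max-time 120 http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4","messages":[{"role":"user","content":"Say hello."}],"max_tokens":16}' \
| jq -r '.choices[0].message.content'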
Create an OpenShell provider that points at the vLLM OpenAI-compatible API on the host (/v1 on port 8000).
First, find the IP address of your DGX Station:
hostname -I | awk '{print $1}'
Then create the provider, replacing {Machine_IP} with the IP address from the command above (e.g. 10.110.106.169):
openshell provider create \
--name local-vllm \
--type openai \
--credential OPENAI_API_KEY=not-needed \
--config OPENAI_BASE_URL=http://{Machine_IP}:8000/v1
IMPORTANT
Do not use localhost or 127.0.0.1 here. The OpenShell gateway runs inside a Docker container, so it cannot reach the host via localhost. Use the machine's actual IP address.
Some Linux Docker setups can use http://host.docker.internal:8000/v1 instead of the host IP; if your gateway resolves that hostname, it is equivalent.
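If you want to double-check the gateway-facing address before moving on, query the models endpoint via the machine IP (replace {Machine_IP} as above):
curl -s http://{Machine_IP}:8000/v1/models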
Verify the provider was created:
openshell provider list
Point the inference.local endpoint (available inside every sandbox) at vLLM. The model id must match what /v1/models returns (for the default Step 5 command, use the Hugging Face id below). If you changed --model in Step 5, use that same string here.
openshell inference set \
--provider local-vllm \
--model nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4
The output should confirm the route and show a validated endpoint URL, for example: http://10.110.106.169:8000/v1/chat/completions (openai_chat_completions).
NOTE
If you see failed to verify inference endpoint or failed to connect, ensure vLLM is healthy (docker logs vllm-openshell) and you completed at least one chat completion so cold-start compilation has finished. You can add --no-verify to skip verification: openshell inference set --provider local-vllm --model nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 --no-verify.
Verify the configuration:
openshell inference get
Expected output should show provider: local-vllm and model: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 (or whichever model you configured in Step 5).
Create a sandbox using the pre-built OpenClaw community sandbox. This pulls the OpenClaw Dockerfile, the default policy, and startup scripts from the OpenShell Community catalog:
openshell sandbox create \
--keep \
--forward 18789 \
--name dgx-demo \
--from openclaw \
-- openclaw-start
NOTE
Do not pass --policy with a local file path (e.g. openclaw-policy.yaml) when using --from openclaw. The policy is bundled with the community sandbox; a local file path can cause "file not found."
The --keep flag keeps the sandbox running after the initial process exits, so you can reconnect later. This is the default behavior. To terminate the sandbox when the initial process exits, use the --no-keep flag instead.
NOTE
The sandbox name is displayed in the creation output. You can also set it explicitly with --name <your-name>. To find it later, run openshell sandbox list.
The CLI will resolve openclaw against the community catalog, the sandbox container will spin up, and the OpenClaw onboarding wizard will launch automatically in your terminal.
IMPORTANT
The onboarding wizard is fully interactive — it requires arrow-key navigation and Enter to select options. It cannot be completed from a non-interactive session (e.g. a script or automation tool). You must run openshell sandbox create from a terminal with full TTY support.
If the wizard did not complete during sandbox creation, reconnect to the sandbox to re-run it:
openshell sandbox connect dgx-demo
Use the arrow keys and Enter key to interact with the installation.
When the wizard asks for an API key, enter not-needed (or any placeholder; vLLM is not checking the key unless you enabled API-key auth in the server). When it asks for a model, use the same id you configured earlier (nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4). If a step fails, confirm vLLM is healthy (docker logs vllm-openshell). It might take 1-2 minutes to get through the final stages. Afterwards, you should see a URL with a token you can use to connect to the gateway.
The expected output will be similar, but the token will be unique.
OpenClaw gateway starting in background.
Logs: /tmp/gateway.log
UI: http://127.0.0.1:18789/?token=9b4c9a9c9f6905131327ce55b6d044bd53e0ec423dd6189e
To verify the default policy enabled for your sandbox, run the following command:
openshell sandbox get dgx-demo
Accessing the dashboard from the DGX Station as the primary device: right-click the URL in the UI section and select Open Link.
Accessing the dashboard from the host or a remote system: The dashboard URL (e.g. http://127.0.0.1:18789/?token=...) is inside the sandbox, so the host does not forward port 18789 by default. To reach it from your host or another machine, use SSH local port forwarding. From a machine that can reach the OpenShell gateway, run (replace gateway URL, sandbox-id, token, and gateway-name with values from your environment):
ssh -o ProxyCommand='/usr/local/bin/openshell ssh-proxy --gateway https://127.0.0.1:8080/connect/ssh --sandbox-id <sandbox-id> --token <token> --gateway-name openshell' -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=ERROR -N -L 18789:127.0.0.1:18789 sandbox
Then open http://127.0.0.1:18789/?token=<your-token> in your local browser.
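To confirm the tunnel is forwarding before opening the browser, you can probe the local end of it (an optional check; it assumes the dashboard answers plain HTTP on that port):
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:18789/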
From another machine, use the SSH tunnel described above; from the DGX Station itself, you can open the dashboard URL directly in a local browser.
From this page, you can now Chat with your OpenClaw agent within the protected confines of the runtime OpenShell provides.
Now that OpenClaw has been configured within the OpenShell protected runtime, you can connect directly into the sandbox environment via:
openshell sandbox connect dgx-demo
Once loaded into the sandbox terminal, you can test connectivity to vLLM via inference.local with this command:
curl https://inference.local/v1/responses \
-H "Content-Type: application/json" \
-d '{
"instructions": "You are a helpful assistant.",
"input": "Hello!"
}'
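If the /v1/responses route is not available in your build, the same endpoint should also accept a standard chat completion, assuming inference.local proxies the OpenAI-compatible vLLM API configured in Step 6 (use the same model id you set there):
curl https://inference.local/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4","messages":[{"role":"user","content":"Hello!"}],"max_tokens":32}'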
Open a second terminal and check the sandbox status and live logs:
source ~/openshell-env/bin/activate
openshell term
The terminal dashboard shows policy decisions (allow, deny, inspect_for_inference) and inference interceptions. Verify that the OpenClaw agent can reach inference.local for model requests and that unauthorized outbound traffic is denied.
TIP
Press f to follow live output, s to filter by source, and q to quit the terminal dashboard.
If you exit the sandbox session, reconnect at any time:
openshell sandbox connect dgx-demo
NOTE
openshell sandbox connect is interactive-only — it opens a terminal session inside the sandbox. There is no way to pass a command for non-interactive execution. Use openshell sandbox upload/download for file transfers, or use the SSH proxy for scripted access (see Step 9).
To transfer files in or out (replace dgx-demo with your sandbox name if you used a different one):
openshell sandbox upload dgx-demo ./local-file /sandbox/destination
openshell sandbox download dgx-demo /sandbox/file ./local-destination
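As an example, a quick round trip confirms transfers work (the /sandbox paths here are illustrative; use whatever destination path your sandbox exposes):
echo "hello from host" > note.txt
openshell sandbox upload dgx-demo ./note.txt /sandbox/note.txt
openshell sandbox download dgx-demo /sandbox/note.txt ./note-roundtrip.txt
diff note.txt note-roundtrip.txt && echo "transfer verified"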
Stop and remove the sandbox (use the name you gave it, e.g. dgx-demo):
openshell sandbox delete dgx-demo
Stop the gateway (preserves state for later):
openshell gateway stop
WARNING
The following command permanently removes the gateway cluster and all its data.
openshell gateway destroy
Remove the inference provider you created in Step 6:
openshell provider delete local-vllm
Stop and remove the vLLM container started in Step 5:
docker stop vllm-openshell
docker rm vllm-openshell
(Optional) Remove the container image to free disk:
docker rmi nvcr.io/nvidia/vllm:26.03-py3
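If you also want to reclaim the disk used by the downloaded weights, you can remove the model from the Hugging Face cache (the directory name below assumes the standard hub cache layout):
rm -rf ~/.cache/huggingface/hub/models--nvidia--NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4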
Store additional credentials (e.g. a GitHub token) with openshell provider create. When creating the sandbox, pass the provider name(s) with --provider <name> (e.g. --provider my-github) to inject those credentials into the sandbox securely.
Use openshell sandbox create --from base or --from sdg for other pre-built environments.
Run openshell sandbox ssh-config <sandbox-name> and append the output to ~/.ssh/config to connect VS Code Remote-SSH directly into the sandbox.
Use openshell logs <sandbox-name> --tail or openshell term to continuously monitor agent activity and policy decisions.