DGX Station AI Skills for Coding Agents
Give your coding agent (Claude Code, Codex, Gemini CLI, Cursor) DGX Station expertise via an AGENTS.md and on-demand Agent Skills
Install your coding agent
Pick whichever agent you prefer — the rest of this playbook works the same regardless. Install commands:
| Agent | Install |
|---|---|
| Claude Code | curl -fsSL https://claude.ai/install.sh \| bash |
| OpenAI Codex CLI | npm i -g @openai/codex |
| Gemini CLI | npm i -g @google/gemini-cli |
| Cursor | Download from https://cursor.com/ |
Verify with claude --version, codex --version, gemini --version, or by launching Cursor.
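To check all three CLIs in one pass, a small helper along these lines may be convenient. It is only a sketch: `check_cli` is a hypothetical name, and it reports only whether each command is on your PATH, not whether the agent is logged in or configured.

```shell
#!/bin/sh
# Report which of the given commands are available on PATH.
check_cli() {
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd: found"
    else
      echo "$cmd: missing"
    fi
  done
}

check_cli claude codex gemini
```

Cursor is a desktop application, so it is left out of the check here.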
Install the skills into your project
Navigate to the project where you want DGX Station expertise, then run the installer with the name of your harness as the argument:
cd ~/your-project
# Pick one:
/path/to/this/playbook/assets/install.sh claude
/path/to/this/playbook/assets/install.sh codex
/path/to/this/playbook/assets/install.sh gemini
/path/to/this/playbook/assets/install.sh cursor
# Or install for all four at once:
/path/to/this/playbook/assets/install.sh all
If you downloaded the playbook as a zip, run the installer by its path relative to the extracted directory and pass the project directory as the second argument:
station-ai-skills/assets/install.sh claude ~/your-project
The installer is additive for skill directories (won't clobber existing skills you've written) and refuses to overwrite an existing context file (AGENTS.md, CLAUDE.md, GEMINI.md) unless you pass --force.
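Before deciding whether you need --force, it can help to see which context files are already present. The helper below is a minimal sketch: `existing_context_files` is a hypothetical name, and the file list is taken from the sentence above.

```shell
#!/bin/sh
# List the context files that already exist in a project directory.
# $1 = project directory (defaults to the current directory).
existing_context_files() {
  dir=${1:-.}
  for f in AGENTS.md CLAUDE.md GEMINI.md; do
    [ -e "$dir/$f" ] && echo "$f"
  done
  return 0
}

existing_context_files "$HOME/your-project"
```

If this prints nothing, there is no existing context file for the installer to refuse to overwrite.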
Resulting layout (per harness):
your-project/
AGENTS.md or CLAUDE.md or GEMINI.md # context file (named for your agent)
.claude/skills/<name>/SKILL.md # claude
.codex/prompts/<name>.md # codex
.gemini/commands/<name>.md # gemini
.cursor/rules/<name>.mdc # cursor
Where <name> is each of vllm-setup, sglang-setup, mig-configure, dgx-diagnose.
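To confirm the installer produced this layout, you could check for each expected file from the project root. This is a sketch that assumes the paths listed above are exact; `skill_file` is a hypothetical helper, not part of the installer.

```shell
#!/bin/sh
# Print the path where one skill is expected for a given harness,
# following the layout listed above.
skill_file() {
  case "$1" in
    claude) echo ".claude/skills/$2/SKILL.md" ;;
    codex)  echo ".codex/prompts/$2.md" ;;
    gemini) echo ".gemini/commands/$2.md" ;;
    cursor) echo ".cursor/rules/$2.mdc" ;;
    *) echo "unknown harness: $1" >&2; return 1 ;;
  esac
}

harness=claude   # change to codex, gemini, or cursor
for name in vllm-setup sglang-setup mig-configure dgx-diagnose; do
  path=$(skill_file "$harness" "$name")
  if [ -f "$path" ]; then
    echo "ok       $path"
  else
    echo "MISSING  $path"
  fi
done
```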
NOTE
Every supported agent automatically reads the context file from the working directory at startup. Skills/prompts/rules in the harness-specific directory are discovered automatically — no additional configuration needed.
Verify the setup
Start your agent in the project directory and ask a question that it can only answer correctly if it knows the DGX Station constraints:
Can I use --gpus all to run my CUDA workload on DGX Station?
The agent should immediately warn about the mixed-coherency constraint and recommend targeting a specific GPU with --gpus '"device=N"'. If you don't get the warning, the context file isn't being loaded; see Troubleshooting.
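For reference, the nested quoting in that flag can be built and inspected without running a container. The sketch below prints the command as a dry run. `gpu_arg` is a hypothetical helper and `your-cuda-image` is a placeholder, not a validated container; `nvidia-smi -L` lists the GPU indices you can substitute for N.

```shell
#!/bin/sh
# Build the value for docker's --gpus flag for a given device index.
# The inner double quotes are part of the value Docker receives.
gpu_arg() {
  printf '"device=%s"' "$1"
}

# Dry run: print the command instead of executing it.
IMAGE=your-cuda-image   # placeholder image name (assumption)
echo docker run --rm --gpus "'$(gpu_arg 0)'" "$IMAGE" nvidia-smi
# prints: docker run --rm --gpus '"device=0"' your-cuda-image nvidia-smi
```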
Then verify the skills are discoverable:
| Agent | How to check |
|---|---|
| Claude Code | Type / — vllm-setup, sglang-setup, mig-configure, dgx-diagnose should appear in the autocomplete |
| Codex CLI | Type /prompts: — same four names appear |
| Gemini CLI | Type / — same four names appear |
| Cursor | Open the Rules panel — same four rules appear |
Use vllm-setup to deploy an inference server
Invoke the skill in your agent:
| Agent | Invocation |
|---|---|
| Claude Code | /vllm-setup (slash command) or just describe the task ("deploy vllm with Qwen3-8B") |
| Codex CLI | /prompts:vllm-setup |
| Gemini CLI | /vllm-setup |
| Cursor | In chat: "use the vllm-setup rule to deploy a vllm server" |
The agent will walk you through deploying a vLLM server with a validated container image, correct GPU targeting, and recommended parameters. It will check your GPU index, ask which model you want to serve, and generate the full docker run command.
Use sglang-setup to deploy SGLang
The invocation pattern is the same as for vllm-setup. The skill deploys SGLang using the cu130 container, with RadixAttention prefix caching and structured JSON output support.
Use mig-configure to partition the GB300
The agent will query your current MIG state, show available profiles, help you choose a layout for your workloads, and execute the partitioning commands.
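If you want to see the kind of commands involved before letting the agent run them, the sketch below prints a plan without executing anything. It rests on several assumptions: the GPU index (0 here), the example profile names borrowed from the Customize section, and `mig_plan` itself, which is a hypothetical helper. The skill may use different steps. Enabling MIG mode and creating instances normally requires root.

```shell
#!/bin/sh
# Print (do not run) nvidia-smi commands for a MIG layout.
# $1 = GPU index, $2 = comma-separated list of profile names or IDs.
mig_plan() {
  gpu=$1
  profiles=$2
  echo "nvidia-smi -i $gpu -mig 1"                 # enable MIG mode
  echo "nvidia-smi mig -i $gpu -lgip"              # list available GPU instance profiles
  echo "nvidia-smi mig -i $gpu -cgi $profiles -C"  # create GPU and compute instances
  echo "nvidia-smi -L"                             # list the resulting devices
}

mig_plan 0 3g.139gb,2g.70gb,2g.70gb
```

Check the output of the profile listing on your own system before creating instances, since the available profiles depend on the GPU and driver.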
Use dgx-diagnose to troubleshoot issues
If you encounter problems, invoke dgx-diagnose. The agent will check GPU status, driver version, running processes, MIG state, and Fabric Manager to identify the issue.
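The same checks can be approximated by hand when no agent is available. This is a rough sketch, not the skill's actual procedure: `run_check` is a hypothetical wrapper, and the `nvidia-fabricmanager` service name is an assumption that may not apply to every installation.

```shell
#!/bin/sh
# Run one labelled check; report SKIP when the tool is not installed.
# $1 = label, remaining arguments = command to run.
run_check() {
  label=$1
  shift
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "[SKIP] $label ($1 not found)"
    return 0
  fi
  echo "[RUN ] $label"
  "$@" || echo "[FAIL] $label (exit $?)"
}

run_check "GPU status"        nvidia-smi
run_check "Driver version"    nvidia-smi --query-gpu=index,name,driver_version --format=csv
run_check "Running processes" nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
run_check "MIG state"         nvidia-smi --query-gpu=index,mig.mode.current --format=csv
run_check "Fabric Manager"    systemctl is-active nvidia-fabricmanager
```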
Customize
Both AGENTS.md and the skills are plain Markdown, so you can extend them freely.
Add project-specific constraints to AGENTS.md (or your harness-specific context file):
## Project-specific
- Our production MIG layout is 3g.139gb + 2g.70gb + 2g.70gb
- Always use port 8080 for inference (nginx proxy on 443)
- Model weights are cached at /data/models, mount with -v /data/models:/root/.cache/huggingface/hub
Create new skills by adding a directory containing a SKILL.md file under assets/skills/, then re-run install.sh:
mkdir -p assets/skills/run-benchmarks
cat > assets/skills/run-benchmarks/SKILL.md << 'EOF'
---
name: run-benchmarks
description: Run our standard inference benchmark suite against the running vLLM or SGLang server and compare against the baseline.
---
# Run benchmarks
1. Check which inference server is running (vLLM on port 8000 or SGLang on port 30000)
2. Run the appropriate benchmark script from ./benchmarks/
3. Report throughput (tokens/sec), latency (TTFT, ITL), and memory utilization
4. Compare against the baseline in ./benchmarks/baseline.json
EOF
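Before re-running install.sh, a quick sanity check on the frontmatter can catch typos. The sketch below assumes the convention used in this playbook, where the `name:` field matches the skill's directory name. `skill_name` is a hypothetical helper that reads only simple one-line `name:` values.

```shell
#!/bin/sh
# Print the name: value from the frontmatter of a SKILL.md file.
# Only looks between the first pair of --- lines.
skill_name() {
  awk '
    /^---[ \t]*$/ { fence++; next }
    fence == 1 && /^name:/ { sub(/^name:[ \t]*/, ""); print; exit }
    fence >= 2 { exit }
  ' "$1"
}

# Compare each skill directory name with its frontmatter name.
for dir in assets/skills/*/; do
  [ -f "$dir/SKILL.md" ] || continue
  want=$(basename "$dir")
  got=$(skill_name "$dir/SKILL.md")
  [ "$want" = "$got" ] || echo "name mismatch in $dir: frontmatter says '$got'"
done
```

No output means every skill directory under assets/skills/ has a matching name.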
TIP
Keep AGENTS.md focused on constraints and pitfalls (things that break). Put procedural workflows in skills (things you do step-by-step).