Basic idea
Use Ollama on DGX Station (NVIDIA GB300) to run local coding models and connect a CLI coding agent. This playbook uses Claude Code to talk to Ollama for local inference, so you can work without external cloud APIs.
The DGX Station GPU (reported as NVIDIA GB300 in nvidia-smi) provides ample memory to run glm-4.7-flash (fast loading and testing) and larger models such as unsloth/GLM-4.7-GGUF.
CLI agent
This playbook uses Claude Code as the CLI agent, connected to a local Ollama model for inference.
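Before pointing an agent at Ollama, it can help to confirm that the local server is reachable and that the model has been pulled. The following is a minimal sketch, not part of the playbook itself. It assumes Ollama is listening on its default address `http://localhost:11434` and uses the `/api/tags` endpoint, which lists locally available models. The helper names (`model_names`, `has_model`, `fetch_tags`) are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; adjust if your setup differs.
OLLAMA_URL = "http://localhost:11434"


def fetch_tags(base_url=OLLAMA_URL, timeout=2.0):
    """Return the parsed JSON from Ollama's /api/tags endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def model_names(payload):
    """Extract model names (e.g. 'glm-4.7-flash:latest') from a tags payload."""
    return [m["name"] for m in payload.get("models", []) if "name" in m]


def has_model(names, wanted):
    """Exact match if `wanted` carries a tag, otherwise match on the base name."""
    if ":" in wanted:
        return wanted in names
    return any(n.split(":", 1)[0] == wanted for n in names)


if __name__ == "__main__":
    try:
        names = model_names(fetch_tags())
        print("glm-4.7-flash available:", has_model(names, "glm-4.7-flash"))
    except OSError as exc:  # URLError is a subclass of OSError
        print("Ollama server not reachable:", exc)
```

If the check reports the server as unreachable, the Ollama service is probably not running; if it reports the model as missing, the model has not been pulled yet.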
What you'll accomplish
You will run a local coding model on your DGX Station (NVIDIA GB300) with Ollama, connect Claude Code to it, and complete a small coding task end-to-end. Use glm-4.7-flash (including high-quality variants) or unsloth/GLM-4.7-GGUF as the model.
What to know before starting
- Comfort with Linux command line basics
- Experience running terminal-based tools and editors
- Familiarity with Python for the short coding task
Prerequisites
- DGX Station with NVIDIA GB300 (Grace Blackwell) and NVIDIA driver; nvidia-smi typically shows "NVIDIA GB300"
- Internet access to download model weights
- Ollama 0.15.0 or newer (required for GLM-4.7-Flash; do not pin to 0.14.3)
- GPU memory on GB300 supports both recommended models:
  - glm-4.7-flash: ~19 GB (latest) to ~60 GB (bf16); recommended for fast loading and testing
  - unsloth/GLM-4.7-GGUF (Hugging Face on Ollama): larger model; recommended for best quality
  - Other variants (e.g. glm-4.7-flash:bf16, glm-4.7-flash:q8_0) fit on GB300
- Disk space for model downloads: plan for ~19 GB for glm-4.7-flash:latest, plus additional space for the Q8_0 or bf16 variants if you use them
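The version and disk-space prerequisites above can be checked with a short script. This is a sketch under stated assumptions: it assumes `ollama --version` prints a string containing a dotted version number (for example `ollama version is 0.15.0`) and that models are stored under your home directory. The helper names and the 19 GB threshold constant are illustrative, taken from the figures in the list above.

```python
import re
import shutil
import subprocess
from pathlib import Path

MIN_OLLAMA = (0, 15, 0)  # minimum version stated in the prerequisites
MODEL_DISK_GB = 19       # approximate size of glm-4.7-flash:latest


def parse_version(text):
    """Return the first X.Y.Z version found in `text` as a tuple, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def version_ok(text, minimum=MIN_OLLAMA):
    """True if `text` contains a version at or above `minimum`."""
    version = parse_version(text)
    return version is not None and version >= minimum


def free_gb(path):
    """Free space on the filesystem holding `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["ollama", "--version"], capture_output=True, text=True
        ).stdout
    except FileNotFoundError:
        out = ""  # ollama is not installed or not on PATH
    print("Ollama version OK:", version_ok(out))
    print("Disk space OK:", free_gb(Path.home()) >= MODEL_DISK_GB)
```

Comparing versions as integer tuples avoids the string-comparison pitfall where `"0.9.0"` would sort after `"0.15.0"`.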
Time & risk
- Duration: ~20–30 minutes (includes model download)
- Risk level: Low
  - Large model downloads can fail if network connectivity is unstable
  - Older Ollama versions will not load newer models
- Rollback: Stop Ollama and delete the downloaded model from ~/.ollama/models
- Last Updated: 03/06/2026
  - Model set to glm-4.7-flash; Ollama 0.15.0+; cleanup order and docs refresh
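For the rollback step, it can be useful to see how much space the model directory occupies before deleting anything. The sketch below is illustrative only: it assumes the models live under `~/.ollama/models` as stated above and that Ollama has already been stopped. The function names are hypothetical, and deletion is a dry run unless explicitly enabled.

```python
import shutil
from pathlib import Path


def dir_size_bytes(root):
    """Total size of all regular files under `root` (0 if it does not exist)."""
    root = Path(root)
    if not root.exists():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def remove_models(root, dry_run=True):
    """Report the size of `root`; delete it only when dry_run is False."""
    root = Path(root)
    size = dir_size_bytes(root)
    if not dry_run and root.exists():
        shutil.rmtree(root)
    return size


if __name__ == "__main__":
    models_dir = Path.home() / ".ollama" / "models"
    reclaimed = remove_models(models_dir, dry_run=True)  # set False to delete
    print(f"{models_dir}: {reclaimed / 1e9:.1f} GB would be reclaimed")
```

Keeping `dry_run=True` as the default means running the script by accident only reports the size and removes nothing.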