Run local CLI coding agents with Ollama on DGX Station (NVIDIA GB300) using glm-4.7-flash (fast) or unsloth/GLM-4.7-GGUF:Q8_0 (best quality)
Use Ollama on DGX Station (NVIDIA GB300) to run local coding models and connect a CLI coding agent. This playbook uses Claude Code to talk to Ollama for local inference, so you can work without external cloud APIs.
The DGX Station GPU (reported as NVIDIA GB300 in nvidia-smi) provides ample memory to run glm-4.7-flash for fast loading and testing, as well as larger models such as unsloth/GLM-4.7-GGUF:Q8_0 for best quality.
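As a sketch of the model setup, the commands below pull both models with Ollama. The tags (glm-4.7-flash and the hf.co/unsloth/GLM-4.7-GGUF:Q8_0 reference) are the ones named in this playbook; verify them against your registry, since exact tag names can differ.

```bash
# Fast model for quick loading and iteration
ollama pull glm-4.7-flash

# Higher-quality Q8_0 build pulled directly from Hugging Face
# (Ollama accepts hf.co/<org>/<repo>:<quant> references for GGUF repositories)
ollama pull hf.co/unsloth/GLM-4.7-GGUF:Q8_0

# Confirm both models are now in the local store
ollama list
```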
You will run a local coding model on your DGX Station (NVIDIA GB300) with Ollama, connect Claude Code to it, and complete a small coding task end-to-end, as sketched below. Use glm-4.7-flash (including its higher-quality q8_0 and bf16 variants) or unsloth/GLM-4.7-GGUF:Q8_0.
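A minimal sketch of wiring Claude Code to the local server follows. It assumes Ollama is listening on its default port (11434) and that an Anthropic-compatible endpoint is available there, either natively in a recent Ollama release or through a translation proxy; the ANTHROPIC_* variables are standard Claude Code settings, but the endpoint details may differ in your environment.

```bash
# Start the Ollama server (listens on http://localhost:11434 by default)
ollama serve &

# Point Claude Code at the local endpoint instead of the Anthropic cloud API
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"   # placeholder; a local server does not validate it
export ANTHROPIC_MODEL="glm-4.7-flash"

# Launch Claude Code with a small end-to-end coding task
claude "Write a Python script that prints the first 20 Fibonacci numbers, then run it."
```

If the task completes using local GPU cycles only (watch nvidia-smi while it runs), the loop is working end-to-end.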
Prerequisites:
- GPU: nvidia-smi typically shows "NVIDIA GB300".
- Model size: glm-4.7-flash spans the default latest tag up to ~60 GB (bf16) — recommended for fast loading and testing.
- VRAM: the larger variants (glm-4.7-flash:bf16, glm-4.7-flash:q8_0) fit on the GB300.
- Disk space: enough for glm-4.7-flash:latest, plus additional space for the Q8_0 or bf16 variants if you use them.
- Model store location: ~/.ollama/models.
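To sanity-check the prerequisites above, a quick verification pass might look like the following; the model tag and paths are the ones used throughout this playbook.

```bash
# GPU should report as NVIDIA GB300 with ample free memory
nvidia-smi

# Models already pulled into the local store
ollama list

# Disk usage of the default Ollama model store
du -sh ~/.ollama/models

# One-shot generation against the local API to confirm the model loads on the GPU
curl http://localhost:11434/api/generate -d '{
  "model": "glm-4.7-flash",
  "prompt": "Write a one-line Python hello world.",
  "stream": false
}'
```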