Run local CLI coding agents with Ollama on DGX Station (NVIDIA GB300) using glm-4.7-flash (fast) or unsloth/GLM-4.7-GGUF:Q8_0 (best quality)
Use Ollama on DGX Station (NVIDIA GB300) to run local coding models and connect a CLI coding agent. This playbook uses Claude Code to talk to Ollama for local inference, so you can work without external cloud APIs.
The DGX Station GPU (reported as NVIDIA GB300 in nvidia-smi) provides ample memory to run glm-4.7-flash for fast loading and testing, as well as larger models such as unsloth/GLM-4.7-GGUF:Q8_0 for best quality.
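As a sketch of the model setup, the commands below pull both models with Ollama. The tags (glm-4.7-flash and the hf.co/unsloth/GLM-4.7-GGUF:Q8_0 reference) are the ones named in this playbook; verify them against your registry, since exact tag names can differ.

```bash
# Fast model for quick loading and iteration
ollama pull glm-4.7-flash

# Higher-quality Q8_0 build pulled directly from Hugging Face
# (Ollama accepts hf.co/<org>/<repo>:<quant> references for GGUF repositories)
ollama pull hf.co/unsloth/GLM-4.7-GGUF:Q8_0

# Confirm both models are now in the local store
ollama list
```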
You will run a local coding model on your DGX Station (NVIDIA GB300) with Ollama, connect Claude Code to it, and complete a small coding task end-to-end, as sketched below. Use glm-4.7-flash (including its higher-quality q8_0 and bf16 variants) or unsloth/GLM-4.7-GGUF:Q8_0.
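A minimal sketch of wiring Claude Code to the local server follows. It assumes Ollama is listening on its default port (11434) and that an Anthropic-compatible endpoint is available there, either natively in a recent Ollama release or through a translation proxy; the ANTHROPIC_* variables are standard Claude Code settings, but the endpoint details may differ in your environment.

```bash
# Start the Ollama server (listens on http://localhost:11434 by default)
ollama serve &

# Point Claude Code at the local endpoint instead of the Anthropic cloud API
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"   # placeholder; a local server does not validate it
export ANTHROPIC_MODEL="glm-4.7-flash"

# Launch Claude Code with a small end-to-end coding task
claude "Write a Python script that prints the first 20 Fibonacci numbers, then run it."
```

If the task completes using local GPU cycles only (watch nvidia-smi while it runs), the loop is working end-to-end.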
Prerequisites:
- GPU: nvidia-smi typically shows "NVIDIA GB300".
- Model size: glm-4.7-flash spans the default latest tag up to ~60 GB (bf16) — recommended for fast loading and testing.
- VRAM: the larger variants (glm-4.7-flash:bf16, glm-4.7-flash:q8_0) fit on the GB300.
- Disk space: enough for glm-4.7-flash:latest, plus additional space for the Q8_0 or bf16 variants if you use them.
- Model store location: ~/.ollama/models.
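To sanity-check the prerequisites above, a quick verification pass might look like the following; the model tag and paths are the ones used throughout this playbook.

```bash
# GPU should report as NVIDIA GB300 with ample free memory
nvidia-smi

# Models already pulled into the local store
ollama list

# Disk usage of the default Ollama model store
du -sh ~/.ollama/models

# One-shot generation against the local API to confirm the model loads on the GPU
curl http://localhost:11434/api/generate -d '{
  "model": "glm-4.7-flash",
  "prompt": "Write a one-line Python hello world.",
  "stream": false
}'
```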