Local Coding Agent

30 MINS

Run local CLI coding agents with Ollama on a DGX Station (NVIDIA GB300), using `glm-4.7-flash` (fast) or `unsloth/GLM-4.7-GGUF:Q8_0` (best quality).
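The whole flow in one pass, using the model names above (a sketch; assumes network access, Ollama 0.15.0+ already installed and serving, and the Claude Code CLI on your PATH):

```shell
# Pull one of the two models (pick one):
ollama pull glm-4.7-flash                      # fast testing
ollama pull hf.co/unsloth/GLM-4.7-GGUF:Q8_0    # best quality

# Verify the local Ollama server is answering on the default port:
curl -fsS http://localhost:11434/api/version

# Point the coding agent at the pulled model:
claude --model glm-4.7-flash
```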

| Symptom | Cause | Fix |
| --- | --- | --- |
| `ollama: command not found` | Ollama not installed or PATH not updated | Rerun the install script: `curl -fsSL https://ollama.com/install.sh \| sh` |
| Model load fails with a version error | Ollama is older than 0.15.0 | Update Ollama to 0.15.0 or newer (required for GLM-4.7-Flash). Do not pin to 0.14.3. |
| `model not found` in Claude Code | Model was not pulled | Run `ollama pull glm-4.7-flash` or `ollama pull hf.co/unsloth/GLM-4.7-GGUF:Q8_0` and retry. Use the same model name in `claude --model ...`. |
| `connection refused` to `localhost:11434` | Ollama service not running | Start it with `ollama serve` or `sudo systemctl start ollama`. |
| Slow responses or OOM | Insufficient GPU memory or fragmentation | On DGX Station (NVIDIA GB300), ensure no other heavy GPU workloads are running. If OOM persists, use a smaller variant (e.g. `glm-4.7-flash:q8_0` or `glm-4.7-flash:q4_K_M`) or set `OLLAMA_MAX_LOADED_MODELS=1`. |
| `claude: command not found` after install | CLI not on PATH or install script did not complete | Restart the terminal or run `source ~/.bashrc` (or your shell profile). Check the install script output for the install path and add it to `PATH`. |
| Claude Code install fails (Node.js / network) | Node.js missing or install script cannot download | Ensure Node.js is installed (`node --version`). If the install script fails with a network error, retry from a stable connection or download the Claude Code CLI from the official site. See the Claude Code documentation for alternatives. |
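For the version-error row, a small portable check against the 0.15.0 minimum can save a round of trial and error. This is a hypothetical helper (the function name is ours); it relies only on `sort -V` for version ordering:

```shell
# version_ok VERSION MINIMUM -> exit 0 (true) if VERSION >= MINIMUM.
# sort -V orders version strings; if MINIMUM sorts first, VERSION is new enough.
version_ok() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Examples against the 0.15.0 minimum from the table above:
version_ok "0.15.2" "0.15.0" && echo "0.15.2: ok"
version_ok "0.14.3" "0.15.0" || echo "0.14.3: too old, update Ollama"
```

To check your actual install, feed it the version number extracted from `ollama --version` as the first argument.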

NOTE

DGX Station with NVIDIA GB300 provides ample GPU memory for `glm-4.7-flash` (fast testing) and `unsloth/GLM-4.7-GGUF` (best quality), plus variants (e.g. `glm-4.7-flash:bf16`). Use `OLLAMA_MAX_LOADED_MODELS=1` if you hit memory limits with multiple models.
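To make that memory limit stick across restarts when Ollama runs under systemd, set it as a unit environment override (a sketch assuming the stock Linux install, where the service unit is named `ollama`):

```shell
# Open a drop-in override for the ollama unit:
sudo systemctl edit ollama
# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_MAX_LOADED_MODELS=1"

# Apply the change:
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

If you run `ollama serve` by hand instead, exporting `OLLAMA_MAX_LOADED_MODELS=1` in the same shell before starting the server has the same effect.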