CLI Coding Agent

Basic idea

Use Ollama on DGX Spark to run a local coding model and connect a CLI coding agent. This playbook supports three options: Claude Code, OpenCode, and Codex CLI. Each agent is wired up with Ollama's built-in launch method (ollama launch <agent>), so you can work without environment variables, provider config files, or external cloud APIs.

Choose your CLI agent

Pick the option that matches the CLI agent you want to use:

Claude Code: Fastest path to a working CLI agent with a local Ollama model.
OpenCode: Open-source CLI launched directly from Ollama.
Codex CLI: OpenAI Codex CLI launched directly from Ollama against the local model.
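
As a rough sketch, the three choices map onto the launch command as shown here. The integration names claude, opencode, and codex are assumptions that follow the ollama launch <agent> pattern; confirm them with ollama launch --help on your install. The helper launch_cmd is hypothetical and only prints the command, so nothing starts until you run the printed line yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a friendly agent name to its launch command.
# The integration names below are assumptions, not confirmed by this playbook.
launch_cmd() {
  case "$1" in
    claude|claude-code) echo "ollama launch claude" ;;
    opencode)           echo "ollama launch opencode" ;;
    codex|codex-cli)    echo "ollama launch codex" ;;
    *) echo "unknown agent: $1" >&2; return 1 ;;
  esac
}

# Print the command for the agent you picked (defaults to Claude Code).
launch_cmd "${AGENT:-claude}"
```

Set AGENT=opencode or AGENT=codex before running the script to print the command for a different agent.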

What you'll accomplish

You will run a local coding model (Qwen3.6) on your DGX Spark with Ollama, launch your chosen CLI agent against it with a single command, and complete a small coding task end-to-end.
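
The model download is the longest step and the one most likely to fail on an unstable connection, so a small retry wrapper can help. The sketch below is one way to do it, not part of Ollama: retry and pull_model are hypothetical helpers, and the default tag qwen3.6:latest is taken from the prerequisites list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times, pausing between attempts.
retry() {
  local attempts delay i
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    echo "attempt $i/$attempts failed: $*" >&2
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical wrapper: pull a model tag with three attempts, 10 seconds apart.
pull_model() {
  retry 3 10 ollama pull "${1:-qwen3.6:latest}"
}
```

After sourcing the script, pull_model downloads the default tag, and pull_model qwen3.6:35b-a3b-nvfp4 downloads a specific variant. Once the pull succeeds, start your agent with ollama launch <agent>.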

What to know before starting

Comfort with Linux command line basics
Experience running terminal-based tools and editors
Familiarity with Python for the short coding task

Prerequisites

DGX Spark access with NVIDIA DGX OS 7.3.1 (Ubuntu 24.04.3 LTS base)
Internet access to download model weights
Ollama v0.15 or newer (required for ollama launch)
Enough GPU memory for the Qwen3.6 variant you choose (sizes are approximate):
- qwen3.6:latest (35B-a3b, MoE): ~24GB, 256K context
- qwen3.6:35b-a3b-nvfp4: ~22GB, NVIDIA FP4 build tuned for Blackwell (DGX Spark)
- qwen3.6:35b-a3b-q8_0: ~39GB, higher-quality 8-bit quant
- qwen3.6:35b-a3b-bf16: ~71GB, unquantized BF16 weights (fits Spark's unified memory)
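
The version and memory prerequisites can be checked before downloading anything. The following is a minimal preflight sketch under a few assumptions: version_ge and fitting_variants are hypothetical helpers, the sizes are the approximate figures from the list above, sort -V comes from GNU coreutils, and available memory is read from /proc/meminfo. The listed sizes cover the weights only, so leave headroom for context.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; sizes are the approximate figures listed above.

# True when version $1 is at least version $2 (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print every listed variant whose approximate size fits in $1 GB.
fitting_variants() {
  local avail_gb tag size
  avail_gb="$1"
  while read -r tag size; do
    if [ "$size" -le "$avail_gb" ]; then echo "$tag"; fi
  done <<'EOF'
qwen3.6:35b-a3b-nvfp4 22
qwen3.6:latest 24
qwen3.6:35b-a3b-q8_0 39
qwen3.6:35b-a3b-bf16 71
EOF
  return 0
}

# Check the installed Ollama version, if the binary is on PATH.
if command -v ollama >/dev/null 2>&1; then
  installed="$(ollama --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1)"
  if version_ge "${installed:-0}" 0.15; then
    echo "Ollama $installed supports ollama launch"
  else
    echo "Ollama ${installed:-unknown} is too old; upgrade to 0.15 or newer" >&2
  fi
fi

# Read available memory in GB (Linux only) and list the variants that fit.
if [ -r /proc/meminfo ]; then
  avail_gb="$(awk '/^MemAvailable:/ {print int($2 / 1048576)}' /proc/meminfo)"
  echo "Variants that fit in ${avail_gb:-0} GB:"
  fitting_variants "${avail_gb:-0}"
fi
```

The script lists every variant that fits rather than choosing one, since the best choice depends on how much context you plan to use.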

Time & risk

Duration: ~15-25 minutes (mostly model download time)
Risk level: Low
- Large model downloads can fail if network connectivity is unstable
- Ollama versions older than 0.15 do not support ollama launch
Rollback: Stop Ollama and delete the downloaded model from ~/.ollama/models
Last Updated: 04/16/2026
- Switched to ollama launch method and upgraded the default model to Qwen3.6
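
The rollback step under Time & risk can be sketched as a short script. It makes two assumptions: it removes the model with ollama rm instead of deleting files under ~/.ollama/models by hand, and it assumes Ollama runs as a systemd service named ollama. The run and rollback helpers are hypothetical, and by default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical rollback helper: prints the commands unless APPLY=1 is set.
MODEL="${MODEL:-qwen3.6:latest}"   # assumed tag; use the variant you pulled

# Execute the command when APPLY=1, otherwise just show it.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

rollback() {
  # Remove the model while the server is still up, since `ollama rm` talks to it.
  run ollama rm "$MODEL"
  # Assumes Ollama runs as a systemd service named "ollama"; adjust if it does not.
  run sudo systemctl stop ollama
}

rollback
```

Run the script once to review the printed commands, then rerun it with APPLY=1 to carry them out.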