---
title: "CLI Coding Agent"
publisher: "nvidia"
type: "playbook"
updated: "2026-04-27T17:26:24.446Z"
description: "Build local CLI coding agents with Ollama"
canonical: "https://build.nvidia.com/spark/cli-coding-agent.md"
---

# Basic idea

Use [Ollama](https://ollama.com) on [DGX Spark](https://www.nvidia.com/en-us/products/workstations/dgx-spark/) to run a local coding model and connect a CLI coding agent. This
playbook supports three options: **[Claude Code](https://docs.claude.com/en/docs/claude-code)**, **[OpenCode](https://opencode.ai)**, and **[Codex CLI](https://github.com/openai/codex)**. Each
agent is wired up with Ollama's built-in [launch method](https://ollama.com/blog/launch) (`ollama launch <agent>`), so you
can work without environment variables, provider config files, or external cloud APIs.

# Choose your CLI agent

Pick the CLI agent you want to use; each one has its own page linked under More:

- **Claude Code**: Fastest path to a working CLI agent with a local Ollama model.
- **OpenCode**: Open-source CLI launched directly from Ollama.
- **Codex CLI**: OpenAI Codex CLI launched directly from Ollama against the local model.

# What you'll accomplish

You will run a local coding model ([Qwen3.6](https://ollama.com/library/qwen3.6)) on your DGX Spark with Ollama, launch your
chosen CLI agent against it with a single command, and complete a small coding task end-to-end.
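
As a rough sketch of that flow, the helper below builds the two commands involved: pulling the model, then launching an agent. It only prints the commands and does not run them. The agent identifiers in `AGENT_IDS` are assumptions based on the `ollama launch <agent>` pattern, and `plan_commands` and `render` are hypothetical names used for illustration; check the per-agent pages for the exact command.

```python
import shlex

MODEL = "qwen3.6"  # default model tag used in this playbook

# Assumed identifiers for `ollama launch <agent>`; verify against your Ollama version.
AGENT_IDS = {
    "Claude Code": "claude",
    "OpenCode": "opencode",
    "Codex CLI": "codex",
}


def plan_commands(agent, model=MODEL):
    """Return the command lines (as argument lists) for pulling a model and launching an agent."""
    if agent not in AGENT_IDS:
        raise KeyError(f"unknown agent {agent!r}; expected one of {sorted(AGENT_IDS)}")
    return [
        ["ollama", "pull", model],
        ["ollama", "launch", AGENT_IDS[agent]],
    ]


def render(commands):
    """Join each argument list into a shell-safe line, one command per line."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    print(render(plan_commands("Claude Code")))
```

Printing the plan first makes it easy to review the commands before running them by hand in a terminal.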

# What to know before starting

- Comfort with Linux command line basics
- Experience running terminal-based tools and editors
- Familiarity with Python for the short coding task

# Prerequisites

- DGX Spark access with NVIDIA DGX OS 7.3.1 (Ubuntu 24.04.3 LTS base)
- Internet access to download model weights
- [Ollama](https://ollama.com/download) v0.15 or newer (required for [`ollama launch`](https://ollama.com/blog/launch))
- GPU memory depends on the Qwen3.6 variant you choose:
  - `qwen3.6:latest` (35B-a3b, MoE): ~24 GB, 256K context
  - `qwen3.6:35b-a3b-nvfp4`: ~22 GB, NVIDIA FP4 build tuned for Blackwell (DGX Spark)
  - `qwen3.6:35b-a3b-q8_0`: ~39 GB, higher-quality quant
  - `qwen3.6:35b-a3b-bf16`: ~71 GB, full precision (fits Spark's unified memory)
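
The two checks implied by this list, the minimum Ollama version and the memory needed per variant, can be sketched in a few lines of Python. This is a minimal illustration, not part of the playbook's tooling: the function names are hypothetical, the sizes are the approximate figures from the list above, and the parser assumes `ollama --version` prints a line containing a dotted version number such as `0.15.0`.

```python
import re

MIN_OLLAMA = (0, 15)  # first version with `ollama launch`, per the prerequisites

# Approximate sizes in GB, copied from the prerequisites list.
QWEN36_VARIANTS = {
    "qwen3.6:latest": 24,
    "qwen3.6:35b-a3b-nvfp4": 22,
    "qwen3.6:35b-a3b-q8_0": 39,
    "qwen3.6:35b-a3b-bf16": 71,
}


def parse_version(text):
    """Extract (major, minor) from the first dotted number found in text."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def supports_launch(version_text):
    """True if the reported version is at least MIN_OLLAMA."""
    return parse_version(version_text) >= MIN_OLLAMA


def variants_that_fit(budget_gb):
    """Return variant tags whose listed size is within budget_gb, smallest first."""
    fitting = [(size, tag) for tag, size in QWEN36_VARIANTS.items() if size <= budget_gb]
    return [tag for _size, tag in sorted(fitting)]


if __name__ == "__main__":
    print(supports_launch("ollama version is 0.15.0"))
    print(variants_that_fit(30))
```

Pass the text printed by `ollama --version` to `supports_launch`. When choosing a budget for `variants_that_fit`, leave some headroom above the listed sizes, since long contexts are likely to need memory beyond the weights themselves.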

# Time & risk

* **Duration**: ~15-25 minutes (mostly model download time)
* **Risk level**: Low
  * Large model downloads can fail if network connectivity is unstable
  * Ollama versions older than 0.15 do not support `ollama launch`
* **Rollback**: Stop Ollama and delete the downloaded model from `~/.ollama/models`
* **Last updated**: 04/16/2026
  * Switched to the `ollama launch` method and upgraded the default model to Qwen3.6

## More

- [Claude Code](/spark/cli-coding-agent/claude-code.md)
- [OpenCode](/spark/cli-coding-agent/opencode.md)
- [Codex CLI](/spark/cli-coding-agent/codex.md)
- [Troubleshooting](/spark/cli-coding-agent/troubleshooting.md)