---
title: "Nemotron-3-Nano with llama.cpp"
publisher: "nvidia"
type: "playbook"
updated: "2025-12-18T00:33:58.643Z"
description: "Run Nemotron-3-Nano-30B model using llama.cpp on DGX Spark"
canonical: "https://build.nvidia.com/spark/nemotron.md"
---

# Basic idea

Nemotron-3-Nano-30B-A3B is NVIDIA's powerful language model featuring a 30 billion parameter Mixture of Experts (MoE) architecture with only 3 billion active parameters. This efficient design enables high-quality inference with lower computational requirements, making it ideal for DGX Spark's GB10 GPU.

This playbook demonstrates how to run Nemotron-3-Nano using llama.cpp, which compiles CUDA kernels at build time specifically for your GPU architecture. The model includes built-in reasoning (thinking mode) and tool calling support via the chat template.

# What you'll accomplish

You will have a fully functional Nemotron-3-Nano-30B-A3B inference server running on your DGX Spark, accessible via an OpenAI-compatible API. This setup enables:

- Local LLM inference
- OpenAI-compatible API endpoint for easy integration with existing tools
- Built-in reasoning and tool calling capabilities

# What to know before starting

- Basic familiarity with Linux command line and terminal commands
- Understanding of git and working with branches
- Experience building software from source with CMake
- Basic knowledge of REST APIs and cURL for testing
- Familiarity with Hugging Face Hub for model downloads

# Prerequisites

**Hardware Requirements:**
- NVIDIA DGX Spark with GB10 GPU
- At least 40GB available GPU memory (model uses ~38GB VRAM)
- At least 50GB available storage space for model downloads and build artifacts

**Software Requirements:**
- NVIDIA DGX OS
- Git: `git --version`
- CMake (3.14+): `cmake --version`
- CUDA Toolkit: `nvcc --version`
- Network access to GitHub and Hugging Face

# Time & risk

* **Estimated time:** 30 minutes (including model download of ~38GB)
* **Risk level:** Low
* Build process compiles from source but doesn't modify system files
* Model downloads can be resumed if interrupted
* **Rollback:** Delete the cloned `llama.cpp` directory and downloaded model files to fully remove the installation
* **Last Updated:** 12/17/2025
* First Publication

## More

- [Instructions](/spark/nemotron/instructions.md)
- [Troubleshooting](/spark/nemotron/troubleshooting.md)