Nemotron-3-Nano with llama.cpp

30 MIN

Run Nemotron-3-Nano-30B model using llama.cpp on DGX Spark

Basic idea

Nemotron-3-Nano-30B-A3B is NVIDIA's powerful language model featuring a 30 billion parameter Mixture of Experts (MoE) architecture with only 3 billion active parameters. This efficient design enables high-quality inference with lower computational requirements, making it ideal for DGX Spark's GB10 GPU.

This playbook demonstrates how to run Nemotron-3-Nano using llama.cpp, which compiles CUDA kernels at build time specifically for your GPU architecture. The model includes built-in reasoning (thinking mode) and tool calling support via the chat template.

What you'll accomplish

You will have a fully functional Nemotron-3-Nano-30B-A3B inference server running on your DGX Spark, accessible via an OpenAI-compatible API. This setup enables:

  • Local LLM inference
  • OpenAI-compatible API endpoint for easy integration with existing tools
  • Built-in reasoning and tool calling capabilities

What to know before starting

  • Basic familiarity with Linux command line and terminal commands
  • Understanding of git and working with branches
  • Experience building software from source with CMake
  • Basic knowledge of REST APIs and cURL for testing
  • Familiarity with Hugging Face Hub for model downloads

Prerequisites

Hardware Requirements:

  • NVIDIA DGX Spark with GB10 GPU
  • At least 40GB available GPU memory (model uses ~38GB VRAM)
  • At least 50GB available storage space for model downloads and build artifacts

Software Requirements:

  • NVIDIA DGX OS
  • Git: git --version
  • CMake (3.14+): cmake --version
  • CUDA Toolkit: nvcc --version
  • Network access to GitHub and Hugging Face

Time & risk

  • Estimated time: 30 minutes (including model download of ~38GB)
  • Risk level: Low
    • Build process compiles from source but doesn't modify system files
    • Model downloads can be resumed if interrupted
  • Rollback: Delete the cloned llama.cpp directory and downloaded model files to fully remove the installation
  • Last Updated: 12/17/2025
    • First Publication