NVIDIA
Explore
Models
Blueprints
GPUs
Docs
⌘KCtrl+K
View All Playbooks
View All Playbooks

onboarding

  • Set Up Local Network Access
  • Open WebUI with Ollama

data science

  • Single-cell RNA Sequencing
  • Portfolio Optimization
  • CUDA-X Data Science
  • Text to Knowledge Graph
  • Optimized JAX

tools

  • DGX Dashboard
  • Comfy UI
  • Connect Three DGX Spark in a Ring Topology
  • Connect Multiple DGX Spark through a Switch
  • RAG Application in AI Workbench
  • Set up Tailscale on Your Spark
  • VS Code

fine tuning

  • FLUX.1 Dreambooth LoRA Fine-tuning
  • LLaMA Factory
  • Fine-tune with NeMo
  • Fine-tune with Pytorch
  • Unsloth on DGX Spark

use case

  • NemoClaw with Nemotron 3 Super and Telegram on DGX Spark
  • Secure Long Running AI Agents with OpenShell on DGX Spark
  • OpenClaw 🦞
  • Live VLM WebUI
  • Install and Use Isaac Sim and Isaac Lab
  • Vibe Coding in VS Code
  • Build and Deploy a Multi-Agent Chatbot
  • Connect Two Sparks
  • NCCL for Two Sparks
  • Build a Video Search and Summarization (VSS) Agent
  • Spark & Reachy Photo Booth

inference

  • Speculative Decoding
  • Run models with llama.cpp on DGX Spark
  • vLLM for Inference
  • Nemotron-3-Nano with llama.cpp
  • SGLang for Inference
  • TRT LLM for Inference
  • NVFP4 Quantization
  • Multi-modal Inference
  • NIM on Spark
  • LM Studio on DGX Spark

Nemotron-3-Nano with llama.cpp

30 MIN

Run Nemotron-3-Nano-30B model using llama.cpp on DGX Spark

InferenceLLMNemotronllama.cpp
View on GitHub
OverviewOverviewInstructionsInstructionsTroubleshootingTroubleshooting
SymptomCauseFix
cmake fails with "CUDA not found"CUDA toolkit not in PATHRun export PATH=/usr/local/cuda/bin:$PATH and retry
Model download fails or is interruptedNetwork issuesRe-run the hf download command - it will resume from where it stopped
"CUDA out of memory" when starting serverInsufficient GPU memoryReduce --ctx-size to 4096 or use a smaller quantization (Q4_K_M)
Server starts but inference is slowModel not fully loaded to GPUVerify --n-gpu-layers 99 is set and check nvidia-smi shows GPU usage
"Connection refused" on port 30000Server not running or wrong portVerify server is running and check the --port parameter
"model not found" in API responseWrong model pathVerify the model path in --model parameter matches the downloaded file location

NOTE

DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'

For latest known issues, please review the DGX Spark User Guide.

Resources

  • llama.cpp GitHub Repository
  • Nemotron-3-Nano GGUF on Hugging Face
  • DGX Spark Documentation
  • DGX Spark Forum
  • DGX Spark User Performance Guide
Terms of Use
Privacy Policy
Your Privacy Choices
Contact

Copyright © 2026 NVIDIA Corporation