Fine-tune with PyTorch

1 HR

Use PyTorch to fine-tune models locally

DGX Spark
View on GitHub

Step 1
Configure Docker permissions

To manage containers without sudo, your user must be in the docker group. If you skip this step, you will need to run every Docker command with sudo.

Open a new terminal and test Docker access by running:

docker ps

If you see a permission denied error (something like "permission denied while trying to connect to the Docker daemon socket"), add your user to the docker group so that you don't need to run Docker commands with sudo:

sudo usermod -aG docker $USER
newgrp docker

Step 2
Pull the latest PyTorch container

docker pull nvcr.io/nvidia/pytorch:25.11-py3

Step 3
Launch the PyTorch container

docker run --gpus all -it --rm --ipc=host \
-v $HOME/.cache/huggingface:/root/.cache/huggingface \
-v ${PWD}:/workspace -w /workspace \
nvcr.io/nvidia/pytorch:25.11-py3
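
Once the container starts, you can optionally confirm that PyTorch sees the GPU. A minimal check from a Python shell inside the container (the reported device name will vary):

import torch

print(torch.cuda.is_available())      # expect True inside the container
print(torch.cuda.get_device_name(0))  # name of the detected GPU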

Step 4
Install dependencies inside the container

pip install transformers peft datasets trl bitsandbytes
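
If the install succeeds, the libraries should import cleanly. An optional sanity check from a Python shell inside the container (version numbers will vary):

import transformers, peft, datasets, trl, bitsandbytes

print(transformers.__version__, peft.__version__, trl.__version__)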

Step 5
Authenticate with Hugging Face

hf auth login
# <Enter your Hugging Face token>
# <Enter n for git credential>
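
To confirm the token was stored, you can ask the Hub which account is logged in; note that the Llama weights used by these recipes are gated, so this account needs to have accepted the corresponding model licenses on Hugging Face. A small optional check:

from huggingface_hub import whoami

print(whoami()["name"])  # raises an error if no valid token is stored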

Step 6
Clone the Git repo with the fine-tuning recipes

git clone https://github.com/NVIDIA/dgx-spark-playbooks
cd dgx-spark-playbooks/nvidia/pytorch-fine-tune/assets

Step 7
Run the fine-tuning recipes

Available Fine-Tuning Scripts

The following fine-tuning scripts are provided, each optimized for different model sizes and training approaches:

Script | Model | Fine-Tuning Type | Description
Llama3_3B_full_finetuning.py | Llama 3.2 3B | Full SFT | Full supervised fine-tuning (all parameters trainable)
Llama3_8B_LoRA_finetuning.py | Llama 3.1 8B | LoRA | Low-Rank Adaptation (parameter-efficient)
Llama3_70B_LoRA_finetuning.py | Llama 3.1 70B | LoRA | Low-Rank Adaptation with FSDP support
Llama3_70B_qLoRA_finetuning.py | Llama 3.1 70B | QLoRA | Quantized LoRA (4-bit quantization for memory efficiency)
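
For orientation: LoRA keeps the base model frozen and trains small low-rank adapter matrices, which is why it is parameter-efficient. The sketch below shows the general pattern using the peft library installed in Step 4; the model id, target modules, and rank are illustrative assumptions, not values taken from the scripts.

# Illustrative LoRA sketch, not code from the playbook scripts.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",            # assumed model id; access is gated on Hugging Face
    torch_dtype="bfloat16",
)

lora_config = LoraConfig(
    r=8,                                   # corresponds to --lora_rank
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only the adapter weights are trainable

The QLoRA script follows the same idea but loads the frozen base model in 4-bit with bitsandbytes before attaching the adapter, which is what lets the 70B model fit in memory.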

Basic Usage

Run any script with default settings:

# Full fine-tuning on Llama 3.2 3B
python Llama3_3B_full_finetuning.py

# LoRA fine-tuning on Llama 3.1 8B
python Llama3_8B_LoRA_finetuning.py

# qLoRA fine-tuning on Llama 3.1 70B
python Llama3_70B_qLoRA_finetuning.py

Common Command-Line Arguments

All scripts support the following command-line arguments for customization:

Model Configuration

  • --model_name: Model name or path (default: varies by script)
  • --dtype: Model precision, one of float32, float16, or bfloat16 (default: bfloat16)

Training Configuration

  • --batch_size: Per-device training batch size (default: varies by script)
  • --seq_length: Maximum sequence length (default: 2048)
  • --num_epochs: Number of training epochs (default: 1)
  • --gradient_accumulation_steps: Gradient accumulation steps (default: 1)
  • --learning_rate: Learning rate (default: varies by script)
  • --gradient_checkpointing: Enable gradient checkpointing to save memory (flag)

LoRA Configuration (LoRA and QLoRA scripts only)

  • --lora_rank: LoRA rank; higher values mean more trainable parameters (default: 8)

Dataset Configuration

  • --dataset_size: Number of samples to use from the Alpaca dataset (default: 512)

Logging Configuration

  • --logging_steps: Log metrics every N steps (default: 1)
  • --log_dir: Directory for TensorBoard logs (default: logs)

Model Saving

  • --output_dir: Directory to save the fine-tuned model (default: None, meaning the model is not saved)

Usage Examples

For example, run a quick LoRA fine-tune on a small subset of the dataset:

python Llama3_8B_LoRA_finetuning.py \
  --dataset_size 100 \
  --num_epochs 1 \
  --batch_size 2
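
If you pass --output_dir, the fine-tuned weights are written to that directory and can be reloaded for inference. A minimal reload sketch, assuming the LoRA scripts save a PEFT adapter (the base model id and the output path are placeholders):

# Illustrative sketch for reloading a saved LoRA adapter; paths and model id are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B", torch_dtype="bfloat16")
model = PeftModel.from_pretrained(base, "finetuned-model").to("cuda")   # path given to --output_dir
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

inputs = tokenizer("Explain LoRA in one sentence.", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))

For the full fine-tuning script, the saved directory is a regular model checkpoint and can be loaded directly with AutoModelForCausalLM.from_pretrained.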

Resources

  • DGX Spark Documentation
  • DGX Spark Forum
  • DGX Spark User Performance Guide