LLaMA Factory

1 HR

Install and fine-tune models with LLaMA Factory

DGX Spark
View GitHub

Step 1
Verify system prerequisites

Confirm that the CUDA toolkit (nvcc), the NVIDIA driver (nvidia-smi), Python 3, and Git are installed and accessible on your NVIDIA DGX Spark system.

nvcc --version
nvidia-smi
python3 --version
git --version

Step 2
Create and activate a Python virtual environment

Create a virtual environment and activate it for the LLaMA Factory installation.

python3 -m venv factoryEnv
source ./factoryEnv/bin/activate

Step 3
Install PyTorch with CUDA 13 support

Install PyTorch, torchvision, and torchaudio with CUDA 13.0 support from the official PyTorch index.

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130

Step 4
Verify PyTorch CUDA support

Confirm that PyTorch can see the GPU.

python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"

Step 5
Clone LLaMA Factory repository

Download the LLaMA Factory source code from the official repository.

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory

Step 6
Install LLaMA Factory with dependencies

Install LLaMA Factory in editable mode with metrics support.

pip install -e ".[metrics]"

Step 7
Prepare training configuration

Examine the provided LoRA fine-tuning configuration for Qwen3.

cat examples/train_lora/qwen3_lora_sft.yaml
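The exact contents vary by LLaMA Factory release, but a LoRA SFT config generally combines model, method, dataset, output, and training sections along these lines (the values below are illustrative, not the shipped defaults):

```yaml
### model
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507   # illustrative; use the model named in the shipped config

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: qwen3
cutoff_len: 2048

### output
output_dir: saves/qwen3-4b/lora/sft

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```

Adjusting `lora_rank`, batch size, and `cutoff_len` is the usual way to trade memory use against training quality on a single machine.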

Step 8
Launch fine-tuning training

NOTE

Log in to the Hugging Face Hub before training so that gated models can be downloaded.

Execute the training process using the pre-configured LoRA setup.

hf auth login   # if the model is gated
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml

Example output:

***** train metrics *****
  epoch                    =        3.0
  total_flos               = 11076559GF
  train_loss               =     0.9993
  train_runtime            = 0:14:32.12
  train_samples_per_second =      3.749
  train_steps_per_second   =      0.471
Figure saved at: saves/qwen3-4b/lora/sft/training_loss.png
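If you want to track runs programmatically rather than reading the console, the metrics block above is plain `key = value` text and is easy to parse. A minimal helper (the function name and layout assumption are ours, not part of LLaMA Factory):

```python
def parse_train_metrics(text):
    """Parse the 'key = value' lines of a train-metrics block into a dict.

    Assumes the plain layout printed above: a '***** train metrics *****'
    banner followed by aligned 'key = value' lines. Values are kept as
    strings since some (like train_runtime) are not plain numbers.
    """
    metrics = {}
    for line in text.splitlines():
        if "=" in line and not line.strip().startswith("*"):
            key, _, value = line.partition("=")
            metrics[key.strip()] = value.strip()
    return metrics

sample = """***** train metrics *****
  epoch                    =        3.0
  train_loss               =     0.9993
"""
print(parse_train_metrics(sample))
# {'epoch': '3.0', 'train_loss': '0.9993'}
```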

Step 9
Validate training completion

Verify that training completed successfully and checkpoints were saved.

ls -la saves/qwen3-4b/lora/sft/

The listing should show:

  • A final checkpoint directory (checkpoint-411 or similar)
  • Adapter configuration files (adapter_config.json)
  • Training logs with decreasing loss values
  • The training loss plot (training_loss.png)
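The checks above can be scripted, for example as part of a CI job that runs the fine-tune nightly. This helper is a sketch of ours (the function name is hypothetical); it inspects a directory listing for the expected artifacts:

```python
import re

def validate_sft_output(filenames):
    """Check a saves/<model>/lora/sft/ listing for the expected artifacts.

    Returns a dict mapping each expected artifact to whether it was found:
    a checkpoint-<step> directory, the adapter config, and the loss plot.
    """
    return {
        "checkpoint_dir": any(re.fullmatch(r"checkpoint-\d+", f) for f in filenames),
        "adapter_config": "adapter_config.json" in filenames,
        "loss_plot": "training_loss.png" in filenames,
    }

# Example against a listing like the one shown above
listing = ["checkpoint-411", "adapter_config.json", "training_loss.png"]
print(validate_sft_output(listing))
# {'checkpoint_dir': True, 'adapter_config': True, 'loss_plot': True}
```

In a real script you would pass `os.listdir("saves/qwen3-4b/lora/sft")` and fail the run if any value is False.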

Step 10
Test inference with fine-tuned model

Test your fine-tuned model with custom prompts:

llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
# Type: "Hello, how can you help me today?"
# Expect: Response showing fine-tuned behavior

Step 11
Export the fine-tuned model

For production deployment, merge the LoRA adapter into the base model and export it:

llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
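The export step merges the trained LoRA adapter back into the base weights so the result can be served without LLaMA Factory. A merge config typically pairs a model section with an export section along these lines (values illustrative, not the shipped defaults):

```yaml
### model
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507   # base model (illustrative)
adapter_name_or_path: saves/qwen3-4b/lora/sft
template: qwen3
finetuning_type: lora

### export
export_dir: output/qwen3_lora_sft
export_size: 5            # max shard size in GB
export_device: cpu
export_legacy_format: false
```

The exported directory can then be loaded like any standalone Hugging Face model checkpoint.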

Step 12
Cleanup and rollback

WARNING

This will delete all training progress and checkpoints.

To remove the virtual environment and cloned repository:

deactivate
cd ..
rm -rf LLaMA-Factory/
rm -rf factoryEnv/

Resources

  • DGX Spark Documentation
  • DGX Spark Forum
  • DGX Spark User Performance Guide