Copyright © 2025 NVIDIA Corporation


LLaMA Factory

1 HR

Install and fine-tune models with LLaMA Factory

View GitHub

Basic idea

LLaMA Factory is an open-source framework that simplifies training and fine-tuning large language models. It offers a unified interface for a range of cutting-edge methods, including supervised fine-tuning (SFT), RLHF, and QLoRA, and supports a wide range of LLM architectures such as LLaMA, Mistral, and Qwen. This playbook demonstrates how to fine-tune large language models with the LLaMA Factory CLI on your NVIDIA Spark device.

What you'll accomplish

You'll set up LLaMA Factory on NVIDIA Spark with Blackwell architecture to fine-tune large language models using LoRA, QLoRA, and full fine-tuning methods. This enables efficient model adaptation for specialized domains while leveraging hardware-specific optimizations.
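The end-to-end flow can be sketched as follows. This is an illustrative sequence rather than the playbook's exact steps: the `llamafactory-cli` entry point and the example config path come from the upstream repository, and a real run needs network access plus a CUDA-capable GPU.

```shell
# Sketch of the basic workflow (assumes network access and a working GPU stack).
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"   # installs the llamafactory-cli entry point

# Launch a LoRA SFT run using an example config shipped in the repository
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
```

The same CLI exposes `chat` and `export` subcommands for interacting with and merging the resulting adapters; consult the repository docs for current usage.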

What to know before starting

  • Basic Python knowledge for editing config files and troubleshooting
  • Command line usage for running shell commands and managing environments
  • Familiarity with PyTorch and Hugging Face Transformers ecosystem
  • GPU environment setup including CUDA/cuDNN installation and VRAM management
  • Fine-tuning concepts: understanding tradeoffs between LoRA, QLoRA, and full fine-tuning
  • Dataset preparation: formatting text data into JSON structure for instruction tuning
  • Resource management: adjusting batch size and memory settings for GPU constraints
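On the dataset-preparation point above: LLaMA Factory's default (alpaca-style) instruction format is a JSON array of records shaped like the following. The sample content is invented for illustration; see the data-preparation documentation linked under Ancillary files for the authoritative schema.

```json
[
  {
    "instruction": "Summarize the following release note in one sentence.",
    "input": "DGX Spark is a compact developer system built on the Blackwell architecture.",
    "output": "DGX Spark is a small-form-factor Blackwell developer machine."
  }
]
```

Per the documentation, a custom file also has to be registered in `data/dataset_info.json` (e.g. `"my_dataset": {"file_name": "my_dataset.json"}`) before it can be referenced by name in a training config.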

Prerequisites

  • NVIDIA Spark device with Blackwell architecture

  • CUDA 12.9 or newer installed: nvcc --version

  • Docker installed and configured for GPU access: docker run --rm --gpus all nvidia/cuda:12.9.0-devel-ubuntu24.04 nvidia-smi

  • Git installed: git --version

  • Python environment with pip: python --version && pip --version

  • Sufficient storage space (>50GB for models and checkpoints): df -h

  • Internet connection for downloading models from Hugging Face Hub
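The checks above can be rolled into a quick, hypothetical pre-flight script; adapt the command list and the storage threshold to your environment.

```shell
# Pre-flight check for the prerequisites listed above (illustrative).
for cmd in nvcc docker git python pip; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "ok: $cmd"
  else
    echo "MISSING: $cmd"
  fi
done

# >50 GB free is recommended for models and checkpoints
df -h .
```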

Ancillary files

  • Official LLaMA Factory repository: https://github.com/hiyouga/LLaMA-Factory

  • NVIDIA PyTorch container: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch

  • Example training configuration: examples/train_lora/llama3_lora_sft.yaml (from repository)

  • Documentation: https://llamafactory.readthedocs.io/en/latest/getting_started/data_preparation.html
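For orientation, the example LoRA SFT config referenced above has roughly this shape. This excerpt is abridged and may lag the upstream file; treat the copy in the repository as authoritative.

```yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048

### output
output_dir: saves/llama3-8b/lora/sft

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
bf16: true
```

Lowering per_device_train_batch_size and raising gradient_accumulation_steps is the usual lever when VRAM is tight, as noted under "What to know before starting."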

Time & risk

  • Duration: 30-60 minutes for initial setup, 1-7 hours for training depending on model size and dataset.
  • Risks: Model downloads require significant bandwidth and storage. Training may consume substantial GPU memory and require parameter tuning for hardware constraints.
  • Rollback: Remove Docker containers and cloned repositories. Training checkpoints are saved locally and can be deleted to reclaim storage space.
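Assuming the default locations implied by this playbook (a ./LLaMA-Factory clone with checkpoints written under its saves/ directory), rollback can be sketched as:

```shell
# Hypothetical cleanup; paths assume the clone lives at ./LLaMA-Factory.
if command -v docker >/dev/null 2>&1; then
  docker container prune -f || true   # remove stopped containers
fi
rm -rf LLaMA-Factory                  # deletes the clone and local checkpoints
```

Downloaded Hugging Face models are cached separately (typically under ~/.cache/huggingface) and can be pruned there if you also want that space back.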

Resources

  • DGX Spark Documentation
  • DGX Spark Forum