LLaMA Factory

1 HR

Install and fine-tune models with LLaMA Factory

Basic idea

LLaMA Factory is an open-source framework that simplifies training and fine-tuning large language models. It offers a unified interface for methods such as SFT, RLHF, and QLoRA, and it supports a wide range of LLM architectures, including LLaMA, Mistral, and Qwen. This playbook demonstrates how to fine-tune large language models with the LLaMA Factory CLI on your NVIDIA Spark device.

What you'll accomplish

You'll set up LLaMA Factory on an NVIDIA Spark device with Blackwell architecture and fine-tune large language models using LoRA, QLoRA, and full fine-tuning. This lets you adapt a model to a specialized domain efficiently while taking advantage of hardware-specific optimizations.
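To make the difference between LoRA and full fine-tuning concrete, the sketch below counts trainable parameters for a single weight matrix. It assumes the standard LoRA formulation, in which a frozen weight of shape d_out x d_in is adapted by two small matrices of rank r. The layer size and rank are hypothetical values chosen for illustration, not settings taken from this playbook.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one LoRA adapter.

    A has shape (rank, d_in) and B has shape (d_out, rank);
    the original weight matrix stays frozen.
    """
    return rank * (d_in + d_out)


# Hypothetical layer size and rank, for illustration only.
d_in = d_out = 4096
rank = 8

full = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
ratio = lora / full

print(f"Full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {rank}):  {lora:,} trainable parameters ({ratio:.2%} of full)")
```

QLoRA trains the same kind of adapter but keeps the frozen base weights in a quantized format, so the trainable-parameter count is unchanged while the memory needed for the base model drops.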

What to know before starting

  • Basic Python knowledge for editing config files and troubleshooting
  • Command line usage for running shell commands and managing environments
  • Familiarity with the PyTorch and Hugging Face Transformers ecosystem
  • GPU environment setup including CUDA/cuDNN installation and VRAM management
  • Fine-tuning concepts: understanding tradeoffs between LoRA, QLoRA, and full fine-tuning
  • Dataset preparation: formatting text data into JSON structure for instruction tuning
  • Resource management: adjusting batch size and memory settings for GPU constraints
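As an illustration of the dataset preparation item above, here is a minimal sketch that turns question/answer pairs into a JSON file for instruction tuning. It assumes an Alpaca-style record layout with "instruction", "input", and "output" fields; check the LLaMA Factory data documentation for the exact fields and for how to register a custom dataset. The helper names and the sample pairs are hypothetical.

```python
import json
from pathlib import Path


def to_instruction_records(pairs):
    """Convert (question, answer) pairs into Alpaca-style records.

    The field names are an assumption about the target format.
    """
    records = []
    for question, answer in pairs:
        records.append(
            {
                "instruction": question.strip(),
                "input": "",  # optional extra context, left empty here
                "output": answer.strip(),
            }
        )
    return records


def write_dataset(records, path):
    """Write the records as one JSON array and return the output path."""
    path = Path(path)
    path.write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return path


# Hypothetical sample data, for illustration only.
sample_pairs = [
    ("What does LoRA stand for?", "Low-Rank Adaptation."),
    ("  What does SFT stand for? ", "Supervised fine-tuning.  "),
]
sample_records = to_instruction_records(sample_pairs)
print(json.dumps(sample_records[0], indent=2))
```

Calling write_dataset(sample_records, "my_dataset.json") would save the records to a file; the file name is only an example.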

Prerequisites

  • NVIDIA Spark device with Blackwell architecture
  • CUDA 12.9 or newer installed: nvcc --version
  • Docker installed and configured for GPU access: docker run --gpus all nvidia/cuda:12.9-devel nvidia-smi
  • Git installed: git --version
  • Python environment with pip: python --version && pip --version
  • Sufficient storage space (more than 50 GB for models and checkpoints): df -h
  • Internet connection for downloading models from Hugging Face Hub
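The checks above can also be scripted. The sketch below is one possible way to do it with the Python standard library: it only tests whether each tool is on the PATH and how much disk space is free in the current directory. It does not verify tool versions or Docker GPU access, so the commands listed above are still needed. The tool list and the 50 GB threshold mirror the prerequisites; the function names are hypothetical.

```python
import shutil

# Tools named in the prerequisites; presence on PATH is all that is checked.
REQUIRED_TOOLS = ["nvcc", "docker", "git", "python", "pip"]
MIN_FREE_GB = 50  # storage threshold from the prerequisites


def find_missing_tools(tools):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def free_space_gb(path="."):
    """Free disk space at the given path, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


missing = find_missing_tools(REQUIRED_TOOLS)
free_gb = free_space_gb(".")

if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools were found on the PATH.")

if free_gb > MIN_FREE_GB:
    print(f"Free disk space: {free_gb:.1f} GB (enough)")
else:
    print(f"Free disk space: {free_gb:.1f} GB (more than {MIN_FREE_GB} GB is recommended)")
```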

Ancillary files

Time & risk

  • Duration: 30-60 minutes for initial setup; 1-7 hours for training, depending on model size and dataset size.
  • Risks: Model downloads require significant bandwidth and storage. Training may consume substantial GPU memory and require parameter tuning for hardware constraints.
  • Rollback: Remove Docker containers and cloned repositories. Training checkpoints are saved locally and can be deleted to reclaim storage space.
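To get a rough sense of the GPU memory risk noted above, the following sketch estimates the memory taken by model weights alone at different precisions. It is a lower bound under simple assumptions: it ignores gradients, optimizer state, activations, and quantization overhead, all of which add to real usage. The 8-billion-parameter model size is a hypothetical example.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights only, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical model size, for illustration only.
num_params = 8e9

bf16_gb = weight_memory_gb(num_params, 16)  # 16-bit weights
int4_gb = weight_memory_gb(num_params, 4)   # 4-bit quantized weights

print(f"16-bit weights: about {bf16_gb:.1f} GB")
print(f"4-bit weights:  about {int4_gb:.1f} GB")
```

Full fine-tuning needs considerably more than the weight figure because every parameter also carries a gradient and optimizer state, which is one reason adapter-based methods are the usual starting point when memory is tight.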