Install and fine-tune models with LLaMA Factory
LLaMA Factory is an open-source framework that simplifies training and fine-tuning large language models. It offers a unified interface for a variety of cutting-edge methods, including SFT, RLHF, and QLoRA, and supports a wide range of LLM architectures such as LLaMA, Mistral, and Qwen. This playbook demonstrates how to fine-tune large language models with the LLaMA Factory CLI on your NVIDIA Spark device.
You'll set up LLaMA Factory on NVIDIA Spark with Blackwell architecture to fine-tune large language models using LoRA, QLoRA, and full fine-tuning methods. This enables efficient model adaptation for specialized domains while leveraging hardware-specific optimizations.
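The end-to-end workflow can be sketched as follows. The clone, install, and `llamafactory-cli train` commands follow the upstream LLaMA Factory README; the GPU guard at the top is our own addition so the script fails fast on machines without a visible NVIDIA driver, and the example config path is the one shipped in the repository.

```shell
# Sketch of the LLaMA Factory install-and-train workflow.
# The clone/install/train commands follow the upstream README; the
# nvidia-smi guard is an addition so the script degrades gracefully
# on machines without a visible GPU.
set -euo pipefail

if command -v nvidia-smi >/dev/null 2>&1; then
    # Clone the repository and install it in editable mode with training extras.
    git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
    cd LLaMA-Factory
    pip install -e ".[torch,metrics]"

    # Launch LoRA supervised fine-tuning with the bundled example config.
    llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
else
    echo "No NVIDIA GPU detected; install the driver before training."
fi
```

On Spark the same commands can also be run inside the NVIDIA PyTorch container listed in the resources below, which ships a matching CUDA/PyTorch stack.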
NVIDIA Spark device with Blackwell architecture
CUDA 12.9 or newer installed: nvcc --version
Docker installed and configured for GPU access: docker run --gpus all nvidia/cuda:12.9-devel nvidia-smi
Git installed: git --version
Python environment with pip: python --version && pip --version
Sufficient storage space (>50GB for models and checkpoints): df -h
Internet connection for downloading models from Hugging Face Hub
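The checks above can be combined into a single script. The tool names and the 50 GB storage requirement come from the checklist; the pass/fail reporting format is our own.

```shell
# Combined prerequisite check for the items listed above.
# Tool names and the 50 GB threshold come from the checklist; the
# OK/MISSING reporting format is an addition for convenience.
check() {
    local name="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "OK      $name"
    else
        echo "MISSING $name"
    fi
}

check "CUDA toolkit (nvcc)" nvcc --version
check "Docker"              docker --version
check "Git"                 git --version
check "Python"              python --version
check "pip"                 pip --version

# Free space on the current filesystem, in GB (requirement: >50 GB).
free_gb=$(df -Pk . | awk 'NR==2 {print int($4 / 1024 / 1024)}')
echo "Free disk space: ${free_gb} GB (need >50 GB for models and checkpoints)"
```

Note that this only confirms the tools are on the PATH; GPU access from Docker still needs the `docker run --gpus all ... nvidia-smi` test shown above.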
Official LLaMA Factory repository: https://github.com/hiyouga/LLaMA-Factory
NVIDIA PyTorch container: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
Example training configuration (from the repository): examples/train_lora/llama3_lora_sft.yaml
Documentation: https://llamafactory.readthedocs.io/en/latest/getting_started/data_preparation.html
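The example configuration referenced above looks roughly like the sketch below. The keys follow LLaMA Factory's config schema, but the exact values (model, dataset, hyperparameters) vary by version, so treat this as illustrative and consult the file shipped in the repository.

```yaml
# Illustrative sketch of examples/train_lora/llama3_lora_sft.yaml.
# Values here are placeholders; the file in the repository is authoritative.
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

stage: sft                 # supervised fine-tuning
do_train: true
finetuning_type: lora      # train LoRA adapters instead of full weights
lora_target: all

dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048

output_dir: saves/llama3-8b/lora/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
bf16: true
```

Swapping `finetuning_type` to `full`, or adding a quantization setting for QLoRA, selects the other fine-tuning methods mentioned above; see the data-preparation documentation linked above for defining custom datasets.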