LLaMA Factory is an open-source framework that simplifies training and fine-tuning large language models. It offers a unified interface to cutting-edge methods such as supervised fine-tuning (SFT), RLHF, and QLoRA, and it supports a wide range of LLM architectures, including LLaMA, Mistral, and Qwen. This playbook demonstrates how to fine-tune large language models with the LLaMA Factory CLI on your NVIDIA Spark device.
You'll set up LLaMA Factory on NVIDIA Spark with Blackwell architecture to fine-tune large language models using LoRA, QLoRA, and full fine-tuning. This enables efficient adaptation of models to specialized domains while taking advantage of hardware-specific optimizations.
NVIDIA Spark device with Blackwell architecture
CUDA 12.9 or newer installed: nvcc --version
Git installed: git --version
Python 3 with venv and pip: python3 --version && pip3 --version
Sufficient storage space (>50GB for models and checkpoints): df -h
Internet connection for downloading models from Hugging Face Hub
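The prerequisite checks above can be run in one pass. A minimal sketch that reports each required tool rather than stopping at the first failure:

```shell
#!/usr/bin/env bash
# Prerequisite check for this playbook. Prints the status of each
# required tool; any "missing" line must be resolved before continuing.
for tool in nvcc git python3 pip3; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool ($("$tool" --version 2>&1 | head -n 1))"
  else
    echo "missing: $tool"
  fi
done

# Roughly 50 GB of free space is needed for models and checkpoints.
df -h .
```

Network access to the Hugging Face Hub is checked implicitly later, when the first model download runs.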
Official LLaMA Factory repository: https://github.com/hiyouga/LLaMA-Factory
PyTorch with CUDA 13: install via pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
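Putting the resources above together, a setup sketch that creates an isolated environment, installs the CUDA 13 PyTorch build, and installs LLaMA Factory from the cloned repository (the factoryEnv name matches the cleanup note below; extras and paths follow the upstream README):

```shell
# Create and activate a dedicated virtual environment.
python3 -m venv factoryEnv
source factoryEnv/bin/activate

# Install a CUDA 13 (cu130) build of PyTorch first.
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130

# Clone the official repository and install LLaMA Factory with extras.
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip3 install -e ".[torch,metrics]"

# Confirm the CLI is available.
llamafactory-cli version
```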
Example training configuration: examples/train_lora/qwen3_lora_sft.yaml (from repository)
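The referenced YAML file drives the entire run. As a rough sketch of the kind of fields such a LoRA SFT config contains (the values below are illustrative, not copied from the repository file):

```yaml
### model
model_name_or_path: Qwen/Qwen3-8B      # illustrative model ID

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: identity,alpaca_en_demo       # demo datasets bundled with the repo
template: qwen3
cutoff_len: 2048

### output
output_dir: saves/qwen3-8b/lora/sft

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
bf16: true
```

Once installed, a run is launched with `llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml`; consult the repository's example file for the exact supported fields.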
Documentation: https://llamafactory.readthedocs.io/en/latest/getting_started/data_preparation.html
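The data-preparation documentation linked above covers registering custom datasets. As a sketch (the dataset name and file name here are hypothetical), an alpaca-style dataset is registered by adding an entry to data/dataset_info.json that maps prompt columns to your JSON file's keys:

```json
{
  "my_domain_data": {
    "file_name": "my_domain_data.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
```

Each record in my_domain_data.json then carries "instruction", "input", and "output" fields, and the dataset is selected in the training YAML via its registered name (e.g. `dataset: my_domain_data`).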
The setup creates the factoryEnv and LLaMA-Factory directories. Training checkpoints are saved locally and can be deleted to reclaim storage space.
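A cleanup sketch, assuming the directory names used in this playbook and run from the directory where they were created:

```shell
# Deactivate the virtual environment if it is active, then remove the
# environment and the cloned repository (including local checkpoints).
deactivate 2>/dev/null || true
rm -rf factoryEnv LLaMA-Factory

# Optionally clear downloaded model weights from the Hugging Face cache:
# rm -rf ~/.cache/huggingface/hub
```

The Hugging Face cache line is left commented because other projects on the same machine may share those downloaded weights.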