LLaMA Factory

1 HR

Install and fine-tune models with LLaMA Factory

Verify system prerequisites

Check that your NVIDIA Spark system has the required components installed and accessible.

nvcc --version     # CUDA toolkit
docker --version   # Docker Engine
nvidia-smi         # NVIDIA driver and GPU visibility
python --version   # Python interpreter
git --version      # Git
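
If you prefer a single pass/fail sweep, a short shell loop over the same tools works; this sketch only reports anything missing from PATH:

# Report any prerequisite missing from PATH (prints nothing if all are present)
for cmd in nvcc docker nvidia-smi python git; do
  command -v "$cmd" >/dev/null || echo "missing: $cmd"
done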

Launch PyTorch container with GPU support

Start the NVIDIA PyTorch container with GPU access and mount your workspace directory.

NOTE

This NVIDIA PyTorch container supports CUDA 13.

docker run --gpus all --ipc=host \
    --ulimit memlock=-1 --ulimit stack=67108864 \
    -it --rm \
    -v "$PWD":/workspace \
    nvcr.io/nvidia/pytorch:25.11-py3 bash
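
Once inside the container, confirm the GPU is visible before proceeding:

nvidia-smi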

Clone LLaMA Factory repository

Download the LLaMA Factory source code from the official repository.

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory

Install LLaMA Factory with dependencies

Remove the pinned torchaudio dependency (not needed for LLM fine-tuning) so pip does not pull in a PyTorch build that conflicts with the container's optimized one, then install LLaMA Factory and add torchaudio back without its dependencies.

# Remove torchaudio dependency that conflicts with NVIDIA's PyTorch build
sed -i 's/"torchaudio[^"]*",\?//' pyproject.toml

# Install LLaMA Factory with metrics support
pip install -e ".[metrics]"
# Reinstall torchaudio without dependencies so pip leaves the container's torch intact
pip install --no-deps torchaudio
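
Optionally, confirm that pip left the container's torch packages untouched:

pip list 2>/dev/null | grep -E '^torch'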

Verify PyTorch CUDA support

The container's PyTorch build ships with CUDA support pre-installed. To verify:

python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"
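
Optionally, also print which GPU PyTorch sees and whether bfloat16 is supported (relevant for mixed-precision training):

python -c "import torch; print(torch.cuda.get_device_name(0)); print(f'bf16: {torch.cuda.is_bf16_supported()}')"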

Prepare training configuration

Examine the provided LoRA fine-tuning configuration for Qwen3.

cat examples/train_lora/qwen3_lora_sft.yaml
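
If you just want the headline hyperparameters, a grep works; the field names below follow LLaMA Factory's standard LoRA SFT examples and may differ slightly in your checkout:

grep -E 'model_name_or_path|finetuning_type|dataset|learning_rate|num_train_epochs|output_dir' examples/train_lora/qwen3_lora_sft.yaml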

Launch fine-tuning training

NOTE

Log in to the Hugging Face Hub so the model can be downloaded if it is gated.

Execute the training process using the pre-configured LoRA setup.

hf auth login # if the model is gated
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml

Example output:

***** train metrics *****
  epoch                    =        3.0
  total_flos               = 11076559GF
  train_loss               =     0.9993
  train_runtime            = 0:14:32.12
  train_samples_per_second =      3.749
  train_steps_per_second   =      0.471
Figure saved at: saves/qwen3-4b/lora/sft/training_loss.png
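
While training runs, you can watch GPU utilization from a second terminal on the host (the GPU is shared with the container):

watch -n 5 nvidia-smi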

Validate training completion

Verify that training completed successfully and checkpoints were saved.

ls -la saves/qwen3-4b/lora/sft/

Expected output should show:

  • Final checkpoint directory (checkpoint-411 or similar)
  • Model configuration files (adapter_config.json)
  • Training metrics showing decreasing loss values
  • Training loss plot saved as PNG file
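
A minimal scripted check for the same artifacts (paths assume the output directory shown above):

test -f saves/qwen3-4b/lora/sft/adapter_config.json \
  && ls saves/qwen3-4b/lora/sft/checkpoint-* >/dev/null 2>&1 \
  && echo "LoRA adapter and checkpoints present"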

Test inference with fine-tuned model

Test your fine-tuned model with custom prompts:

llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
# Type: "Hello, how can you help me today?"
# Expect: Response showing fine-tuned behavior
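
LLaMA Factory can also serve the adapter behind an OpenAI-compatible API; a sketch using the same inference YAML (see the project README for the api subcommand):

API_PORT=8000 llamafactory-cli api examples/inference/qwen3_lora_sft.yaml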

For production deployment, export your model (this merges the LoRA adapter into the base weights):

llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
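
The merged weights are written to the export_dir set in that YAML. A sketch to locate and list them (assumes GNU grep and LLaMA Factory's usual export_dir key name):

ls -la "$(grep -oP 'export_dir:\s*\K\S+' examples/merge_lora/qwen3_lora_sft.yaml)"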

Cleanup and rollback

WARNING

This will delete all training progress and checkpoints.

To remove all generated files and free up storage space:

cd /workspace
rm -rf LLaMA-Factory/
docker system prune -f
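
To also reclaim the space used by the PyTorch container image itself:

docker image rm nvcr.io/nvidia/pytorch:25.11-py3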

To roll back Docker container changes:

exit  # Exit container
docker container prune -f