Optimized fine-tuning with Unsloth
Confirm that your NVIDIA DGX Spark has the required CUDA toolkit installed and that the GPU is available.
nvcc --version
The output should show CUDA 13.0.
nvidia-smi
The output should show a summary of GPU information.
docker pull nvcr.io/nvidia/pytorch:25.09-py3
docker run --gpus all -it --rm --ulimit memlock=-1 --ulimit stack=67108864 --entrypoint /usr/bin/bash nvcr.io/nvidia/pytorch:25.09-py3
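Once inside the container, you can optionally confirm that the bundled PyTorch build sees the GPU. A quick sanity check in the container's Python:
# quick GPU sanity check inside the container
import torch
print(torch.__version__, torch.version.cuda)  # the container ships a CUDA-enabled PyTorch build
print(torch.cuda.is_available())              # expect True
print(torch.cuda.get_device_name(0))          # expect the DGX Spark's GPU name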
pip install transformers peft datasets "trl==0.19.1"
pip install --no-deps unsloth unsloth_zoo
pip install --no-deps bitsandbytes
The --no-deps flag keeps pip from pulling in dependency versions that would overwrite the PyTorch build shipped in the container.
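Before running the full test, you can confirm the packages installed cleanly by importing them. A minimal check (version attributes vary by package, hence the fallback):
# verify that the fine-tuning stack imports without errors
import unsloth, transformers, peft, trl, bitsandbytes, datasets
for mod in (unsloth, transformers, peft, trl, bitsandbytes, datasets):
    print(mod.__name__, getattr(mod, "__version__", "unknown"))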
Download the test script into the container with curl. Note that the raw file URL is required; the repository page URL would return HTML instead of the script.
curl -O https://raw.githubusercontent.com/NVIDIA/dgx-spark-playbooks/main/nvidia/unsloth/assets/test_unsloth.py
We will use this test script to validate the installation with a simple fine-tuning task.
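The exact contents of test_unsloth.py live in the playbook repository, but a smoke test of this kind typically takes the following shape. This is a minimal sketch, assuming Unsloth's FastLanguageModel API and TRL's SFTTrainer; the dataset, prompt template, and hyperparameters here are illustrative, not the playbook's exact script:
# minimal sketch of an Unsloth fine-tuning smoke test (illustrative, not the playbook's script)
from unsloth import FastLanguageModel  # import unsloth first so its patches apply
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# load a 4-bit quantized base model together with its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# attach LoRA adapters so only a small set of weights is trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# small instruction-tuning slice; collapse each record into a single text field
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:1%]")
dataset = dataset.map(lambda ex: {
    "text": f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['output']}"
})

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=10,        # a handful of steps is enough to validate the stack
        logging_steps=1,
        output_dir="outputs",
    ),
)
trainer.train()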
Execute the test script to verify Unsloth is working correctly.
python test_unsloth.py
The script runs a short fine-tuning job; if the installation is working, you should see training progress (step and loss values) in the terminal window, and the script should exit without errors.
Test with your own model and dataset by updating the test_unsloth.py file:
# Replace line 32 with your model choice
model_name = "unsloth/Meta-Llama-3.1-8B-bnb-4bit"
# Load your custom dataset in line 8
dataset = load_dataset("your_dataset_name")
# Adjust the training arguments starting at line 61
per_device_train_batch_size = 4
max_steps = 1000
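If your dataset does not already provide a single text column, add a formatting step before training. A minimal sketch, where the file name, column names, and prompt layout are placeholder assumptions:
from datasets import load_dataset

# hypothetical local JSONL file with "prompt" and "response" columns
dataset = load_dataset("json", data_files="my_data.jsonl", split="train")

# collapse each record into the single "text" field the trainer consumes
dataset = dataset.map(lambda ex: {"text": f"{ex['prompt']}\n{ex['response']}"})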
Visit https://github.com/unslothai/unsloth/wiki for advanced usage instructions.