Unsloth on DGX Spark
Estimated time: 1 hour
Optimized fine-tuning with Unsloth
Verify prerequisites
Confirm that your NVIDIA DGX Spark device has the required CUDA toolkit and GPU resources available.
nvcc --version
The output should show CUDA 13.0.
nvidia-smi
The output should show a summary of GPU information.
Get the container image
Pull the NVIDIA PyTorch container from NGC, which ships with a CUDA-enabled PyTorch build.
docker pull nvcr.io/nvidia/pytorch:25.09-py3
Launch Docker
Start an interactive container with GPU access. --gpus all exposes the GPU to the container, and the --ulimit flags raise the locked-memory and stack limits for training workloads.
docker run --rm -it --gpus all --ulimit memlock=-1 --ulimit stack=67108864 --entrypoint /usr/bin/bash nvcr.io/nvidia/pytorch:25.09-py3
Install dependencies inside Docker
Install the fine-tuning stack with pip. Unsloth and unsloth_zoo are installed with --no-deps so that pip does not replace the CUDA-enabled PyTorch build that ships with the container.
pip install transformers peft "datasets==4.3.0" "trl==0.19.1"
pip install --no-deps unsloth unsloth_zoo
pip install hf_transfer
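hf_transfer is an optional accelerated downloader for Hugging Face Hub files. It only takes effect when the HF_HUB_ENABLE_HF_TRANSFER environment variable is set before the Hugging Face libraries are imported, for example:

import os

# Enable hf_transfer for faster model downloads; set this before
# huggingface_hub (or anything that imports it) is loaded.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"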
Install bitsandbytes inside Docker
bitsandbytes provides the quantization kernels that Unsloth uses for 4-bit fine-tuning.
pip install --no-deps bitsandbytes
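Before creating the test script, you can sanity-check the installation from the container's Python interpreter. A minimal check (importing unsloth also prints its patch banner):

import unsloth  # import first so Unsloth can patch transformers; prints its banner
import torch
import bitsandbytes
import transformers
import trl

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU:", torch.cuda.get_device_name(0))
print("bitsandbytes:", bitsandbytes.__version__)
print("transformers:", transformers.__version__, "| trl:", trl.__version__)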
Create Python test script
Download the test script into the container with curl.
curl -O https://raw.githubusercontent.com/NVIDIA/dgx-spark-playbooks/refs/heads/main/nvidia/unsloth/assets/test_unsloth.py
We will use this test script to validate the installation with a simple fine-tuning task.
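The script follows the standard Unsloth fine-tuning pattern. A minimal sketch of that pattern is shown below; the dataset name and hyperparameters are placeholder assumptions, and the downloaded script may differ in its details:

from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load a 4-bit quantized base model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of the weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Placeholder dataset; it must expose a "text" column for SFTTrainer.
dataset = load_dataset("your_dataset_name", split="train")

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=2,
        max_steps=60,  # matches the 60 steps the test run reports
        output_dir="outputs",
        dataset_text_field="text",
    ),
)
trainer.train()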
Run the validation test
Execute the test script to verify Unsloth is working correctly.
python test_unsloth.py
Expected output in the terminal window:
- "Unsloth: Will patch your computer to enable 2x faster free finetuning"
- Training progress bars showing loss decreasing over 60 steps
- Final training metrics showing completion
Next steps
Test with your own model and dataset by updating the test_unsloth.py file:
# Replace line 32 with your model choice
model_name = "unsloth/Meta-Llama-3.1-8B-bnb-4bit"
# Load your custom dataset in line 8
dataset = load_dataset("your_dataset_name")
# Adjust the training arguments at line 61
per_device_train_batch_size = 4
max_steps = 1000
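Once training finishes, you can spot-check the result with a quick generation. This sketch assumes the model and tokenizer objects from the training script are still in scope; the prompt and generation settings are placeholders:

from unsloth import FastLanguageModel

# Switch the fine-tuned model to Unsloth's faster inference path.
FastLanguageModel.for_inference(model)

inputs = tokenizer("Once upon a time", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))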
Visit https://github.com/unslothai/unsloth/wiki for advanced usage instructions.