
CUDA-X Data Science

30 MIN

Install and use NVIDIA cuML and NVIDIA cuDF to accelerate UMAP, HDBSCAN, pandas and more with zero code changes

View on GitHub

Step 1
Verify system requirements

  • Verify the system has CUDA 13 installed using nvcc --version or nvidia-smi
  • Install conda using these instructions
  • Create a Kaggle API key using these instructions and place the kaggle.json file in the same folder as the notebook

Step 2
Install the Data Science libraries

Use the following command to install the CUDA-X libraries (this will create a new conda environment):

  conda create -n rapids-test -c rapidsai-nightly -c conda-forge -c nvidia  \
  rapids=25.10 python=3.12 'cuda-version=13.0' \
  jupyter hdbscan umap-learn

Step 3
Activate the conda environment

  conda activate rapids-test

Step 4
Clone the playbook repository

  • Clone the GitHub repository and go to the assets folder inside the cuda-x-data-science folder
      git clone https://github.com/NVIDIA/dgx-spark-playbooks
    
  • Place the kaggle.json created in Step 1 in the assets folder

Step 5
Run the notebooks

There are two notebooks in the GitHub repository. One demonstrates a large-scale string data processing workflow running pandas code on the GPU.
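The zero-code-change workflow in this notebook relies on cudf.pandas: ordinary pandas code is left unmodified, and the cudf.pandas module redirects it to cuDF on the GPU. A minimal sketch of the idea (plain pandas, so it also runs on CPU; the data here is made up for illustration):

```python
# Ordinary pandas code: string processing on a DataFrame.
# Run as-is it executes on CPU; launched via
#   python -m cudf.pandas script.py
# (or after %load_ext cudf.pandas in a notebook) the same
# unmodified code is executed by cuDF on the GPU.
import pandas as pd

df = pd.DataFrame({"review": ["Great GPU!", "too slow", "Great value"]})

# Typical string-heavy operations that cuDF accelerates
df["review_lower"] = df["review"].str.lower()
df["word_count"] = df["review"].str.split().str.len()

hits = df["review_lower"].str.contains("great").sum()
print(hits)  # → 2 reviews mentioning "great"
```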

  • Run the cudf_pandas_demo.ipynb notebook and use localhost:8888 in your browser to access the notebook
      jupyter notebook cudf_pandas_demo.ipynb
    

The other walks through machine learning algorithms, including UMAP and HDBSCAN.
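cuML follows the same zero-code-change pattern for scikit-learn-style estimators: in recent RAPIDS releases, unmodified CPU code can be dispatched to the GPU via cuml.accel. A sketch using scikit-learn's KMeans (an assumption for illustration; the umap-learn and hdbscan packages installed in Step 2 are redirected the same way, for the estimators your cuML version supports):

```python
# Ordinary scikit-learn code: clustering with KMeans.
# Run as-is it executes on CPU; launched via
#   python -m cuml.accel script.py
# the same code is dispatched to cuML on the GPU
# (supported estimators only -- check the RAPIDS release notes).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated blobs of 2-D points around 0 and 5
X = np.vstack([rng.normal(0, 0.2, (50, 2)),
               rng.normal(5, 0.2, (50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(sorted(np.round(km.cluster_centers_[:, 0])))  # roughly [0.0, 5.0]
```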

  • Run the cuml_sklearn_demo.ipynb notebook and use localhost:8888 in your browser to access the notebook
      jupyter notebook cuml_sklearn_demo.ipynb
    

If you are accessing your DGX Spark remotely, make sure to forward the necessary port so the notebook is reachable in your local browser. Use the command below for port forwarding; for example, ssh -N -L 8888:localhost:8888 username@remote_host forwards local port 8888 to a Jupyter server on remote port 8888.

  ssh -N -L YYYY:localhost:XXXX username@remote_host

  • YYYY: The local port you want to use (e.g. 8888)
  • XXXX: The port you specified when starting Jupyter Notebook on the remote machine (e.g. 8888)
  • -N: Prevents SSH from executing a remote command
  • -L: Specifies local port forwarding

Resources

  • NVIDIA RAPIDS Documentation
  • DGX Spark Documentation
  • DGX Spark DevZone Forum