Single-cell RNA Sequencing

15 MIN

An end-to-end GPU-powered workflow for scRNA-seq using RAPIDS

Basic idea

Single-cell RNA sequencing (scRNA-seq) lets researchers measure gene activity in individual cells. This reveals the variation, cell types, and cell states that bulk methods average away. However, the resulting datasets are large and high-dimensional, and processing them demands substantial compute.

This playbook shows an end-to-end GPU-powered workflow for scRNA-seq using rapids-singlecell, a RAPIDS-powered library in the scverse® ecosystem. It follows the familiar Scanpy API. Because it operates on sparse count matrices directly on the GPU, it runs preprocessing, quality control (QC) and cleanup, visualization, and investigation faster than CPU tools do.
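Much of the speedup comes from never densifying the count matrix. The following is a CPU-only, pure-Python illustration of the compressed sparse row (CSR) layout. It shows how per-cell total counts and detected-gene counts can be read straight from the CSR arrays. rapids-singlecell performs comparable sparse operations on the GPU through CuPy. However, the helper names `to_csr` and `per_cell_stats` and the toy matrix are hypothetical and are not part of its API.

```python
from typing import List, Tuple


def to_csr(dense: List[List[int]]) -> Tuple[List[int], List[int], List[int]]:
    """Convert a dense cells x genes matrix into CSR arrays (data, indices, indptr).

    Hypothetical helper for illustration; only nonzero counts are stored.
    """
    data: List[int] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))  # row i spans data[indptr[i]:indptr[i + 1]]
    return data, indices, indptr


def per_cell_stats(data: List[int], indptr: List[int]) -> Tuple[List[int], List[int]]:
    """Total counts and detected genes per cell, read directly from CSR arrays."""
    totals: List[int] = []
    n_genes: List[int] = []
    for start, end in zip(indptr[:-1], indptr[1:]):
        totals.append(sum(data[start:end]))
        # Valid because to_csr never stores explicit zeros.
        n_genes.append(end - start)
    return totals, n_genes


# Toy 3 cells x 5 genes count matrix (made up for illustration).
counts = [
    [0, 3, 0, 0, 1],
    [2, 0, 0, 0, 0],
    [0, 0, 5, 4, 0],
]
data, indices, indptr = to_csr(counts)
totals, n_genes = per_cell_stats(data, indptr)
print(totals)   # [4, 2, 9]
print(n_genes)  # [2, 1, 2]
```

The work scales with the number of nonzero entries rather than with cells × genes. That difference matters because most entries in a scRNA-seq count matrix are zero.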

What you'll accomplish

  1. Load and preprocess data with GPU acceleration
  2. Inspect cell quality visually to understand the data
  3. Filter out unusual cells
  4. Remove unwanted sources of variation
  5. Cluster cells and visualize them with PCA and UMAP
  6. Correct batch effects with Harmony and analyze the result with k-nearest neighbors, UMAP, and t-SNE
  7. Explore the biological information in the data with differential expression analysis and trajectory analysis
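Steps 1 through 3 rest on a few per-cell QC metrics and a normalization pass. The following is a rough, CPU-only, pure-Python sketch of that logic. It computes total counts, detected genes, and percent mitochondrial counts. It then drops cells that fail two thresholds, and finally normalizes and log-transforms the remaining cells. The metric names follow the columns produced by Scanpy's `calculate_qc_metrics`. The helper functions, the `MT-` gene-name prefix, the thresholds, and the target sum of 10,000 are illustrative assumptions, not the notebook's actual settings.

```python
import math
from typing import Dict, List

# Hypothetical toy data: 4 cells x 5 genes. Genes prefixed "MT-" are treated as
# mitochondrial (an assumed naming convention, common for human gene symbols).
GENES = ["CD3E", "MS4A1", "LYZ", "MT-CO1", "MT-ND1"]
COUNTS = [
    [5, 0, 3, 1, 1],  # ordinary cell
    [0, 4, 6, 0, 2],  # ordinary cell
    [0, 0, 1, 0, 0],  # nearly empty: too few genes detected
    [1, 0, 0, 6, 5],  # mostly mitochondrial reads: likely a damaged cell
]

# Hypothetical thresholds; in practice they are chosen after inspecting QC plots.
MIN_GENES = 2
MAX_PCT_MT = 30.0
TARGET_SUM = 1e4  # assumed per-cell target for total-count normalization


def qc_metrics(row: List[int]) -> Dict[str, float]:
    """Per-cell QC metrics, named after Scanpy's calculate_qc_metrics columns."""
    total = sum(row)
    n_genes = sum(1 for c in row if c > 0)
    mito = sum(c for g, c in zip(GENES, row) if g.startswith("MT-"))
    return {
        "total_counts": total,
        "n_genes_by_counts": n_genes,
        "pct_counts_mt": 100.0 * mito / total if total else 0.0,
    }


def passes_qc(metrics: Dict[str, float]) -> bool:
    """Keep cells with enough detected genes and a low mitochondrial fraction."""
    return (metrics["n_genes_by_counts"] >= MIN_GENES
            and metrics["pct_counts_mt"] <= MAX_PCT_MT)


def normalize_log1p(row: List[int]) -> List[float]:
    """Scale a cell to TARGET_SUM total counts, then apply log(1 + x)."""
    total = sum(row)
    return [math.log1p(c * TARGET_SUM / total) for c in row]


kept = [row for row in COUNTS if passes_qc(qc_metrics(row))]
normalized = [normalize_log1p(row) for row in kept]
print(len(kept))  # 2
```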

The README elaborates on these steps.

What to know before starting

  • API Compatibility: The rapids-singlecell library mirrors the Scanpy API from scverse. Users familiar with the standard CPU workflow can therefore adopt GPU acceleration, built on CuPy and the NVIDIA RAPIDS cuML and cuGraph libraries, with few changes to their code.
  • Algorithmic Precision: Scanpy's CPU implementation uses approximate nearest-neighbor search, whereas this GPU implementation computes the exact neighbor graph. Small differences between the two sets of results are therefore expected and valid.
  • Parameter Sensitivity: When performing t-SNE, the number of nearest neighbors must be at least 3x the perplexity to avoid distortion.
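The precision and t-SNE notes can be made concrete with a small CPU sketch.

`exact_knn` computes every pairwise Euclidean distance, which is an O(n²) brute-force search. That is what "exact" means, in contrast to approximate search. On the GPU this work is delegated to cuML rather than done in a loop like this one.

`min_tsne_neighbors` encodes the t-SNE rule, assuming that the 3x multiplier applies to the perplexity. This assumption is consistent with cuML's `TSNE` defaults of `perplexity=30` and `n_neighbors=90`.

Both helpers and the sample points are hypothetical.

```python
import math
from typing import List, Sequence


def exact_knn(points: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Brute-force exact k-nearest neighbors (Euclidean), excluding each point itself.

    Hypothetical helper: every pairwise distance is computed, so no neighbor is
    missed, unlike approximate search methods.
    """
    neighbors: List[List[int]] = []
    for i, p in enumerate(points):
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()  # ascending distance; ties broken by index
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def min_tsne_neighbors(perplexity: float) -> int:
    """Smallest neighbor count satisfying n_neighbors >= 3 * perplexity (assumed rule)."""
    return math.ceil(3 * perplexity)


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
print(exact_knn(pts, 2))        # [[1, 2], [0, 2], [0, 1], [2, 1]]
print(min_tsne_neighbors(30))   # 90
```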

Prerequisites

Hardware Requirements:

  • NVIDIA Grace Blackwell GB10 Superchip System (DGX Spark)
  • A minimum of 40 GB of free unified memory for the Docker container and GPU-accelerated data processing
  • At least 30 GB of available storage space for the Docker container and data files
  • High-speed network connectivity (a high-speed internet connection is recommended)

Software Requirements:

  • NVIDIA DGX OS
  • Docker

Ancillary files

All required assets are in the Single-cell RNA Sequencing repository. In the running playbook, they are all located under the playbook folder.

  • scRNA_analysis_preprocessing.ipynb - Main playbook notebook.
  • README.md - Quick start guide to the playbook environment. It is also located in the main directory of JupyterLab. Please start there!
  • /setup/start_playbook.sh - Script that starts the installation of the playbook in a Docker container
  • /setup/setup_playbook.sh - Configures the Docker container before the user enters the JupyterLab environment
  • /setup/requirements.txt - List of libraries that the commands in setup_playbook.sh install into the playbook environment

Time & risk

  • Estimated Time: ~15 minutes for first run

    • Total Notebook Processing Time: Approximately 2-3 minutes for the full pipeline (~130 seconds recorded in the demo).
    • Data Loading: ~1.7 seconds.
    • Preprocessing: ~21 seconds.
    • Post-processing (Clustering/Diff Exp): ~104 seconds.
    • Data: Requires internet access to download the Docker container, libraries, and demo dataset (dli_census.h5ad).
  • Risks

    • GPU Memory Constraints: The workflow is GPU-memory intensive, and large datasets may trigger out-of-memory (OOM) errors.
    • Kernel Management: You may need to kill or restart kernels to free GPU resources between workflow stages.
    • Rollback: If an OOM error occurs, kill all kernels to free GPU memory. Then restart either the affected notebook or the entire playbook.
  • Last Updated: 01/02/2026

    • First Publication
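To judge whether a dataset is likely to hit the GPU memory constraints described under Risks, a back-of-the-envelope estimate helps. The following sketch compares a dense float32 matrix with a CSR matrix that holds float32 values and 32-bit indices. The cell count, gene count, and 5% density are hypothetical and are not properties of the demo dataset. Steps that densify the data, such as zero-centered scaling, need memory on the order of the dense figure.

```python
def dense_bytes(n_cells: int, n_genes: int, itemsize: int = 4) -> int:
    """Bytes for a dense cells x genes matrix (float32 by default)."""
    return n_cells * n_genes * itemsize


def csr_bytes(n_cells: int, n_genes: int, density: float,
              value_size: int = 4, index_size: int = 4) -> int:
    """Approximate bytes for a CSR matrix: values + column indices + row pointers.

    Assumes float32 values and 32-bit indices; very large matrices may need
    64-bit indices (index_size=8), which raises the total.
    """
    nnz = round(n_cells * n_genes * density)
    return nnz * (value_size + index_size) + (n_cells + 1) * index_size


GB = 1e9
# Hypothetical dataset size and sparsity, chosen only for illustration.
cells, genes, density = 200_000, 30_000, 0.05
print(f"dense: {dense_bytes(cells, genes) / GB:.1f} GB")           # dense: 24.0 GB
print(f"CSR:   {csr_bytes(cells, genes, density) / GB:.1f} GB")    # CSR:   2.4 GB
```

Intermediate copies made during processing come on top of these figures. Treat the result as a lower bound when you compare it against the 40 GB of free unified memory listed in the prerequisites.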