Single-Cell Analysis

Investigate, understand, and interpret single-cell data in minutes, not days, by leveraging RAPIDS-singlecell, powered by NVIDIA RAPIDS™

Overview

For single-cell analysis, scientists can try out near-real-time data analysis and visualization, with speedups of up to 938X over CPU, by using RAPIDS-singlecell, developed by scverse. This blueprint is for scientists who understand single-cell analysis and want to leverage RAPIDS for single-cell data.

Experience Workflow

It is strongly recommended that users review the README in this blueprint before working through the notebooks.

For this blueprint, two possible deployments are provided:

  1. The Standard Instance: L40S
  2. The Large Instance: 8x H100

Please use the table in the Notebooks Outline below to determine which size is right for you.

The workflow is as follows:

  1. After initial code setup, this blueprint utilizes publicly available datasets including those from 10x Genomics and CZ CELLxGENE. Scientists can use their Python API to read the data directly into an AnnData object.
  2. General data preprocessing is performed to clean up and better understand the dataset. This includes calculating QC metrics, filtering, and data normalization.
  3. The data is investigated quantitatively and visually, including feature selection, clustering, dimensionality reduction, and data integration using canonical tools.
  4. The data is visualized and plotted to help users investigate the biological diversity within the sample.
  5. A number of additional advanced tutorials are available for users who are interested in spatial transcriptomics analysis, as well as scaling to 11M cells easily and quickly.
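
To make step 2 above concrete, here is a minimal plain-Python sketch of what per-cell QC metrics, filtering, and total-count normalization followed by a log transform compute. It is not the RAPIDS-singlecell implementation, which runs these steps on the GPU over an AnnData object. The toy count matrix, the gene names, the thresholds (`min_genes`, `max_pct_mito`), and the `target_sum` of 10,000 are all illustrative assumptions rather than values taken from the notebooks.

```python
import math

# Hypothetical toy data: rows are cells, columns are genes.
GENES = ["MT-CO1", "ACTB", "GAPDH", "CD3E"]
COUNTS = [
    [2, 10, 8, 0],   # cell 0: typical cell
    [40, 5, 5, 0],   # cell 1: dominated by mitochondrial counts
    [0, 1, 0, 0],    # cell 2: nearly empty droplet
    [1, 20, 15, 4],  # cell 3: typical cell
]


def qc_metrics(row, genes):
    """Per-cell QC: total counts, genes detected, and % mitochondrial counts."""
    total = sum(row)
    n_genes = sum(1 for c in row if c > 0)
    mito = sum(c for c, g in zip(row, genes) if g.startswith("MT-"))
    pct_mito = 100.0 * mito / total if total else 0.0
    return {"total_counts": total, "n_genes": n_genes, "pct_mito": pct_mito}


def passes_qc(metrics, min_genes=2, max_pct_mito=20.0):
    """Keep cells with enough detected genes and a low mitochondrial share."""
    return metrics["n_genes"] >= min_genes and metrics["pct_mito"] <= max_pct_mito


def normalize_log1p(row, target_sum=10_000):
    """Scale a cell to a fixed total count, then apply log(1 + x)."""
    total = sum(row)
    return [math.log1p(c * target_sum / total) for c in row]


metrics = [qc_metrics(row, GENES) for row in COUNTS]
kept = [i for i, m in enumerate(metrics) if passes_qc(m)]
normalized = {i: normalize_log1p(COUNTS[i]) for i in kept}

for i in kept:
    print(i, metrics[i], [round(v, 3) for v in normalized[i]])
```

On this toy input, the filter drops the mitochondria-heavy cell and the nearly empty one, so only the two typical cells are normalized. In the notebooks, the same ideas are applied to full datasets on the GPU.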

Notebooks Outline

The outline below is a suggested exploration flow. Unless otherwise noted, users can choose any notebook to get started, as long as the GPU resources are available to run the notebook.

For those who are new to basic single-cell data analysis, the end-to-end analysis in 01_scRNA_analysis_preprocessing is the best place to start. It walks users through the steps of data preprocessing, cleanup, visualization, and investigation.

| Notebook | Description | Min GPU Size / Instance |
| --- | --- | --- |
| 01_scRNA_analysis_preprocessing.ipynb | End-to-end workflow, where we understand the cells, run ETL on the dataset, then visualize and explore the results. This tutorial is good for all users. | 24GB / Standard RSC Instance |
| 02_scRNA_analysis_extended.ipynb | This notebook continues from the outputs of 01_scRNA_analysis_preprocessing.ipynb as an overview of methods that can be used to investigate transcriptional regulation. | 24GB / Standard RSC Instance |
| 03_scRNA_analysis_with_pearson_residuals.ipynb | End-to-end workflow, like 01_scRNA_analysis_preprocessing.ipynb, but uses Pearson residuals for normalization. | 24GB / Standard RSC Instance |
| 04_scRNA_analysis_dask_out_of_core.ipynb | In this notebook, we show that the analysis scales to up to 11M cells by using Dask. Requires a 48GB GPU. | 48GB / Standard RSC Instance |
| 05_scRNA_analysis_multi_GPU.ipynb | This notebook enhances the 11M-cell dataset analysis with Dask without exceeding memory limits. It fully scales to utilize all available GPUs, uses chunk-based execution, and efficiently manages memory. Requires 8x H100s or better. For all other GPU systems, please run 04_scRNA_analysis_dask_out_of_core.ipynb instead. | 8x 80GB / Large RSC Instance |
| 06_scRNA_analysis_90k_brain_example.ipynb | In this notebook, we show diversity in capability by running a workflow similar to 01_scRNA_analysis_preprocessing.ipynb, but on brain cells. | 24GB / Standard RSC Instance |
| 07_scRNA_analysis_1.3M_brain_example.ipynb | In this notebook, we scale up the analysis of 06_scRNA_analysis_90k_brain_example.ipynb to 1.3 million brain cells. Requires an 80GB GPU, like an H100. | 80GB / Large RSC Instance |
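
The GPU requirements listed above can be encoded as a small lookup to check which notebooks fit a given machine. The sketch below is only an illustration: the helper name `runnable_notebooks` and the `(gpu_count, gb_per_gpu)` encoding are assumptions made for this example and are not part of the blueprint. The minimums themselves are copied from the Min GPU Size column.

```python
# Minimum requirements per notebook as (number of GPUs, GB of memory per GPU),
# transcribed from the "Min GPU Size" column of the notebooks outline.
NOTEBOOK_MIN_GPU = {
    "01_scRNA_analysis_preprocessing.ipynb": (1, 24),
    "02_scRNA_analysis_extended.ipynb": (1, 24),
    "03_scRNA_analysis_with_pearson_residuals.ipynb": (1, 24),
    "04_scRNA_analysis_dask_out_of_core.ipynb": (1, 48),
    "05_scRNA_analysis_multi_GPU.ipynb": (8, 80),
    "06_scRNA_analysis_90k_brain_example.ipynb": (1, 24),
    "07_scRNA_analysis_1.3M_brain_example.ipynb": (1, 80),
}


def runnable_notebooks(gpu_count, gb_per_gpu):
    """Return the notebooks whose stated minimums fit the given hardware.

    Hypothetical helper for illustration; it only compares against the
    minimums in the table and does not probe the actual machine.
    """
    return sorted(
        name
        for name, (min_gpus, min_gb) in NOTEBOOK_MIN_GPU.items()
        if gpu_count >= min_gpus and gb_per_gpu >= min_gb
    )


# Example: a single 48GB GPU versus an 8x 80GB machine.
print(runnable_notebooks(1, 48))
print(runnable_notebooks(8, 80))
```

With a single 48GB GPU, this selects notebooks 01 through 04 and 06. Notebooks 05 and 07 appear only once the 80GB minimums are met, which matches the Large RSC Instance rows.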

Architecture Diagram

Software

The following containers are used in this blueprint:

Additional software, including RAPIDS-singlecell, developed by scverse, is available on GitHub accompanying these notebooks.

Minimum System Requirements

The single-cell analysis blueprint recommends an L40S GPU, with a minimum of 24GB of VRAM, unless otherwise stated in the tutorial. Users may have to wait 5–10 minutes for the instance to start, depending on cloud availability.

The blueprint supports:

Hardware Requirements

  • We recommend using the NVIDIA L40S GPU for the best user experience and performance-to-cost ratio for this blueprint, unless otherwise stated in the tutorial. The Large or multi-GPU notebooks require one or more 80GB GPUs; for these, we suggest using an 8x H100 instance.
  • Other supported instances, if available in your region:
    • H100
    • A100
    • A10
    • L4
    • GH200
  • 24 GB VRAM or more recommended

Software Requirements

License

Governing Terms:

The Single Cell Blueprint GitHub repository, including notebooks, and the RAPIDS AI Container are governed by the Apache 2.0 License. The RAPIDS-singlecell software is governed by the MIT License. The datasets are governed by CC-BY 4.0. Other supporting open-source software is governed by its accompanying licenses.

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy subcards. Please report security vulnerabilities or NVIDIA AI concerns here.