# Single-Cell Analysis

Investigate, understand, and interpret single-cell data in minutes, not days, by leveraging RAPIDS-singlecell, powered by NVIDIA RAPIDS™.

## Overview

For single-cell analysis, scientists can easily perform near-real-time data analysis and visualization, with speedups of up to 938x over CPU, by using [RAPIDS-singlecell](https://rapids-singlecell.readthedocs.io/), developed by [scverse](https://scverse.org/about/). This blueprint is for scientists who understand single-cell analysis and want to leverage [RAPIDS](https://rapids.ai/) for single-cell data.

## Experience Workflow

We strongly recommend that users review the [README](https://github.com/NVIDIA-AI-Blueprints/single-cell-analysis-blueprint/blob/main/README.md) in this blueprint before working through the notebooks.

This blueprint provides two possible deployments:

1. The Standard Instance: L40S
2. The Large Instance: 8x H100

Use the table in the Notebooks Outline section below to determine which size is right for you.

The workflow is as follows:

1. After the initial code setup, this blueprint uses publicly available datasets, including those from [10x Genomics](https://www.10xgenomics.com/datasets) and [CZ CELLxGENE](https://cellxgene.cziscience.com/). Scientists can use the CELLxGENE Census [Python API](https://chanzuckerberg.github.io/cellxgene-census/python-api.html#) to read the data directly into an [AnnData](https://anndata.readthedocs.io/en/stable/) object.
2. General data preprocessing is performed to clean up and better understand the dataset. This includes calculating QC metrics, filtering, and normalizing the data.
3. The data is investigated quantitatively and visually, including feature selection, clustering, dimensionality reduction, and data integration using canonical tools.
4. The data is visualized and plotted to help users investigate the biological diversity within the sample.
5. Additional advanced tutorials are available for users interested in spatial transcriptomics analysis, as well as in scaling the analysis quickly and easily to 11M cells.

### **Notebooks Outline**

The outline below is a suggested exploration flow. Unless otherwise noted, users can start with any notebook, as long as the GPU resources required to run it are available. For users who are new to basic single-cell analysis, the end-to-end analysis in [01_scRNA_analysis_preprocessing](https://github.com/NVIDIA-AI-Blueprints/single-cell-analysis-blueprint/blob/main/notebooks/01_scRNA_analysis_preprocessing.ipynb) is the best place to start; it walks users through data preprocessing, cleanup, visualization, and investigation.

| Notebook | Description | Minimum GPU Memory / Instance |
|----------|-------------|-------------------------------|
| 01_scRNA_analysis_preprocessing.ipynb | End-to-end workflow in which we characterize the cells, run ETL on the dataset, and then visualize and explore the results. This tutorial is suitable for all users. | 24 GB / Standard RSC Instance |
| 02_scRNA_analysis_extended.ipynb | Continues from the outputs of 01_scRNA_analysis_preprocessing.ipynb with an overview of methods that can be used to investigate transcriptional regulation. | 24 GB / Standard RSC Instance |
| 03_scRNA_analysis_with_pearson_residuals.ipynb | End-to-end workflow like 01_scRNA_analysis_preprocessing.ipynb, but uses Pearson residuals for normalization. | 24 GB / Standard RSC Instance |
| 04_scRNA_analysis_dask_out_of_core.ipynb | Shows how the analysis easily scales to up to 11M cells by using Dask. **Requires a 48 GB GPU.** | 48 GB / Standard RSC Instance |
| 05_scRNA_analysis_multi_GPU.ipynb | Extends the 11M-cell analysis with Dask without exceeding memory limits. It scales to use all available GPUs, uses chunk-based execution, and manages memory efficiently. **Requires 8x H100 GPUs or better. On all other GPU systems, run 04_scRNA_analysis_dask_out_of_core.ipynb instead.** | 8x 80 GB / Large RSC Instance |
| 06_scRNA_analysis_90k_brain_example.ipynb | Demonstrates the breadth of the tooling by running a workflow similar to 01_scRNA_analysis_preprocessing.ipynb, but on brain cells. | 24 GB / Standard RSC Instance |
| 07_scRNA_analysis_1.3M_brain_example.ipynb | Scales up the analysis of 06_scRNA_analysis_90k_brain_example.ipynb to 1.3 million brain cells. **Requires an 80 GB GPU, such as an H100.** | 80 GB / Large RSC Instance |

## Architecture Diagram

![Architecture diagram](https://assets.ngc.nvidia.com/products/api-catalog/single-cell-analysis/diagram.jpg)

## Software

The following containers are used in this blueprint:

* [RAPIDS](https://developer.nvidia.com/rapids) v25.04

Additional software, including [RAPIDS-singlecell](https://rapids-singlecell.readthedocs.io/en/latest/) developed by [scverse](https://scverse.org/about/), is [available on GitHub alongside these notebooks](https://github.com/NVIDIA-AI-Blueprints/single-cell-analysis-blueprint/tree/main).

## Minimum System Requirements

The single-cell analysis blueprint recommends an L40S GPU, with a minimum of 24 GB VRAM, unless a tutorial states otherwise. Users may have to wait 5–10 minutes for the instance to start, depending on cloud availability.

The blueprint supports the following:

**Hardware Requirements**

* We recommend an NVIDIA L40S GPU for the best user experience and performance-to-cost ratio, unless a tutorial states otherwise. The large and multi-GPU notebooks require one or more 80 GB GPUs; we suggest an 8x H100 instance.
* Other supported instances, if available in your region:
  * H100
  * A100
  * A10
  * L4
  * GH200
* 24 GB VRAM or more is recommended.

**Software Requirements**

* [Environment packages can be found on GitHub](https://github.com/NVIDIA-AI-Blueprints/single-cell-analysis-blueprint/tree/main)

## License

Governing Terms: The Single Cell Blueprint GitHub repository, including the notebooks, and the RAPIDS AI container are governed by the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0). The RAPIDS-singlecell software is governed by the [MIT License](https://github.com/scverse/rapids_singlecell/blob/main/LICENSE). The datasets are governed by CC-BY 4.0. Other supporting open-source software is governed by its accompanying licenses.
## Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse.

For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy subcards. Please report security vulnerabilities or NVIDIA AI concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).