
Single-Cell Analysis
Investigate, understand, and interpret single-cell data in minutes, not days, by leveraging RAPIDS-singlecell, powered by NVIDIA RAPIDS™
Overview
For single-cell analysis, scientists can easily test near-real-time data analysis and visualization, achieving speedups of up to 938X over CPU by using RAPIDS-singlecell, developed by scverse. This blueprint is for scientists who understand single-cell analysis and want to leverage RAPIDS for single-cell data.
Experience Workflow
It is strongly recommended that users review the README in this blueprint before working through the notebooks.
For this blueprint, two possible deployments are provided:
- The Standard Instance: L40s
- The Large Instance: 8x H100
Please use the table in the Notebooks Outline below to determine which size is right for you.
The workflow is as follows:
- After initial code setup, this blueprint utilizes publicly available datasets, including those from 10x Genomics and CZ CELLxGENE. Scientists can use the Python APIs of these data providers to read the data directly into an AnnData object.
- General data preprocessing is performed to clean up and better understand the dataset. This includes calculating QC metrics, filtering, and data normalization.
- The data is investigated quantitatively and visually, including feature selection, clustering, dimensionality reduction, and data integration using canonical tools.
- The data is visualized and plotted to help users investigate the biological diversity within the sample.
- A number of additional advanced tutorials are available for users who are interested in spatial transcriptomics analysis, as well as scaling to 11M cells easily and quickly.
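The preprocessing step in the workflow above (QC metrics, filtering, normalization) can be illustrated on a toy count matrix. The following is a minimal pure-Python sketch, not the RAPIDS-singlecell API. The function names, the thresholds (`min_genes`, `max_pct_mito`), the `MT-` prefix used to flag mitochondrial genes, and the `target_sum` of 10,000 are all assumptions chosen for illustration; the notebooks perform the equivalent steps on the GPU over an AnnData object.

```python
import math

def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Per-cell QC: total counts, genes detected, percent mitochondrial counts."""
    is_mito = [name.startswith(mito_prefix) for name in gene_names]
    metrics = []
    for row in counts:
        total = sum(row)
        n_genes = sum(1 for c in row if c > 0)
        mito_counts = sum(c for c, flag in zip(row, is_mito) if flag)
        pct_mito = 100.0 * mito_counts / total if total else 0.0
        metrics.append(
            {"total_counts": total, "n_genes": n_genes, "pct_mito": pct_mito}
        )
    return metrics

def filter_cells(counts, metrics, min_genes=2, max_pct_mito=20.0):
    """Keep cells with enough detected genes and a low mitochondrial fraction."""
    return [
        row
        for row, m in zip(counts, metrics)
        if m["n_genes"] >= min_genes and m["pct_mito"] <= max_pct_mito
    ]

def normalize_log1p(counts, target_sum=10_000):
    """Scale each cell to target_sum total counts, then apply log(1 + x)."""
    normalized = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total else 0.0
        normalized.append([math.log1p(c * scale) for c in row])
    return normalized

# Toy data (hypothetical): rows are cells, columns are genes.
genes = ["GeneA", "GeneB", "MT-CO1"]
raw = [
    [6, 3, 1],  # 3 genes detected, 10% mitochondrial: kept
    [0, 1, 9],  # 90% mitochondrial: filtered out
    [4, 0, 0],  # only 1 gene detected: filtered out
]

metrics = qc_metrics(raw, genes)
kept = filter_cells(raw, metrics)
normalized = normalize_log1p(kept)
print(len(kept), "of", len(raw), "cells kept")
```

Filtering before normalization keeps low-quality cells from being rescaled to look like healthy ones, which is why the steps run in this order.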
Notebooks Outline
The outline below is a suggested exploration flow. Unless otherwise noted, users can choose any notebook to get started, as long as the GPU resources are available to run the notebook.
For those who are new to doing basic analysis for single-cell data, the end-to-end analysis of 01_scRNA_analysis_preprocessing is the best place to start, where users are walked through the steps of data preprocessing, cleanup, visualization, and investigation.
Notebook | Description | Min GPU Size / Instance |
---|---|---|
01_scRNA_analysis_preprocessing.ipynb | End-to-end workflow, where we understand the cells, run ETL on the dataset, then visualize and explore the results. This tutorial is good for all users. | 24GB / Standard RSC Instance |
02_scRNA_analysis_extended.ipynb | This notebook continues from the outputs of 01_scRNA_analysis_preprocessing.ipynb as an overview of methods that can be used to investigate transcriptional regulation. | 24GB / Standard RSC Instance |
03_scRNA_analysis_with_pearson_residuals.ipynb | End-to-end workflow, like 01_scRNA_analysis_preprocessing.ipynb, but uses Pearson residuals for normalization. | 24GB / Standard RSC Instance |
04_scRNA_analysis_dask_out_of_core.ipynb | In this notebook, we show how the analysis scales to up to 11M cells by using Dask. Requires a 48GB GPU. | 48GB / Standard RSC Instance |
05_scRNA_analysis_multi_GPU.ipynb | This notebook enhances the 11M-cell dataset analysis with Dask without exceeding memory limits. It fully scales to utilize all available GPUs, uses chunk-based execution, and efficiently manages memory. Requires 8x H100s or better. For all other GPU systems, please run 04_scRNA_analysis_dask_out_of_core.ipynb instead. | 8x 80GB / Large RSC Instance |
06_scRNA_analysis_90k_brain_example.ipynb | In this notebook, we show diversity in capability by running a similar workflow to 01_scRNA_analysis_preprocessing.ipynb, but on brain cells. | 24GB / Standard RSC Instance |
07_scRNA_analysis_1.3M_brain_example.ipynb | In this notebook, we scale up the analysis of 06_scRNA_analysis_90k_brain_example.ipynb to 1.3 million brain cells. Requires an 80GB GPU, like an H100. | 80GB / Large RSC Instance |
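To make the table easier to apply, the sketch below encodes its minimum GPU sizes and lists which notebooks fit a given machine. The `Requirement` type and the `runnable_notebooks` helper are hypothetical names introduced here for illustration; they are not part of the blueprint. The memory values are the nominal sizes from the table, so treat the result as a rough guide only.

```python
from typing import NamedTuple

class Requirement(NamedTuple):
    notebook: str
    min_gpu_gb: int  # nominal memory per GPU, from the table above
    min_gpus: int    # number of GPUs needed

# Transcribed from the Notebooks Outline table.
REQUIREMENTS = [
    Requirement("01_scRNA_analysis_preprocessing.ipynb", 24, 1),
    Requirement("02_scRNA_analysis_extended.ipynb", 24, 1),
    Requirement("03_scRNA_analysis_with_pearson_residuals.ipynb", 24, 1),
    Requirement("04_scRNA_analysis_dask_out_of_core.ipynb", 48, 1),
    Requirement("05_scRNA_analysis_multi_GPU.ipynb", 80, 8),
    Requirement("06_scRNA_analysis_90k_brain_example.ipynb", 24, 1),
    Requirement("07_scRNA_analysis_1.3M_brain_example.ipynb", 80, 1),
]

def runnable_notebooks(gpu_gb, gpu_count=1):
    """Return the notebooks whose minimum GPU size and count are satisfied."""
    return [
        req.notebook
        for req in REQUIREMENTS
        if gpu_gb >= req.min_gpu_gb and gpu_count >= req.min_gpus
    ]

# Example: a single 48GB GPU covers notebooks 01-04 and 06.
for name in runnable_notebooks(gpu_gb=48):
    print(name)
```

Note that 02_scRNA_analysis_extended.ipynb also depends on the outputs of 01_scRNA_analysis_preprocessing.ipynb, which this helper does not check.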
Architecture Diagram
Software
The following containers are used in this blueprint:
- RAPIDS v25.04
Additional software—including use of RAPIDS-singlecell, developed by scverse—is available on GitHub accompanying these notebooks.
Minimum System Requirements
The single-cell analysis blueprint recommends using L40s with a minimum of 24GB VRAM, unless otherwise stated in the tutorial. Users may have to wait 5–10 minutes for the instance to start, depending on cloud availability.
The blueprint supports:
Hardware Requirements
- We recommend using NVIDIA L40s GPUs for the best user experience and performance-to-cost ratio for this blueprint, unless otherwise stated in the tutorial. The notebooks marked for the Large instance, including the multi-GPU notebook, require one or more 80GB GPUs. We suggest using an 8x H100 instance.
- Other supported instances, if available in your region:
  - H100
  - A100
  - A10
  - L4
  - GH200
- 24 GB VRAM or more recommended
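One way to check an instance against these requirements is to query the installed GPUs with `nvidia-smi`. The sketch below is an illustration, not part of the blueprint: the helper names `parse_gpu_list` and `query_gpus` are hypothetical, and the sample text is made up to show the output shape rather than captured from a real machine. `nvidia-smi` reports `memory.total` in MiB, and the reported value is usually somewhat below the nominal size (for example, a "24GB" card reports fewer than 24 × 1024 MiB), so compare against thresholds with some tolerance.

```python
import subprocess

# Real nvidia-smi query flags: one CSV line per GPU, no header, no unit suffix.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total",
    "--format=csv,noheader,nounits",
]

def parse_gpu_list(text):
    """Parse 'name, memory_mib' lines into a list of (name, memory_mib) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so the memory field is always isolated.
        name, mem_mib = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, int(mem_mib)))
    return gpus

def query_gpus():
    """Run nvidia-smi and return the parsed GPU list (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_list(result.stdout)

# Illustrative sample text (not real captured output).
sample = "NVIDIA L40S, 46068\nNVIDIA H100 80GB HBM3, 81559\n"
parsed = parse_gpu_list(sample)
for name, mem_mib in parsed:
    print(f"{name}: {mem_mib / 1024:.1f} GiB")
```

On a GPU instance, calling `query_gpus()` returns the same structure for the actual hardware.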
Software Requirements
License
Governing Terms:
The Single Cell Blueprint GitHub repository, including notebooks, and the RAPIDS AI Container are governed by the Apache 2.0 License. The RAPIDS-singlecell software is governed by the MIT License. The datasets are governed by CC-BY 4.0. Other supporting open-source software is governed by its accompanying licenses.
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy subcards. Please report security vulnerabilities or NVIDIA AI concerns here.