
Genomics Analysis
Easily GPU-accelerate essential genomics analysis workflows, such as germline analysis, with NVIDIA Parabricks.
Overview
For germline analysis, bioinformaticians can try a whole exome sequencing analysis workflow on short reads in a matter of minutes on any cloud available through Brev.dev, leveraging NVIDIA® Parabricks® fq2bam (BWA-MEM) for alignment and DeepVariant for variant calling.
Experience Workflow
It is strongly recommended that users review the README in this blueprint before working through the notebooks. Users can then execute the experience workflow in the germline_wes notebook.
The workflow is as follows:
- This example uses whole exome sequencing (WES) data from sample NA12878.
- Sequence reads are mapped to the reference genome. The input FASTQ files are aligned using the Burrows-Wheeler Aligner (BWA) through the Parabricks fq2bam tool.
- Variants are called on the aligned reads with DeepVariant, a deep learning-based variant caller. It uses a convolutional neural network to identify single nucleotide variants (SNVs) and insertions/deletions (InDels).
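The two compute steps above can be sketched as a pair of pbrun command lines. This is a minimal sketch, not the code from the germline_wes notebook: the file names are hypothetical placeholders, and only the basic --ref, --in-fq, --out-bam, --in-bam, and --out-variants options are used. The commands are built as lists so they can be inspected before being passed to something like subprocess.run inside the Parabricks container.

```python
import shlex


def fq2bam_cmd(ref, fq1, fq2, out_bam):
    """Build a pbrun fq2bam command that aligns paired-end FASTQ files."""
    return ["pbrun", "fq2bam",
            "--ref", ref,
            "--in-fq", fq1, fq2,
            "--out-bam", out_bam]


def deepvariant_cmd(ref, in_bam, out_vcf):
    """Build a pbrun deepvariant command that calls variants from a BAM."""
    return ["pbrun", "deepvariant",
            "--ref", ref,
            "--in-bam", in_bam,
            "--out-variants", out_vcf]


# Hypothetical file names; substitute the paths used in your environment.
REF = "reference.fasta"
steps = [
    fq2bam_cmd(REF, "NA12878_R1.fastq.gz", "NA12878_R2.fastq.gz", "NA12878.bam"),
    deepvariant_cmd(REF, "NA12878.bam", "NA12878.vcf"),
]

# Print the commands instead of running them, so the sketch works anywhere.
for cmd in steps:
    print(shlex.join(cmd))
```

The BAM written by the first step is the input to the second, which is why both commands share the same reference file.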
Architecture Diagram
Short-Read Analysis Workflow
Software
- The Parabricks 4.4.0 container is used in this blueprint - try the latest container here.
Minimum System Requirements
Users may have to wait 5–10 minutes for the instance to start, depending on cloud availability. The germline analysis blueprint supports the following hardware:
Hardware Requirements
- The L40S, with 48 GB of GPU memory, is recommended for the best balance of cost and performance. Users can also try the L4 or T4 (lower cost) or the A100 (higher performance).
- The fq2bam tool requires at least 40 GB of GPU memory by default; the --low-memory option reduces this to 16 GB of GPU memory at the cost of slower processing. All other tools require at least 16 GB of GPU memory per GPU.
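The memory rule for fq2bam can be expressed as a small helper that decides whether --low-memory is needed. This is a sketch under the thresholds stated in this section (40 GB by default, 16 GB with --low-memory); the function name and the example memory sizes are illustrative assumptions, not part of Parabricks.

```python
FQ2BAM_DEFAULT_GB = 40  # default fq2bam requirement stated in this section
LOW_MEMORY_GB = 16      # requirement when --low-memory is passed


def fq2bam_memory_flags(gpu_memory_gb):
    """Return the extra fq2bam flags needed for a GPU with this much memory."""
    if gpu_memory_gb >= FQ2BAM_DEFAULT_GB:
        return []                      # enough memory for the default mode
    if gpu_memory_gb >= LOW_MEMORY_GB:
        return ["--low-memory"]        # slower, but fits in 16 GB
    raise ValueError(
        f"{gpu_memory_gb} GB is below the {LOW_MEMORY_GB} GB minimum for fq2bam"
    )


# Example memory sizes in GB (hypothetical inputs for illustration).
for memory_gb in (48, 24, 16):
    print(memory_gb, fq2bam_memory_flags(memory_gb))
```

The returned list can be appended to an fq2bam command line, so the same launch code works on both large and small GPUs.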
Optional Hardware Requirements
- Parabricks can be run on any NVIDIA GPU that supports CUDA® architecture 70, 75, 80, 86, 89, or 90 and has at least 16 GB of GPU RAM. NVIDIA Parabricks has been tested on the following NVIDIA GPUs:
  - V100
  - T4
  - A10, A30, A40, A100, A6000
  - L4, L40
  - H100, H200
  - Grace Hopper™ Superchip
- System Requirements:
  - A 2-GPU system should have at least 100 GB of CPU RAM and at least 24 CPU threads.
  - A 4-GPU system should have at least 196 GB of CPU RAM and at least 32 CPU threads.
  - An 8-GPU system should have at least 392 GB of CPU RAM and at least 48 CPU threads.
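The GPU and host requirements above lend themselves to a simple pre-flight check. The sketch below encodes the numbers from this list as lookup tables; the function names are hypothetical, and how the GPU architecture, RAM, and thread counts are obtained is left to the caller.

```python
# Values copied from the requirements listed above.
SUPPORTED_CUDA_ARCHS = {70, 75, 80, 86, 89, 90}
MIN_GPU_RAM_GB = 16
# GPU count -> (minimum CPU RAM in GB, minimum CPU threads)
HOST_MINIMUMS = {2: (100, 24), 4: (196, 32), 8: (392, 48)}


def gpu_supported(cuda_arch, gpu_ram_gb):
    """True if the GPU meets the architecture and memory requirements."""
    return cuda_arch in SUPPORTED_CUDA_ARCHS and gpu_ram_gb >= MIN_GPU_RAM_GB


def host_shortfalls(num_gpus, cpu_ram_gb, cpu_threads):
    """Return a list of unmet host requirements (empty if all are met)."""
    if num_gpus not in HOST_MINIMUMS:
        raise ValueError(f"no stated requirement for {num_gpus} GPUs")
    min_ram, min_threads = HOST_MINIMUMS[num_gpus]
    problems = []
    if cpu_ram_gb < min_ram:
        problems.append(f"CPU RAM {cpu_ram_gb} GB < {min_ram} GB")
    if cpu_threads < min_threads:
        problems.append(f"CPU threads {cpu_threads} < {min_threads}")
    return problems


print(gpu_supported(cuda_arch=89, gpu_ram_gb=48))
print(host_shortfalls(num_gpus=4, cpu_ram_gb=128, cpu_threads=32))
```

Returning a list of shortfalls rather than a single boolean makes it easier to report every unmet requirement at once.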
Software Requirements
- An NVIDIA driver with version 525.60.13 or greater. Please check here for more details on forward compatibility.
- Any Linux operating system that supports Docker version 20.10 (or higher) with the NVIDIA GPU runtime.
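A version requirement such as "525.60.13 or greater" is easiest to check by comparing numeric tuples rather than strings. The sketch below assumes the installed version is already available as a plain dotted string (for example, from the output of nvidia-smi --query-gpu=driver_version --format=csv,noheader); the helper names are hypothetical, and version strings with non-numeric suffixes are not handled.

```python
MIN_DRIVER = (525, 60, 13)  # minimum NVIDIA driver version stated above
MIN_DOCKER = (20, 10)       # minimum Docker version stated above


def parse_version(text):
    """Turn a dotted version string like '535.104.05' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(version_text, minimum):
    """True if the dotted version is at least the given minimum tuple."""
    return parse_version(version_text) >= minimum


# Hypothetical version strings for illustration.
print(meets_minimum("535.104.05", MIN_DRIVER))
print(meets_minimum("470.57.02", MIN_DRIVER))
print(meets_minimum("24.0.7", MIN_DOCKER))
```

Tuple comparison avoids the pitfall of string comparison, where "9.0" would sort after "20.10".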
Terms of Use
Governing Terms: The Parabricks container is governed by the NVIDIA Software License Agreement and the Product-Specific Terms for NVIDIA AI Products. This Genomics Analysis Blueprint GitHub repository is provided under Apache License 2.0.
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When the models are downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.