Genomics Analysis Blueprint by NVIDIA

Overview

This developer example enables bioinformaticians to run GPU-accelerated genomics workflows in minutes on any cloud through Brev.dev. NVIDIA® Parabricks® powers both linear and graph-based read alignment along with variant calling via DeepVariant. CodonFM, NVIDIA's open-source suite of foundation models for RNA codon sequences, can then be used to predict the functional impact of each detected variant on specific genes.

Experience Workflow

This developer example shows how to use GPU-accelerated tools for alignment (linear and graph), variant calling, and variant effect prediction.

Architecture Diagram

The exact steps to run this workflow are outlined below:

Notebook Outline

All the code can be found in Jupyter notebooks in the notebooks directory of the GitHub repo.

`germline_wes.ipynb`

Runs a standard germline variant calling workflow on whole exome sequencing (WES) data using NVIDIA Parabricks. Downloads the NA12878 sample from the Genome in a Bottle consortium, aligns reads to the GRCh38 reference using GPU-accelerated BWA-MEM via Parabricks fq2bam, and calls variants with GPU-accelerated DeepVariant, producing a final .vcf file.
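The two Parabricks steps described above can be sketched as command lines assembled in Python. This is a minimal sketch, not the notebook's actual code: the helper names and file names are hypothetical, and only the basic `pbrun fq2bam` and `pbrun deepvariant` options (`--ref`, `--in-fq`, `--out-bam`, `--in-bam`, `--out-variants`) are used. The commands are printed rather than executed, so the sketch runs without a GPU or a Parabricks installation.

```python
import shlex


def fq2bam_command(ref, fastq_r1, fastq_r2, out_bam):
    """Build the alignment step: paired-end FASTQ files in, sorted BAM out."""
    return [
        "pbrun", "fq2bam",
        "--ref", ref,
        "--in-fq", fastq_r1, fastq_r2,
        "--out-bam", out_bam,
    ]


def deepvariant_command(ref, in_bam, out_vcf):
    """Build the variant calling step: BAM in, VCF out."""
    return [
        "pbrun", "deepvariant",
        "--ref", ref,
        "--in-bam", in_bam,
        "--out-variants", out_vcf,
    ]


# Hypothetical file names, chosen only for illustration.
pipeline = [
    fq2bam_command("GRCh38.fa", "NA12878_R1.fastq.gz",
                   "NA12878_R2.fastq.gz", "NA12878.bam"),
    deepvariant_command("GRCh38.fa", "NA12878.bam", "NA12878.vcf"),
]

for command in pipeline:
    # Print each command; pass the list to subprocess.run(command, check=True)
    # on a machine where Parabricks is installed.
    print(shlex.join(command))
```

Keeping each command as a list of arguments avoids shell quoting problems when file paths contain spaces.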

`pangenome.ipynb`

Demonstrates a pangenome analysis workflow as an alternative to single-reference alignment using NVIDIA Parabricks. Downloads the HPRC v1.1 pangenome graph, aligns short-read FASTQ samples using GPU-accelerated Giraffe, and calls variants with Pangenome-Aware DeepVariant, a version of DeepVariant that uses the pangenome graph to improve alignment accuracy and variant detection across diverse populations.

`variant_effect_prediction.ipynb`

Runs a full variant effect prediction pipeline starting from raw FASTQ files. It uses NVIDIA Parabricks to align reads and call variants, processes GENCODE gene annotations to extract protein-coding sequences, maps detected variants onto transcripts, and uses CodonFM (NVIDIA's RNA foundation model) to predict the functional impact of each variant via log likelihood ratios.
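The scoring idea at the end of this pipeline can be illustrated with a small sketch. It assumes the model yields a probability for each candidate codon at a given position; how CodonFM actually exposes those probabilities is not shown here, and the function names, the toy coding sequence, and the probability values are all hypothetical.

```python
import math


def apply_snv(cds, cds_offset, alt_base):
    """Return (codon_index, ref_codon, alt_codon) for a single-base change.

    cds is a coding sequence whose first base starts a codon, and
    cds_offset is the 0-based position of the changed base within it.
    """
    codon_index = cds_offset // 3
    start = codon_index * 3
    ref_codon = cds[start:start + 3]
    within = cds_offset - start
    alt_codon = ref_codon[:within] + alt_base + ref_codon[within + 1:]
    return codon_index, ref_codon, alt_codon


def log_likelihood_ratio(codon_probs, ref_codon, alt_codon):
    """log P(alt codon) - log P(ref codon) at one codon position."""
    return math.log(codon_probs[alt_codon]) - math.log(codon_probs[ref_codon])


# Toy coding sequence and made-up probabilities, for illustration only.
cds = "ATGGCCAAG"
codon_index, ref_codon, alt_codon = apply_snv(cds, 4, "A")
probs_at_position = {"GCC": 0.60, "GAC": 0.15}

score = log_likelihood_ratio(probs_at_position, ref_codon, alt_codon)
print(codon_index, ref_codon, alt_codon, round(score, 3))
```

Under this definition, a negative score means the model assigns the variant codon a lower probability than the reference codon at that position.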

How to Run

Hardware Requirements

The L40S (48 GB of GPU memory) is recommended for the best combination of cost and performance. Users can also try an L4 or T4 (lower cost) or an RTX PRO 6000 (higher performance).

NVIDIA Parabricks can be run on any NVIDIA GPU that supports CUDA® architecture 75, 80, 86, 89, 90, 100, or 120 and has at least 16GB of GPU RAM.
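The rule above can be written as a simple check. This is a sketch under stated assumptions: the function names are hypothetical, and the compute capability and memory size are passed in as plain numbers rather than queried from the GPU.

```python
# Architecture codes and memory floor taken from the requirement above.
SUPPORTED_ARCHITECTURES = {75, 80, 86, 89, 90, 100, 120}
MIN_GPU_MEMORY_GB = 16


def architecture_code(major, minor):
    """Turn a compute capability such as 8.6 into the code 86."""
    return major * 10 + minor


def meets_gpu_requirements(major, minor, memory_gb):
    """True if the architecture is listed and the memory floor is met."""
    return (
        architecture_code(major, minor) in SUPPORTED_ARCHITECTURES
        and memory_gb >= MIN_GPU_MEMORY_GB
    )


print(meets_gpu_requirements(7, 5, 16))   # listed architecture, enough memory
print(meets_gpu_requirements(7, 0, 32))   # architecture 70 is not listed
print(meets_gpu_requirements(8, 6, 12))   # listed architecture, too little memory
```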

Parabricks has been tested specifically on the following NVIDIA GPUs:

T4
A10, A30, A40, A100, A6000
L4, L40
H100, H200
GH200
B200, B300
GB200, GB300
RTX PRO 6000 Blackwell Server Edition
RTX PRO 4500
DGX Spark
DGX Station

The minimum amount of CPU RAM and CPU threads depends on the number of GPUs. Please refer to the table below:

GPUs	Minimum CPU RAM (GB)	Minimum CPU Threads
2	100	24
4	196	32
8	392	48
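The table above can be encoded as a lookup for a pre-flight check. This is a minimal sketch with hypothetical names; it covers only the GPU counts listed in the table and raises an error for any other count instead of guessing a value.

```python
# GPU count -> (minimum CPU RAM in GB, minimum CPU threads), from the table.
MIN_HOST_RESOURCES = {
    2: (100, 24),
    4: (196, 32),
    8: (392, 48),
}


def host_meets_minimum(num_gpus, cpu_ram_gb, cpu_threads):
    """Check host RAM and thread count against the row for num_gpus."""
    if num_gpus not in MIN_HOST_RESOURCES:
        raise ValueError(f"no documented minimum for {num_gpus} GPUs")
    min_ram_gb, min_threads = MIN_HOST_RESOURCES[num_gpus]
    return cpu_ram_gb >= min_ram_gb and cpu_threads >= min_threads


print(host_meets_minimum(4, 256, 32))  # meets both minimums for 4 GPUs
print(host_meets_minimum(8, 256, 64))  # enough threads, too little RAM for 8 GPUs
```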

Software Requirements

Any NVIDIA driver that is compatible with CUDA 12.9 (535, 550, 570, 575, or similar). See the NVIDIA CUDA compatibility documentation for more details on forward compatibility.
Any Linux operating system that supports Docker version 20.10 (or higher) with the NVIDIA GPU runtime.
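The Docker version requirement can be checked by parsing the text that `docker --version` prints. This sketch uses hypothetical function names and works on a version string passed in as an argument, so it does not need Docker to be installed; it does not check the driver or the NVIDIA GPU runtime.

```python
import re

MIN_DOCKER_VERSION = (20, 10)


def parse_docker_version(version_text):
    """Extract (major, minor) from text like 'Docker version 20.10.7, build f0df350'."""
    match = re.search(r"(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1)), int(match.group(2))


def docker_is_new_enough(version_text):
    """True if the reported Docker version is 20.10 or higher."""
    return parse_docker_version(version_text) >= MIN_DOCKER_VERSION


print(docker_is_new_enough("Docker version 20.10.7, build f0df350"))
print(docker_is_new_enough("Docker version 19.03.12, build 48a66213fe"))
```

Comparing `(major, minor)` tuples rather than strings keeps a version such as 24.0 from sorting below 20.10.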

References

Terms of Use

Governing Terms: The Blueprint scripts are governed by the NVIDIA Software License Agreement and the Product-Specific Terms for NVIDIA AI Products, and enable use of separate open source and proprietary software governed by their respective licenses:

NVIDIA Parabricks (NVIDIA Software License Agreement and the Product-Specific Terms for NVIDIA AI Products);
NV-CodonFM-Encodon-TE-80M-v1 model (NVIDIA Open Model License);
CodonFM (Apache 2.0); and
The other supporting open source software in the Clara Parabricks Workflows public repo, which is governed by its accompanying licenses (Apache 2.0, BSD 3-Clause, MIT License, and PIGZ License).

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.
