
Run essential genomics workflows quickly with NVIDIA Parabricks and CodonFM.
This developer example enables bioinformaticians to run GPU-accelerated genomics workflows in minutes on any cloud through Brev.dev. NVIDIA® Parabricks® powers both linear and graph-based read alignment along with variant calling via DeepVariant. CodonFM, NVIDIA's open-source suite of foundation models for RNA codon sequences, can then be used to predict the functional impact of each detected variant on specific genes.
This developer example shows how to use GPU-accelerated tools for alignment (linear and graph-based), variant calling, and variant effect prediction.
The workflow is organized into the Jupyter notebooks described below. All of the code is in the notebooks directory of the GitHub repo.
`germline_wes.ipynb`: Runs a standard germline variant calling workflow on whole exome sequencing (WES) data using NVIDIA Parabricks. Downloads the NA12878 sample from the Genome in a Bottle consortium, aligns reads to the GRCh38 reference using GPU-accelerated BWA-MEM via Parabricks fq2bam, and calls variants with GPU-accelerated DeepVariant, producing a final .vcf file.
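The two Parabricks steps in this notebook can be sketched as a pair of `pbrun` invocations. This is a minimal sketch, assuming the documented `pbrun fq2bam` options (`--ref`, `--in-fq`, `--out-bam`) and `pbrun deepvariant` options (`--ref`, `--in-bam`, `--out-variants`); the `germline_commands` helper and all file names are hypothetical, and the notebook may pass additional options (for example, exome-specific settings).

```python
import shlex
from pathlib import Path


def germline_commands(ref: str, fq1: str, fq2: str, out_dir: str) -> list[list[str]]:
    """Hypothetical helper: build the fq2bam -> deepvariant commands, in run order."""
    out = Path(out_dir)
    bam = str(out / "sample.bam")  # aligned, sorted BAM written by fq2bam
    vcf = str(out / "sample.vcf")  # variant calls written by DeepVariant
    fq2bam = [
        "pbrun", "fq2bam",
        "--ref", ref,
        "--in-fq", fq1, fq2,  # paired-end FASTQ files
        "--out-bam", bam,
    ]
    deepvariant = [
        "pbrun", "deepvariant",
        "--ref", ref,
        "--in-bam", bam,  # consumes the BAM produced by the previous step
        "--out-variants", vcf,
    ]
    return [fq2bam, deepvariant]


if __name__ == "__main__":
    cmds = germline_commands(
        "GRCh38.fasta", "NA12878_R1.fastq.gz", "NA12878_R2.fastq.gz", "results"
    )
    for cmd in cmds:
        print(shlex.join(cmd))
        # On a machine with Parabricks installed, run each step with:
        # subprocess.run(cmd, check=True)
```

Building the commands as argument lists rather than shell strings avoids quoting problems with file paths and maps directly onto `subprocess.run`.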
`pangenome.ipynb`: Demonstrates a pangenome analysis workflow, an alternative to single-reference alignment, using NVIDIA Parabricks. Downloads the HPRC v1.1 pangenome graph, aligns short-read FASTQ samples using GPU-accelerated Giraffe, and calls variants with Pangenome-Aware DeepVariant, a version of DeepVariant that uses the pangenome graph to improve variant detection accuracy across diverse populations.
`variant_effect_prediction.ipynb`: Runs a full variant effect prediction pipeline starting from raw FASTQ files. It uses NVIDIA Parabricks to align reads and call variants, processes GENCODE gene annotations to extract protein-coding sequences, maps detected variants onto transcripts, and uses CodonFM (NVIDIA's RNA foundation model) to predict the functional impact of each variant via log-likelihood ratios.
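The last two steps of this notebook (mapping a variant onto a transcript and scoring it) can be sketched in Python. This sketch assumes the variant has already been converted to a 0-based position within a transcript's coding sequence (strand and splice handling are omitted) and that the model provides a probability for each codon at the variant's position. The score shown, log P(alt codon) minus log P(ref codon), is a common convention for masked language models; the helper names and probabilities are hypothetical, and the notebook's exact formula may differ.

```python
import math


def codon_change(cds: str, cds_pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Hypothetical helper: return (codon index, ref codon, alt codon) for an SNV in a CDS."""
    if cds[cds_pos] != ref:
        raise ValueError(f"reference mismatch at CDS position {cds_pos}")
    codon_idx = cds_pos // 3          # which codon the base falls in
    start = codon_idx * 3             # first base of that codon
    ref_codon = cds[start:start + 3]
    offset = cds_pos - start          # position of the variant inside the codon (0, 1, 2)
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    return codon_idx, ref_codon, alt_codon


def log_likelihood_ratio(codon_probs: dict[str, float], ref_codon: str, alt_codon: str) -> float:
    """LLR = log P(alt) - log P(ref); negative means the model prefers the reference codon."""
    return math.log(codon_probs[alt_codon]) - math.log(codon_probs[ref_codon])


if __name__ == "__main__":
    idx, ref_c, alt_c = codon_change("ATGGCTTGA", 4, "C", "A")
    print(idx, ref_c, alt_c)  # 1 GCT GAT
    probs = {"GCT": 0.5, "GAT": 0.05}  # made-up model probabilities for illustration
    print(round(log_likelihood_ratio(probs, ref_c, alt_c), 3))  # -2.303
```

Under this convention, a strongly negative LLR means the model considers the alternate codon much less likely than the reference in its sequence context, which is commonly interpreted as a sign of possible functional impact.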
An L40S (48 GB of GPU memory) is recommended for the best balance of cost and performance. Users can also try the L4 or T4 for lower cost, or the RTX Pro 6000 for higher performance.
NVIDIA Parabricks can run on any NVIDIA GPU with CUDA® compute capability 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, or 12.0 (SM architectures 75, 80, 86, 89, 90, 100, and 120) and at least 16 GB of GPU memory.
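The GPU requirements above can be expressed as a simple eligibility check. This is a hypothetical helper, not part of Parabricks; it assumes the compute capability is given as a string such as "8.9" (the format reported by `nvidia-smi --query-gpu=compute_cap --format=csv` on recent drivers) and that memory is the card's nominal size in GB.

```python
# Compute capabilities (major, minor) from the Parabricks GPU requirements.
SUPPORTED_COMPUTE_CAPS = {(7, 5), (8, 0), (8, 6), (8, 9), (9, 0), (10, 0), (12, 0)}
MIN_GPU_MEMORY_GB = 16


def parse_compute_cap(text: str) -> tuple[int, int]:
    """Turn a string like '8.9' into the tuple (8, 9)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def gpu_supported(compute_cap: str, memory_gb: float) -> bool:
    """True if the GPU has a listed compute capability and at least 16 GB of memory."""
    return (
        parse_compute_cap(compute_cap) in SUPPORTED_COMPUTE_CAPS
        and memory_gb >= MIN_GPU_MEMORY_GB
    )


if __name__ == "__main__":
    print(gpu_supported("8.9", 48))  # True  (L40S)
    print(gpu_supported("7.0", 32))  # False (compute capability 7.0 is not listed)
```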
Parabricks has been tested specifically on the following NVIDIA GPUs:
The minimum CPU RAM and number of CPU threads depend on the number of GPUs, as shown in the table below:
| GPUs | Minimum CPU RAM (GB) | Minimum CPU Threads |
|---|---|---|
| 2 | 100 | 24 |
| 4 | 196 | 32 |
| 8 | 392 | 48 |
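The table above can be encoded as a lookup for checking a host before launching a run. This is a minimal sketch with hypothetical helper names. The table only lists 2, 4, and 8 GPUs, so the fallback for other counts (use the next larger row, for example 1 GPU falls back to the 2-GPU row) is an assumption that errs on the conservative side.

```python
import os

# (minimum CPU RAM in GB, minimum CPU threads), keyed by GPU count, from the table.
HOST_MINIMUMS = {2: (100, 24), 4: (196, 32), 8: (392, 48)}


def host_minimums(num_gpus: int) -> tuple[int, int]:
    """Return (min RAM GB, min threads); unlisted counts use the next larger row (assumption)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    for count in sorted(HOST_MINIMUMS):
        if num_gpus <= count:
            return HOST_MINIMUMS[count]
    raise ValueError(f"no published minimum for {num_gpus} GPUs")


def host_is_sufficient(num_gpus: int, ram_gb: float, threads: int) -> bool:
    """True if the host meets both the RAM and thread minimums for this GPU count."""
    min_ram, min_threads = host_minimums(num_gpus)
    return ram_gb >= min_ram and threads >= min_threads


if __name__ == "__main__":
    print(host_minimums(4))  # (196, 32)
    # os.cpu_count() reports logical CPUs (hardware threads) visible to the OS.
    print(host_is_sufficient(4, ram_gb=256, threads=os.cpu_count() or 0))
```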
Governing Terms: The Blueprint scripts are governed by the NVIDIA Software License Agreement and the Product-Specific Terms for NVIDIA AI Products, and they enable use of separate open source and proprietary software governed by their respective licenses:
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.