nvidia / deepvariant

Run Google's DeepVariant optimized for GPU. Switch models for high accuracy on all major sequencers.

Model Overview

Description

DeepVariant (the Parabricks tool behind the Universal Variant Calling Microservice) is a deep learning model that can help identify variants in short- and long-read sequencing datasets.
This model is ready for commercial use.

DeepVariant takes aligned sequencing reads in BAM/CRAM format and uses a convolutional neural network (CNN) to classify each candidate locus as either true underlying genomic variation or sequencing error. It can therefore call single nucleotide variants (SNVs) and insertions/deletions (InDels) in germline samples with high accuracy.

Parabricks DeepVariant is a highly optimized implementation of the DeepVariant pipeline that dramatically improves variant calling runtimes.

This model natively supports read sets from Illumina, Oxford Nanopore, and Pacific Biosciences sequencers; supports both whole-genome and whole-exome sequencing; and can output either Variant Call Format (VCF) or genomic VCF (gVCF).

The Universal Variant Calling NIM can:

  • Process short-read whole exome data
  • Process short-read and long-read whole genome data
  • Perform inference locally or on NVIDIA GPU Cloud
  • Output VCF or gVCF.
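The microservice's own request format is not described here, so the choice between VCF and gVCF output is illustrated with the underlying Parabricks command line instead. This is a minimal sketch under stated assumptions: `deepvariant_cmd` is a hypothetical helper that only prints a command, and the `pbrun deepvariant` flags shown (`--ref`, `--in-bam`, `--out-variants`, `--gvcf`) should be checked against the Parabricks documentation for your version before use.

```shell
# Hypothetical helper: print (do not run) a pbrun deepvariant command.
# $1 = reference FASTA, $2 = input BAM, $3 = output path, $4 = vcf | gvcf
# Assumption: flag names follow the Parabricks CLI; verify them for your version.
deepvariant_cmd() {
    ref="$1"
    bam="$2"
    out="$3"
    fmt="${4:-vcf}"

    cmd="pbrun deepvariant --ref $ref --in-bam $bam --out-variants $out"

    case "$fmt" in
        vcf)
            ;;                              # plain VCF needs no extra flag
        gvcf)
            cmd="$cmd --gvcf"               # request genomic VCF output
            ;;
        *)
            echo "unknown output format: $fmt" >&2
            return 1
            ;;
    esac

    printf '%s\n' "$cmd"
}
```

Printing the command rather than running it makes it easy to review before launching a GPU job, for example `deepvariant_cmd ref.fa sample.bam sample.g.vcf gvcf`.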

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case; see link to GitHub.

Reference(s)

Parabricks Latest Documentation

Terms of use

By using this software or model, you are agreeing to the NVIDIA Parabricks Terms of Use.

Model Architecture

Architecture Type: Convolutional Neural Network (CNN)

Network Architecture: Inceptionv2

For more information, see the Parabricks documentation.

Input

Input Type(s): Indices (Text, Binary)
Input Format(s): Tarball
Input Parameters: One Dimensional (1D)

  • A reference genome tarball that contains a reference genome and the indices generated by samtools and bwa. This can be generated by running:

        samtools faidx <reference genome>
        bwa index <reference genome>
        tar cvf <reference genome>.tar <reference genome>*

  • A Binary Alignment Map (BAM) file from Parabricks fq2bam or Burrows-Wheeler Aligner (BWA).
  • A BAM Index (BAI) file.
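The packaging step for the reference tarball can be sketched as a pair of small shell functions. This is a minimal sketch, not the official procedure: `index_reference` and `pack_reference` are hypothetical helper names, and it assumes the index files sit next to the FASTA and that a flat archive layout (no directory prefix) is acceptable. Unlike a bare `<reference genome>*` glob, it skips an existing tarball, so re-running it does not try to add the archive to itself.

```shell
# Hypothetical helper: build the samtools and bwa indexes next to the FASTA.
index_reference() {
    samtools faidx "$1" && bwa index "$1"
}

# Hypothetical helper: pack the FASTA and its sibling index files into
# <reference>.tar, stored without a directory prefix.
pack_reference() {
    ref="$1"
    dir=$(dirname "$ref")
    base=$(basename "$ref")
    (
        cd "$dir" || exit 1
        set --                                  # collect file names in "$@"
        for f in "$base"*; do
            if [ "$f" = "$base.tar" ]; then
                continue                        # never archive the archive
            fi
            if [ -f "$f" ]; then
                set -- "$@" "$f"
            fi
        done
        [ "$#" -gt 0 ] || exit 1                # nothing matched the prefix
        tar cf "$base.tar" "$@"
    )
}
```

Typical usage would be `index_reference ref.fa && pack_reference ref.fa`. If the BAM has no index yet, `samtools index sample.bam` creates the BAI file alongside it.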

Output

Output Type(s): Text (Sample, Manifest, Path, Path)
Output Format: VCF File
Output Parameters: 1D

The output of the DeepVariant Microservice is the following:

  • A VCF file containing variant calls for your sample.
  • A VCF manifest (which contains the needed parts to sign a multipart-upload request if running in the cloud).
  • A path to the STDOUT of the run (either locally or in cloud storage).
  • A path to the STDERR of the run (either locally or in cloud storage).
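As a quick sanity check on the returned VCF, the SNV and InDel calls can be tallied with standard tools. The sketch below is illustrative only: `count_variants` is a hypothetical helper, it expects an uncompressed VCF, and it makes the simplifying assumptions that only records whose FILTER is `PASS` or `.` count as calls and that symbolic alleles such as `<*>` are ignored.

```shell
# Hypothetical helper: tally SNVs and indels in an uncompressed VCF.
# An allele pair is an SNV when REF and ALT are both one base long,
# an indel when their lengths differ, and "other" when they are equal
# in length but longer than one base (for example, multi-nucleotide changes).
count_variants() {
    awk -F'\t' '
        /^#/ { next }                                 # skip header lines
        $7 != "PASS" && $7 != "." { next }            # skip filtered records
        {
            n = split($5, alts, ",")                  # one entry per ALT allele
            for (i = 1; i <= n; i++) {
                if (alts[i] == "." || alts[i] == "*" || alts[i] ~ /^</) continue
                if (length($4) == 1 && length(alts[i]) == 1) snv++
                else if (length($4) != length(alts[i])) indel++
                else other++
            }
        }
        END { printf "snv=%d indel=%d other=%d\n", snv, indel, other }
    ' "$1"
}
```

For a compressed file, decompress it first (for example `gzip -dc calls.vcf.gz > calls.vcf`) and then run `count_variants calls.vcf`. Multi-allelic sites contribute one count per ALT allele.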

Software Integration

Supported Hardware Platform(s): NVIDIA GPU(s) with at least 24 GB of RAM, including Hopper, Lovelace, Ampere, Turing, and Volta generations.

Supported Operating System(s): Linux

Model Version:

  • V4.2.1-1

Inference

Engine: Triton and PyTriton
Test Hardware: Other