ipd/rfdiffusion

A generative model of protein backbones for protein binder design.

Model Overview

Description:

RFdiffusion (RoseTTAFold Diffusion) is a generative model that creates novel protein structures for protein scaffolding and protein binder design tasks. It generates entirely new protein backbones that can be tailored to bind specific target molecules.

This model is available for commercial use.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to Non-NVIDIA GitHub Model Card.

License/Terms of Use:

This model is released under the BSD License.

You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.

References:

@ARTICLE{nat2023rfdiffusion,
  title    = "De novo design of protein structure and function with RFdiffusion",
  author   = "Watson, Joseph L. and Juergens, David and Bennett, Nathaniel R. and
              Trippe, Brian L. and Yim, Jason and Eisenach, Helen E. and
              Ahern, Woody and Borst, Andrew J. and Ragotte, Robert J. and
              Milles, Lukas F. and Wicky, Basile I. M. and Hanikel, Nikita and
              Pellock, Samuel J. and Courbet, Alexis and Sheffler, William and
              Wang, Jue and Venkatesh, Preetham and Sappington, Isaac and
              Torres, Susana Vázquez and Lauko, Anna and De Bortoli, Valentin and
              Mathieu, Emile and Ovchinnikov, Sergey and Barzilay, Regina and
              Jaakkola, Tommi S. and DiMaio, Frank and Baek, Minkyung and
              Baker, David",
  journal  = "Nature",
  volume   = 620,
  number   = 7976,
  pages    = "1089--1100",
  month    = aug,
  year     = 2023,
  language = "en",
  doi      = {10.1038/s41586-023-06415-8}
}

Model Architecture:

Architecture Type: Diffusion-based Generative Neural Network
Network Architecture: RFdiffusion

Input:

Input Type(s): Text (Protein)
Input Format(s): Protein Data Bank (PDB)
Input Parameters: String, One-Dimensional (1D)
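
Because the model reads and writes PDB text, it can help to see how the fixed-width ATOM records in that format carry backbone coordinates. The sketch below is illustrative only and is not part of RFdiffusion or any NVIDIA API: the helper names (format_atom_record, parse_ca_atoms) and the two-residue sample are hypothetical, and the column positions follow the standard PDB ATOM record layout.

```python
def format_atom_record(serial, name, res_name, chain, res_seq, x, y, z):
    """Build one fixed-width PDB ATOM record (synthetic data for the demo).

    Columns: 1-6 record name, 7-11 serial, 13-16 atom name, 18-20 residue
    name, 22 chain ID, 23-26 residue number, 31-54 x/y/z, then occupancy
    and B-factor (set to placeholder values here).
    """
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}"
    )


def parse_ca_atoms(pdb_text):
    """Return one dict per residue holding its C-alpha position."""
    residues = []
    for line in pdb_text.splitlines():
        # Only ATOM records whose atom-name field (columns 13-16) is CA.
        if line[0:6] == "ATOM  " and line[12:16].strip() == "CA":
            residues.append({
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "res_name": line[17:20].strip(),
                "xyz": (
                    float(line[30:38]),
                    float(line[38:46]),
                    float(line[46:54]),
                ),
            })
    return residues


# Hypothetical two-residue fragment; the coordinates are made up.
SAMPLE_PDB = "\n".join([
    format_atom_record(1, " N  ", "MET", "A", 1, 10.000, 12.500, 9.000),
    format_atom_record(2, " CA ", "MET", "A", 1, 11.104, 13.207, 9.564),
    format_atom_record(3, " C  ", "MET", "A", 1, 12.300, 12.400, 9.900),
    format_atom_record(4, " CA ", "ALA", "A", 2, 14.512, 13.011, -1.234),
    "END",
])

if __name__ == "__main__":
    for residue in parse_ca_atoms(SAMPLE_PDB):
        print(residue)
```

Slicing by column rather than splitting on whitespace matters here, because adjacent PDB fields can run together without a separating space (for example, large or negative coordinates).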

Output:

Output Type(s): Text (Protein)
Output Format: Protein Data Bank (PDB)
Output Parameters: String, 1D

Software Integration:

Runtime Engine(s):

  • PyTorch

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Hopper
  • NVIDIA Ampere
  • NVIDIA Lovelace
  • NVIDIA Turing

Supported Operating System(s):

  • Linux

Model Version(s):

RFdiffusion 2.0.0

Training & Evaluation:

Training Dataset:

Link: The Protein Data Bank

Data Collection Method by dataset:

  • Hybrid: Automatic, Human

For the PDB dataset, scientists worldwide submit structural data determined by X-ray crystallography or cryo-electron microscopy (cryo-EM). Each submission includes atomic coordinates, experimental data, and metadata about the biological macromolecules.

Labeling Method by dataset:

  • Hybrid: Automatic, Human

For the PDB dataset, expert biocurators review the submitted data for accuracy and completeness. They check the plausibility of the data and annotate it with relevant biological and chemical information.

Properties (Quantity, Dataset Descriptions, Sensor(s)): The training dataset used for RFdiffusion, as detailed in the referenced paper, consists of protein structures sampled from the Protein Data Bank (PDB). To prepare these structures for training, a noising process is applied that simulates up to 200 steps of random perturbation. At each step, the Cα coordinates are perturbed with 3D Gaussian noise, and the residue orientations undergo Brownian motion on the manifold of rotation matrices.
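
To make the noising process described above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a plain random walk: each step adds isotropic Gaussian noise to every Cα position and left-multiplies every residue frame by a small random rotation (a Gaussian rotation vector mapped through Rodrigues' formula), which is one simple way to discretize Brownian motion on rotation matrices. The function names and the step sizes sigma_trans and sigma_rot are hypothetical; the actual noise schedule is given in the referenced paper.

```python
import math
import random

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotvec_to_matrix(v):
    """Rodrigues' formula: rotation vector (axis * angle) -> rotation matrix."""
    theta = math.sqrt(sum(c * c for c in v))
    if theta < 1e-12:
        return [row[:] for row in IDENTITY]
    kx, ky, kz = (c / theta for c in v)
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]  # cross-product matrix
    K2 = matmul3(K, K)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[IDENTITY[i][j] + s * K[i][j] + c * K2[i][j] for j in range(3)]
            for i in range(3)]


def noise_step(coords, frames, rng, sigma_trans=0.1, sigma_rot=0.05):
    """One step: Gaussian jitter on positions, small random rotation on frames.

    sigma_trans and sigma_rot are illustrative values, not the paper's schedule.
    """
    new_coords = [[x + rng.gauss(0.0, sigma_trans) for x in xyz] for xyz in coords]
    new_frames = []
    for R in frames:
        delta = [rng.gauss(0.0, sigma_rot) for _ in range(3)]
        new_frames.append(matmul3(rotvec_to_matrix(delta), R))
    return new_coords, new_frames


def forward_noise(coords, frames, steps=200, seed=0):
    """Apply `steps` noising steps; the inputs are left unmodified."""
    rng = random.Random(seed)
    for _ in range(steps):
        coords, frames = noise_step(coords, frames, rng)
    return coords, frames


if __name__ == "__main__":
    # Two hypothetical residues 3.8 units apart, both with identity orientation.
    ca = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0]]
    noisy_ca, noisy_frames = forward_noise(ca, [IDENTITY, IDENTITY])
    print(noisy_ca)
    print(noisy_frames[0])
```

Composing rotations by matrix multiplication keeps every frame a valid rotation matrix at each step, which adding Gaussian noise directly to the nine matrix entries would not.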

Dataset License(s): CC0 1.0.

Evaluation Dataset:

The evaluation strategy involved training the model on PDB structures (as described in Training Dataset) with added noise and then assessing its ability to denoise these structures, as well as evaluating its performance on design tasks with auxiliary conditioning information.

Data Collection Method by dataset:

  • Automatic: random splits from PDB dataset.

Labeling Method by dataset:

  • Automatic: random splits from PDB dataset.

The training, validation, and test splits were derived from protein assemblies in the PDB, which include structures determined by X-ray crystallography or cryo-electron microscopy (cryo-EM).

Inference:

Engine: PyTorch
Test Hardware:

  • A100
  • L40
  • H100

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.