A generative model of protein backbones for protein binder design.
RFdiffusion (RoseTTAFold Diffusion) is a generative model that creates novel protein structures for protein scaffolding and protein binder design tasks. This model generates entirely new protein backbones and designs proteins that can be specifically tailored to bind to target molecules.
This model is available for commercial use.
This model is not owned or developed by NVIDIA. This model has been developed
and built to a third-party’s requirements for this application and use case; see link to Non-NVIDIA GitHub Model Card.
This model is released under the BSD License.
You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.
@ARTICLE{nat2023rfdiffusion,
  title    = "De novo design of protein structure and function with RFdiffusion",
  author   = "Watson, Joseph L. and Juergens, David and Bennett, Nathaniel R. and Trippe, Brian L. and Yim, Jason and Eisenach, Helen E. and Ahern, Woody and Borst, Andrew J. and Ragotte, Robert J. and Milles, Lukas F. and Wicky, Basile I. M. and Hanikel, Nikita and Pellock, Samuel J. and Courbet, Alexis and Sheffler, William and Wang, Jue and Venkatesh, Preetham and Sappington, Isaac and Torres, Susana Vázquez and Lauko, Anna and De Bortoli, Valentin and Mathieu, Emile and Ovchinnikov, Sergey and Barzilay, Regina and Jaakkola, Tommi S. and DiMaio, Frank and Baek, Minkyung and Baker, David",
  journal  = "Nature",
  volume   = 620,
  number   = 7976,
  pages    = "1089--1100",
  month    = aug,
  year     = 2023,
  language = "en",
  doi      = {10.1038/s41586-023-06415-8}
}
Architecture Type: Diffusion-based Generative Neural Network
Network Architecture: RFdiffusion
Input Type(s): Text (Protein)
Input Format(s): Protein Data Bank (PDB)
Input Parameters: String, One-Dimensional (1D)
Output Type(s): Text (Protein)
Output Format: Protein Data Bank (PDB)
Output Parameters: String, 1D
Runtime Engine(s):
Supported Hardware Microarchitecture Compatibility:
[Preferred/Supported] Operating System(s):
RFdiffusion 2.0.0
Link:
The Protein Data Bank
Data Collection Method by dataset
For the PDB dataset, scientists worldwide submit structural data
determined by X-ray crystallography or cryo-electron microscopy (cryo-EM).
Each submission includes atomic coordinates, experimental data, and metadata
about the biological macromolecules.
Labeling Method by dataset
For the PDB dataset, expert biocurators review the submitted data to
ensure accuracy and completeness. This involves checking the plausibility of
the data and annotating it with relevant biological and chemical information.
Properties (Quantity, Dataset Descriptions, Sensor(s)):
The training dataset used for RFdiffusion, as detailed in the referenced
paper, consists of protein structures sampled from the Protein Data Bank
(PDB). To prepare these structures for training, a noising process is
applied that simulates up to 200 steps of random modifications on each
structure. At each step, the Cα coordinates are perturbed with 3D Gaussian
noise, and Brownian motion is applied to the residue orientations on the
manifold of rotation matrices.
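The noising process described above can be illustrated with a minimal sketch. It is a simplified illustration, not the RFdiffusion implementation: the function names, the fixed per-step noise scales (`coord_sigma`, `rot_sigma`), and the approximation of Brownian motion on rotation matrices by composing small random rotations are all assumptions made for this example. The noise schedule and distributions used by the actual model are described in the referenced paper and are not reproduced here.

```python
import math
import random


def small_rotation(rng, sigma):
    """Return a random 3x3 rotation matrix close to the identity.

    A rotation vector is drawn from an isotropic Gaussian and converted
    to a rotation matrix with Rodrigues' formula (illustrative choice).
    """
    wx, wy, wz = (rng.gauss(0.0, sigma) for _ in range(3))
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = wx / theta, wy / theta, wz / theta  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def noise_backbone(ca_coords, orientations, n_steps=200,
                   coord_sigma=0.1, rot_sigma=0.05, seed=0):
    """Apply n_steps of random perturbations to a backbone.

    ca_coords:    list of [x, y, z] C-alpha positions, one per residue.
    orientations: list of 3x3 rotation matrices, one per residue.
    coord_sigma and rot_sigma are hypothetical per-step noise scales.
    The inputs are copied, so the caller's data is left unchanged.
    """
    rng = random.Random(seed)
    coords = [list(p) for p in ca_coords]
    rots = [[list(row) for row in r] for r in orientations]
    for _ in range(n_steps):
        for i in range(len(coords)):
            # 3D Gaussian noise on the C-alpha position.
            coords[i] = [v + rng.gauss(0.0, coord_sigma) for v in coords[i]]
            # One small random rotation step; the product of rotations
            # stays on the manifold of rotation matrices.
            rots[i] = matmul3(small_rotation(rng, rot_sigma), rots[i])
    return coords, rots
```

Calling `noise_backbone(ca_coords, orientations, n_steps=200)` returns the perturbed coordinates and orientations after the maximum number of steps mentioned above; smaller `n_steps` values give partially noised structures.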
Dataset License(s): CC0 1.0.
The evaluation strategy involved training the model on PDB structures (as
described in Training Dataset) with added noise and then assessing its ability
to denoise these structures, as well as evaluating its performance on design
tasks with auxiliary conditioning information.
Data Collection Method by dataset
Labeling Method by dataset
The training, validation, and test splits were derived from protein assemblies
in the PDB, which includes structures determined by X-ray
crystallography or cryo-electron microscopy (cryo-EM).
Engine: PyTorch
Test Hardware:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.