
ProteinMPNN is a deep learning model for predicting amino acid sequences for protein backbones.
ProteinMPNN (Protein Message Passing Neural Network) is a deep learning-based graph neural network designed to predict amino acid sequences for given protein backbones. This network leverages evolutionary, functional, and structural information to generate sequences that are likely to fold into the desired 3D structures.
This model is available for commercial use.
This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case.
GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Service Terms of Use. Use of this model is governed by the NVIDIA Community Model License. Additional Information: MIT.
You are responsible for ensuring that your use of NVIDIA-provided models complies with all applicable laws.
Global
ProteinMPNN enables researchers and commercial entities in the Drug Discovery, Life Sciences, and Protein Engineering fields to design amino acid sequences that fold into desired 3D protein structures. It is particularly useful for de novo protein design, enzyme engineering, and therapeutic protein development.
build.nvidia.com: August 13, 2025 via build.nvidia.com/ipd/proteinmpnn
NGC: August 13, 2025 via catalog.ngc.nvidia.com
@article{dauparas2022robust,
title={Robust deep learning--based protein sequence design using ProteinMPNN},
author={Dauparas, Justas and Anishchenko, Ivan and Bennett, Nathaniel and Bai, Hua and Ragotte, Robert J and Milles, Lukas F and Wicky, Basile IM and Courbet, Alexis and de Haas, Rob J and Bethel, Neville and others},
journal={Science},
volume={378},
number={6615},
pages={49--56},
year={2022},
publisher={American Association for the Advancement of Science}
}
Architecture Type: Protein Amino Acid Sequence Prediction
Network Architecture: ProteinMPNN
Input Type(s): Protein in Protein Data Bank (PDB) format
Input Format(s): String
Input Parameters: One-Dimensional (1D)
Other Properties Related to Input: Accepts protein backbone structures in PDB format.
Output Type(s): Amino Acid Sequence
Output Format: Multi-FASTA (text file)
Output Parameters: 1D
Other Properties Related to Output: Generates sequences predicted to fold into the input backbone structure.
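The input and output formats listed above can be handled with a few lines of Python. The following is a minimal sketch, not part of the model's own tooling: the function names `backbone_atoms` and `parse_multi_fasta` are hypothetical, the backbone is assumed to mean the N, CA, C, and O atoms of each residue, and the column offsets follow the fixed-width ATOM record layout of the PDB format.

```python
from typing import List, Tuple

# Assumption: "backbone" means these four atoms per residue.
BACKBONE_ATOMS = {"N", "CA", "C", "O"}

Atom = Tuple[str, int, str, Tuple[float, float, float]]


def backbone_atoms(pdb_text: str) -> List[Atom]:
    """Return (chain, residue number, atom name, xyz) for backbone ATOM records."""
    atoms: List[Atom] = []
    for line in pdb_text.splitlines():
        # Skip HETATM, REMARK, and any line too short to hold coordinates.
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        name = line[12:16].strip()          # columns 13-16: atom name
        if name not in BACKBONE_ATOMS:
            continue
        chain = line[21]                    # column 22: chain identifier
        resseq = int(line[22:26])           # columns 23-26: residue number
        xyz = (
            float(line[30:38]),             # columns 31-38: x
            float(line[38:46]),             # columns 39-46: y
            float(line[46:54]),             # columns 47-54: z
        )
        atoms.append((chain, resseq, name, xyz))
    return atoms


def parse_multi_fasta(text: str) -> List[Tuple[str, str]]:
    """Split multi-FASTA text into (header, sequence) pairs, in file order."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)             # sequences may wrap over lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

A list of pairs is returned rather than a dictionary so that designs with identical headers are not silently overwritten.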
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Runtime Engine(s):
Supported Hardware Microarchitecture Compatibility:
Supported Operating System(s):
ProteinMPNN 1.0.0
Data Modality:
Link: The Protein Data Bank
Data Collection Method by dataset:
For the PDB dataset, scientists worldwide submit structural data determined by X-ray crystallography or cryo-electron microscopy (cryo-EM). This includes atomic coordinates, experimental data, and metadata about the biological macromolecules.
Labeling Method by dataset:
For the PDB dataset, expert biocurators review the submitted data for accuracy and completeness, checking its plausibility and annotating it with relevant biological and chemical information. The CATH 4.2 dataset is derived from the PDB dataset. The CATH (Class, Architecture, Topology, Homologous superfamily) database hierarchically classifies protein domain structures obtained from protein structures deposited in the PDB. The data in CATH are sourced specifically from PDB files and include structures determined at a resolution of 4 angstroms (Å) or better. The classification process combines manual and automated methods to ensure accurate domain identification and classification. For ProteinMPNN, the data underwent quality filtering that removed low-resolution structures and structures with potential errors.
Properties: The model was trained by the Institute for Protein Design. The training dataset consisted of 23,358 sequences. Dataset: CATH 4.2, PDB. Sensors: X-ray crystallography, cryo-EM.
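The resolution-based filtering described for the training data can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the procedure used to build the ProteinMPNN dataset: the function names are hypothetical, the resolution is read from the `REMARK   2 RESOLUTION.` record that PDB files carry, and the default cutoff of 4.0 Å is borrowed from the CATH criterion mentioned above, since the exact threshold applied for ProteinMPNN is not stated here.

```python
import re
from typing import Dict, Optional

# Matches lines such as "REMARK   2 RESOLUTION.    1.80 ANGSTROMS."
_RESOLUTION = re.compile(r"^REMARK   2 RESOLUTION\.\s+(\d+(?:\.\d+)?)\s+ANGSTROMS")


def resolution_angstroms(pdb_text: str) -> Optional[float]:
    """Return the reported resolution in angstroms, or None if absent."""
    for line in pdb_text.splitlines():
        match = _RESOLUTION.match(line)
        if match:
            return float(match.group(1))
    return None  # e.g. "RESOLUTION. NOT APPLICABLE." or no REMARK 2 record


def keep_high_resolution(
    structures: Dict[str, str], max_resolution: float = 4.0
) -> Dict[str, str]:
    """Keep structures whose resolution is at or below the cutoff.

    `max_resolution` is an assumed, adjustable cutoff; structures with no
    reported resolution are dropped.
    """
    kept: Dict[str, str] = {}
    for name, text in structures.items():
        resolution = resolution_angstroms(text)
        if resolution is not None and resolution <= max_resolution:
            kept[name] = text
    return kept
```

Lower values mean better resolution, so the comparison keeps entries at or below the cutoff.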
Link: The Protein Data Bank
Data Collection Method by dataset:
Labeling Method by dataset:
Properties: The training, validation, and test splits were derived from protein assemblies in the PDB, which includes structures determined by X-ray crystallography or cryo-electron microscopy (cryo-EM). The dataset was divided into random splits with 23,358 sequences for training, 1,464 for validation, and 1,529 for testing.
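As a quick check on the split sizes quoted above, the share of each split can be computed directly from the three counts. The helper name `split_fractions` is hypothetical; the counts are the ones stated in this card.

```python
from typing import Dict


def split_fractions(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the total number of entries."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Counts as stated in the dataset properties.
splits = {"training": 23358, "validation": 1464, "test": 1529}

for name, fraction in split_fractions(splits).items():
    print(f"{name}: {splits[name]} entries ({fraction:.1%})")
```

The three counts sum to 26,351 entries, of which roughly 89% are used for training.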
Acceleration Engine: Triton
Test Hardware:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
You are responsible for ensuring the physical properties of model-generated molecules are appropriately evaluated, and comply with applicable safety regulations and ethical standards.