Predicts the 3D structure of a protein from its amino acid sequence.
ESMFold is a deep learning model for protein structure prediction developed by Facebook AI Research (FAIR) [lin2023esmfold]. The model was inspired by AlphaFold but does not require a multiple sequence alignment (MSA) as input, which makes inference significantly faster while remaining nearly as accurate as alignment-based methods.
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to Non-NVIDIA Model Card.
@ARTICLE{lin2023esmfold,
  title    = "Evolutionary-scale prediction of atomic-level protein structure with a language model",
  author   = "Lin, Zeming and Akin, Halil and Rao, Roshan and Hie, Brian and Zhu, Zhongkai and Lu, Wenting and Smetanin, Nikita and Verkuil, Robert and Kabeli, Ori and Shmueli, Yaniv and Dos Santos Costa, Allan and Fazel-Zarandi, Maryam and Sercu, Tom and Candido, Salvatore and Rives, Alexander",
  journal  = "Science",
  volume   = 379,
  number   = 6637,
  pages    = "1123--1130",
  month    = mar,
  year     = 2023,
  language = "en",
  doi      = {10.1101/2022.07.20.500902}
}
Architecture Type: Pose Estimation
Network Architecture: ESMFold
Input Type(s): Protein Sequence
Input Format(s): String
Input Parameters: 1D
Other Properties Related to Input: Protein sequence matching the regular expression ^[ARNDCQEGHILKMFPSTWYVXBOU]*$, up to 1024 characters
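The input constraints above can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the model's own API: the helper name `is_valid_sequence` and the constant names are hypothetical, while the pattern and the 1024-character limit are copied from the input specification. Note that the pattern as written also accepts an empty string, and the sketch mirrors that.

```python
import re

# Pattern and length limit taken from the input specification.
# The names below are illustrative assumptions, not part of any published API.
SEQUENCE_PATTERN = re.compile(r"^[ARNDCQEGHILKMFPSTWYVXBOU]*$")
MAX_SEQUENCE_LENGTH = 1024


def is_valid_sequence(sequence: str) -> bool:
    """Return True if the sequence uses only the allowed residue codes
    and is at most MAX_SEQUENCE_LENGTH characters long."""
    if len(sequence) > MAX_SEQUENCE_LENGTH:
        return False
    # fullmatch rejects a trailing newline, which a plain match with `$` would accept.
    return SEQUENCE_PATTERN.fullmatch(sequence) is not None


if __name__ == "__main__":
    print(is_valid_sequence("MKTAYIAKQR"))  # uppercase residue codes only
    print(is_valid_sequence("mktayiakqr"))  # lowercase letters are outside the pattern
```

Normalizing case or stripping whitespace before validation is a design choice left to the caller; the sketch deliberately applies the pattern as stated.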
Output Type(s): Protein Structure Pose(s)
Output Format: PDB (text file)
Output Parameters: 1D
Other Properties Related to Output: Pose
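Because the output is a PDB text file, downstream code can read atom positions using the fixed-column layout of PDB ATOM records. The sketch below is one possible way to extract alpha-carbon coordinates; the function name `ca_coordinates` is hypothetical, and the column offsets follow the general PDB format rather than anything specific to this model.

```python
from typing import List, Tuple

# (residue number, residue name, x, y, z)
CaRecord = Tuple[int, str, float, float, float]


def ca_coordinates(pdb_text: str) -> List[CaRecord]:
    """Collect alpha-carbon (CA) positions from the ATOM records of a PDB string.

    Column slices follow the fixed-width PDB ATOM layout:
    atom name 13-16, residue name 18-20, residue number 23-26,
    x 31-38, y 39-46, z 47-54 (1-based, inclusive).
    """
    records: List[CaRecord] = []
    for line in pdb_text.splitlines():
        # Only ATOM records are considered, so HETATM calcium ions are skipped.
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue
        records.append(
            (
                int(line[22:26]),
                line[17:20].strip(),
                float(line[30:38]),
                float(line[38:46]),
                float(line[46:54]),
            )
        )
    return records
```

For anything beyond a quick inspection, a dedicated structure-parsing library is likely a better choice than manual column slicing.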
Runtime Engine(s):
Supported Hardware Microarchitecture Compatibility:
[Preferred/Supported] Operating System(s):
Link: UniRef50
Data Collection Method by dataset:
Labeling Method by dataset:
Properties (Quantity, Dataset Descriptions, Sensor(s)): UniRef50, September 2021 version, is used for the training of ESM models. The training dataset was partitioned by randomly selecting 0.5% (≈ 250,000) sequences to form the validation set. The training set has sequences removed via the procedure described
Dataset License(s): CC BY 4.0.
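The random hold-out described in the dataset properties can be illustrated with a short sketch. This is only an assumption about how such a split might be done, not the procedure used by the ESM authors: the function name `split_validation`, the seed, and the identifier list are hypothetical, and the additional step of removing sequences from the training set is not covered.

```python
import random
from typing import List, Sequence, Tuple


def split_validation(
    sequence_ids: Sequence[str],
    fraction: float = 0.005,  # 0.5%, as stated in the dataset description
    seed: int = 0,  # hypothetical seed, chosen only for reproducibility
) -> Tuple[List[str], List[str]]:
    """Randomly hold out `fraction` of the identifiers as a validation set."""
    rng = random.Random(seed)
    n_valid = round(len(sequence_ids) * fraction)
    # Sample indices without replacement so no identifier is selected twice.
    valid_indices = set(rng.sample(range(len(sequence_ids)), n_valid))
    train = [s for i, s in enumerate(sequence_ids) if i not in valid_indices]
    valid = [s for i, s in enumerate(sequence_ids) if i in valid_indices]
    return train, valid
```

At the stated rate of 0.5%, a validation set of about 250,000 sequences corresponds to a dataset of roughly 50 million sequences.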
Engine: Triton
Test Hardware:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.