## Model Overview
### Description:
The MSA Search NIM is powered by GPU MMseqs2, a GPU-accelerated toolkit for protein database search and multiple sequence alignment (MSA). Although MMseqs2 is not a deep learning model, it requires large protein databases for sequence similarity search.
This NIM is ready for commercial use.
### Third-Party Community Consideration
This model is not owned or developed by NVIDIA. It has been developed and built to a third party’s requirements for this application and use case. ColabFold was developed by Mirdita *et al*. (2022), and GPU MMseqs2 was developed by Kallenborn *et al*. (2025).
### License / Terms of Use
GOVERNING TERMS: The trial service is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf). **You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.**
### Deployment Geography
Global
### Use Case
The MSA Search NIM enables researchers and commercial entities in the drug discovery, life sciences, and digital biology fields to rapidly generate multiple sequence alignments (MSAs). The output MSAs can be used in downstream applications such as protein structure prediction and evolutionary analysis.
### Release Date
* Build.nvidia.com: March 16, 2025 via [build.nvidia.com/colabfold/msa-search](https://build.nvidia.com/colabfold/msa-search)
* NGC: March 16, 2025

### References:
```
@ARTICLE{jumper2021alphafold,
  title    = "Highly accurate protein structure prediction with {AlphaFold}",
  author   = "Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\v Z}{\'\i}dek, Augustin and Potapenko, Anna and Bridgland, Alex and Meyer, Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie, Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig and Reiman, David and Clancy, Ellen and Zielinski, Michal and Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis",
  journal  = "Nature",
  volume   = 596,
  number   = 7873,
  pages    = "583--589",
  month    = aug,
  year     = 2021,
  language = "en",
  doi      = "10.1038/s41586-021-03819-2",
}
```
```
@ARTICLE{mirdita2022colabfold,
  title    = "ColabFold: making protein folding accessible to all",
  author   = "Mirdita, Milot and Sch{\"u}tze, Konstantin and Moriwaki, Yoshitaka and Heo, Lim and Ovchinnikov, Sergey and Steinegger, Martin",
  journal  = "Nature Methods",
  volume   = 19,
  number   = 6,
  pages    = "679--682",
  month    = jun,
  year     = 2022,
  language = "en",
  doi      = "10.1038/s41592-022-01488-1",
}
```
```
@ARTICLE{kallenborn2025gpu,
  title    = "GPU-accelerated homology search with MMseqs2",
  author   = "Kallenborn, Felix and Chacon, Alejandro and Hundt, Christian and Sirelkhatim, Hassan and Didi, Kieran and Cha, Sooyoung and Dallago, Christian and Mirdita, Milot and Schmidt, Bertil and Steinegger, Martin",
  journal  = "bioRxiv",
  year     = 2025,
  month    = jan,
  day      = 20,
  language = "en",
  doi      = "10.1101/2024.11.13.623350",
}
```
### Model Architecture:
**Architecture Type:** Not Applicable
**Network Architecture:** Not Applicable
### Input:
**Input Type(s):** Protein Sequence, Databases
**Input Format(s):** String (less than or equal to 4096 characters), Constrained List of Strings (one or more valid database names)
**Input Parameters:** String: 1D; Constrained List of Strings: 1D
**Other Properties Related to Input:** N/A
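The input constraints above can be checked on the client before a request is sent. The sketch below is a hypothetical helper and is not part of the NIM or its API: the name `validate_msa_request`, the accepted residue alphabet, case-insensitivity, and the use of the database names from "Model Version(s)" as the set of valid identifiers are all assumptions. Only the 4096-character limit comes from this card.

```python
from typing import List

# Hypothetical client-side check; not part of the MSA Search NIM or its API.
MAX_SEQUENCE_LENGTH = 4096  # limit stated in "Input Format(s)"

# Assumption: 20 standard amino acids plus the ambiguity codes B, Z, X and
# the rare residues U (selenocysteine) and O (pyrrolysine).
ALLOWED_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBZXUO")

# Assumption: database identifiers taken from "Model Version(s)".
KNOWN_DATABASES = {"Uniref30_2302", "colabfold_envdb_202108", "PDB70_220313"}


def validate_msa_request(sequence: str, databases: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the inputs look valid."""
    problems: List[str] = []
    seq = sequence.strip().upper()  # assumption: case is not significant
    if not seq:
        problems.append("sequence is empty")
    elif len(seq) > MAX_SEQUENCE_LENGTH:
        problems.append(
            f"sequence has {len(seq)} characters (max {MAX_SEQUENCE_LENGTH})"
        )
    bad = sorted(set(seq) - ALLOWED_RESIDUES)
    if bad:
        problems.append(f"unexpected characters: {''.join(bad)}")
    if not databases:
        problems.append("at least one database name is required")
    unknown = [db for db in databases if db not in KNOWN_DATABASES]
    if unknown:
        problems.append(f"unknown database names: {', '.join(unknown)}")
    return problems


print(validate_msa_request("MKTAYIAKQR", ["Uniref30_2302"]))  # → []
```

Returning a list of messages rather than raising on the first error lets a caller report every problem with a request at once.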
### Output:
**Output Type(s):** Multiple Sequence Alignment in A3M or FASTA format
**Output Format:** A3M or FASTA (text file)
**Output Parameters:** 1D
**Other Properties Related to Output:** N/A
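A3M is a compact alignment format: uppercase letters and `-` mark columns aligned to the query, while lowercase letters are insertions relative to the query. Removing the lowercase characters therefore yields rows that all have the query's length. The sketch below is a minimal, hypothetical reader (`parse_a3m` is not part of the NIM). Skipping `#` lines, dropping `.` characters, and stripping NUL bytes are defensive assumptions about how some MMseqs2/ColabFold A3M files are laid out, not documented behavior of this NIM.

```python
from typing import List, Tuple


def parse_a3m(text: str) -> List[Tuple[str, str]]:
    """Return (header, aligned_row) pairs with A3M insertions removed.

    Hypothetical helper, not part of the MSA Search NIM.
    """
    entries: List[Tuple[str, List[str]]] = []
    # Defensive assumption: some MMseqs2 outputs separate records with NUL bytes.
    for raw_line in text.replace("\x00", "").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' metadata lines
        if line.startswith(">"):
            entries.append((line[1:], []))
        elif entries:
            entries[-1][1].append(line)  # a sequence may wrap over several lines

    aligned: List[Tuple[str, str]] = []
    for header, chunks in entries:
        row = "".join(chunks)
        # Lowercase letters (and '.') are insertions relative to the query;
        # dropping them leaves uppercase match states and '-' deletions.
        kept = "".join(c for c in row if not (c.islower() or c == "."))
        aligned.append((header, kept))
    return aligned


example = ">query\nMKTAYIAK\n>hit_1\nMK-AYaaIAK\n"
for name, row in parse_a3m(example):
    print(name, row)
# query MKTAYIAK
# hit_1 MK-AYIAK
```

The equal-length rows can then be written out as aligned FASTA or loaded into a column-wise array for downstream analysis.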
### Software Integration:
**Runtime Engine(s):**
* Python, C++, CUDA
**Supported Hardware Microarchitecture Compatibility:**
* NVIDIA Ampere, NVIDIA Hopper, NVIDIA Ada Lovelace
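One way to check whether a GPU belongs to one of the microarchitectures above is to map its CUDA compute capability to an architecture name. The sketch below is a hypothetical helper (`architecture_for` and `is_supported` are illustrative names); its table covers only the compute capabilities of the GPUs listed under "Test Hardware" (A100 is 8.0, RTX A6000 is 8.6, L40 is 8.9, H100 is 9.0) and deliberately leaves every other value unmapped.

```python
from typing import Optional

# CUDA compute capability (major, minor) -> microarchitecture, limited to the
# values of the GPUs under "Test Hardware": A100 (8.0), RTX A6000 (8.6),
# L40 (8.9) and H100 (9.0). Other values are deliberately left unmapped.
ARCH_BY_CAPABILITY = {
    (8, 0): "Ampere",
    (8, 6): "Ampere",
    (8, 9): "Ada Lovelace",
    (9, 0): "Hopper",
}

SUPPORTED_ARCHITECTURES = {"Ampere", "Hopper", "Ada Lovelace"}


def architecture_for(compute_cap: str) -> Optional[str]:
    """Map a 'major.minor' string such as '8.6' to an architecture name."""
    major, minor = (int(part) for part in compute_cap.strip().split("."))
    return ARCH_BY_CAPABILITY.get((major, minor))


def is_supported(compute_cap: str) -> bool:
    """True if the capability maps to an architecture listed in this card."""
    return architecture_for(compute_cap) in SUPPORTED_ARCHITECTURES


print(architecture_for("8.9"), is_supported("7.5"))  # → Ada Lovelace False
```

On recent NVIDIA drivers, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` prints the compute capability string that this helper expects.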
**Supported Operating System(s):**
* Linux
### Model Version(s):
* MMseqs2 GPU 17-b804f
* Databases:
  * Uniref30_2302
  * colabfold_envdb_202108
  * PDB70_220313
## Training & Evaluation:
Not Applicable.
### Training Dataset:
**Link:** Not Applicable.

**Data Collection Method by dataset:**
* Not Applicable

**Labeling Method by dataset:**
* Not Applicable

**Properties:** Not Applicable.
### Evaluation Dataset:
**Link:** Not Applicable.

**Data Collection Method by dataset:**
* Not Applicable

**Labeling Method by dataset:**
* Not Applicable

**Properties:** Not Applicable.
### Inference:
**Engine:** Python, C++, CUDA
**Test Hardware:**
* NVIDIA A6000
* NVIDIA A100
* NVIDIA L40
* NVIDIA H100
### Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/). **You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.**