├── LICENSE ├── pLM.md ├── proteinsequencedesign.md └── README.md /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Christian Dallago 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /pLM.md: -------------------------------------------------------------------------------- 1 | # Protein Language Models 2 | (sorted by number of parameters) 3 | | Name | Params | Paper | Code | Notes | 4 | | :-------- | ------- | --------- | ------- | --------- | 5 | | ESM2 | 8M - 15B | [bioRxiv](https://www.biorxiv.org/content/10.1101/2022.07.20.500902v1)||| 6 | | ProGen2 | 151M - 6.4B | [arXiv](https://arxiv.org/abs/2206.13517) | [Code](https://github.com/salesforce/progen/tree/main/progen2) || 7 | | ProtTrans | 420M - 3B | [Paper](https://ieeexplore.ieee.org/document/9477085/) | [Code](https://github.com/agemagician/ProtTrans) |BFD+UniRef50| 8 | | ProteinLM | 200M, 3B | [arXiv](https://arxiv.org/abs/2108.07435) | [Code](https://github.com/THUDM/ProteinLM) || 9 | | RITA | 85M - 1.2B | [arXiv](https://arxiv.org/abs/2205.05789) | [Code](https://github.com/lightonai/RITA) || 10 | | ProGen1 | 1.2B | [bioRxiv](https://www.biorxiv.org/content/10.1101/2020.03.07.982272v2) | [Code](https://github.com/salesforce/progen) || 11 | | ProtGPT2 | 738M | [Paper](https://www.nature.com/articles/s41467-022-32007-7) | [Code](https://huggingface.co/nferruz/ProtGPT2) || 12 | | Tranception | 700M | [Paper](https://proceedings.mlr.press/v162/notin22a.html) | [Code](https://github.com/OATML-Markslab/Tranception) || 13 | | ESM1 | 43M - 670M | [Paper](https://www.pnas.org/doi/10.1073/pnas.2016239118) | [Code](https://github.com/facebookresearch/esm) || 14 | | DistilProtBert | 230M |[bioRxiv](https://www.biorxiv.org/content/early/2022/05/10/2022.05.09.491157) | [Code](https://github.com/yarongef/DistilProtBert) || 15 | | DARK | 128M | [bioRxiv](https://www.biorxiv.org/content/10.1101/2022.01.27.478087v1)||| 16 | | TAPE | 38M | [arXiv](https://arxiv.org/abs/1906.08230) | [Code](https://github.com/songlab-cal/tape) || 17 | | ProteinBERT | 16M | [Paper](https://doi.org/10.1093/bioinformatics/btac020) | 
[Code](https://github.com/nadavbra/protein_bert), [PyTorch](https://github.com/lucidrains/protein-bert-pytorch) |~106M proteins from UniRef90; 28 days over ~670M records (i.e. ~6.4 iterations)| 18 | | AminoBERT || [bioRxiv](https://www.biorxiv.org/content/10.1101/2021.08.02.454840v1) ||| 19 | 20 | ## Special purpose pLM 21 | | Name | Params | Paper | Code | Notes | 22 | | :-------- | ------- | --------- | ------- | --------- | 23 | | PeTriBERT | 40M | [bioRxiv](https://www.biorxiv.org/content/10.1101/2022.08.10.503344v1) | N/A | Optimized for protein design | 24 | 25 | ## Non-transformer-based sequence models 26 | | Name | Params | Paper | Code | Notes | 27 | | :-------- | ------- | --------- | ------- | --------- | 28 | | CARP | 600K - 640M | [bioRxiv](https://www.biorxiv.org/content/10.1101/2022.05.19.492714v2) | [Code](https://github.com/microsoft/protein-sequence-models)| CNN | 29 | | SeqVec | 93M | [Paper](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3220-8) | [Code](https://github.com/mheinzinger/SeqVec)| bidirectional LSTM; UniRef50 | 30 | | UniRep | 90M | [Paper](https://www.nature.com/articles/s41592-019-0598-1)| [Code](https://github.com/churchlab/UniRep)| mLSTM | 31 | | ProSE | 24M | [Paper](https://www.sciencedirect.com/science/article/pii/S2405471221002039) | [Code](https://github.com/tbepler/prose) | LSTM | 32 | 33 | ## pLM specific to Antibody sequences 34 | | Name | Params | Paper | Code | Notes | 35 | | :-------- | ------- | --------- | ------- | --------- | 36 | | TCR-BERT | 100M | [bioRxiv](https://www.biorxiv.org/content/10.1101/2021.11.18.469186v1) |[Code](https://github.com/wukevin/tcr-bert)|| 37 | | AntiBERTa | 86M | [Paper](https://www.sciencedirect.com/science/article/pii/S2666389922001052) | [Code](https://github.com/alchemab/antiberta) || 38 | | AntiBERTy | 26M | [arXiv](https://arxiv.org/abs/2112.07782) | [Code](https://pypi.org/project/antiberty) || 39 | | IgLM |1.5M, 13M| 
[bioRxiv](https://www.biorxiv.org/content/10.1101/2021.12.13.472419v1.full) | [Code](https://github.com/Graylab/IgLM) || 40 | | Sapiens | 0.6M | [Paper](https://www.tandfonline.com/doi/full/10.1080/19420862.2021.2020203) | [Code](https://github.com/Merck/BioPhi) || 41 | | AbLang || [Paper](https://academic.oup.com/bioinformaticsadvances/article/2/1/vbac046/6609807) | [Code](https://github.com/oxpig/AbLang) || 42 | 43 | # Building on pLMs 44 | - [protGPT2_gradioFold](https://huggingface.co/spaces/Gradio-Blocks/protGPT2_gradioFold) 45 | -------------------------------------------------------------------------------- /proteinsequencedesign.md: -------------------------------------------------------------------------------- 1 | 2 | 💡 **Notes** 3 | - The following lists are curated by humans, as such may be incomplete 4 | - We only include software targeting the inverse folding problem, i.e., given a structure, predict the sequence that folds into it. This is also referred to as protein sequence design. Note that most models here model P(sequence | structure). 
5 | - We do not wish to advertise one tool over any other, but simply list the tools we are aware of in order of publication year of the model, preprint or publication (whichever is first) 6 | - Any suggestions for improvements and additions are welcome as issues or pull requests 7 | 8 | ⚡️ **Brought to you by:** 9 | - [@simonduerr](https://twitter.com/simonduerr) 10 | - [@ginaelnesr](https://twitter.com/ginaelnesr) 11 | 12 | 13 | # Protein Sequence Design / Inverse folding models 14 | 15 | (sorted by year of release) 16 | | Name | Release year |Architecture | Paper | Code | Notes |Experimental validation | 17 | | :-------- | ------- |--------- | --------- | ------- | --------- |---| 18 | | ABACUS-R| 2022 |Transformer|[Nat Comput Sci](https://www.nature.com/articles/s43588-022-00273-6)|[code](https://doi.org/10.24433/CO.3351944.v1)||✅| 19 | | ProteinMPNN | 2022 | MPNN |[biorxiv](https://www.biorxiv.org/content/10.1101/2022.06.03.494563v1)|[git](https://github.com/dauparas/proteinMPNN)|[webserver](https://hf.space/simonduerr/ProteinMPNN) |✅| 20 | | ProDESIGN-LE | 2022 | Transformer |[biorxiv](https://www.biorxiv.org/content/10.1101/2022.06.25.497605v4)|-| |-| 21 | | RaSP | 2022 | 3DCNN |[biorxiv](https://www.biorxiv.org/content/10.1101/2022.07.14.500157v2)|[git](https://github.com/KULL-Centre/papers/tree/main/2022/ML-ddG-Blaabjerg-et-al)| [colab](https://colab.research.google.com/github/KULL-Centre/papers/blob/main/2022/ML-ddG-Blaabjerg-et-al/RaSPLab.ipynb) |-| 22 | |MIF|2022| SGNN|[biorxiv](https://www.biorxiv.org/content/10.1101/2022.05.25.493516v1)|[git](https://github.com/microsoft/protein-sequence-models)||-| 23 | | ESM-IF1 | 2022 | Transformer |[biorxiv](https://www.biorxiv.org/content/10.1101/2022.04.10.487779v1)|[git](https://github.com/facebookresearch/esm)| |-| 24 | | Partlon et al. 
| 2022 | Transformer |[biorxiv](https://www.biorxiv.org/content/10.1101/2022.04.15.488492v1)|-| |-| 25 | | TIMED-* | 2022 | 3DCNN |[arxiv](https://arxiv.org/pdf/2109.07925.pdf)|[git](https://github.com/wells-wood-research/timed-design)| |not published yet| 26 | | GCNDesign | 2021 | GCN |[pdf](https://github.com/ShintaroMinami/GCNdesign/blob/master/documents/Method_Summary.pdf)|[git](https://github.com/ShintaroMinami/GCNdesign)| [colab](https://github.com/naokob/ColabGCNdesign) |-| 27 | | GX | 2021 | 3DCNN+GNN |[arxiv](https://arxiv.org/pdf/2109.07925.pdf)|[git](https://github.com/wells-wood-research/timed-design)| |not published yet| 28 | | Fold2Seq | 2021 | Transformer | [arxiv](https://arxiv.org/abs/2106.13058) | [git](https://github.com/IBM/fold2seq)| |-| 29 | | CNN_protein_landscape | 2021 | 3DCNN |[Journal of Biological Physics](https://link.springer.com/article/10.1007/s10867-021-09593-6#Abs1)|[git](https://github.com/akulikova64/CNN_protein_landscape)||-| 30 | | Orellana et al. | 2021 | GVP |[biorxiv](https://www.biorxiv.org/content/10.1101/2021.09.06.459171v3)|-||-| 31 | | Jing et al. | 2020 | GVP |[arxiv](https://arxiv.org/abs/2009.01411)|[git](https://github.com/drorlab/gvp-pytorch)||-| 32 | | DenseCPD | 2020 | 3DCNN |[JCIM](https://pubs.acs.org/doi/full/10.1021/acs.jcim.0c00043)|-|[webserver](http://protein.org.cn/densecpd.html)|-| 33 | | ProDCoNN | 2020 | 3DCNN |[Proteins](https://onlinelibrary.wiley.com/doi/10.1002/prot.25868)|-||-| 34 | | ProteinSolver | 2020 | GNN |[Cell Systems](https://www.sciencedirect.com/science/article/pii/S2405471220303276)|[gitlab](https://gitlab.com/ostrokach/proteinsolver)|[webserver](http://design.ccbr.proteinsolver.org/)|✅| 35 | | MutCompute | 2020 | 3DCNN |[ACS SynBio](https://pubs.acs.org/doi/full/10.1021/acssynbio.0c00345)|-| [webserver](https://mutcompute.com)|✅| 36 | | Ingraham et al. 
| 2019 | Transformer |[NeurIPS Proceedings](https://papers.nips.cc/paper/2019/hash/f3a4ff4839c56a5f460c88cce3666a2b-Abstract.html)|[git](https://github.com/jingraham/neurips19-graph-protein-design)| |-| 37 | | 3DCNN | 2017 | 3DCNN |[BMC Bioinformatics](https://link.springer.com/article/10.1186/s12859-017-1702-0)|-||-| 38 | 39 | # Protein Sequence & Rotamer Design / Inverse folding models 40 | 41 | (sorted by year of release) 42 | | Name | Release year |Architecture | Paper | Code | Notes |Experimental validation | 43 | | :-------- | ------- |--------- | --------- | ------- | --------- | ---| 44 | | TIMED_Rotamer | 2022 | 3DCNN |-|[git](https://github.com/wells-wood-research/timed-design)| | not published yet | 45 | | SeqDes | 2021 | 3DCNN |[Nature Comm](https://www.nature.com/articles/s41467-022-28313-9) - [BioRxiv](https://www.biorxiv.org/content/10.1101/2020.01.06.895466v3)|[git](https://github.com/ProteinDesignLab/protein_seq_des)| |✅| 46 | 47 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 📖 **Table of contents** 2 | * [Predictors](#Predictors) 3 | * [Tools and Extensions](#Tools) 4 | * [Databases and Datasets](#Databases) 5 | * [Webservers](#Webservers) 6 | * [Discontinued](#Discontinued) 7 | 8 | 9 | 💡 **Notes** 10 | - The following lists are curated by humans, as such may be incomplete 11 | - We only include software targeting the folding problem combining learnings from AlphaFold 2 and protein language models. You may find other ML on protein tools [at Kevin's incredible ML for proteins list](https://github.com/yangkky/Machine-learning-for-proteins). 
12 | - We do not wish to advertise one tool over any other, but simply list the tools we are aware of in either random or alphabetical order 13 | - Any suggestions for improvements and additions are welcome as issues or pull requests 14 | - Projects we identify as discontinued are marked with 💀 and listed in a section at the end 15 | 16 | ⚡️ **Brought to you by:** 17 | - [@sacdallago](https://twitter.com/sacdallago) 18 | - [@sokrypton](https://twitter.com/sokrypton) 19 | 20 | ---- 21 | 22 | 23 | ### Predictors 24 | [_in alphabetical order_] 25 | - **MSA-based** (uses Multiple Sequence Alignments (MSAs) as input) 26 | - AlphaFold2 27 | [![](https://img.shields.io/badge/repo-JAX-blue)](https://github.com/deepmind/alphafold) 28 | [![](https://img.shields.io/badge/DOI-10.1038%2Fs41586--021--03819--2-lightgrey)](https://www.nature.com/articles/s41586-021-03819-2) 29 | - The original AlphaFold 2 method 30 | - Features: monomer, multimer 31 | - Other: [Colab Notebook](https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb) 32 | - ColabFold 33 | [![](https://img.shields.io/badge/repo-JAX-blue)](https://github.com/sokrypton/ColabFold) 34 | [![](https://img.shields.io/badge/DOI-10.1038%2Fs41592--022--01488--1-lightgrey)](https://www.nature.com/articles/s41592-022-01488-1) 35 | - Faster AF2 compilation and MSA generation 36 | - Features: monomer, multimer 37 | - Other: [localcolabfold](https://github.com/YoshitakaMo/localcolabfold) 38 | - FastFold 39 | [![](https://img.shields.io/badge/repo-PyTorch-yellowgreen)](https://github.com/hpcaitech/FastFold) 40 | [![](https://img.shields.io/badge/arxiv-2203.00854-lightgrey)](https://arxiv.org/abs/2203.00854) 41 | - Runtime improvements to OpenFold (see below) 42 | - Features: monomer 43 | - HelixFold 44 | [![](https://img.shields.io/badge/repo-PaddlePaddle-pink)](https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold) 45 | 
[![](https://img.shields.io/badge/arxiv-2207.05477-lightgrey)](https://arxiv.org/abs/2207.05477) 46 | - Reimplementation of AF2 in PaddlePaddle 47 | - Features: monomer 48 | - MEGA-Fold 49 | [![](https://img.shields.io/badge/repo-mindspore-green)](https://gitee.com/mindspore/mindscience/tree/master/MindSPONGE/applications/MEGAProtein) 50 | [![](https://img.shields.io/badge/arxiv-2206.12240-lightgrey)](https://arxiv.org/abs/2206.12240) 51 | - Reimplementation of AF2 in MindSpore; provides training code, training dataset and new model params. 52 | - Features: monomer 53 | - OpenFold 54 | [![](https://img.shields.io/badge/repo-PyTorch-yellowgreen)](https://github.com/aqlaboratory/openfold) 55 | - Reimplementation of AF2 in PyTorch; provides training code, training dataset and new model params. 56 | - Features: monomer 57 | - Other: [Colab Notebook](https://colab.research.google.com/github/aqlaboratory/openfold/blob/main/notebooks/OpenFold.ipynb) 58 | - RoseTTAFold 59 | [![](https://img.shields.io/badge/repo-PyTorch-yellowgreen)](https://github.com/RosettaCommons/RoseTTAFold) 60 | [![](https://img.shields.io/badge/DOI-10.1126%2Fscience.abj8754-lightgrey)](https://www.science.org/doi/10.1126/science.abj8754) 61 | - Reproduced AF2 in PyTorch before details of AF2 were available; new model parameters. 62 | - Features: monomer 63 | - Other: [Unofficial Colab Notebook](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/RoseTTAFold.ipynb) 64 | - Uni-Fold 65 | [![](https://img.shields.io/badge/repo-PyTorch-yellowgreen)](https://github.com/dptech-corp/Uni-Fold) 66 | [![](https://img.shields.io/badge/repo-JAX-blue)](https://github.com/dptech-corp/Uni-Fold-jax) 67 | [![](https://img.shields.io/badge/DOI-10.1101%2F2022.08.04.502811-lightgrey)](https://doi.org/10.1101/2022.08.04.502811) 68 | - Reimplementation of AF2 in PyTorch; provides training code and new (monomer/multimer) model parameters. 
69 | - Features: monomer, multimer 70 | - Resources: [Colab Notebook](https://colab.research.google.com/github/dptech-corp/Uni-Fold/blob/main/notebooks/unifold.ipynb) 71 | 72 | - **pLM-based** (using embeddings from protein Language Models (pLMs) as input) 73 | - ESM-Fold 74 | [![](https://img.shields.io/badge/DOI-10.1101%2F2022.07.20.500902-lightgrey)](https://doi.org/10.1101/2022.07.20.500902) 75 | - Features: monomer 76 | - Other: [[tweet] Alex's announcement](https://twitter.com/alexrives/status/1550148755206414341) 77 | - EMBER3D 78 | [![](https://img.shields.io/badge/repo-PyTorch-yellowgreen)](https://github.com/kWeissenow/EMBER3D) 79 | - Features: monomer 80 | - HelixFold-single 81 | [![](https://img.shields.io/badge/repo-PaddlePaddle-pink)](https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold-single) 82 | [![](https://img.shields.io/badge/arxiv-2207.13921-lightgrey)](https://arxiv.org/abs/2207.13921) 83 | - Features: monomer 84 | - Resource: https://paddlehelix.baidu.com/app/drug/protein-single/forecast 85 | - IgFold 86 | [![](https://img.shields.io/badge/repo-PyTorch-yellowgreen)](https://github.com/Graylab/IgFold) 87 | [![](https://img.shields.io/badge/DOI-10.1101%2F2022.04.20.488972-lightgrey)](https://doi.org/10.1101/2022.04.20.488972) 88 | - pLM focused on antibody sequences 89 | - Features: monomer 90 | - Other: [Colab Notebook](https://colab.research.google.com/github/Graylab/IgFold/blob/main/IgFold.ipynb) 91 | - OmegaFold 92 | [![](https://img.shields.io/badge/repo-PyTorch-yellowgreen)](https://github.com/HeliXonProtein/OmegaFold) 93 | [![](https://img.shields.io/badge/DOI-10.1101%2F2022.07.21.500999-lightgrey)](https://doi.org/10.1101/2022.07.21.500999) 94 | - Features: monomer 95 | - Other: 96 | [Unofficial Colab Notebook](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/omegafold.ipynb), 97 | [[tweet] Martin comparing 
structures](https://twitter.com/thesteinegger/status/1554881669718573062), 98 | [[tweet] Sergey's positional encoding observation](https://twitter.com/sokrypton/status/1555536325176168448) 99 | 100 | 101 | ### Tools and Extensions 102 | - gget (AF2) 103 | [![](https://img.shields.io/badge/-repo-gray)](https://github.com/pachterlab/gget#gget-alphafold-) 104 | [![](https://img.shields.io/badge/DOI-10.1101%2F2022.05.17.492392-lightgrey)](https://doi.org/10.1101/2022.05.17.492392) 105 | - alphafold_finetune 106 | [![](https://img.shields.io/badge/-repo-gray)](https://github.com/phbradley/alphafold_finetune) 107 | [![](https://img.shields.io/badge/DOI-10.1101%2F2022.07.12.499365-lightgrey)](https://doi.org/10.1101/2022.07.12.499365) 108 | - Fine-tune AlphaFold for protein-peptide prediction 109 | - Other: [[tweet] Amir's announcement](https://twitter.com/AMotmaen/status/1547435940011945984) 110 | - AlphaPulldown 111 | [![](https://img.shields.io/badge/-repo-gray)](https://www.embl-hamburg.de/AlphaPulldown/) 112 | [![](https://img.shields.io/badge/DOI-10.1101%2F2022.08.05.502961-lightgrey)](https://doi.org/10.1101/2022.08.05.502961) 113 | - protein-protein interaction screens using AlphaFold-Multimer 114 | - ColabDesign 115 | [![](https://img.shields.io/badge/-repo-gray)](https://github.com/sokrypton/ColabDesign) 116 | - Backprop through AlphaFold for protein design 117 | - AF2Rank 118 | [![](https://img.shields.io/badge/-repo-gray)](https://github.com/jproney/AF2Rank) 119 | [![](https://img.shields.io/badge/DOI-10.1101%2F2022.03.11.484043-lightgrey)](https://doi.org/10.1101/2022.03.11.484043) 120 | - Rank Decoy Structures/Sequences using AlphaFold 121 | - Resource: [Colab Notebook](https://colab.research.google.com/github/sokrypton/ColabDesign/blob/main/af/examples/AF2Rank.ipynb) 122 | 123 | ---- 124 | 125 | ### Databases of predictions 126 | - AlphaFold Database 127 | 
[![](https://img.shields.io/badge/DOI-10.1093%2Fnar%2Fgkab1061-lightgrey)](https://doi.org/10.1093/nar/gkab1061) 128 | - All sequences in UniRef90 - viral sequences; Based on AlphaFold 2 129 | - Resource: https://alphafold.ebi.ac.uk 130 | - Eukaryotic interactomes 131 | [![](https://img.shields.io/badge/DOI-10.1126%2Fscience.abm4805-lightgrey)](https://www.science.org/doi/10.1126/science.abm4805) 132 | - Protein-Protein interactions; Based on RoseTTAFold and AlphaFold 2 133 | - Resource: https://www.ebi.ac.uk/pdbe/news/predicted-complexes-modelarchive-now-pdbe-kb-pages 134 | - Structures of human-transcriptome isoforms 135 | [![](https://img.shields.io/badge/DOI-10.1101%2F2022.06.08.495354-lightgrey)](https://doi.org/10.1101/2022.06.08.495354) 136 | - Based on ColabFold (AlphaFold 2) 137 | - Resource: https://www.isoform.io 138 | - AlphaFill 139 | [![](https://img.shields.io/badge/DOI-10.1101%2F2021.11.26.470110-lightgrey)](https://doi.org/10.1101/2021.11.26.470110) 140 | - Enriching the AlphaFold models with ligands and co-factors (AlphaFold 2) 141 | - Resource: https://alphafill.eu/ 142 | - IgFold Database 143 | [![](https://img.shields.io/badge/DOI-10.1101%2F2022.04.20.488972-lightgrey)](https://doi.org/10.1101/2022.04.20.488972) 144 | - Predictions specific to antibody sequences; based on OAS dataset and IgFold 145 | - Resource: https://data.graylab.jhu.edu/igfold_oas_paired95.tar.gz 146 | 147 | 148 | ### Datasets for training 149 | - OpenFold 150 | - MSAs for 132K PDBs + 270K UniClust30 predictions for distillation 151 | - Resource: https://registry.opendata.aws/openfold/ 152 | - MindSpore 153 | - MSAs for 570K PDBs + 745K Distillation 154 | - Manuscript: https://arxiv.org/abs/2206.12240 155 | - Resource: http://ftp.cbi.pku.edu.cn/psp/ 156 | 157 | ---- 158 | 159 | 160 | ### Webservers 161 | - Lambda PredictProtein 162 | [![](https://img.shields.io/badge/DOI-10.1101%2F2022.08.04.502750-lightgrey)](https://doi.org/10.1101/2022.08.04.502750) 163 | - Based on 
ColabFold; Limited to sequences up to 500AAs 164 | - Resource: http://embed.predictprotein.org/ 165 | - Robetta 166 | - Based on RoseTTAFold 167 | - Resource: https://robetta.bakerlab.org/ 168 | 169 | ---- 170 | 171 | 172 | ### Discontinued 173 | 174 | - 💀 Moonbear 175 | - Resource: https://www.getmoonbear.com/ 176 | - Other: [[tweet] Stephanie's announcement](https://twitter.com/stephanieszhang/status/1427773598199164937) 177 | - 💀 Lucidrains' AlphaFold2 178 | [![](https://img.shields.io/badge/repo-PyTorch-yellowgreen)](https://github.com/lucidrains/alphafold2) 179 | - AF2 reproduction attempt 180 | - Features: monomer 181 | - 💀 Lupoglaz's OpenFold2 182 | [![](https://img.shields.io/badge/repo-PyTorch-yellowgreen)](https://github.com/lupoglaz/OpenFold2) 183 | - AF2 reproduction attempt 184 | - Features: monomer 185 | --------------------------------------------------------------------------------