├── .github
│   └── ISSUE_TEMPLATE
│       └── question.md
├── LICENSE
├── README.md
├── README_ja.md
├── beta_install_colabbatch_linux.sh
├── beta_update_linux.sh
├── install_colabbatch_M1mac.sh
├── install_colabbatch_intelmac.sh
├── install_colabbatch_linux.sh
├── update_M1mac.sh
├── update_intelmac.sh
├── update_linux.sh
└── v1.0.0
    ├── README.md
    ├── README_ja.md
    ├── colabfold_alphafold.patch
    ├── gpurelaxation.patch
    ├── install_colabfold_M1mac.sh
    ├── install_colabfold_intelmac.sh
    ├── install_colabfold_linux.sh
    ├── residue_constants.patch
    ├── runner.py
    ├── runner_af2advanced.py
    └── runner_af2advanced_old.py
/.github/ISSUE_TEMPLATE/question.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Question 3 | about: Question template 4 | title: 'Question:' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Caution: Please only report issues related to the installation on your local PC or macOS.** If you can get the help message by `colabfold_batch --help` or run a test prediction successfully, your installation is successful. Requests or questions regarding ColabFold features should be directed to [ColabFold repo's issues](https://github.com/sokrypton/ColabFold/issues). 11 | 12 | ---- 13 | 14 | **What is your installation issue?** 15 | 16 | Describe your question here. 17 | 18 | **Computational environment** 19 | 20 | - OS: [e.g. Ubuntu 22.04, Windows10 & WSL2, macOS...] 21 | - CUDA version if Linux (Show the output of `/usr/local/cuda/bin/nvcc --version`.) 22 | 23 | **To Reproduce** 24 | 25 | Steps to reproduce the behavior: 26 | 1. Go to '...' 27 | 2. Click on '....' 28 | 3. Scroll down to '....' 29 | 4. See error 30 | 31 | **Expected behavior** 32 | 33 | A clear and concise description of what you expected to happen. 34 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Yoshitaka Moriwaki 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # LocalColabFold 2 | 3 | [ColabFold](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) on your local PC (or macOS). See also [ColabFold repository](https://github.com/sokrypton/ColabFold). <br>
 4 | 5 | ## What is LocalColabFold? 6 | 7 | LocalColabFold is an installer script designed to make ColabFold functionality available on users' local machines. It supports a wide range of operating systems, such as Windows 10 or later (using Windows Subsystem for Linux 2), macOS, and Linux. 8 | 9 | **If you only intend to predict a small number of naturally occurring proteins, I recommend using the [ColabFold notebook](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) or downloading structures from the [AlphaFold Protein Structure Database](https://alphafold.ebi.ac.uk/) or [UniProt](https://www.uniprot.org/). LocalColabFold is suitable for more advanced applications, such as batch processing of structure predictions for natural complexes, non-natural proteins, or predictions with manually specified MSAs/templates.** 10 | 11 | ## Advantages of LocalColabFold 12 | 13 | - **Structure inference and relaxation will be accelerated if your PC has an Nvidia GPU and CUDA drivers.** 14 | - **No time-outs (no 90-minute idle or 12-hour session limits as on Google Colab)** 15 | - **No GPU limitations** 16 | - **NOT necessary to prepare the large database required for native AlphaFold2**. 17 | 18 | ## Note (May 21, 2024) 19 | 20 | - Since the current GPU-enabled jax (> 0.4.26) requires CUDA 12.1 or later and cudnn 9, please upgrade or install your CUDA driver and cudnn accordingly. CUDA 12.4 is recommended. 21 | 22 | ## Note (Jan 30, 2024) 23 | 24 | - ColabFold has been upgraded to 1.5.5 (compatible with AlphaFold 2.3.2). LocalColabFold now requires **CUDA 12.1 or later**. Please update your CUDA driver if you have not done so. 25 | - (Local)ColabFold can now predict protein structures without connecting to the Internet. Use the [`setup_databases.sh`](https://github.com/sokrypton/ColabFold/blob/main/setup_databases.sh) script to download and build the databases (see also [ColabFold Downloads](https://colabfold.mmseqs.com/)). Instructions for running `colabfold_search` to obtain the MSA and templates locally are given in [this comment](https://github.com/sokrypton/ColabFold/issues/563). 26 | 27 | ## New Updates 28 | 29 | - 30Jan2024, ColabFold 1.5.5 (compatible with AlphaFold 2.3.2). LocalColabFold now requires **CUDA 12.1 or later**. Please update your CUDA driver. 30 | - 30Apr2023, Updated to use python 3.10 for compatibility with Google Colaboratory. 31 | - 09Mar2023, version 1.5.1 released. The base directory has been changed to `localcolabfold` from `colabfold_batch` to distinguish it from the execution command. 32 | - 09Mar2023, version 1.5.0 released. See [Release v1.5.0](https://github.com/YoshitakaMo/localcolabfold/releases/tag/v1.5.0) 33 | - 05Feb2023, version 1.5.0-pre released. 34 | - 16Jun2022, version 1.4.0 released. See [Release v1.4.0](https://github.com/YoshitakaMo/localcolabfold/releases/tag/v1.4.0) 35 | - 07May2022, **Updated `update_linux.sh`.** See also [How to update](#how-to-update). Please use the new option `--use-gpu-relax` if GPU relaxation is required (recommended). 36 | - 12Apr2022, version 1.3.0 released. See [Release v1.3.0](https://github.com/YoshitakaMo/localcolabfold/releases/tag/v1.3.0) 37 | - 09Dec2021, version 1.2.0-beta released. Easy-to-use updater scripts added. See [How to update](#how-to-update). 38 | - 04Dec2021, LocalColabFold is now compatible with the latest [pip installable ColabFold](https://github.com/sokrypton/ColabFold#running-locally). This repository provides a script to install ColabFold along with some external parameter files needed to perform relaxation with AMBER. <br>
The weight parameters of AlphaFold and AlphaFold-Multimer will be downloaded automatically at your first run. 39 | 40 | ## Installation 41 | 42 | ### For Linux 43 | 44 | 1. Make sure the `curl`, `git`, and `wget` commands are already installed on your PC. If not, you need to install them first. For Ubuntu, type `sudo apt -y install curl git wget`. 45 | 2. Make sure your Cuda compiler driver is **12.1 or later** (the latest version 12.4 is preferable), as required since ColabFold 1.5.5. If you don't have a GPU or don't plan to use one, you can skip this step:<br>
$ nvcc --version
 46 | nvcc: NVIDIA (R) Cuda compiler driver
 47 | Copyright (c) 2005-2022 NVIDIA Corporation
 48 | Built on Wed_Sep_21_10:33:58_PDT_2022
 49 | Cuda compilation tools, release 11.8, V11.8.89
 50 | Build cuda_11.8.r11.8/compiler.31833905_0
 51 | 
DO NOT use `nvidia-smi` to check the version.
See [NVIDIA CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) if you haven't installed it. 52 | 3. Make sure your GNU compiler version is **12.0 or later** (check with `gcc --version`), because `GLIBCXX_3.4.30` is required by openmm 8.0.0 for `--amber` relaxation. 53 | If your version is old (e.g. on CentOS 7, Rocky/AlmaLinux 8, etc.), install a newer GCC and add it to your `PATH`. 54 | 4. Download `install_colabbatch_linux.sh` from this repository:<br>
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_linux.sh
and run it in the directory where you want to install:
$ bash install_colabbatch_linux.sh
About 5 minutes later, the `localcolabfold` directory will be created. Do not move this directory after the installation. 55 | 56 | Keep the network connection open, and **check the log** output to see if there are any errors. 57 | 58 | If you find errors in the log, the easiest fix is to check your network connection, delete the `localcolabfold` directory, and re-run the installation script. 59 | 60 | 5. Add the environment variable `PATH`:<br>
# For bash or zsh
# e.g. export PATH="/home/moriwaki/Desktop/localcolabfold/colabfold-conda/bin:$PATH"<br>
export PATH="/path/to/your/localcolabfold/colabfold-conda/bin:\$PATH"
 61 | It is recommended to add this export command to `~/.bashrc` (as sketched above) and restart bash (`~/.bashrc` is executed every time bash is started). 62 | 63 | 6. To run the prediction, type<br>
colabfold_batch input outputdir/
The result files will be created in `outputdir/`. This command runs the prediction without templates or relaxation (energy minimization). If you want to use templates and relaxation, add the `--templates` and `--amber` flags, respectively. For example, 64 | 65 |<br>
colabfold_batch --templates --amber input outputdir/
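# a few optional variations (sketches only; the flags are documented under "Flags" below, input/outputdir are placeholders):<br>
# colabfold_batch --model-type alphafold2_multimer_v3 input outputdir/   # force a multimer model<br>
# colabfold_batch --templates --amber --use-gpu-relax input outputdir/   # relax on an Nvidia GPU<br>
# colabfold_batch --num-recycle 10 input outputdir/                      # increase recycles<br>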
 66 | 67 | `colabfold_batch` will automatically detect whether the input is a monomer or a complex. In most cases, users don't have to add `--model-type alphafold2_multimer_v3` to turn on multimer prediction. `alphafold2_multimer_v1` and `alphafold2_multimer_v2` are also available. The default is `auto` (`alphafold2_ptm` for monomers and `alphafold2_multimer_v3` for complexes). 68 | 69 | If you encounter errors during `--amber` relaxation, adding `export LD_LIBRARY_PATH="/path/to/your/localcolabfold/colabfold-conda/lib:${LD_LIBRARY_PATH}"` before running `colabfold_batch` may solve the issue. 70 | For more details, see [Flags](#flags) and `colabfold_batch --help`. 71 | 72 | ### For WSL2 (in Windows) 73 | 74 | **Caution: If your installation fails due to symbolic link (`symlink`) creation issues, this is because the Windows file system is case-insensitive (while the Linux file system is case-sensitive).** To resolve this, run the following command in Windows PowerShell: 75 | ``` 76 | fsutil file SetCaseSensitiveInfo path\to\localcolabfold\installation enable 77 | ``` 78 | 79 | Replace `path\to\localcolabfold\installation` with the path to the directory where you are installing LocalColabFold. Also, make sure that you are running the command in Windows PowerShell (not WSL). For more details, see [Adjust Case Sensitivity (Microsoft)](https://learn.microsoft.com/en-us/windows/wsl/case-sensitivity). 80 | 81 | Before running the prediction: 82 | 83 | ``` 84 | export TF_FORCE_UNIFIED_MEMORY="1" 85 | export XLA_PYTHON_CLIENT_MEM_FRACTION="4.0" 86 | export XLA_PYTHON_CLIENT_ALLOCATOR="platform" 87 | export TF_FORCE_GPU_ALLOW_GROWTH="true" 88 | ``` 89 | 90 | It is recommended to add these export commands to `~/.bashrc` and restart bash (`~/.bashrc` is executed every time bash is started). 91 | 92 | ### For macOS 93 | 94 | **Caution: Due to the lack of an Nvidia GPU/CUDA driver, structure prediction on macOS is 5-10 times slower than on Linux+GPU**. For the test sequence (58 a.a.), it may take 30 minutes. However, it may be useful to play with it before preparing a Linux+GPU environment. 95 | 96 | You can check whether your Mac is Intel or Apple Silicon by typing `uname -m` in Terminal. 97 | 98 | ```bash 99 | $ uname -m 100 | x86_64 # Intel 101 | arm64 # Apple Silicon 102 | ``` 103 | 104 | Please use the correct installer for your Mac. 105 | 106 | #### For Mac with Intel CPU 107 | 108 | 1. Install [Homebrew](https://brew.sh/index_ja) if not present:<br>
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
109 | 2. Install `wget`, `gnu-sed`, [HH-suite](https://github.com/soedinglab/hh-suite) and [kalign](https://github.com/TimoLassmann/kalign) using Homebrew:
$ brew install wget gnu-sed
$ brew install brewsci/bio/hh-suite brewsci/bio/kalign<br>
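# note: install_colabbatch_intelmac.sh also checks for mmseqs, so install it as well if it is missing:<br>
# brew install mmseqs2<br>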
110 | 3. Download `install_colabbatch_intelmac.sh` from this repository:
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_intelmac.sh
and run it in the directory where you want to install:
$ bash install_colabbatch_intelmac.sh
About 5 minutes later, the `localcolabfold` directory will be created. Do not move this directory after the installation. 111 | 4. The rest of the procedure is the same as "For Linux". 112 | 113 | #### For Mac with Apple Silicon (M1 chip) 114 | 115 | **Note: This installer is experimental because most of the dependent packages are not fully tested on Apple Silicon Macs.** 116 | 117 | 1. Install [Homebrew](https://brew.sh/index_ja) if not present:<br>
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
118 | 2. Install several commands using Homebrew (Now kalign 3.3.2 is available!):
$ brew install wget cmake gnu-sed
$ brew install brewsci/bio/hh-suite
$ brew install brewsci/bio/kalign
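# note: install_colabbatch_M1mac.sh also checks for mmseqs, so install it as well if it is missing:<br>
# brew install mmseqs2<br>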
119 | 3. Install `miniforge` command using Homebrew:
$ brew install --cask miniforge
120 | 4. Download `install_colabbatch_M1mac.sh` from this repository:
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_M1mac.sh
and run it in the directory where you want to install:
$ bash install_colabbatch_M1mac.sh
About 5 minutes later, the `localcolabfold` directory will be created. Do not move this directory after the installation. **You can ignore the installation errors that appear along the way**. 121 | 5. The rest of the procedure is the same as "For Linux". 122 | 123 | ### Input Examples 124 | 125 | ColabFold can accept multiple file formats or a directory as input. 126 | 127 | ``` 128 | positional arguments: 129 | input Can be one of the following: Directory with fasta/a3m 130 | files, a csv/tsv file, a fasta file or an a3m file 131 | results Directory to write the results to 132 | ``` 133 | 134 | #### fasta format 135 | 136 | It is recommended that the header line starting with `>` be short, since the description will be the prefix of the output files. It is acceptable to insert line breaks in the amino acid sequence. 137 | 138 | ```:P61823.fasta 139 | >sp|P61823 140 | MALKSLVLLSLLVLVLLLVRVQPSLGKETAAAKFERQHMDSSTSAASSSNYCNQMMKSRN 141 | LTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPN 142 | CAYKTTQANKHIIVACEGNPYVPVHFDASV 143 | ``` 144 | 145 | **For prediction of multimers, insert `:` between the protein sequences.** 146 | 147 | ``` 148 | >1BJP_homohexamer 149 | PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR: 150 | PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR: 151 | PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR: 152 | PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR: 153 | PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR: 154 | PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR 155 | ``` 156 | 157 | ``` 158 | >3KUD_RasRaf_complex 159 | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQ 160 | YMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIP 161 | YIETSAKTRQGVEDAFYTLVREIRQH: 162 | PSKTSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAAS 163 | LIGEELQVDFL 164 | ``` 165 | 166 | Multiple `>` header lines with sequences in a FASTA format file yield multiple predictions at once in the specified output directory. 167 | 168 | #### csv format 169 | 170 | In the csv format, `id` and `sequence` should be separated by `,`. 171 | 172 | ```:test.csv 173 | id,sequence 174 | 5AWL_1,YYDPETGTWY 175 | 3G5O_A_3G5O_B,MRILPISTIKGKLNEFVDAVSSTQDQITITKNGAPAAVLVGADEWESLQETLYWLAQPGIRESIAEADADIASGRTYGEDEIRAEFGVPRRPH:MPYTVRFTTTARRDLHKLPPRILAAVVEFAFGDLSREPLRVGKPLRRELAGTFSARRGTYRLLYRIDDEHTTVVILRVDHRADIYRR 176 | ``` 177 | 178 | #### a3m format 179 | 180 | You can input your own MSA file in a3m format. For multimer predictions, the a3m file should be in ColabFold-compatible format. 181 | 182 | ### Flags 183 | 184 | These flags are useful for predictions. 185 | 186 | - **`--amber`** : Use amber for structure refinement (relaxation / energy minimization). To control how many of the top-ranked structures are relaxed, set `--num-relax`. 187 | - **`--templates`** : Use templates from the PDB. 188 | - **`--use-gpu-relax`** : Run amber on an Nvidia GPU instead of the CPU. This feature is only available on machines with Nvidia GPUs. 189 | - **`--num-recycle <int>`** : Number of prediction recycles. Increasing recycles can improve the quality but slows down the prediction. Default is `3`. (e.g. `--num-recycle 10`) 190 | - `--custom-template-path <directory>` : Restrict the template files used for `--templates` to those contained in the specified directory. This flag enables the use of non-public PDB files for the prediction. See also https://github.com/sokrypton/ColabFold/issues/177 . <br>
 191 | - `--random-seed <int>` : **Changing the seed for the random number generator can result in different structure predictions.** (e.g. `--random-seed 42`) 192 | - `--num-seeds <int>` : Number of seeds to try. Will iterate from range(random_seed, random_seed+num_seeds). (e.g. `--num-seeds 5`) 193 | - `--max-msa` : Defines the maximum numbers of sequences to use as `max-seq:max-extra-seq` (e.g. `--max-msa 512:1024`). The `--max-seq` and `--max-extra-seq` arguments are also available if you want to specify them separately. This is a reimplementation of the approach demonstrated by del Alamo *et al*. in [Sampling alternative conformational states of transporters and receptors with AlphaFold2](https://elifesciences.org/articles/75751). 194 | - `--use-dropout` : Activate dropout during inference to sample from the uncertainty of the models. 195 | - `--overwrite-existing-results` : Overwrite the result files. 196 | - For more information, run `colabfold_batch --help`. 197 | 198 | ## How to update 199 | 200 | Since [ColabFold](https://github.com/sokrypton/ColabFold) is still a work in progress, your localcolabfold should also be updated frequently to use the latest features. An easy-to-use update script is provided for this purpose. 201 | 202 | To update your localcolabfold, simply execute the following: 203 | 204 | ```bash 205 | # set your OS. Select one of the following variables {linux,intelmac,M1mac} 206 | $ OS=linux # if Linux 207 | # navigate to the directory where you installed localcolabfold, e.g. 208 | $ cd /home/moriwaki/Desktop/localcolabfold/ 209 | # get the latest updater 210 | $ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/update_${OS}.sh -O update_${OS}.sh 211 | $ chmod +x update_${OS}.sh 212 | # execute it. 213 | $ ./update_${OS}.sh . 214 | ``` 215 | 216 | ## FAQ 217 | - What else do I need to do before installation? Do I need sudo privileges? 218 | - No, except for the installation of the `curl` and `wget` commands. 219 | - Do I need to prepare large databases such as PDB70, BFD, Uniclust30, MGnify? 220 | - **No, it is not necessary.** Generation of MSAs is performed by the MMseqs2 web server, just as implemented in ColabFold. 221 | - Are the pLDDT score and PAE figures available? 222 | - Yes, they will be generated just like in ColabFold. 223 | - Is it possible to predict homooligomers and complexes? 224 | - Yes, the input sequence format is the same as in ColabFold. See `query_sequence:` and its use in [ColabFold: AlphaFold2 using MMseqs2](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb). 225 | - Is it possible to create MSAs with jackhmmer? 226 | - **No, it is not currently supported**. 227 | - I want to use multiple GPUs to perform the prediction. 228 | - **AlphaFold and ColabFold do not support multiple GPUs**. Only one GPU can be used to model your protein. 229 | - I have multiple GPUs. Can I choose which GPU LocalColabFold runs on? 230 | - Use the `CUDA_VISIBLE_DEVICES` environment variable. See https://github.com/YoshitakaMo/localcolabfold/issues/200. 231 | - I got an error message `CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered`. 232 | - You may not have updated to CUDA 12.1 or later. Please check the version of your Cuda compiler with the `nvcc --version` command, not `nvidia-smi`. 233 | - Is this available on Windows 10? 234 | - You can run LocalColabFold on Windows 10 with [WSL2](https://docs.microsoft.com/en-us/windows/wsl/install-win10). 235 | - (New!) I want to use a custom MSA file in the format of a3m. <br>
 236 | - **ColabFold can accept various input files now**. See the help message. You can supply your own A3M file, a fasta file containing multiple sequences (in FASTA format), or a directory containing multiple fasta files. 237 | 238 | 239 | ## Tutorials & Presentations 240 | 241 | - ColabFold Tutorial presented at the Boston Protein Design and Modeling Club. [[video]](https://www.youtube.com/watch?v=Rfw7thgGTwI) [[slides]](https://docs.google.com/presentation/d/1mnffk23ev2QMDzGZ5w1skXEadTe54l8-Uei6ACce8eI). 242 | 243 | ## Acknowledgments 244 | 245 | - The original colabfold was first created by Sergey Ovchinnikov ([@sokrypton](https://twitter.com/sokrypton)), Milot Mirdita ([@milot_mirdita](https://twitter.com/milot_mirdita)) and Martin Steinegger ([@thesteinegger](https://twitter.com/thesteinegger)). 246 | 247 | ## How do I reference this work? 248 | 249 | - Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. ColabFold - Making protein folding accessible to all.<br>
250 | *Nature Methods* (2022) doi: [10.1038/s41592-022-01488-1](https://www.nature.com/articles/s41592-022-01488-1) 251 | - If you’re using **AlphaFold**, please also cite:
252 | Jumper et al. "Highly accurate protein structure prediction with AlphaFold."
253 | *Nature* (2021) doi: [10.1038/s41586-021-03819-2](https://doi.org/10.1038/s41586-021-03819-2) 254 | - If you’re using **AlphaFold-multimer**, please also cite:
255 | Evans et al. "Protein complex prediction with AlphaFold-Multimer."
 256 | *bioRxiv* (2022) doi: [10.1101/2021.10.04.463034v2](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2) -------------------------------------------------------------------------------- /README_ja.md: -------------------------------------------------------------------------------- 1 | # LocalColabFold 2 | 3 | [ColabFold](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) running on your personal PC (or macOS). 4 | 5 | ## Update Information 6 | 7 | - 30Jan2024, ColabFold 1.5.5 (compatible with AlphaFold 2.3.2). **CUDA 11.8 or later** is now required. 8 | - 05Feb2023, version 1.5.0-pre released. 9 | - 18Jun2022, version 1.4.0 released. [Release v1.4.0](https://github.com/YoshitakaMo/localcolabfold/releases/tag/v1.4.0) 10 | - 09Dec2021, beta version. Easy-to-use update scripts added. See [How to update](#how-to-update). 11 | - 04Dec2021, LocalColabFold now supports the latest [pip-installable ColabFold](https://github.com/sokrypton/ColabFold#running-locally). This repository provides a script that installs ColabFold together with the other parameter files needed to run relax (structure optimization). The weight parameters of AlphaFold and AlphaFold-Multimer are downloaded automatically at the first run. 12 | 13 | ## Installation 14 | 15 | ### For Linux+GPU 16 | 17 | 1. Make sure the `curl`, `git`, and `wget` commands are already installed in your terminal environment. If not, install them first. On Ubuntu, they can be installed with `sudo apt -y install curl git wget`. 18 | 2. **Make sure your Cuda compiler version is 11.8 or later.**<br>
$ nvcc --version
 19 | nvcc: NVIDIA (R) Cuda compiler driver
 20 | Copyright (c) 2005-2022 NVIDIA Corporation
 21 | Built on Wed_Sep_21_10:33:58_PDT_2022
 22 | Cuda compilation tools, release 11.8, V11.8.89
 23 | Build cuda_11.8.r11.8/compiler.31833905_0
 24 | 
DO NOT use the `nvidia-smi` command to check the version; it is inaccurate for this purpose.<br>
If you have not installed the CUDA Compiler yet, see the [NVIDIA CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html). 25 | 3. **Make sure your GNU compiler version is 4.9 or later**, since `GLIBCXX_3.4.20` is required at runtime.<br>
$ gcc --version
 26 | gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
 27 | Copyright (C) 2019 Free Software Foundation, Inc.
 28 | This is free software; see the source for copying conditions.  There is NO
 29 | warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 30 | 
If your version is 4.8.5 or older (as is often the case on CentOS 7), install a newer GCC and add it to your PATH. On a supercomputer, one may be available through the Environment Modules system (`module avail`). 31 | 1. Download `install_colabbatch_linux.sh` from this repository:<br>
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_linux.sh
Place it in the directory where you want to install, then type the following command:<br>
$ bash install_colabbatch_linux.sh
About 5 minutes later, the `colabfold_batch` directory will be created. Do not move this directory after the installation. 32 | 2. Type `cd colabfold_batch` to enter the directory. 33 | 3. Add the environment variable `PATH`:<br>
# For bash or zsh
# e.g. export PATH="/home/moriwaki/Desktop/colabfold_batch/bin:$PATH"<br>
export PATH="/bin:\$PATH"
It is convenient to append this line to your `~/.bashrc` or `~/.zshrc`, as sketched above. 34 | 4. Run ColabFold with the following command:<br>
colabfold_batch --amber --templates --num-recycle 3 inputfile outputdir/ 
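# the flags above are optional; a minimal run looks like this (inputfile/outputdir are placeholders):<br>
# colabfold_batch inputfile outputdir/<br>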
The result files will be created in `outputdir`. For detailed usage, check the `colabfold_batch --help` command. 35 | 36 | ### For macOS 37 | 38 | **Caution: Because macOS lacks an Nvidia GPU and CUDA driver, the structure inference part is 5-10 times slower than in a Linux+GPU environment**. For the test amino acid sequence (58 residues), the calculation takes about 30 minutes. Still, it may be worth playing with before preparing a Linux+GPU environment. 39 | 40 | Also, check in advance whether your Mac has an Intel CPU or an M1 chip (Apple Silicon). The result of `uname -m` in Terminal tells you which one you have. 41 | 42 | ```bash 43 | $ uname -m 44 | x86_64 # Intel 45 | arm64 # Apple Silicon 46 | ``` 47 | 48 | (If you are using Rosetta 2 on Apple Silicon, this shows x86_64 even on Apple Silicon... this case is not supported at the moment.) 49 | 50 | Choose the appropriate installer based on this result. 51 | 52 | #### For Mac with Intel CPU 53 | 54 | 1. Install [Homebrew](https://qiita.com/zaburo/items/29fe23c1ceb6056109fd) if not present:<br>
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
 55 | 2. Install `wget`, `gnu-sed`, [HH-suite](https://github.com/soedinglab/hh-suite), and [kalign](https://github.com/TimoLassmann/kalign) using Homebrew:<br>
$ brew install wget gnu-sed
$ brew install brewsci/bio/hh-suite brewsci/bio/kalign<br>
 56 | 3. Download `install_colabbatch_intelmac.sh` from this repository:<br>
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_intelmac.sh
Place it in the directory where you want to install, then type the following command:<br>
$ bash install_colabbatch_intelmac.sh
About 5 minutes later, the `colabfold_batch` directory will be created. Do not move this directory after the installation. 57 | 4. The rest of the procedure is the same as "For Linux+GPU". 58 | 59 | #### For Mac with Apple Silicon (M1 chip) 60 | 61 | **Note: This installer is experimental because most of the dependent Python packages have not been fully tested on Apple Silicon Macs.** 62 | 63 | 1. Install [Homebrew](https://qiita.com/zaburo/items/29fe23c1ceb6056109fd) if not present:<br>
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
 64 | 1. Install several commands using Homebrew (kalign currently does not seem to be installable on M1 Macs, but this is not a problem):<br>
$ brew install wget cmake gnu-sed
$ brew install brewsci/bio/hh-suite
 65 | 2. Install `miniforge` using Homebrew:<br>
$ brew install --cask miniforge
 66 | 3. Download the installer `install_colabbatch_M1mac.sh` from this repository:<br>
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_M1mac.sh
Place it in the directory where you want to install, then type the following command:<br>
$ bash install_colabbatch_M1mac.sh
About 5 minutes later, the `colabfold_batch` directory will be created. Various warnings and errors may appear along the way, but that is fine. Do not move this directory after the installation. 67 | 4. The rest of the procedure is the same as "For Linux+GPU". 68 | 69 | ## How to update 70 | 71 | Since [ColabFold](https://github.com/sokrypton/ColabFold) is still under development, this localcolabfold also needs to be updated frequently to use the latest features. Easy-to-use update scripts are provided for this purpose. 72 | 73 | To update, simply type the following in the `localcolabfold` directory. 74 | 75 | ```bash 76 | $ ./update_linux.sh . # if Linux 77 | $ ./update_intelmac.sh . # if Intel Mac 78 | $ ./update_M1mac.sh . # if M1 Mac 79 | ``` 80 | 81 | If you installed localcolabfold from version 1.2.0-beta or earlier, first download these update scripts and then run them, for example as follows. 82 | 83 | ```bash 84 | # set your OS. Select one of the following variables {linux,intelmac,M1mac} 85 | $ OS=linux # if Linux 86 | # navigate to the directory where you installed localcolabfold, e.g. 87 | $ cd /home/moriwaki/Desktop/localcolabfold/ 88 | $ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/update_${OS}.sh 89 | $ chmod +x update_${OS}.sh 90 | $ ./update_${OS}.sh /path/to/your/localcolabfold 91 | ``` 92 | 93 | ## Advantages of LocalColabFold 94 | 95 | - **If your PC has an Nvidia GPU and CUDA drivers, structure inference and relaxation by AlphaFold2 will be faster.** 96 | - **Google Colab times out after 90 minutes of idle time or more than 12 hours of use; there are no such limits here, and of course no limits on GPU usage either.** 97 | - **No need to download the databases.** 98 | 99 | ## FAQ 100 | - What do I need to prepare before installation? 101 | - Nothing except the `curl` and `wget` commands. 102 | - Do I need to prepare huge databases such as BFD, Mgnify, PDB70, or Uniclust30? 103 | - **No, it is not necessary.** 104 | - How is the MSA generation required for the first step of AlphaFold2 performed? 105 | - MSA generation is performed by the MMseqs2 web server, just as in ColabFold. 106 | - Are the pLDDT score and PAE figures shown by ColabFold also generated? 107 | - Yes, they are generated. 108 | - Are homooligomer and complex predictions also possible? 109 | - Yes, they are. The sequence input format is the same as in [ColabFold: AlphaFold2 using MMseqs2](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb). 110 | - Is it possible to create MSAs with jackhmmer? 111 | - **No, it is not currently supported**. 112 | - I want to use multiple GPUs for the calculation. 113 | - **AlphaFold and ColabFold do not seem to support structure prediction with multiple GPUs**. Only one GPU can be used for the calculation. 114 | - I want to solve the `ResourceExhausted` error that occurs when trying to predict a long amino acid sequence. 115 | - Please read the same issue as above. 116 | - I get the error message `CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered`. 117 | - You may not have updated to CUDA 11.1 or later. Please check the version of your Cuda compiler with the `nvcc --version` command. 118 | - Can I use this on Windows 10? 119 | - You can run it on Windows 10 as well by installing [WSL2](https://docs.microsoft.com/en-us/windows/wsl/install-win10). 120 | - (New!) I want to run structure prediction using my own A3M file. 121 | - **ColabFold can now accept various inputs besides FASTA files.** Read the help message for detailed usage. You can specify your own A3M-format file, a single fasta file containing multiple amino acid sequences in FASTA format, or even a directory itself as the input. 122 | 123 | ## Tutorials & Presentations 124 | 125 | - ColabFold Tutorial presented at the Boston Protein Design and Modeling Club. [[video]](https://www.youtube.com/watch?v=Rfw7thgGTwI) [[slides]](https://docs.google.com/presentation/d/1mnffk23ev2QMDzGZ5w1skXEadTe54l8-Uei6ACce8eI). 126 | 127 | ## Acknowledgments 128 | 129 | - The original colabfold was first created by Sergey Ovchinnikov ([@sokrypton](https://twitter.com/sokrypton)), Milot Mirdita ([@milot_mirdita](https://twitter.com/milot_mirdita)) and Martin Steinegger ([@thesteinegger](https://twitter.com/thesteinegger)). 130 | 131 | ## How do I reference this work? 132 | 133 | - Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. ColabFold - Making protein folding accessible to all.<br>
134 | *Nature Methods* (2022) doi: [10.1038/s41592-022-01488-1](https://www.nature.com/articles/s41592-022-01488-1) 135 | - If you’re using **AlphaFold**, please also cite:
136 | Jumper et al. "Highly accurate protein structure prediction with AlphaFold."
137 | *Nature* (2021) doi: [10.1038/s41586-021-03819-2](https://doi.org/10.1038/s41586-021-03819-2) 138 | - If you’re using **AlphaFold-multimer**, please also cite:
139 | Evans et al. "Protein complex prediction with AlphaFold-Multimer."
 140 | *bioRxiv* (2021) doi: [10.1101/2021.10.04.463034v1](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1) 141 | - If you are using **RoseTTAFold**, please also cite:<br>
 142 | Baek et al. "Accurate prediction of protein structures and interactions using a three-track neural network."<br>
143 | *Science* (2021) doi: [10.1126/science.abj8754](https://doi.org/10.1126/science.abj8754) 144 | 145 | [![DOI](https://zenodo.org/badge/doi/10.5281/zenodo.5123296.svg)](https://doi.org/10.5281/zenodo.5123296) -------------------------------------------------------------------------------- /beta_install_colabbatch_linux.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | type wget || { echo "wget command is not installed. Please install it at first using apt or yum." ; exit 1 ; } 4 | type curl || { echo "curl command is not installed. Please install it at first using apt or yum. " ; exit 1 ; } 5 | 6 | CURRENTPATH=`pwd` 7 | COLABFOLDDIR="${CURRENTPATH}/localcolabfold" 8 | 9 | mkdir -p ${COLABFOLDDIR} 10 | cd ${COLABFOLDDIR} 11 | wget -q -P . https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh 12 | bash ./Mambaforge-Linux-x86_64.sh -b -p ${COLABFOLDDIR}/conda 13 | rm Mambaforge-Linux-x86_64.sh 14 | . "${COLABFOLDDIR}/conda/etc/profile.d/conda.sh" 15 | export PATH="${COLABFOLDDIR}/conda/condabin:${PATH}" 16 | conda create -p $COLABFOLDDIR/colabfold-conda python=3.9 -y 17 | conda activate $COLABFOLDDIR/colabfold-conda 18 | conda update -n base conda -y 19 | conda install -c conda-forge python=3.9 cudnn==8.2.1.32 cudatoolkit==11.1.1 openmm==7.5.1 pdbfixer -y 20 | # Download the updater 21 | wget -qnc https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/update_linux.sh --no-check-certificate 22 | chmod +x update_linux.sh 23 | # install alignment tools 24 | conda install -c conda-forge -c bioconda kalign2=2.04 hhsuite=3.3.0 mmseqs2=14.7e284 -y 25 | # install ColabFold and Jaxlib 26 | # colabfold-conda/bin/python3.9 -m pip install "colabfold[alphafold] @ git+https://github.com/sokrypton/ColabFold" 27 | colabfold-conda/bin/python3.9 -m pip install -q --no-warn-conflicts "colabfold[alphafold-minus-jax] @ git+https://github.com/sokrypton/ColabFold@beta" 28 | colabfold-conda/bin/python3.9 -m pip install https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.3.25+cuda11.cudnn82-cp39-cp39-manylinux2014_x86_64.whl 29 | colabfold-conda/bin/python3.9 -m pip install jax==0.3.25 biopython==1.79 30 | 31 | # Use 'Agg' for non-GUI backend 32 | cd ${COLABFOLDDIR}/colabfold-conda/lib/python3.9/site-packages/colabfold 33 | sed -i -e "s#from matplotlib import pyplot as plt#import matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt#g" plot.py 34 | # modify the default params directory 35 | sed -i -e "s#appdirs.user_cache_dir(__package__ or \"colabfold\")#\"${COLABFOLDDIR}/colabfold\"#g" download.py 36 | 37 | # start downloading weights 38 | cd ${COLABFOLDDIR} 39 | colabfold-conda/bin/python3.9 -m colabfold.download 40 | cd ${CURRENTPATH} 41 | 42 | echo "Download of alphafold2 weights finished." 43 | echo "-----------------------------------------" 44 | echo "Installation of colabfold_batch finished." 45 | echo "Add ${COLABFOLDDIR}/colabfold-conda/bin to your environment variable PATH to run 'colabfold_batch'." 46 | echo "i.e. For Bash, export PATH=\"${COLABFOLDDIR}/colabfold-conda/bin:\$PATH\"" 47 | echo "For more details, please type 'colabfold_batch --help'." -------------------------------------------------------------------------------- /beta_update_linux.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | COLABFOLDDIR=$1 4 | 5 | if [ ! -d $COLABFOLDDIR/colabfold-conda ]; then 6 | echo "Error! 
colabfold-conda directory is not present in $COLABFOLDDIR." 7 | exit 1 8 | fi 9 | 10 | pushd $COLABFOLDDIR || { echo "${COLABFOLDDIR} is not present." ; exit 1 ; } 11 | 12 | # get absolute path of COLABFOLDDIR 13 | COLABFOLDDIR=$(cd $(dirname colabfold_batch); pwd) 14 | # activate conda in $COLABFOLDDIR/conda 15 | . ${COLABFOLDDIR}/conda/etc/profile.d/conda.sh 16 | export PATH="${COLABFOLDDIR}/conda/condabin:${PATH}" 17 | conda activate $COLABFOLDDIR/colabfold-conda 18 | # reinstall colabfold and alphafold-colabfold (the colabfold-conda env provides python3.9) 19 | python3.9 -m pip uninstall -q "colabfold[alphafold-minus-jax] @ git+https://github.com/sokrypton/ColabFold@beta" -y 20 | python3.9 -m pip uninstall alphafold-colabfold -y 21 | python3.9 -m pip install --no-warn-conflicts "colabfold[alphafold-minus-jax] @ git+https://github.com/sokrypton/ColabFold@beta" 22 | 23 | # use 'agg' for non-GUI backend 24 | pushd ${COLABFOLDDIR}/colabfold-conda/lib/python3.9/site-packages/colabfold 25 | sed -i -e "s#from matplotlib import pyplot as plt#import matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt#g" plot.py 26 | sed -i -e "s#appdirs.user_cache_dir(__package__ or \"colabfold\")#\"${COLABFOLDDIR}/colabfold\"#g" download.py 27 | popd 28 | popd -------------------------------------------------------------------------------- /install_colabbatch_M1mac.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash -e 2 | 3 | # check commands 4 | type wget 2>/dev/null || { echo -e "Please install wget using Homebrew:\n\tbrew install wget" ; exit 1 ; } 5 | type hhsearch 2>/dev/null || { echo -e "Please install hh-suite using Homebrew:\n\tbrew install brewsci/bio/hh-suite" ; exit 1 ; } 6 | type kalign 2>/dev/null || { echo -e "Please install kalign using Homebrew:\n\tbrew install kalign" ; exit 1 ; } 7 | type mmseqs 2>/dev/null || { echo -e "Please install mmseqs2 using Homebrew:\n\tbrew install mmseqs2" ; exit 1 ; } 8 | 9 | # check whether Apple Silicon (M1 mac) or Intel Mac 10 | arch_name="$(uname -m)" 11 | if [ "${arch_name}" = "x86_64" ]; then 12 | if [ "$(sysctl -in sysctl.proc_translated)" = "1" ]; then 13 | echo "Running on Rosetta 2" 14 | else 15 | echo "Running on native Intel" 16 | fi 17 | echo "This installer is only for Apple Silicon. Use install_colabfold_intelmac.sh to install on this Mac." 18 | exit 1 19 | elif [ "${arch_name}" = "arm64" ]; then 20 | echo "Running on Apple Silicon (M1 mac)" 21 | else 22 | echo "Unknown architecture: ${arch_name}" 23 | exit 1 24 | fi 25 | 26 | # Maybe required for Apple Silicon (M1 mac) when installing mambaforge 27 | ulimit -n 99999 28 | 29 | CURRENTPATH="$(pwd)" 30 | COLABFOLDDIR="${CURRENTPATH}/localcolabfold" 31 | 32 | mkdir -p "${COLABFOLDDIR}" 33 | cd "${COLABFOLDDIR}" 34 | wget -q -P . \
https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh 35 | bash ./Miniforge3-MacOSX-arm64.sh -b -p "${COLABFOLDDIR}/conda" 36 | rm Miniforge3-MacOSX-arm64.sh 37 | 38 | source "${COLABFOLDDIR}/conda/etc/profile.d/conda.sh" 39 | export PATH="${COLABFOLDDIR}/conda/condabin:${PATH}" 40 | conda update -n base conda -y 41 | conda create -p "$COLABFOLDDIR/colabfold-conda" -c conda-forge \ 42 | git python=3.10 openmm==8.0.0 pdbfixer==1.9 -y 43 | conda activate "$COLABFOLDDIR/colabfold-conda" 44 | 45 | # install colabfold 46 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install --no-warn-conflicts \ 47 | "colabfold[alphafold] @ git+https://github.com/sokrypton/ColabFold" 48 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install jax==0.4.23 jaxlib==0.4.23 49 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install "colabfold[alphafold]" 50 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install silence_tensorflow 51 | 52 | # Download the updater 53 | wget -qnc -O "$COLABFOLDDIR/update_M1mac.sh" \ 54 | https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/update_M1mac.sh 55 | chmod +x "$COLABFOLDDIR/update_M1mac.sh" 56 | 57 | # Download weights 58 | "$COLABFOLDDIR/colabfold-conda/bin/python3" -m colabfold.download 59 | echo "Download of alphafold2 weights finished." 60 | echo "-----------------------------------------" 61 | echo "Installation of ColabFold finished." 62 | echo "Add ${COLABFOLDDIR}/colabfold-conda/bin to your environment variable PATH to run 'colabfold_batch'." 63 | echo -e "i.e. for Bash:\n\texport PATH=\"${COLABFOLDDIR}/colabfold-conda/bin:\$PATH\"" 64 | echo "For more details, please run 'colabfold_batch --help'." 65 | -------------------------------------------------------------------------------- /install_colabbatch_intelmac.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash -e 2 | 3 | # check commands 4 | type wget 2>/dev/null || { echo "Please install wget using Homebrew:\n\tbrew install wget" ; exit 1 ; } 5 | type hhsearch 2>/dev/null || { echo -e "Please install hh-suite using Homebrew:\n\tbrew install brewsci/bio/hh-suite" ; exit 1 ; } 6 | type kalign 2>/dev/null || { echo -e "Please install kalign using Homebrew:\n\tbrew install kalign" ; exit 1 ; } 7 | type mmseqs 2>/dev/null || { echo -e "Please install mmseqs2 using Homebrew:\n\tbrew install mmseqs2" ; exit 1 ; } 8 | 9 | # check whether Apple Silicon (M1 mac) or Intel Mac 10 | arch_name="$(uname -m)" 11 | if [ "${arch_name}" = "x86_64" ]; then 12 | if [ "$(sysctl -in sysctl.proc_translated)" = "1" ]; then 13 | echo "Running on Rosetta 2" 14 | else 15 | echo "Running on native Intel" 16 | fi 17 | elif [ "${arch_name}" = "arm64" ]; then 18 | echo "Running on Apple Silicon (M1 mac)" 19 | echo "This installer is only for intel Mac. Use install_colabfold_M1mac.sh to install on this Mac." 20 | exit 1 21 | else 22 | echo "Unknown architecture: ${arch_name}" 23 | exit 1 24 | fi 25 | 26 | CURRENTPATH="$(pwd)" 27 | COLABFOLDDIR="${CURRENTPATH}/localcolabfold" 28 | 29 | mkdir -p "${COLABFOLDDIR}" 30 | cd "${COLABFOLDDIR}" 31 | wget -q -P . 
https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-x86_64.sh 32 | bash ./Miniforge3-MacOSX-x86_64.sh -b -p "${COLABFOLDDIR}/conda" 33 | rm Miniforge3-MacOSX-x86_64.sh 34 | 35 | source "${COLABFOLDDIR}/conda/etc/profile.d/conda.sh" 36 | export PATH="${COLABFOLDDIR}/conda/condabin:${PATH}" 37 | conda update -n base conda -y 38 | conda create -p "$COLABFOLDDIR/colabfold-conda" -c conda-forge -c bioconda \ 39 | git python=3.10 openmm==8.0.0 pdbfixer==1.9 \ 40 | kalign2=2.04 hhsuite=3.3.0 mmseqs2=15.6f452 -y 41 | conda activate "$COLABFOLDDIR/colabfold-conda" 42 | 43 | # install colabfold 44 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install --no-warn-conflicts \ 45 | "colabfold[alphafold] @ git+https://github.com/sokrypton/ColabFold" 46 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install jax==0.4.23 jaxlib==0.4.23 47 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install "colabfold[alphafold]" 48 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install silence_tensorflow 49 | 50 | # Download the updater 51 | wget -qnc -O "$COLABFOLDDIR/update_intelmac.sh" \ 52 | https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/update_intelmac.sh 53 | chmod +x "$COLABFOLDDIR/update_intelmac.sh" 54 | 55 | # Download weights 56 | "$COLABFOLDDIR/colabfold-conda/bin/python3" -m colabfold.download 57 | echo "Download of alphafold2 weights finished." 58 | echo "-----------------------------------------" 59 | echo "Installation of ColabFold finished." 60 | echo "Note: AlphaFold2 weights were downloaded to the ~/Library/Caches/colabfold/params directory." 61 | echo "Add ${COLABFOLDDIR}/colabfold-conda/bin to your PATH environment variable to run 'colabfold_batch'." 62 | echo -e "i.e. for Bash:\n\texport PATH=\"${COLABFOLDDIR}/colabfold-conda/bin:\$PATH\"" 63 | echo "For more details, please run 'colabfold_batch --help'." 64 | -------------------------------------------------------------------------------- /install_colabbatch_linux.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash -e 2 | 3 | type wget 2>/dev/null || { echo "wget is not installed. Please install it using apt or yum." ; exit 1 ; } 4 | 5 | CURRENTPATH=`pwd` 6 | COLABFOLDDIR="${CURRENTPATH}/localcolabfold" 7 | 8 | mkdir -p "${COLABFOLDDIR}" 9 | cd "${COLABFOLDDIR}" 10 | wget -q -P . 
https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh 11 | bash ./Miniforge3-Linux-x86_64.sh -b -p "${COLABFOLDDIR}/conda" 12 | rm Miniforge3-Linux-x86_64.sh 13 | 14 | source "${COLABFOLDDIR}/conda/etc/profile.d/conda.sh" 15 | export PATH="${COLABFOLDDIR}/conda/condabin:${PATH}" 16 | conda update -n base conda -y 17 | conda create -p "$COLABFOLDDIR/colabfold-conda" -c conda-forge -c bioconda \ 18 | git python=3.10 openmm==8.2.0 pdbfixer \ 19 | kalign2=2.04 hhsuite=3.3.0 mmseqs2 -y 20 | conda activate "$COLABFOLDDIR/colabfold-conda" 21 | 22 | # install ColabFold and Jaxlib 23 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install --no-warn-conflicts \ 24 | "colabfold[alphafold-minus-jax] @ git+https://github.com/sokrypton/ColabFold" 25 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install "colabfold[alphafold]" 26 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install --upgrade "jax[cuda12]==0.5.3" 27 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install --upgrade tensorflow 28 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install silence_tensorflow 29 | 30 | # Download the updater 31 | wget -qnc -O "$COLABFOLDDIR/update_linux.sh" \ 32 | https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/update_linux.sh 33 | chmod +x "$COLABFOLDDIR/update_linux.sh" 34 | 35 | pushd "${COLABFOLDDIR}/colabfold-conda/lib/python3.10/site-packages/colabfold" 36 | # Use 'Agg' for non-GUI backend 37 | sed -i -e "s#from matplotlib import pyplot as plt#import matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt#g" plot.py 38 | # modify the default params directory 39 | sed -i -e "s#appdirs.user_cache_dir(__package__ or \"colabfold\")#\"${COLABFOLDDIR}/colabfold\"#g" download.py 40 | # suppress warnings related to tensorflow 41 | sed -i -e "s#from io import StringIO#from io import StringIO\nfrom silence_tensorflow import silence_tensorflow\nsilence_tensorflow()#g" batch.py 42 | # remove cache directory 43 | rm -rf __pycache__ 44 | popd 45 | 46 | # Download weights 47 | "$COLABFOLDDIR/colabfold-conda/bin/python3" -m colabfold.download 48 | echo "Download of alphafold2 weights finished." 49 | echo "-----------------------------------------" 50 | echo "Installation of ColabFold finished." 51 | echo "Add ${COLABFOLDDIR}/colabfold-conda/bin to your PATH environment variable to run 'colabfold_batch'." 52 | echo -e "i.e. for Bash:\n\texport PATH=\"${COLABFOLDDIR}/colabfold-conda/bin:\$PATH\"" 53 | echo "For more details, please run 'colabfold_batch --help'." 54 | -------------------------------------------------------------------------------- /update_M1mac.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash -e 2 | 3 | # check whether Apple Silicon (M1 mac) or Intel Mac 4 | arch_name="$(uname -m)" 5 | if [ "${arch_name}" = "x86_64" ]; then 6 | if [ "$(sysctl -in sysctl.proc_translated)" = "1" ]; then 7 | echo "Running on Rosetta 2" 8 | else 9 | echo "Running on native Intel" 10 | fi 11 | echo "This installer is only for Apple Silicon. Use update_intelmac.sh to install on this Mac." 12 | exit 1 13 | elif [ "${arch_name}" = "arm64" ]; then 14 | echo "Running on Apple Silicon (M1 mac)" 15 | else 16 | echo "Unknown architecture: ${arch_name}" 17 | exit 1 18 | fi 19 | 20 | # Maybe required for Apple Silicon (M1 mac) when installing mambaforge 21 | ulimit -n 99999 22 | 23 | COLABFOLDDIR="$1" 24 | if [ ! -d "$COLABFOLDDIR/colabfold-conda" ]; then 25 | echo "Error! colabfold-conda directory is not present in $COLABFOLDDIR." 
26 | exit 1 27 | fi 28 | 29 | # activate conda in $COLABFOLDDIR/conda 30 | source "${COLABFOLDDIR}/conda/etc/profile.d/conda.sh" 31 | conda activate "$COLABFOLDDIR/colabfold-conda" 32 | 33 | # reinstall colabfold 34 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install --no-warn-conflicts --upgrade --force-reinstall \ 35 | "colabfold[alphafold] @ git+https://github.com/sokrypton/ColabFold" 36 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install jax==0.4.23 jaxlib==0.4.23 37 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install "colabfold[alphafold]" 38 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install silence_tensorflow -------------------------------------------------------------------------------- /update_intelmac.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash -e 2 | 3 | # check whether Apple Silicon (M1 mac) or Intel Mac 4 | arch_name="$(uname -m)" 5 | if [ "${arch_name}" = "x86_64" ]; then 6 | if [ "$(sysctl -in sysctl.proc_translated)" = "1" ]; then 7 | echo "Running on Rosetta 2" 8 | else 9 | echo "Running on native Intel" 10 | fi 11 | elif [ "${arch_name}" = "arm64" ]; then 12 | echo "Running on Apple Silicon (M1 mac)" 13 | echo "This installer is only for intel Mac." 14 | exit 1 15 | else 16 | echo "Unknown architecture: ${arch_name}" 17 | exit 1 18 | fi 19 | 20 | COLABFOLDDIR="$1" 21 | if [ ! -d "$COLABFOLDDIR/colabfold-conda" ]; then 22 | echo "Error! colabfold-conda directory is not present in $COLABFOLDDIR." 23 | exit 1 24 | fi 25 | 26 | # activate conda in $COLABFOLDDIR/conda 27 | source "${COLABFOLDDIR}/conda/etc/profile.d/conda.sh" 28 | conda activate "$COLABFOLDDIR/colabfold-conda" 29 | 30 | # reinstall colabfold 31 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install --no-warn-conflicts --upgrade --force-reinstall \ 32 | "colabfold[alphafold] @ git+https://github.com/sokrypton/ColabFold" 33 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install jax==0.4.23 jaxlib==0.4.23 34 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install "colabfold[alphafold]" 35 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install silence_tensorflow -------------------------------------------------------------------------------- /update_linux.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash -e 2 | 3 | # get absolute path of COLABFOLDDIR 4 | COLABFOLDDIR=$(realpath $(dirname $0)) 5 | 6 | if [ ! -d "$COLABFOLDDIR/colabfold-conda" ]; then 7 | echo "Error! colabfold-conda directory is not present in $COLABFOLDDIR." 
 8 | exit 1 9 | fi 10 | 11 | # activate conda in $COLABFOLDDIR/conda 12 | source "${COLABFOLDDIR}/conda/etc/profile.d/conda.sh" 13 | conda activate "$COLABFOLDDIR/colabfold-conda" 14 | 15 | # reinstall colabfold and alphafold-colabfold 16 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install --no-warn-conflicts --upgrade --force-reinstall \ 17 | "colabfold[alphafold-minus-jax] @ git+https://github.com/sokrypton/ColabFold" 18 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install "colabfold[alphafold]" 19 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install --force-reinstall "jax[cuda12]==0.5.3" "numpy==2.2.5" 20 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install --upgrade tensorflow 21 | "$COLABFOLDDIR/colabfold-conda/bin/pip" install silence_tensorflow 22 | 23 | # use 'agg' for non-GUI backend 24 | cd "${COLABFOLDDIR}/colabfold-conda/lib/python3.10/site-packages/colabfold" 25 | sed -i -e "s#from matplotlib import pyplot as plt#import matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt#g" plot.py 26 | # modify the default params directory 27 | sed -i -e "s#appdirs.user_cache_dir(__package__ or \"colabfold\")#\"${COLABFOLDDIR}/colabfold\"#g" download.py 28 | # suppress warnings related to tensorflow 29 | sed -i -e "s#from io import StringIO#from io import StringIO\nfrom silence_tensorflow import silence_tensorflow\nsilence_tensorflow()#g" batch.py 30 | # remove cache directory 31 | rm -rf __pycache__ 32 | -------------------------------------------------------------------------------- /v1.0.0/README.md: -------------------------------------------------------------------------------- 1 | # LocalColabFold 2 | 3 | [ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb) on your local PC (or macOS) 4 | 5 | ## Installation 6 | 7 | ### For Linux 8 | 9 | 1. Make sure the `curl`, `git`, and `wget` commands are already installed on your PC. If not, you need to install them first. For Ubuntu, type `sudo apt -y install curl git wget`. 10 | 2. Make sure your Cuda compiler driver is **11.1 or later**:<br>
$ nvcc --version
 11 | nvcc: NVIDIA (R) Cuda compiler driver
 12 | Copyright (c) 2005-2020 NVIDIA Corporation
 13 | Built on Mon_Oct_12_20:09:46_PDT_2020
 14 | Cuda compilation tools, release 11.1, V11.1.105
 15 | Build cuda_11.1.TC455_06.29190527_0
 16 | 
DO NOT use `nvidia-smi` for checking the version.
See [NVIDIA CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) if you haven't installed it. 17 | 1. Download `install_colabfold_linux.sh` from this repository:
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabfold_linux.sh
and run it in the directory where you want to install:
$ bash install_colabfold_linux.sh
About 5 minutes later, the `colabfold` directory will be created. Do not move this directory after the installation. 18 | 1. Type `cd colabfold` to enter the directory. 19 | 1. Modify variables such as `sequence = 'PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK'`, `jobname = "test"`, etc. in `runner.py` for your prediction. For more information, please refer to the original [ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb). 20 | 1. To run the prediction, type<br>
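# before running, edit the job settings near the top of runner.py, e.g. (example values taken from this README):<br>
# sequence = 'PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK'<br>
# jobname = "test"<br>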
$ colabfold-conda/bin/python3.7 runner.py
in the `colabfold` directory. The result files will be created in a `prediction_*` directory inside the `colabfold` directory. After the prediction has finished, you may move the results out of the `colabfold` directory. 21 | 22 | ### For macOS 23 | 24 | **Caution: Due to the lack of an Nvidia GPU/CUDA driver, structure prediction on macOS is 5-10 times slower than on Linux+GPU**. For the test sequence (58 a.a.), it may take 30 minutes. However, it may be useful to play with it before preparing a Linux+GPU environment. 25 | 26 | You can check whether your Mac is Intel or Apple Silicon by typing `uname -m` in Terminal. 27 | 28 | ```bash 29 | $ uname -m 30 | x86_64 # Intel 31 | arm64 # Apple Silicon 32 | ``` 33 | 34 | Please use the correct installer for your Mac. 35 | 36 | #### For Mac with Intel CPU 37 | 38 | 1. Install [Homebrew](https://brew.sh/index_ja) if not present:<br>
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
39 | 1. Install `wget` command using Homebrew:
$ brew install wget
40 | 1. Download `install_colabfold_intelmac.sh` from this repository:
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabfold_intelmac.sh
and run it in the directory where you want to install:
$ bash install_colabfold_intelmac.sh
About 5 minutes later, `colabfold` directory will be created. Do not move this directory after the installation. 41 | 1. The rest procedure is the same as "For Linux". 42 | 43 | #### For Mac with Apple Silicon (M1 chip) 44 | 45 | **Note: This installer is experimental because most of the dependent packages are not fully tested on Apple Silicon Mac.** 46 | 47 | 1. Install [Homebrew](https://brew.sh/index_ja) if not present:
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
48 | 1. Install `wget` and `cmake` commands using Homebrew:
$ brew install wget cmake
49 | 1. Install `miniforge` command using Homebrew:
$ brew install --cask miniforge
50 | 1. Download `install_colabfold_M1mac.sh` from this repository:
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabfold_M1mac.sh
and run it in the directory where you want to install:
$ bash install_colabfold_M1mac.sh
About 5 minutes later, the `colabfold` directory will be created. Do not move this directory after the installation. 51 | 1. Type `cd colabfold` to enter the directory. 52 | 1. Modify variables such as `sequence = 'PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK'`, `jobname = "test"`, etc. in `runner.py` for your prediction. For more information, please refer to the original [ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb). 53 | 1. To run the prediction, type<br>
$ colabfold-conda/bin/python3.8 runner.py
in the `colabfold` directory. The result files will be created in a `prediction_*` directory inside the `colabfold` directory. After the prediction has finished, you may move the results out of the `colabfold` directory. 54 | 55 | A warning message appears when you run the prediction: 56 | ``` 57 | You are using an experimental build of OpenMM v7.5.1. 58 | This is NOT SUITABLE for production! 59 | It has not been properly tested on this platform and we cannot guarantee it provides accurate results. 60 | ``` 61 | 62 | This message is due to Apple Silicon, but I think we can ignore it. 63 | 64 | ## Usage of `colabfold` shell script (Linux) 65 | 66 | An executable `colabfold` shell script is installed in the `/path/to/colabfold/bin` directory. This is especially helpful for installations on a shared computer and for users who want to predict many sequences. 67 | 68 | 1. Prepare a FASTA file containing the amino acid sequence for which you want to predict the structure (e.g. `6x9z.fasta`).<br>
>6X9Z_1|Chain A|Transmembrane beta-barrels|synthetic construct (32630)
 69 | MEQKPGTLMVYVVVGYNTDNTVDVVGGAQYAVSPYLFLDVGYGWNNSSLNFLEVGGGVSYKVSPDLEPYVKAGFEYNTDNTIKPTAGAGALYRVSPNLALMVEYGWNNSSLQKVAIGIAYKVKD
70 | 2. Type `export PATH="/path/to/colabfold/bin:$PATH"` to add the script directory to your PATH environment variable. For example, use `export PATH="/home/foo/bar/colabfold/bin:$PATH"` if you installed localcolabfold in `/home/foo/bar/colabfold`. 71 | 3. Run the `colabfold` command with your FASTA file. For example,
$ colabfold --input 6x9z.fasta \\
 72 |    --output_dir 6x9z \\
 73 |    --max_recycle 18 \\
 74 |    --use_ptm \\
 75 |    --use_turbo \\
 76 |    --num_relax Top5
This will predict the protein structure [6x9z](https://www.rcsb.org/structure/6x9z) while increasing the number of 'recycling' iterations to 18, which may be effective for *de novo* structure prediction. For another example, [PDB: 3KUD](https://www.rcsb.org/structure/3KUD),
$ colabfold --input 3kud_complex.fasta \\
 77 |    --output_dir 3kud \\
 78 |    --homooligomer 1:1 \\
 79 |    --use_ptm \\
 80 |    --use_turbo \\
 81 |    --max_recycle 3 \\
 82 |    --num_relax Top5
where the input sequence `3kud_complex.fasta` is
>3KUD_complex
 83 |    MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQH:
 84 |    PSKTSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAASLIGEELQVDFL
This will predict a heterooligomer. For more information about the options, type `colabfold --help` or refer to the original [ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb). 85 | 86 | ## Advantages of LocalColabFold 87 | - **Structure inference and relaxation will be accelerated if your PC has Nvidia GPU and CUDA drivers.** 88 | - **No Time out (90 minutes and 12 hours)** 89 | - **No GPU limitations** 90 | - **NOT necessary to prepare the large database required for native AlphaFold2**. 91 | 92 | ## FAQ 93 | - What else do I need to do before installation? Do I need sudo privileges? 94 | - No, except for the installation of the `curl` and `wget` commands. 95 | - Do I need to prepare large databases such as PDB70, BFD, Uniclust30, MGnify...? 96 | - **No, it is not necessary.** Generation of the MSA is performed by the MMseqs2 web server, just as implemented in ColabFold. 97 | - Are the pLDDT score and PAE figures available? 98 | - Yes, they will be generated just like in ColabFold. 99 | - Is it possible to predict homooligomers and complexes? 100 | - Yes, the sequence input is the same as in ColabFold. See [ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb). 101 | - Is it possible to create the MSA with jackhmmer? 102 | - **No, it is not currently supported**. 103 | - I want to run the predictions step-by-step like in Google Colab. 104 | - You can use VSCode and its Python plugin to do the same. See https://code.visualstudio.com/docs/python/jupyter-support-py. 105 | - I want to use multiple GPUs to perform the prediction. 106 | - You need to set the environment variables `TF_FORCE_UNIFIED_MEMORY` and `XLA_PYTHON_CLIENT_MEM_FRACTION` before execution (see the sketch after this FAQ). See [this discussion](https://github.com/YoshitakaMo/localcolabfold/issues/7#issuecomment-923027641). 107 | - I want to solve the `ResourceExhausted` error when trying to predict a sequence with > 1000 residues. 108 | - See the same discussion as above. 109 | - I got an error message `CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered`. 110 | - You may not have updated to CUDA 11.1 or later. Please check the version of the CUDA compiler with the `nvcc --version` command, not `nvidia-smi`. 111 | - Is this available on Windows 10? 112 | - You can run LocalColabFold on Windows 10 with [WSL2](https://docs.microsoft.com/en-us/windows/wsl/install-win10). 113 |
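A minimal sketch of the environment-variable setup mentioned in the two FAQ items above (the values mirror the defaults that `install_colabfold_linux.sh` writes into the `colabfold` wrapper script; see the linked discussion for tuning them to your hardware):

```bash
# Allow JAX/TensorFlow to spill GPU allocations into host RAM (unified memory)
export TF_FORCE_UNIFIED_MEMORY="1"
# With unified memory, values > 1.0 let a process allocate more than one GPU's worth of memory
export XLA_PYTHON_CLIENT_MEM_FRACTION="4.0"

colabfold --input 6x9z.fasta --output_dir 6x9z --use_ptm --use_turbo --num_relax Top5
```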
114 | ## Tutorials & Presentations 115 | 116 | - ColabFold Tutorial presented at the Boston Protein Design and Modeling Club. [[video]](https://www.youtube.com/watch?v=Rfw7thgGTwI) [[slides]](https://docs.google.com/presentation/d/1mnffk23ev2QMDzGZ5w1skXEadTe54l8-Uei6ACce8eI). 117 | 118 | ## Acknowledgments 119 | 120 | - The original colabfold was first created by Sergey Ovchinnikov ([@sokrypton](https://twitter.com/sokrypton)), Milot Mirdita ([@milot_mirdita](https://twitter.com/milot_mirdita)) and Martin Steinegger ([@thesteinegger](https://twitter.com/thesteinegger)). 121 | 122 | ## How do I reference this work? 123 | 124 | - Mirdita M, Schuetze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. ColabFold - Making protein folding accessible to all. *bioRxiv*, doi: [10.1101/2021.08.15.456425](https://www.biorxiv.org/content/10.1101/2021.08.15.456425v2) (2021) 125 | - John Jumper, Richard Evans, Alexander Pritzel, et al. - Highly accurate protein structure prediction with AlphaFold.<br>*Nature*, 1–11, doi: [10.1038/s41586-021-03819-2](https://www.nature.com/articles/s41586-021-03819-2) (2021) 126 | 127 | [![DOI](https://zenodo.org/badge/doi/10.5281/zenodo.5123296.svg)](https://doi.org/10.5281/zenodo.5123296) 128 | -------------------------------------------------------------------------------- /v1.0.0/README_ja.md: -------------------------------------------------------------------------------- 1 | # LocalColabFold 2 | 3 | [ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb) running on the CPU and GPU of your personal computer. 4 | 5 | ## How to install 6 | 7 | ### For Linux+GPU 8 | 9 | 1. Make sure the `curl`, `git`, and `wget` commands are already installed on your system. If they are not present, install them first. On Ubuntu, you can install them by typing `sudo apt -y install curl git wget`. 10 | 2. **Make sure the version of your CUDA compiler is 11.1 or later.**
$ nvcc --version
 11 | nvcc: NVIDIA (R) Cuda compiler driver
 12 | Copyright (c) 2005-2020 NVIDIA Corporation
 13 | Built on Mon_Oct_12_20:09:46_PDT_2020
 14 | Cuda compilation tools, release 11.1, V11.1.105
 15 | Build cuda_11.1.TC455_06.29190527_0
 16 | 
Do not use the `nvidia-smi` command for this version check; it is inaccurate here.
If you have not installed the CUDA compiler yet, see the [NVIDIA CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html). 17 | 3. Download `install_colabfold_linux.sh` from this repository:
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabfold_linux.sh
After placing it in the directory where you want to install, type the following command:
$ bash install_colabfold_linux.sh
After about 5 minutes, the `colabfold` directory will be created. Do not move this directory after the installation. 18 | 4. Type `cd colabfold` to enter the directory. 19 | 5. Modify parameters such as `sequence = 'PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK'` and `jobname = "test"` in `runner.py` and enter the information required for the structure prediction. For detailed settings, please refer to the original [ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb); almost all of the settings available there can also be used here (except MSA_methods). 20 | 6. To run the prediction, type the following command in a terminal inside the `colabfold` directory:
$ colabfold-conda/bin/python3.7 runner.py
The result files will be created as `prediction_<jobname>_<hash>` inside the `colabfold` directory. After the prediction has finished, you may move the result files out of the `colabfold` directory or rename the result directory. 21 | 22 | ### For macOS 23 | 24 | **Caution: Because macOS has no Nvidia GPU or CUDA driver, the structure inference part is about 5-10 times slower than in a Linux+GPU environment**. For the test amino acid sequence (58 amino acids), the computation takes about 30 minutes. Still, it may be worth playing with it before preparing a Linux+GPU environment. 25 | 26 | Also, check in advance whether your Mac has an Intel CPU or an M1 chip (Apple Silicon). The output of `uname -m` in a terminal tells you which one you have. 27 | 28 | ```bash 29 | $ uname -m 30 | x86_64 # Intel 31 | arm64 # Apple Silicon 32 | ``` 33 | 34 | (If you are using Rosetta 2 on Apple Silicon, this shows x86_64 even though the machine is Apple Silicon... this case is not supported at the moment.) 35 | 36 | Based on this result, choose the appropriate installer. 37 | 38 | #### For a Mac with an Intel CPU 39 | 40 | 1. Install [Homebrew](https://qiita.com/zaburo/items/29fe23c1ceb6056109fd):
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
41 | 2. Install the `wget` command with Homebrew:
$ brew install wget
42 | 3. Download `install_colabfold_intelmac.sh` from this repository:
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabfold_intelmac.sh
After placing it in the directory where you want to install, type the following command:
$ bash install_colabfold_intelmac.sh
After about 5 minutes, the `colabfold` directory will be created. Do not move this directory after the installation. 43 | 4. The rest of the procedure is the same as in "For Linux+GPU". 44 | 45 | #### For a Mac with Apple Silicon (M1 chip) 46 | 47 | **Note: Because most of the dependent Python packages have not been fully tested on Apple Silicon Macs yet, this installer is experimental.** 48 | 49 | 1. Install [Homebrew](https://qiita.com/zaburo/items/29fe23c1ceb6056109fd):
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
50 | 1. Install the `wget` and `cmake` commands with Homebrew:
$ brew install wget cmake
51 | 1. Install `miniforge` with Homebrew:
$ brew install --cask miniforge
52 | 1. Download the installer `install_colabfold_M1mac.sh` from this repository:
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabfold_M1mac.sh
After placing it in the directory where you want to install, type the following command:
$ bash install_colabfold_M1mac.sh
After about 5 minutes, the `colabfold` directory will be created. Various warnings and errors may appear along the way. Do not move this directory after the installation. 53 | 1. Type `cd colabfold` to enter the directory. 54 | 1. Modify parameters such as `sequence = 'PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK'` and `jobname = "test"` in `runner.py` and enter the information required for the structure prediction. For detailed settings, please refer to the original [ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb); almost all of the settings available there can also be used here (except MSA_methods). 55 | 1. To run the prediction, type the following command in a terminal inside the `colabfold` directory:
$ colabfold-conda/bin/python3.8 runner.py
The result files will be created as `prediction_<jobname>_<hash>` inside the `colabfold` directory. After the prediction has finished, you may move the result files out of the `colabfold` directory or rename the result directory. 56 | 57 | A message like the following appears while the prediction is running: 58 | 59 | ``` 60 | You are using an experimental build of OpenMM v7.5.1. 61 | This is NOT SUITABLE for production! 62 | It has not been properly tested on this platform and we cannot guarantee it provides accurate results. 63 | ``` 64 | 65 | This message appears only when running on Apple Silicon, but it can probably be ignored. 66 | 67 | ## Usage of the `colabfold` command (for Linux) 68 | 69 | `colabfold` is an executable shell script that can take command-line arguments instead of `runner.py`. It only needs to be installed once on a shared computer, which is useful when multiple users want to predict many sequences with localcolabfold. 70 | 71 | 1. Prepare a FASTA-format file containing the amino acid sequence you want to predict in the same directory; as an example, call it `6x9z.fasta`:
>6X9Z_1|Chain A|Transmembrane beta-barrels|synthetic construct (32630)
 72 | MEQKPGTLMVYVVVGYNTDNTVDVVGGAQYAVSPYLFLDVGYGWNNSSLNFLEVGGGVSYKVSPDLEPYVKAGFEYNTDNTIKPTAGAGALYRVSPNLALMVEYGWNNSSLQKVAIGIAYKVKD
73 | 1. Type `export PATH="/path/to/colabfold/bin:$PATH"` to set the file path of this `colabfold` shell script in the PATH environment variable. For example, if you installed LocalColabFold in `/home/foo/bar/colabfold`, type `export PATH="/home/foo/bar/colabfold/bin:$PATH"`. 74 | 1. Specify the input amino acid sequence file as the `--input` argument and run the `colabfold` command. For example:
$ colabfold --input 6x9z.fasta \\
 75 |    --output_dir 6x9z \\
 76 |    --max_recycle 18 \\
 77 |    --use_ptm \\
 78 |    --use_turbo \\
 79 |    --num_relax Top5
The command above raises the maximum number of 'recycling' iterations to 18 when predicting the *de novo* protein structure [PDB: 6X9Z](https://www.rcsb.org/structure/6x9z). Raising this number has been shown to be effective for predicting *de novo* protein structures (for ordinary proteins, 3 recycles are almost always sufficient).
As another input example, when you want to run a **complex prediction** for [PDB: 3KUD](https://www.rcsb.org/structure/3KUD):
$ colabfold --input 3kud_complex.fasta \\
 80 |    --output_dir 3kud \\
 81 |    --homooligomer 1:1 \\
 82 |    --use_ptm \\
 83 |    --use_turbo \\
 84 |    --max_recycle 3 \\
 85 |    --num_relax Top5
Here, the input sequence `3kud_complex.fasta` is as follows:
>3KUD_complex
 86 |    MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQH:
 87 |    PSKTSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAASLIGEELQVDFL
 88 |    
You can run a complex prediction by separating the amino acid sequences with the `:` symbol; in this case, it is a hetero-complex prediction. For other settings, such as homooligomer prediction, read the usage shown by `colabfold --help` or the description in the original [ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb). 89 | 90 | ## Advantages of LocalColabFold 91 | 92 | - **If your PC has an Nvidia GPU and CUDA drivers, structure inference and relaxation by AlphaFold2 are accelerated.** 93 | - **Google Colab times out after 90 minutes of idling or more than 12 hours of use; there are no such limits here, and of course no limits on GPU usage either.** 94 | - **There is no need to download any databases**. 95 | 96 | ## FAQ 97 | - What do I need to prepare before installation? 98 | - Nothing other than the `curl` and `wget` commands. 99 | - Do I need to prepare huge databases such as BFD, MGnify, PDB70, or Uniclust30? 100 | - **No, you do not**. 101 | - How is the MSA that AlphaFold2 needs for its first step generated? 102 | - The MSA generation is performed by the MMseqs2 web server, just as in ColabFold. 103 | - Are the pLDDT score and PAE figures shown in ColabFold also generated? 104 | - Yes, they are generated. 105 | - Are homo-oligomer and complex predictions also possible? 106 | - Yes, they are. The sequence input method is the same as in Google Colab. 107 | - Is MSA generation with jackhmmer possible? 108 | - **It is not supported at the moment**. 109 | - I want to execute the run cell-by-cell like in Google Colab. 110 | - You can do the same with VSCode and its Python plugin. See https://code.visualstudio.com/docs/python/jupyter-support-py . 111 | - I want to run the computation with multiple GPUs. 112 | - You need to set the environment variables `TF_FORCE_UNIFIED_MEMORY` and `XLA_PYTHON_CLIENT_MEM_FRACTION` before execution (see the sketch after this FAQ). Please read [this issue](https://github.com/YoshitakaMo/localcolabfold/issues/7#issuecomment-923027641). 113 | - I want to solve the `ResourceExhausted` error that occurs when predicting a long amino acid sequence. 114 | - Please read the same issue as above. 115 | - I get the error message `CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered`. 116 | - You may not have updated to CUDA 11.1 or later. Check the version of the CUDA compiler with the `nvcc --version` command. 117 | - Can I use it on Windows 10? 118 | - With [WSL2](https://docs.microsoft.com/en-us/windows/wsl/install-win10) installed, it runs on Windows 10 in the same way. 119 |
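A minimal sketch of the environment-variable setup mentioned in the FAQ above (the values mirror the defaults that the installer writes into the `colabfold` wrapper script; read the linked issue before tuning them):

```bash
# Treat host RAM as spill space for GPU allocations (unified memory)
export TF_FORCE_UNIFIED_MEMORY="1"
# With unified memory, > 1.0 allows allocating more than a single GPU's memory
export XLA_PYTHON_CLIENT_MEM_FRACTION="4.0"
```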
120 | ## Tutorials & Presentations 121 | 122 | - ColabFold Tutorial presented at the Boston Protein Design and Modeling Club. [[video]](https://www.youtube.com/watch?v=Rfw7thgGTwI) [[slides]](https://docs.google.com/presentation/d/1mnffk23ev2QMDzGZ5w1skXEadTe54l8-Uei6ACce8eI). 123 | 124 | ## Acknowledgments 125 | 126 | - The original colabfold was first created by Sergey Ovchinnikov ([@sokrypton](https://twitter.com/sokrypton)), Milot Mirdita ([@milot_mirdita](https://twitter.com/milot_mirdita)) and Martin Steinegger ([@thesteinegger](https://twitter.com/thesteinegger)). 127 | 128 | ## How do I reference this work? 129 | 130 | - Mirdita M, Schuetze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. ColabFold - Making protein folding accessible to all. *bioRxiv*, doi: [10.1101/2021.08.15.456425](https://www.biorxiv.org/content/10.1101/2021.08.15.456425v2) (2021) 131 | - John Jumper, Richard Evans, Alexander Pritzel, et al. - Highly accurate protein structure prediction with AlphaFold.<br>*Nature*, 1–11, doi: [10.1038/s41586-021-03819-2](https://www.nature.com/articles/s41586-021-03819-2) (2021) 132 | 133 | 134 | [![DOI](https://zenodo.org/badge/doi/10.5281/zenodo.5123296.svg)](https://doi.org/10.5281/zenodo.5123296) 135 | -------------------------------------------------------------------------------- /v1.0.0/colabfold_alphafold.patch: -------------------------------------------------------------------------------- 1 | --- colabfold_alphafold.py.orig 2021-10-24 10:56:09.887461716 +0900 2 | +++ colabfold_alphafold.py 2021-10-24 11:25:12.811888920 +0900 3 | @@ -32,6 +32,13 @@ try: 4 | except: 5 | IN_COLAB = False 6 | 7 | +if os.getenv('COLABFOLD_PATH'): 8 | + print("COLABFOLD_PATH is set to " + os.getenv('COLABFOLD_PATH')) 9 | + colabfold_path = os.getenv('COLABFOLD_PATH') 10 | +else: 11 | + print("COLABFOLD_PATH is not set.") 12 | + colabfold_path = '.' 13 | + 14 | import tqdm.notebook 15 | TQDM_BAR_FORMAT = '{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]' 16 | 17 | @@ -641,7 +648,7 @@ def prep_model_runner(opt=None, model_na 18 | cfg.model.recycle_tol = opt["tol"] 19 | cfg.data.eval.num_ensemble = opt["num_ensemble"] 20 | 21 | - params = data.get_model_haiku_params(name, params_loc) 22 | + params = data.get_model_haiku_params(name, colabfold_path + "/" + params_loc) 23 | return {"model":model.RunModel(cfg, params, is_training=opt["is_training"]), "opt":opt} 24 | else: 25 | return old_runner 26 | @@ -749,7 +756,7 @@ def run_alphafold(feature_dict, opt=None 27 | pbar.set_description(f'Running {key}') 28 | 29 | # replace model parameters 30 | - params = data.get_model_haiku_params(name, params_loc) 31 | + params = data.get_model_haiku_params(name, colabfold_path + "/" + params_loc) 32 | for k in runner["model"].params.keys(): 33 | runner["model"].params[k] = params[k] 34 | 35 | -------------------------------------------------------------------------------- /v1.0.0/gpurelaxation.patch: -------------------------------------------------------------------------------- 1 | --- alphafold/relax/amber_minimize.py.org 2021-08-31 16:59:21.161164190 +0900 2 | +++ alphafold/relax/amber_minimize.py 2021-08-31 16:59:32.073226369 +0900 3 | @@ -90,7 +90,7 @@ def _openmm_minimize( 4 | _add_restraints(system, pdb, stiffness, restraint_set, exclude_residues) 5 | 6 | integrator = openmm.LangevinIntegrator(0, 0.01, 0.0) 7 | - platform = openmm.Platform.getPlatformByName("CPU") 8 | + platform = openmm.Platform.getPlatformByName("CUDA") 9 | simulation = openmm_app.Simulation( 10 | pdb.topology, system, integrator, platform) 11 | simulation.context.setPositions(pdb.positions) 12 | @@ -530,7 +530,7 @@ def get_initial_energies(pdb_strs: Seque 13 | simulation = openmm_app.Simulation(openmm_pdbs[0].topology, 14 | system, 15 | openmm.LangevinIntegrator(0, 0.01, 0.0), 16 | - openmm.Platform.getPlatformByName("CPU")) 17 | + openmm.Platform.getPlatformByName("CUDA")) 18 | energies = [] 19 | for pdb in openmm_pdbs: 20 | try: 21 | -------------------------------------------------------------------------------- /v1.0.0/install_colabfold_M1mac.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | # check whether `wget` and `cmake` are installed 4 | type wget || { echo "wget command is not installed. Please install it at first using Homebrew." ; exit 1 ; } 5 | type cmake || { echo "cmake command is not installed. Please install it at first using Homebrew."
; exit 1 ; } 6 | 7 | # check whether miniforge is present 8 | test -f "/opt/homebrew/Caskroom/miniforge/base/etc/profile.d/conda.sh" || { echo "Install miniforge by using Homebrew before installation. \n 'brew install --cask miniforge'" ; exit 1 ; } 9 | 10 | # check whether Apple Silicon (M1 mac) or Intel Mac 11 | arch_name="$(uname -m)" 12 | 13 | if [ "${arch_name}" = "x86_64" ]; then 14 | if [ "$(sysctl -in sysctl.proc_translated)" = "1" ]; then 15 | echo "Running on Rosetta 2" 16 | else 17 | echo "Running on native Intel" 18 | fi 19 | echo "This installer is only for Apple Silicon. Use install_colabfold_intelmac.sh to install on this Mac." 20 | exit 1 21 | elif [ "${arch_name}" = "arm64" ]; then 22 | echo "Running on Apple Silicon (M1 mac)" 23 | else 24 | echo "Unknown architecture: ${arch_name}" 25 | exit 1 26 | fi 27 | 28 | GIT_REPO="https://github.com/deepmind/alphafold" 29 | SOURCE_URL="https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar" 30 | CURRENTPATH=`pwd` 31 | COLABFOLDDIR="${CURRENTPATH}/colabfold" 32 | PARAMS_DIR="${COLABFOLDDIR}/alphafold/data/params" 33 | MSATOOLS="${COLABFOLDDIR}/tools" 34 | 35 | # download the original alphafold as "${COLABFOLDDIR}" 36 | echo "downloading the original alphafold as ${COLABFOLDDIR}..." 37 | rm -rf ${COLABFOLDDIR} 38 | git clone ${GIT_REPO} ${COLABFOLDDIR} 39 | (cd ${COLABFOLDDIR}; git checkout 1d43aaff941c84dc56311076b58795797e49107b --quiet) 40 | 41 | # colabfold patches 42 | echo "Applying several patches to be Alphafold2_advanced..." 43 | cd ${COLABFOLDDIR} 44 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/colabfold.py 45 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/colabfold_alphafold.py 46 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/pairmsa.py 47 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/protein.patch 48 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/config.patch 49 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/model.patch 50 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/modules.patch 51 | # GPU relaxation patch 52 | # wget -qnc https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/gpurelaxation.patch -O gpurelaxation.patch 53 | 54 | # donwload reformat.pl from hh-suite 55 | wget -qnc https://raw.githubusercontent.com/soedinglab/hh-suite/master/scripts/reformat.pl 56 | # Apply multi-chain patch from Lim Heo @huhlim 57 | patch -u alphafold/common/protein.py -i protein.patch 58 | patch -u alphafold/model/model.py -i model.patch 59 | patch -u alphafold/model/modules.py -i modules.patch 60 | patch -u alphafold/model/config.py -i config.patch 61 | cd .. 62 | 63 | # Downloading parameter files 64 | echo "Downloading AlphaFold2 trained parameters..." 65 | mkdir -p ${PARAMS_DIR} 66 | curl -fL ${SOURCE_URL} | tar x -C ${PARAMS_DIR} 67 | 68 | # Downloading stereo_chemical_props.txt from https://git.scicore.unibas.ch/schwede/openstructure 69 | echo "Downloading stereo_chemical_props.txt..." 70 | wget -q https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt 71 | mkdir -p ${COLABFOLDDIR}/alphafold/common 72 | mv stereo_chemical_props.txt ${COLABFOLDDIR}/alphafold/common 73 | 74 | # echo "installing HH-suite 3.3.0..." 
75 | # mkdir -p ${MSATOOLS} 76 | # git clone --branch v3.3.0 https://github.com/soedinglab/hh-suite.git hh-suite-3.3.0 77 | # (cd hh-suite-3.3.0 ; mkdir build ; cd build ; cmake -DCMAKE_INSTALL_PREFIX=${MSATOOLS}/hh-suite .. ; make -j4 ; make install) 78 | # rm -rf hh-suite-3.3.0 79 | 80 | # echo "installing HMMER 3.3.2..." 81 | # wget http://eddylab.org/software/hmmer/hmmer-3.3.2.tar.gz 82 | # (tar xzvf hmmer-3.3.2.tar.gz ; cd hmmer-3.3.2 ; ./configure --prefix=${MSATOOLS}/hmmer ; make -j4 ; make install) 83 | # rm -rf hmmer-3.3.2.tar.gz hmmer-3.3.2 84 | 85 | echo "Creating conda environments with python3.8 as ${COLABFOLDDIR}/colabfold-conda" 86 | . "/opt/homebrew/Caskroom/miniforge/base/etc/profile.d/conda.sh" 87 | conda create -p $COLABFOLDDIR/colabfold-conda python=3.8 -y 88 | conda activate $COLABFOLDDIR/colabfold-conda 89 | conda update -y conda 90 | 91 | echo "Installing conda-forge packages" 92 | conda install -y -c conda-forge python=3.8 openmm==7.5.1 pdbfixer jupyter matplotlib py3Dmol tqdm biopython==1.79 immutabledict==2.0.0 93 | conda install -y -c conda-forge jax==0.2.20 94 | conda install -y -c apple tensorflow-deps 95 | python3.8 -m pip install tensorflow-macos 96 | python3.8 -m pip install jaxlib==0.1.70 -f "https://dfm.io/custom-wheels/jaxlib/index.html" 97 | python3.8 -m pip install numpy==1.21.2 98 | python3.8 -m pip install git+git://github.com/deepmind/tree.git 99 | python3.8 -m pip install git+git://github.com/google/ml_collections.git 100 | python3.8 -m pip install git+git://github.com/deepmind/dm-haiku.git 101 | 102 | # Apply OpenMM patch. 103 | echo "Applying OpenMM patch..." 104 | (cd ${COLABFOLDDIR}/colabfold-conda/lib/python3.8/site-packages/ && patch -p0 < ${COLABFOLDDIR}/docker/openmm.patch) 105 | 106 | # Enable GPU-accelerated relaxation. 107 | # echo "Enable GPU-accelerated relaxation..." 108 | # (cd ${COLABFOLDDIR} && patch -u alphafold/relax/amber_minimize.py -i gpurelaxation.patch) 109 | 110 | echo "Downloading runner.py" 111 | (cd ${COLABFOLDDIR} && wget -q "https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/runner.py") 112 | 113 | echo "Installation of Alphafold2_advanced finished." 114 | -------------------------------------------------------------------------------- /v1.0.0/install_colabfold_intelmac.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | # check whether `wget` are installed 4 | type wget || { echo "wget command is not installed. Please install it at first using Homebrew." ; exit 1 ; } 5 | 6 | # check whether Apple Silicon (M1 mac) or Intel Mac 7 | arch_name="$(uname -m)" 8 | 9 | if [ "${arch_name}" = "x86_64" ]; then 10 | if [ "$(sysctl -in sysctl.proc_translated)" = "1" ]; then 11 | echo "Running on Rosetta 2" 12 | else 13 | echo "Running on native Intel" 14 | fi 15 | elif [ "${arch_name}" = "arm64" ]; then 16 | echo "Running on Apple Silicon (M1 mac)" 17 | echo "This installer is only for intel Mac. Use install_colabfold_M1mac.sh to install on this Mac." 
18 | exit 1 19 | else 20 | echo "Unknown architecture: ${arch_name}" 21 | exit 1 22 | fi 23 | 24 | GIT_REPO="https://github.com/deepmind/alphafold" 25 | SOURCE_URL="https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar" 26 | CURRENTPATH=`pwd` 27 | COLABFOLDDIR="${CURRENTPATH}/colabfold" 28 | PARAMS_DIR="${COLABFOLDDIR}/alphafold/data/params" 29 | MSATOOLS="${COLABFOLDDIR}/tools" 30 | 31 | # download the original alphafold as "${COLABFOLDDIR}" 32 | echo "downloading the original alphafold as ${COLABFOLDDIR}..." 33 | rm -rf ${COLABFOLDDIR} 34 | git clone ${GIT_REPO} ${COLABFOLDDIR} 35 | (cd ${COLABFOLDDIR}; git checkout 1d43aaff941c84dc56311076b58795797e49107b --quiet) 36 | 37 | # colabfold patches 38 | echo "Applying several patches to be Alphafold2_advanced..." 39 | cd ${COLABFOLDDIR} 40 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/colabfold.py 41 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/colabfold_alphafold.py 42 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/pairmsa.py 43 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/protein.patch 44 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/config.patch 45 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/model.patch 46 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/modules.patch 47 | # GPU relaxation patch 48 | # wget -qnc https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/gpurelaxation.patch -O gpurelaxation.patch 49 | 50 | # donwload reformat.pl from hh-suite 51 | wget -qnc https://raw.githubusercontent.com/soedinglab/hh-suite/master/scripts/reformat.pl 52 | # Apply multi-chain patch from Lim Heo @huhlim 53 | patch -u alphafold/common/protein.py -i protein.patch 54 | patch -u alphafold/model/model.py -i model.patch 55 | patch -u alphafold/model/modules.py -i modules.patch 56 | patch -u alphafold/model/config.py -i config.patch 57 | cd .. 58 | 59 | # Downloading parameter files 60 | echo "Downloading AlphaFold2 trained parameters..." 61 | mkdir -p ${PARAMS_DIR} 62 | curl -fL ${SOURCE_URL} | tar x -C ${PARAMS_DIR} 63 | 64 | # Downloading stereo_chemical_props.txt from https://git.scicore.unibas.ch/schwede/openstructure 65 | echo "Downloading stereo_chemical_props.txt..." 66 | wget -q https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt 67 | mkdir -p ${COLABFOLDDIR}/alphafold/common 68 | mv stereo_chemical_props.txt ${COLABFOLDDIR}/alphafold/common 69 | 70 | # Install Miniconda3 for Linux 71 | echo "Installing Miniconda3 for macOS..." 72 | cd ${COLABFOLDDIR} 73 | wget -q -P . https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh 74 | bash ./Miniconda3-latest-MacOSX-x86_64.sh -b -p ${COLABFOLDDIR}/conda 75 | rm Miniconda3-latest-MacOSX-x86_64.sh 76 | cd .. 77 | 78 | echo "Creating conda environments with python3.7 as ${COLABFOLDDIR}/colabfold-conda" 79 | . 
"${COLABFOLDDIR}/conda/etc/profile.d/conda.sh" 80 | export PATH="${COLABFOLDDIR}/conda/condabin:${PATH}" 81 | conda create -p $COLABFOLDDIR/colabfold-conda python=3.7 -y 82 | conda activate $COLABFOLDDIR/colabfold-conda 83 | conda update -y conda 84 | 85 | echo "Installing conda-forge packages" 86 | conda install -c conda-forge python=3.7 openmm==7.5.1 pdbfixer -y 87 | conda install -c bioconda hmmer==3.3.2 hhsuite==3.3.0 -y 88 | echo "Installing alphafold dependencies by pip" 89 | python3.7 -m pip install absl-py==0.13.0 biopython==1.79 chex==0.0.7 dm-haiku==0.0.4 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.2.14 jaxlib==0.1.69 ml-collections==0.1.0 numpy==1.19.5 scipy==1.7.0 tensorflow==2.5.0 90 | python3.7 -m pip install jupyter matplotlib py3Dmol tqdm 91 | 92 | # Apply OpenMM patch. 93 | echo "Applying OpenMM patch..." 94 | (cd ${COLABFOLDDIR}/colabfold-conda/lib/python3.7/site-packages/ && patch -p0 < ${COLABFOLDDIR}/docker/openmm.patch) 95 | 96 | # Enable GPU-accelerated relaxation. 97 | # echo "Enable GPU-accelerated relaxation..." 98 | # (cd ${COLABFOLDDIR} && patch -u alphafold/relax/amber_minimize.py -i gpurelaxation.patch) 99 | 100 | echo "Downloading runner.py" 101 | (cd ${COLABFOLDDIR} && wget -q "https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/runner.py") 102 | 103 | echo "Installation of Alphafold2_advanced finished." 104 | -------------------------------------------------------------------------------- /v1.0.0/install_colabfold_linux.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | # check whether `wget` and `curl` are installed 4 | type wget || { echo "wget command is not installed. Please install it at first using apt or yum." ; exit 1 ; } 5 | type curl || { echo "curl command is not installed. Please install it at first using apt or yum. " ; exit 1 ; } 6 | 7 | GIT_REPO="https://github.com/deepmind/alphafold" 8 | SOURCE_URL="https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar" 9 | CURRENTPATH=`pwd` 10 | COLABFOLDDIR="${CURRENTPATH}/colabfold" 11 | PARAMS_DIR="${COLABFOLDDIR}/alphafold/data/params" 12 | MSATOOLS="${COLABFOLDDIR}/tools" 13 | 14 | # download the original alphafold as "${COLABFOLDDIR}" 15 | echo "downloading the original alphafold as ${COLABFOLDDIR}..." 16 | rm -rf ${COLABFOLDDIR} 17 | git clone ${GIT_REPO} ${COLABFOLDDIR} 18 | (cd ${COLABFOLDDIR}; git checkout 1d43aaff941c84dc56311076b58795797e49107b --quiet) 19 | 20 | # colabfold patches 21 | echo "Applying several patches to be Alphafold2_advanced..." 
22 | cd ${COLABFOLDDIR} 23 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/colabfold.py 24 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/colabfold_alphafold.py 25 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/pairmsa.py 26 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/protein.patch 27 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/config.patch 28 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/model.patch 29 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/modules.patch 30 | # GPU relaxation patch 31 | wget -qnc https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/gpurelaxation.patch -O gpurelaxation.patch 32 | 33 | # donwload reformat.pl from hh-suite 34 | wget -qnc https://raw.githubusercontent.com/soedinglab/hh-suite/master/scripts/reformat.pl 35 | # Apply multi-chain patch from Lim Heo @huhlim 36 | patch -u alphafold/common/protein.py -i protein.patch 37 | patch -u alphafold/model/model.py -i model.patch 38 | patch -u alphafold/model/modules.py -i modules.patch 39 | patch -u alphafold/model/config.py -i config.patch 40 | cd .. 41 | 42 | # Downloading parameter files 43 | echo "Downloading AlphaFold2 trained parameters..." 44 | mkdir -p ${PARAMS_DIR} 45 | curl -fL ${SOURCE_URL} | tar x -C ${PARAMS_DIR} 46 | 47 | # Downloading stereo_chemical_props.txt from https://git.scicore.unibas.ch/schwede/openstructure 48 | echo "Downloading stereo_chemical_props.txt..." 49 | wget -q https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt --no-check-certificate 50 | mkdir -p ${COLABFOLDDIR}/alphafold/common 51 | mv stereo_chemical_props.txt ${COLABFOLDDIR}/alphafold/common 52 | 53 | # Install Miniconda3 for Linux 54 | echo "Installing Miniconda3 for Linux..." 55 | cd ${COLABFOLDDIR} 56 | wget -q -P . https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh 57 | bash ./Miniconda3-latest-Linux-x86_64.sh -b -p ${COLABFOLDDIR}/conda 58 | rm Miniconda3-latest-Linux-x86_64.sh 59 | cd .. 60 | 61 | echo "Creating conda environments with python3.7 as ${COLABFOLDDIR}/colabfold-conda" 62 | . "${COLABFOLDDIR}/conda/etc/profile.d/conda.sh" 63 | export PATH="${COLABFOLDDIR}/conda/condabin:${PATH}" 64 | conda create -p $COLABFOLDDIR/colabfold-conda python=3.7 -y 65 | conda activate $COLABFOLDDIR/colabfold-conda 66 | conda update -n base conda -y 67 | 68 | echo "Installing conda-forge packages" 69 | conda install -c conda-forge python=3.7 cudnn==8.2.1.32 cudatoolkit==11.1.1 openmm==7.5.1 pdbfixer -y 70 | conda install -c bioconda hmmer==3.3.2 hhsuite==3.3.0 -y 71 | echo "Installing alphafold dependencies by pip" 72 | python3.7 -m pip install absl-py==0.13.0 biopython==1.79 chex==0.0.7 dm-haiku==0.0.4 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.2.14 ml-collections==0.1.0 numpy==1.19.5 scipy==1.7.0 tensorflow-gpu==2.5.0 73 | python3.7 -m pip install jupyter matplotlib py3Dmol tqdm 74 | python3.7 -m pip install --upgrade jax jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html 75 | 76 | # Apply OpenMM patch. 77 | echo "Applying OpenMM patch..." 78 | (cd ${COLABFOLDDIR}/colabfold-conda/lib/python3.7/site-packages/ && patch -p0 < ${COLABFOLDDIR}/docker/openmm.patch) 79 | 80 | # Enable GPU-accelerated relaxation. 81 | echo "Enable GPU-accelerated relaxation..." 
82 | (cd ${COLABFOLDDIR} && patch -u alphafold/relax/amber_minimize.py -i gpurelaxation.patch) 83 | 84 | echo "Downloading runner.py..." 85 | (cd ${COLABFOLDDIR} && wget -q "https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/runner.py") 86 | (cd ${COLABFOLDDIR} && wget -q "https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/runner_af2advanced.py") 87 | 88 | echo "Making standalone command 'colabfold'..." 89 | cd ${COLABFOLDDIR} 90 | mkdir -p bin && cd bin 91 | cat << EOF > colabfold 92 | #!/bin/sh 93 | 94 | . "${COLABFOLDDIR}/conda/etc/profile.d/conda.sh" 95 | conda activate ${COLABFOLDDIR}/colabfold-conda 96 | export NVIDIA_VISIBLE_DEVICES="all" 97 | export TF_FORCE_UNIFIED_MEMORY="1" 98 | export XLA_PYTHON_CLIENT_MEM_FRACTION="4.0" 99 | export COLABFOLD_PATH="${COLABFOLDDIR}" 100 | python3.7 ${COLABFOLDDIR}/runner_af2advanced.py \$@ 101 | EOF 102 | chmod +x ./colabfold 103 | cd ${COLABFOLDDIR} 104 | wget -qnc https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/residue_constants.patch -O residue_constants.patch 105 | wget -qnc https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/colabfold_alphafold.patch -O colabfold_alphafold.patch 106 | patch -u alphafold/common/residue_constants.py -i residue_constants.patch 107 | patch -u colabfold_alphafold.py -i colabfold_alphafold.patch 108 | 109 | echo "Installation of Alphafold2_advanced finished." 110 | -------------------------------------------------------------------------------- /v1.0.0/residue_constants.patch: -------------------------------------------------------------------------------- 1 | --- residue_constants.py.orig 2021-10-24 11:30:58.275400080 +0900 2 | +++ residue_constants.py 2021-10-24 11:20:08.028085425 +0900 3 | @@ -20,6 +20,8 @@ from typing import List, Mapping, Tuple 4 | 5 | import numpy as np 6 | import tree 7 | +import os 8 | +colabfold_path = os.getenv('COLABFOLD_PATH', '.') 9 | 10 | # Internal import (35fd). 
11 | 12 | @@ -403,7 +405,7 @@ def load_stereo_chemical_props() -> Tupl 13 | residue_bond_angles: dict that maps resname --> list of BondAngle tuples 14 | """ 15 | stereo_chemical_props_path = ( 16 | - 'alphafold/common/stereo_chemical_props.txt') 17 | + colabfold_path + '/alphafold/common/stereo_chemical_props.txt') 18 | with open(stereo_chemical_props_path, 'rt') as f: 19 | stereo_chemical_props = f.read() 20 | lines_iter = iter(stereo_chemical_props.splitlines()) 21 | 22 | -------------------------------------------------------------------------------- /v1.0.0/runner.py: -------------------------------------------------------------------------------- 1 | #%% 2 | import os 3 | import tensorflow as tf 4 | tf.config.set_visible_devices([], 'GPU') 5 | 6 | import jax 7 | 8 | from IPython.utils import io 9 | import subprocess 10 | import tqdm.notebook 11 | 12 | # --- Python imports --- 13 | import colabfold as cf 14 | import pairmsa 15 | import sys 16 | import pickle 17 | 18 | from urllib import request 19 | from concurrent import futures 20 | import json 21 | from matplotlib import gridspec 22 | import matplotlib.pyplot as plt 23 | import numpy as np 24 | import py3Dmol 25 | 26 | from urllib import request 27 | from concurrent import futures 28 | import json 29 | from matplotlib import gridspec 30 | import matplotlib.pyplot as plt 31 | import numpy as np 32 | import py3Dmol 33 | 34 | from alphafold.model import model 35 | from alphafold.model import config 36 | from alphafold.model import data 37 | 38 | from alphafold.data import parsers 39 | from alphafold.data import pipeline 40 | from alphafold.data.tools import jackhmmer 41 | 42 | from alphafold.common import protein 43 | 44 | ### Check your OS for localcolabfold 45 | import platform 46 | pf = platform.system() 47 | if pf == 'Windows': 48 | print('ColabFold on Windows') 49 | elif pf == 'Darwin': 50 | print('ColabFold on Mac') 51 | device="cpu" 52 | elif pf == 'Linux': 53 | print('ColabFold on Linux') 54 | device="gpu" 55 | #%% 56 | 57 | def run_jackhmmer(sequence, prefix): 58 | 59 | fasta_path = f"{prefix}.fasta" 60 | with open(fasta_path, 'wt') as f: 61 | f.write(f'>query\n{sequence}') 62 | 63 | pickled_msa_path = f"{prefix}.jackhmmer.pickle" 64 | if os.path.isfile(pickled_msa_path): 65 | msas_dict = pickle.load(open(pickled_msa_path,"rb")) 66 | msas, deletion_matrices, names = (msas_dict[k] for k in ['msas', 'deletion_matrices', 'names']) 67 | full_msa = [] 68 | for msa in msas: 69 | full_msa += msa 70 | else: 71 | # --- Find the closest source --- 72 | test_url_pattern = 'https://storage.googleapis.com/alphafold-colab{:s}/latest/uniref90_2021_03.fasta.1' 73 | ex = futures.ThreadPoolExecutor(3) 74 | def fetch(source): 75 | request.urlretrieve(test_url_pattern.format(source)) 76 | return source 77 | fs = [ex.submit(fetch, source) for source in ['', '-europe', '-asia']] 78 | source = None 79 | for f in futures.as_completed(fs): 80 | source = f.result() 81 | ex.shutdown() 82 | break 83 | 84 | jackhmmer_binary_path = '/usr/bin/jackhmmer' 85 | dbs = [] 86 | 87 | num_jackhmmer_chunks = {'uniref90': 59, 'smallbfd': 17, 'mgnify': 71} 88 | total_jackhmmer_chunks = sum(num_jackhmmer_chunks.values()) 89 | with tqdm.notebook.tqdm(total=total_jackhmmer_chunks, bar_format=TQDM_BAR_FORMAT) as pbar: 90 | def jackhmmer_chunk_callback(i): 91 | pbar.update(n=1) 92 | 93 | pbar.set_description('Searching uniref90') 94 | jackhmmer_uniref90_runner = jackhmmer.Jackhmmer( 95 | binary_path=jackhmmer_binary_path, 96 | 
database_path=f'https://storage.googleapis.com/alphafold-colab{source}/latest/uniref90_2021_03.fasta', 97 | get_tblout=True, 98 | num_streamed_chunks=num_jackhmmer_chunks['uniref90'], 99 | streaming_callback=jackhmmer_chunk_callback, 100 | z_value=135301051) 101 | dbs.append(('uniref90', jackhmmer_uniref90_runner.query(fasta_path))) 102 | 103 | pbar.set_description('Searching smallbfd') 104 | jackhmmer_smallbfd_runner = jackhmmer.Jackhmmer( 105 | binary_path=jackhmmer_binary_path, 106 | database_path=f'https://storage.googleapis.com/alphafold-colab{source}/latest/bfd-first_non_consensus_sequences.fasta', 107 | get_tblout=True, 108 | num_streamed_chunks=num_jackhmmer_chunks['smallbfd'], 109 | streaming_callback=jackhmmer_chunk_callback, 110 | z_value=65984053) 111 | dbs.append(('smallbfd', jackhmmer_smallbfd_runner.query(fasta_path))) 112 | 113 | pbar.set_description('Searching mgnify') 114 | jackhmmer_mgnify_runner = jackhmmer.Jackhmmer( 115 | binary_path=jackhmmer_binary_path, 116 | database_path=f'https://storage.googleapis.com/alphafold-colab{source}/latest/mgy_clusters_2019_05.fasta', 117 | get_tblout=True, 118 | num_streamed_chunks=num_jackhmmer_chunks['mgnify'], 119 | streaming_callback=jackhmmer_chunk_callback, 120 | z_value=304820129) 121 | dbs.append(('mgnify', jackhmmer_mgnify_runner.query(fasta_path))) 122 | 123 | # --- Extract the MSAs and visualize --- 124 | # Extract the MSAs from the Stockholm files. 125 | # NB: deduplication happens later in pipeline.make_msa_features. 126 | 127 | mgnify_max_hits = 501 128 | msas = [] 129 | deletion_matrices = [] 130 | names = [] 131 | for db_name, db_results in dbs: 132 | unsorted_results = [] 133 | for i, result in enumerate(db_results): 134 | msa, deletion_matrix, target_names = parsers.parse_stockholm(result['sto']) 135 | e_values_dict = parsers.parse_e_values_from_tblout(result['tbl']) 136 | e_values = [e_values_dict[t.split('/')[0]] for t in target_names] 137 | zipped_results = zip(msa, deletion_matrix, target_names, e_values) 138 | if i != 0: 139 | # Only take query from the first chunk 140 | zipped_results = [x for x in zipped_results if x[2] != 'query'] 141 | unsorted_results.extend(zipped_results) 142 | sorted_by_evalue = sorted(unsorted_results, key=lambda x: x[3]) 143 | db_msas, db_deletion_matrices, db_names, _ = zip(*sorted_by_evalue) 144 | if db_msas: 145 | if db_name == 'mgnify': 146 | db_msas = db_msas[:mgnify_max_hits] 147 | db_deletion_matrices = db_deletion_matrices[:mgnify_max_hits] 148 | db_names = db_names[:mgnify_max_hits] 149 | msas.append(db_msas) 150 | deletion_matrices.append(db_deletion_matrices) 151 | names.append(db_names) 152 | msa_size = len(set(db_msas)) 153 | print(f'{msa_size} Sequences Found in {db_name}') 154 | 155 | pickle.dump({"msas":msas, 156 | "deletion_matrices":deletion_matrices, 157 | "names":names}, open(pickled_msa_path,"wb")) 158 | return msas, deletion_matrices, names 159 | 160 | import re 161 | 162 | # define sequence 163 | sequence = 'PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK' #@param {type:"string"} 164 | sequence = re.sub("[^A-Z:/]", "", sequence.upper()) 165 | sequence = re.sub(":+",":",sequence) 166 | sequence = re.sub("/+","/",sequence) 167 | sequence = re.sub("^[:/]+","",sequence) 168 | sequence = re.sub("[:/]+$","",sequence) 169 | 170 | jobname = "test" #@param {type:"string"} 171 | jobname = re.sub(r'\W+', '', jobname) 172 | 173 | # define number of copies 174 | homooligomer = "1" #@param {type:"string"} 175 | homooligomer = re.sub("[:/]+",":",homooligomer) 176 | 
homooligomer = re.sub("^[:/]+","",homooligomer) 177 | homooligomer = re.sub("[:/]+$","",homooligomer) 178 | 179 | if len(homooligomer) == 0: homooligomer = "1" 180 | homooligomer = re.sub("[^0-9:]", "", homooligomer) 181 | homooligomers = [int(h) for h in homooligomer.split(":")] 182 | 183 | #@markdown - `sequence` Specify protein sequence to be modelled. 184 | #@markdown - Use `/` to specify intra-protein chainbreaks (for trimming regions within protein). 185 | #@markdown - Use `:` to specify inter-protein chainbreaks (for modeling protein-protein hetero-complexes). 186 | #@markdown - For example, sequence `AC/DE:FGH` will be modelled as polypeptides: `AC`, `DE` and `FGH`. A separate MSA will be generates for `ACDE` and `FGH`. 187 | #@markdown If `pair_msa` is enabled, `ACDE`'s MSA will be paired with `FGH`'s MSA. 188 | #@markdown - `homooligomer` Define number of copies in a homo-oligomeric assembly. 189 | #@markdown - Use `:` to specify different homooligomeric state (copy numer) for each component of the complex. 190 | #@markdown - For example, **sequence:**`ABC:DEF`, **homooligomer:** `2:1`, the first protein `ABC` will be modeled as a homodimer (2 copies) and second `DEF` a monomer (1 copy). 191 | 192 | ori_sequence = sequence 193 | sequence = sequence.replace("/","").replace(":","") 194 | seqs = ori_sequence.replace("/","").split(":") 195 | 196 | if len(seqs) != len(homooligomers): 197 | if len(homooligomers) == 1: 198 | homooligomers = [homooligomers[0]] * len(seqs) 199 | homooligomer = ":".join([str(h) for h in homooligomers]) 200 | else: 201 | while len(seqs) > len(homooligomers): 202 | homooligomers.append(1) 203 | homooligomers = homooligomers[:len(seqs)] 204 | homooligomer = ":".join([str(h) for h in homooligomers]) 205 | print("WARNING: Mismatch between number of breaks ':' in 'sequence' and 'homooligomer' definition") 206 | 207 | full_sequence = "".join([s*h for s,h in zip(seqs,homooligomers)]) 208 | 209 | # prediction directory 210 | output_dir = 'prediction_' + jobname + '_' + cf.get_hash(full_sequence)[:5] 211 | os.makedirs(output_dir, exist_ok=True) 212 | # delete existing files in working directory 213 | for f in os.listdir(output_dir): 214 | os.remove(os.path.join(output_dir, f)) 215 | 216 | MIN_SEQUENCE_LENGTH = 16 217 | MAX_SEQUENCE_LENGTH = 2500 218 | 219 | aatypes = set('ACDEFGHIKLMNPQRSTVWY') # 20 standard aatypes 220 | if not set(full_sequence).issubset(aatypes): 221 | raise Exception(f'Input sequence contains non-amino acid letters: {set(sequence) - aatypes}. AlphaFold only supports 20 standard amino acids as inputs.') 222 | if len(full_sequence) < MIN_SEQUENCE_LENGTH: 223 | raise Exception(f'Input sequence is too short: {len(full_sequence)} amino acids, while the minimum is {MIN_SEQUENCE_LENGTH}') 224 | if len(full_sequence) > MAX_SEQUENCE_LENGTH: 225 | raise Exception(f'Input sequence is too long: {len(full_sequence)} amino acids, while the maximum is {MAX_SEQUENCE_LENGTH}. Please use the full AlphaFold system for long sequences.') 226 | 227 | if len(full_sequence) > 1400: 228 | print(f"WARNING: For a typical Google-Colab-GPU (16G) session, the max total length is ~1400 residues. You are at {len(full_sequence)}! 
Run Alphafold may crash.") 229 | 230 | print(f"homooligomer: '{homooligomer}'") 231 | print(f"total_length: '{len(full_sequence)}'") 232 | print(f"working_directory: '{output_dir}'") 233 | #%% 234 | TQDM_BAR_FORMAT = '{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]' 235 | #@markdown Once this cell has been executed, you will see 236 | #@markdown statistics about the multiple sequence alignment 237 | #@markdown (MSA) that will be used by AlphaFold. In particular, 238 | #@markdown you’ll see how well each residue is covered by similar 239 | #@markdown sequences in the MSA. 240 | #@markdown (Note that the search against databases and the actual prediction can take some time, from minutes to hours, depending on the length of the protein and what type of GPU you are allocated by Colab.) 241 | 242 | #@markdown --- 243 | msa_method = "mmseqs2" #@param ["mmseqs2","jackhmmer","single_sequence","precomputed"] 244 | #@markdown - `mmseqs2` - FAST method from [ColabFold](https://github.com/sokrypton/ColabFold) 245 | #@markdown - `jackhmmer` - default method from Deepmind (SLOW, but may find more/less sequences). 246 | #@markdown - `single_sequence` - use single sequence input 247 | #@markdown - `precomputed` If you have previously run this notebook and saved the results, 248 | #@markdown you can skip this step by uploading 249 | #@markdown the previously generated `prediction_?????/msa.pickle` 250 | 251 | 252 | #@markdown --- 253 | #@markdown **custom msa options** 254 | add_custom_msa = False #@param {type:"boolean"} 255 | msa_format = "fas" #@param ["fas","a2m","a3m","sto","psi","clu"] 256 | #@markdown - `add_custom_msa` - If enabled, you'll get an option to upload your custom MSA in the specified `msa_format`. Note: Your MSA will be supplemented with those from 'mmseqs2' or 'jackhmmer', unless `msa_method` is set to 'single_sequence'. 257 | 258 | #@markdown --- 259 | #@markdown **pair msa options** 260 | 261 | #@markdown Experimental option for protein complexes. Pairing currently only supported for proteins in same operon (prokaryotic genomes). 262 | pair_mode = "unpaired" #@param ["unpaired","unpaired+paired","paired"] {type:"string"} 263 | #@markdown - `unpaired` - generate separate MSA for each protein. 264 | #@markdown - `unpaired+paired` - attempt to pair sequences from the same operon within the genome. 265 | #@markdown - `paired` - only use sequences that were successfully paired. 266 | 267 | #@markdown Options to prefilter each MSA before pairing. (It might help if there are any paralogs in the complex.) 268 | pair_cov = 50 #@param [0,25,50,75,90] {type:"raw"} 269 | pair_qid = 20 #@param [0,15,20,30,40,50] {type:"raw"} 270 | #@markdown - `pair_cov` prefilter each MSA to minimum coverage with query (%) before pairing. 271 | #@markdown - `pair_qid` prefilter each MSA to minimum sequence identity with query (%) before pairing. 
272 | 273 | # --- Search against genetic databases --- 274 | os.makedirs('tmp', exist_ok=True) 275 | msas, deletion_matrices = [],[] 276 | 277 | if add_custom_msa: 278 | print(f"upload custom msa in '{msa_format}' format") 279 | msa_dict = files.upload() 280 | lines = msa_dict[list(msa_dict.keys())[0]].decode() 281 | 282 | # convert to a3m 283 | with open(f"tmp/upload.{msa_format}","w") as tmp_upload: 284 | tmp_upload.write(lines) 285 | os.system(f"reformat.pl {msa_format} a3m tmp/upload.{msa_format} tmp/upload.a3m") 286 | a3m_lines = open("tmp/upload.a3m","r").read() 287 | 288 | # parse 289 | msa, mtx = parsers.parse_a3m(a3m_lines) 290 | msas.append(msa) 291 | deletion_matrices.append(mtx) 292 | 293 | if len(msas[0][0]) != len(sequence): 294 | raise ValueError("ERROR: the length of msa does not match input sequence") 295 | 296 | if msa_method == "precomputed": 297 | print("upload precomputed pickled msa from previous run") 298 | pickled_msa_dict = files.upload() 299 | msas_dict = pickle.loads(pickled_msa_dict[list(pickled_msa_dict.keys())[0]]) 300 | msas, deletion_matrices = (msas_dict[k] for k in ['msas', 'deletion_matrices']) 301 | 302 | elif msa_method == "single_sequence": 303 | if len(msas) == 0: 304 | msas.append([sequence]) 305 | deletion_matrices.append([[0]*len(sequence)]) 306 | 307 | else: 308 | seqs = ori_sequence.replace('/','').split(':') 309 | _blank_seq = ["-" * len(seq) for seq in seqs] 310 | _blank_mtx = [[0] * len(seq) for seq in seqs] 311 | def _pad(ns,vals,mode): 312 | if mode == "seq": _blank = _blank_seq.copy() 313 | if mode == "mtx": _blank = _blank_mtx.copy() 314 | if isinstance(ns, list): 315 | for n,val in zip(ns,vals): _blank[n] = val 316 | else: _blank[ns] = vals 317 | if mode == "seq": return "".join(_blank) 318 | if mode == "mtx": return sum(_blank,[]) 319 | 320 | if len(seqs) == 1 or "unpaired" in pair_mode: 321 | # gather msas 322 | if msa_method == "mmseqs2": 323 | prefix = cf.get_hash("".join(seqs)) 324 | prefix = os.path.join('tmp',prefix) 325 | print(f"running mmseqs2") 326 | A3M_LINES = cf.run_mmseqs2(seqs, prefix, filter=True) 327 | 328 | for n, seq in enumerate(seqs): 329 | # tmp directory 330 | prefix = cf.get_hash(seq) 331 | prefix = os.path.join('tmp',prefix) 332 | 333 | if msa_method == "mmseqs2": 334 | # run mmseqs2 335 | a3m_lines = A3M_LINES[n] 336 | msa, mtx = parsers.parse_a3m(a3m_lines) 337 | msas_, mtxs_ = [msa],[mtx] 338 | 339 | elif msa_method == "jackhmmer": 340 | print(f"running jackhmmer on seq_{n}") 341 | # run jackhmmer 342 | msas_, mtxs_, names_ = ([sum(x,())] for x in run_jackhmmer(seq, prefix)) 343 | 344 | # pad sequences 345 | for msa_,mtx_ in zip(msas_,mtxs_): 346 | msa,mtx = [sequence],[[0]*len(sequence)] 347 | for s,m in zip(msa_,mtx_): 348 | msa.append(_pad(n,s,"seq")) 349 | mtx.append(_pad(n,m,"mtx")) 350 | 351 | msas.append(msa) 352 | deletion_matrices.append(mtx) 353 | 354 | #################################################################################### 355 | # PAIR_MSA 356 | #################################################################################### 357 | 358 | if len(seqs) > 1 and (pair_mode == "paired" or pair_mode == "unpaired+paired"): 359 | print("attempting to pair some sequences...") 360 | 361 | if msa_method == "mmseqs2": 362 | prefix = cf.get_hash("".join(seqs)) 363 | prefix = os.path.join('tmp',prefix) 364 | print(f"running mmseqs2_noenv_nofilter on all seqs") 365 | A3M_LINES = cf.run_mmseqs2(seqs, prefix, use_env=False, use_filter=False) 366 | 367 | _data = [] 368 | for a in range(len(seqs)): 369 
| print(f"prepping seq_{a}") 370 | _seq = seqs[a] 371 | _prefix = os.path.join('tmp',cf.get_hash(_seq)) 372 | 373 | if msa_method == "mmseqs2": 374 | a3m_lines = A3M_LINES[a] 375 | _msa, _mtx, _lab = pairmsa.parse_a3m(a3m_lines, 376 | filter_qid=pair_qid/100, 377 | filter_cov=pair_cov/100) 378 | 379 | elif msa_method == "jackhmmer": 380 | _msas, _mtxs, _names = run_jackhmmer(_seq, _prefix) 381 | _msa, _mtx, _lab = pairmsa.get_uni_jackhmmer(_msas[0], _mtxs[0], _names[0], 382 | filter_qid=pair_qid/100, 383 | filter_cov=pair_cov/100) 384 | 385 | if len(_msa) > 1: 386 | _data.append(pairmsa.hash_it(_msa, _lab, _mtx, call_uniprot=False)) 387 | else: 388 | _data.append(None) 389 | 390 | Ln = len(seqs) 391 | O = [[None for _ in seqs] for _ in seqs] 392 | for a in range(Ln): 393 | if _data[a] is not None: 394 | for b in range(a+1,Ln): 395 | if _data[b] is not None: 396 | print(f"attempting pairwise stitch for {a} {b}") 397 | O[a][b] = pairmsa._stitch(_data[a],_data[b]) 398 | _seq_a, _seq_b, _mtx_a, _mtx_b = (*O[a][b]["seq"],*O[a][b]["mtx"]) 399 | 400 | ############################################## 401 | # filter to remove redundant sequences 402 | ############################################## 403 | ok = [] 404 | with open("tmp/tmp.fas","w") as fas_file: 405 | fas_file.writelines([f">{n}\n{a+b}\n" for n,(a,b) in enumerate(zip(_seq_a,_seq_b))]) 406 | os.system("hhfilter -maxseq 1000000 -i tmp/tmp.fas -o tmp/tmp.id90.fas -id 90") 407 | for line in open("tmp/tmp.id90.fas","r"): 408 | if line.startswith(">"): ok.append(int(line[1:])) 409 | ############################################## 410 | print(f"found {len(_seq_a)} pairs ({len(ok)} after filtering)") 411 | 412 | if len(_seq_a) > 0: 413 | msa,mtx = [sequence],[[0]*len(sequence)] 414 | for s_a,s_b,m_a,m_b in zip(_seq_a, _seq_b, _mtx_a, _mtx_b): 415 | msa.append(_pad([a,b],[s_a,s_b],"seq")) 416 | mtx.append(_pad([a,b],[m_a,m_b],"mtx")) 417 | msas.append(msa) 418 | deletion_matrices.append(mtx) 419 | 420 | ''' 421 | # triwise stitching (WIP) 422 | if Ln > 2: 423 | for a in range(Ln): 424 | for b in range(a+1,Ln): 425 | for c in range(b+1,Ln): 426 | if O[a][b] is not None and O[b][c] is not None: 427 | print(f"attempting triwise stitch for {a} {b} {c}") 428 | list_ab = O[a][b]["lab"][1] 429 | list_bc = O[b][c]["lab"][0] 430 | msa,mtx = [sequence],[[0]*len(sequence)] 431 | for i,l_b in enumerate(list_ab): 432 | if l_b in list_bc: 433 | j = list_bc.index(l_b) 434 | s_a = O[a][b]["seq"][0][i] 435 | s_b = O[a][b]["seq"][1][i] 436 | s_c = O[b][c]["seq"][1][j] 437 | 438 | m_a = O[a][b]["mtx"][0][i] 439 | m_b = O[a][b]["mtx"][1][i] 440 | m_c = O[b][c]["mtx"][1][j] 441 | 442 | msa.append(_pad([a,b,c],[s_a,s_b,s_c],"seq")) 443 | mtx.append(_pad([a,b,c],[m_a,m_b,m_c],"mtx")) 444 | if len(msa) > 1: 445 | msas.append(msa) 446 | deletion_matrices.append(mtx) 447 | print(f"found {len(msa)} triplets") 448 | ''' 449 | #################################################################################### 450 | #################################################################################### 451 | 452 | # save MSA as pickle 453 | pickle.dump({"msas":msas,"deletion_matrices":deletion_matrices}, 454 | open(os.path.join(output_dir,"msa.pickle"),"wb")) 455 | 456 | make_msa_plot = len(msas[0]) > 1 457 | if make_msa_plot: 458 | plt = cf.plot_msas(msas, ori_sequence) 459 | plt.savefig(os.path.join(output_dir,"msa_coverage.png"), bbox_inches = 'tight', dpi=300) 460 | #%% 461 | #@title run alphafold 462 | num_relax = "None" 463 | rank_by = "pLDDT" #@param ["pLDDT","pTMscore"] 
464 | use_turbo = True #@param {type:"boolean"} 465 | max_msa = "512:1024" #@param ["512:1024", "256:512", "128:256", "64:128", "32:64"] 466 | max_msa_clusters, max_extra_msa = [int(x) for x in max_msa.split(":")] 467 | 468 | 469 | 470 | #@markdown - `rank_by` specify metric to use for ranking models (For protein-protein complexes, we recommend pTMscore) 471 | #@markdown - `use_turbo` introduces a few modifications (compile once, swap params, adjust max_msa) to speedup and reduce memory requirements. Disable for default behavior. 472 | #@markdown - `max_msa` defines: `max_msa_clusters:max_extra_msa` number of sequences to use. When adjusting after GPU crash, be sure to `Runtime` → `Restart runtime`. (Lowering will reduce GPU requirements, but may result in poor model quality. This option ignored if `use_turbo` is disabled) 473 | show_images = True #@param {type:"boolean"} 474 | #@markdown - `show_images` To make things more exciting we show images of the predicted structures as they are being generated. (WARNING: the order of images displayed does not reflect any ranking). 475 | #@markdown --- 476 | #@markdown #### Sampling options 477 | #@markdown There are two stochastic parts of the pipeline. Within the feature generation (choice of cluster centers) and within the model (dropout). 478 | #@markdown To get structure diversity, you can iterate through a fixed number of random_seeds (using `num_samples`) and/or enable dropout (using `is_training`). 479 | 480 | num_models = 5 #@param [1,2,3,4,5] {type:"raw"} 481 | use_ptm = True #@param {type:"boolean"} 482 | num_ensemble = 1 #@param [1,8] {type:"raw"} 483 | max_recycles = 3 #@param [1,3,6,12,24,48] {type:"raw"} 484 | tol = 0 #@param [0,0.1,0.5,1] {type:"raw"} 485 | is_training = False #@param {type:"boolean"} 486 | num_samples = 1 #@param [1,2,4,8,16,32] {type:"raw"} 487 | 488 | subsample_msa = True #@param {type:"boolean"} 489 | #@markdown - `subsample_msa` subsample large MSA to `3E7/length` sequences to avoid crashing the preprocessing protocol. (This option ignored if `use_turbo` is disabled.) 
490 | 491 | save_pae_json = True 492 | save_tmp_pdb = True 493 | 494 | 495 | if not use_ptm and rank_by == "pTMscore": 496 | print("WARNING: models will be ranked by pLDDT, 'use_ptm' is needed to compute pTMscore") 497 | rank_by = "pLDDT" 498 | 499 | ############################# 500 | # delete old files 501 | ############################# 502 | for f in os.listdir(output_dir): 503 | if "rank_" in f: 504 | os.remove(os.path.join(output_dir, f)) 505 | 506 | ############################# 507 | # homooligomerize 508 | ############################# 509 | lengths = [len(seq) for seq in seqs] 510 | msas_mod, deletion_matrices_mod = cf.homooligomerize_heterooligomer(msas, deletion_matrices, 511 | lengths, homooligomers) 512 | ############################# 513 | # define input features 514 | ############################# 515 | def _placeholder_template_feats(num_templates_, num_res_): 516 | return { 517 | 'template_aatype': np.zeros([num_templates_, num_res_, 22], np.float32), 518 | 'template_all_atom_masks': np.zeros([num_templates_, num_res_, 37], np.float32), # mask is [T, L, 37] 519 | 'template_all_atom_positions': np.zeros([num_templates_, num_res_, 37, 3], np.float32), # positions are [T, L, 37, 3] 520 | 'template_domain_names': np.zeros([num_templates_], np.float32), 521 | 'template_sum_probs': np.zeros([num_templates_], np.float32), 522 | } 523 | 524 | num_res = len(full_sequence) 525 | feature_dict = {} 526 | feature_dict.update(pipeline.make_sequence_features(full_sequence, 'test', num_res)) 527 | feature_dict.update(pipeline.make_msa_features(msas_mod, deletion_matrices=deletion_matrices_mod)) 528 | if not use_turbo: 529 | feature_dict.update(_placeholder_template_feats(0, num_res)) 530 | 531 | def do_subsample_msa(F, random_seed=0): 532 | '''subsample msa to avoid running out of memory''' 533 | N = len(F["msa"]) 534 | L = len(F["residue_index"]) 535 | N_ = int(3E7/L) 536 | if N > N_: 537 | print(f"whhhaaa... 
too many sequences ({N}) subsampling to {N_}") 538 | np.random.seed(random_seed) 539 | idx = np.append(0,np.random.permutation(np.arange(1,N)))[:N_] 540 | F_ = {} 541 | F_["msa"] = F["msa"][idx] 542 | F_["deletion_matrix_int"] = F["deletion_matrix_int"][idx] 543 | F_["num_alignments"] = np.full_like(F["num_alignments"],N_) 544 | for k in ['aatype', 'between_segment_residues', 545 | 'domain_name', 'residue_index', 546 | 'seq_length', 'sequence']: 547 | F_[k] = F[k] 548 | return F_ 549 | else: 550 | return F 551 | 552 | ################################ 553 | # set chain breaks 554 | ################################ 555 | Ls = [] 556 | for seq,h in zip(ori_sequence.split(":"),homooligomers): 557 | Ls += [len(s) for s in seq.split("/")] * h 558 | Ls_plot = sum([[len(seq)]*h for seq,h in zip(seqs,homooligomers)],[]) 559 | feature_dict['residue_index'] = cf.chain_break(feature_dict['residue_index'], Ls) 560 | 561 | ########################### 562 | # run alphafold 563 | ########################### 564 | def parse_results(prediction_result, processed_feature_dict): 565 | b_factors = prediction_result['plddt'][:,None] * prediction_result['structure_module']['final_atom_mask'] 566 | dist_bins = jax.numpy.append(0,prediction_result["distogram"]["bin_edges"]) 567 | dist_mtx = dist_bins[prediction_result["distogram"]["logits"].argmax(-1)] 568 | contact_mtx = jax.nn.softmax(prediction_result["distogram"]["logits"])[:,:,dist_bins < 8].sum(-1) 569 | 570 | out = {"unrelaxed_protein": protein.from_prediction(processed_feature_dict, prediction_result, b_factors=b_factors), 571 | "plddt": prediction_result['plddt'], 572 | "pLDDT": prediction_result['plddt'].mean(), 573 | "dists": dist_mtx, 574 | "adj": contact_mtx} 575 | 576 | if "ptm" in prediction_result: 577 | out.update({"pae": prediction_result['predicted_aligned_error'], 578 | "pTMscore": prediction_result['ptm']}) 579 | return out 580 | 581 | model_names = ['model_1', 'model_2', 'model_3', 'model_4', 'model_5'][:num_models] 582 | total = len(model_names) * num_samples 583 | with tqdm.notebook.tqdm(total=total, bar_format=TQDM_BAR_FORMAT) as pbar: 584 | ####################################################################### 585 | # precompile model and recompile only if length changes 586 | ####################################################################### 587 | if use_turbo: 588 | name = "model_5_ptm" if use_ptm else "model_5" 589 | N = len(feature_dict["msa"]) 590 | L = len(feature_dict["residue_index"]) 591 | compiled = (N, L, use_ptm, max_recycles, tol, num_ensemble, max_msa, is_training) 592 | if "COMPILED" in dir(): 593 | if COMPILED != compiled: recompile = True 594 | else: recompile = True 595 | if recompile: 596 | cf.clear_mem(device) 597 | cfg = config.model_config(name) 598 | 599 | # set size of msa (to reduce memory requirements) 600 | msa_clusters = min(N, max_msa_clusters) 601 | cfg.data.eval.max_msa_clusters = msa_clusters 602 | cfg.data.common.max_extra_msa = max(min(N-msa_clusters,max_extra_msa),1) 603 | 604 | cfg.data.common.num_recycle = max_recycles 605 | cfg.model.num_recycle = max_recycles 606 | cfg.model.recycle_tol = tol 607 | cfg.data.eval.num_ensemble = num_ensemble 608 | 609 | params = data.get_model_haiku_params(name,'./alphafold/data') 610 | model_runner = model.RunModel(cfg, params, is_training=is_training) 611 | COMPILED = compiled 612 | recompile = False 613 | 614 | else: 615 | cf.clear_mem(device) 616 | recompile = True 617 | 618 | # cleanup 619 | if "outs" in dir(): del outs 620 | outs = {} 621 | 
cf.clear_mem("cpu") 622 | 623 | ####################################################################### 624 | def report(key): 625 | pbar.update(n=1) 626 | o = outs[key] 627 | line = f"{key} recycles:{o['recycles']} tol:{o['tol']:.2f} pLDDT:{o['pLDDT']:.2f}" 628 | if use_ptm: line += f" pTMscore:{o['pTMscore']:.2f}" 629 | print(line) 630 | if show_images: 631 | fig = cf.plot_protein(o['unrelaxed_protein'], Ls=Ls_plot, dpi=100) 632 | # plt.show() 633 | plt.ion() 634 | if save_tmp_pdb: 635 | tmp_pdb_path = os.path.join(output_dir,f'unranked_{key}_unrelaxed.pdb') 636 | pdb_lines = protein.to_pdb(o['unrelaxed_protein']) 637 | with open(tmp_pdb_path, 'w') as f: f.write(pdb_lines) 638 | 639 | if use_turbo: 640 | # go through each random_seed 641 | for seed in range(num_samples): 642 | 643 | # prep input features 644 | if subsample_msa: 645 | sampled_feats_dict = do_subsample_msa(feature_dict, random_seed=seed) 646 | processed_feature_dict = model_runner.process_features(sampled_feats_dict, random_seed=seed) 647 | else: 648 | processed_feature_dict = model_runner.process_features(feature_dict, random_seed=seed) 649 | 650 | # go through each model 651 | for num, model_name in enumerate(model_names): 652 | name = model_name+"_ptm" if use_ptm else model_name 653 | key = f"{name}_seed_{seed}" 654 | pbar.set_description(f'Running {key}') 655 | 656 | # replace model parameters 657 | params = data.get_model_haiku_params(name, './alphafold/data') 658 | for k in model_runner.params.keys(): 659 | model_runner.params[k] = params[k] 660 | 661 | # predict 662 | prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed),"cpu") 663 | 664 | # save results 665 | outs[key] = parse_results(prediction_result, processed_feature_dict) 666 | outs[key].update({"recycles":r, "tol":t}) 667 | report(key) 668 | 669 | del prediction_result, params 670 | del sampled_feats_dict, processed_feature_dict 671 | 672 | else: 673 | # go through each model 674 | for num, model_name in enumerate(model_names): 675 | name = model_name+"_ptm" if use_ptm else model_name 676 | params = data.get_model_haiku_params(name, './alphafold/data') 677 | cfg = config.model_config(name) 678 | cfg.data.common.num_recycle = cfg.model.num_recycle = max_recycles 679 | cfg.model.recycle_tol = tol 680 | cfg.data.eval.num_ensemble = num_ensemble 681 | model_runner = model.RunModel(cfg, params, is_training=is_training) 682 | 683 | # go through each random_seed 684 | for seed in range(num_samples): 685 | key = f"{name}_seed_{seed}" 686 | pbar.set_description(f'Running {key}') 687 | processed_feature_dict = model_runner.process_features(feature_dict, random_seed=seed) 688 | prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed),"cpu") 689 | outs[key] = parse_results(prediction_result, processed_feature_dict) 690 | outs[key].update({"recycles":r, "tol":t}) 691 | report(key) 692 | 693 | # cleanup 694 | del processed_feature_dict, prediction_result 695 | 696 | del params, model_runner, cfg 697 | cf.clear_mem("gpu") 698 | 699 | # delete old files 700 | for f in os.listdir(output_dir): 701 | if "rank" in f: 702 | os.remove(os.path.join(output_dir, f)) 703 | 704 | # Find the best model according to the mean pLDDT. 
705 | model_rank = list(outs.keys()) 706 | model_rank = [model_rank[i] for i in np.argsort([outs[x][rank_by] for x in model_rank])[::-1]] 707 | 708 | # Write out the prediction 709 | for n,key in enumerate(model_rank): 710 | prefix = f"rank_{n+1}_{key}" 711 | pred_output_path = os.path.join(output_dir,f'{prefix}_unrelaxed.pdb') 712 | fig = cf.plot_protein(outs[key]["unrelaxed_protein"], Ls=Ls_plot, dpi=200) 713 | plt.savefig(os.path.join(output_dir,f'{prefix}.png'), bbox_inches = 'tight') 714 | plt.close(fig) 715 | 716 | pdb_lines = protein.to_pdb(outs[key]["unrelaxed_protein"]) 717 | with open(pred_output_path, 'w') as f: 718 | f.write(pdb_lines) 719 | 720 | ############################################################ 721 | print(f"model rank based on {rank_by}") 722 | for n,key in enumerate(model_rank): 723 | print(f"rank_{n+1}_{key} {rank_by}:{outs[key][rank_by]:.2f}") 724 | #%% 725 | #@title Refine structures with Amber-Relax (Optional) 726 | num_relax = "None" #@param ["None", "Top1", "Top5", "All"] {type:"string"} 727 | if num_relax == "None": 728 | num_relax = 0 729 | elif num_relax == "Top1": 730 | num_relax = 1 731 | elif num_relax == "Top5": 732 | num_relax = 5 733 | else: 734 | num_relax = len(model_names) * num_samples 735 | 736 | if num_relax > 0: 737 | if "relax" not in dir(): 738 | # add conda environment to path 739 | sys.path.append('./colabfold-conda/lib/python3.7/site-packages') 740 | 741 | # import libraries 742 | from alphafold.relax import relax 743 | from alphafold.relax import utils 744 | 745 | with tqdm.notebook.tqdm(total=num_relax, bar_format=TQDM_BAR_FORMAT) as pbar: 746 | pbar.set_description(f'AMBER relaxation') 747 | for n,key in enumerate(model_rank): 748 | if n < num_relax: 749 | prefix = f"rank_{n+1}_{key}" 750 | pred_output_path = os.path.join(output_dir,f'{prefix}_relaxed.pdb') 751 | if not os.path.isfile(pred_output_path): 752 | amber_relaxer = relax.AmberRelaxation( 753 | max_iterations=0, 754 | tolerance=2.39, 755 | stiffness=10.0, 756 | exclude_residues=[], 757 | max_outer_iterations=20) 758 | relaxed_pdb_lines, _, _ = amber_relaxer.process(prot=outs[key]["unrelaxed_protein"]) 759 | with open(pred_output_path, 'w') as f: 760 | f.write(relaxed_pdb_lines) 761 | pbar.update(n=1) 762 | #%% 763 | #@title Display 3D structure {run: "auto"} 764 | rank_num = 1 #@param ["1", "2", "3", "4", "5"] {type:"raw"} 765 | color = "lDDT" #@param ["chain", "lDDT", "rainbow"] 766 | show_sidechains = False #@param {type:"boolean"} 767 | show_mainchains = False #@param {type:"boolean"} 768 | 769 | key = model_rank[rank_num-1] 770 | prefix = f"rank_{rank_num}_{key}" 771 | pred_output_path = os.path.join(output_dir,f'{prefix}_relaxed.pdb') 772 | if not os.path.isfile(pred_output_path): 773 | pred_output_path = os.path.join(output_dir,f'{prefix}_unrelaxed.pdb') 774 | 775 | cf.show_pdb(pred_output_path, show_sidechains, show_mainchains, color, Ls=Ls_plot).show() 776 | if color == "lDDT": cf.plot_plddt_legend().show() 777 | if use_ptm: 778 | cf.plot_confidence(outs[key]["plddt"], outs[key]["pae"], Ls=Ls_plot).show() 779 | else: 780 | cf.plot_confidence(outs[key]["plddt"], Ls=Ls_plot).show() 781 | #%% 782 | #@title Extra outputs 783 | dpi = 300#@param {type:"integer"} 784 | save_to_txt = True #@param {type:"boolean"} 785 | save_pae_json = True #@param {type:"boolean"} 786 | #@markdown - save data used to generate contact and distogram plots below to text file (pae values can be found in json file if `use_ptm` is enabled) 787 | 788 | if use_ptm: 789 | print("predicted alignment 
error") 790 | cf.plot_paes([outs[k]["pae"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 791 | plt.savefig(os.path.join(output_dir,f'predicted_alignment_error.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 792 | # plt.show() 793 | 794 | print("predicted contacts") 795 | cf.plot_adjs([outs[k]["adj"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 796 | plt.savefig(os.path.join(output_dir,f'predicted_contacts.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 797 | # plt.show() 798 | 799 | print("predicted distogram") 800 | cf.plot_dists([outs[k]["dists"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 801 | plt.savefig(os.path.join(output_dir,f'predicted_distogram.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 802 | # plt.show() 803 | 804 | print("predicted LDDT") 805 | cf.plot_plddts([outs[k]["plddt"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 806 | plt.savefig(os.path.join(output_dir,f'predicted_LDDT.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 807 | # plt.show() 808 | 809 | def do_save_to_txt(filename, adj, dists): 810 | adj = np.asarray(adj) 811 | dists = np.asarray(dists) 812 | L = len(adj) 813 | with open(filename,"w") as out: 814 | out.write("i\tj\taa_i\taa_j\tp(cbcb<8)\tmaxdistbin\n") 815 | for i in range(L): 816 | for j in range(i+1,L): 817 | if dists[i][j] < 21.68 or adj[i][j] >= 0.001: 818 | line = f"{i+1}\t{j+1}\t{full_sequence[i]}\t{full_sequence[j]}\t{adj[i][j]:.3f}" 819 | line += f"\t>{dists[i][j]:.2f}" if dists[i][j] == 21.6875 else f"\t{dists[i][j]:.2f}" 820 | out.write(f"{line}\n") 821 | 822 | for n,key in enumerate(model_rank): 823 | if save_to_txt: 824 | txt_filename = os.path.join(output_dir,f'rank_{n+1}_{key}.raw.txt') 825 | do_save_to_txt(txt_filename,adj=outs[key]["adj"],dists=outs[key]["dists"]) 826 | 827 | if use_ptm and save_pae_json: 828 | pae = outs[key]["pae"] 829 | max_pae = pae.max() 830 | # Save pLDDT and predicted aligned error (if it exists) 831 | pae_output_path = os.path.join(output_dir,f'rank_{n+1}_{key}_pae.json') 832 | # Save predicted aligned error in the same format as the AF EMBL DB 833 | rounded_errors = np.round(np.asarray(pae), decimals=1) 834 | indices = np.indices((len(rounded_errors), len(rounded_errors))) + 1 835 | indices_1 = indices[0].flatten().tolist() 836 | indices_2 = indices[1].flatten().tolist() 837 | pae_data = json.dumps([{ 838 | 'residue1': indices_1, 839 | 'residue2': indices_2, 840 | 'distance': rounded_errors.flatten().tolist(), 841 | 'max_predicted_aligned_error': max_pae.item() 842 | }], 843 | indent=None, 844 | separators=(',', ':')) 845 | with open(pae_output_path, 'w') as f: 846 | f.write(pae_data) 847 | #%% -------------------------------------------------------------------------------- /v1.0.0/runner_af2advanced.py: -------------------------------------------------------------------------------- 1 | #%% 2 | ## command-line arguments 3 | import argparse 4 | parser = argparse.ArgumentParser(description="Runner script that can take command-line arguments") 5 | parser.add_argument("-i", "--input", help="Path to a FASTA file. Required.", required=True) 6 | parser.add_argument("-o", "--output_dir", default="", type=str, 7 | help="Path to a directory that will store the results. " 8 | "The default name is 'prediction_'. ") 9 | parser.add_argument("-ho", "--homooligomer", default="1", type=str, 10 | help="homooligomer: Define number of copies in a homo-oligomeric assembly. 
" 11 | "For example, sequence:ABC:DEF, homooligomer: 2:1, " 12 | "the first protein ABC will be modeled as a homodimer (2 copies) and second DEF a monomer (1 copy). Default is 1.") 13 | parser.add_argument("-m", "--msa_method", default="mmseqs2", type=str, choices=["mmseqs2", "single_sequence", "precomputed"], 14 | help="Options to generate MSA." 15 | "mmseqs2 - FAST method from ColabFold (default) " 16 | "single_sequence - use single sequence input." 17 | "precomputed - specify 'msa.pickle' file generated previously if you have." 18 | "Default is 'mmseqs2'.") 19 | parser.add_argument("--precomputed", default=None, type=str, 20 | help="Specify the file path of a precomputed pickled msa from previous run. " 21 | ) 22 | parser.add_argument("-p", "--pair_mode", default="unpaired", choices=["unpaired", "unpaired+paired", "paired"], 23 | help="Experimental option for protein complexes. " 24 | "Pairing currently only supported for proteins in same operon (prokaryotic genomes). " 25 | "unpaired - generate separate MSA for each protein. (default) " 26 | "unpaired+paired - attempt to pair sequences from the same operon within the genome. " 27 | "paired - only use sequences that were successfully paired. " 28 | "Default is 'unpaired'.") 29 | parser.add_argument("-pc", "--pair_cov", default=50, type=int, 30 | help="Options to prefilter each MSA before pairing. It might help if there are any paralogs in the complex. " 31 | "prefilter each MSA to minimum coverage with query (%%) before pairing. " 32 | "Default is 50.") 33 | parser.add_argument("-pq", "--pair_qid", default=20, type=int, 34 | help="Options to prefilter each MSA before pairing. It might help if there are any paralogs in the complex. " 35 | "prefilter each MSA to minimum sequence identity with query (%%) before pairing. " 36 | "Default is 20.") 37 | parser.add_argument("-b", "--rank_by", default="pLDDT", type=str, choices=["pLDDT", "pTMscore"], 38 | help="specify metric to use for ranking models (For protein-protein complexes, we recommend pTMscore). " 39 | "Default is 'pLDDT'.") 40 | parser.add_argument("-t", "--use_turbo", action='store_true', 41 | help="introduces a few modifications (compile once, swap params, adjust max_msa) to speedup and reduce memory requirements. " 42 | "Disable for default behavior.") 43 | parser.add_argument("-mm", "--max_msa", default="512:1024", type=str, 44 | help="max_msa defines: max_msa_clusters:max_extra_msa number of sequences to use. " 45 | "This option ignored if use_turbo is disabled. Default is '512:1024'.") 46 | parser.add_argument("-n", "--num_models", default=5, type=int, help="specify how many model params to try. (Default is 5)") 47 | parser.add_argument("-pt", "--use_ptm", action='store_true', 48 | help="uses Deepmind's ptm finetuned model parameters to get PAE per structure. " 49 | "Disable to use the original model params. (Disabling may give alternative structures.)") 50 | parser.add_argument("-e", "--num_ensemble", default=1, type=int, choices=[1, 8], 51 | help="the trunk of the network is run multiple times with different random choices for the MSA cluster centers. " 52 | "(1=default, 8=casp14 setting)") 53 | parser.add_argument("-r", "--max_recycles", default=3, type=int, help="controls the maximum number of times the structure is fed back into the neural network for refinement. 
(default is 3)") 54 | parser.add_argument("--tol", default=0, type=float, help="tolerance for deciding when to stop (CA-RMS between recycles)") 55 | parser.add_argument("--is_training", action='store_true', 56 | help="enables the stochastic part of the model (dropout), when coupled with num_samples can be used to 'sample' a diverse set of structures. False (NOT specifying this option) is recommended at first.") 57 | parser.add_argument("--num_samples", default=1, type=int, help="number of random_seeds to try. Default is 1.") 58 | parser.add_argument("--num_relax", default="None", choices=["None", "Top1", "Top5", "All"], 59 | help="num_relax is 'None' (default), 'Top1', 'Top5' or 'All'. Specify how many of the top ranked structures to relax.") 60 | args = parser.parse_args() 61 | ## command-line arguments 62 | ### Check your OS for localcolabfold 63 | import platform 64 | pf = platform.system() 65 | if pf == 'Windows': 66 | print('ColabFold on Windows') 67 | elif pf == 'Darwin': 68 | print('ColabFold on Mac') 69 | device="cpu" 70 | elif pf == 'Linux': 71 | print('ColabFold on Linux') 72 | device="gpu" 73 | #%% 74 | ### python code of AlphaFold2_advanced.ipynb 75 | import os 76 | import tensorflow as tf 77 | tf.config.set_visible_devices([], 'GPU') 78 | 79 | import jax 80 | 81 | from IPython.utils import io 82 | import subprocess 83 | import tqdm.notebook 84 | 85 | # --- Python imports --- 86 | import colabfold as cf 87 | import colabfold_alphafold as cf_af 88 | import pairmsa 89 | import sys 90 | import pickle 91 | 92 | from urllib import request 93 | from concurrent import futures 94 | import json 95 | from matplotlib import gridspec 96 | import matplotlib.pyplot as plt 97 | import numpy as np 98 | 99 | TMP_DIR = "tmp" 100 | os.makedirs(TMP_DIR, exist_ok=True) 101 | 102 | try: 103 | from google.colab import files 104 | IN_COLAB = True 105 | except: 106 | IN_COLAB = False 107 | 108 | #%% 109 | import re 110 | # define sequence 111 | # --read sequence from input file-- 112 | from Bio import SeqIO 113 | 114 | def readfastafile(fastafile): 115 | records = list(SeqIO.parse(fastafile, "fasta")) 116 | if(len(records) != 1): 117 | raise ValueError('Input FASTA file must have a single ID/sequence.') 118 | else: 119 | return records[0].id, records[0].seq 120 | 121 | 122 | print("Input ID: {}".format(readfastafile(args.input)[0])) 123 | print("Input Sequence: {}".format(readfastafile(args.input)[1])) 124 | sequence = str(readfastafile(args.input)[1]) 125 | # --read sequence from input file-- 126 | jobname = "test" #@param {type:"string"} 127 | homooligomer = args.homooligomer #@param {type:"string"} 128 | 129 | TQDM_BAR_FORMAT = '{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]' 130 | 131 | # prediction directory 132 | # --set the output directory from command-line arguments 133 | if args.output_dir != "": 134 | output_dir = args.output_dir 135 | # --set the output directory from command-line arguments 136 | 137 | I = cf_af.prep_inputs(sequence, jobname, homooligomer, output_dir, clean=IN_COLAB) 138 | 139 | msa_method = args.msa_method #@param ["mmseqs2","single_sequence"] 140 | 141 | if msa_method == "precomputed": 142 | if args.precomputed is None: 143 | raise ValueError("ERROR: `--precomputed` undefined. 
" 144 | "You must specify the file path of previously generated 'msa.pickle' if you set '--msa_method precomputed'.") 145 | else: 146 | precomputed = args.precomputed 147 | print("Use precomputed msa.pickle: {}".format(precomputed)) 148 | else: 149 | precomputed = args.precomputed 150 | 151 | add_custom_msa = False #@param {type:"boolean"} 152 | msa_format = "fas" #@param ["fas","a2m","a3m","sto","psi","clu"] 153 | 154 | # --set the output directory from command-line arguments 155 | pair_mode = args.pair_mode #@param ["unpaired","unpaired+paired","paired"] {type:"string"} 156 | pair_cov = args.pair_cov #@param [0,25,50,75,90] {type:"raw"} 157 | pair_qid = args.pair_qid #@param [0,15,20,30,40,50] {type:"raw"} 158 | # --set the output directory from command-line arguments 159 | 160 | # --- Search against genetic databases --- 161 | 162 | I = cf_af.prep_msa(I, msa_method, add_custom_msa, msa_format, pair_mode, pair_cov, pair_qid, 163 | hhfilter_loc="colabfold-conda/bin/hhfilter", precomputed=precomputed, TMP_DIR=output_dir) 164 | mod_I = I 165 | 166 | if len(I["msas"][0]) > 1: 167 | plt = cf.plot_msas(I["msas"], I["ori_sequence"]) 168 | plt.savefig(os.path.join(I["output_dir"],"msa_coverage.png"), bbox_inches = 'tight', dpi=200) 169 | # plt.show() 170 | #%% 171 | trim = "" #@param {type:"string"} 172 | trim_inverse = False #@param {type:"boolean"} 173 | cov = 0 #@param [0,25,50,75,90,95] {type:"raw"} 174 | qid = 0 #@param [0,15,20,25,30,40,50] {type:"raw"} 175 | 176 | mod_I = cf_af.prep_filter(I, trim, trim_inverse, cov, qid) 177 | 178 | if I["msas"] != mod_I["msas"]: 179 | plt.figure(figsize=(16,5),dpi=100) 180 | plt.subplot(1,2,1) 181 | plt.title("Sequence coverage (Before)") 182 | cf.plot_msas(I["msas"], I["ori_sequence"], return_plt=False) 183 | plt.subplot(1,2,2) 184 | plt.title("Sequence coverage (After)") 185 | cf.plot_msas(mod_I["msas"], mod_I["ori_sequence"], return_plt=False) 186 | plt.savefig(os.path.join(I["output_dir"],"msa_coverage.filtered.png"), bbox_inches = 'tight', dpi=200) 187 | plt.show() 188 | 189 | #%% 190 | ##@title run alphafold 191 | # --------set parameters from command-line arguments-------- 192 | num_relax = args.num_relax 193 | rank_by = args.rank_by 194 | 195 | use_turbo = True if args.use_turbo else False 196 | max_msa = args.max_msa 197 | # --------set parameters from command-line arguments-------- 198 | 199 | max_msa_clusters, max_extra_msa = [int(x) for x in max_msa.split(":")] 200 | 201 | show_images = True #@param {type:"boolean"} 202 | 203 | # --------set parameters from command-line arguments-------- 204 | num_models = args.num_models 205 | use_ptm = True if args.use_ptm else False 206 | num_ensemble = args.num_ensemble 207 | max_recycles = args.max_recycles 208 | tol = args.tol 209 | is_training = True if args.is_training else False 210 | num_samples = args.num_samples 211 | # --------set parameters from command-line arguments-------- 212 | 213 | subsample_msa = True #@param {type:"boolean"} 214 | 215 | if not use_ptm and rank_by == "pTMscore": 216 | print("WARNING: models will be ranked by pLDDT, 'use_ptm' is needed to compute pTMscore") 217 | rank_by = "pLDDT" 218 | 219 | # prep input features 220 | feature_dict = cf_af.prep_feats(mod_I, clean=IN_COLAB) 221 | Ls_plot = feature_dict["Ls"] 222 | 223 | # prep model options 224 | opt = {"N":len(feature_dict["msa"]), 225 | "L":len(feature_dict["residue_index"]), 226 | "use_ptm":use_ptm, 227 | "use_turbo":use_turbo, 228 | "max_recycles":max_recycles, 229 | "tol":tol, 230 | "num_ensemble":num_ensemble, 231 | 
"max_msa_clusters":max_msa_clusters, 232 | "max_extra_msa":max_extra_msa, 233 | "is_training":is_training} 234 | 235 | if use_turbo: 236 | if "runner" in dir(): 237 | # only recompile if options changed 238 | runner = cf_af.prep_model_runner(opt, old_runner=runner) 239 | else: 240 | runner = cf_af.prep_model_runner(opt) 241 | else: 242 | runner = None 243 | 244 | ########################### 245 | # run alphafold 246 | ########################### 247 | outs, model_rank = cf_af.run_alphafold(feature_dict, opt, runner, num_models, num_samples, subsample_msa, 248 | rank_by=rank_by, show_images=show_images) 249 | 250 | #%% 251 | #@title Refine structures with Amber-Relax (Optional) 252 | 253 | # --------set parameters from command-line arguments-------- 254 | num_relax = args.num_relax 255 | # --------set parameters from command-line arguments-------- 256 | 257 | if num_relax == "None": 258 | num_relax = 0 259 | elif num_relax == "Top1": 260 | num_relax = 1 261 | elif num_relax == "Top5": 262 | num_relax = 5 263 | else: 264 | model_names = ['model_1', 'model_2', 'model_3', 'model_4', 'model_5'][:num_models] 265 | num_relax = len(model_names) * num_samples 266 | 267 | if num_relax > 0: 268 | if "relax" not in dir(): 269 | # add conda environment to path 270 | sys.path.append('./colabfold-conda/lib/python3.7/site-packages') 271 | 272 | # import libraries 273 | from alphafold.relax import relax 274 | from alphafold.relax import utils 275 | 276 | with tqdm.notebook.tqdm(total=num_relax, bar_format=TQDM_BAR_FORMAT) as pbar: 277 | pbar.set_description(f'AMBER relaxation') 278 | for n,key in enumerate(model_rank): 279 | if n < num_relax: 280 | prefix = f"rank_{n+1}_{key}" 281 | pred_output_path = os.path.join(I["output_dir"],f'{prefix}_relaxed.pdb') 282 | if not os.path.isfile(pred_output_path): 283 | amber_relaxer = relax.AmberRelaxation( 284 | max_iterations=0, 285 | tolerance=2.39, 286 | stiffness=10.0, 287 | exclude_residues=[], 288 | max_outer_iterations=20) 289 | relaxed_pdb_lines, _, _ = amber_relaxer.process(prot=outs[key]["unrelaxed_protein"]) 290 | with open(pred_output_path, 'w') as f: 291 | f.write(relaxed_pdb_lines) 292 | pbar.update(n=1) 293 | #%% 294 | #@title Display 3D structure {run: "auto"} 295 | rank_num = 1 #@param ["1", "2", "3", "4", "5"] {type:"raw"} 296 | color = "lDDT" #@param ["chain", "lDDT", "rainbow"] 297 | show_sidechains = False #@param {type:"boolean"} 298 | show_mainchains = False #@param {type:"boolean"} 299 | 300 | key = model_rank[rank_num-1] 301 | prefix = f"rank_{rank_num}_{key}" 302 | pred_output_path = os.path.join(I["output_dir"],f'{prefix}_relaxed.pdb') 303 | if not os.path.isfile(pred_output_path): 304 | pred_output_path = os.path.join(I["output_dir"],f'{prefix}_unrelaxed.pdb') 305 | 306 | cf.show_pdb(pred_output_path, show_sidechains, show_mainchains, color, Ls=Ls_plot).show() 307 | if color == "lDDT": cf.plot_plddt_legend().show() 308 | if use_ptm: 309 | cf.plot_confidence(outs[key]["plddt"], outs[key]["pae"], Ls=Ls_plot).show() 310 | else: 311 | cf.plot_confidence(outs[key]["plddt"], Ls=Ls_plot).show() 312 | #%% 313 | #@title Extra outputs 314 | dpi = 300#@param {type:"integer"} 315 | save_to_txt = True #@param {type:"boolean"} 316 | save_pae_json = True #@param {type:"boolean"} 317 | 318 | if use_ptm: 319 | print("predicted alignment error") 320 | cf.plot_paes([outs[k]["pae"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 321 | plt.savefig(os.path.join(I["output_dir"],f'predicted_alignment_error.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 322 | 
plt.show() 323 | 324 | print("predicted contacts") 325 | cf.plot_adjs([outs[k]["adj"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 326 | plt.savefig(os.path.join(I["output_dir"],f'predicted_contacts.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 327 | plt.show() 328 | 329 | print("predicted distogram") 330 | cf.plot_dists([outs[k]["dists"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 331 | plt.savefig(os.path.join(I["output_dir"],f'predicted_distogram.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 332 | plt.show() 333 | 334 | print("predicted LDDT") 335 | cf.plot_plddts([outs[k]["plddt"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 336 | plt.savefig(os.path.join(I["output_dir"],f'predicted_LDDT.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 337 | plt.show() 338 | 339 | def do_save_to_txt(filename, adj, dists, sequence): 340 | adj = np.asarray(adj) 341 | dists = np.asarray(dists) 342 | L = len(adj) 343 | with open(filename,"w") as out: 344 | out.write("i\tj\taa_i\taa_j\tp(cbcb<8)\tmaxdistbin\n") 345 | for i in range(L): 346 | for j in range(i+1,L): 347 | if dists[i][j] < 21.68 or adj[i][j] >= 0.001: 348 | line = f"{i}\t{j}\t{sequence[i]}\t{sequence[j]}\t{adj[i][j]:.3f}" 349 | line += f"\t>{dists[i][j]:.2f}" if dists[i][j] == 21.6875 else f"\t{dists[i][j]:.2f}" 350 | out.write(f"{line}\n") 351 | 352 | for n,key in enumerate(model_rank): 353 | if save_to_txt: 354 | txt_filename = os.path.join(I["output_dir"],f'rank_{n+1}_{key}.raw.txt') 355 | do_save_to_txt(txt_filename, 356 | outs[key]["adj"], 357 | outs[key]["dists"], 358 | mod_I["full_sequence"]) 359 | 360 | if use_ptm and save_pae_json: 361 | pae = outs[key]["pae"] 362 | max_pae = pae.max() 363 | # Save pLDDT and predicted aligned error (if it exists) 364 | pae_output_path = os.path.join(I["output_dir"],f'rank_{n+1}_{key}_pae.json') 365 | # Save predicted aligned error in the same format as the AF EMBL DB 366 | rounded_errors = np.round(np.asarray(pae), decimals=1) 367 | indices = np.indices((len(rounded_errors), len(rounded_errors))) + 1 368 | indices_1 = indices[0].flatten().tolist() 369 | indices_2 = indices[1].flatten().tolist() 370 | pae_data = json.dumps([{ 371 | 'residue1': indices_1, 372 | 'residue2': indices_2, 373 | 'distance': rounded_errors.flatten().tolist(), 374 | 'max_predicted_aligned_error': max_pae.item() 375 | }], 376 | indent=None, 377 | separators=(',', ':')) 378 | with open(pae_output_path, 'w') as f: 379 | f.write(pae_data) 380 | #%% 381 | -------------------------------------------------------------------------------- /v1.0.0/runner_af2advanced_old.py: -------------------------------------------------------------------------------- 1 | #%% 2 | ## command-line arguments 3 | import argparse 4 | parser = argparse.ArgumentParser(description="Runner script that can take command-line arguments") 5 | parser.add_argument("-i", "--input", help="Path to a FASTA file. Required.", required=True) 6 | parser.add_argument("-o", "--output_dir", default="", type=str, 7 | help="Path to a directory that will store the results. " 8 | "The default name is 'prediction_<jobname>_<first 5 chars of the sequence hash>'. ") 9 | parser.add_argument("-ho", "--homooligomer", default="1", type=str,  # "-ho": a plain "-h" would collide with argparse's built-in -h/--help 10 | help="homooligomer: Define number of copies in a homo-oligomeric assembly. " 11 | "For example, sequence:ABC:DEF, homooligomer: 2:1, " 12 | "the first protein ABC will be modeled as a homodimer (2 copies) and second DEF a monomer (1 copy). 
Default is 1.") 13 | parser.add_argument("-m", "--msa_method", default="mmseqs2", type=str, choices=["mmseqs2", "single_sequence"], 14 | help="Options to generate MSA." 15 | "mmseqs2 - FAST method from ColabFold (default) " 16 | "single_sequence - use single sequence input." 17 | "Default is 'mmseqs2'.") 18 | parser.add_argument("-p", "--pair_mode", default="unpaired", choices=["unpaired", "unpaired+paired", "paired"], 19 | help="Experimental option for protein complexes. " 20 | "Pairing currently only supported for proteins in same operon (prokaryotic genomes). " 21 | "unpaired - generate separate MSA for each protein. (default) " 22 | "unpaired+paired - attempt to pair sequences from the same operon within the genome. " 23 | "paired - only use sequences that were successfully paired. " 24 | "Default is 'unpaired'.") 25 | parser.add_argument("-pc", "--pair_cov", default=50, type=int, 26 | help="Options to prefilter each MSA before pairing. It might help if there are any paralogs in the complex. " 27 | "prefilter each MSA to minimum coverage with query (%%) before pairing. " 28 | "Default is 50.") 29 | parser.add_argument("-pq", "--pair_qid", default=20, type=int, 30 | help="Options to prefilter each MSA before pairing. It might help if there are any paralogs in the complex. " 31 | "prefilter each MSA to minimum sequence identity with query (%%) before pairing. " 32 | "Default is 20.") 33 | parser.add_argument("-b", "--rank_by", default="pLDDT", type=str, choices=["pLDDT", "pTMscore"], 34 | help="specify metric to use for ranking models (For protein-protein complexes, we recommend pTMscore). " 35 | "Default is 'pLDDT'.") 36 | parser.add_argument("-t", "--use_turbo", action='store_true', 37 | help="introduces a few modifications (compile once, swap params, adjust max_msa) to speedup and reduce memory requirements. " 38 | "Disable for default behavior.") 39 | parser.add_argument("-mm", "--max_msa", default="512:1024", type=str, 40 | help="max_msa defines: max_msa_clusters:max_extra_msa number of sequences to use. " 41 | "This option ignored if use_turbo is disabled. Default is '512:1024'.") 42 | parser.add_argument("-n", "--num_models", default=5, type=int, help="specify how many model params to try. (Default is 5)") 43 | parser.add_argument("-pt", "--use_ptm", action='store_true', 44 | help="uses Deepmind's ptm finetuned model parameters to get PAE per structure. " 45 | "Disable to use the original model params. (Disabling may give alternative structures.)") 46 | parser.add_argument("-e", "--num_ensemble", default=1, type=int, choices=[1, 8], 47 | help="the trunk of the network is run multiple times with different random choices for the MSA cluster centers. " 48 | "(1=default, 8=casp14 setting)") 49 | parser.add_argument("-r", "--max_recycles", default=3, type=int, help="controls the maximum number of times the structure is fed back into the neural network for refinement. (default is 3)") 50 | parser.add_argument("--tol", default=0, type=float, help="tolerance for deciding when to stop (CA-RMS between recycles)") 51 | parser.add_argument("--is_training", action='store_true', 52 | help="enables the stochastic part of the model (dropout), when coupled with num_samples can be used to 'sample' a diverse set of structures. False (NOT specifying this option) is recommended at first.") 53 | parser.add_argument("--num_samples", default=1, type=int, help="number of random_seeds to try. 
Default is 1.") 54 | parser.add_argument("--num_relax", default="None", choices=["None", "Top1", "Top5", "All"], 55 | help="num_relax is 'None' (default), 'Top1', 'Top5' or 'All'. Specify how many of the top ranked structures to relax.") 56 | args = parser.parse_args() 57 | ## command-line arguments 58 | 59 | ### Check your OS for localcolabfold 60 | import platform 61 | pf = platform.system() 62 | if pf == 'Windows': 63 | print('ColabFold on Windows') 64 | elif pf == 'Darwin': 65 | print('ColabFold on Mac') 66 | device="cpu" 67 | elif pf == 'Linux': 68 | print('ColabFold on Linux') 69 | device="gpu" 70 | #%% 71 | ### python code of AlphaFold2_advanced.ipynb 72 | import os 73 | import tensorflow as tf 74 | tf.config.set_visible_devices([], 'GPU') 75 | 76 | import jax 77 | 78 | from IPython.utils import io 79 | import subprocess 80 | import tqdm.notebook 81 | 82 | # --- Python imports --- 83 | import colabfold as cf 84 | import pairmsa 85 | import sys 86 | import pickle 87 | 88 | from urllib import request 89 | from concurrent import futures 90 | import json 91 | from matplotlib import gridspec 92 | import matplotlib.pyplot as plt 93 | import numpy as np 94 | import py3Dmol 95 | 96 | from urllib import request 97 | from concurrent import futures 98 | import json 99 | from matplotlib import gridspec 100 | import matplotlib.pyplot as plt 101 | import numpy as np 102 | import py3Dmol 103 | 104 | from alphafold.model import model 105 | from alphafold.model import config 106 | from alphafold.model import data 107 | 108 | from alphafold.data import parsers 109 | from alphafold.data import pipeline 110 | from alphafold.data.tools import jackhmmer 111 | 112 | from alphafold.common import protein 113 | 114 | def run_jackhmmer(sequence, prefix): 115 | 116 | fasta_path = f"{prefix}.fasta" 117 | with open(fasta_path, 'wt') as f: 118 | f.write(f'>query\n{sequence}') 119 | 120 | pickled_msa_path = f"{prefix}.jackhmmer.pickle" 121 | if os.path.isfile(pickled_msa_path): 122 | msas_dict = pickle.load(open(pickled_msa_path,"rb")) 123 | msas, deletion_matrices, names = (msas_dict[k] for k in ['msas', 'deletion_matrices', 'names']) 124 | full_msa = [] 125 | for msa in msas: 126 | full_msa += msa 127 | else: 128 | # --- Find the closest source --- 129 | test_url_pattern = 'https://storage.googleapis.com/alphafold-colab{:s}/latest/uniref90_2021_03.fasta.1' 130 | ex = futures.ThreadPoolExecutor(3) 131 | def fetch(source): 132 | request.urlretrieve(test_url_pattern.format(source)) 133 | return source 134 | fs = [ex.submit(fetch, source) for source in ['', '-europe', '-asia']] 135 | source = None 136 | for f in futures.as_completed(fs): 137 | source = f.result() 138 | ex.shutdown() 139 | break 140 | 141 | jackhmmer_binary_path = '/usr/bin/jackhmmer' 142 | dbs = [] 143 | 144 | num_jackhmmer_chunks = {'uniref90': 59, 'smallbfd': 17, 'mgnify': 71} 145 | total_jackhmmer_chunks = sum(num_jackhmmer_chunks.values()) 146 | with tqdm.notebook.tqdm(total=total_jackhmmer_chunks, bar_format=TQDM_BAR_FORMAT) as pbar: 147 | def jackhmmer_chunk_callback(i): 148 | pbar.update(n=1) 149 | 150 | pbar.set_description('Searching uniref90') 151 | jackhmmer_uniref90_runner = jackhmmer.Jackhmmer( 152 | binary_path=jackhmmer_binary_path, 153 | database_path=f'https://storage.googleapis.com/alphafold-colab{source}/latest/uniref90_2021_03.fasta', 154 | get_tblout=True, 155 | num_streamed_chunks=num_jackhmmer_chunks['uniref90'], 156 | streaming_callback=jackhmmer_chunk_callback, 157 | z_value=135301051) 158 | dbs.append(('uniref90', 
jackhmmer_uniref90_runner.query(fasta_path))) 159 | 160 | pbar.set_description('Searching smallbfd') 161 | jackhmmer_smallbfd_runner = jackhmmer.Jackhmmer( 162 | binary_path=jackhmmer_binary_path, 163 | database_path=f'https://storage.googleapis.com/alphafold-colab{source}/latest/bfd-first_non_consensus_sequences.fasta', 164 | get_tblout=True, 165 | num_streamed_chunks=num_jackhmmer_chunks['smallbfd'], 166 | streaming_callback=jackhmmer_chunk_callback, 167 | z_value=65984053) 168 | dbs.append(('smallbfd', jackhmmer_smallbfd_runner.query(fasta_path))) 169 | 170 | pbar.set_description('Searching mgnify') 171 | jackhmmer_mgnify_runner = jackhmmer.Jackhmmer( 172 | binary_path=jackhmmer_binary_path, 173 | database_path=f'https://storage.googleapis.com/alphafold-colab{source}/latest/mgy_clusters_2019_05.fasta', 174 | get_tblout=True, 175 | num_streamed_chunks=num_jackhmmer_chunks['mgnify'], 176 | streaming_callback=jackhmmer_chunk_callback, 177 | z_value=304820129) 178 | dbs.append(('mgnify', jackhmmer_mgnify_runner.query(fasta_path))) 179 | 180 | # --- Extract the MSAs and visualize --- 181 | # Extract the MSAs from the Stockholm files. 182 | # NB: deduplication happens later in pipeline.make_msa_features. 183 | 184 | mgnify_max_hits = 501 185 | msas = [] 186 | deletion_matrices = [] 187 | names = [] 188 | for db_name, db_results in dbs: 189 | unsorted_results = [] 190 | for i, result in enumerate(db_results): 191 | msa, deletion_matrix, target_names = parsers.parse_stockholm(result['sto']) 192 | e_values_dict = parsers.parse_e_values_from_tblout(result['tbl']) 193 | e_values = [e_values_dict[t.split('/')[0]] for t in target_names] 194 | zipped_results = zip(msa, deletion_matrix, target_names, e_values) 195 | if i != 0: 196 | # Only take query from the first chunk 197 | zipped_results = [x for x in zipped_results if x[2] != 'query'] 198 | unsorted_results.extend(zipped_results) 199 | sorted_by_evalue = sorted(unsorted_results, key=lambda x: x[3]) 200 | db_msas, db_deletion_matrices, db_names, _ = zip(*sorted_by_evalue) 201 | if db_msas: 202 | if db_name == 'mgnify': 203 | db_msas = db_msas[:mgnify_max_hits] 204 | db_deletion_matrices = db_deletion_matrices[:mgnify_max_hits] 205 | db_names = db_names[:mgnify_max_hits] 206 | msas.append(db_msas) 207 | deletion_matrices.append(db_deletion_matrices) 208 | names.append(db_names) 209 | msa_size = len(set(db_msas)) 210 | print(f'{msa_size} Sequences Found in {db_name}') 211 | 212 | pickle.dump({"msas":msas, 213 | "deletion_matrices":deletion_matrices, 214 | "names":names}, open(pickled_msa_path,"wb")) 215 | return msas, deletion_matrices, names 216 | 217 | #%% 218 | import re 219 | 220 | # --read sequence from input file-- 221 | from Bio import SeqIO 222 | 223 | def readfastafile(fastafile): 224 | records = list(SeqIO.parse(fastafile, "fasta")) 225 | if(len(records) != 1): 226 | raise ValueError('Input FASTA file must have a single ID/sequence.') 227 | else: 228 | return records[0].id, records[0].seq 229 | 230 | 231 | print("Input ID: {}".format(readfastafile(args.input)[0])) 232 | print("Input Sequence: {}".format(readfastafile(args.input)[1])) 233 | sequence = str(readfastafile(args.input)[1]) 234 | # --read sequence from input file-- 235 | sequence = re.sub("[^A-Z:/]", "", sequence.upper()) 236 | sequence = re.sub(":+",":",sequence) 237 | sequence = re.sub("/+","/",sequence) 238 | sequence = re.sub("^[:/]+","",sequence) 239 | sequence = re.sub("[:/]+$","",sequence) 240 | 241 | jobname = "test" #@param {type:"string"} 242 | jobname = re.sub(r'\W+', 
'', jobname) 243 | 244 | # define number of copies 245 | homooligomer = args.homooligomer #@param {type:"string"} 246 | homooligomer = re.sub("[:/]+",":",homooligomer) 247 | homooligomer = re.sub("^[:/]+","",homooligomer) 248 | homooligomer = re.sub("[:/]+$","",homooligomer) 249 | 250 | if len(homooligomer) == 0: homooligomer = "1" 251 | homooligomer = re.sub("[^0-9:]", "", homooligomer) 252 | homooligomers = [int(h) for h in homooligomer.split(":")] 253 | 254 | #@markdown - `sequence` Specify protein sequence to be modelled. 255 | #@markdown - Use `/` to specify intra-protein chainbreaks (for trimming regions within protein). 256 | #@markdown - Use `:` to specify inter-protein chainbreaks (for modeling protein-protein hetero-complexes). 257 | #@markdown - For example, sequence `AC/DE:FGH` will be modelled as polypeptides: `AC`, `DE` and `FGH`. A separate MSA will be generated for `ACDE` and `FGH`. 258 | #@markdown If `pair_msa` is enabled, `ACDE`'s MSA will be paired with `FGH`'s MSA. 259 | #@markdown - `homooligomer` Define number of copies in a homo-oligomeric assembly. 260 | #@markdown - Use `:` to specify a different homooligomeric state (copy number) for each component of the complex. 261 | #@markdown - For example, **sequence:**`ABC:DEF`, **homooligomer:** `2:1`, the first protein `ABC` will be modeled as a homodimer (2 copies) and second `DEF` a monomer (1 copy). 262 | 263 | ori_sequence = sequence 264 | sequence = sequence.replace("/","").replace(":","") 265 | seqs = ori_sequence.replace("/","").split(":") 266 | 267 | if len(seqs) != len(homooligomers): 268 | if len(homooligomers) == 1: 269 | homooligomers = [homooligomers[0]] * len(seqs) 270 | homooligomer = ":".join([str(h) for h in homooligomers]) 271 | else: 272 | while len(seqs) > len(homooligomers): 273 | homooligomers.append(1) 274 | homooligomers = homooligomers[:len(seqs)] 275 | homooligomer = ":".join([str(h) for h in homooligomers]) 276 | print("WARNING: Mismatch between number of breaks ':' in 'sequence' and 'homooligomer' definition") 277 | 278 | full_sequence = "".join([s*h for s,h in zip(seqs,homooligomers)]) 279 | 280 | # prediction directory 281 | # --set the output directory from command-line arguments 282 | if args.output_dir == "": 283 | output_dir = 'prediction_' + jobname + '_' + cf.get_hash(full_sequence)[:5] 284 | else: 285 | output_dir = args.output_dir 286 | # --set the output directory from command-line arguments 287 | 288 | os.makedirs(output_dir, exist_ok=True) 289 | # delete existing files in working directory 290 | for f in os.listdir(output_dir): 291 | os.remove(os.path.join(output_dir, f)) 292 | 293 | MIN_SEQUENCE_LENGTH = 16 294 | MAX_SEQUENCE_LENGTH = 2500 295 | 296 | aatypes = set('ACDEFGHIKLMNPQRSTVWY') # 20 standard aatypes 297 | if not set(full_sequence).issubset(aatypes): 298 | raise Exception(f'Input sequence contains non-amino acid letters: {set(sequence) - aatypes}. AlphaFold only supports 20 standard amino acids as inputs.') 299 | if len(full_sequence) < MIN_SEQUENCE_LENGTH: 300 | raise Exception(f'Input sequence is too short: {len(full_sequence)} amino acids, while the minimum is {MIN_SEQUENCE_LENGTH}') 301 | if len(full_sequence) > MAX_SEQUENCE_LENGTH: 302 | raise Exception(f'Input sequence is too long: {len(full_sequence)} amino acids, while the maximum is {MAX_SEQUENCE_LENGTH}. Please use the full AlphaFold system for long sequences.') 303 | 304 | if len(full_sequence) > 1400: 305 | print(f"WARNING: For a typical Google-Colab-GPU (16G) session, the max total length is ~1400 residues. 
You are at {len(full_sequence)}! Run Alphafold may crash.") 306 | 307 | print(f"homooligomer: '{homooligomer}'") 308 | print(f"total_length: '{len(full_sequence)}'") 309 | print(f"working_directory: '{output_dir}'") 310 | #%% 311 | TQDM_BAR_FORMAT = '{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]' 312 | #@markdown Once this cell has been executed, you will see 313 | #@markdown statistics about the multiple sequence alignment 314 | #@markdown (MSA) that will be used by AlphaFold. In particular, 315 | #@markdown you’ll see how well each residue is covered by similar 316 | #@markdown sequences in the MSA. 317 | #@markdown (Note that the search against databases and the actual prediction can take some time, from minutes to hours, depending on the length of the protein and what type of GPU you are allocated by Colab.) 318 | 319 | #@markdown --- 320 | msa_method = args.msa_method #@param ["mmseqs2","jackhmmer","single_sequence","precomputed"] 321 | #@markdown --- 322 | #@markdown **custom msa options** 323 | add_custom_msa = False #@param {type:"boolean"} 324 | msa_format = "fas" #@param ["fas","a2m","a3m","sto","psi","clu"] 325 | #@markdown - `add_custom_msa` - If enabled, you'll get an option to upload your custom MSA in the specified `msa_format`. Note: Your MSA will be supplemented with those from 'mmseqs2' or 'jackhmmer', unless `msa_method` is set to 'single_sequence'. 326 | 327 | # --set the output directory from command-line arguments 328 | pair_mode = args.pair_mode #@param ["unpaired","unpaired+paired","paired"] {type:"string"} 329 | 330 | pair_cov = args.pair_cov #@param [0,25,50,75,90] {type:"raw"} 331 | pair_qid = args.pair_qid #@param [0,15,20,30,40,50] {type:"raw"} 332 | # --set the output directory from command-line arguments 333 | 334 | # --- Search against genetic databases --- 335 | os.makedirs('tmp', exist_ok=True) 336 | msas, deletion_matrices = [],[] 337 | 338 | if add_custom_msa: 339 | print(f"upload custom msa in '{msa_format}' format") 340 | msa_dict = files.upload() 341 | lines = msa_dict[list(msa_dict.keys())[0]].decode() 342 | 343 | # convert to a3m 344 | with open(f"tmp/upload.{msa_format}","w") as tmp_upload: 345 | tmp_upload.write(lines) 346 | os.system(f"reformat.pl {msa_format} a3m tmp/upload.{msa_format} tmp/upload.a3m") 347 | a3m_lines = open("tmp/upload.a3m","r").read() 348 | 349 | # parse 350 | msa, mtx = parsers.parse_a3m(a3m_lines) 351 | msas.append(msa) 352 | deletion_matrices.append(mtx) 353 | 354 | if len(msas[0][0]) != len(sequence): 355 | raise ValueError("ERROR: the length of msa does not match input sequence") 356 | 357 | if msa_method == "precomputed": 358 | print("upload precomputed pickled msa from previous run") 359 | pickled_msa_dict = files.upload() 360 | msas_dict = pickle.loads(pickled_msa_dict[list(pickled_msa_dict.keys())[0]]) 361 | msas, deletion_matrices = (msas_dict[k] for k in ['msas', 'deletion_matrices']) 362 | 363 | elif msa_method == "single_sequence": 364 | if len(msas) == 0: 365 | msas.append([sequence]) 366 | deletion_matrices.append([[0]*len(sequence)]) 367 | 368 | else: 369 | seqs = ori_sequence.replace('/','').split(':') 370 | _blank_seq = ["-" * len(seq) for seq in seqs] 371 | _blank_mtx = [[0] * len(seq) for seq in seqs] 372 | def _pad(ns,vals,mode): 373 | if mode == "seq": _blank = _blank_seq.copy() 374 | if mode == "mtx": _blank = _blank_mtx.copy() 375 | if isinstance(ns, list): 376 | for n,val in zip(ns,vals): _blank[n] = val 377 | else: _blank[ns] = vals 378 | if mode == "seq": return 
"".join(_blank) 379 | if mode == "mtx": return sum(_blank,[]) 380 | 381 | if len(seqs) == 1 or "unpaired" in pair_mode: 382 | # gather msas 383 | if msa_method == "mmseqs2": 384 | prefix = cf.get_hash("".join(seqs)) 385 | prefix = os.path.join('tmp',prefix) 386 | print(f"running mmseqs2") 387 | A3M_LINES = cf.run_mmseqs2(seqs, prefix, filter=True) 388 | 389 | for n, seq in enumerate(seqs): 390 | # tmp directory 391 | prefix = cf.get_hash(seq) 392 | prefix = os.path.join('tmp',prefix) 393 | 394 | if msa_method == "mmseqs2": 395 | # run mmseqs2 396 | a3m_lines = A3M_LINES[n] 397 | msa, mtx = parsers.parse_a3m(a3m_lines) 398 | msas_, mtxs_ = [msa],[mtx] 399 | 400 | elif msa_method == "jackhmmer": 401 | print(f"running jackhmmer on seq_{n}") 402 | # run jackhmmer 403 | msas_, mtxs_, names_ = ([sum(x,())] for x in run_jackhmmer(seq, prefix)) 404 | 405 | # pad sequences 406 | for msa_,mtx_ in zip(msas_,mtxs_): 407 | msa,mtx = [sequence],[[0]*len(sequence)] 408 | for s,m in zip(msa_,mtx_): 409 | msa.append(_pad(n,s,"seq")) 410 | mtx.append(_pad(n,m,"mtx")) 411 | 412 | msas.append(msa) 413 | deletion_matrices.append(mtx) 414 | 415 | #################################################################################### 416 | # PAIR_MSA 417 | #################################################################################### 418 | 419 | if len(seqs) > 1 and (pair_mode == "paired" or pair_mode == "unpaired+paired"): 420 | print("attempting to pair some sequences...") 421 | 422 | if msa_method == "mmseqs2": 423 | prefix = cf.get_hash("".join(seqs)) 424 | prefix = os.path.join('tmp',prefix) 425 | print(f"running mmseqs2_noenv_nofilter on all seqs") 426 | A3M_LINES = cf.run_mmseqs2(seqs, prefix, use_env=False, use_filter=False) 427 | 428 | _data = [] 429 | for a in range(len(seqs)): 430 | print(f"prepping seq_{a}") 431 | _seq = seqs[a] 432 | _prefix = os.path.join('tmp',cf.get_hash(_seq)) 433 | 434 | if msa_method == "mmseqs2": 435 | a3m_lines = A3M_LINES[a] 436 | _msa, _mtx, _lab = pairmsa.parse_a3m(a3m_lines, 437 | filter_qid=pair_qid/100, 438 | filter_cov=pair_cov/100) 439 | 440 | elif msa_method == "jackhmmer": 441 | _msas, _mtxs, _names = run_jackhmmer(_seq, _prefix) 442 | _msa, _mtx, _lab = pairmsa.get_uni_jackhmmer(_msas[0], _mtxs[0], _names[0], 443 | filter_qid=pair_qid/100, 444 | filter_cov=pair_cov/100) 445 | 446 | if len(_msa) > 1: 447 | _data.append(pairmsa.hash_it(_msa, _lab, _mtx, call_uniprot=False)) 448 | else: 449 | _data.append(None) 450 | 451 | Ln = len(seqs) 452 | O = [[None for _ in seqs] for _ in seqs] 453 | for a in range(Ln): 454 | if _data[a] is not None: 455 | for b in range(a+1,Ln): 456 | if _data[b] is not None: 457 | print(f"attempting pairwise stitch for {a} {b}") 458 | O[a][b] = pairmsa._stitch(_data[a],_data[b]) 459 | _seq_a, _seq_b, _mtx_a, _mtx_b = (*O[a][b]["seq"],*O[a][b]["mtx"]) 460 | 461 | ############################################## 462 | # filter to remove redundant sequences 463 | ############################################## 464 | ok = [] 465 | with open("tmp/tmp.fas","w") as fas_file: 466 | fas_file.writelines([f">{n}\n{a+b}\n" for n,(a,b) in enumerate(zip(_seq_a,_seq_b))]) 467 | os.system("hhfilter -maxseq 1000000 -i tmp/tmp.fas -o tmp/tmp.id90.fas -id 90") 468 | for line in open("tmp/tmp.id90.fas","r"): 469 | if line.startswith(">"): ok.append(int(line[1:])) 470 | ############################################## 471 | print(f"found {len(_seq_a)} pairs ({len(ok)} after filtering)") 472 | 473 | if len(_seq_a) > 0: 474 | msa,mtx = [sequence],[[0]*len(sequence)] 
475 | for s_a,s_b,m_a,m_b in zip(_seq_a, _seq_b, _mtx_a, _mtx_b): 476 | msa.append(_pad([a,b],[s_a,s_b],"seq")) 477 | mtx.append(_pad([a,b],[m_a,m_b],"mtx")) 478 | msas.append(msa) 479 | deletion_matrices.append(mtx) 480 | 481 | ''' 482 | # triwise stitching (WIP) 483 | if Ln > 2: 484 | for a in range(Ln): 485 | for b in range(a+1,Ln): 486 | for c in range(b+1,Ln): 487 | if O[a][b] is not None and O[b][c] is not None: 488 | print(f"attempting triwise stitch for {a} {b} {c}") 489 | list_ab = O[a][b]["lab"][1] 490 | list_bc = O[b][c]["lab"][0] 491 | msa,mtx = [sequence],[[0]*len(sequence)] 492 | for i,l_b in enumerate(list_ab): 493 | if l_b in list_bc: 494 | j = list_bc.index(l_b) 495 | s_a = O[a][b]["seq"][0][i] 496 | s_b = O[a][b]["seq"][1][i] 497 | s_c = O[b][c]["seq"][1][j] 498 | 499 | m_a = O[a][b]["mtx"][0][i] 500 | m_b = O[a][b]["mtx"][1][i] 501 | m_c = O[b][c]["mtx"][1][j] 502 | 503 | msa.append(_pad([a,b,c],[s_a,s_b,s_c],"seq")) 504 | mtx.append(_pad([a,b,c],[m_a,m_b,m_c],"mtx")) 505 | if len(msa) > 1: 506 | msas.append(msa) 507 | deletion_matrices.append(mtx) 508 | print(f"found {len(msa)} triplets") 509 | ''' 510 | #################################################################################### 511 | #################################################################################### 512 | 513 | # save MSA as pickle 514 | pickle.dump({"msas":msas,"deletion_matrices":deletion_matrices}, 515 | open(os.path.join(output_dir,"msa.pickle"),"wb")) 516 | 517 | make_msa_plot = len(msas[0]) > 1 518 | if make_msa_plot: 519 | plt = cf.plot_msas(msas, ori_sequence) 520 | plt.savefig(os.path.join(output_dir,"msa_coverage.png"), bbox_inches = 'tight', dpi=300) 521 | #%% 522 | ##@title run alphafold 523 | # --------set parameters from command-line arguments-------- 524 | num_relax = args.num_relax 525 | rank_by = args.rank_by 526 | 527 | use_turbo = True if args.use_turbo else False 528 | max_msa = args.max_msa 529 | # --------set parameters from command-line arguments-------- 530 | 531 | max_msa_clusters, max_extra_msa = [int(x) for x in max_msa.split(":")] 532 | 533 | 534 | 535 | #@markdown - `rank_by` specify metric to use for ranking models (For protein-protein complexes, we recommend pTMscore) 536 | #@markdown - `use_turbo` introduces a few modifications (compile once, swap params, adjust max_msa) to speedup and reduce memory requirements. Disable for default behavior. 537 | #@markdown - `max_msa` defines: `max_msa_clusters:max_extra_msa` number of sequences to use. When adjusting after GPU crash, be sure to `Runtime` → `Restart runtime`. (Lowering will reduce GPU requirements, but may result in poor model quality. This option ignored if `use_turbo` is disabled) 538 | show_images = True #@param {type:"boolean"} 539 | #@markdown - `show_images` To make things more exciting we show images of the predicted structures as they are being generated. (WARNING: the order of images displayed does not reflect any ranking). 540 | #@markdown --- 541 | #@markdown #### Sampling options 542 | #@markdown There are two stochastic parts of the pipeline. Within the feature generation (choice of cluster centers) and within the model (dropout). 543 | #@markdown To get structure diversity, you can iterate through a fixed number of random_seeds (using `num_samples`) and/or enable dropout (using `is_training`). 
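# ---------------------------------------------------------------------------
# [editor's note] How the sampling options above multiply out: each model is
# run once per random seed, so num_models * num_samples structures are
# produced, keyed exactly as in the run loop further below. Values here are
# illustrative.
'''
num_models, num_samples, use_ptm = 2, 3, True
model_names = ['model_1', 'model_2', 'model_3', 'model_4', 'model_5'][:num_models]
keys = [f"{m}{'_ptm' if use_ptm else ''}_seed_{s}"
        for s in range(num_samples) for m in model_names]
# -> 6 runs: model_1_ptm_seed_0, model_2_ptm_seed_0, ..., model_2_ptm_seed_2
'''
# ---------------------------------------------------------------------------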
544 | 
545 | # --------set parameters from command-line arguments--------
546 | num_models = args.num_models
547 | use_ptm = True if args.use_ptm else False
548 | num_ensemble = args.num_ensemble
549 | max_recycles = args.max_recycles
550 | tol = args.tol
551 | is_training = True if args.is_training else False
552 | num_samples = args.num_samples
553 | # --------set parameters from command-line arguments--------
554 | 
555 | subsample_msa = True #@param {type:"boolean"}
556 | #@markdown - `subsample_msa` subsamples a large MSA to `3E7/length` sequences (e.g. 30,000 sequences for a 1,000-residue input) to avoid crashing the preprocessing protocol. (This option is ignored if `use_turbo` is disabled.)
557 | 
558 | save_pae_json = True
559 | save_tmp_pdb = True
560 | 
561 | 
562 | if not use_ptm and rank_by == "pTMscore":
563 |   print("WARNING: models will be ranked by pLDDT, 'use_ptm' is needed to compute pTMscore")
564 |   rank_by = "pLDDT"
565 | 
566 | #############################
567 | # delete old files
568 | #############################
569 | for f in os.listdir(output_dir):
570 |   if "rank_" in f:
571 |     os.remove(os.path.join(output_dir, f))
572 | 
573 | #############################
574 | # homooligomerize
575 | #############################
576 | lengths = [len(seq) for seq in seqs]
577 | msas_mod, deletion_matrices_mod = cf.homooligomerize_heterooligomer(msas, deletion_matrices,
578 |                                                                    lengths, homooligomers)
579 | #############################
580 | # define input features
581 | #############################
582 | def _placeholder_template_feats(num_templates_, num_res_):
583 |   return {
584 |       'template_aatype': np.zeros([num_templates_, num_res_, 22], np.float32),
585 |       'template_all_atom_masks': np.zeros([num_templates_, num_res_, 37], np.float32),        # per-atom mask: [templates, residues, 37]
586 |       'template_all_atom_positions': np.zeros([num_templates_, num_res_, 37, 3], np.float32), # coordinates: [templates, residues, 37, 3]
587 |       'template_domain_names': np.zeros([num_templates_], np.float32),
588 |       'template_sum_probs': np.zeros([num_templates_], np.float32),
589 |   }
590 | 
591 | num_res = len(full_sequence)
592 | feature_dict = {}
593 | feature_dict.update(pipeline.make_sequence_features(full_sequence, 'test', num_res))
594 | feature_dict.update(pipeline.make_msa_features(msas_mod, deletion_matrices=deletion_matrices_mod))
595 | if not use_turbo:
596 |   feature_dict.update(_placeholder_template_feats(0, num_res))
597 | 
598 | def do_subsample_msa(F, random_seed=0):
599 |   '''subsample msa to avoid running out of memory'''
600 |   N = len(F["msa"])
601 |   L = len(F["residue_index"])
602 |   N_ = int(3E7/L)
603 |   if N > N_:
604 |     print(f"whhhaaa... 
too many sequences ({N}) subsampling to {N_}") 605 | np.random.seed(random_seed) 606 | idx = np.append(0,np.random.permutation(np.arange(1,N)))[:N_] 607 | F_ = {} 608 | F_["msa"] = F["msa"][idx] 609 | F_["deletion_matrix_int"] = F["deletion_matrix_int"][idx] 610 | F_["num_alignments"] = np.full_like(F["num_alignments"],N_) 611 | for k in ['aatype', 'between_segment_residues', 612 | 'domain_name', 'residue_index', 613 | 'seq_length', 'sequence']: 614 | F_[k] = F[k] 615 | return F_ 616 | else: 617 | return F 618 | 619 | ################################ 620 | # set chain breaks 621 | ################################ 622 | Ls = [] 623 | for seq,h in zip(ori_sequence.split(":"),homooligomers): 624 | Ls += [len(s) for s in seq.split("/")] * h 625 | Ls_plot = sum([[len(seq)]*h for seq,h in zip(seqs,homooligomers)],[]) 626 | feature_dict['residue_index'] = cf.chain_break(feature_dict['residue_index'], Ls) 627 | 628 | ########################### 629 | # run alphafold 630 | ########################### 631 | def parse_results(prediction_result, processed_feature_dict): 632 | b_factors = prediction_result['plddt'][:,None] * prediction_result['structure_module']['final_atom_mask'] 633 | dist_bins = jax.numpy.append(0,prediction_result["distogram"]["bin_edges"]) 634 | dist_mtx = dist_bins[prediction_result["distogram"]["logits"].argmax(-1)] 635 | contact_mtx = jax.nn.softmax(prediction_result["distogram"]["logits"])[:,:,dist_bins < 8].sum(-1) 636 | 637 | out = {"unrelaxed_protein": protein.from_prediction(processed_feature_dict, prediction_result, b_factors=b_factors), 638 | "plddt": prediction_result['plddt'], 639 | "pLDDT": prediction_result['plddt'].mean(), 640 | "dists": dist_mtx, 641 | "adj": contact_mtx} 642 | 643 | if "ptm" in prediction_result: 644 | out.update({"pae": prediction_result['predicted_aligned_error'], 645 | "pTMscore": prediction_result['ptm']}) 646 | return out 647 | 648 | model_names = ['model_1', 'model_2', 'model_3', 'model_4', 'model_5'][:num_models] 649 | total = len(model_names) * num_samples 650 | with tqdm.notebook.tqdm(total=total, bar_format=TQDM_BAR_FORMAT) as pbar: 651 | ####################################################################### 652 | # precompile model and recompile only if length changes 653 | ####################################################################### 654 | if use_turbo: 655 | name = "model_5_ptm" if use_ptm else "model_5" 656 | N = len(feature_dict["msa"]) 657 | L = len(feature_dict["residue_index"]) 658 | compiled = (N, L, use_ptm, max_recycles, tol, num_ensemble, max_msa, is_training) 659 | if "COMPILED" in dir(): 660 | if COMPILED != compiled: recompile = True 661 | else: recompile = True 662 | if recompile: 663 | cf.clear_mem("gpu") 664 | cfg = config.model_config(name) 665 | 666 | # set size of msa (to reduce memory requirements) 667 | msa_clusters = min(N, max_msa_clusters) 668 | cfg.data.eval.max_msa_clusters = msa_clusters 669 | cfg.data.common.max_extra_msa = max(min(N-msa_clusters,max_extra_msa),1) 670 | 671 | cfg.data.common.num_recycle = max_recycles 672 | cfg.model.num_recycle = max_recycles 673 | cfg.model.recycle_tol = tol 674 | cfg.data.eval.num_ensemble = num_ensemble 675 | 676 | params = data.get_model_haiku_params(name,'./alphafold/data') 677 | model_runner = model.RunModel(cfg, params, is_training=is_training) 678 | COMPILED = compiled 679 | recompile = False 680 | 681 | else: 682 | cf.clear_mem("gpu") 683 | recompile = True 684 | 685 | # cleanup 686 | if "outs" in dir(): del outs 687 | outs = {} 688 | cf.clear_mem("cpu") 
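To make the chain-break bookkeeping above concrete, here is a small worked example; it is a standalone sketch, and the toy sequence and homooligomer counts are invented. `ori_sequence` uses `:` to separate chains and `/` to mark intra-chain breaks, and `Ls` collects the segment lengths that `cf.chain_break` turns into `residue_index` jumps, so concatenated segments are treated as separate chains:

```python
# Standalone worked example; the toy sequence and homooligomer counts are invented.
ori_sequence = "AAAA/BBB:CC"   # ":" separates chains, "/" marks an intra-chain break
homooligomers = [2, 1]         # two copies of the first chain, one of the second

seqs = [s.replace("/", "") for s in ori_sequence.split(":")]  # ["AAAABBB", "CC"]

# same construction as in the runner above
Ls = []
for seq, h in zip(ori_sequence.split(":"), homooligomers):
    Ls += [len(s) for s in seq.split("/")] * h

print(Ls)  # [4, 3, 4, 3, 2] -> residue_index gets a jump at each segment boundary
```

Because AlphaFold only sees the offset `residue_index`, this is what lets a single concatenated input behave like a multi-chain complex.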
689 | 
690 |   #######################################################################
691 |   def report(key):
692 |     pbar.update(n=1)
693 |     o = outs[key]
694 |     line = f"{key} recycles:{o['recycles']} tol:{o['tol']:.2f} pLDDT:{o['pLDDT']:.2f}"
695 |     if use_ptm: line += f" pTMscore:{o['pTMscore']:.2f}"
696 |     print(line)
697 |     if show_images:
698 |       fig = cf.plot_protein(o['unrelaxed_protein'], Ls=Ls_plot, dpi=100)
699 |       # plt.show()
700 |       plt.ion()
701 |     if save_tmp_pdb:
702 |       tmp_pdb_path = os.path.join(output_dir,f'unranked_{key}_unrelaxed.pdb')
703 |       pdb_lines = protein.to_pdb(o['unrelaxed_protein'])
704 |       with open(tmp_pdb_path, 'w') as f: f.write(pdb_lines)
705 | 
706 |   if use_turbo:
707 |     # go through each random_seed
708 |     for seed in range(num_samples):
709 | 
710 |       # prep input features (subsampling the MSA when requested)
711 |       if subsample_msa:
712 |         feats = do_subsample_msa(feature_dict, random_seed=seed)
713 |       else:
714 |         feats = feature_dict
715 |       processed_feature_dict = model_runner.process_features(feats, random_seed=seed)
716 | 
717 |       # go through each model
718 |       for num, model_name in enumerate(model_names):
719 |         name = model_name+"_ptm" if use_ptm else model_name
720 |         key = f"{name}_seed_{seed}"
721 |         pbar.set_description(f'Running {key}')
722 | 
723 |         # replace model parameters
724 |         params = data.get_model_haiku_params(name, './alphafold/data')
725 |         for k in model_runner.params.keys():
726 |           model_runner.params[k] = params[k]
727 | 
728 |         # predict
729 |         prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed),"cpu")
730 | 
731 |         # save results
732 |         outs[key] = parse_results(prediction_result, processed_feature_dict)
733 |         outs[key].update({"recycles":r, "tol":t})
734 |         report(key)
735 | 
736 |         del prediction_result, params
737 |       del feats, processed_feature_dict  # del only unbinds the names, so this is safe whether or not subsampling ran
738 | 
739 |   else:
740 |     # go through each model
741 |     for num, model_name in enumerate(model_names):
742 |       name = model_name+"_ptm" if use_ptm else model_name
743 |       params = data.get_model_haiku_params(name, './alphafold/data')
744 |       cfg = config.model_config(name)
745 |       cfg.data.common.num_recycle = cfg.model.num_recycle = max_recycles
746 |       cfg.model.recycle_tol = tol
747 |       cfg.data.eval.num_ensemble = num_ensemble
748 |       model_runner = model.RunModel(cfg, params, is_training=is_training)
749 | 
750 |       # go through each random_seed
751 |       for seed in range(num_samples):
752 |         key = f"{name}_seed_{seed}"
753 |         pbar.set_description(f'Running {key}')
754 |         processed_feature_dict = model_runner.process_features(feature_dict, random_seed=seed)
755 |         prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed),"cpu")
756 |         outs[key] = parse_results(prediction_result, processed_feature_dict)
757 |         outs[key].update({"recycles":r, "tol":t})
758 |         report(key)
759 | 
760 |         # cleanup
761 |         del processed_feature_dict, prediction_result
762 | 
763 |       del params, model_runner, cfg
764 |       cf.clear_mem("gpu")
765 | 
766 | # delete old ranked/unranked outputs before writing the final set
767 | for f in os.listdir(output_dir):
768 |   if "rank" in f:
769 |     os.remove(os.path.join(output_dir, f))
770 | 
771 | # Rank models by the chosen metric (rank_by: pLDDT or pTMscore). 
772 | model_rank = list(outs.keys()) 773 | model_rank = [model_rank[i] for i in np.argsort([outs[x][rank_by] for x in model_rank])[::-1]] 774 | 775 | # Write out the prediction 776 | for n,key in enumerate(model_rank): 777 | prefix = f"rank_{n+1}_{key}" 778 | pred_output_path = os.path.join(output_dir,f'{prefix}_unrelaxed.pdb') 779 | fig = cf.plot_protein(outs[key]["unrelaxed_protein"], Ls=Ls_plot, dpi=200) 780 | plt.savefig(os.path.join(output_dir,f'{prefix}.png'), bbox_inches = 'tight') 781 | plt.close(fig) 782 | 783 | pdb_lines = protein.to_pdb(outs[key]["unrelaxed_protein"]) 784 | with open(pred_output_path, 'w') as f: 785 | f.write(pdb_lines) 786 | 787 | ############################################################ 788 | print(f"model rank based on {rank_by}") 789 | for n,key in enumerate(model_rank): 790 | print(f"rank_{n+1}_{key} {rank_by}:{outs[key][rank_by]:.2f}") 791 | #%% 792 | #@title Refine structures with Amber-Relax (Optional) 793 | 794 | # --------set parameters from command-line arguments-------- 795 | num_relax = args.num_relax 796 | # --------set parameters from command-line arguments-------- 797 | 798 | if num_relax == "None": 799 | num_relax = 0 800 | elif num_relax == "Top1": 801 | num_relax = 1 802 | elif num_relax == "Top5": 803 | num_relax = 5 804 | else: 805 | num_relax = len(model_names) * num_samples 806 | 807 | if num_relax > 0: 808 | if "relax" not in dir(): 809 | # add conda environment to path 810 | sys.path.append('./colabfold-conda/lib/python3.7/site-packages') 811 | 812 | # import libraries 813 | from alphafold.relax import relax 814 | from alphafold.relax import utils 815 | 816 | with tqdm.notebook.tqdm(total=num_relax, bar_format=TQDM_BAR_FORMAT) as pbar: 817 | pbar.set_description(f'AMBER relaxation') 818 | for n,key in enumerate(model_rank): 819 | if n < num_relax: 820 | prefix = f"rank_{n+1}_{key}" 821 | pred_output_path = os.path.join(output_dir,f'{prefix}_relaxed.pdb') 822 | if not os.path.isfile(pred_output_path): 823 | amber_relaxer = relax.AmberRelaxation( 824 | max_iterations=0, 825 | tolerance=2.39, 826 | stiffness=10.0, 827 | exclude_residues=[], 828 | max_outer_iterations=20) 829 | relaxed_pdb_lines, _, _ = amber_relaxer.process(prot=outs[key]["unrelaxed_protein"]) 830 | with open(pred_output_path, 'w') as f: 831 | f.write(relaxed_pdb_lines) 832 | pbar.update(n=1) 833 | #%% 834 | #@title Display 3D structure {run: "auto"} 835 | rank_num = 1 #@param ["1", "2", "3", "4", "5"] {type:"raw"} 836 | color = "lDDT" #@param ["chain", "lDDT", "rainbow"] 837 | show_sidechains = False #@param {type:"boolean"} 838 | show_mainchains = False #@param {type:"boolean"} 839 | 840 | key = model_rank[rank_num-1] 841 | prefix = f"rank_{rank_num}_{key}" 842 | pred_output_path = os.path.join(output_dir,f'{prefix}_relaxed.pdb') 843 | if not os.path.isfile(pred_output_path): 844 | pred_output_path = os.path.join(output_dir,f'{prefix}_unrelaxed.pdb') 845 | 846 | cf.show_pdb(pred_output_path, show_sidechains, show_mainchains, color, Ls=Ls_plot).show() 847 | if color == "lDDT": cf.plot_plddt_legend().show() 848 | if use_ptm: 849 | cf.plot_confidence(outs[key]["plddt"], outs[key]["pae"], Ls=Ls_plot).show() 850 | else: 851 | cf.plot_confidence(outs[key]["plddt"], Ls=Ls_plot).show() 852 | #%% 853 | #@title Extra outputs 854 | dpi = 300#@param {type:"integer"} 855 | save_to_txt = True #@param {type:"boolean"} 856 | save_pae_json = True #@param {type:"boolean"} 857 | #@markdown - save data used to generate contact and distogram plots below to text file (pae values can be 
found in json file if `use_ptm` is enabled) 858 | 859 | if use_ptm: 860 | print("predicted alignment error") 861 | cf.plot_paes([outs[k]["pae"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 862 | plt.savefig(os.path.join(output_dir,f'predicted_alignment_error.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 863 | # plt.show() 864 | 865 | print("predicted contacts") 866 | cf.plot_adjs([outs[k]["adj"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 867 | plt.savefig(os.path.join(output_dir,f'predicted_contacts.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 868 | # plt.show() 869 | 870 | print("predicted distogram") 871 | cf.plot_dists([outs[k]["dists"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 872 | plt.savefig(os.path.join(output_dir,f'predicted_distogram.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 873 | # plt.show() 874 | 875 | print("predicted LDDT") 876 | cf.plot_plddts([outs[k]["plddt"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 877 | plt.savefig(os.path.join(output_dir,f'predicted_LDDT.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 878 | # plt.show() 879 | 880 | def do_save_to_txt(filename, adj, dists): 881 | adj = np.asarray(adj) 882 | dists = np.asarray(dists) 883 | L = len(adj) 884 | with open(filename,"w") as out: 885 | out.write("i\tj\taa_i\taa_j\tp(cbcb<8)\tmaxdistbin\n") 886 | for i in range(L): 887 | for j in range(i+1,L): 888 | if dists[i][j] < 21.68 or adj[i][j] >= 0.001: 889 | line = f"{i+1}\t{j+1}\t{full_sequence[i]}\t{full_sequence[j]}\t{adj[i][j]:.3f}" 890 | line += f"\t>{dists[i][j]:.2f}" if dists[i][j] == 21.6875 else f"\t{dists[i][j]:.2f}" 891 | out.write(f"{line}\n") 892 | 893 | for n,key in enumerate(model_rank): 894 | if save_to_txt: 895 | txt_filename = os.path.join(output_dir,f'rank_{n+1}_{key}.raw.txt') 896 | do_save_to_txt(txt_filename,adj=outs[key]["adj"],dists=outs[key]["dists"]) 897 | 898 | if use_ptm and save_pae_json: 899 | pae = outs[key]["pae"] 900 | max_pae = pae.max() 901 | # Save pLDDT and predicted aligned error (if it exists) 902 | pae_output_path = os.path.join(output_dir,f'rank_{n+1}_{key}_pae.json') 903 | # Save predicted aligned error in the same format as the AF EMBL DB 904 | rounded_errors = np.round(np.asarray(pae), decimals=1) 905 | indices = np.indices((len(rounded_errors), len(rounded_errors))) + 1 906 | indices_1 = indices[0].flatten().tolist() 907 | indices_2 = indices[1].flatten().tolist() 908 | pae_data = json.dumps([{ 909 | 'residue1': indices_1, 910 | 'residue2': indices_2, 911 | 'distance': rounded_errors.flatten().tolist(), 912 | 'max_predicted_aligned_error': max_pae.item() 913 | }], 914 | indent=None, 915 | separators=(',', ':')) 916 | with open(pae_output_path, 'w') as f: 917 | f.write(pae_data) 918 | #%% 919 | --------------------------------------------------------------------------------
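A closing note on the PAE output above: each `rank_{n}_{key}_pae.json` follows the AlphaFold-EBI database layout, a one-element list whose `distance` field is the row-major flattened PAE matrix with 1-based `residue1`/`residue2` indices. Below is a minimal sketch for reading such a file back; the file name is a hypothetical instance of the pattern, and everything else mirrors the keys the code writes:

```python
import json
import numpy as np

# hypothetical file name following the rank_{n}_{key}_pae.json pattern above
with open("prediction/rank_1_model_1_ptm_seed_0_pae.json") as fh:
    entry = json.load(fh)[0]  # one-element list, AF-EBI database style

n = int(round(len(entry["distance"]) ** 0.5))  # the square matrix was flattened row-major
pae = np.asarray(entry["distance"], dtype=float).reshape(n, n)
print(pae.shape, entry["max_predicted_aligned_error"])
```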