├── .github
│   └── ISSUE_TEMPLATE
│       └── question.md
├── LICENSE
├── README.md
├── README_ja.md
├── beta_install_colabbatch_linux.sh
├── beta_update_linux.sh
├── install_colabbatch_M1mac.sh
├── install_colabbatch_intelmac.sh
├── install_colabbatch_linux.sh
├── update_M1mac.sh
├── update_intelmac.sh
├── update_linux.sh
└── v1.0.0
    ├── README.md
    ├── README_ja.md
    ├── colabfold_alphafold.patch
    ├── gpurelaxation.patch
    ├── install_colabfold_M1mac.sh
    ├── install_colabfold_intelmac.sh
    ├── install_colabfold_linux.sh
    ├── residue_constants.patch
    ├── runner.py
    ├── runner_af2advanced.py
    └── runner_af2advanced_old.py
/.github/ISSUE_TEMPLATE/question.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Question 3 | about: Question template 4 | title: 'Question:' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Caution: Please only report issues related to the installation on your local PC or macOS.** If you can get the help message by `colabfold_batch --help` or run a test prediction successfully, your installation is successful. Requests or questions regarding ColabFold features should be directed to [ColabFold repo's issues](https://github.com/sokrypton/ColabFold/issues). 11 | 12 | ---- 13 | 14 | **What is your installation issue?** 15 | 16 | Describe your question here. 17 | 18 | **Computational environment** 19 | 20 | - OS: [e.g. Ubuntu 22.04, Windows10 & WSL2, macOS...] 21 | - CUDA version if Linux (Show the output of `/usr/local/cuda/bin/nvcc --version`.) 22 | 23 | **To Reproduce** 24 | 25 | Steps to reproduce the behavior: 26 | 1. Go to '...' 27 | 2. Click on '....' 28 | 3. Scroll down to '....' 29 | 4. See error 30 | 31 | **Expected behavior** 32 | 33 | A clear and concise description of what you expected to happen. 34 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Yoshitaka Moriwaki 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # LocalColabFold 2 | 3 | [ColabFold](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) on your local PC (or macOS). See also [ColabFold repository](https://github.com/sokrypton/ColabFold). 
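If you just want to see the shape of a typical run, the whole Linux workflow condenses to the sketch below (a hedged sketch: `input.fasta` and the install path are placeholders, and the details, flags, and caveats follow in [Installation](#installation)):

```bash
# Hedged quick-start sketch for Linux; see Installation below for details.
wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_linux.sh
bash install_colabbatch_linux.sh
export PATH="/path/to/your/localcolabfold/colabfold-conda/bin:$PATH"
colabfold_batch input.fasta outputdir/
```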
4 | 5 | ## What is LocalColabFold? 6 | 7 | LocalColabFold is an installer script designed to make ColabFold functionality available on users' local machines. It supports a wide range of operating systems: Windows 10 or later (via Windows Subsystem for Linux 2), macOS, and Linux. 8 | 9 | **If you only intend to predict a small number of naturally occurring proteins, I recommend using the [ColabFold notebook](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) or downloading structures from the [AlphaFold Protein Structure Database](https://alphafold.ebi.ac.uk/) or [UniProt](https://www.uniprot.org/). LocalColabFold is suitable for more advanced applications, such as batch processing of structure predictions for natural complexes, non-natural proteins, or predictions with manually specified MSAs/templates.** 10 | 11 | ## Advantages of LocalColabFold 12 | 13 | - **Structure inference and relaxation are accelerated if your PC has an Nvidia GPU and CUDA drivers.** 14 | - **No timeouts (90 minutes and 12 hours on Google Colab)** 15 | - **No GPU usage limitations** 16 | - **No need to prepare the large databases required by native AlphaFold2.** 17 | 18 | ## Note (May 21, 2024) 19 | 20 | - Since the current GPU-supported jax (> 0.4.26) requires CUDA 12.1 or later and cudnn 9, please upgrade or install your CUDA driver and cudnn. CUDA 12.4 is recommended. 21 | 22 | ## Note (Jan 30, 2024) 23 | 24 | - ColabFold has been upgraded to 1.5.5 (compatible with AlphaFold 2.3.2). LocalColabFold now requires **CUDA 12.1 or later**. Please update your CUDA driver if you have not done so. 25 | - (Local)ColabFold can now predict protein structures without connecting to the Internet. Use the [`setup_databases.sh`](https://github.com/sokrypton/ColabFold/blob/main/setup_databases.sh) script to download and build the databases (see also [ColabFold Downloads](https://colabfold.mmseqs.com/)). Instructions for running `colabfold_search` to obtain the MSA and templates locally are given in [this comment](https://github.com/sokrypton/ColabFold/issues/563). 26 | 27 | ## New Updates 28 | 29 | - 30Jan2024, ColabFold 1.5.5 (compatible with AlphaFold 2.3.2). LocalColabFold now requires **CUDA 12.1 or later**. Please update your CUDA driver. 30 | - 30Apr2023, Updated to use python 3.10 for compatibility with Google Colaboratory. 31 | - 09Mar2023, version 1.5.1 released. The base directory has been changed to `localcolabfold` from `colabfold_batch` to distinguish it from the execution command. 32 | - 09Mar2023, version 1.5.0 released. See [Release v1.5.0](https://github.com/YoshitakaMo/localcolabfold/releases/tag/v1.5.0) 33 | - 05Feb2023, version 1.5.0-pre released. 34 | - 16Jun2022, version 1.4.0 released. See [Release v1.4.0](https://github.com/YoshitakaMo/localcolabfold/releases/tag/v1.4.0) 35 | - 07May2022, **Updated `update_linux.sh`.** See also [How to update](#how-to-update). Please use the new option `--use-gpu-relax` if GPU relaxation is required (recommended). 36 | - 12Apr2022, version 1.3.0 released. See [Release v1.3.0](https://github.com/YoshitakaMo/localcolabfold/releases/tag/v1.3.0) 37 | - 09Dec2021, version 1.2.0-beta released. Easy-to-use updater scripts added. See [How to update](#how-to-update). 38 | - 04Dec2021, LocalColabFold is now compatible with the latest [pip installable ColabFold](https://github.com/sokrypton/ColabFold#running-locally). In this repository, I provide a script to install ColabFold with some external parameter files to perform relaxation with AMBER. 
The weight parameters of AlphaFold and AlphaFold-Multimer will be downloaded automatically at your first run. 39 | 40 | ## Installation 41 | 42 | ### For Linux 43 | 44 | 1. Make sure the `curl`, `git`, and `wget` commands are already installed on your PC. If not, install them first. For Ubuntu, type `sudo apt -y install curl git wget`. 45 | 2. Make sure your Cuda compiler driver is **11.8 or later** (the latest version 12.4 is preferable). If you don't have a GPU or don't plan to use one, you can skip this step:<pre>
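# Example output; any release of 11.8 or newer is fine (12.x recommended):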
$ nvcc --version 46 | nvcc: NVIDIA (R) Cuda compiler driver 47 | Copyright (c) 2005-2022 NVIDIA Corporation 48 | Built on Wed_Sep_21_10:33:58_PDT_2022 49 | Cuda compilation tools, release 11.8, V11.8.89 50 | Build cuda_11.8.r11.8/compiler.31833905_0 51 |</pre>DO NOT use `nvidia-smi` to check the version.<br>3. Download `install_colabbatch_linux.sh` from this repository:<pre>
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_linux.shand run it in the directory where you want to install:
$ bash install_colabbatch_linux.sh</pre>About 5 minutes later, the `localcolabfold` directory will be created. Do not move this directory after the installation. 55 | 56 | Keep your network connection open during the installation, and **check the log** output for errors. 57 | 58 | If you find errors in the log, the easiest fix is to check your network connection, delete the `localcolabfold` directory, and re-run the installation script. 59 | 60 | 5. Add environment variable PATH:<pre>
# For bash or zsh
# e.g. export PATH="/home/moriwaki/Desktop/localcolabfold/colabfold-conda/bin:$PATH"
export PATH="/path/to/your/localcolabfold/colabfold-conda/bin:$PATH"</pre>61 | It is recommended to add this export command to `~/.bashrc` and restart bash (`~/.bashrc` will be executed every time bash is started). 62 | 63 | 6. To run the prediction, type<pre>colabfold_batch input outputdir/</pre>The result files will be created in the `outputdir`. This command will execute the prediction without templates and relaxation (energy minimization). If you want to use templates and relaxation, add the `--templates` and `--amber` flags, respectively. For example, 64 | 65 |<pre>
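# Hedged example: enable PDB templates and AMBER relaxation in a single run.
# Related switches (see Flags below): --use-gpu-relax relaxes on a GPU,
# --num-relax limits how many top-ranked models are relaxed, and
# --model-type alphafold2_multimer_v3 forces a multimer model (default: auto).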
colabfold_batch --templates --amber input outputdir/</pre>66 | 67 | `colabfold_batch` will automatically detect whether the input is a monomer or a complex. In most cases, users don't have to add `--model-type alphafold2_multimer_v3` to turn on multimer prediction; `alphafold2_multimer_v1` and `alphafold2_multimer_v2` are also available. The default is `auto` (`alphafold2_ptm` for monomers and `alphafold2_multimer_v3` for complexes). 68 | 69 | If `--amber` relaxation fails, setting `export LD_LIBRARY_PATH="/path/to/your/localcolabfold/colabfold-conda/lib:${LD_LIBRARY_PATH}"` before running `colabfold_batch` may solve the issue. 70 | For more details, see [Flags](#flags) and `colabfold_batch --help`. 71 | 72 | ### For WSL2 (in Windows) 73 | 74 | **Caution: If your installation fails due to symbolic link (`symlink`) creation issues, this is because the Windows file system is case-insensitive (while the Linux file system is case-sensitive).** To resolve this, run the following command on Windows PowerShell: 75 | ``` 76 | fsutil file SetCaseSensitiveInfo path\to\localcolabfold\installation enable 77 | ``` 78 | 79 | Replace `path\to\localcolabfold\installation` with the path to the directory where you are installing LocalColabFold. Also, make sure that you run the command on Windows PowerShell (not WSL). For more details, see [Adjust Case Sensitivity (Microsoft)](https://learn.microsoft.com/en-us/windows/wsl/case-sensitivity). 80 | 81 | Before running the prediction: 82 | 83 | ``` 84 | export TF_FORCE_UNIFIED_MEMORY="1" 85 | export XLA_PYTHON_CLIENT_MEM_FRACTION="4.0" 86 | export XLA_PYTHON_CLIENT_ALLOCATOR="platform" 87 | export TF_FORCE_GPU_ALLOW_GROWTH="true" 88 | ``` 89 | 90 | It is recommended to add these export commands to `~/.bashrc` and restart bash (`~/.bashrc` will be executed every time bash is started). 91 | 92 | ### For macOS 93 | 94 | **Caution: Due to the lack of an Nvidia GPU/CUDA driver, structure prediction on macOS is 5-10 times slower than on Linux+GPU**. For the test sequence (58 a.a.), it may take 30 minutes. However, it may be useful to play with it before preparing a Linux+GPU environment. 95 | 96 | You can check whether your Mac is Intel or Apple Silicon by typing `uname -m` in Terminal. 97 | 98 | ```bash 99 | $ uname -m 100 | x86_64 # Intel 101 | arm64 # Apple Silicon 102 | ``` 103 | 104 | Please use the correct installer for your Mac. 105 | 106 | #### For Mac with Intel CPU 107 | 108 | 1. Install [Homebrew](https://brew.sh/index_ja) if not present:<pre>
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"109 | 2. Install `wget`, `gnu-sed`, [HH-suite](https://github.com/soedinglab/hh-suite) and [kalign](https://github.com/TimoLassmann/kalign) using Homebrew:
$ brew install wget gnu-sed
$ brew install brewsci/bio/hh-suite brewsci/bio/kalign</pre>110 | 3. Download `install_colabbatch_intelmac.sh` from this repository:<pre>$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_intelmac.sh</pre>and run it in the directory where you want to install:<pre>
$ bash install_colabbatch_intelmac.sh</pre>About 5 minutes later, the `colabfold_batch` directory will be created. Do not move this directory after the installation. 111 | 4. The rest of the procedure is the same as "For Linux". 112 | 113 | #### For Mac with Apple Silicon (M1 chip) 114 | 115 | **Note: This installer is experimental because most of the dependent packages are not fully tested on Apple Silicon Mac.** 116 | 117 | 1. Install [Homebrew](https://brew.sh/index_ja) if not present:<pre>
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"118 | 2. Install several commands using Homebrew (Now kalign 3.3.2 is available!):
$ brew install wget cmake gnu-sed
$ brew install brewsci/bio/hh-suite
$ brew install brewsci/bio/kalign</pre>119 | 3. Install `miniforge` command using Homebrew:<pre>$ brew install --cask miniforge</pre>120 | 4. Download `install_colabbatch_M1mac.sh` from this repository:<pre>
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_M1mac.shand run it in the directory where you want to install:
$ bash install_colabbatch_M1mac.sh</pre>About 5 minutes later, the `colabfold_batch` directory will be created. Do not move this directory after the installation. **You can ignore the installation errors that appear along the way**. 121 | 5. The rest of the procedure is the same as "For Linux". 122 | 123 | ### Input Examples 124 | 125 | ColabFold can accept multiple file formats or a directory. 126 | 127 | ``` 128 | positional arguments: 129 | input Can be one of the following: Directory with fasta/a3m 130 | files, a csv/tsv file, a fasta file or an a3m file 131 | results Directory to write the results to 132 | ``` 133 | 134 | #### fasta format 135 | 136 | It is recommended that the header line starting with `>` be short, since the description becomes the prefix of the output files. It is acceptable to insert line breaks in the amino acid sequence. 137 | 138 | ```:P61823.fasta 139 | >sp|P61823 140 | MALKSLVLLSLLVLVLLLVRVQPSLGKETAAAKFERQHMDSSTSAASSSNYCNQMMKSRN 141 | LTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPN 142 | CAYKTTQANKHIIVACEGNPYVPVHFDASV 143 | ``` 144 | 145 | **For prediction of multimers, insert `:` between the protein sequences.** 146 | 147 | ``` 148 | >1BJP_homohexamer 149 | PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR: 150 | PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR: 151 | PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR: 152 | PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR: 153 | PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR: 154 | PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR 155 | ``` 156 | 157 | ``` 158 | >3KUD_RasRaf_complex 159 | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQ 160 | YMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIP 161 | YIETSAKTRQGVEDAFYTLVREIRQH: 162 | PSKTSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAAS 163 | LIGEELQVDFL 164 | ``` 165 | 166 | Multiple `>` header lines with sequences in a FASTA format file yield multiple predictions at once in the specified output directory. 167 | 168 | #### csv format 169 | 170 | In a csv file, `id` and `sequence` should be separated by `,`. 171 | 172 | ```:test.csv 173 | id,sequence 174 | 5AWL_1,YYDPETGTWY 175 | 3G5O_A_3G5O_B,MRILPISTIKGKLNEFVDAVSSTQDQITITKNGAPAAVLVGADEWESLQETLYWLAQPGIRESIAEADADIASGRTYGEDEIRAEFGVPRRPH:MPYTVRFTTTARRDLHKLPPRILAAVVEFAFGDLSREPLRVGKPLRRELAGTFSARRGTYRLLYRIDDEHTTVVILRVDHRADIYRR 176 | ``` 177 | 178 | #### a3m format 179 | 180 | You can input your own a3m-format MSA file. For multimer predictions, the a3m file should be compatible with the ColabFold format. 181 | 182 | ### Flags 183 | 184 | These flags are useful for the predictions. 185 | 186 | - **`--amber`** : Use AMBER for structure refinement (relaxation / energy minimization). To control how many of the top-ranked structures are relaxed, set `--num-relax`. 187 | - **`--templates`** : Use templates from PDB. 188 | - **`--use-gpu-relax`** : Run AMBER on an Nvidia GPU instead of the CPU. This feature is only available on machines with Nvidia GPUs. 189 | - **`--num-recycle
$ nvcc --version 19 | nvcc: NVIDIA (R) Cuda compiler driver 20 | Copyright (c) 2005-2022 NVIDIA Corporation 21 | Built on Wed_Sep_21_10:33:58_PDT_2022 22 | Cuda compilation tools, release 11.8, V11.8.89 23 | Build cuda_11.8.r11.8/compiler.31833905_0 24 |バージョンチェックの時に`nvidia-smi`コマンドを使わないでください。こちらでは不正確です。
$ gcc --version 26 | gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 27 | Copyright (C) 2019 Free Software Foundation, Inc. 28 | This is free software; see the source for copying conditions. There is NO 29 | warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 30 |もしバージョンが4.8.5以前の場合は(CentOS 7だとよくありがち)、新しいGCCをインストールしてそれにPATHを通してください。スパコンの場合はEnvironment moduleの`module avail`の中にあるかもしれません。 31 | 1. このリポジトリにある`install_colabbatch_linux.sh`をダウンロードします。
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_linux.shこれをインストールしたいディレクトリの上に置いた後、以下のコマンドを入力します:
$ bash install_colabbatch_linux.shおよそ5分後に`colabfold_batch`ディレクトリができます。インストール後はこのディレクトリを移動させないでください。 32 | 2. `cd colabfold_batch`を入力してこのディレクトリに入ります。 33 | 3. 環境変数`PATH`を追加します。
# For bash or zsh
# e.g. export PATH="/home/moriwaki/Desktop/colabfold_batch/bin:$PATH"
export PATH="/path/to/your/colabfold_batch/bin:$PATH"</pre>この1行を`~/.bashrc`または`~/.zshrc`に追記しておくと便利です。 34 | 4. 以下のコマンドでColabFoldを実行します。<pre>
colabfold_batch --amber --templates --num-recycle 3 inputfile outputdir/</pre>結果のファイルは`outputdir`に生成されます。詳細な使い方は`colabfold_batch --help`コマンドで確認してください。 35 | 36 | ### macOSの場合 37 | 38 | **注意: macOSではNvidia GPUとCUDAドライバがないため、構造推論部分がLinux+GPU環境に比べて5〜10倍ほど遅くなります**。テスト用のアミノ酸配列(58アミノ酸)ではおよそ30分ほど計算に時間がかかります。ただ、Linux+GPU環境を準備する前にこれで遊んでみるのはありかもしれません。 39 | 40 | また、自身の持っているMacがIntel CPUのものか、M1 chip入りのもの(Apple Silicon)かを先に確認してください。ターミナルで`uname -m`の結果でどちらかが判明します。 41 | 42 | ```bash 43 | $ uname -m 44 | x86_64 # Intel 45 | arm64 # Apple Silicon 46 | ``` 47 | 48 | (Apple SiliconでRosetta2を使っている場合はApple Siliconでもx86_64って表示されますけれど……今のところこれには対応していません。) 49 | 50 | 以上の結果を踏まえて適切なインストーラーを選択してください。 51 | 52 | #### Intel CPUのMacの場合 53 | 54 | 1. [Homebrew](https://qiita.com/zaburo/items/29fe23c1ceb6056109fd)をインストールします:<pre>
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"55 | 2. Homebrewで`wget`, `gnu-sed`, [HH-suite](https://github.com/soedinglab/hh-suite)と[kalign](https://github.com/TimoLassmann/kalign)をインストールします
$ brew install wget gnu-sed
$ brew install brewsci/bio/hh-suite brewsci/bio/kalign</pre>56 | 3. `install_colabbatch_intelmac.sh`をこのリポジトリからダウンロードします:<pre>$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_intelmac.sh</pre>これをインストールしたいディレクトリの上に置いた後、以下のコマンドを入力します:<pre>
$ bash install_colabbatch_intelmac.shおよそ5分後に`colabfold_batch`ディレクトリができます。インストール後はこのディレクトリを移動させないでください。 57 | 4. 残りの手順は"Linux+GPUの場合"と同様です. 58 | 59 | #### Apple Silicon (M1 chip)のMacの場合 60 | 61 | **Note: 依存するPythonパッケージのほとんどがまだApple Silicon Macで十分にテストされていないため、このインストーラーによる動作は試験的なものです。** 62 | 63 | 1. [Homebrew](https://qiita.com/zaburo/items/29fe23c1ceb6056109fd)をインストールします:
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"64 | 1. いくつかのコマンドをHomebrewでインストールします。(現在kalignはM1 macでインストールすることはできないみたいですが、問題ありません):
$ brew install wget cmake gnu-sed
$ brew install brewsci/bio/hh-suite</pre>65 | 2. `miniforge`をHomebrewでインストールします:<pre>$ brew install --cask miniforge</pre>66 | 3. インストーラー`install_colabbatch_M1mac.sh`をこのリポジトリからダウンロードします:<pre>
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_M1mac.shこれをインストールしたいディレクトリの上に置いた後、以下のコマンドを入力します:
$ bash install_colabbatch_M1mac.shおよそ5分後に`colabfold_batch`ディレクトリができます。途中色々WarningsやErrorが出るかもしれませんが大丈夫です。インストール後はこのディレクトリを移動させないでください。 67 | 4. 残りの手順は"Linux+GPUの場合"と同様です. 68 | 69 | ## アップデートのやり方 70 | 71 | [ColabFold](https://github.com/sokrypton/ColabFold)はいまだ開発途中であるため、最新の機能を利用するためにはこのlocalcolabfoldも頻繁にアップデートする必要があります。そこでお手軽にアップデートするためのスクリプトを用意しました。 72 | 73 | アップデートは`localcolabfold`ディレクトリで以下のように入力するだけです。 74 | 75 | ```bash 76 | $ ./update_linux.sh . # if Linux 77 | $ ./update_intelmac.sh . # if Intel Mac 78 | $ ./update_M1mac.sh . # if M1 Mac 79 | ``` 80 | 81 | また、もしすでに1.2.0-beta以前からlocalcolabfoldをインストールしていた場合は、まずこれらのアップデートスクリプトをダウンロードしてきてから実行してください。例として以下のような感じです。 82 | 83 | ```bash 84 | # set your OS. Select one of the following variables {linux,intelmac,M1mac} 85 | $ OS=linux # if Linux 86 | # navigate to the directory where you installed localcolabfold, e.g. 87 | $ cd /home/moriwaki/Desktop/localcolabfold/ 88 | $ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/update_${OS}.sh 89 | $ chmod +x update_${OS}.sh 90 | $ ./update_${OS}.sh /path/to/your/localcolabfold 91 | ``` 92 | 93 | ## LocalColabFoldを利用する利点 94 | 95 | - **お使いのパソコンにNvidia GPUとCUDAドライバがあれば、AlphaFold2による構造推論(Structure inference)と構造最適化(relax)が高速になります。** 96 | - **Google Colabは90分アイドルにしていたり、12時間以上の利用でタイムアウトしますが、その制限がありません。また、GPUの使用についても当然制限がありません。** 97 | - **データベースをダウンロードしてくる必要がないです**。 98 | 99 | ## FAQ 100 | - インストールの事前準備は? 101 | - `curl`, `wget`コマンド以外は不要です 102 | - BFD, Mgnify, PDB70, Uniclust30などの巨大なデータベースを用意する必要はありますか? 103 | - **必要ないです**。 104 | - AlphaFold2の最初の動作に必要なMSA作成はどのように行っていますか? 105 | - MSA作成はColabFoldと同様にMMseqs2のウェブサーバーによって行われています。 106 | - ColabFoldで表示されるようなpLDDTスコアやPAEの図も生成されますか? 107 | - はい、生成されます。 108 | - ホモ多量体予測、複合体予測も可能ですか? 109 | - はい、可能です。配列の入力方法は[ColabFold: AlphaFold2 using MMseqs2](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb)のやり方と同じです。 110 | - jackhmmerによるMSA作成は可能ですか? 111 | - **現在のところ対応していません**。 112 | - 複数のGPUを利用して計算を行いたい。 113 | - **AlphaFold, ColabFoldは複数GPUを利用した構造予測はできないようです**。1つのGPUでしか計算できません。 114 | - 長いアミノ酸を予測しようとしたときに`ResourceExhausted`というエラーが発生するのを解決したい。 115 | - 上と同じissueを読んでください。 116 | - `CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered`というエラーメッセージが出る 117 | - CUDA 11.1以降にアップデートされていない可能性があります。`nvcc --version`コマンドでCuda compilerのバージョンを確認してみてください。 118 | - Windows 10の上でも利用することはできますか? 119 | - [WSL2](https://docs.microsoft.com/en-us/windows/wsl/install-win10)を入れればWindows 10の上でも同様に動作させることができます。 120 | - (New!) 自作したA3Mファイルを利用して構造予測を行いたい。 121 | - **現在ColabFoldはFASTAファイル以外にも様々な入力を受け取ることが可能です**。詳細な使い方はヘルプメッセージを読んでください。手持ちのA3Mフォーマットファイル、FASTAフォーマットで入力された複数のアミノ酸配列を含む1つのfastaファイル、さらにはディレクトリ自体をインプットに指定する事が可能です。 122 | 123 | ## Tutorials & Presentations 124 | 125 | - ColabFold Tutorial presented at the Boston Protein Design and Modeling Club. [[video]](https://www.youtube.com/watch?v=Rfw7thgGTwI) [[slides]](https://docs.google.com/presentation/d/1mnffk23ev2QMDzGZ5w1skXEadTe54l8-Uei6ACce8eI). 126 | 127 | ## Acknowledgments 128 | 129 | - The original colabfold was first created by Sergey Ovchinnikov ([@sokrypton](https://twitter.com/sokrypton)), Milot Mirdita ([@milot_mirdita](https://twitter.com/milot_mirdita)) and Martin Steinegger ([@thesteinegger](https://twitter.com/thesteinegger)). 130 | 131 | ## How do I reference this work? 132 | 133 | - Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. ColabFold - Making protein folding accessible to all.
$ nvcc --version 11 | nvcc: NVIDIA (R) Cuda compiler driver 12 | Copyright (c) 2005-2020 NVIDIA Corporation 13 | Built on Mon_Oct_12_20:09:46_PDT_2020 14 | Cuda compilation tools, release 11.1, V11.1.105 15 | Build cuda_11.1.TC455_06.29190527_0 16 |</pre>DO NOT use `nvidia-smi` for checking the version.<br>1. Download `install_colabfold_linux.sh` from this repository:<pre>
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabfold_linux.shand run it in the directory where you want to install:
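# Note: the installer first removes any existing ./colabfold in the current directory (rm -rf) and then rebuilds it.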
$ bash install_colabfold_linux.sh</pre>About 5 minutes later, the `colabfold` directory will be created. Do not move this directory after the installation. 18 | 1. Type `cd colabfold` to enter the directory. 19 | 1. Modify the variables such as `sequence = 'PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK'`, `jobname = "test"`, etc. in `runner.py` for your prediction. For more information, please refer to the original [ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb). 20 | 1. To run the prediction, type<pre>
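# Use the bundled interpreter: the colabfold-conda environment carries the patched dependencies.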
$ colabfold-conda/bin/python3.7 runner.pyin the `colabfold` directory. The result files will be created in the `predition_
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"39 | 1. Install `wget` command using Homebrew:
$ brew install wget40 | 1. Download `install_colabfold_intelmac.sh` from this repository:
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabfold_intelmac.shand run it in the directory where you want to install:
$ bash install_colabfold_intelmac.sh</pre>About 5 minutes later, the `colabfold` directory will be created. Do not move this directory after the installation. 41 | 1. The rest of the procedure is the same as "For Linux". 42 | 43 | #### For Mac with Apple Silicon (M1 chip) 44 | 45 | **Note: This installer is experimental because most of the dependent packages are not fully tested on Apple Silicon Mac.** 46 | 47 | 1. Install [Homebrew](https://brew.sh/index_ja) if not present:<pre>
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"48 | 1. Install `wget` and `cmake` commands using Homebrew:
$ brew install wget cmake49 | 1. Install `miniforge` command using Homebrew:
$ brew install --cask miniforge50 | 1. Download `install_colabfold_M1mac.sh` from this repository:
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabfold_M1mac.shand run it in the directory where you want to install:
$ bash install_colabfold_M1mac.sh</pre>About 5 minutes later, the `colabfold` directory will be created. 51 | 1. Type `cd colabfold` to enter the directory. 52 | 1. Modify the variables such as `sequence = 'PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK'`, `jobname = "test"`, etc. in `runner.py` for your prediction. For more information, please refer to the original [ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb). 53 | 1. To run the prediction, type<pre>
$ colabfold-conda/bin/python3.8 runner.pyin the `colabfold` directory. The result files will be created in the `predition_
>6X9Z_1|Chain A|Transmembrane beta-barrels|synthetic construct (32630) 69 | MEQKPGTLMVYVVVGYNTDNTVDVVGGAQYAVSPYLFLDVGYGWNNSSLNFLEVGGGVSYKVSPDLEPYVKAGFEYNTDNTIKPTAGAGALYRVSPNLALMVEYGWNNSSLQKVAIGIAYKVKD70 | 2. Type `export PATH="/path/to/colabfold/bin:$PATH"` to add a path to the PATH environment variable. For example, `export PATH="/home/foo/bar/colabfold/bin:$PATH"` if you installed localcolabfold on `/home/foo/bar/colabfold`. 71 | 3. Run colabfold command with your FASTA file. For example,
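# Hedged example using the FASTA prepared above; --max_recycle 18 raises the recycle limit (see the note below).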
$ colabfold --input 6x9z.fasta \
72 |   --output_dir 6x9z \
73 |   --max_recycle 18 \
74 |   --use_ptm \
75 |   --use_turbo \
76 |   --num_relax Top5</pre>This will predict the protein structure [6x9z](https://www.rcsb.org/structure/6x9z) with the number of 'recycling' iterations raised to 18, which may be effective for *de novo* structure prediction. For another example, [PDB: 3KUD](https://www.rcsb.org/structure/3KUD),<pre>
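# Heterocomplex example: the two chains are separated by ':' in the FASTA below and paired 1:1 via --homooligomer.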
$ colabfold --input 3kud_complex.fasta \
77 |   --output_dir 3kud \
78 |   --homooligomer 1:1 \
79 |   --use_ptm \
80 |   --use_turbo \
81 |   --max_recycle 3 \
82 |   --num_relax Top5</pre>where the input sequence `3kud_complex.fasta` is<pre>
>3KUD_complex 83 | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQH: 84 | PSKTSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAASLIGEELQVDFLThis will predict a heterooligomer. For more information about the options, type `colabfold --help` or refer to the original [ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb). 85 | 86 | ## Advantages of LocalColabFold 87 | - **Structure inference and relaxation will be accelerated if your PC has Nvidia GPU and CUDA drivers.** 88 | - **No Time out (90 minutes and 12 hours)** 89 | - **No GPU limitations** 90 | - **NOT necessary to prepare the large database required for native AlphaFold2**. 91 | 92 | ## FAQ 93 | - What else do I need to do before installation? Do I need sudo privileges? 94 | - No, except for installation of `curl` and `wget` commands. 95 | - Do I need to prepare the large database such as PDB70, BFD, Uniclust30, MGnify...? 96 | - **No. it is not necessary.** Generation of MSA is performed by the MMseqs2 web server, just as implemented in ColabFold. 97 | - Are the pLDDT score and PAE figures available? 98 | - Yes, they will be generated just like the ColabFold. 99 | - Is it possible to predict homooligomers and complexes? 100 | - Yes, the sequence input is the same as ColabFold. See [ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb). 101 | - Is it possible to create MSA by jackhmmer? 102 | - **No, it is not currently supported**. 103 | - I want to run the predictions step-by-step like Google Colab. 104 | - You can use VSCode and Python plugin to do the same. See https://code.visualstudio.com/docs/python/jupyter-support-py. 105 | - I want to use multiple GPUs to perform the prediction. 106 | - You need to set the environment variables `TF_FORCE_UNIFIED_MEMORY`,`XLA_PYTHON_CLIENT_MEM_FRACTION` before execution. See [this discussion](https://github.com/YoshitakaMo/localcolabfold/issues/7#issuecomment-923027641). 107 | - I want to solve the `ResourceExhausted` error when trying to predict for a sequence with > 1000 residues. 108 | - See the same discussion as above. 109 | - I got an error message `CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered`. 110 | - You may not have updated to CUDA 11.1 or later. Please check the version of Cuda compiler with `nvcc --version` command, not `nvidia-smi`. 111 | - Is this available on Windows 10? 112 | - You can run LocalColabFold on your Windows 10 with [WSL2](https://docs.microsoft.com/en-us/windows/wsl/install-win10). 113 | 114 | ## Tutorials & Presentations 115 | 116 | - ColabFold Tutorial presented at the Boston Protein Design and Modeling Club. [[video]](https://www.youtube.com/watch?v=Rfw7thgGTwI) [[slides]](https://docs.google.com/presentation/d/1mnffk23ev2QMDzGZ5w1skXEadTe54l8-Uei6ACce8eI). 117 | 118 | ## Acknowledgments 119 | 120 | - The original colabfold was first created by Sergey Ovchinnikov ([@sokrypton](https://twitter.com/sokrypton)), Milot Mirdita ([@milot_mirdita](https://twitter.com/milot_mirdita)) and Martin Steinegger ([@thesteinegger](https://twitter.com/thesteinegger)). 121 | 122 | ## How do I reference this work? 123 | 124 | - Mirdita M, Schuetze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. ColabFold - Making protein folding accessible to all. 
*bioRxiv*, doi: [10.1101/2021.08.15.456425](https://www.biorxiv.org/content/10.1101/2021.08.15.456425v2) (2021) 125 | - John Jumper, Richard Evans, Alexander Pritzel, et al. - Highly accurate protein structure prediction with AlphaFold. *Nature*, 1–11, doi: [10.1038/s41586-021-03819-2](https://www.nature.com/articles/s41586-021-03819-2) (2021) 126 | 127 | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5123296.svg)](https://doi.org/10.5281/zenodo.5123296) 128 | -------------------------------------------------------------------------------- /v1.0.0/README_ja.md: -------------------------------------------------------------------------------- 1 | # LocalColabFold 2 | 3 | 個人用パソコンのCPUとGPUで動かす[ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb)。 4 | 5 | ## インストール方法 6 | 7 | ### Linux+GPUの場合 8 | 9 | 1. ターミナル上で`curl`, `git`と`wget`コマンドがすでにインストールされていることを確認します。存在しない場合は先にこれらをインストールしてください。Ubuntuの場合は`sudo apt -y install curl git wget`でインストールできます。 10 | 2. **Cuda compilerのバージョンが11.1以降であることを確認します。**<pre>
$ nvcc --version 11 | nvcc: NVIDIA (R) Cuda compiler driver 12 | Copyright (c) 2005-2020 NVIDIA Corporation 13 | Built on Mon_Oct_12_20:09:46_PDT_2020 14 | Cuda compilation tools, release 11.1, V11.1.105 15 | Build cuda_11.1.TC455_06.29190527_0 16 |</pre>バージョンチェックの時に`nvidia-smi`コマンドを使わないでください。こちらでは不正確です。<br>3. このリポジトリにある`install_colabfold_linux.sh`をダウンロードします。<pre>
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabfold_linux.shこれをインストールしたいディレクトリの上に置いた後、以下のコマンドを入力します:
$ bash install_colabfold_linux.shおよそ5分後に`colabfold`ディレクトリができます。インストール後はこのディレクトリを移動させないでください。 18 | 4. `cd colabfold`を入力してこのディレクトリに入ります。 19 | 5. `runner.py`ファイル中の`sequence = 'PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK'`や`jobname = "test"`などのパラメータを変更し、構造予測のために必要な情報を入力します。詳細な設定方法についてはオリジナルの[ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb)を参考にしてください。こちらで可能な設定はほとんど利用可能です(MSA_methods以外)。 20 | 6. 予測を行うには、`colabfold`ディレクトリ内で以下のコマンドをターミナルで入力してください:
$ colabfold-conda/bin/python3.7 runner.py予測結果のファイルは`predition_
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"41 | 2. Homebrewで`wget`コマンドをインストールします:
$ brew install wget42 | 3. `install_colabfold_intelmac.sh`をこのリポジトリからダウンロードします:
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabfold_intelmac.shこれをインストールしたいディレクトリの上に置いた後、以下のコマンドを入力します:
$ bash install_colabfold_intelmac.shおよそ5分後に`colabfold`ディレクトリができます。インストール後はこのディレクトリを移動させないでください。 43 | 4. 残りの手順は"Linux+GPUの場合"と同様です. 44 | 45 | #### Apple Silicon (M1 chip)のMacの場合 46 | 47 | **Note: 依存するPythonパッケージのほとんどがまだApple Silicon Macで十分にテストされていないため、このインストーラーによる動作は試験的なものです。** 48 | 49 | 1. [Homebrew](https://qiita.com/zaburo/items/29fe23c1ceb6056109fd)をインストールします:
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"</pre>50 | 1. Homebrewで`wget`と`cmake`コマンドをインストールします:<pre>
$ brew install wget cmake51 | 1. `miniforge`をHomebrewでインストールします:
$ brew install --cask miniforge52 | 1. インストーラー`install_colabfold_M1mac.sh`をこのリポジトリからダウンロードします:
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabfold_M1mac.shこれをインストールしたいディレクトリの上に置いた後、以下のコマンドを入力します:
$ bash install_colabfold_M1mac.shおよそ5分後に`colabfold`ディレクトリができます。途中色々WarningsやErrorが出るかもしれません。インストール後はこのディレクトリを移動させないでください。 53 | 1. `cd colabfold`を入力してこのディレクトリに入ります。 54 | 1. `runner.py`ファイル中の`sequence = 'PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK'`や`jobname = "test"`などのパラメータを変更し、構造予測のために必要な情報を入力します。詳細な設定方法についてはオリジナルの[ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb)を参考にしてください。こちらで可能な設定はほとんど利用可能です(MSA_methods以外)。 55 | 1. 予測を行うには、`colabfold`ディレクトリ内で以下のコマンドをターミナルで入力してください:
$ colabfold-conda/bin/python3.8 runner.py予測結果のファイルは`predition_
>6X9Z_1|Chain A|Transmembrane beta-barrels|synthetic construct (32630) 72 | MEQKPGTLMVYVVVGYNTDNTVDVVGGAQYAVSPYLFLDVGYGWNNSSLNFLEVGGGVSYKVSPDLEPYVKAGFEYNTDNTIKPTAGAGALYRVSPNLALMVEYGWNNSSLQKVAIGIAYKVKD73 | 1. `export PATH="/path/to/colabfold/bin:$PATH"`と打つことで環境変数PATHにこのcolabfoldシェルスクリプトのファイルパスを設定します。例えばLocalColabFoldを`/home/foo/bar/colabfold`にインストールした場合は、`export PATH="/home/foo/bar/colabfold/bin:$PATH"`と入力します。 74 | 1. 入力のアミノ酸配列ファイルを`--input`の引数に指定し、`colabfold`コマンドを実行します。例えばこんな感じ
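# (仮の例)上で用意した6x9z.fastaを入力とし、recycling回数の上限を18回に引き上げて実行します。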
$ colabfold --input 6x9z.fasta \
75 |   --output_dir 6x9z \
76 |   --max_recycle 18 \
77 |   --use_ptm \
78 |   --use_turbo \
79 |   --num_relax Top5</pre>上記コマンドは*de novo*タンパク質構造[PDB: 6X9Z](https://www.rcsb.org/structure/6x9z)を予測するときに、'recycling'回数を最大18回まで引き上げています。この回数の引き上げは*de novo*タンパク質構造を予測する時には効果的であることが示されています(通常のタンパク質は3回で十分なことがほとんどです)。<pre>
$ colabfold --input 3kud_complex.fasta \
80 |   --output_dir 3kud \
81 |   --homooligomer 1:1 \
82 |   --use_ptm \
83 |   --use_turbo \
84 |   --max_recycle 3 \
85 |   --num_relax Top5</pre>ここで入力配列`3kud_complex.fasta`は以下の通りです。<pre>
>3KUD_complex 86 | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQH: 87 | PSKTSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAASLIGEELQVDFL 88 |`:`記号でアミノ酸配列を隔てることで複合体予測をすることができます。この場合はヘテロ複合体予測になっています。ホモオリゴマー予測を行いたいときなど、他の設定については`colabfold --help`で設定方法を読むか、オリジナルの[ColabFold / AlphaFold2_advanced](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb)にある説明を読んでください。 89 | 90 | ## LocalColabFoldを利用するメリット 91 | 92 | - **お使いのパソコンにNvidia GPUとCUDAドライバがあれば、AlphaFold2による構造推論(Structure inference)と構造最適化(relax)が高速になります。** 93 | - **Google Colabは90分アイドルにしていたり、12時間以上の利用でタイムアウトしますが、その制限がありません。また、GPUの使用についても当然制限がありません。** 94 | - **データベースをダウンロードしてくる必要がないです**。 95 | 96 | ## FAQ 97 | - インストールの事前準備は? 98 | - `curl`, `wget`コマンド以外は不要です 99 | - BFD, Mgnify, PDB70, Uniclust30などの巨大なデータベースを用意する必要はありますか? 100 | - **必要ないです**。 101 | - AlphaFold2の最初の動作に必要なMSA作成はどのように行っていますか? 102 | - MSA作成はColabFoldと同様にMMseqs2のウェブサーバーによって行われています。 103 | - ColabFoldで表示されるようなpLDDTスコアやPAEの図も生成されますか? 104 | - はい、生成されます。 105 | - ホモ多量体予測、複合体予測も可能ですか? 106 | - はい、可能です。配列の入力方法はGoogle Colabのやり方と同じです。 107 | - jackhmmerによるMSA作成は可能ですか? 108 | - **現在のところ対応していません**。 109 | - Google Colabのようにセルごとに実行したい。 110 | - VSCodeとPythonプラグインを使えば同様のことができます。See https://code.visualstudio.com/docs/python/jupyter-support-py . 111 | - 複数のGPUを利用して計算を行いたい。 112 | - 実行前に環境変数`TF_FORCE_UNIFIED_MEMORY`,`XLA_PYTHON_CLIENT_MEM_FRACTION`を設定する必要があります。[こちらのissue](https://github.com/YoshitakaMo/localcolabfold/issues/7#issuecomment-923027641)を読んでください。 113 | - 長いアミノ酸を予測しようとしたときに`ResourceExhausted`というエラーが発生するのを解決したい。 114 | - 上と同じissueを読んでください。 115 | - `CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered`というエラーメッセージが出る 116 | - CUDA 11.1以降にアップデートされていない可能性があります。`nvcc --version`コマンドでCuda compilerのバージョンを確認してみてください。 117 | - Windows 10の上でも利用することはできますか? 118 | - [WSL2](https://docs.microsoft.com/en-us/windows/wsl/install-win10)を入れればWindows 10の上でも同様に動作させることができます。 119 | 120 | ## Tutorials & Presentations 121 | 122 | - ColabFold Tutorial presented at the Boston Protein Design and Modeling Club. [[video]](https://www.youtube.com/watch?v=Rfw7thgGTwI) [[slides]](https://docs.google.com/presentation/d/1mnffk23ev2QMDzGZ5w1skXEadTe54l8-Uei6ACce8eI). 123 | 124 | ## Acknowledgments 125 | 126 | - The original colabfold was first created by Sergey Ovchinnikov ([@sokrypton](https://twitter.com/sokrypton)), Milot Mirdita ([@milot_mirdita](https://twitter.com/milot_mirdita)) and Martin Steinegger ([@thesteinegger](https://twitter.com/thesteinegger)). 127 | 128 | ## How do I reference this work? 129 | 130 | - Mirdita M, Schuetze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. ColabFold - Making protein folding accessible to all. *bioRxiv*, doi: [10.1101/2021.08.15.456425](https://www.biorxiv.org/content/10.1101/2021.08.15.456425v2) (2021) 131 | - John Jumper, Richard Evans, Alexander Pritzel, et al. - Highly accurate protein structure prediction with AlphaFold. 
*Nature*, 1–11, doi: [10.1038/s41586-021-03819-2](https://www.nature.com/articles/s41586-021-03819-2) (2021) 132 | 133 | 134 | [](https://doi.org/10.5281/zenodo.5123296) 135 | -------------------------------------------------------------------------------- /v1.0.0/colabfold_alphafold.patch: -------------------------------------------------------------------------------- 1 | --- colabfold_alphafold.py.orig 2021-10-24 10:56:09.887461716 +0900 2 | +++ colabfold_alphafold.py 2021-10-24 11:25:12.811888920 +0900 3 | @@ -32,6 +32,13 @@ try: 4 | except: 5 | IN_COLAB = False 6 | 7 | +if os.getenv('COLABFOLD_PATH'): 8 | + print("COLABFOLD_PATH is set to " + os.getenv('COLABFOLD_PATH')) 9 | + colabfold_path = os.getenv('COLABFOLD_PATH') 10 | +else: 11 | + print("COLABFOLD_PATH is not set.") 12 | + colabfold_path = '.' 13 | + 14 | import tqdm.notebook 15 | TQDM_BAR_FORMAT = '{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]' 16 | 17 | @@ -641,7 +648,7 @@ def prep_model_runner(opt=None, model_na 18 | cfg.model.recycle_tol = opt["tol"] 19 | cfg.data.eval.num_ensemble = opt["num_ensemble"] 20 | 21 | - params = data.get_model_haiku_params(name, params_loc) 22 | + params = data.get_model_haiku_params(name, colabfold_path + "/" + params_loc) 23 | return {"model":model.RunModel(cfg, params, is_training=opt["is_training"]), "opt":opt} 24 | else: 25 | return old_runner 26 | @@ -749,7 +756,7 @@ def run_alphafold(feature_dict, opt=None 27 | pbar.set_description(f'Running {key}') 28 | 29 | # replace model parameters 30 | - params = data.get_model_haiku_params(name, params_loc) 31 | + params = data.get_model_haiku_params(name, colabfold_path + "/" + params_loc) 32 | for k in runner["model"].params.keys(): 33 | runner["model"].params[k] = params[k] 34 | 35 | -------------------------------------------------------------------------------- /v1.0.0/gpurelaxation.patch: -------------------------------------------------------------------------------- 1 | --- alphafold/relax/amber_minimize.py.org 2021-08-31 16:59:21.161164190 +0900 2 | +++ alphafold/relax/amber_minimize.py 2021-08-31 16:59:32.073226369 +0900 3 | @@ -90,7 +90,7 @@ def _openmm_minimize( 4 | _add_restraints(system, pdb, stiffness, restraint_set, exclude_residues) 5 | 6 | integrator = openmm.LangevinIntegrator(0, 0.01, 0.0) 7 | - platform = openmm.Platform.getPlatformByName("CPU") 8 | + platform = openmm.Platform.getPlatformByName("CUDA") 9 | simulation = openmm_app.Simulation( 10 | pdb.topology, system, integrator, platform) 11 | simulation.context.setPositions(pdb.positions) 12 | @@ -530,7 +530,7 @@ def get_initial_energies(pdb_strs: Seque 13 | simulation = openmm_app.Simulation(openmm_pdbs[0].topology, 14 | system, 15 | openmm.LangevinIntegrator(0, 0.01, 0.0), 16 | - openmm.Platform.getPlatformByName("CPU")) 17 | + openmm.Platform.getPlatformByName("CUDA")) 18 | energies = [] 19 | for pdb in openmm_pdbs: 20 | try: 21 | -------------------------------------------------------------------------------- /v1.0.0/install_colabfold_M1mac.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | # check whether `wget` and `cmake` are installed 4 | type wget || { echo "wget command is not installed. Please install it at first using Homebrew." ; exit 1 ; } 5 | type cmake || { echo "wget command is not installed. Please install it at first using Homebrew." 
; exit 1 ; } 6 | 7 | # check whether miniforge is present 8 | test -f "/opt/homebrew/Caskroom/miniforge/base/etc/profile.d/conda.sh" || { echo "Install miniforge by using Homebrew before installation. \n 'brew install --cask miniforge'" ; exit 1 ; } 9 | 10 | # check whether Apple Silicon (M1 mac) or Intel Mac 11 | arch_name="$(uname -m)" 12 | 13 | if [ "${arch_name}" = "x86_64" ]; then 14 | if [ "$(sysctl -in sysctl.proc_translated)" = "1" ]; then 15 | echo "Running on Rosetta 2" 16 | else 17 | echo "Running on native Intel" 18 | fi 19 | echo "This installer is only for Apple Silicon. Use install_colabfold_intelmac.sh to install on this Mac." 20 | exit 1 21 | elif [ "${arch_name}" = "arm64" ]; then 22 | echo "Running on Apple Silicon (M1 mac)" 23 | else 24 | echo "Unknown architecture: ${arch_name}" 25 | exit 1 26 | fi 27 | 28 | GIT_REPO="https://github.com/deepmind/alphafold" 29 | SOURCE_URL="https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar" 30 | CURRENTPATH=`pwd` 31 | COLABFOLDDIR="${CURRENTPATH}/colabfold" 32 | PARAMS_DIR="${COLABFOLDDIR}/alphafold/data/params" 33 | MSATOOLS="${COLABFOLDDIR}/tools" 34 | 35 | # download the original alphafold as "${COLABFOLDDIR}" 36 | echo "downloading the original alphafold as ${COLABFOLDDIR}..." 37 | rm -rf ${COLABFOLDDIR} 38 | git clone ${GIT_REPO} ${COLABFOLDDIR} 39 | (cd ${COLABFOLDDIR}; git checkout 1d43aaff941c84dc56311076b58795797e49107b --quiet) 40 | 41 | # colabfold patches 42 | echo "Applying several patches to be Alphafold2_advanced..." 43 | cd ${COLABFOLDDIR} 44 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/colabfold.py 45 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/colabfold_alphafold.py 46 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/pairmsa.py 47 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/protein.patch 48 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/config.patch 49 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/model.patch 50 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/modules.patch 51 | # GPU relaxation patch 52 | # wget -qnc https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/gpurelaxation.patch -O gpurelaxation.patch 53 | 54 | # donwload reformat.pl from hh-suite 55 | wget -qnc https://raw.githubusercontent.com/soedinglab/hh-suite/master/scripts/reformat.pl 56 | # Apply multi-chain patch from Lim Heo @huhlim 57 | patch -u alphafold/common/protein.py -i protein.patch 58 | patch -u alphafold/model/model.py -i model.patch 59 | patch -u alphafold/model/modules.py -i modules.patch 60 | patch -u alphafold/model/config.py -i config.patch 61 | cd .. 62 | 63 | # Downloading parameter files 64 | echo "Downloading AlphaFold2 trained parameters..." 65 | mkdir -p ${PARAMS_DIR} 66 | curl -fL ${SOURCE_URL} | tar x -C ${PARAMS_DIR} 67 | 68 | # Downloading stereo_chemical_props.txt from https://git.scicore.unibas.ch/schwede/openstructure 69 | echo "Downloading stereo_chemical_props.txt..." 70 | wget -q https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt 71 | mkdir -p ${COLABFOLDDIR}/alphafold/common 72 | mv stereo_chemical_props.txt ${COLABFOLDDIR}/alphafold/common 73 | 74 | # echo "installing HH-suite 3.3.0..." 
75 | # mkdir -p ${MSATOOLS} 76 | # git clone --branch v3.3.0 https://github.com/soedinglab/hh-suite.git hh-suite-3.3.0 77 | # (cd hh-suite-3.3.0 ; mkdir build ; cd build ; cmake -DCMAKE_INSTALL_PREFIX=${MSATOOLS}/hh-suite .. ; make -j4 ; make install) 78 | # rm -rf hh-suite-3.3.0 79 | 80 | # echo "installing HMMER 3.3.2..." 81 | # wget http://eddylab.org/software/hmmer/hmmer-3.3.2.tar.gz 82 | # (tar xzvf hmmer-3.3.2.tar.gz ; cd hmmer-3.3.2 ; ./configure --prefix=${MSATOOLS}/hmmer ; make -j4 ; make install) 83 | # rm -rf hmmer-3.3.2.tar.gz hmmer-3.3.2 84 | 85 | echo "Creating conda environments with python3.8 as ${COLABFOLDDIR}/colabfold-conda" 86 | . "/opt/homebrew/Caskroom/miniforge/base/etc/profile.d/conda.sh" 87 | conda create -p $COLABFOLDDIR/colabfold-conda python=3.8 -y 88 | conda activate $COLABFOLDDIR/colabfold-conda 89 | conda update -y conda 90 | 91 | echo "Installing conda-forge packages" 92 | conda install -y -c conda-forge python=3.8 openmm==7.5.1 pdbfixer jupyter matplotlib py3Dmol tqdm biopython==1.79 immutabledict==2.0.0 93 | conda install -y -c conda-forge jax==0.2.20 94 | conda install -y -c apple tensorflow-deps 95 | python3.8 -m pip install tensorflow-macos 96 | python3.8 -m pip install jaxlib==0.1.70 -f "https://dfm.io/custom-wheels/jaxlib/index.html" 97 | python3.8 -m pip install numpy==1.21.2 98 | python3.8 -m pip install git+git://github.com/deepmind/tree.git 99 | python3.8 -m pip install git+git://github.com/google/ml_collections.git 100 | python3.8 -m pip install git+git://github.com/deepmind/dm-haiku.git 101 | 102 | # Apply OpenMM patch. 103 | echo "Applying OpenMM patch..." 104 | (cd ${COLABFOLDDIR}/colabfold-conda/lib/python3.8/site-packages/ && patch -p0 < ${COLABFOLDDIR}/docker/openmm.patch) 105 | 106 | # Enable GPU-accelerated relaxation. 107 | # echo "Enable GPU-accelerated relaxation..." 108 | # (cd ${COLABFOLDDIR} && patch -u alphafold/relax/amber_minimize.py -i gpurelaxation.patch) 109 | 110 | echo "Downloading runner.py" 111 | (cd ${COLABFOLDDIR} && wget -q "https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/runner.py") 112 | 113 | echo "Installation of Alphafold2_advanced finished." 114 | -------------------------------------------------------------------------------- /v1.0.0/install_colabfold_intelmac.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | # check whether `wget` are installed 4 | type wget || { echo "wget command is not installed. Please install it at first using Homebrew." ; exit 1 ; } 5 | 6 | # check whether Apple Silicon (M1 mac) or Intel Mac 7 | arch_name="$(uname -m)" 8 | 9 | if [ "${arch_name}" = "x86_64" ]; then 10 | if [ "$(sysctl -in sysctl.proc_translated)" = "1" ]; then 11 | echo "Running on Rosetta 2" 12 | else 13 | echo "Running on native Intel" 14 | fi 15 | elif [ "${arch_name}" = "arm64" ]; then 16 | echo "Running on Apple Silicon (M1 mac)" 17 | echo "This installer is only for intel Mac. Use install_colabfold_M1mac.sh to install on this Mac." 
18 | exit 1 19 | else 20 | echo "Unknown architecture: ${arch_name}" 21 | exit 1 22 | fi 23 | 24 | GIT_REPO="https://github.com/deepmind/alphafold" 25 | SOURCE_URL="https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar" 26 | CURRENTPATH=`pwd` 27 | COLABFOLDDIR="${CURRENTPATH}/colabfold" 28 | PARAMS_DIR="${COLABFOLDDIR}/alphafold/data/params" 29 | MSATOOLS="${COLABFOLDDIR}/tools" 30 | 31 | # download the original alphafold as "${COLABFOLDDIR}" 32 | echo "downloading the original alphafold as ${COLABFOLDDIR}..." 33 | rm -rf ${COLABFOLDDIR} 34 | git clone ${GIT_REPO} ${COLABFOLDDIR} 35 | (cd ${COLABFOLDDIR}; git checkout 1d43aaff941c84dc56311076b58795797e49107b --quiet) 36 | 37 | # colabfold patches 38 | echo "Applying several patches to be Alphafold2_advanced..." 39 | cd ${COLABFOLDDIR} 40 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/colabfold.py 41 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/colabfold_alphafold.py 42 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/pairmsa.py 43 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/protein.patch 44 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/config.patch 45 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/model.patch 46 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/modules.patch 47 | # GPU relaxation patch 48 | # wget -qnc https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/gpurelaxation.patch -O gpurelaxation.patch 49 | 50 | # donwload reformat.pl from hh-suite 51 | wget -qnc https://raw.githubusercontent.com/soedinglab/hh-suite/master/scripts/reformat.pl 52 | # Apply multi-chain patch from Lim Heo @huhlim 53 | patch -u alphafold/common/protein.py -i protein.patch 54 | patch -u alphafold/model/model.py -i model.patch 55 | patch -u alphafold/model/modules.py -i modules.patch 56 | patch -u alphafold/model/config.py -i config.patch 57 | cd .. 58 | 59 | # Downloading parameter files 60 | echo "Downloading AlphaFold2 trained parameters..." 61 | mkdir -p ${PARAMS_DIR} 62 | curl -fL ${SOURCE_URL} | tar x -C ${PARAMS_DIR} 63 | 64 | # Downloading stereo_chemical_props.txt from https://git.scicore.unibas.ch/schwede/openstructure 65 | echo "Downloading stereo_chemical_props.txt..." 66 | wget -q https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt 67 | mkdir -p ${COLABFOLDDIR}/alphafold/common 68 | mv stereo_chemical_props.txt ${COLABFOLDDIR}/alphafold/common 69 | 70 | # Install Miniconda3 for Linux 71 | echo "Installing Miniconda3 for macOS..." 72 | cd ${COLABFOLDDIR} 73 | wget -q -P . https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh 74 | bash ./Miniconda3-latest-MacOSX-x86_64.sh -b -p ${COLABFOLDDIR}/conda 75 | rm Miniconda3-latest-MacOSX-x86_64.sh 76 | cd .. 77 | 78 | echo "Creating conda environments with python3.7 as ${COLABFOLDDIR}/colabfold-conda" 79 | . 
"${COLABFOLDDIR}/conda/etc/profile.d/conda.sh" 80 | export PATH="${COLABFOLDDIR}/conda/condabin:${PATH}" 81 | conda create -p $COLABFOLDDIR/colabfold-conda python=3.7 -y 82 | conda activate $COLABFOLDDIR/colabfold-conda 83 | conda update -y conda 84 | 85 | echo "Installing conda-forge packages" 86 | conda install -c conda-forge python=3.7 openmm==7.5.1 pdbfixer -y 87 | conda install -c bioconda hmmer==3.3.2 hhsuite==3.3.0 -y 88 | echo "Installing alphafold dependencies by pip" 89 | python3.7 -m pip install absl-py==0.13.0 biopython==1.79 chex==0.0.7 dm-haiku==0.0.4 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.2.14 jaxlib==0.1.69 ml-collections==0.1.0 numpy==1.19.5 scipy==1.7.0 tensorflow==2.5.0 90 | python3.7 -m pip install jupyter matplotlib py3Dmol tqdm 91 | 92 | # Apply OpenMM patch. 93 | echo "Applying OpenMM patch..." 94 | (cd ${COLABFOLDDIR}/colabfold-conda/lib/python3.7/site-packages/ && patch -p0 < ${COLABFOLDDIR}/docker/openmm.patch) 95 | 96 | # Enable GPU-accelerated relaxation. 97 | # echo "Enable GPU-accelerated relaxation..." 98 | # (cd ${COLABFOLDDIR} && patch -u alphafold/relax/amber_minimize.py -i gpurelaxation.patch) 99 | 100 | echo "Downloading runner.py" 101 | (cd ${COLABFOLDDIR} && wget -q "https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/runner.py") 102 | 103 | echo "Installation of Alphafold2_advanced finished." 104 | -------------------------------------------------------------------------------- /v1.0.0/install_colabfold_linux.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | # check whether `wget` and `curl` are installed 4 | type wget || { echo "wget command is not installed. Please install it at first using apt or yum." ; exit 1 ; } 5 | type curl || { echo "curl command is not installed. Please install it at first using apt or yum. " ; exit 1 ; } 6 | 7 | GIT_REPO="https://github.com/deepmind/alphafold" 8 | SOURCE_URL="https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar" 9 | CURRENTPATH=`pwd` 10 | COLABFOLDDIR="${CURRENTPATH}/colabfold" 11 | PARAMS_DIR="${COLABFOLDDIR}/alphafold/data/params" 12 | MSATOOLS="${COLABFOLDDIR}/tools" 13 | 14 | # download the original alphafold as "${COLABFOLDDIR}" 15 | echo "downloading the original alphafold as ${COLABFOLDDIR}..." 16 | rm -rf ${COLABFOLDDIR} 17 | git clone ${GIT_REPO} ${COLABFOLDDIR} 18 | (cd ${COLABFOLDDIR}; git checkout 1d43aaff941c84dc56311076b58795797e49107b --quiet) 19 | 20 | # colabfold patches 21 | echo "Applying several patches to be Alphafold2_advanced..." 
22 | cd ${COLABFOLDDIR} 23 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/colabfold.py 24 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/colabfold_alphafold.py 25 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/pairmsa.py 26 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/protein.patch 27 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/config.patch 28 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/model.patch 29 | wget -qnc https://raw.githubusercontent.com/sokrypton/ColabFold/main/beta/modules.patch 30 | # GPU relaxation patch 31 | wget -qnc https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/gpurelaxation.patch -O gpurelaxation.patch 32 | 33 | # donwload reformat.pl from hh-suite 34 | wget -qnc https://raw.githubusercontent.com/soedinglab/hh-suite/master/scripts/reformat.pl 35 | # Apply multi-chain patch from Lim Heo @huhlim 36 | patch -u alphafold/common/protein.py -i protein.patch 37 | patch -u alphafold/model/model.py -i model.patch 38 | patch -u alphafold/model/modules.py -i modules.patch 39 | patch -u alphafold/model/config.py -i config.patch 40 | cd .. 41 | 42 | # Downloading parameter files 43 | echo "Downloading AlphaFold2 trained parameters..." 44 | mkdir -p ${PARAMS_DIR} 45 | curl -fL ${SOURCE_URL} | tar x -C ${PARAMS_DIR} 46 | 47 | # Downloading stereo_chemical_props.txt from https://git.scicore.unibas.ch/schwede/openstructure 48 | echo "Downloading stereo_chemical_props.txt..." 49 | wget -q https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt --no-check-certificate 50 | mkdir -p ${COLABFOLDDIR}/alphafold/common 51 | mv stereo_chemical_props.txt ${COLABFOLDDIR}/alphafold/common 52 | 53 | # Install Miniconda3 for Linux 54 | echo "Installing Miniconda3 for Linux..." 55 | cd ${COLABFOLDDIR} 56 | wget -q -P . https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh 57 | bash ./Miniconda3-latest-Linux-x86_64.sh -b -p ${COLABFOLDDIR}/conda 58 | rm Miniconda3-latest-Linux-x86_64.sh 59 | cd .. 60 | 61 | echo "Creating conda environments with python3.7 as ${COLABFOLDDIR}/colabfold-conda" 62 | . "${COLABFOLDDIR}/conda/etc/profile.d/conda.sh" 63 | export PATH="${COLABFOLDDIR}/conda/condabin:${PATH}" 64 | conda create -p $COLABFOLDDIR/colabfold-conda python=3.7 -y 65 | conda activate $COLABFOLDDIR/colabfold-conda 66 | conda update -n base conda -y 67 | 68 | echo "Installing conda-forge packages" 69 | conda install -c conda-forge python=3.7 cudnn==8.2.1.32 cudatoolkit==11.1.1 openmm==7.5.1 pdbfixer -y 70 | conda install -c bioconda hmmer==3.3.2 hhsuite==3.3.0 -y 71 | echo "Installing alphafold dependencies by pip" 72 | python3.7 -m pip install absl-py==0.13.0 biopython==1.79 chex==0.0.7 dm-haiku==0.0.4 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.2.14 ml-collections==0.1.0 numpy==1.19.5 scipy==1.7.0 tensorflow-gpu==2.5.0 73 | python3.7 -m pip install jupyter matplotlib py3Dmol tqdm 74 | python3.7 -m pip install --upgrade jax jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html 75 | 76 | # Apply OpenMM patch. 77 | echo "Applying OpenMM patch..." 78 | (cd ${COLABFOLDDIR}/colabfold-conda/lib/python3.7/site-packages/ && patch -p0 < ${COLABFOLDDIR}/docker/openmm.patch) 79 | 80 | # Enable GPU-accelerated relaxation. 81 | echo "Enable GPU-accelerated relaxation..." 
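# gpurelaxation.patch switches OpenMM's Platform from "CPU" to "CUDA" in amber_minimize.py (see v1.0.0/gpurelaxation.patch).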
82 | (cd ${COLABFOLDDIR} && patch -u alphafold/relax/amber_minimize.py -i gpurelaxation.patch) 83 | 84 | echo "Downloading runner.py..." 85 | (cd ${COLABFOLDDIR} && wget -q "https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/runner.py") 86 | (cd ${COLABFOLDDIR} && wget -q "https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/runner_af2advanced.py") 87 | 88 | echo "Making standalone command 'colabfold'..." 89 | cd ${COLABFOLDDIR} 90 | mkdir -p bin && cd bin 91 | cat << EOF > colabfold 92 | #!/bin/sh 93 | 94 | . "${COLABFOLDDIR}/conda/etc/profile.d/conda.sh" 95 | conda activate ${COLABFOLDDIR}/colabfold-conda 96 | export NVIDIA_VISIBLE_DEVICES="all" 97 | export TF_FORCE_UNIFIED_MEMORY="1" 98 | export XLA_PYTHON_CLIENT_MEM_FRACTION="4.0" 99 | export COLABFOLD_PATH="${COLABFOLDDIR}" 100 | python3.7 ${COLABFOLDDIR}/runner_af2advanced.py "\$@" 101 | EOF 102 | chmod +x ./colabfold 103 | cd ${COLABFOLDDIR} 104 | wget -qnc https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/residue_constants.patch -O residue_constants.patch 105 | wget -qnc https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/colabfold_alphafold.patch -O colabfold_alphafold.patch 106 | patch -u alphafold/common/residue_constants.py -i residue_constants.patch 107 | patch -u colabfold_alphafold.py -i colabfold_alphafold.patch 108 | 109 | echo "Installation of Alphafold2_advanced finished." 110 | -------------------------------------------------------------------------------- /v1.0.0/residue_constants.patch: -------------------------------------------------------------------------------- 1 | --- residue_constants.py.orig 2021-10-24 11:30:58.275400080 +0900 2 | +++ residue_constants.py 2021-10-24 11:20:08.028085425 +0900 3 | @@ -20,6 +20,8 @@ from typing import List, Mapping, Tuple 4 | 5 | import numpy as np 6 | import tree 7 | +import os 8 | +colabfold_path = os.getenv('COLABFOLD_PATH', '.') 9 | 10 | # Internal import (35fd).
11 | 12 | @@ -403,7 +405,7 @@ def load_stereo_chemical_props() -> Tupl 13 | residue_bond_angles: dict that maps resname --> list of BondAngle tuples 14 | """ 15 | stereo_chemical_props_path = ( 16 | - 'alphafold/common/stereo_chemical_props.txt') 17 | + colabfold_path + '/alphafold/common/stereo_chemical_props.txt') 18 | with open(stereo_chemical_props_path, 'rt') as f: 19 | stereo_chemical_props = f.read() 20 | lines_iter = iter(stereo_chemical_props.splitlines()) 21 | 22 | -------------------------------------------------------------------------------- /v1.0.0/runner.py: -------------------------------------------------------------------------------- 1 | #%% 2 | import os 3 | import tensorflow as tf 4 | tf.config.set_visible_devices([], 'GPU') 5 | 6 | import jax 7 | 8 | from IPython.utils import io 9 | import subprocess 10 | import tqdm.notebook 11 | 12 | # --- Python imports --- 13 | import colabfold as cf 14 | import pairmsa 15 | import sys 16 | import pickle 17 | 18 | from urllib import request 19 | from concurrent import futures 20 | import json 21 | from matplotlib import gridspec 22 | import matplotlib.pyplot as plt 23 | import numpy as np 24 | import py3Dmol 25 | 26 | # files.upload() is used by the add_custom_msa and msa_method="precomputed" options below; it only exists inside Google Colab, so fall back to None for local runs. 27 | try: 28 | from google.colab import files 29 | except ImportError: 30 | files = None 31 | 33 | 34 | from alphafold.model import model 35 | from alphafold.model import config 36 | from alphafold.model import data 37 | 38 | from alphafold.data import parsers 39 | from alphafold.data import pipeline 40 | from alphafold.data.tools import jackhmmer 41 | 42 | from alphafold.common import protein 43 | 44 | ### Check your OS for localcolabfold 45 | import platform 46 | pf = platform.system() 47 | if pf == 'Windows': 48 | print('ColabFold on Windows') 49 | elif pf == 'Darwin': 50 | print('ColabFold on Mac') 51 | device="cpu" 52 | elif pf == 'Linux': 53 | print('ColabFold on Linux') 54 | device="gpu" 55 | #%% 56 | 57 | def run_jackhmmer(sequence, prefix): 58 | 59 | fasta_path = f"{prefix}.fasta" 60 | with open(fasta_path, 'wt') as f: 61 | f.write(f'>query\n{sequence}') 62 | 63 | pickled_msa_path = f"{prefix}.jackhmmer.pickle" 64 | if os.path.isfile(pickled_msa_path): 65 | msas_dict = pickle.load(open(pickled_msa_path,"rb")) 66 | msas, deletion_matrices, names = (msas_dict[k] for k in ['msas', 'deletion_matrices', 'names']) 67 | full_msa = [] 68 | for msa in msas: 69 | full_msa += msa 70 | else: 71 | # --- Find the closest source --- 72 | test_url_pattern = 'https://storage.googleapis.com/alphafold-colab{:s}/latest/uniref90_2021_03.fasta.1' 73 | ex = futures.ThreadPoolExecutor(3) 74 | def fetch(source): 75 | request.urlretrieve(test_url_pattern.format(source)) 76 | return source 77 | fs = [ex.submit(fetch, source) for source in ['', '-europe', '-asia']] 78 | source = None 79 | for f in futures.as_completed(fs): 80 | source = f.result() 81 | ex.shutdown() 82 | break 83 | 84 | jackhmmer_binary_path = '/usr/bin/jackhmmer' 85 | dbs = [] 86 | 87 | num_jackhmmer_chunks = {'uniref90': 59, 'smallbfd': 17, 'mgnify': 71} 88 | total_jackhmmer_chunks = sum(num_jackhmmer_chunks.values()) 89 | with tqdm.notebook.tqdm(total=total_jackhmmer_chunks, bar_format=TQDM_BAR_FORMAT) as pbar: 90 | def jackhmmer_chunk_callback(i): 91 | pbar.update(n=1) 92 | 93 | pbar.set_description('Searching uniref90') 94 | jackhmmer_uniref90_runner = jackhmmer.Jackhmmer( 95 | binary_path=jackhmmer_binary_path, 96 |
database_path=f'https://storage.googleapis.com/alphafold-colab{source}/latest/uniref90_2021_03.fasta', 97 | get_tblout=True, 98 | num_streamed_chunks=num_jackhmmer_chunks['uniref90'], 99 | streaming_callback=jackhmmer_chunk_callback, 100 | z_value=135301051) 101 | dbs.append(('uniref90', jackhmmer_uniref90_runner.query(fasta_path))) 102 | 103 | pbar.set_description('Searching smallbfd') 104 | jackhmmer_smallbfd_runner = jackhmmer.Jackhmmer( 105 | binary_path=jackhmmer_binary_path, 106 | database_path=f'https://storage.googleapis.com/alphafold-colab{source}/latest/bfd-first_non_consensus_sequences.fasta', 107 | get_tblout=True, 108 | num_streamed_chunks=num_jackhmmer_chunks['smallbfd'], 109 | streaming_callback=jackhmmer_chunk_callback, 110 | z_value=65984053) 111 | dbs.append(('smallbfd', jackhmmer_smallbfd_runner.query(fasta_path))) 112 | 113 | pbar.set_description('Searching mgnify') 114 | jackhmmer_mgnify_runner = jackhmmer.Jackhmmer( 115 | binary_path=jackhmmer_binary_path, 116 | database_path=f'https://storage.googleapis.com/alphafold-colab{source}/latest/mgy_clusters_2019_05.fasta', 117 | get_tblout=True, 118 | num_streamed_chunks=num_jackhmmer_chunks['mgnify'], 119 | streaming_callback=jackhmmer_chunk_callback, 120 | z_value=304820129) 121 | dbs.append(('mgnify', jackhmmer_mgnify_runner.query(fasta_path))) 122 | 123 | # --- Extract the MSAs and visualize --- 124 | # Extract the MSAs from the Stockholm files. 125 | # NB: deduplication happens later in pipeline.make_msa_features. 126 | 127 | mgnify_max_hits = 501 128 | msas = [] 129 | deletion_matrices = [] 130 | names = [] 131 | for db_name, db_results in dbs: 132 | unsorted_results = [] 133 | for i, result in enumerate(db_results): 134 | msa, deletion_matrix, target_names = parsers.parse_stockholm(result['sto']) 135 | e_values_dict = parsers.parse_e_values_from_tblout(result['tbl']) 136 | e_values = [e_values_dict[t.split('/')[0]] for t in target_names] 137 | zipped_results = zip(msa, deletion_matrix, target_names, e_values) 138 | if i != 0: 139 | # Only take query from the first chunk 140 | zipped_results = [x for x in zipped_results if x[2] != 'query'] 141 | unsorted_results.extend(zipped_results) 142 | sorted_by_evalue = sorted(unsorted_results, key=lambda x: x[3]) 143 | db_msas, db_deletion_matrices, db_names, _ = zip(*sorted_by_evalue) 144 | if db_msas: 145 | if db_name == 'mgnify': 146 | db_msas = db_msas[:mgnify_max_hits] 147 | db_deletion_matrices = db_deletion_matrices[:mgnify_max_hits] 148 | db_names = db_names[:mgnify_max_hits] 149 | msas.append(db_msas) 150 | deletion_matrices.append(db_deletion_matrices) 151 | names.append(db_names) 152 | msa_size = len(set(db_msas)) 153 | print(f'{msa_size} Sequences Found in {db_name}') 154 | 155 | pickle.dump({"msas":msas, 156 | "deletion_matrices":deletion_matrices, 157 | "names":names}, open(pickled_msa_path,"wb")) 158 | return msas, deletion_matrices, names 159 | 160 | import re 161 | 162 | # define sequence 163 | sequence = 'PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK' #@param {type:"string"} 164 | sequence = re.sub("[^A-Z:/]", "", sequence.upper()) 165 | sequence = re.sub(":+",":",sequence) 166 | sequence = re.sub("/+","/",sequence) 167 | sequence = re.sub("^[:/]+","",sequence) 168 | sequence = re.sub("[:/]+$","",sequence) 169 | 170 | jobname = "test" #@param {type:"string"} 171 | jobname = re.sub(r'\W+', '', jobname) 172 | 173 | # define number of copies 174 | homooligomer = "1" #@param {type:"string"} 175 | homooligomer = re.sub("[:/]+",":",homooligomer) 176 | 
homooligomer = re.sub("^[:/]+","",homooligomer) 177 | homooligomer = re.sub("[:/]+$","",homooligomer) 178 | 179 | if len(homooligomer) == 0: homooligomer = "1" 180 | homooligomer = re.sub("[^0-9:]", "", homooligomer) 181 | homooligomers = [int(h) for h in homooligomer.split(":")] 182 | 183 | #@markdown - `sequence` Specify protein sequence to be modelled. 184 | #@markdown - Use `/` to specify intra-protein chainbreaks (for trimming regions within protein). 185 | #@markdown - Use `:` to specify inter-protein chainbreaks (for modeling protein-protein hetero-complexes). 186 | #@markdown - For example, sequence `AC/DE:FGH` will be modelled as polypeptides: `AC`, `DE` and `FGH`. A separate MSA will be generated for `ACDE` and `FGH`. 187 | #@markdown If `pair_msa` is enabled, `ACDE`'s MSA will be paired with `FGH`'s MSA. 188 | #@markdown - `homooligomer` Define number of copies in a homo-oligomeric assembly. 189 | #@markdown - Use `:` to specify a different homooligomeric state (copy number) for each component of the complex. 190 | #@markdown - For example, **sequence:**`ABC:DEF`, **homooligomer:** `2:1`, the first protein `ABC` will be modeled as a homodimer (2 copies) and the second `DEF` as a monomer (1 copy). 191 | 192 | ori_sequence = sequence 193 | sequence = sequence.replace("/","").replace(":","") 194 | seqs = ori_sequence.replace("/","").split(":") 195 | 196 | if len(seqs) != len(homooligomers): 197 | if len(homooligomers) == 1: 198 | homooligomers = [homooligomers[0]] * len(seqs) 199 | homooligomer = ":".join([str(h) for h in homooligomers]) 200 | else: 201 | while len(seqs) > len(homooligomers): 202 | homooligomers.append(1) 203 | homooligomers = homooligomers[:len(seqs)] 204 | homooligomer = ":".join([str(h) for h in homooligomers]) 205 | print("WARNING: Mismatch between number of breaks ':' in 'sequence' and 'homooligomer' definition") 206 | 207 | full_sequence = "".join([s*h for s,h in zip(seqs,homooligomers)]) 208 | 209 | # prediction directory 210 | output_dir = 'prediction_' + jobname + '_' + cf.get_hash(full_sequence)[:5] 211 | os.makedirs(output_dir, exist_ok=True) 212 | # delete existing files in working directory 213 | for f in os.listdir(output_dir): 214 | os.remove(os.path.join(output_dir, f)) 215 | 216 | MIN_SEQUENCE_LENGTH = 16 217 | MAX_SEQUENCE_LENGTH = 2500 218 | 219 | aatypes = set('ACDEFGHIKLMNPQRSTVWY') # 20 standard aatypes 220 | if not set(full_sequence).issubset(aatypes): 221 | raise Exception(f'Input sequence contains non-amino acid letters: {set(sequence) - aatypes}. AlphaFold only supports 20 standard amino acids as inputs.') 222 | if len(full_sequence) < MIN_SEQUENCE_LENGTH: 223 | raise Exception(f'Input sequence is too short: {len(full_sequence)} amino acids, while the minimum is {MIN_SEQUENCE_LENGTH}') 224 | if len(full_sequence) > MAX_SEQUENCE_LENGTH: 225 | raise Exception(f'Input sequence is too long: {len(full_sequence)} amino acids, while the maximum is {MAX_SEQUENCE_LENGTH}. Please use the full AlphaFold system for long sequences.') 226 | 227 | if len(full_sequence) > 1400: 228 | print(f"WARNING: For a typical Google-Colab-GPU (16G) session, the max total length is ~1400 residues. You are at {len(full_sequence)}!
Running AlphaFold may crash.") 229 | 230 | print(f"homooligomer: '{homooligomer}'") 231 | print(f"total_length: '{len(full_sequence)}'") 232 | print(f"working_directory: '{output_dir}'") 233 | #%% 234 | TQDM_BAR_FORMAT = '{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]' 235 | #@markdown Once this cell has been executed, you will see 236 | #@markdown statistics about the multiple sequence alignment 237 | #@markdown (MSA) that will be used by AlphaFold. In particular, 238 | #@markdown you’ll see how well each residue is covered by similar 239 | #@markdown sequences in the MSA. 240 | #@markdown (Note that the search against databases and the actual prediction can take some time, from minutes to hours, depending on the length of the protein and what type of GPU you are allocated by Colab.) 241 | 242 | #@markdown --- 243 | msa_method = "mmseqs2" #@param ["mmseqs2","jackhmmer","single_sequence","precomputed"] 244 | #@markdown - `mmseqs2` - FAST method from [ColabFold](https://github.com/sokrypton/ColabFold) 245 | #@markdown - `jackhmmer` - default method from DeepMind (SLOW, but may find more/less sequences). 246 | #@markdown - `single_sequence` - use single sequence input 247 | #@markdown - `precomputed` If you have previously run this notebook and saved the results, 248 | #@markdown you can skip this step by uploading 249 | #@markdown the previously generated `prediction_?????/msa.pickle` 250 | 251 | 252 | #@markdown --- 253 | #@markdown **custom msa options** 254 | add_custom_msa = False #@param {type:"boolean"} 255 | msa_format = "fas" #@param ["fas","a2m","a3m","sto","psi","clu"] 256 | #@markdown - `add_custom_msa` - If enabled, you'll get an option to upload your custom MSA in the specified `msa_format`. Note: Your MSA will be supplemented with those from 'mmseqs2' or 'jackhmmer', unless `msa_method` is set to 'single_sequence'. 257 | 258 | #@markdown --- 259 | #@markdown **pair msa options** 260 | 261 | #@markdown Experimental option for protein complexes. Pairing is currently only supported for proteins in the same operon (prokaryotic genomes). 262 | pair_mode = "unpaired" #@param ["unpaired","unpaired+paired","paired"] {type:"string"} 263 | #@markdown - `unpaired` - generate separate MSA for each protein. 264 | #@markdown - `unpaired+paired` - attempt to pair sequences from the same operon within the genome. 265 | #@markdown - `paired` - only use sequences that were successfully paired. 266 | 267 | #@markdown Options to prefilter each MSA before pairing. (It might help if there are any paralogs in the complex.) 268 | pair_cov = 50 #@param [0,25,50,75,90] {type:"raw"} 269 | pair_qid = 20 #@param [0,15,20,30,40,50] {type:"raw"} 270 | #@markdown - `pair_cov` prefilters each MSA to a minimum coverage with the query (%) before pairing. 271 | #@markdown - `pair_qid` prefilters each MSA to a minimum sequence identity with the query (%) before pairing.
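# Worked example (illustrative values): with pair_cov = 50 and pair_qid = 20, a hit must cover at least 50% of the query and share at least 20% sequence identity with it to be kept for pairing; both values are passed to pairmsa below as fractions (pair_qid/100 and pair_cov/100).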
272 | 273 | # --- Search against genetic databases --- 274 | os.makedirs('tmp', exist_ok=True) 275 | msas, deletion_matrices = [],[] 276 | 277 | if add_custom_msa: 278 | print(f"upload custom msa in '{msa_format}' format") 279 | msa_dict = files.upload() 280 | lines = msa_dict[list(msa_dict.keys())[0]].decode() 281 | 282 | # convert to a3m 283 | with open(f"tmp/upload.{msa_format}","w") as tmp_upload: 284 | tmp_upload.write(lines) 285 | os.system(f"reformat.pl {msa_format} a3m tmp/upload.{msa_format} tmp/upload.a3m") 286 | a3m_lines = open("tmp/upload.a3m","r").read() 287 | 288 | # parse 289 | msa, mtx = parsers.parse_a3m(a3m_lines) 290 | msas.append(msa) 291 | deletion_matrices.append(mtx) 292 | 293 | if len(msas[0][0]) != len(sequence): 294 | raise ValueError("ERROR: the length of msa does not match input sequence") 295 | 296 | if msa_method == "precomputed": 297 | print("upload precomputed pickled msa from previous run") 298 | pickled_msa_dict = files.upload() 299 | msas_dict = pickle.loads(pickled_msa_dict[list(pickled_msa_dict.keys())[0]]) 300 | msas, deletion_matrices = (msas_dict[k] for k in ['msas', 'deletion_matrices']) 301 | 302 | elif msa_method == "single_sequence": 303 | if len(msas) == 0: 304 | msas.append([sequence]) 305 | deletion_matrices.append([[0]*len(sequence)]) 306 | 307 | else: 308 | seqs = ori_sequence.replace('/','').split(':') 309 | _blank_seq = ["-" * len(seq) for seq in seqs] 310 | _blank_mtx = [[0] * len(seq) for seq in seqs] 311 | def _pad(ns,vals,mode): 312 | if mode == "seq": _blank = _blank_seq.copy() 313 | if mode == "mtx": _blank = _blank_mtx.copy() 314 | if isinstance(ns, list): 315 | for n,val in zip(ns,vals): _blank[n] = val 316 | else: _blank[ns] = vals 317 | if mode == "seq": return "".join(_blank) 318 | if mode == "mtx": return sum(_blank,[]) 319 | 320 | if len(seqs) == 1 or "unpaired" in pair_mode: 321 | # gather msas 322 | if msa_method == "mmseqs2": 323 | prefix = cf.get_hash("".join(seqs)) 324 | prefix = os.path.join('tmp',prefix) 325 | print(f"running mmseqs2") 326 | A3M_LINES = cf.run_mmseqs2(seqs, prefix, filter=True) 327 | 328 | for n, seq in enumerate(seqs): 329 | # tmp directory 330 | prefix = cf.get_hash(seq) 331 | prefix = os.path.join('tmp',prefix) 332 | 333 | if msa_method == "mmseqs2": 334 | # run mmseqs2 335 | a3m_lines = A3M_LINES[n] 336 | msa, mtx = parsers.parse_a3m(a3m_lines) 337 | msas_, mtxs_ = [msa],[mtx] 338 | 339 | elif msa_method == "jackhmmer": 340 | print(f"running jackhmmer on seq_{n}") 341 | # run jackhmmer 342 | msas_, mtxs_, names_ = ([sum(x,())] for x in run_jackhmmer(seq, prefix)) 343 | 344 | # pad sequences 345 | for msa_,mtx_ in zip(msas_,mtxs_): 346 | msa,mtx = [sequence],[[0]*len(sequence)] 347 | for s,m in zip(msa_,mtx_): 348 | msa.append(_pad(n,s,"seq")) 349 | mtx.append(_pad(n,m,"mtx")) 350 | 351 | msas.append(msa) 352 | deletion_matrices.append(mtx) 353 | 354 | #################################################################################### 355 | # PAIR_MSA 356 | #################################################################################### 357 | 358 | if len(seqs) > 1 and (pair_mode == "paired" or pair_mode == "unpaired+paired"): 359 | print("attempting to pair some sequences...") 360 | 361 | if msa_method == "mmseqs2": 362 | prefix = cf.get_hash("".join(seqs)) 363 | prefix = os.path.join('tmp',prefix) 364 | print(f"running mmseqs2_noenv_nofilter on all seqs") 365 | A3M_LINES = cf.run_mmseqs2(seqs, prefix, use_env=False, use_filter=False) 366 | 367 | _data = [] 368 | for a in range(len(seqs)): 369 
| print(f"prepping seq_{a}") 370 | _seq = seqs[a] 371 | _prefix = os.path.join('tmp',cf.get_hash(_seq)) 372 | 373 | if msa_method == "mmseqs2": 374 | a3m_lines = A3M_LINES[a] 375 | _msa, _mtx, _lab = pairmsa.parse_a3m(a3m_lines, 376 | filter_qid=pair_qid/100, 377 | filter_cov=pair_cov/100) 378 | 379 | elif msa_method == "jackhmmer": 380 | _msas, _mtxs, _names = run_jackhmmer(_seq, _prefix) 381 | _msa, _mtx, _lab = pairmsa.get_uni_jackhmmer(_msas[0], _mtxs[0], _names[0], 382 | filter_qid=pair_qid/100, 383 | filter_cov=pair_cov/100) 384 | 385 | if len(_msa) > 1: 386 | _data.append(pairmsa.hash_it(_msa, _lab, _mtx, call_uniprot=False)) 387 | else: 388 | _data.append(None) 389 | 390 | Ln = len(seqs) 391 | O = [[None for _ in seqs] for _ in seqs] 392 | for a in range(Ln): 393 | if _data[a] is not None: 394 | for b in range(a+1,Ln): 395 | if _data[b] is not None: 396 | print(f"attempting pairwise stitch for {a} {b}") 397 | O[a][b] = pairmsa._stitch(_data[a],_data[b]) 398 | _seq_a, _seq_b, _mtx_a, _mtx_b = (*O[a][b]["seq"],*O[a][b]["mtx"]) 399 | 400 | ############################################## 401 | # filter to remove redundant sequences 402 | ############################################## 403 | ok = [] 404 | with open("tmp/tmp.fas","w") as fas_file: 405 | fas_file.writelines([f">{n}\n{a+b}\n" for n,(a,b) in enumerate(zip(_seq_a,_seq_b))]) 406 | os.system("hhfilter -maxseq 1000000 -i tmp/tmp.fas -o tmp/tmp.id90.fas -id 90") 407 | for line in open("tmp/tmp.id90.fas","r"): 408 | if line.startswith(">"): ok.append(int(line[1:])) 409 | ############################################## 410 | print(f"found {len(_seq_a)} pairs ({len(ok)} after filtering)") 411 | 412 | if len(_seq_a) > 0: 413 | msa,mtx = [sequence],[[0]*len(sequence)] 414 | for s_a,s_b,m_a,m_b in zip(_seq_a, _seq_b, _mtx_a, _mtx_b): 415 | msa.append(_pad([a,b],[s_a,s_b],"seq")) 416 | mtx.append(_pad([a,b],[m_a,m_b],"mtx")) 417 | msas.append(msa) 418 | deletion_matrices.append(mtx) 419 | 420 | ''' 421 | # triwise stitching (WIP) 422 | if Ln > 2: 423 | for a in range(Ln): 424 | for b in range(a+1,Ln): 425 | for c in range(b+1,Ln): 426 | if O[a][b] is not None and O[b][c] is not None: 427 | print(f"attempting triwise stitch for {a} {b} {c}") 428 | list_ab = O[a][b]["lab"][1] 429 | list_bc = O[b][c]["lab"][0] 430 | msa,mtx = [sequence],[[0]*len(sequence)] 431 | for i,l_b in enumerate(list_ab): 432 | if l_b in list_bc: 433 | j = list_bc.index(l_b) 434 | s_a = O[a][b]["seq"][0][i] 435 | s_b = O[a][b]["seq"][1][i] 436 | s_c = O[b][c]["seq"][1][j] 437 | 438 | m_a = O[a][b]["mtx"][0][i] 439 | m_b = O[a][b]["mtx"][1][i] 440 | m_c = O[b][c]["mtx"][1][j] 441 | 442 | msa.append(_pad([a,b,c],[s_a,s_b,s_c],"seq")) 443 | mtx.append(_pad([a,b,c],[m_a,m_b,m_c],"mtx")) 444 | if len(msa) > 1: 445 | msas.append(msa) 446 | deletion_matrices.append(mtx) 447 | print(f"found {len(msa)} triplets") 448 | ''' 449 | #################################################################################### 450 | #################################################################################### 451 | 452 | # save MSA as pickle 453 | pickle.dump({"msas":msas,"deletion_matrices":deletion_matrices}, 454 | open(os.path.join(output_dir,"msa.pickle"),"wb")) 455 | 456 | make_msa_plot = len(msas[0]) > 1 457 | if make_msa_plot: 458 | plt = cf.plot_msas(msas, ori_sequence) 459 | plt.savefig(os.path.join(output_dir,"msa_coverage.png"), bbox_inches = 'tight', dpi=300) 460 | #%% 461 | #@title run alphafold 462 | num_relax = "None" 463 | rank_by = "pLDDT" #@param ["pLDDT","pTMscore"] 
464 | use_turbo = True #@param {type:"boolean"} 465 | max_msa = "512:1024" #@param ["512:1024", "256:512", "128:256", "64:128", "32:64"] 466 | max_msa_clusters, max_extra_msa = [int(x) for x in max_msa.split(":")] 467 | 468 | 469 | 470 | #@markdown - `rank_by` specifies the metric to use for ranking models (for protein-protein complexes, we recommend pTMscore) 471 | #@markdown - `use_turbo` introduces a few modifications (compile once, swap params, adjust max_msa) to speed up execution and reduce memory requirements. Disable for default behavior. 472 | #@markdown - `max_msa` defines the number of sequences to use (`max_msa_clusters:max_extra_msa`). When adjusting after a GPU crash, be sure to `Runtime` → `Restart runtime`. (Lowering will reduce GPU requirements, but may result in poor model quality. This option is ignored if `use_turbo` is disabled.) 473 | show_images = True #@param {type:"boolean"} 474 | #@markdown - `show_images` To make things more exciting, we show images of the predicted structures as they are being generated. (WARNING: the order of images displayed does not reflect any ranking). 475 | #@markdown --- 476 | #@markdown #### Sampling options 477 | #@markdown There are two stochastic parts of the pipeline: the feature generation (choice of cluster centers) and the model itself (dropout). 478 | #@markdown To get structure diversity, you can iterate through a fixed number of random_seeds (using `num_samples`) and/or enable dropout (using `is_training`). 479 | 480 | num_models = 5 #@param [1,2,3,4,5] {type:"raw"} 481 | use_ptm = True #@param {type:"boolean"} 482 | num_ensemble = 1 #@param [1,8] {type:"raw"} 483 | max_recycles = 3 #@param [1,3,6,12,24,48] {type:"raw"} 484 | tol = 0 #@param [0,0.1,0.5,1] {type:"raw"} 485 | is_training = False #@param {type:"boolean"} 486 | num_samples = 1 #@param [1,2,4,8,16,32] {type:"raw"} 487 | 488 | subsample_msa = True #@param {type:"boolean"} 489 | #@markdown - `subsample_msa` subsamples large MSAs to `3E7/length` sequences to avoid crashing the preprocessing protocol. (This option is ignored if `use_turbo` is disabled.)
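# Worked example (illustrative): for a 600-residue input, int(3E7/600) = 50000, so an MSA with more than 50,000 sequences would be randomly subsampled down to that size (see do_subsample_msa() defined further below).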
490 | 491 | save_pae_json = True 492 | save_tmp_pdb = True 493 | 494 | 495 | if not use_ptm and rank_by == "pTMscore": 496 | print("WARNING: models will be ranked by pLDDT, 'use_ptm' is needed to compute pTMscore") 497 | rank_by = "pLDDT" 498 | 499 | ############################# 500 | # delete old files 501 | ############################# 502 | for f in os.listdir(output_dir): 503 | if "rank_" in f: 504 | os.remove(os.path.join(output_dir, f)) 505 | 506 | ############################# 507 | # homooligomerize 508 | ############################# 509 | lengths = [len(seq) for seq in seqs] 510 | msas_mod, deletion_matrices_mod = cf.homooligomerize_heterooligomer(msas, deletion_matrices, 511 | lengths, homooligomers) 512 | ############################# 513 | # define input features 514 | ############################# 515 | def _placeholder_template_feats(num_templates_, num_res_): 516 | return { 517 | 'template_aatype': np.zeros([num_templates_, num_res_, 22], np.float32), 518 | 'template_all_atom_masks': np.zeros([num_templates_, num_res_, 37], np.float32), 519 | 'template_all_atom_positions': np.zeros([num_templates_, num_res_, 37, 3], np.float32), # positions carry xyz coordinates, hence the extra dimension of 3 520 | 'template_domain_names': np.zeros([num_templates_], np.float32), 521 | 'template_sum_probs': np.zeros([num_templates_], np.float32), 522 | } 523 | 524 | num_res = len(full_sequence) 525 | feature_dict = {} 526 | feature_dict.update(pipeline.make_sequence_features(full_sequence, 'test', num_res)) 527 | feature_dict.update(pipeline.make_msa_features(msas_mod, deletion_matrices=deletion_matrices_mod)) 528 | if not use_turbo: 529 | feature_dict.update(_placeholder_template_feats(0, num_res)) 530 | 531 | def do_subsample_msa(F, random_seed=0): 532 | '''subsample msa to avoid running out of memory''' 533 | N = len(F["msa"]) 534 | L = len(F["residue_index"]) 535 | N_ = int(3E7/L) 536 | if N > N_: 537 | print(f"whhhaaa...
too many sequences ({N}) subsampling to {N_}") 538 | np.random.seed(random_seed) 539 | idx = np.append(0,np.random.permutation(np.arange(1,N)))[:N_] 540 | F_ = {} 541 | F_["msa"] = F["msa"][idx] 542 | F_["deletion_matrix_int"] = F["deletion_matrix_int"][idx] 543 | F_["num_alignments"] = np.full_like(F["num_alignments"],N_) 544 | for k in ['aatype', 'between_segment_residues', 545 | 'domain_name', 'residue_index', 546 | 'seq_length', 'sequence']: 547 | F_[k] = F[k] 548 | return F_ 549 | else: 550 | return F 551 | 552 | ################################ 553 | # set chain breaks 554 | ################################ 555 | Ls = [] 556 | for seq,h in zip(ori_sequence.split(":"),homooligomers): 557 | Ls += [len(s) for s in seq.split("/")] * h 558 | Ls_plot = sum([[len(seq)]*h for seq,h in zip(seqs,homooligomers)],[]) 559 | feature_dict['residue_index'] = cf.chain_break(feature_dict['residue_index'], Ls) 560 | 561 | ########################### 562 | # run alphafold 563 | ########################### 564 | def parse_results(prediction_result, processed_feature_dict): 565 | b_factors = prediction_result['plddt'][:,None] * prediction_result['structure_module']['final_atom_mask'] 566 | dist_bins = jax.numpy.append(0,prediction_result["distogram"]["bin_edges"]) 567 | dist_mtx = dist_bins[prediction_result["distogram"]["logits"].argmax(-1)] 568 | contact_mtx = jax.nn.softmax(prediction_result["distogram"]["logits"])[:,:,dist_bins < 8].sum(-1) 569 | 570 | out = {"unrelaxed_protein": protein.from_prediction(processed_feature_dict, prediction_result, b_factors=b_factors), 571 | "plddt": prediction_result['plddt'], 572 | "pLDDT": prediction_result['plddt'].mean(), 573 | "dists": dist_mtx, 574 | "adj": contact_mtx} 575 | 576 | if "ptm" in prediction_result: 577 | out.update({"pae": prediction_result['predicted_aligned_error'], 578 | "pTMscore": prediction_result['ptm']}) 579 | return out 580 | 581 | model_names = ['model_1', 'model_2', 'model_3', 'model_4', 'model_5'][:num_models] 582 | total = len(model_names) * num_samples 583 | with tqdm.notebook.tqdm(total=total, bar_format=TQDM_BAR_FORMAT) as pbar: 584 | ####################################################################### 585 | # precompile model and recompile only if length changes 586 | ####################################################################### 587 | if use_turbo: 588 | name = "model_5_ptm" if use_ptm else "model_5" 589 | N = len(feature_dict["msa"]) 590 | L = len(feature_dict["residue_index"]) 591 | compiled = (N, L, use_ptm, max_recycles, tol, num_ensemble, max_msa, is_training) 592 | if "COMPILED" in dir(): 593 | if COMPILED != compiled: recompile = True 594 | else: recompile = True 595 | if recompile: 596 | cf.clear_mem(device) 597 | cfg = config.model_config(name) 598 | 599 | # set size of msa (to reduce memory requirements) 600 | msa_clusters = min(N, max_msa_clusters) 601 | cfg.data.eval.max_msa_clusters = msa_clusters 602 | cfg.data.common.max_extra_msa = max(min(N-msa_clusters,max_extra_msa),1) 603 | 604 | cfg.data.common.num_recycle = max_recycles 605 | cfg.model.num_recycle = max_recycles 606 | cfg.model.recycle_tol = tol 607 | cfg.data.eval.num_ensemble = num_ensemble 608 | 609 | params = data.get_model_haiku_params(name,'./alphafold/data') 610 | model_runner = model.RunModel(cfg, params, is_training=is_training) 611 | COMPILED = compiled 612 | recompile = False 613 | 614 | else: 615 | cf.clear_mem(device) 616 | recompile = True 617 | 618 | # cleanup 619 | if "outs" in dir(): del outs 620 | outs = {} 621 | 
cf.clear_mem("cpu") 622 | 623 | ####################################################################### 624 | def report(key): 625 | pbar.update(n=1) 626 | o = outs[key] 627 | line = f"{key} recycles:{o['recycles']} tol:{o['tol']:.2f} pLDDT:{o['pLDDT']:.2f}" 628 | if use_ptm: line += f" pTMscore:{o['pTMscore']:.2f}" 629 | print(line) 630 | if show_images: 631 | fig = cf.plot_protein(o['unrelaxed_protein'], Ls=Ls_plot, dpi=100) 632 | # plt.show() 633 | plt.ion() 634 | if save_tmp_pdb: 635 | tmp_pdb_path = os.path.join(output_dir,f'unranked_{key}_unrelaxed.pdb') 636 | pdb_lines = protein.to_pdb(o['unrelaxed_protein']) 637 | with open(tmp_pdb_path, 'w') as f: f.write(pdb_lines) 638 | 639 | if use_turbo: 640 | # go through each random_seed 641 | for seed in range(num_samples): 642 | 643 | # prep input features 644 | if subsample_msa: 645 | sampled_feats_dict = do_subsample_msa(feature_dict, random_seed=seed) 646 | processed_feature_dict = model_runner.process_features(sampled_feats_dict, random_seed=seed) 647 | else: 648 | processed_feature_dict = model_runner.process_features(feature_dict, random_seed=seed) 649 | 650 | # go through each model 651 | for num, model_name in enumerate(model_names): 652 | name = model_name+"_ptm" if use_ptm else model_name 653 | key = f"{name}_seed_{seed}" 654 | pbar.set_description(f'Running {key}') 655 | 656 | # replace model parameters 657 | params = data.get_model_haiku_params(name, './alphafold/data') 658 | for k in model_runner.params.keys(): 659 | model_runner.params[k] = params[k] 660 | 661 | # predict 662 | prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed),"cpu") 663 | 664 | # save results 665 | outs[key] = parse_results(prediction_result, processed_feature_dict) 666 | outs[key].update({"recycles":r, "tol":t}) 667 | report(key) 668 | 669 | del prediction_result, params 670 | del sampled_feats_dict, processed_feature_dict 671 | 672 | else: 673 | # go through each model 674 | for num, model_name in enumerate(model_names): 675 | name = model_name+"_ptm" if use_ptm else model_name 676 | params = data.get_model_haiku_params(name, './alphafold/data') 677 | cfg = config.model_config(name) 678 | cfg.data.common.num_recycle = cfg.model.num_recycle = max_recycles 679 | cfg.model.recycle_tol = tol 680 | cfg.data.eval.num_ensemble = num_ensemble 681 | model_runner = model.RunModel(cfg, params, is_training=is_training) 682 | 683 | # go through each random_seed 684 | for seed in range(num_samples): 685 | key = f"{name}_seed_{seed}" 686 | pbar.set_description(f'Running {key}') 687 | processed_feature_dict = model_runner.process_features(feature_dict, random_seed=seed) 688 | prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed),"cpu") 689 | outs[key] = parse_results(prediction_result, processed_feature_dict) 690 | outs[key].update({"recycles":r, "tol":t}) 691 | report(key) 692 | 693 | # cleanup 694 | del processed_feature_dict, prediction_result 695 | 696 | del params, model_runner, cfg 697 | cf.clear_mem("gpu") 698 | 699 | # delete old files 700 | for f in os.listdir(output_dir): 701 | if "rank" in f: 702 | os.remove(os.path.join(output_dir, f)) 703 | 704 | # Find the best model according to the mean pLDDT. 
705 | model_rank = list(outs.keys()) 706 | model_rank = [model_rank[i] for i in np.argsort([outs[x][rank_by] for x in model_rank])[::-1]] 707 | 708 | # Write out the prediction 709 | for n,key in enumerate(model_rank): 710 | prefix = f"rank_{n+1}_{key}" 711 | pred_output_path = os.path.join(output_dir,f'{prefix}_unrelaxed.pdb') 712 | fig = cf.plot_protein(outs[key]["unrelaxed_protein"], Ls=Ls_plot, dpi=200) 713 | plt.savefig(os.path.join(output_dir,f'{prefix}.png'), bbox_inches = 'tight') 714 | plt.close(fig) 715 | 716 | pdb_lines = protein.to_pdb(outs[key]["unrelaxed_protein"]) 717 | with open(pred_output_path, 'w') as f: 718 | f.write(pdb_lines) 719 | 720 | ############################################################ 721 | print(f"model rank based on {rank_by}") 722 | for n,key in enumerate(model_rank): 723 | print(f"rank_{n+1}_{key} {rank_by}:{outs[key][rank_by]:.2f}") 724 | #%% 725 | #@title Refine structures with Amber-Relax (Optional) 726 | num_relax = "None" #@param ["None", "Top1", "Top5", "All"] {type:"string"} 727 | if num_relax == "None": 728 | num_relax = 0 729 | elif num_relax == "Top1": 730 | num_relax = 1 731 | elif num_relax == "Top5": 732 | num_relax = 5 733 | else: 734 | num_relax = len(model_names) * num_samples 735 | 736 | if num_relax > 0: 737 | if "relax" not in dir(): 738 | # add conda environment to path 739 | sys.path.append('./colabfold-conda/lib/python3.7/site-packages') 740 | 741 | # import libraries 742 | from alphafold.relax import relax 743 | from alphafold.relax import utils 744 | 745 | with tqdm.notebook.tqdm(total=num_relax, bar_format=TQDM_BAR_FORMAT) as pbar: 746 | pbar.set_description('AMBER relaxation') 747 | for n,key in enumerate(model_rank): 748 | if n < num_relax: 749 | prefix = f"rank_{n+1}_{key}" 750 | pred_output_path = os.path.join(output_dir,f'{prefix}_relaxed.pdb') 751 | if not os.path.isfile(pred_output_path): 752 | amber_relaxer = relax.AmberRelaxation( 753 | max_iterations=0, 754 | tolerance=2.39, 755 | stiffness=10.0, 756 | exclude_residues=[], 757 | max_outer_iterations=20) 758 | relaxed_pdb_lines, _, _ = amber_relaxer.process(prot=outs[key]["unrelaxed_protein"]) 759 | with open(pred_output_path, 'w') as f: 760 | f.write(relaxed_pdb_lines) 761 | pbar.update(n=1) 762 | #%% 763 | #@title Display 3D structure {run: "auto"} 764 | rank_num = 1 #@param ["1", "2", "3", "4", "5"] {type:"raw"} 765 | color = "lDDT" #@param ["chain", "lDDT", "rainbow"] 766 | show_sidechains = False #@param {type:"boolean"} 767 | show_mainchains = False #@param {type:"boolean"} 768 | 769 | key = model_rank[rank_num-1] 770 | prefix = f"rank_{rank_num}_{key}" 771 | pred_output_path = os.path.join(output_dir,f'{prefix}_relaxed.pdb') 772 | if not os.path.isfile(pred_output_path): 773 | pred_output_path = os.path.join(output_dir,f'{prefix}_unrelaxed.pdb') 774 | 775 | cf.show_pdb(pred_output_path, show_sidechains, show_mainchains, color, Ls=Ls_plot).show() 776 | if color == "lDDT": cf.plot_plddt_legend().show() 777 | if use_ptm: 778 | cf.plot_confidence(outs[key]["plddt"], outs[key]["pae"], Ls=Ls_plot).show() 779 | else: 780 | cf.plot_confidence(outs[key]["plddt"], Ls=Ls_plot).show() 781 | #%% 782 | #@title Extra outputs 783 | dpi = 300 #@param {type:"integer"} 784 | save_to_txt = True #@param {type:"boolean"} 785 | save_pae_json = True #@param {type:"boolean"} 786 | #@markdown - save the data used to generate the contact and distogram plots below to a text file (PAE values can be found in the JSON file if `use_ptm` is enabled) 787 | 788 | if use_ptm: 789 | print("predicted alignment
error") 790 | cf.plot_paes([outs[k]["pae"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 791 | plt.savefig(os.path.join(output_dir,f'predicted_alignment_error.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 792 | # plt.show() 793 | 794 | print("predicted contacts") 795 | cf.plot_adjs([outs[k]["adj"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 796 | plt.savefig(os.path.join(output_dir,f'predicted_contacts.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 797 | # plt.show() 798 | 799 | print("predicted distogram") 800 | cf.plot_dists([outs[k]["dists"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 801 | plt.savefig(os.path.join(output_dir,f'predicted_distogram.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 802 | # plt.show() 803 | 804 | print("predicted LDDT") 805 | cf.plot_plddts([outs[k]["plddt"] for k in model_rank], Ls=Ls_plot, dpi=dpi) 806 | plt.savefig(os.path.join(output_dir,f'predicted_LDDT.png'), bbox_inches = 'tight', dpi=np.maximum(200,dpi)) 807 | # plt.show() 808 | 809 | def do_save_to_txt(filename, adj, dists): 810 | adj = np.asarray(adj) 811 | dists = np.asarray(dists) 812 | L = len(adj) 813 | with open(filename,"w") as out: 814 | out.write("i\tj\taa_i\taa_j\tp(cbcb<8)\tmaxdistbin\n") 815 | for i in range(L): 816 | for j in range(i+1,L): 817 | if dists[i][j] < 21.68 or adj[i][j] >= 0.001: 818 | line = f"{i+1}\t{j+1}\t{full_sequence[i]}\t{full_sequence[j]}\t{adj[i][j]:.3f}" 819 | line += f"\t>{dists[i][j]:.2f}" if dists[i][j] == 21.6875 else f"\t{dists[i][j]:.2f}" 820 | out.write(f"{line}\n") 821 | 822 | for n,key in enumerate(model_rank): 823 | if save_to_txt: 824 | txt_filename = os.path.join(output_dir,f'rank_{n+1}_{key}.raw.txt') 825 | do_save_to_txt(txt_filename,adj=outs[key]["adj"],dists=outs[key]["dists"]) 826 | 827 | if use_ptm and save_pae_json: 828 | pae = outs[key]["pae"] 829 | max_pae = pae.max() 830 | # Save pLDDT and predicted aligned error (if it exists) 831 | pae_output_path = os.path.join(output_dir,f'rank_{n+1}_{key}_pae.json') 832 | # Save predicted aligned error in the same format as the AF EMBL DB 833 | rounded_errors = np.round(np.asarray(pae), decimals=1) 834 | indices = np.indices((len(rounded_errors), len(rounded_errors))) + 1 835 | indices_1 = indices[0].flatten().tolist() 836 | indices_2 = indices[1].flatten().tolist() 837 | pae_data = json.dumps([{ 838 | 'residue1': indices_1, 839 | 'residue2': indices_2, 840 | 'distance': rounded_errors.flatten().tolist(), 841 | 'max_predicted_aligned_error': max_pae.item() 842 | }], 843 | indent=None, 844 | separators=(',', ':')) 845 | with open(pae_output_path, 'w') as f: 846 | f.write(pae_data) 847 | #%% -------------------------------------------------------------------------------- /v1.0.0/runner_af2advanced.py: -------------------------------------------------------------------------------- 1 | #%% 2 | ## command-line arguments 3 | import argparse 4 | parser = argparse.ArgumentParser(description="Runner script that can take command-line arguments") 5 | parser.add_argument("-i", "--input", help="Path to a FASTA file. Required.", required=True) 6 | parser.add_argument("-o", "--output_dir", default="", type=str, 7 | help="Path to a directory that will store the results. " 8 | "The default name is 'prediction_