├── .gitignore
├── DRKG Drug Repurposing Knowledge Graph.pdf
├── LICENSE
├── Readme.md
├── connectivity.png
├── drkg_with_dgl
    ├── .ipynb_checkpoints
    │   └── loading_drkg_in_dgl-checkpoint.ipynb
    ├── Readme.md
    └── loading_drkg_in_dgl.ipynb
├── drug_repurpose
    ├── COVID-19_drug_repurposing.ipynb
    ├── COVID-19_drug_repurposing_via_genes.ipynb
    ├── COVID19_clinical_trial_drugs.tsv
    ├── Readme.md
    ├── coronavirus-related-host-genes.tsv
    ├── covid19-host-genes.tsv
    └── infer_drug.tsv
├── drugbank_info
    ├── README.md
    ├── drugbank_biotech.txt
    ├── drugbank_small_molecule.txt
    ├── drugbank_smiles.txt
    └── drugbank_weight.txt
├── embedding_analysis
    ├── Edge_score_analysis.ipynb
    ├── Edge_similarity_based_on_link_recommendation_results.ipynb
    ├── Entity_similarity_analysis.ipynb
    ├── Readme.md
    ├── Relation_similarity_analysis.ipynb
    └── Train_embeddings.ipynb
├── licenses
    └── Readme.md
├── raw_graph_analysis
    ├── Jaccard_scores_among_all_edge_types_in_DRKG.ipynb
    └── Readme.md
└── utils
    └── utils.py

/.gitignore:
--------------------------------------------------------------------------------
1 | .idea
--------------------------------------------------------------------------------
/DRKG Drug Repurposing Knowledge Graph.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gnn4dr/DRKG/d4bb1974312013c4bd79a13e42c1d9492033f8c7/DRKG Drug Repurposing Knowledge Graph.pdf
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 
11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 |
--------------------------------------------------------------------------------
/Readme.md:
--------------------------------------------------------------------------------
1 | # Drug Repurposing Knowledge Graph (DRKG)
2 | Drug Repurposing Knowledge Graph (DRKG) is a comprehensive biological knowledge graph relating genes, compounds, diseases, biological processes, side effects and symptoms. DRKG includes information from six existing databases (DrugBank, Hetionet, GNBR, String, IntAct and DGIdb) as well as data collected from recent publications, particularly those related to Covid19. It includes 97,238 entities belonging to 13 entity-types, and 5,874,261 triplets belonging to 107 edge-types. Each of these 107 edge-types describes a type of interaction between one of the 17 entity-type pairs (multiple types of interactions are possible between the same entity-pair), as depicted in the figure below. The repository also includes a set of notebooks showing how to explore and analyze DRKG using statistical methods or machine learning methods such as knowledge graph embedding.
3 |
4 |
5 |

6 | DRKG schema
7 |
8 | Figure: Interactions in the DRKG. The number next to an edge indicates the number of relation-types for that entity-pair in DRKG.
9 |

10 |
11 | ## Statistics of DRKG
12 | The type-wise distribution of the entities in DRKG and their original data source(s) is shown in the following table.
13 |
14 | | Entity type | Drugbank | GNBR | Hetionet | STRING | IntAct | DGIdb | Bibliography | Total Entities |
15 | |:--------------------|---------:|------:|---------:|-------:|-------:|------:|-------------:|---------------:|
16 | | Anatomy | \- | \- | 400 | \- | \- | \- | \- | 400 |
17 | | Atc | 4,048 | \- | \- | \- | \- | \- | \- | 4,048 |
18 | | Biological Process | \- | \- | 11,381 | \- | \- | \- | \- | 11,381 |
19 | | Cellular Component | \- | \- | 1,391 | \- | \- | \- | \- | 1,391 |
20 | | Compound | 9,708 | 11,961 | 1,538 | \- | 153 | 6,348 | 6,250 | 24,313 |
21 | | Disease | 1,182 | 4,746 | 257 | \- | \- | \- | 33 | 5,103 |
22 | | Gene | 4,973 | 27,111 | 19,145 | 18,316 | 16,321 | 2,551 | 3,181 | 39,220 |
23 | | Molecular Function | \- | \- | 2,884 | \- | \- | \- | \- | 2,884 |
24 | | Pathway | \- | \- | 1,822 | \- | \- | \- | \- | 1,822 |
25 | | Pharmacologic Class | \- | \- | 345 | \- | \- | \- | \- | 345 |
26 | | Side Effect | \- | \- | 5,701 | \- | \- | \- | \- | 5,701 |
27 | | Symptom | \- | \- | 415 | \- | \- | \- | \- | 415 |
28 | | Tax | \- | 215 | \- | \- | \- | \- | \- | 215 |
29 | | Total | 19,911 | 44,033 | 45,279 | 18,316 | 16,474 | 8,899 | 9,464 | 97,238 |
30 |
31 |
32 | The following table shows the number of triplets between different entity-type pairs, for DRKG as a whole and for each data source.
33 |
34 | | Entity\-type pair | Drugbank | GNBR | Hetionet | STRING | IntAct | DGIdb | Bibliography | Total interactions |
35 | |:--------------------------------------|-----------:|-------:|---------:|--------:|-------:|------:|-------------:|-------------------:|
36 | | \(Gene, Gene\) | \- | 66,722 | 474,526 | 1,496,708 | 254,346 | \- | 58,629 | 2,350,931 |
37 | | \(Compound, Gene\) | 24,801 | 80,803 | 51,429 | \- | 1,805 | 26,290 | 25,666 | 210,794 |
38 | | \(Disease, Gene\) | \- | 95,399 | 27,977 | \- | \- | \- | 461 | 123,837 |
39 | | \(Atc, Compound\) | 15,750 | \- | \- | \- | \- | \- | \- | 15,750 |
40 | | \(Compound, Compound\) | 1,379,271 | \- | 6,486 | \- | \- | \- | \- | 1,385,757 |
41 | | \(Compound, Disease\) | 4,968 | 77,782 | 1,145 | \- | \- | \- | \- | 83,895 |
42 | | \(Gene, Tax\) | \- | 14,663 | \- | \- | \- | \- | \- | 14,663 |
43 | | \(Biological Process, Gene\) | \- | \- | 559,504 | \- | \- | \- | \- | 559,504 |
44 | | \(Disease, Symptom\) | \- | \- | 3,357 | \- | \- | \- | \- | 3,357 |
45 | | \(Anatomy, Disease\) | \- | \- | 3,602 | \- | \- | \- | \- | 3,602 |
46 | | \(Disease, Disease\) | \- | \- | 543 | \- | \- | \- | \- | 543 |
47 | | \(Anatomy, Gene\) | \- | \- | 726,495 | \- | \- | \- | \- | 726,495 |
48 | | \(Gene, Molecular Function\) | \- | \- | 97,222 | \- | \- | \- | \- | 97,222 |
49 | | \(Compound, Pharmacologic Class\) | \- | \- | 1,029 | \- | \- | \- | \- | 1,029 |
50 | | \(Cellular Component, Gene\) | \- | \- | 73,566 | \- | \- | \- | \- | 73,566 |
51 | | \(Gene, Pathway\) | \- | \- | 84,372 | \- | \- | \- | \- | 84,372 |
52 | | \(Compound, Side Effect\) | \- | \- | 138,944 | \- | \- | \- | \- | 138,944 |
53 | | Total | 1,424,790 | 335,369 | 2,250,197 | 1,496,708 | 256,151 | 26,290 | 84,756 | 5,874,261 |
54 |
55 |
56 | ## Download DRKG
57 | To analyze DRKG, you can download it directly with the following command:
58 | ```
59 | wget https://dgl-data.s3-us-west-2.amazonaws.com/dataset/DRKG/drkg.tar.gz
60 | ```
61 | If you use our
notebooks provided in this repository, you don't need to download the file manually. The notebooks can automatically download the file for you.
62 |
63 | When you untar `drkg.tar.gz`, you will see the following files:
64 |
65 | ```
66 | ./drkg.tsv
67 | ./entity2src.tsv
68 | ./relation_glossary.tsv
69 | ./embed
70 | ./embed/DRKG_TransE_l2_relation.npy
71 | ./embed/relations.tsv
72 | ./embed/entities.tsv
73 | ./embed/Readme.md
74 | ./embed/DRKG_TransE_l2_entity.npy
75 | ./embed/mol_contextpred.npy
76 | ./embed/mol_masking.npy
77 | ./embed/mol_infomax.npy
78 | ./embed/mol_edgepred.npy
79 | ```
80 |
81 | ### DRKG dataset
82 | The whole dataset contains four parts:
83 | - drkg.tsv, a tsv file containing the original drkg in the format of (h, r, t) triplets.
84 | - embed, a folder containing the pretrained knowledge graph embeddings, trained using the entire drkg.tsv as the training set, and pretrained GNN-based molecule embeddings computed from [molecule SMILES](./drugbank_info/drugbank_smiles.txt).
85 | - entity2src.tsv, a file mapping entities in drkg to their original sources.
86 | - relation_glossary.tsv, a file containing the glossary of the relations in DRKG, and other associated information with sources (if available).
87 |
88 | ### Pretrained DRKG embedding
89 | The DRKG embedding is trained using the TransE\_l2 model with a dimension size of 400. There are four files:
90 |
91 | - DRKG\_TransE\_l2\_entity.npy, NumPy binary data, storing the entity embedding
92 | - DRKG\_TransE\_l2\_relation.npy, NumPy binary data, storing the relation embedding
93 | - entities.tsv, mapping from entity\_name to entity\_id.
94 | - relations.tsv, mapping from relation\_name to relation\_id
95 |
96 | To use the pretrained embedding, one can use np.load to load the entity embeddings and relation embeddings separately:
97 |
98 | ```
99 | import numpy as np
100 | entity_emb = np.load('./embed/DRKG_TransE_l2_entity.npy')
101 | rel_emb = np.load('./embed/DRKG_TransE_l2_relation.npy')
102 | ```
103 |
104 | ### Pretrained Molecule Embedding
105 |
106 | We also provide molecule embeddings for most small-molecule drugs in DrugBank using pre-trained GNNs. In particular,
107 | [Strategies for Pre-training Graph Neural Networks](https://arxiv.org/abs/1905.12265) develops multiple approaches for
108 | pre-training GNN-based molecular representations, combining supervised molecular property prediction with
109 | self-supervised learning approaches. We employ their method to compute four variants of molecule embeddings
110 | using [DGL-LifeSci](https://github.com/awslabs/dgl-lifesci/tree/master/examples/molecule_embeddings).
111 |
112 | - `mol_contextpred.npy`: From a model pre-trained to predict surrounding graph structures of molecular subgraphs
113 | - `mol_infomax.npy`: From a model pre-trained to maximize the mutual information between local node representations
114 | and a global graph representation
115 | - `mol_edgepred.npy`: From a model pre-trained to encourage nearby nodes to have similar representations while forcing
116 | disparate nodes to have distinct representations
117 | - `mol_masking.npy`: From a model pre-trained to predict randomly masked node and edge attributes
118 |
119 | ## Tools to analyze DRKG
120 | We analyze DRKG with some deep learning frameworks, including [DGL](https://github.com/dmlc/dgl) (a framework for graph neural networks) and [DGL-KE](https://github.com/awslabs/dgl-ke) (a library for computing knowledge graph embeddings). Please follow the instructions below to install the deep learning frameworks.
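The pretrained embeddings described above can be inspected with NumPy alone, before any of these frameworks are installed. The sketch below illustrates how a triplet might be scored under a TransE model with an L2 distance, that is, by computing ||h + r - t||. It is only an illustration under stated assumptions: the `name<TAB>id` column order for `entities.tsv` and `relations.tsv` is assumed, the helper names `read_id_map` and `transe_l2_distance` are hypothetical, and tiny in-memory arrays with made-up entity and relation names stand in for the real `.npy` files.

```python
import io

import numpy as np


def read_id_map(handle):
    """Parse 'name<TAB>id' lines into a dict (this column order is an assumption)."""
    mapping = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        name, idx = line.split("\t")
        mapping[name] = int(idx)
    return mapping


def transe_l2_distance(head, rel, tail):
    """Return ||head + rel - tail||_2; a smaller value means a more plausible triplet."""
    return float(np.linalg.norm(head + rel - tail))


# Toy stand-ins for entities.tsv, relations.tsv and the two .npy arrays.
# The names below are made up and do not come from DRKG.
entity_ids = read_id_map(io.StringIO("Compound::A\t0\nDisease::B\t1\n"))
relation_ids = read_id_map(io.StringIO("treats\t0\n"))
entity_emb = np.array([[1.0, 0.0], [1.0, 2.0]])
rel_emb = np.array([[0.0, 2.0]])

# Row i of each array is the embedding of the entity or relation with id i.
h = entity_emb[entity_ids["Compound::A"]]
r = rel_emb[relation_ids["treats"]]
t = entity_emb[entity_ids["Disease::B"]]
distance = transe_l2_distance(h, r, t)
print(distance)
```

With the real files, the two toy arrays would be replaced by the results of `np.load`, and the in-memory strings by open file handles on the two `.tsv` files.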
121 |
122 | ### Install PyTorch
123 | Currently all notebooks use PyTorch as the deep learning backend. To install other versions of PyTorch, please go to [Install PyTorch](https://pytorch.org/).
124 | ```
125 | sudo pip3 install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
126 | ```
127 |
128 | ### Install DGL
129 | Please install [DGL](https://www.dgl.ai/) (a framework for graph neural networks) with the following command. It installs DGL with CUDA support.
130 | ```
131 | sudo pip3 install dgl-cu101
132 | ```
133 | For installing other versions of DGL, please go to [Install DGL](https://docs.dgl.ai/en/latest/install/index.html)
134 |
135 | ### Install DGL-KE
136 | If you want to train the model with the notebooks (e.g., using Train_embeddings.ipynb or Edge_score_analysis.ipynb) in [Knowledge Graph Embedding Based Analysis of DRKG](#knowledge-graph-embedding-based-analysis-of-drkg), you need to install both DGL and the [DGL-KE](https://github.com/awslabs/dgl-ke) package.
137 | DGL-KE can work with DGL >= 0.4.3 (either CPU or GPU)
138 | ```
139 | sudo pip3 install dglke
140 | ```
141 |
142 | ## Notebooks for analyzing DRKG
143 | We provide a set of notebooks to analyze DRKG. Some of the notebooks use the tools installed in the previous section.
144 |
145 | ### Basic Graph Analysis of DRKG
146 | To evaluate the structural similarity between a pair of relation types, we compute their Jaccard similarity coefficient and the overlap between the two edge types via the overlap coefficient. This analysis is given in
147 | - [Jaccard_scores_among_all_edge_types_in_DRKG.ipynb](raw_graph_analysis/Jaccard_scores_among_all_edge_types_in_DRKG.ipynb)
148 |
149 | ### Knowledge Graph Embedding Based Analysis of DRKG
150 | We analyze the extracted DRKG by learning a TransE KGE model that utilizes the ![$\ell_2$](https://render.githubusercontent.com/render/math?math=%24%5Cell_2%24) distance.
As DRKG combines information from different data sources, we want to verify that meaningful entity and relation embeddings can be generated using knowledge graph embedding technology.
151 |
152 | We split the edge triplets into training, validation and test sets with a 90%/5%/5% ratio and train the KGE model as shown in the following notebook:
153 | - [Train_embeddings.ipynb](embedding_analysis/Train_embeddings.ipynb)
154 |
155 | Finally, we obtain the entity and relation embeddings for the DRKG. We can do various embedding-based analyses, as provided in the following notebooks:
156 | - [Relation_similarity_analysis.ipynb](embedding_analysis/Relation_similarity_analysis.ipynb), analyzing the similarity of the generated relation embeddings.
157 | - [Entity_similarity_analysis.ipynb](embedding_analysis/Entity_similarity_analysis.ipynb), analyzing the similarity of the generated entity embeddings.
158 | - [Edge_score_analysis.ipynb](embedding_analysis/Edge_score_analysis.ipynb), evaluating whether the learned KGE model can predict the edges of DRKG
159 | - [Edge_similarity_based_on_link_recommendation_results.ipynb](embedding_analysis/Edge_similarity_based_on_link_recommendation_results.ipynb), evaluating how similar the predicted links are among different relation types.
160 |
161 | ### Drug Repurposing Using Pretrained Model for COVID-19
162 | We present an example of using the pretrained DRKG model for drug repurposing for COVID-19. In the example, we directly use the pretrained model provided at [DRKG dataset](#drkg-dataset) and propose 100 drugs for COVID-19. The following notebook provides the details:
163 |
164 | - [COVID-19_drug_repurposing.ipynb](drug_repurpose/COVID-19_drug_repurposing.ipynb)
165 |
166 | ### DRKG with DGL
167 | We provide a notebook with an example of using DRKG with Deep Graph Library (DGL).
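As a rough sketch of the preprocessing such a notebook needs, the snippet below groups (h, r, t) triplets into per-edge-type lists of integer node-ID pairs, keyed by (source type, relation, destination type). The notebook in this repository passes a dictionary of this shape to `dgl.heterograph`; whether a given DGL version accepts it in exactly this form is not checked here, and DGL itself is not imported. The sketch assumes entity names follow a `Type::identifier` pattern; the function name `build_edge_dictionary` and the toy triplets are hypothetical.

```python
from collections import defaultdict


def build_edge_dictionary(triplets):
    """Group (head, relation, tail) triplets by typed edge and assign per-type node IDs."""
    node_ids = defaultdict(dict)  # node type -> {entity name: integer ID}
    edges = defaultdict(list)     # (src type, relation, dst type) -> [(src ID, dst ID)]

    def node_id(name):
        # The node type is assumed to be the prefix before the first '::'.
        ntype = name.split("::")[0]
        ids = node_ids[ntype]
        # IDs are assigned consecutively within each node type, starting at 0.
        return ntype, ids.setdefault(name, len(ids))

    for head, rel, tail in triplets:
        h_type, h_id = node_id(head)
        t_type, t_id = node_id(tail)
        edges[(h_type, rel, t_type)].append((h_id, t_id))
    return dict(node_ids), dict(edges)


# Made-up triplets in the same (h, r, t) layout as drkg.tsv.
toy_triplets = [
    ("Compound::c1", "rel_a", "Gene::g1"),
    ("Compound::c2", "rel_a", "Gene::g1"),
    ("Gene::g1", "rel_b", "Gene::g2"),
]
node_ids, edges = build_edge_dictionary(toy_triplets)
print(node_ids)
print(edges)
```

Keeping a separate ID space per node type mirrors how a heterogeneous graph indexes its nodes, so the same integer can refer to different entities of different types.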
168 |
169 | The following notebook provides an example of building a heterograph from DRKG in DGL, along with some examples of queries on the DGL heterograph:
170 | - [loading_drkg_in_dgl.ipynb](drkg_with_dgl/loading_drkg_in_dgl.ipynb)
171 |
172 | ## Additional Information for DrugBank
173 |
174 | Some additional information about compounds from DrugBank is included in [drugbank_info](/drugbank_info), including the
175 | type and weight of drugs, and the SMILES of small-molecule drugs.
176 |
177 | ## Licence
178 | This project is licensed under the Apache-2.0 License. However, the DRKG integrates data from many resources and users should consider the licensing of each source (see this [table](https://github.com/shuix007/COVID-19-KG/blob/master/licenses/Readme.md)). We apply a license attribute on a per node and per edge basis for sources with defined licenses.
179 |
180 | ## Cite
181 |
182 | Please cite our dataset if you use this code and data in your work.
183 |
184 | ```
185 | @misc{drkg2020,
186 | author = {Ioannidis, Vassilis N. and Song, Xiang and Manchanda, Saurav and Li, Mufei and Pan, Xiaoqin
187 | and Zheng, Da and Ning, Xia and Zeng, Xiangxiang and Karypis, George},
188 | title = {DRKG - Drug Repurposing Knowledge Graph for Covid-19},
189 | howpublished = "\url{https://github.com/gnn4dr/DRKG/}",
190 | year = {2020}
191 | }
192 | ```
193 | A preprint describing this work will be available soon.
194 | 195 | -------------------------------------------------------------------------------- /connectivity.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gnn4dr/DRKG/d4bb1974312013c4bd79a13e42c1d9492033f8c7/connectivity.png -------------------------------------------------------------------------------- /drkg_with_dgl/.ipynb_checkpoints/loading_drkg_in_dgl-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Working with DRKG in Deep Graph Library (DGL)\n", 8 | "This notebook provides an example of building a heterograph from DRKG in DGL; and some examples of queries on the DGL heterograph. For more information about using DGL please refer to https://www.dgl.ai/ \"" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "metadata": {}, 15 | "outputs": [], 16 | "source": [ 17 | "import pandas as pd\n", 18 | "import numpy as np\n", 19 | "import dgl\n", 20 | "drkg_file = '~/covid/data/processed/filtered/drkg.tsv'\n", 21 | "df = pd.read_csv(drkg_file, sep =\"\\t\")\n", 22 | "triplets = df.values.tolist()" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "Assign an ID to each node (entity): create a dictionary of node-types: each dictionary further consists of a dictionary mapping node to an ID." 
30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 2, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "entity_dictionary = {}\n", 39 | "def insert_entry(entry, ent_type, dic):\n", 40 | "    if ent_type not in dic:\n", 41 | "        dic[ent_type] = {}\n", 42 | "    ent_n_id = len(dic[ent_type])\n", 43 | "    if entry not in dic[ent_type]:\n", 44 | "        dic[ent_type][entry] = ent_n_id\n", 45 | "    return dic\n", 46 | "\n", 47 | "for triple in triplets:\n", 48 | "    src = triple[0]\n", 49 | "    split_src = src.split('::')\n", 50 | "    src_type = split_src[0]\n", 51 | "    dest = triple[2]\n", 52 | "    split_dest = dest.split('::')\n", 53 | "    dest_type = split_dest[0]\n", 54 | "    insert_entry(src,src_type,entity_dictionary)\n", 55 | "    insert_entry(dest,dest_type,entity_dictionary)" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "Create a dictionary of relations: the key is the relation and the value is the list of (source node ID, destination node ID) tuples." 
63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 3, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "edge_dictionary={}\n", 72 | "for triple in triplets:\n", 73 | "    src = triple[0]\n", 74 | "    split_src = src.split('::')\n", 75 | "    src_type = split_src[0]\n", 76 | "    dest = triple[2]\n", 77 | "    split_dest = dest.split('::')\n", 78 | "    dest_type = split_dest[0]\n", 79 | " \n", 80 | "    src_int_id = entity_dictionary[src_type][src]\n", 81 | "    dest_int_id = entity_dictionary[dest_type][dest]\n", 82 | " \n", 83 | "    pair = (src_int_id,dest_int_id)\n", 84 | "    etype = (src_type,triple[1],dest_type)\n", 85 | "    if etype in edge_dictionary:\n", 86 | "        edge_dictionary[etype] += [pair]\n", 87 | "    else:\n", 88 | "        edge_dictionary[etype] = [pair]" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "## Create a DGL heterograph using the dictionary of relations" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 4, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "graph = dgl.heterograph(edge_dictionary);" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "## Print the statistics of the created graph" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "Number of nodes for each node-type" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 5, 124 | "metadata": {}, 125 | "outputs": [ 126 | { 127 | "name": "stdout", 128 | "output_type": "stream", 129 | "text": [ 130 | "Anatomy \t 400\n", 131 | "Atc \t 4048\n", 132 | "Biological Process \t 11381\n", 133 | "Cellular Component \t 1391\n", 134 | "Compound \t 24313\n", 135 | "Disease \t 4920\n", 136 | "Gene \t 39220\n", 137 | "Molecular Function \t 2884\n", 138 | "Pathway \t 1822\n", 139 | "Pharmacologic Class \t 345\n", 140 | "Side Effect \t 5701\n", 141 | "Symptom \t 415\n", 142 | "Tax \t 
215\n", 143 | "Graph contains 97055 nodes from 13 node-types.\n" 144 | ] 145 | } 146 | ], 147 | "source": [ 148 | "total_nodes = 0;\n", 149 | "for ntype in graph.ntypes:\n", 150 | " print(ntype, '\\t', graph.number_of_nodes(ntype));\n", 151 | " total_nodes += graph.number_of_nodes(ntype);\n", 152 | "print(\"Graph contains {} nodes from {} node-types.\".format(total_nodes, len(graph.ntypes)))" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "Number of edges for each relation (edge-type)" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": 6, 165 | "metadata": {}, 166 | "outputs": [ 167 | { 168 | "name": "stdout", 169 | "output_type": "stream", 170 | "text": [ 171 | "bioarx::HumGenHumGen:Gene:Gene \t 58093\n", 172 | "bioarx::VirGenHumGen:Gene:Gene \t 535\n", 173 | "bioarx::DrugVirGen:Compound:Gene \t 1165\n", 174 | "bioarx::DrugHumGen:Compound:Gene \t 24501\n", 175 | "Covid2_acc_host_gene::Disease:Gene \t 332\n", 176 | "Coronavirus_ass_host_gene::Disease:Gene \t 129\n", 177 | "DGIDB::INHIBITOR::Gene:Compound \t 5971\n", 178 | "DGIDB::ANTAGONIST::Gene:Compound \t 3006\n", 179 | "DGIDB::OTHER::Gene:Compound \t 11070\n", 180 | "DGIDB::AGONIST::Gene:Compound \t 3012\n", 181 | "DGIDB::BINDER::Gene:Compound \t 143\n", 182 | "DGIDB::MODULATOR::Gene:Compound \t 243\n", 183 | "DGIDB::BLOCKER::Gene:Compound \t 979\n", 184 | "DGIDB::CHANNEL BLOCKER::Gene:Compound \t 352\n", 185 | "DGIDB::ANTIBODY::Gene:Compound \t 188\n", 186 | "DGIDB::POSITIVE ALLOSTERIC MODULATOR::Gene:Compound \t 618\n", 187 | "DGIDB::ALLOSTERIC MODULATOR::Gene:Compound \t 317\n", 188 | "DGIDB::ACTIVATOR::Gene:Compound \t 316\n", 189 | "DGIDB::PARTIAL AGONIST::Gene:Compound \t 75\n", 190 | "DRUGBANK::drugbank_vocabulary:x-atc::Compound:Atc \t 15750\n", 191 | "DRUGBANK::drugbank_vocabulary:ddi-interactor-in::Compound:Compound \t 1379271\n", 192 | "DRUGBANK::drugbank_vocabulary:target::Compound:Gene \t 19158\n", 193 | 
"DRUGBANK::drugbank_vocabulary:enzyme::Compound:Gene \t 4923\n", 194 | "DRUGBANK::drugbank_vocabulary:carrier::Compound:Gene \t 720\n", 195 | "GNBR::E::Compound:Gene \t 32743\n", 196 | "GNBR::A+::Compound:Gene \t 1568\n", 197 | "GNBR::N::Compound:Gene \t 12521\n", 198 | "GNBR::K::Compound:Gene \t 12411\n", 199 | "GNBR::A-::Compound:Gene \t 1108\n", 200 | "GNBR::E+::Compound:Gene \t 1970\n", 201 | "GNBR::B::Compound:Gene \t 7170\n", 202 | "GNBR::E-::Compound:Gene \t 2918\n", 203 | "GNBR::O::Compound:Gene \t 5573\n", 204 | "GNBR::Z::Compound:Gene \t 2821\n", 205 | "GNBR::T::Compound:Disease \t 54020\n", 206 | "GNBR::C::Compound:Disease \t 1739\n", 207 | "GNBR::Sa::Compound:Disease \t 16923\n", 208 | "GNBR::Pa::Compound:Disease \t 2619\n", 209 | "GNBR::Mp::Compound:Disease \t 495\n", 210 | "GNBR::Pr::Compound:Disease \t 966\n", 211 | "GNBR::J::Compound:Disease \t 1020\n", 212 | "GNBR::L::Gene:Disease \t 48385\n", 213 | "GNBR::U::Gene:Disease \t 6432\n", 214 | "GNBR::Y::Gene:Disease \t 1948\n", 215 | "GNBR::J::Gene:Disease \t 30234\n", 216 | "GNBR::Te::Gene:Disease \t 2836\n", 217 | "GNBR::Md::Gene:Disease \t 1279\n", 218 | "GNBR::G::Gene:Disease \t 2055\n", 219 | "GNBR::D::Gene:Disease \t 500\n", 220 | "GNBR::X::Gene:Disease \t 1324\n", 221 | "GNBR::Ud::Gene:Disease \t 407\n", 222 | "GNBR::V+::Gene:Gene \t 8689\n", 223 | "GNBR::Q::Gene:Gene \t 19372\n", 224 | "GNBR::Rg::Gene:Gene \t 11018\n", 225 | "GNBR::B::Gene:Gene \t 8164\n", 226 | "GNBR::I::Gene:Gene \t 5434\n", 227 | "GNBR::E+::Gene:Gene \t 10838\n", 228 | "GNBR::H::Gene:Gene \t 2509\n", 229 | "GNBR::W::Gene:Gene \t 280\n", 230 | "GNBR::E::Gene:Gene \t 418\n", 231 | "GNBR:in_tax::Gene:Tax \t 14663\n", 232 | "Hetionet::GpBP::Gene:Biological Process \t 559504\n", 233 | "Hetionet::GiG::Gene:Gene \t 147164\n", 234 | "Hetionet::CrC::Compound:Compound \t 6486\n", 235 | "Hetionet::DdG::Disease:Gene \t 7623\n", 236 | "Hetionet::DpS::Disease:Symptom \t 3357\n", 237 | "Hetionet::DlA::Disease:Anatomy \t 3602\n", 238 | 
"Hetionet::CtD::Compound:Disease \t 755\n", 239 | "Hetionet::CbG::Compound:Gene \t 11571\n", 240 | "Hetionet::CuG::Compound:Gene \t 18756\n", 241 | "Hetionet::DrD::Disease:Disease \t 543\n", 242 | "Hetionet::DaG::Disease:Gene \t 12623\n", 243 | "Hetionet::CpD::Compound:Disease \t 390\n", 244 | "Hetionet::AdG::Anatomy:Gene \t 102240\n", 245 | "Hetionet::AuG::Anatomy:Gene \t 97848\n", 246 | "Hetionet::GcG::Gene:Gene \t 61690\n", 247 | "Hetionet::GpMF::Gene:Molecular Function \t 97222\n", 248 | "Hetionet::PCiC::Pharmacologic Class:Compound \t 1029\n", 249 | "Hetionet::GpCC::Gene:Cellular Component \t 73566\n", 250 | "Hetionet::Gr>G::Gene:Gene \t 265672\n", 251 | "Hetionet::CdG::Compound:Gene \t 21102\n", 252 | "Hetionet::DuG::Disease:Gene \t 7731\n", 253 | "Hetionet::GpPW::Gene:Pathway \t 84372\n", 254 | "Hetionet::CcSE::Compound:Side Effect \t 138944\n", 255 | "Hetionet::AeG::Anatomy:Gene \t 526407\n", 256 | "INTACT::PHYSICAL ASSOCIATION::Compound:Gene \t 203\n", 257 | "INTACT::DIRECT INTERACTION::Compound:Gene \t 155\n", 258 | "INTACT::ASSOCIATION::Compound:Gene \t 1447\n", 259 | "INTACT::ASSOCIATION::Gene:Gene \t 112390\n", 260 | "INTACT::PHYSICAL ASSOCIATION::Gene:Gene \t 129318\n", 261 | "INTACT::COLOCALIZATION::Gene:Gene \t 3468\n", 262 | "INTACT::DEPHOSPHORYLATION REACTION::Gene:Gene \t 303\n", 263 | "INTACT::CLEAVAGE REACTION::Gene:Gene \t 93\n", 264 | "INTACT::DIRECT INTERACTION::Gene:Gene \t 6950\n", 265 | "INTACT::PHOSPHORYLATION REACTION::Gene:Gene \t 1328\n", 266 | "INTACT::ADP RIBOSYLATION REACTION::Gene:Gene \t 58\n", 267 | "INTACT::UBIQUITINATION REACTION::Gene:Gene \t 371\n", 268 | "INTACT::PROTEIN CLEAVAGE::Gene:Gene \t 67\n", 269 | "STRING::REACTION::Gene:Gene \t 400426\n", 270 | "STRING::CATALYSIS::Gene:Gene \t 343533\n", 271 | "STRING::ACTIVATION::Gene:Gene \t 81355\n", 272 | "STRING::INHIBITION::Gene:Gene \t 28959\n", 273 | "STRING::OTHER::Gene:Gene \t 310690\n", 274 | "STRING::BINDING::Gene:Gene \t 315875\n", 275 | "STRING::PTMOD::Gene:Gene \t 
15113\n", 276 | "STRING::EXPRESSION::Gene:Gene \t 757\n", 277 | "Graph contains 5869293 edges from 106 edge-types.\n" 278 | ] 279 | } 280 | ], 281 | "source": [ 282 | "total_edges = 0;\n", 283 | "for etype in graph.etypes:\n", 284 | " print(etype, '\\t', graph.number_of_edges(etype))\n", 285 | " total_edges += graph.number_of_edges(etype);\n", 286 | "print(\"Graph contains {} edges from {} edge-types.\".format(total_edges, len(graph.etypes)))" 287 | ] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "metadata": {}, 292 | "source": [ 293 | "Just printing the graph (\"print(graph)\") will also print the graph summary" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": 7, 299 | "metadata": {}, 300 | "outputs": [ 301 | { 302 | "name": "stdout", 303 | "output_type": "stream", 304 | "text": [ 305 | "Graph(num_nodes={'Anatomy': 400, 'Atc': 4048, 'Biological Process': 11381, 'Cellular Component': 1391, 'Compound': 24313, 'Disease': 4920, 'Gene': 39220, 'Molecular Function': 2884, 'Pathway': 1822, 'Pharmacologic Class': 345, 'Side Effect': 5701, 'Symptom': 415, 'Tax': 215},\n", 306 | " num_edges={('Gene', 'bioarx::HumGenHumGen:Gene:Gene', 'Gene'): 58093, ('Gene', 'bioarx::VirGenHumGen:Gene:Gene', 'Gene'): 535, ('Compound', 'bioarx::DrugVirGen:Compound:Gene', 'Gene'): 1165, ('Compound', 'bioarx::DrugHumGen:Compound:Gene', 'Gene'): 24501, ('Disease', 'Covid2_acc_host_gene::Disease:Gene', 'Gene'): 332, ('Disease', 'Coronavirus_ass_host_gene::Disease:Gene', 'Gene'): 129, ('Gene', 'DGIDB::INHIBITOR::Gene:Compound', 'Compound'): 5971, ('Gene', 'DGIDB::ANTAGONIST::Gene:Compound', 'Compound'): 3006, ('Gene', 'DGIDB::OTHER::Gene:Compound', 'Compound'): 11070, ('Gene', 'DGIDB::AGONIST::Gene:Compound', 'Compound'): 3012, ('Gene', 'DGIDB::BINDER::Gene:Compound', 'Compound'): 143, ('Gene', 'DGIDB::MODULATOR::Gene:Compound', 'Compound'): 243, ('Gene', 'DGIDB::BLOCKER::Gene:Compound', 'Compound'): 979, ('Gene', 'DGIDB::CHANNEL BLOCKER::Gene:Compound', 
'Compound'): 352, ('Gene', 'DGIDB::ANTIBODY::Gene:Compound', 'Compound'): 188, ('Gene', 'DGIDB::POSITIVE ALLOSTERIC MODULATOR::Gene:Compound', 'Compound'): 618, ('Gene', 'DGIDB::ALLOSTERIC MODULATOR::Gene:Compound', 'Compound'): 317, ('Gene', 'DGIDB::ACTIVATOR::Gene:Compound', 'Compound'): 316, ('Gene', 'DGIDB::PARTIAL AGONIST::Gene:Compound', 'Compound'): 75, ('Compound', 'DRUGBANK::drugbank_vocabulary:x-atc::Compound:Atc', 'Atc'): 15750, ('Compound', 'DRUGBANK::drugbank_vocabulary:ddi-interactor-in::Compound:Compound', 'Compound'): 1379271, ('Compound', 'DRUGBANK::drugbank_vocabulary:target::Compound:Gene', 'Gene'): 19158, ('Compound', 'DRUGBANK::drugbank_vocabulary:enzyme::Compound:Gene', 'Gene'): 4923, ('Compound', 'DRUGBANK::drugbank_vocabulary:carrier::Compound:Gene', 'Gene'): 720, ('Compound', 'GNBR::E::Compound:Gene', 'Gene'): 32743, ('Compound', 'GNBR::A+::Compound:Gene', 'Gene'): 1568, ('Compound', 'GNBR::N::Compound:Gene', 'Gene'): 12521, ('Compound', 'GNBR::K::Compound:Gene', 'Gene'): 12411, ('Compound', 'GNBR::A-::Compound:Gene', 'Gene'): 1108, ('Compound', 'GNBR::E+::Compound:Gene', 'Gene'): 1970, ('Compound', 'GNBR::B::Compound:Gene', 'Gene'): 7170, ('Compound', 'GNBR::E-::Compound:Gene', 'Gene'): 2918, ('Compound', 'GNBR::O::Compound:Gene', 'Gene'): 5573, ('Compound', 'GNBR::Z::Compound:Gene', 'Gene'): 2821, ('Compound', 'GNBR::T::Compound:Disease', 'Disease'): 54020, ('Compound', 'GNBR::C::Compound:Disease', 'Disease'): 1739, ('Compound', 'GNBR::Sa::Compound:Disease', 'Disease'): 16923, ('Compound', 'GNBR::Pa::Compound:Disease', 'Disease'): 2619, ('Compound', 'GNBR::Mp::Compound:Disease', 'Disease'): 495, ('Compound', 'GNBR::Pr::Compound:Disease', 'Disease'): 966, ('Compound', 'GNBR::J::Compound:Disease', 'Disease'): 1020, ('Gene', 'GNBR::L::Gene:Disease', 'Disease'): 48385, ('Gene', 'GNBR::U::Gene:Disease', 'Disease'): 6432, ('Gene', 'GNBR::Y::Gene:Disease', 'Disease'): 1948, ('Gene', 'GNBR::J::Gene:Disease', 'Disease'): 30234, ('Gene', 
'GNBR::Te::Gene:Disease', 'Disease'): 2836, ('Gene', 'GNBR::Md::Gene:Disease', 'Disease'): 1279, ('Gene', 'GNBR::G::Gene:Disease', 'Disease'): 2055, ('Gene', 'GNBR::D::Gene:Disease', 'Disease'): 500, ('Gene', 'GNBR::X::Gene:Disease', 'Disease'): 1324, ('Gene', 'GNBR::Ud::Gene:Disease', 'Disease'): 407, ('Gene', 'GNBR::V+::Gene:Gene', 'Gene'): 8689, ('Gene', 'GNBR::Q::Gene:Gene', 'Gene'): 19372, ('Gene', 'GNBR::Rg::Gene:Gene', 'Gene'): 11018, ('Gene', 'GNBR::B::Gene:Gene', 'Gene'): 8164, ('Gene', 'GNBR::I::Gene:Gene', 'Gene'): 5434, ('Gene', 'GNBR::E+::Gene:Gene', 'Gene'): 10838, ('Gene', 'GNBR::H::Gene:Gene', 'Gene'): 2509, ('Gene', 'GNBR::W::Gene:Gene', 'Gene'): 280, ('Gene', 'GNBR::E::Gene:Gene', 'Gene'): 418, ('Gene', 'GNBR:in_tax::Gene:Tax', 'Tax'): 14663, ('Gene', 'Hetionet::GpBP::Gene:Biological Process', 'Biological Process'): 559504, ('Gene', 'Hetionet::GiG::Gene:Gene', 'Gene'): 147164, ('Compound', 'Hetionet::CrC::Compound:Compound', 'Compound'): 6486, ('Disease', 'Hetionet::DdG::Disease:Gene', 'Gene'): 7623, ('Disease', 'Hetionet::DpS::Disease:Symptom', 'Symptom'): 3357, ('Disease', 'Hetionet::DlA::Disease:Anatomy', 'Anatomy'): 3602, ('Compound', 'Hetionet::CtD::Compound:Disease', 'Disease'): 755, ('Compound', 'Hetionet::CbG::Compound:Gene', 'Gene'): 11571, ('Compound', 'Hetionet::CuG::Compound:Gene', 'Gene'): 18756, ('Disease', 'Hetionet::DrD::Disease:Disease', 'Disease'): 543, ('Disease', 'Hetionet::DaG::Disease:Gene', 'Gene'): 12623, ('Compound', 'Hetionet::CpD::Compound:Disease', 'Disease'): 390, ('Anatomy', 'Hetionet::AdG::Anatomy:Gene', 'Gene'): 102240, ('Anatomy', 'Hetionet::AuG::Anatomy:Gene', 'Gene'): 97848, ('Gene', 'Hetionet::GcG::Gene:Gene', 'Gene'): 61690, ('Gene', 'Hetionet::GpMF::Gene:Molecular Function', 'Molecular Function'): 97222, ('Pharmacologic Class', 'Hetionet::PCiC::Pharmacologic Class:Compound', 'Compound'): 1029, ('Gene', 'Hetionet::GpCC::Gene:Cellular Component', 'Cellular Component'): 73566, ('Gene', 
'Hetionet::Gr>G::Gene:Gene', 'Gene'): 265672, ('Compound', 'Hetionet::CdG::Compound:Gene', 'Gene'): 21102, ('Disease', 'Hetionet::DuG::Disease:Gene', 'Gene'): 7731, ('Gene', 'Hetionet::GpPW::Gene:Pathway', 'Pathway'): 84372, ('Compound', 'Hetionet::CcSE::Compound:Side Effect', 'Side Effect'): 138944, ('Anatomy', 'Hetionet::AeG::Anatomy:Gene', 'Gene'): 526407, ('Compound', 'INTACT::PHYSICAL ASSOCIATION::Compound:Gene', 'Gene'): 203, ('Compound', 'INTACT::DIRECT INTERACTION::Compound:Gene', 'Gene'): 155, ('Compound', 'INTACT::ASSOCIATION::Compound:Gene', 'Gene'): 1447, ('Gene', 'INTACT::ASSOCIATION::Gene:Gene', 'Gene'): 112390, ('Gene', 'INTACT::PHYSICAL ASSOCIATION::Gene:Gene', 'Gene'): 129318, ('Gene', 'INTACT::COLOCALIZATION::Gene:Gene', 'Gene'): 3468, ('Gene', 'INTACT::DEPHOSPHORYLATION REACTION::Gene:Gene', 'Gene'): 303, ('Gene', 'INTACT::CLEAVAGE REACTION::Gene:Gene', 'Gene'): 93, ('Gene', 'INTACT::DIRECT INTERACTION::Gene:Gene', 'Gene'): 6950, ('Gene', 'INTACT::PHOSPHORYLATION REACTION::Gene:Gene', 'Gene'): 1328, ('Gene', 'INTACT::ADP RIBOSYLATION REACTION::Gene:Gene', 'Gene'): 58, ('Gene', 'INTACT::UBIQUITINATION REACTION::Gene:Gene', 'Gene'): 371, ('Gene', 'INTACT::PROTEIN CLEAVAGE::Gene:Gene', 'Gene'): 67, ('Gene', 'STRING::REACTION::Gene:Gene', 'Gene'): 400426, ('Gene', 'STRING::CATALYSIS::Gene:Gene', 'Gene'): 343533, ('Gene', 'STRING::ACTIVATION::Gene:Gene', 'Gene'): 81355, ('Gene', 'STRING::INHIBITION::Gene:Gene', 'Gene'): 28959, ('Gene', 'STRING::OTHER::Gene:Gene', 'Gene'): 310690, ('Gene', 'STRING::BINDING::Gene:Gene', 'Gene'): 315875, ('Gene', 'STRING::PTMOD::Gene:Gene', 'Gene'): 15113, ('Gene', 'STRING::EXPRESSION::Gene:Gene', 'Gene'): 757},\n", 307 | " metagraph=[('Anatomy', 'Gene'), ('Anatomy', 'Gene'), ('Anatomy', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 
'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Tax'), ('Gene', 'Biological Process'), ('Gene', 'Molecular Function'), ('Gene', 'Cellular Component'), ('Gene', 'Pathway'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Atc'), ('Compound', 'Compound'), ('Compound', 'Compound'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Side Effect'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Symptom'), ('Disease', 'Anatomy'), ('Disease', 'Disease'), ('Pharmacologic Class', 'Compound')])\n" 308 | ] 309 
| } 310 | ], 311 | "source": [ 312 | "print(graph)" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": null, 318 | "metadata": {}, 319 | "outputs": [], 320 | "source": [] 321 | } 322 | ], 323 | "metadata": { 324 | "kernelspec": { 325 | "display_name": "Python 3", 326 | "language": "python", 327 | "name": "python3" 328 | }, 329 | "language_info": { 330 | "codemirror_mode": { 331 | "name": "ipython", 332 | "version": 3 333 | }, 334 | "file_extension": ".py", 335 | "mimetype": "text/x-python", 336 | "name": "python", 337 | "nbconvert_exporter": "python", 338 | "pygments_lexer": "ipython3", 339 | "version": "3.6.9" 340 | } 341 | }, 342 | "nbformat": 4, 343 | "nbformat_minor": 2 344 | } 345 | -------------------------------------------------------------------------------- /drkg_with_dgl/Readme.md: -------------------------------------------------------------------------------- 1 | # Query Basic DRKG Graph Information with DGL 2 | Here we show how to load DRKG dataset into a heterograph in DGL and give several examples of queries on investigating the graph structure of DRKG in [loading_drkg_in_dgl.ipynb](loading_drkg_in_dgl.ipynb). -------------------------------------------------------------------------------- /drkg_with_dgl/loading_drkg_in_dgl.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Working with DRKG in Deep Graph Library (DGL)\n", 8 | "This notebook provides an example of building a heterograph from DRKG in DGL; and some examples of queries on the DGL heterograph. 
For more information about using DGL, please refer to https://www.dgl.ai/" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "metadata": {}, 15 | "outputs": [ 16 | { 17 | "name": "stderr", 18 | "output_type": "stream", 19 | "text": [ 20 | "Using backend: pytorch\n" 21 | ] 22 | } 23 | ], 24 | "source": [ 25 | "import pandas as pd\n", 26 | "import numpy as np\n", 27 | "import dgl\n", 28 | "import sys\n", 29 | "sys.path.insert(1, '../utils')\n", 30 | "from utils import download_and_extract\n", 31 | "download_and_extract()\n", 32 | "drkg_file = '../data/drkg/drkg.tsv'\n", 33 | "df = pd.read_csv(drkg_file, sep =\"\\t\", header=None)\n", 34 | "triplets = df.values.tolist()" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "Assign an ID to each node (entity): create a dictionary keyed by node type, where each value is a dictionary mapping a node to its integer ID." 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 2, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "entity_dictionary = {}\n", 51 | "def insert_entry(entry, ent_type, dic):\n", 52 | "    if ent_type not in dic:\n", 53 | "        dic[ent_type] = {}\n", 54 | "    ent_n_id = len(dic[ent_type])\n", 55 | "    if entry not in dic[ent_type]:\n", 56 | "        dic[ent_type][entry] = ent_n_id\n", 57 | "    return dic\n", 58 | "\n", 59 | "for triple in triplets:\n", 60 | "    src = triple[0]\n", 61 | "    split_src = src.split('::')\n", 62 | "    src_type = split_src[0]\n", 63 | "    dest = triple[2]\n", 64 | "    split_dest = dest.split('::')\n", 65 | "    dest_type = split_dest[0]\n", 66 | "    insert_entry(src,src_type,entity_dictionary)\n", 67 | "    insert_entry(dest,dest_type,entity_dictionary)" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "Create a dictionary of relations: the key is the (source node type, relation, destination node type) triple and the value is the list of (source node ID, destination node ID) tuples."
75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 3, 80 | "metadata": {}, 81 | "outputs": [], 82 | "source": [ 83 | "edge_dictionary={}\n", 84 | "for triple in triplets:\n", 85 | "    src = triple[0]\n", 86 | "    split_src = src.split('::')\n", 87 | "    src_type = split_src[0]\n", 88 | "    dest = triple[2]\n", 89 | "    split_dest = dest.split('::')\n", 90 | "    dest_type = split_dest[0]\n", 91 | "    \n", 92 | "    src_int_id = entity_dictionary[src_type][src]\n", 93 | "    dest_int_id = entity_dictionary[dest_type][dest]\n", 94 | "    \n", 95 | "    pair = (src_int_id,dest_int_id)\n", 96 | "    etype = (src_type,triple[1],dest_type)\n", 97 | "    if etype in edge_dictionary:\n", 98 | "        edge_dictionary[etype] += [pair]\n", 99 | "    else:\n", 100 | "        edge_dictionary[etype] = [pair]" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "## Create a DGL heterograph using the dictionary of relations" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 4, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [ 116 | "graph = dgl.heterograph(edge_dictionary);" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "## Print the statistics of the created graph" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "Number of nodes for each node-type" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": 5, 136 | "metadata": {}, 137 | "outputs": [ 138 | { 139 | "name": "stdout", 140 | "output_type": "stream", 141 | "text": [ 142 | "Anatomy \t 400\n", 143 | "Atc \t 4048\n", 144 | "Biological Process \t 11381\n", 145 | "Cellular Component \t 1391\n", 146 | "Compound \t 24313\n", 147 | "Disease \t 5103\n", 148 | "Gene \t 39220\n", 149 | "Molecular Function \t 2884\n", 150 | "Pathway \t 1822\n", 151 | "Pharmacologic Class \t 345\n", 152 | "Side Effect \t 5701\n", 153 | "Symptom \t 415\n", 154 |
"Tax \t 215\n", 155 | "Graph contains 97238 nodes from 13 node-types.\n" 156 | ] 157 | } 158 | ], 159 | "source": [ 160 | "total_nodes = 0\n", 161 | "for ntype in graph.ntypes:\n", 162 | "    print(ntype, '\\t', graph.number_of_nodes(ntype))\n", 163 | "    total_nodes += graph.number_of_nodes(ntype)\n", 164 | "print(\"Graph contains {} nodes from {} node-types.\".format(total_nodes, len(graph.ntypes)))" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "Number of edges for each relation (edge-type)" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": 6, 177 | "metadata": {}, 178 | "outputs": [ 179 | { 180 | "name": "stdout", 181 | "output_type": "stream", 182 | "text": [ 183 | "bioarx::HumGenHumGen:Gene:Gene \t 58094\n", 184 | "bioarx::VirGenHumGen:Gene:Gene \t 535\n", 185 | "bioarx::DrugVirGen:Compound:Gene \t 1165\n", 186 | "bioarx::DrugHumGen:Compound:Gene \t 24501\n", 187 | "bioarx::Covid2_acc_host_gene::Disease:Gene \t 332\n", 188 | "bioarx::Coronavirus_ass_host_gene::Disease:Gene \t 129\n", 189 | "DGIDB::INHIBITOR::Gene:Compound \t 5971\n", 190 | "DGIDB::ANTAGONIST::Gene:Compound \t 3006\n", 191 | "DGIDB::OTHER::Gene:Compound \t 11070\n", 192 | "DGIDB::AGONIST::Gene:Compound \t 3012\n", 193 | "DGIDB::BINDER::Gene:Compound \t 143\n", 194 | "DGIDB::MODULATOR::Gene:Compound \t 243\n", 195 | "DGIDB::BLOCKER::Gene:Compound \t 979\n", 196 | "DGIDB::CHANNEL BLOCKER::Gene:Compound \t 352\n", 197 | "DGIDB::ANTIBODY::Gene:Compound \t 188\n", 198 | "DGIDB::POSITIVE ALLOSTERIC MODULATOR::Gene:Compound \t 618\n", 199 | "DGIDB::ALLOSTERIC MODULATOR::Gene:Compound \t 317\n", 200 | "DGIDB::ACTIVATOR::Gene:Compound \t 316\n", 201 | "DGIDB::PARTIAL AGONIST::Gene:Compound \t 75\n", 202 | "DRUGBANK::x-atc::Compound:Atc \t 15750\n", 203 | "DRUGBANK::ddi-interactor-in::Compound:Compound \t 1379271\n", 204 | "DRUGBANK::target::Compound:Gene \t 19158\n", 205 | "DRUGBANK::enzyme::Compound:Gene \t 4923\n", 206 |
"DRUGBANK::carrier::Compound:Gene \t 720\n", 207 | "DRUGBANK::treats::Compound:Disease \t 4968\n", 208 | "GNBR::E::Compound:Gene \t 32743\n", 209 | "GNBR::A+::Compound:Gene \t 1568\n", 210 | "GNBR::N::Compound:Gene \t 12521\n", 211 | "GNBR::K::Compound:Gene \t 12411\n", 212 | "GNBR::A-::Compound:Gene \t 1108\n", 213 | "GNBR::E+::Compound:Gene \t 1970\n", 214 | "GNBR::B::Compound:Gene \t 7170\n", 215 | "GNBR::E-::Compound:Gene \t 2918\n", 216 | "GNBR::O::Compound:Gene \t 5573\n", 217 | "GNBR::Z::Compound:Gene \t 2821\n", 218 | "GNBR::T::Compound:Disease \t 54020\n", 219 | "GNBR::C::Compound:Disease \t 1739\n", 220 | "GNBR::Sa::Compound:Disease \t 16923\n", 221 | "GNBR::Pa::Compound:Disease \t 2619\n", 222 | "GNBR::Mp::Compound:Disease \t 495\n", 223 | "GNBR::Pr::Compound:Disease \t 966\n", 224 | "GNBR::J::Compound:Disease \t 1020\n", 225 | "GNBR::L::Gene:Disease \t 48384\n", 226 | "GNBR::U::Gene:Disease \t 6432\n", 227 | "GNBR::Y::Gene:Disease \t 1948\n", 228 | "GNBR::J::Gene:Disease \t 30234\n", 229 | "GNBR::Te::Gene:Disease \t 2836\n", 230 | "GNBR::Md::Gene:Disease \t 1279\n", 231 | "GNBR::G::Gene:Disease \t 2055\n", 232 | "GNBR::D::Gene:Disease \t 500\n", 233 | "GNBR::X::Gene:Disease \t 1324\n", 234 | "GNBR::Ud::Gene:Disease \t 407\n", 235 | "GNBR::V+::Gene:Gene \t 8689\n", 236 | "GNBR::Q::Gene:Gene \t 19372\n", 237 | "GNBR::Rg::Gene:Gene \t 11018\n", 238 | "GNBR::B::Gene:Gene \t 8164\n", 239 | "GNBR::I::Gene:Gene \t 5434\n", 240 | "GNBR::E+::Gene:Gene \t 10838\n", 241 | "GNBR::H::Gene:Gene \t 2509\n", 242 | "GNBR::W::Gene:Gene \t 280\n", 243 | "GNBR::E::Gene:Gene \t 418\n", 244 | "GNBR::in_tax::Gene:Tax \t 14663\n", 245 | "Hetionet::GpBP::Gene:Biological Process \t 559504\n", 246 | "Hetionet::GiG::Gene:Gene \t 147164\n", 247 | "Hetionet::CrC::Compound:Compound \t 6486\n", 248 | "Hetionet::DdG::Disease:Gene \t 7623\n", 249 | "Hetionet::DpS::Disease:Symptom \t 3357\n", 250 | "Hetionet::DlA::Disease:Anatomy \t 3602\n", 251 | "Hetionet::CtD::Compound:Disease \t 
755\n", 252 | "Hetionet::CbG::Compound:Gene \t 11571\n", 253 | "Hetionet::CuG::Compound:Gene \t 18756\n", 254 | "Hetionet::DrD::Disease:Disease \t 543\n", 255 | "Hetionet::DaG::Disease:Gene \t 12623\n", 256 | "Hetionet::CpD::Compound:Disease \t 390\n", 257 | "Hetionet::AdG::Anatomy:Gene \t 102240\n", 258 | "Hetionet::AuG::Anatomy:Gene \t 97848\n", 259 | "Hetionet::GcG::Gene:Gene \t 61690\n", 260 | "Hetionet::GpMF::Gene:Molecular Function \t 97222\n", 261 | "Hetionet::PCiC::Pharmacologic Class:Compound \t 1029\n", 262 | "Hetionet::GpCC::Gene:Cellular Component \t 73566\n", 263 | "Hetionet::Gr>G::Gene:Gene \t 265672\n", 264 | "Hetionet::CdG::Compound:Gene \t 21102\n", 265 | "Hetionet::DuG::Disease:Gene \t 7731\n", 266 | "Hetionet::GpPW::Gene:Pathway \t 84372\n", 267 | "Hetionet::CcSE::Compound:Side Effect \t 138944\n", 268 | "Hetionet::AeG::Anatomy:Gene \t 526407\n", 269 | "INTACT::PHYSICAL ASSOCIATION::Compound:Gene \t 203\n", 270 | "INTACT::DIRECT INTERACTION::Compound:Gene \t 155\n", 271 | "INTACT::ASSOCIATION::Compound:Gene \t 1447\n", 272 | "INTACT::ASSOCIATION::Gene:Gene \t 112390\n", 273 | "INTACT::PHYSICAL ASSOCIATION::Gene:Gene \t 129318\n", 274 | "INTACT::COLOCALIZATION::Gene:Gene \t 3468\n", 275 | "INTACT::DEPHOSPHORYLATION REACTION::Gene:Gene \t 303\n", 276 | "INTACT::CLEAVAGE REACTION::Gene:Gene \t 93\n", 277 | "INTACT::DIRECT INTERACTION::Gene:Gene \t 6950\n", 278 | "INTACT::PHOSPHORYLATION REACTION::Gene:Gene \t 1328\n", 279 | "INTACT::ADP RIBOSYLATION REACTION::Gene:Gene \t 58\n", 280 | "INTACT::UBIQUITINATION REACTION::Gene:Gene \t 371\n", 281 | "INTACT::PROTEIN CLEAVAGE::Gene:Gene \t 67\n", 282 | "STRING::REACTION::Gene:Gene \t 400426\n", 283 | "STRING::CATALYSIS::Gene:Gene \t 343533\n", 284 | "STRING::ACTIVATION::Gene:Gene \t 81355\n", 285 | "STRING::INHIBITION::Gene:Gene \t 28959\n", 286 | "STRING::OTHER::Gene:Gene \t 310690\n", 287 | "STRING::BINDING::Gene:Gene \t 315875\n", 288 | "STRING::PTMOD::Gene:Gene \t 15113\n", 289 | 
"STRING::EXPRESSION::Gene:Gene \t 757\n", 290 | "Graph contains 5874261 edges from 107 edge-types.\n" 291 | ] 292 | } 293 | ], 294 | "source": [ 295 | "total_edges = 0;\n", 296 | "for etype in graph.etypes:\n", 297 | " print(etype, '\\t', graph.number_of_edges(etype))\n", 298 | " total_edges += graph.number_of_edges(etype);\n", 299 | "print(\"Graph contains {} edges from {} edge-types.\".format(total_edges, len(graph.etypes)))" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "Just printing the graph (\"print(graph)\") will also print the graph summary" 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": 7, 312 | "metadata": {}, 313 | "outputs": [ 314 | { 315 | "name": "stdout", 316 | "output_type": "stream", 317 | "text": [ 318 | "Graph(num_nodes={'Anatomy': 400, 'Atc': 4048, 'Biological Process': 11381, 'Cellular Component': 1391, 'Compound': 24313, 'Disease': 5103, 'Gene': 39220, 'Molecular Function': 2884, 'Pathway': 1822, 'Pharmacologic Class': 345, 'Side Effect': 5701, 'Symptom': 415, 'Tax': 215},\n", 319 | " num_edges={('Gene', 'bioarx::HumGenHumGen:Gene:Gene', 'Gene'): 58094, ('Gene', 'bioarx::VirGenHumGen:Gene:Gene', 'Gene'): 535, ('Compound', 'bioarx::DrugVirGen:Compound:Gene', 'Gene'): 1165, ('Compound', 'bioarx::DrugHumGen:Compound:Gene', 'Gene'): 24501, ('Disease', 'bioarx::Covid2_acc_host_gene::Disease:Gene', 'Gene'): 332, ('Disease', 'bioarx::Coronavirus_ass_host_gene::Disease:Gene', 'Gene'): 129, ('Gene', 'DGIDB::INHIBITOR::Gene:Compound', 'Compound'): 5971, ('Gene', 'DGIDB::ANTAGONIST::Gene:Compound', 'Compound'): 3006, ('Gene', 'DGIDB::OTHER::Gene:Compound', 'Compound'): 11070, ('Gene', 'DGIDB::AGONIST::Gene:Compound', 'Compound'): 3012, ('Gene', 'DGIDB::BINDER::Gene:Compound', 'Compound'): 143, ('Gene', 'DGIDB::MODULATOR::Gene:Compound', 'Compound'): 243, ('Gene', 'DGIDB::BLOCKER::Gene:Compound', 'Compound'): 979, ('Gene', 'DGIDB::CHANNEL BLOCKER::Gene:Compound', 
'Compound'): 352, ('Gene', 'DGIDB::ANTIBODY::Gene:Compound', 'Compound'): 188, ('Gene', 'DGIDB::POSITIVE ALLOSTERIC MODULATOR::Gene:Compound', 'Compound'): 618, ('Gene', 'DGIDB::ALLOSTERIC MODULATOR::Gene:Compound', 'Compound'): 317, ('Gene', 'DGIDB::ACTIVATOR::Gene:Compound', 'Compound'): 316, ('Gene', 'DGIDB::PARTIAL AGONIST::Gene:Compound', 'Compound'): 75, ('Compound', 'DRUGBANK::x-atc::Compound:Atc', 'Atc'): 15750, ('Compound', 'DRUGBANK::ddi-interactor-in::Compound:Compound', 'Compound'): 1379271, ('Compound', 'DRUGBANK::target::Compound:Gene', 'Gene'): 19158, ('Compound', 'DRUGBANK::enzyme::Compound:Gene', 'Gene'): 4923, ('Compound', 'DRUGBANK::carrier::Compound:Gene', 'Gene'): 720, ('Compound', 'DRUGBANK::treats::Compound:Disease', 'Disease'): 4968, ('Compound', 'GNBR::E::Compound:Gene', 'Gene'): 32743, ('Compound', 'GNBR::A+::Compound:Gene', 'Gene'): 1568, ('Compound', 'GNBR::N::Compound:Gene', 'Gene'): 12521, ('Compound', 'GNBR::K::Compound:Gene', 'Gene'): 12411, ('Compound', 'GNBR::A-::Compound:Gene', 'Gene'): 1108, ('Compound', 'GNBR::E+::Compound:Gene', 'Gene'): 1970, ('Compound', 'GNBR::B::Compound:Gene', 'Gene'): 7170, ('Compound', 'GNBR::E-::Compound:Gene', 'Gene'): 2918, ('Compound', 'GNBR::O::Compound:Gene', 'Gene'): 5573, ('Compound', 'GNBR::Z::Compound:Gene', 'Gene'): 2821, ('Compound', 'GNBR::T::Compound:Disease', 'Disease'): 54020, ('Compound', 'GNBR::C::Compound:Disease', 'Disease'): 1739, ('Compound', 'GNBR::Sa::Compound:Disease', 'Disease'): 16923, ('Compound', 'GNBR::Pa::Compound:Disease', 'Disease'): 2619, ('Compound', 'GNBR::Mp::Compound:Disease', 'Disease'): 495, ('Compound', 'GNBR::Pr::Compound:Disease', 'Disease'): 966, ('Compound', 'GNBR::J::Compound:Disease', 'Disease'): 1020, ('Gene', 'GNBR::L::Gene:Disease', 'Disease'): 48384, ('Gene', 'GNBR::U::Gene:Disease', 'Disease'): 6432, ('Gene', 'GNBR::Y::Gene:Disease', 'Disease'): 1948, ('Gene', 'GNBR::J::Gene:Disease', 'Disease'): 30234, ('Gene', 'GNBR::Te::Gene:Disease', 'Disease'): 
2836, ('Gene', 'GNBR::Md::Gene:Disease', 'Disease'): 1279, ('Gene', 'GNBR::G::Gene:Disease', 'Disease'): 2055, ('Gene', 'GNBR::D::Gene:Disease', 'Disease'): 500, ('Gene', 'GNBR::X::Gene:Disease', 'Disease'): 1324, ('Gene', 'GNBR::Ud::Gene:Disease', 'Disease'): 407, ('Gene', 'GNBR::V+::Gene:Gene', 'Gene'): 8689, ('Gene', 'GNBR::Q::Gene:Gene', 'Gene'): 19372, ('Gene', 'GNBR::Rg::Gene:Gene', 'Gene'): 11018, ('Gene', 'GNBR::B::Gene:Gene', 'Gene'): 8164, ('Gene', 'GNBR::I::Gene:Gene', 'Gene'): 5434, ('Gene', 'GNBR::E+::Gene:Gene', 'Gene'): 10838, ('Gene', 'GNBR::H::Gene:Gene', 'Gene'): 2509, ('Gene', 'GNBR::W::Gene:Gene', 'Gene'): 280, ('Gene', 'GNBR::E::Gene:Gene', 'Gene'): 418, ('Gene', 'GNBR::in_tax::Gene:Tax', 'Tax'): 14663, ('Gene', 'Hetionet::GpBP::Gene:Biological Process', 'Biological Process'): 559504, ('Gene', 'Hetionet::GiG::Gene:Gene', 'Gene'): 147164, ('Compound', 'Hetionet::CrC::Compound:Compound', 'Compound'): 6486, ('Disease', 'Hetionet::DdG::Disease:Gene', 'Gene'): 7623, ('Disease', 'Hetionet::DpS::Disease:Symptom', 'Symptom'): 3357, ('Disease', 'Hetionet::DlA::Disease:Anatomy', 'Anatomy'): 3602, ('Compound', 'Hetionet::CtD::Compound:Disease', 'Disease'): 755, ('Compound', 'Hetionet::CbG::Compound:Gene', 'Gene'): 11571, ('Compound', 'Hetionet::CuG::Compound:Gene', 'Gene'): 18756, ('Disease', 'Hetionet::DrD::Disease:Disease', 'Disease'): 543, ('Disease', 'Hetionet::DaG::Disease:Gene', 'Gene'): 12623, ('Compound', 'Hetionet::CpD::Compound:Disease', 'Disease'): 390, ('Anatomy', 'Hetionet::AdG::Anatomy:Gene', 'Gene'): 102240, ('Anatomy', 'Hetionet::AuG::Anatomy:Gene', 'Gene'): 97848, ('Gene', 'Hetionet::GcG::Gene:Gene', 'Gene'): 61690, ('Gene', 'Hetionet::GpMF::Gene:Molecular Function', 'Molecular Function'): 97222, ('Pharmacologic Class', 'Hetionet::PCiC::Pharmacologic Class:Compound', 'Compound'): 1029, ('Gene', 'Hetionet::GpCC::Gene:Cellular Component', 'Cellular Component'): 73566, ('Gene', 'Hetionet::Gr>G::Gene:Gene', 'Gene'): 265672, ('Compound', 
'Hetionet::CdG::Compound:Gene', 'Gene'): 21102, ('Disease', 'Hetionet::DuG::Disease:Gene', 'Gene'): 7731, ('Gene', 'Hetionet::GpPW::Gene:Pathway', 'Pathway'): 84372, ('Compound', 'Hetionet::CcSE::Compound:Side Effect', 'Side Effect'): 138944, ('Anatomy', 'Hetionet::AeG::Anatomy:Gene', 'Gene'): 526407, ('Compound', 'INTACT::PHYSICAL ASSOCIATION::Compound:Gene', 'Gene'): 203, ('Compound', 'INTACT::DIRECT INTERACTION::Compound:Gene', 'Gene'): 155, ('Compound', 'INTACT::ASSOCIATION::Compound:Gene', 'Gene'): 1447, ('Gene', 'INTACT::ASSOCIATION::Gene:Gene', 'Gene'): 112390, ('Gene', 'INTACT::PHYSICAL ASSOCIATION::Gene:Gene', 'Gene'): 129318, ('Gene', 'INTACT::COLOCALIZATION::Gene:Gene', 'Gene'): 3468, ('Gene', 'INTACT::DEPHOSPHORYLATION REACTION::Gene:Gene', 'Gene'): 303, ('Gene', 'INTACT::CLEAVAGE REACTION::Gene:Gene', 'Gene'): 93, ('Gene', 'INTACT::DIRECT INTERACTION::Gene:Gene', 'Gene'): 6950, ('Gene', 'INTACT::PHOSPHORYLATION REACTION::Gene:Gene', 'Gene'): 1328, ('Gene', 'INTACT::ADP RIBOSYLATION REACTION::Gene:Gene', 'Gene'): 58, ('Gene', 'INTACT::UBIQUITINATION REACTION::Gene:Gene', 'Gene'): 371, ('Gene', 'INTACT::PROTEIN CLEAVAGE::Gene:Gene', 'Gene'): 67, ('Gene', 'STRING::REACTION::Gene:Gene', 'Gene'): 400426, ('Gene', 'STRING::CATALYSIS::Gene:Gene', 'Gene'): 343533, ('Gene', 'STRING::ACTIVATION::Gene:Gene', 'Gene'): 81355, ('Gene', 'STRING::INHIBITION::Gene:Gene', 'Gene'): 28959, ('Gene', 'STRING::OTHER::Gene:Gene', 'Gene'): 310690, ('Gene', 'STRING::BINDING::Gene:Gene', 'Gene'): 315875, ('Gene', 'STRING::PTMOD::Gene:Gene', 'Gene'): 15113, ('Gene', 'STRING::EXPRESSION::Gene:Gene', 'Gene'): 757},\n", 320 | " metagraph=[('Anatomy', 'Gene'), ('Anatomy', 'Gene'), ('Anatomy', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), 
('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Tax'), ('Gene', 'Biological Process'), ('Gene', 'Molecular Function'), ('Gene', 'Cellular Component'), ('Gene', 'Pathway'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Atc'), ('Compound', 'Compound'), ('Compound', 'Compound'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Side Effect'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Symptom'), ('Disease', 'Anatomy'), ('Disease', 'Disease'), ('Pharmacologic Class', 'Compound')])\n" 321 | ] 322 | } 323 | ], 324 | "source": [ 325 | 
"print(graph)" 326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | "execution_count": null, 331 | "metadata": {}, 332 | "outputs": [], 333 | "source": [] 334 | } 335 | ], 336 | "metadata": { 337 | "kernelspec": { 338 | "display_name": "Python 3", 339 | "language": "python", 340 | "name": "python3" 341 | }, 342 | "language_info": { 343 | "codemirror_mode": { 344 | "name": "ipython", 345 | "version": 3 346 | }, 347 | "file_extension": ".py", 348 | "mimetype": "text/x-python", 349 | "name": "python", 350 | "nbconvert_exporter": "python", 351 | "pygments_lexer": "ipython3", 352 | "version": "3.7.6" 353 | } 354 | }, 355 | "nbformat": 4, 356 | "nbformat_minor": 2 357 | } 358 | -------------------------------------------------------------------------------- /drug_repurpose/COVID-19_drug_repurposing.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# COVID-19 Drug Repurposing via disease-compounds relations\n", 8 | "This example shows how to do drug repurposing using DRKG even with the pretrained model." 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "metadata": {}, 14 | "source": [ 15 | "## Collecting COVID-19 related disease\n", 16 | "At the very beginning we need to collect a list of disease of Corona-Virus(COV) in DRKG. We can easily use the Disease ID that DRKG uses for encoding the disease. Here we take all of the COV disease as target." 
17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "COV_disease_list = [\n", 26 | "'Disease::SARS-CoV2 E',\n", 27 | "'Disease::SARS-CoV2 M',\n", 28 | "'Disease::SARS-CoV2 N',\n", 29 | "'Disease::SARS-CoV2 Spike',\n", 30 | "'Disease::SARS-CoV2 nsp1',\n", 31 | "'Disease::SARS-CoV2 nsp10',\n", 32 | "'Disease::SARS-CoV2 nsp11',\n", 33 | "'Disease::SARS-CoV2 nsp12',\n", 34 | "'Disease::SARS-CoV2 nsp13',\n", 35 | "'Disease::SARS-CoV2 nsp14',\n", 36 | "'Disease::SARS-CoV2 nsp15',\n", 37 | "'Disease::SARS-CoV2 nsp2',\n", 38 | "'Disease::SARS-CoV2 nsp4',\n", 39 | "'Disease::SARS-CoV2 nsp5',\n", 40 | "'Disease::SARS-CoV2 nsp5_C145A',\n", 41 | "'Disease::SARS-CoV2 nsp6',\n", 42 | "'Disease::SARS-CoV2 nsp7',\n", 43 | "'Disease::SARS-CoV2 nsp8',\n", 44 | "'Disease::SARS-CoV2 nsp9',\n", 45 | "'Disease::SARS-CoV2 orf10',\n", 46 | "'Disease::SARS-CoV2 orf3a',\n", 47 | "'Disease::SARS-CoV2 orf3b',\n", 48 | "'Disease::SARS-CoV2 orf6',\n", 49 | "'Disease::SARS-CoV2 orf7a',\n", 50 | "'Disease::SARS-CoV2 orf8',\n", 51 | "'Disease::SARS-CoV2 orf9b',\n", 52 | "'Disease::SARS-CoV2 orf9c',\n", 53 | "'Disease::MESH:D045169',\n", 54 | "'Disease::MESH:D045473',\n", 55 | "'Disease::MESH:D001351',\n", 56 | "'Disease::MESH:D065207',\n", 57 | "'Disease::MESH:D028941',\n", 58 | "'Disease::MESH:D058957',\n", 59 | "'Disease::MESH:D006517'\n", 60 | "]" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "## Candidate drugs\n", 68 | "Now we use FDA-approved drugs in Drugbank as candidate drugs. 
(We exclude drugs with molecular weight < 250 Da.) The drug list is in infer\\_drug.tsv" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": 2, 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "import csv\n", 78 | "\n", 79 | "# Load the candidate drug list\n", 80 | "drug_list = []\n", 81 | "with open(\"./infer_drug.tsv\", newline='', encoding='utf-8') as csvfile:\n", 82 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['drug','ids'])\n", 83 | "    for row_val in reader:\n", 84 | "        drug_list.append(row_val['drug'])" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 3, 90 | "metadata": {}, 91 | "outputs": [ 92 | { 93 | "data": { 94 | "text/plain": [ 95 | "8104" 96 | ] 97 | }, 98 | "execution_count": 3, 99 | "metadata": {}, 100 | "output_type": "execute_result" 101 | } 102 | ], 103 | "source": [ 104 | "len(drug_list)" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "## Treatment relation" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "We use two treatment relations in this context:" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 4, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [ 127 | "treatment = ['Hetionet::CtD::Compound:Disease','GNBR::T::Compound:Disease']" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "## Get pretrained model\n", 135 | "We can directly use the pretrained model to do drug repurposing." 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 5, 141 | "metadata": {}, 142 | "outputs": [ 143 | { 144 | "name": "stdout", 145 | "output_type": "stream", 146 | "text": [ 147 | "Download finished. 
Unzipping the file...\n" 148 | ] 149 | } 150 | ], 151 | "source": [ 152 | "import pandas as pd\n", 153 | "import numpy as np\n", 154 | "import sys\n", 155 | "sys.path.insert(1, '../utils')\n", 156 | "from utils import download_and_extract\n", 157 | "download_and_extract()" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": 6, 163 | "metadata": {}, 164 | "outputs": [], 165 | "source": [ 166 | "entity_idmap_file = '../data/drkg/embed/entities.tsv'\n", 167 | "relation_idmap_file = '../data/drkg/embed/relations.tsv'" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "## Get embeddings for diseases and drugs" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": 7, 180 | "metadata": {}, 181 | "outputs": [], 182 | "source": [ 183 | "# Get drug name/disease name to entity ID mappings\n", 184 | "entity_map = {}\n", 185 | "entity_id_map = {}\n", 186 | "relation_map = {}\n", 187 | "with open(entity_idmap_file, newline='', encoding='utf-8') as csvfile:\n", 188 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['name','id'])\n", 189 | "    for row_val in reader:\n", 190 | "        entity_map[row_val['name']] = int(row_val['id'])\n", 191 | "        entity_id_map[int(row_val['id'])] = row_val['name']\n", 192 | " \n", 193 | "with open(relation_idmap_file, newline='', encoding='utf-8') as csvfile:\n", 194 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['name','id'])\n", 195 | "    for row_val in reader:\n", 196 | "        relation_map[row_val['name']] = int(row_val['id'])\n", 197 | " \n", 198 | "# handle the ID mapping\n", 199 | "drug_ids = []\n", 200 | "disease_ids = []\n", 201 | "for drug in drug_list:\n", 202 | "    drug_ids.append(entity_map[drug])\n", 203 | " \n", 204 | "for disease in COV_disease_list:\n", 205 | "    disease_ids.append(entity_map[disease])\n", 206 | "\n", 207 | "treatment_rid = [relation_map[treat] for treat in treatment]" 208 | ] 209 | }, 210 | { 211 | 
"cell_type": "code", 212 | "execution_count": 8, 213 | "metadata": {}, 214 | "outputs": [], 215 | "source": [ 216 | "# Load embeddings\n", 217 | "import torch as th\n", 218 | "entity_emb = np.load('../data/drkg/embed/DRKG_TransE_l2_entity.npy')\n", 219 | "rel_emb = np.load('../data/drkg/embed/DRKG_TransE_l2_relation.npy')\n", 220 | "\n", 221 | "drug_ids = th.tensor(drug_ids).long()\n", 222 | "disease_ids = th.tensor(disease_ids).long()\n", 223 | "treatment_rid = th.tensor(treatment_rid)\n", 224 | "\n", 225 | "drug_emb = th.tensor(entity_emb[drug_ids])\n", 226 | "treatment_embs = [th.tensor(rel_emb[rid]) for rid in treatment_rid]" 227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": {}, 232 | "source": [ 233 | "## Drug Repurposing Based on Edge Score\n", 234 | "We use the following formula to calculate the edge score. Note that we apply logsigmoid so that all scores are < 0. The larger the score, the more likely it is that $h$ has relation $r$ with $t$.\n", 235 | "\n", 236 | "$\\mathbf{d} = \\gamma - ||\\mathbf{h}+\\mathbf{r}-\\mathbf{t}||_{2}$\n", 237 | "\n", 238 | "$\\mathbf{score} = \\log\\left(\\frac{1}{1+\\exp(\\mathbf{-d})}\\right)$\n", 239 | "\n", 240 | "When doing drug repurposing, we only use the treatment-related relations." 
241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": 9, 246 | "metadata": {}, 247 | "outputs": [], 248 | "source": [ 249 | "import torch.nn.functional as fn\n", 250 | "\n", 251 | "gamma=12.0\n", 252 | "def transE_l2(head, rel, tail):\n", 253 | "    score = head + rel - tail\n", 254 | "    return gamma - th.norm(score, p=2, dim=-1)\n", 255 | "\n", 256 | "scores_per_disease = []\n", 257 | "dids = []\n", 258 | "for rid in range(len(treatment_embs)):\n", 259 | "    treatment_emb=treatment_embs[rid]\n", 260 | "    for disease_id in disease_ids:\n", 261 | "        disease_emb = th.tensor(entity_emb[disease_id])\n", 262 | "        score = fn.logsigmoid(transE_l2(drug_emb, treatment_emb, disease_emb))\n", 263 | "        scores_per_disease.append(score)\n", 264 | "        dids.append(drug_ids)\n", 265 | "scores = th.cat(scores_per_disease)\n", 266 | "dids = th.cat(dids)\n" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 10, 272 | "metadata": {}, 273 | "outputs": [], 274 | "source": [ 275 | "# sort scores in descending order\n", 276 | "idx = th.flip(th.argsort(scores), dims=[0])\n", 277 | "scores = scores[idx].numpy()\n", 278 | "dids = dids[idx].numpy()" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": {}, 284 | "source": [ 285 | "### Now we output proposed treatments" 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": 11, 291 | "metadata": {}, 292 | "outputs": [], 293 | "source": [ 294 | "_, unique_indices = np.unique(dids, return_index=True)\n", 295 | "topk=100\n", 296 | "topk_indices = np.sort(unique_indices)[:topk]\n", 297 | "proposed_dids = dids[topk_indices]\n", 298 | "proposed_scores = scores[topk_indices]" 299 | ] 300 | }, 301 | { 302 | "cell_type": "markdown", 303 | "metadata": {}, 304 | "source": [ 305 | "Now we list the proposed drugs in the form (drug, score). \n", 306 | "\n", 307 | "We select the top K relevant drugs according to the edge score." 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | 
"execution_count": 12, 313 | "metadata": {}, 314 | "outputs": [ 315 | { 316 | "name": "stdout", 317 | "output_type": "stream", 318 | "text": [ 319 | "Compound::DB00811\t-0.21416784822940826\n", 320 | "Compound::DB00993\t-0.8350892663002014\n", 321 | "Compound::DB00635\t-0.8974801898002625\n", 322 | "Compound::DB01082\t-0.9854875802993774\n", 323 | "Compound::DB01234\t-0.9984006881713867\n", 324 | "Compound::DB00982\t-1.0160722732543945\n", 325 | "Compound::DB00563\t-1.0189464092254639\n", 326 | "Compound::DB00290\t-1.064104437828064\n", 327 | "Compound::DB01394\t-1.080674648284912\n", 328 | "Compound::DB01222\t-1.084547519683838\n", 329 | "Compound::DB00415\t-1.0853980779647827\n", 330 | "Compound::DB01004\t-1.096668004989624\n", 331 | "Compound::DB00860\t-1.1004775762557983\n", 332 | "Compound::DB00681\t-1.1011559963226318\n", 333 | "Compound::DB00688\t-1.125687599182129\n", 334 | "Compound::DB00624\t-1.1428285837173462\n", 335 | "Compound::DB00959\t-1.1618402004241943\n", 336 | "Compound::DB00115\t-1.1868144273757935\n", 337 | "Compound::DB00091\t-1.1906721591949463\n", 338 | "Compound::DB01024\t-1.2051165103912354\n", 339 | "Compound::DB00741\t-1.2147064208984375\n", 340 | "Compound::DB00441\t-1.2320444583892822\n", 341 | "Compound::DB00158\t-1.2346539497375488\n", 342 | "Compound::DB00499\t-1.2525147199630737\n", 343 | "Compound::DB00929\t-1.2730510234832764\n", 344 | "Compound::DB00770\t-1.2825534343719482\n", 345 | "Compound::DB01331\t-1.2960500717163086\n", 346 | "Compound::DB00958\t-1.2967796325683594\n", 347 | "Compound::DB02527\t-1.303438663482666\n", 348 | "Compound::DB00196\t-1.3053392171859741\n", 349 | "Compound::DB00537\t-1.3131829500198364\n", 350 | "Compound::DB00644\t-1.3131871223449707\n", 351 | "Compound::DB01048\t-1.3267226219177246\n", 352 | "Compound::DB00552\t-1.3272088766098022\n", 353 | "Compound::DB00328\t-1.3286101818084717\n", 354 | "Compound::DB00171\t-1.3300385475158691\n", 355 | "Compound::DB01212\t-1.3330755233764648\n", 356 | 
"Compound::DB09093\t-1.3382999897003174\n", 357 | "Compound::DB00783\t-1.338560938835144\n", 358 | "Compound::DB09341\t-1.3396968841552734\n", 359 | "Compound::DB00558\t-1.3425884246826172\n", 360 | "Compound::DB05382\t-1.3575129508972168\n", 361 | "Compound::DB01112\t-1.3584508895874023\n", 362 | "Compound::DB00515\t-1.3608112335205078\n", 363 | "Compound::DB01101\t-1.381548523902893\n", 364 | "Compound::DB01165\t-1.3838160037994385\n", 365 | "Compound::DB01183\t-1.3862146139144897\n", 366 | "Compound::DB00815\t-1.3863483667373657\n", 367 | "Compound::DB00755\t-1.3881785869598389\n", 368 | "Compound::DB00198\t-1.3885014057159424\n", 369 | "Compound::DB00480\t-1.3935325145721436\n", 370 | "Compound::DB00806\t-1.3996552228927612\n", 371 | "Compound::DB01656\t-1.3999741077423096\n", 372 | "Compound::DB00759\t-1.404650092124939\n", 373 | "Compound::DB00917\t-1.4116020202636719\n", 374 | "Compound::DB01181\t-1.4148889780044556\n", 375 | "Compound::DB01039\t-1.4176580905914307\n", 376 | "Compound::DB00512\t-1.4207379817962646\n", 377 | "Compound::DB01233\t-1.4211887121200562\n", 378 | "Compound::DB11996\t-1.425789475440979\n", 379 | "Compound::DB00738\t-1.4274098873138428\n", 380 | "Compound::DB00716\t-1.4327492713928223\n", 381 | "Compound::DB03461\t-1.437927484512329\n", 382 | "Compound::DB00591\t-1.4404338598251343\n", 383 | "Compound::DB01327\t-1.4408743381500244\n", 384 | "Compound::DB00131\t-1.4446886777877808\n", 385 | "Compound::DB00693\t-1.4460749626159668\n", 386 | "Compound::DB00369\t-1.4505752325057983\n", 387 | "Compound::DB04630\t-1.453115463256836\n", 388 | "Compound::DB00878\t-1.456466555595398\n", 389 | "Compound::DB08818\t-1.4633680582046509\n", 390 | "Compound::DB00682\t-1.4691765308380127\n", 391 | "Compound::DB01068\t-1.4700121879577637\n", 392 | "Compound::DB00446\t-1.4720206260681152\n", 393 | "Compound::DB01115\t-1.4729849100112915\n", 394 | "Compound::DB00355\t-1.4770021438598633\n", 395 | "Compound::DB01030\t-1.485068678855896\n", 396 | 
"Compound::DB00620\t-1.4973516464233398\n", 397 | "Compound::DB00396\t-1.4976921081542969\n", 398 | "Compound::DB01073\t-1.4987037181854248\n", 399 | "Compound::DB00640\t-1.5026229619979858\n", 400 | "Compound::DB00999\t-1.5034282207489014\n", 401 | "Compound::DB01060\t-1.504364252090454\n", 402 | "Compound::DB00493\t-1.5072362422943115\n", 403 | "Compound::DB01240\t-1.5090957880020142\n", 404 | "Compound::DB00364\t-1.509944200515747\n", 405 | "Compound::DB01263\t-1.511993169784546\n", 406 | "Compound::DB00746\t-1.513066053390503\n", 407 | "Compound::DB00718\t-1.5183149576187134\n", 408 | "Compound::DB01065\t-1.5207160711288452\n", 409 | "Compound::DB01205\t-1.521277904510498\n", 410 | "Compound::DB01137\t-1.5229592323303223\n", 411 | "Compound::DB08894\t-1.5239660739898682\n", 412 | "Compound::DB00813\t-1.5308701992034912\n", 413 | "Compound::DB01157\t-1.5316557884216309\n", 414 | "Compound::DB04570\t-1.5430843830108643\n", 415 | "Compound::DB00459\t-1.5503207445144653\n", 416 | "Compound::DB01752\t-1.5541703701019287\n", 417 | "Compound::DB00775\t-1.5559712648391724\n", 418 | "Compound::DB01610\t-1.5563474893569946\n" 419 | ] 420 | } 421 | ], 422 | "source": [ 423 | "for i in range(topk):\n", 424 | "    drug = int(proposed_dids[i])\n", 425 | "    score = proposed_scores[i]\n", 426 | " \n", 427 | "    print(\"{}\\t{}\".format(entity_id_map[drug], score))" 428 | ] 429 | }, 430 | { 431 | "cell_type": "markdown", 432 | "metadata": {}, 433 | "source": [ 434 | "### Check Clinical Trial Drugs\n", 435 | "Six clinical trial drugs appear in the top 100. 
(Note: Ribavirin exists in DRKG as a treatment for SARS.)" 436 | ] 437 | }, 438 | { 439 | "cell_type": "code", 440 | "execution_count": 13, 441 | "metadata": {}, 442 | "outputs": [ 443 | { 444 | "name": "stdout", 445 | "output_type": "stream", 446 | "text": [ 447 | "[0]\tRibavirin\t-0.21416784822940826\n", 448 | "[4]\tDexamethasone\t-0.9984006881713867\n", 449 | "[8]\tColchicine\t-1.080674648284912\n", 450 | "[16]\tMethylprednisolone\t-1.1618402004241943\n", 451 | "[49]\tOseltamivir\t-1.3885014057159424\n", 452 | "[87]\tDeferoxamine\t-1.513066053390503\n" 453 | ] 454 | } 455 | ], 456 | "source": [ 457 | "clinical_drugs_file = './COVID19_clinical_trial_drugs.tsv'\n", 458 | "clinical_drug_map = {}\n", 459 | "with open(clinical_drugs_file, newline='', encoding='utf-8') as csvfile:\n", 460 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['id', 'drug_name','drug_id'])\n", 461 | "    for row_val in reader:\n", 462 | "        clinical_drug_map[row_val['drug_id']] = row_val['drug_name']\n", 463 | " \n", 464 | "for i in range(topk):\n", 465 | "    drug = entity_id_map[int(proposed_dids[i])][10:17]\n", 466 | "    if clinical_drug_map.get(drug, None) is not None:\n", 467 | "        score = proposed_scores[i]\n", 468 | "        print(\"[{}]\\t{}\\t{}\".format(i, clinical_drug_map[drug], score))" 469 | ] 470 | }, 471 | { 472 | "cell_type": "code", 473 | "execution_count": 14, 474 | "metadata": {}, 475 | "outputs": [ 476 | { 477 | "data": { 478 | "text/plain": [ 479 | "32" 480 | ] 481 | }, 482 | "execution_count": 14, 483 | "metadata": {}, 484 | "output_type": "execute_result" 485 | } 486 | ], 487 | "source": [ 488 | "len(clinical_drug_map)" 489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": null, 494 | "metadata": {}, 495 | "outputs": [], 496 | "source": [] 497 | } 498 | ], 499 | "metadata": { 500 | "kernelspec": { 501 | "display_name": "Python 3", 502 | "language": "python", 503 | "name": "python3" 504 | }, 505 | "language_info": { 506 | 
"codemirror_mode": { 507 | "name": "ipython", 508 | "version": 3 509 | }, 510 | "file_extension": ".py", 511 | "mimetype": "text/x-python", 512 | "name": "python", 513 | "nbconvert_exporter": "python", 514 | "pygments_lexer": "ipython3", 515 | "version": "3.7.6" 516 | } 517 | }, 518 | "nbformat": 4, 519 | "nbformat_minor": 4 520 | } 521 | -------------------------------------------------------------------------------- /drug_repurpose/COVID-19_drug_repurposing_via_genes.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# COVID-19 Drug Repurposing via gene-compound relations\n", 8 | "This example shows how to do drug repurposing with DRKG using just the pretrained model." 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "metadata": {}, 14 | "source": [ 15 | "## Collecting COVID-19 related genes\n", 16 | "First, we collect the list of host genes associated with coronaviruses (COV) in DRKG. 
" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "outputs": [ 24 | { 25 | "name": "stdout", 26 | "output_type": "stream", 27 | "text": [ 28 | "442\n" 29 | ] 30 | } 31 | ], 32 | "source": [ 33 | "import pandas as pd\n", 34 | "import numpy as np\n", 35 | "file='coronavirus-related-host-genes.tsv'\n", 36 | "df = pd.read_csv(file, sep=\"\\t\")\n", 37 | "cov_genes = np.unique(df.values[:,2]).tolist()\n", 38 | "file='covid19-host-genes.tsv'\n", 39 | "df = pd.read_csv(file, sep=\"\\t\")\n", 40 | "cov2_genes = np.unique(df.values[:,2]).tolist()\n", 41 | "# keep unique related genes\n", 42 | "\n", 43 | "cov_related_genes=list(set(cov_genes+cov2_genes))\n", 44 | "#cov_related_genes=list(set(cov2_genes))\n", 45 | "print(len(cov_related_genes))" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "## Candidate drugs\n", 53 | "Now we use FDA-approved drugs in Drugbank as candidate drugs. (We exclude drugs with molecular weight < 250 Da.) The drug list is in infer\\_drug.tsv" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 2, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "import csv\n", 63 | "\n", 64 | "# Load the candidate drug list\n", 65 | "drug_list = []\n", 66 | "with open(\"./infer_drug.tsv\", newline='', encoding='utf-8') as csvfile:\n", 67 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['drug','ids'])\n", 68 | "    for row_val in reader:\n", 69 | "        drug_list.append(row_val['drug'])" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 3, 75 | "metadata": {}, 76 | "outputs": [ 77 | { 78 | "data": { 79 | "text/plain": [ 80 | "8104" 81 | ] 82 | }, 83 | "execution_count": 3, 84 | "metadata": {}, 85 | "output_type": "execute_result" 86 | } 87 | ], 88 | "source": [ 89 | "len(drug_list)" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "## Inhibition relation" 97 | ] 98 | }, 99 | { 
100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "We use one inhibition relation in this context:" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 4, 109 | "metadata": {}, 110 | "outputs": [], 111 | "source": [ 112 | "treatment = ['GNBR::N::Compound:Gene']  # alternatives: 'DRUGBANK::target::Compound:Gene', 'DGIDB::INHIBITOR::Gene:Compound'" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "## Get pretrained model\n", 120 | "We can directly use the pretrained model to do drug repurposing." 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 5, 126 | "metadata": {}, 127 | "outputs": [], 128 | "source": [ 129 | "import pandas as pd\n", 130 | "import numpy as np\n", 131 | "import sys\n", 132 | "import csv\n", 133 | "sys.path.insert(1, '../utils')\n", 134 | "from utils import download_and_extract\n", 135 | "download_and_extract()" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 6, 141 | "metadata": {}, 142 | "outputs": [], 143 | "source": [ 144 | "entity_idmap_file = '../data/drkg/embed/entities.tsv'\n", 145 | "relation_idmap_file = '../data/drkg/embed/relations.tsv'" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "## Get embeddings for genes and drugs" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": 7, 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "# Get drug name/gene name to entity ID mappings\n", 162 | "entity_map = {}\n", 163 | "entity_id_map = {}\n", 164 | "relation_map = {}\n", 165 | "with open(entity_idmap_file, newline='', encoding='utf-8') as csvfile:\n", 166 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['name','id'])\n", 167 | "    for row_val in reader:\n", 168 | "        entity_map[row_val['name']] = int(row_val['id'])\n", 169 | "        entity_id_map[int(row_val['id'])] = row_val['name']\n", 170 | " \n", 171 | 
"with open(relation_idmap_file, newline='', encoding='utf-8') as csvfile:\n", 172 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['name','id'])\n", 173 | "    for row_val in reader:\n", 174 | "        relation_map[row_val['name']] = int(row_val['id'])\n", 175 | " \n", 176 | "# handle the ID mapping\n", 177 | "drug_ids = []\n", 178 | "gene_ids = []\n", 179 | "for drug in drug_list:\n", 180 | "    drug_ids.append(entity_map[drug])\n", 181 | " \n", 182 | "for gene in cov_related_genes:\n", 183 | "    gene_ids.append(entity_map[gene])\n", 184 | "\n", 185 | "treatment_rid = [relation_map[treat] for treat in treatment]" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": 8, 191 | "metadata": {}, 192 | "outputs": [], 193 | "source": [ 194 | "# Load embeddings\n", 195 | "import torch as th\n", 196 | "entity_emb = np.load('../data/drkg/embed/DRKG_TransE_l2_entity.npy')\n", 197 | "rel_emb = np.load('../data/drkg/embed/DRKG_TransE_l2_relation.npy')\n", 198 | "\n", 199 | "drug_ids = th.tensor(drug_ids).long()\n", 200 | "gene_ids = th.tensor(gene_ids).long()\n", 201 | "treatment_rid = th.tensor(treatment_rid)\n", 202 | "\n", 203 | "drug_emb = th.tensor(entity_emb[drug_ids])\n", 204 | "treatment_embs = [th.tensor(rel_emb[rid]) for rid in treatment_rid]" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "## Drug Repurposing Based on Edge Score\n", 212 | "We use the following formula to calculate the edge score. Note that we apply logsigmoid so that all scores are < 0. The larger the score, the more likely it is that $h$ has relation $r$ with $t$.\n", 213 | "\n", 214 | "$\\mathbf{d} = \\gamma - ||\\mathbf{h}+\\mathbf{r}-\\mathbf{t}||_{2}$\n", 215 | "\n", 216 | "$\\mathbf{score} = \\log\\left(\\frac{1}{1+\\exp(\\mathbf{-d})}\\right)$\n", 217 | "\n", 218 | "When doing drug repurposing, we only use the relations selected above." 
219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": 9, 224 | "metadata": {}, 225 | "outputs": [], 226 | "source": [ 227 | "import torch.nn.functional as fn\n", 228 | "\n", 229 | "gamma=12.0\n", 230 | "def transE_l2(head, rel, tail):\n", 231 | "    score = head + rel - tail\n", 232 | "    return gamma - th.norm(score, p=2, dim=-1)\n", 233 | "\n", 234 | "scores_per_gene = []\n", 235 | "dids_per_gene = []\n", 236 | "for rid in range(len(treatment_embs)):\n", 237 | "    treatment_emb=treatment_embs[rid]\n", 238 | "    for gene_id in gene_ids:\n", 239 | "        gene_emb = th.tensor(entity_emb[gene_id])\n", 240 | "        if treatment[rid]=='DGIDB::INHIBITOR::Gene:Compound':\n", 241 | "            score = fn.logsigmoid(transE_l2(gene_emb, treatment_emb,\n", 242 | "                                            drug_emb))\n", 243 | "        else:\n", 244 | "            score = fn.logsigmoid(transE_l2(drug_emb, treatment_emb,\n", 245 | "                                            gene_emb))\n", 246 | "        scores_per_gene.append(score)\n", 247 | "        dids_per_gene.append(drug_ids)\n", 248 | "scores = th.cat(scores_per_gene)\n", 249 | "dids = th.cat(dids_per_gene)\n" 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": {}, 255 | "source": [ 256 | "### Check clinical trial drugs per gene\n", 257 | "Here we load the clinical trial drugs" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": 10, 263 | "metadata": {}, 264 | "outputs": [], 265 | "source": [ 266 | "clinical_drugs_file = './COVID19_clinical_trial_drugs.tsv'\n", 267 | "clinical_drug_map = {}\n", 268 | "with open(clinical_drugs_file, newline='', encoding='utf-8') as csvfile:\n", 269 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['id', 'drug_name','drug_id'])\n", 270 | "    for row_val in reader:\n", 271 | "        clinical_drug_map[row_val['drug_id']] = row_val['drug_name']" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "Next we measure some statistics per gene." 
279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": 13, 284 | "metadata": { 285 | "scrolled": true 286 | }, 287 | "outputs": [ 288 | { 289 | "name": "stdout", 290 | "output_type": "stream", 291 | "text": [ 292 | "Gene::6441\t9\t\n", 293 | "[0]Dexamethasone\n", 294 | "[29]Methylprednisolone\n", 295 | "[30]Ribavirin\n", 296 | "[40]Thalidomide\n", 297 | "[46]Chloroquine\n", 298 | "[77]Losartan\n", 299 | "[86]Sargramostim\n", 300 | "[88]Azithromycin\n", 301 | "[90]Hydroxychloroquine\n", 302 | "\n", 303 | "DB01234\tDexamethasone\t401\t17.424322932617844\n", 304 | "DB01041\tThalidomide\t336\t9.52602832899466\n", 305 | "DB00608\tChloroquine\t258\t5.281556104219857\n", 306 | "DB00746\tDeferoxamine\t111\t2.3803197362314727\n", 307 | "DB01394\tColchicine\t108\t1.9397152439066307\n", 308 | "DB00959\tMethylprednisolone\t105\t1.6800667504790185\n", 309 | "DB00678\tLosartan\t92\t1.9905909204249115\n", 310 | "DB00811\tRibavirin\t92\t2.0302922908647756\n", 311 | "DB08877\tRuxolitinib\t47\t0.7744534092963637\n", 312 | "DB08895\tTofacitinib\t33\t0.46233716095307054\n", 313 | "DB01611\tHydroxychloroquine\t14\t0.20167135495496702\n", 314 | "DB05511\tPiclidenoson\t6\t0.1513038675225646\n", 315 | "DB00207\tAzithromycin\t5\t0.05829492617697397\n", 316 | "DB00198\tOseltamivir\t1\t0.2\n", 317 | "DB00020\tSargramostim\t1\t0.011494252873563218\n" 318 | ] 319 | } 320 | ], 321 | "source": [ 322 | "maxhit=0\n", 323 | "drugs_in_top_k={}\n", 324 | "drugsfr_in_top_k={}\n", 325 | "for i in range(len(scores_per_gene)):\n", 326 | "    score=scores_per_gene[i]\n", 327 | "    did=dids_per_gene[i]\n", 328 | "    idx = th.flip(th.argsort(score), dims=[0])\n", 329 | "    score = score[idx].numpy()\n", 330 | "    did = did[idx].numpy()\n", 331 | "    #print(did)\n", 332 | "    _, unique_indices = np.unique(did, return_index=True)\n", 333 | "    topk=100\n", 334 | "    topk_indices = np.sort(unique_indices)[:topk]\n", 335 | "    proposed_did = did[topk_indices]\n", 336 | "    proposed_score = 
score[topk_indices]\n", 337 | "    found_in_top_k=0\n", 338 | "    found_drugs=\"\\n\"\n", 339 | "    for j in range(topk):\n", 340 | "        drug = entity_id_map[int(proposed_did[j])][10:17]\n", 341 | "        if clinical_drug_map.get(drug, None) is not None:\n", 342 | "            found_in_top_k+=1\n", 343 | "            score = proposed_score[j]\n", 344 | "            if drug in drugs_in_top_k:\n", 345 | "                drugs_in_top_k[drug]+=1\n", 346 | "                drugsfr_in_top_k[drug]+=1/(j+1)\n", 347 | "            else:\n", 348 | "                drugs_in_top_k[drug]=1\n", 349 | "                drugsfr_in_top_k[drug]=1/(j+1)\n", 350 | "            found_drugs+=\"[{}]{}\\n\".format(j, clinical_drug_map[drug])\n", 351 | "            #print(\"[{}]{}\".format(j, clinical_drug_map[drug]))\n", 352 | "    #print(\"{}\\t{}\".format(cov_related_genes[i], found_in_top_k))\n", 353 | "    if maxhit< found_in_top_k:\n", 354 | "        maxhit=found_in_top_k\n", 355 | "        maxgene=cov_related_genes[i]\n", 356 | "        max_drugs=found_drugs\n", 357 | "print(\"{}\\t{}\\t{}\".format(maxgene, maxhit,max_drugs))\n", 358 | "\n", 359 | "res=[[drug, clinical_drug_map[drug] ,drugs_in_top_k[drug],drugsfr_in_top_k[drug]] for drug in drugs_in_top_k.keys()]\n", 360 | "res=reversed(sorted(res, key=lambda x : x[2]))\n", 361 | "for drug in res:\n", 362 | "    print(\"{}\\t{}\\t{}\\t{}\".format(drug[0], drug[1] ,drug[2],drug[3]))\n", 363 | " " 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": null, 369 | "metadata": {}, 370 | "outputs": [], 371 | "source": [] 372 | } 373 | ], 374 | "metadata": { 375 | "kernelspec": { 376 | "display_name": "Python 3", 377 | "language": "python", 378 | "name": "python3" 379 | }, 380 | "language_info": { 381 | "codemirror_mode": { 382 | "name": "ipython", 383 | "version": 3 384 | }, 385 | "file_extension": ".py", 386 | "mimetype": "text/x-python", 387 | "name": "python", 388 | "nbconvert_exporter": "python", 389 | "pygments_lexer": "ipython3", 390 | "version": "3.7.6" 391 | } 392 | }, 393 | "nbformat": 4, 394 | "nbformat_minor": 4 395 | } 396 | 
-------------------------------------------------------------------------------- /drug_repurpose/COVID19_clinical_trial_drugs.tsv: -------------------------------------------------------------------------------- 1 | 1 Deferoxamine DB00746 2 | 2 Piclidenoson DB05511 3 | 3 Losartan DB00678 4 | 4 Ibuprofen DB01050 5 | 5 Favipiravir DB12466 6 | 6 Ruxolitinib DB08877 7 | 7 Dexamethasone DB01234 8 | 8 Thalidomide DB01041 9 | 9 Tranexamic acid DB00302 10 | 10 Tocilizumab DB06273 11 | 11 Sarilumab DB11767 12 | 12 Tradipitant DB12580 13 | 13 Angiotensin 1-7 DB11720 14 | 14 Oseltamivir DB00198 15 | 15 Baricitinib DB11817 16 | 16 Sargramostim DB00020 17 | 17 Chloroquine DB00608 18 | 18 Anakinra DB00026 19 | 19 Mavrilimumab DB12534 20 | 20 Azithromycin DB00207 21 | 21 Tetrandrine DB14066 22 | 22 Ribavirin DB00811 23 | 23 Tofacitinib DB08895 24 | 24 Siltuximab DB09036 25 | 25 Nivolumab DB09035 26 | 26 Nitric Oxide DB00435 27 | 27 Colchicine DB01394 28 | 28 Remdesivir DB14761 29 | 29 Hydroxychloroquine DB01611 30 | 30 Eculizumab DB01257 31 | 31 Methylprednisolone DB00959 32 | 32 Bevacizumab DB00112 -------------------------------------------------------------------------------- /drug_repurpose/Readme.md: -------------------------------------------------------------------------------- 1 | # Drug Repurposing Examples 2 | Here we show how to do drug repurposing using the pretrained knowledge graph embeddings of DRKG. 3 | 4 | ## Drug Repurposing for Covid-19 via Disease 5 | [COVID-19_drug_repurposing.ipynb](COVID-19_drug_repurposing.ipynb) shows how to do drug repurposing for Covid-19 by predicting links between the disease entities and the drug entities in the DRKG. The target disease entities are listed in the notebook and the candidate drug entities are listed in [infer_drug.tsv](infer_drug.tsv). The drugs are all from DrugBank, and we exclude drugs with a molecular weight of less than 250 daltons, which results in 8104 candidates. 
Two edge types are chosen here: `Hetionet::CtD::Compound:Disease` and `GNBR::T::Compound:Disease`, which represent the treatment relationship between a drug and a disease. To evaluate the repurposed drugs, we compare them with the drugs in clinical trials, as there is no treatment for Covid-19 right now. The list of clinical trial drugs is given in [COVID19_clinical_trial_drugs.tsv](COVID19_clinical_trial_drugs.tsv), which was collected from http://www.covid19-trials.com/. 6 | 7 | ## Drug Repurposing for Covid-19 via Disease Related Host Genes 8 | [COVID-19_drug_repurposing_via_genes.ipynb](COVID-19_drug_repurposing_via_genes.ipynb) shows how to do drug repurposing for Covid-19 by predicting links between the disease-related host gene entities and the drug entities in the DRKG. The target host gene entities are listed in [covid19-host-genes.tsv](covid19-host-genes.tsv) and [coronavirus-related-host-genes.tsv](coronavirus-related-host-genes.tsv). The drugs are all from DrugBank, and we exclude drugs with a molecular weight of less than 250 daltons, which results in 8104 candidates. The edge type used here is `GNBR::N::Compound:Gene`. One can also use `DRUGBANK::target::Compound:Gene`, `DGIDB::INHIBITOR::Gene:Compound`, or a combination of them. To evaluate the repurposed drugs, we again compare them with the clinical trial drugs. 
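
The link-prediction and evaluation steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it assumes a TransE-style score `gamma - ||h + r - t||`, and the function names (`transe_l2_score`, `rank_drugs`, `clinical_hits`), the margin `gamma=12.0`, the 2-d embeddings, and the drug ID `DB99999` are all hypothetical. It also assumes the drug is the head of the edge, as in `GNBR::N::Compound:Gene`; for an edge type such as `DGIDB::INHIBITOR::Gene:Compound` the head and tail would be swapped.

```python
import math


def transe_l2_score(head, rel, tail, gamma=12.0):
    """TransE-style plausibility of (head, rel, tail); higher is more plausible.

    gamma is an assumed margin constant, chosen here only for illustration.
    """
    dist = math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))
    return gamma - dist


def rank_drugs(drug_emb, rel_emb, target_emb, gamma=12.0):
    """Return drug IDs ordered from most to least plausible link to the target."""
    scored = [
        (transe_l2_score(vec, rel_emb, target_emb, gamma), drug_id)
        for drug_id, vec in drug_emb.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [drug_id for _, drug_id in scored]


def clinical_hits(ranked_drugs, clinical_drugs, topk=100):
    """Return (rank, drug_id) for each clinical trial drug within the top-k proposals."""
    return [
        (rank, drug_id)
        for rank, drug_id in enumerate(ranked_drugs[:topk])
        if drug_id in clinical_drugs
    ]


# Toy embeddings, invented for this sketch (real embeddings are much larger).
drug_emb = {
    "DB01234": [1.0, 0.0],
    "DB00608": [0.0, 1.0],
    "DB99999": [5.0, 5.0],  # hypothetical ID that is not a clinical trial drug
}
rel_emb = [0.0, 1.0]   # stands in for one edge-type embedding
gene_emb = [1.0, 1.0]  # stands in for one host gene (or disease) embedding
clinical = {"DB01234", "DB00608"}

ranked = rank_drugs(drug_emb, rel_emb, gene_emb)
hits = clinical_hits(ranked, clinical, topk=2)
print(ranked)
print(hits)
```

In practice the same ranking would be repeated for every target entity and edge type, and the per-target hits aggregated (for example by counting how often each clinical trial drug appears in the top 100).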
9 | -------------------------------------------------------------------------------- /drug_repurpose/coronavirus-related-host-genes.tsv: -------------------------------------------------------------------------------- 1 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::2931 2 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::2932 3 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::26986 4 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::8761 5 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3178 6 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3181 7 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4869 8 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10146 9 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::9908 10 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6143 11 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::142 12 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4691 13 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::1653 14 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::23429 15 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5478 16 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::408050 17 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::2280 18 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::9360 19 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4140 20 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10465 21 | Disease::MESH:D045473 
bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::11123 22 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::9146 23 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7917 24 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::23564 25 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::819 26 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::25978 27 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::116841 28 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::23609 29 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7177 30 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::710 31 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7681 32 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5683 33 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::81926 34 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5204 35 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::57409 36 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4705 37 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::79001 38 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::81887 39 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::55506 40 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6224 41 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::11200 42 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7013 43 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10540 44 | Disease::MESH:D045473 
bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::1655 45 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::119710 46 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::8665 47 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::1915 48 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::857 49 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3551 50 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7329 51 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6449 52 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::9550 53 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::689 54 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::22809 55 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4539 56 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4513 57 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3178 58 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::59272 59 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::339390 60 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::30835 61 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10332 62 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3661 63 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3838 64 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6441 65 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5478 66 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::598 67 | Disease::MESH:D045473 
bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::599 68 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4170 69 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::597 70 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::596 71 | Disease::MESH:D065207 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6502 72 | Disease::MESH:D065207 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3840 73 | Disease::MESH:D065207 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::8575 74 | Disease::MESH:D065207 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::928 75 | Disease::MESH:D065207 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7113 76 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3551 77 | Disease::MESH:D028941 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::290 78 | Disease::MESH:D058957 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::59272 79 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::85437 80 | Disease::MESH:D065207 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::1803 81 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3178 82 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10492 83 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5725 84 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::634 85 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::85437 86 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::59272 87 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::302 88 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3181 89 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::220988 90 | Disease::MESH:D006517 
bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::50 91 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3337 92 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3329 93 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3313 94 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::9276 95 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::23521 96 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3646 97 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::8668 98 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4836 99 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::128866 100 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::8665 101 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::8729 102 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6241 103 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3832 104 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5707 105 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6729 106 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::84955 107 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4666 108 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::51429 109 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::689 110 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::23256 111 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5701 112 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7942 113 | Disease::MESH:D006517 
bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::11344 114 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10652 115 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3837 116 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6811 117 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6776 118 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::51187 119 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::91452 120 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5245 121 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3725 122 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6774 123 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5499 124 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::92521 125 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10875 126 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7514 127 | Disease::MESH:D028941 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7040 128 | Disease::MESH:D028941 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10875 129 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4088 130 | -------------------------------------------------------------------------------- /drug_repurpose/covid19-host-genes.tsv: -------------------------------------------------------------------------------- 1 | Disease::SARS-CoV2 E bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8546 2 | Disease::SARS-CoV2 E bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23476 3 | Disease::SARS-CoV2 E bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6046 4 | Disease::SARS-CoV2 E bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10283 5 | 
Disease::SARS-CoV2 E bioarx::Covid2_acc_host_gene::Disease:Gene Gene::124245 6 | Disease::SARS-CoV2 E bioarx::Covid2_acc_host_gene::Disease:Gene Gene::57153 7 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9512 8 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10897 9 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::481 10 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::34 11 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2108 12 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2040 13 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2677 14 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::523 15 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5714 16 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7905 17 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23203 18 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::196527 19 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10531 20 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10463 21 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::60493 22 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::148867 23 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10426 24 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79934 25 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::113174 26 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::92840 27 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::92105 28 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::89874 29 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene 
Gene::10844 30 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::80222 31 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::57142 32 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51439 33 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10157 34 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26993 35 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::25980 36 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::28969 37 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9136 38 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26986 39 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1459 40 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1460 41 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10146 42 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8761 43 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23367 44 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::25940 45 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79753 46 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5976 47 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4343 48 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9908 49 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9188 50 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55131 51 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::25873 52 | Disease::SARS-CoV2 Spike bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51125 53 | Disease::SARS-CoV2 Spike bioarx::Covid2_acc_host_gene::Disease:Gene Gene::25921 54 | Disease::SARS-CoV2 nsp1 
bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5422 55 | Disease::SARS-CoV2 nsp1 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5557 56 | Disease::SARS-CoV2 nsp1 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5558 57 | Disease::SARS-CoV2 nsp1 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23649 58 | Disease::SARS-CoV2 nsp1 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79709 59 | Disease::SARS-CoV2 nsp1 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5318 60 | Disease::SARS-CoV2 nsp10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::161 61 | Disease::SARS-CoV2 nsp10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2671 62 | Disease::SARS-CoV2 nsp10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::57222 63 | Disease::SARS-CoV2 nsp10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1173 64 | Disease::SARS-CoV2 nsp10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::80273 65 | Disease::SARS-CoV2 nsp11 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6902 66 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55206 67 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10295 68 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10270 69 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23077 70 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10569 71 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8737 72 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9898 73 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::219743 74 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51248 75 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::84726 76 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55833 77 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::24149 78 | 
Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::64784 79 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::159195 80 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::29066 81 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23185 82 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55285 83 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6938 84 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::53938 85 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::54477 86 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9755 87 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::11113 88 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::3281 89 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5116 90 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::11116 91 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5576 92 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5566 93 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5577 94 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5962 95 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1063 96 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7088 97 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7090 98 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::166 99 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2802 100 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2801 101 | Disease::SARS-CoV2 nsp13 
bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2804 102 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::56850 103 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9857 104 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9659 105 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9662 106 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23177 107 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::11064 108 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23085 109 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9648 110 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79745 111 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51199 112 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::201134 113 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::145282 114 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8975 115 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79571 116 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::152789 117 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55755 118 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10142 119 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::64689 120 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79443 121 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79078 122 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::11190 123 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::29110 124 | Disease::SARS-CoV2 nsp13 
bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51361 125 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::22981 126 | Disease::SARS-CoV2 nsp14 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2717 127 | Disease::SARS-CoV2 nsp14 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::3615 128 | Disease::SARS-CoV2 nsp14 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23408 129 | Disease::SARS-CoV2 nsp15 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10204 130 | Disease::SARS-CoV2 nsp15 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::382 131 | Disease::SARS-CoV2 nsp15 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10193 132 | Disease::SARS-CoV2 nsp2 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::11001 133 | Disease::SARS-CoV2 nsp2 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9470 134 | Disease::SARS-CoV2 nsp2 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5447 135 | Disease::SARS-CoV2 nsp2 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5910 136 | Disease::SARS-CoV2 nsp2 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23325 137 | Disease::SARS-CoV2 nsp2 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23307 138 | Disease::SARS-CoV2 nsp2 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26058 139 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::3416 140 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26519 141 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::440138 142 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23225 143 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::90580 144 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55735 145 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26515 146 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26520 147 | Disease::SARS-CoV2 nsp5 bioarx::Covid2_acc_host_gene::Disease:Gene 
Gene::3066 148 | Disease::SARS-CoV2 nsp5_C145A bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2876 149 | Disease::SARS-CoV2 nsp5_C145A bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55621 150 | Disease::SARS-CoV2 nsp6 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10632 151 | Disease::SARS-CoV2 nsp6 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::537 152 | Disease::SARS-CoV2 nsp6 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10280 153 | Disease::SARS-CoV2 nsp6 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79572 154 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8540 155 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::80777 156 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2181 157 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1727 158 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5898 159 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1312 160 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5878 161 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7879 162 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4218 163 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5862 164 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10890 165 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51552 166 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::387 167 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5861 168 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2782 169 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2787 170 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10960 171 | Disease::SARS-CoV2 nsp7 
bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7841 172 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26092 173 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::64757 174 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::169714 175 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9653 176 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::91942 177 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51097 178 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::949 179 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::57106 180 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79877 181 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26355 182 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::131118 183 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55829 184 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::80142 185 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::22931 186 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10199 187 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6731 188 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::11101 189 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7468 190 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6728 191 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6729 192 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::64432 193 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1662 194 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene 
Gene::51574 195 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::56257 196 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::25983 197 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::11340 198 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79731 199 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79954 200 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79080 201 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51091 202 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::56915 203 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51010 204 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26574 205 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::25831 206 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51116 207 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::64969 208 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23404 209 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23107 210 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2963 211 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2200 212 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2201 213 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8021 214 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23636 215 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10238 216 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7458 217 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::53371 218 | Disease::SARS-CoV2 nsp9 
bioarx::Covid2_acc_host_gene::Disease:Gene Gene::57534 219 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23111 220 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::91754 221 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::84858 222 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4927 223 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9818 224 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::27430 225 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10516 226 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5538 227 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8453 228 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55700 229 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79178 230 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79699 231 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26521 232 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9978 233 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6921 234 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6923 235 | Disease::SARS-CoV2 orf3a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::3162 236 | Disease::SARS-CoV2 orf3a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::286827 237 | Disease::SARS-CoV2 orf3a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::151188 238 | Disease::SARS-CoV2 orf3a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23339 239 | Disease::SARS-CoV2 orf3a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23155 240 | Disease::SARS-CoV2 orf3a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55823 241 | Disease::SARS-CoV2 orf3a 
bioarx::Covid2_acc_host_gene::Disease:Gene Gene::25777 242 | Disease::SARS-CoV2 orf3a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::29880 243 | Disease::SARS-CoV2 orf3b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::30968 244 | Disease::SARS-CoV2 orf6 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4928 245 | Disease::SARS-CoV2 orf6 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8480 246 | Disease::SARS-CoV2 orf6 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23787 247 | Disease::SARS-CoV2 orf7a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55027 248 | Disease::SARS-CoV2 orf7a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23195 249 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5352 250 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1861 251 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8614 252 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5327 253 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::3688 254 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::284106 255 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1291 256 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5817 257 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1786 258 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4015 259 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5046 260 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::83729 261 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10577 262 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4240 263 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10956 264 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene 
Gene::4884 265 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79070 266 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::143888 267 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::56605 268 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23646 269 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::80020 270 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79586 271 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::126789 272 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23065 273 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8836 274 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::27248 275 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23765 276 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55768 277 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::90161 278 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6388 279 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4758 280 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9518 281 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::80213 282 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23071 283 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::80267 284 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::64374 285 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23509 286 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::64093 287 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79666 288 | Disease::SARS-CoV2 orf8 
bioarx::Covid2_acc_host_gene::Disease:Gene Gene::54850 289 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55757 290 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::54480 291 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9510 292 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10525 293 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51661 294 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8754 295 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::60681 296 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9368 297 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::27243 298 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7812 299 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9868 300 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4140 301 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2011 302 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51611 303 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79077 304 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4139 305 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::58155 306 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9529 307 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7993 308 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8733 309 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7466 310 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4363 311 | Disease::SARS-CoV2 orf9c 
bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2150 312 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::22937 313 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23333 314 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::27346 315 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55676 316 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::202018 317 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79956 318 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79671 319 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::162427 320 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::84720 321 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55711 322 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51295 323 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79053 324 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55116 325 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::27069 326 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::28976 327 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::54602 328 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::617 329 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51103 330 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::50999 331 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4715 332 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::94005 333 | -------------------------------------------------------------------------------- /drugbank_info/README.md: 
-------------------------------------------------------------------------------- 1 | # Information about Compounds from DrugBank 2 | 3 | The KG currently has 9708 compounds from DrugBank. DB05697, DB06517, and DB15351 have been deprecated from DrugBank 4 | in the sense that they no longer have associated pages. DB02709 is a duplicate of DB05073 and should be merged with it. 5 | 6 | Of the remaining 9704 IDs, 8968 are characterized as "small-molecule" typed drugs in DrugBank and 736 are 7 | characterized as "Biotech" typed drugs in DrugBank. Full lists of them are stored separately in `drugbank_small_molecule.txt` 8 | and `drugbank_biotech.txt`. 9 | 10 | For the 8968 "small-molecule" typed drugs, we managed to extract SMILES 11 | (simplified molecular-input line-entry system) strings for 8807 of them from DrugBank, PubChem, KEGG, ChEMBL, and 12 | ChemSpider. The SMILES can be found in `drugbank_smiles.txt`. For the remaining "small-molecule" typed drugs, SMILES is not 13 | available, either because their structural information is not available or because they are substances with multiple 14 | ingredients. 15 | 16 | We also include the weight of compounds available in DrugBank in `drugbank_weight.txt`. For compounds whose weight 17 | information is missing, we use -1 as a placeholder. The weight information is available for 7158 compounds 18 | and missing for 2547 compounds. 
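As a minimal sketch of how the `-1` placeholder might be handled when reading `drugbank_weight.txt`, the snippet below assumes each line holds a DrugBank ID and a weight separated by whitespace. That column layout, the helper name `parse_weights`, and the sample IDs and values are assumptions for illustration; this README does not specify the file format.

```python
from typing import Dict, Iterable, Optional


def parse_weights(lines: Iterable[str]) -> Dict[str, Optional[float]]:
    """Map compound IDs to weights, turning the -1 placeholder into None.

    Assumes (hypothetically) one "<drugbank_id> <weight>" pair per line.
    """
    weights: Dict[str, Optional[float]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        drug_id, raw = parts
        value = float(raw)
        weights[drug_id] = None if value == -1 else value
    return weights


# Made-up IDs and values, not taken from the real file.
sample = ["DB_EXAMPLE_A\t250.5", "DB_EXAMPLE_B\t-1", ""]
parsed = parse_weights(sample)
missing = [drug_id for drug_id, weight in parsed.items() if weight is None]
```

Converting the placeholder to `None` up front keeps `-1` from being mistaken for a real weight in later averages or filters.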
19 | -------------------------------------------------------------------------------- /drugbank_info/drugbank_biotech.txt: -------------------------------------------------------------------------------- 1 | DB00001 2 | DB00002 3 | DB00003 4 | DB00004 5 | DB00005 6 | DB00008 7 | DB00009 8 | DB00010 9 | DB00011 10 | DB00012 11 | DB00013 12 | DB00015 13 | DB00016 14 | DB00017 15 | DB00018 16 | DB00019 17 | DB00020 18 | DB00022 19 | DB00023 20 | DB00024 21 | DB00025 22 | DB00026 23 | DB00028 24 | DB00029 25 | DB00030 26 | DB00031 27 | DB00032 28 | DB00033 29 | DB00034 30 | DB00036 31 | DB00038 32 | DB00039 33 | DB00040 34 | DB00041 35 | DB00042 36 | DB00043 37 | DB00044 38 | DB00045 39 | DB00046 40 | DB00047 41 | DB00048 42 | DB00049 43 | DB00051 44 | DB00052 45 | DB00053 46 | DB00054 47 | DB00055 48 | DB00056 49 | DB00057 50 | DB00058 51 | DB00059 52 | DB00060 53 | DB00061 54 | DB00062 55 | DB00063 56 | DB00065 57 | DB00066 58 | DB00068 59 | DB00069 60 | DB00070 61 | DB00071 62 | DB00072 63 | DB00073 64 | DB00074 65 | DB00075 66 | DB00076 67 | DB00078 68 | DB00081 69 | DB00082 70 | DB00083 71 | DB00085 72 | DB00086 73 | DB00087 74 | DB00088 75 | DB00089 76 | DB00090 77 | DB00092 78 | DB00094 79 | DB00095 80 | DB00097 81 | DB00098 82 | DB00099 83 | DB00100 84 | DB00102 85 | DB00103 86 | DB00105 87 | DB00107 88 | DB00108 89 | DB00109 90 | DB00110 91 | DB00111 92 | DB00112 93 | DB00113 94 | DB01257 95 | DB01269 96 | DB01270 97 | DB01271 98 | DB01272 99 | DB01276 100 | DB01277 101 | DB01279 102 | DB01281 103 | DB01285 104 | DB01306 105 | DB01307 106 | DB01309 107 | DB01363 108 | DB01381 109 | DB03404 110 | DB04895 111 | DB04897 112 | DB04899 113 | DB04900 114 | DB04901 115 | DB04907 116 | DB04914 117 | DB04919 118 | DB04925 119 | DB04932 120 | DB04949 121 | DB04950 122 | DB04956 123 | DB04958 124 | DB04959 125 | DB04962 126 | DB04964 127 | DB04973 128 | DB04985 129 | DB04988 130 | DB05006 131 | DB05084 132 | DB05097 133 | DB05099 134 | DB05101 135 | DB05110 136 | DB05111 
137 | DB05121 138 | DB05136 139 | DB05139 140 | DB05144 141 | DB05164 142 | DB05202 143 | DB05209 144 | DB05218 145 | DB05222 146 | DB05254 147 | DB05258 148 | DB05259 149 | DB05276 150 | DB05299 151 | DB05304 152 | DB05305 153 | DB05306 154 | DB05311 155 | DB05319 156 | DB05321 157 | DB05325 158 | DB05332 159 | DB05336 160 | DB05337 161 | DB05386 162 | DB05396 163 | DB05405 164 | DB05429 165 | DB05437 166 | DB05459 167 | DB05471 168 | DB05481 169 | DB05496 170 | DB05530 171 | DB05545 172 | DB05550 173 | DB05555 174 | DB05578 175 | DB05595 176 | DB05656 177 | DB05675 178 | DB05679 179 | DB05685 180 | DB05718 181 | DB05739 182 | DB05758 183 | DB05771 184 | DB05773 185 | DB05777 186 | DB05787 187 | DB05793 188 | DB05794 189 | DB05797 190 | DB05798 191 | DB05819 192 | DB05829 193 | DB05848 194 | DB05860 195 | DB05867 196 | DB05879 197 | DB05883 198 | DB05889 199 | DB05892 200 | DB05895 201 | DB05906 202 | DB05915 203 | DB05916 204 | DB05941 205 | DB05942 206 | DB05945 207 | DB05996 208 | DB06014 209 | DB06022 210 | DB06043 211 | DB06049 212 | DB06050 213 | DB06081 214 | DB06101 215 | DB06116 216 | DB06162 217 | DB06168 218 | DB06171 219 | DB06186 220 | DB06192 221 | DB06205 222 | DB06226 223 | DB06241 224 | DB06245 225 | DB06271 226 | DB06273 227 | DB06285 228 | DB06304 229 | DB06305 230 | DB06310 231 | DB06317 232 | DB06318 233 | DB06322 234 | DB06324 235 | DB06342 236 | DB06343 237 | DB06356 238 | DB06360 239 | DB06366 240 | DB06371 241 | DB06372 242 | DB06379 243 | DB06400 244 | DB06404 245 | DB06423 246 | DB06467 247 | DB06474 248 | DB06475 249 | DB06492 250 | DB06495 251 | DB06503 252 | DB06511 253 | DB06534 254 | DB06550 255 | DB06552 256 | DB06557 257 | DB06560 258 | DB06584 259 | DB06599 260 | DB06602 261 | DB06606 262 | DB06607 263 | DB06611 264 | DB06612 265 | DB06642 266 | DB06643 267 | DB06647 268 | DB06650 269 | DB06655 270 | DB06662 271 | DB06674 272 | DB06679 273 | DB06681 274 | DB06688 275 | DB06692 276 | DB06719 277 | DB06720 278 | DB06752 279 | 
DB06759 280 | DB06760 281 | DB06761 282 | DB06773 283 | DB08870 284 | DB08876 285 | DB08879 286 | DB08885 287 | DB08886 288 | DB08888 289 | DB08894 290 | DB08898 291 | DB08900 292 | DB08902 293 | DB08904 294 | DB08935 295 | DB09005 296 | DB09029 297 | DB09033 298 | DB09035 299 | DB09036 300 | DB09037 301 | DB09043 302 | DB09045 303 | DB09046 304 | DB09051 305 | DB09052 306 | DB09057 307 | DB09066 308 | DB09077 309 | DB09098 310 | DB09100 311 | DB09103 312 | DB09105 313 | DB09107 314 | DB09108 315 | DB09109 316 | DB09113 317 | DB09122 318 | DB09126 319 | DB09141 320 | DB09208 321 | DB09222 322 | DB09228 323 | DB09264 324 | DB09302 325 | DB09303 326 | DB09310 327 | DB09312 328 | DB09328 329 | DB09329 330 | DB09331 331 | DB09334 332 | DB09456 333 | DB09527 334 | DB09532 335 | DB09559 336 | DB09564 337 | DB10062 338 | DB10076 339 | DB10276 340 | DB10283 341 | DB10317 342 | DB10318 343 | DB10342 344 | DB10343 345 | DB10349 346 | DB10351 347 | DB10353 348 | DB10365 349 | DB10366 350 | DB10370 351 | DB10379 352 | DB10389 353 | DB10394 354 | DB10395 355 | DB10400 356 | DB10401 357 | DB10403 358 | DB10404 359 | DB10489 360 | DB10494 361 | DB10532 362 | DB10583 363 | DB10584 364 | DB10600 365 | DB10675 366 | DB10676 367 | DB10713 368 | DB10714 369 | DB10715 370 | DB10716 371 | DB10717 372 | DB10719 373 | DB10769 374 | DB10770 375 | DB10772 376 | DB10794 377 | DB10803 378 | DB10804 379 | DB10805 380 | DB10836 381 | DB10879 382 | DB10949 383 | DB10962 384 | DB10963 385 | DB10964 386 | DB10966 387 | DB10982 388 | DB10989 389 | DB10990 390 | DB11003 391 | DB11038 392 | DB11040 393 | DB11041 394 | DB11044 395 | DB11050 396 | DB11051 397 | DB11065 398 | DB11066 399 | DB11095 400 | DB11097 401 | DB11113 402 | DB11131 403 | DB11166 404 | DB11198 405 | DB11199 406 | DB11294 407 | DB11300 408 | DB11311 409 | DB11312 410 | DB11330 411 | DB11338 412 | DB11358 413 | DB11563 414 | DB11567 415 | DB11568 416 | DB11569 417 | DB11571 418 | DB11572 419 | DB11580 420 | DB11595 421 | DB11597 422 
| DB11598 423 | DB11601 424 | DB11603 425 | DB11604 426 | DB11606 427 | DB11607 428 | DB11608 429 | DB11621 430 | DB11624 431 | DB11626 432 | DB11627 433 | DB11631 434 | DB11639 435 | DB11646 436 | DB11657 437 | DB11661 438 | DB11668 439 | DB11680 440 | DB11685 441 | DB11714 442 | DB11715 443 | DB11727 444 | DB11731 445 | DB11746 446 | DB11756 447 | DB11767 448 | DB11771 449 | DB11776 450 | DB11803 451 | DB11826 452 | DB11834 453 | DB11840 454 | DB11849 455 | DB11850 456 | DB11854 457 | DB11856 458 | DB11857 459 | DB11862 460 | DB11866 461 | DB11884 462 | DB11914 463 | DB11930 464 | DB11945 465 | DB11959 466 | DB11972 467 | DB11976 468 | DB11988 469 | DB12023 470 | DB12034 471 | DB12053 472 | DB12056 473 | DB12077 474 | DB12089 475 | DB12090 476 | DB12102 477 | DB12104 478 | DB12118 479 | DB12119 480 | DB12142 481 | DB12152 482 | DB12157 483 | DB12159 484 | DB12169 485 | DB12182 486 | DB12189 487 | DB12202 488 | DB12205 489 | DB12213 490 | DB12240 491 | DB12246 492 | DB12250 493 | DB12258 494 | DB12261 495 | DB12274 496 | DB12281 497 | DB12296 498 | DB12317 499 | DB12331 500 | DB12335 501 | DB12342 502 | DB12344 503 | DB12363 504 | DB12366 505 | DB12374 506 | DB12386 507 | DB12396 508 | DB12409 509 | DB12413 510 | DB12431 511 | DB12456 512 | DB12489 513 | DB12498 514 | DB12520 515 | DB12525 516 | DB12530 517 | DB12534 518 | DB12560 519 | DB12568 520 | DB12584 521 | DB12589 522 | DB12609 523 | DB12656 524 | DB12683 525 | DB12688 526 | DB12698 527 | DB12699 528 | DB12701 529 | DB12718 530 | DB12726 531 | DB12734 532 | DB12747 533 | DB12768 534 | DB12773 535 | DB12775 536 | DB12797 537 | DB12807 538 | DB12814 539 | DB12815 540 | DB12820 541 | DB12826 542 | DB12844 543 | DB12845 544 | DB12849 545 | DB12872 546 | DB12891 547 | DB12893 548 | DB12909 549 | DB12917 550 | DB12943 551 | DB12950 552 | DB12976 553 | DB13007 554 | DB13017 555 | DB13037 556 | DB13045 557 | DB13073 558 | DB13127 559 | DB13133 560 | DB13140 561 | DB13144 562 | DB13147 563 | DB13148 564 | DB13149 
565 | DB13150 566 | DB13151 567 | DB13152 568 | DB13161 569 | DB13173 570 | DB13189 571 | DB13192 572 | DB13193 573 | DB13194 574 | DB13195 575 | DB13196 576 | DB13197 577 | DB13198 578 | DB13199 579 | DB13200 580 | DB13201 581 | DB13241 582 | DB13260 583 | DB13263 584 | DB13281 585 | DB13287 586 | DB13375 587 | DB13521 588 | DB13535 589 | DB13646 590 | DB13761 591 | DB13864 592 | DB13881 593 | DB13884 594 | DB13886 595 | DB13892 596 | DB13893 597 | DB13894 598 | DB13896 599 | DB13897 600 | DB13898 601 | DB13899 602 | DB13900 603 | DB13901 604 | DB13902 605 | DB13903 606 | DB13906 607 | DB13915 608 | DB13923 609 | DB13924 610 | DB13926 611 | DB13932 612 | DB13933 613 | DB13961 614 | DB13975 615 | DB13976 616 | DB13979 617 | DB13998 618 | DB13999 619 | DB14004 620 | DB14009 621 | DB14012 622 | DB14016 623 | DB14022 624 | DB14039 625 | DB14040 626 | DB14041 627 | DB14042 628 | DB14211 629 | DB14240 630 | DB14256 631 | DB14337 632 | DB14384 633 | DB14385 634 | DB14394 635 | DB14409 636 | DB14443 637 | DB14445 638 | DB14449 639 | DB14473 640 | DB14562 641 | DB14580 642 | DB14582 643 | DB14585 644 | DB14597 645 | DB14619 646 | DB14620 647 | DB14685 648 | DB14700 649 | DB14707 650 | DB14711 651 | DB14712 652 | DB14713 653 | DB14724 654 | DB14730 655 | DB14731 656 | DB14738 657 | DB14740 658 | DB14749 659 | DB14751 660 | DB14762 661 | DB14776 662 | DB14778 663 | DB14784 664 | DB14809 665 | DB14811 666 | DB14824 667 | DB14843 668 | DB14864 669 | DB14865 670 | DB14871 671 | DB14877 672 | DB14891 673 | DB14897 674 | DB14905 675 | DB14907 676 | DB14908 677 | DB14919 678 | DB14947 679 | DB14952 680 | DB14959 681 | DB14962 682 | DB14967 683 | DB14988 684 | DB14997 685 | DB14999 686 | DB15014 687 | DB15022 688 | DB15044 689 | DB15045 690 | DB15066 691 | DB15076 692 | DB15089 693 | DB15090 694 | DB15093 695 | DB15095 696 | DB15101 697 | DB15104 698 | DB15112 699 | DB15113 700 | DB15118 701 | DB15119 702 | DB15131 703 | DB15135 704 | DB15160 705 | DB15171 706 | DB15172 707 | 
DB15217 708 | DB15225 709 | DB15252 710 | DB15253 711 | DB15263 712 | DB15271 713 | DB15274 714 | DB15277 715 | DB15290 716 | DB15332 717 | DB15336 718 | DB15349 719 | DB15354 720 | DB15363 721 | DB15383 722 | DB15397 723 | DB15409 724 | DB15415 725 | DB15428 726 | DB15432 727 | DB15441 728 | DB15443 729 | DB15453 730 | DB15461 731 | DB15478 732 | DB15483 733 | DB15528 734 | DB15559 735 | DB15593 736 | DB15595 737 | -------------------------------------------------------------------------------- /embedding_analysis/Edge_similarity_based_on_link_recommendation_results.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# DRKG Relation Similarity Analysis based on link recommendations\n", 8 | "\n", 9 | "This notebook performs a similarity analysis of the different link types in DRKG based on their recommendation outcomes. Specifically, for a given node we predict the K highest-scoring neighbors under a given link type, and then repeat this prediction for all link types. Link types whose predicted neighbors overlap significantly are considered more similar." 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 3, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import pandas as pd\n", 19 | "import numpy as np\n", 20 | "import os\n", 21 | "import csv\n", 22 | "import sys\n", 23 | "import torch as th\n", 24 | "sys.path.insert(1, '../utils')\n", 25 | "from utils import download_and_extract\n", 26 | "download_and_extract()" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "Define the function used for scoring the edges. This should coincide with the function used to learn the embeddings." 
34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 4, 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "def transE_l2(head, rel, tail):\n", 43 | " gamma=12.0\n", 44 | " score = head + rel - tail\n", 45 | " return gamma - th.norm(score, p=2, dim=-1)" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "## Loading Mapping files\n", 53 | "\n", 54 | "Load the mapping files that give the ids used by the embedding models for the corresponding DRKG id. Load the entity and relation embeddings as well. Change input files and embedding files." 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 6, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "name": "stdout", 64 | "output_type": "stream", 65 | "text": [ 66 | "97238\n", 67 | "107\n" 68 | ] 69 | } 70 | ], 71 | "source": [ 72 | "# folders holding data\n", 73 | "folder_with_training_part='../data/drkg/embed/'\n", 74 | "folder_with_embeddings='../data/drkg/embed/'\n", 75 | "\n", 76 | "ids = []\n", 77 | "entity2id = {}\n", 78 | "with open(folder_with_training_part+\"entities.tsv\", newline='', encoding='utf-8') as csvfile:\n", 79 | " reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=[ 'entity','id'])\n", 80 | " for row_val in reader:\n", 81 | " id = row_val['id']\n", 82 | "\n", 83 | " entity2id[row_val['entity']] = int(id)\n", 84 | "\n", 85 | "print(len(entity2id))\n", 86 | "\n", 87 | "rel2id = {}\n", 88 | "with open(folder_with_training_part+\"relations.tsv\", newline='', encoding='utf-8') as csvfile:\n", 89 | " reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['entity','id'])\n", 90 | " for row_val in reader:\n", 91 | " id = row_val['id']\n", 92 | "\n", 93 | " rel2id[row_val['entity']] = int(id)\n", 94 | "\n", 95 | "print(len(rel2id))\n", 96 | "\n", 97 | "node_emb = np.load(folder_with_embeddings+'DRKG_TransE_l2_entity.npy')\n", 98 | "rel_emb = 
np.load(folder_with_embeddings+'DRKG_TransE_l2_relation.npy')" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "## Loading triplets\n", 106 | "\n", 107 | "Load triplets and map them to DRKG id space" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": null, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [ 116 | "head_ids = []\n", 117 | "rel_ids = []\n", 118 | "tail_ids = []\n", 119 | "p0_rows = []\n", 120 | "folder_with_training_part\n", 121 | "with open(\"../data/drkg/drkg.tsv\", newline='', encoding='utf-8') as csvfile:\n", 122 | " reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['head', 'rel', 'tail'])\n", 123 | " for row_val in reader:\n", 124 | " head = row_val['head']\n", 125 | " rel = row_val['rel']\n", 126 | " tail = row_val['tail']\n", 127 | "\n", 128 | " head_id = entity2id[head]\n", 129 | " rel_id = rel2id[rel]\n", 130 | " tail_id = entity2id[tail]\n", 131 | " \n", 132 | " head_ids.append(head_id)\n", 133 | " rel_ids.append(rel_id)\n", 134 | " tail_ids.append(tail_id)\n", 135 | " p0_rows.append((head, rel, tail))\n", 136 | " \n", 137 | "head_ids = np.array(head_ids)\n", 138 | "rel_ids = np.array(rel_ids)\n", 139 | "tail_ids = np.array(tail_ids)\n", 140 | "triple_ids = np.arange(head_ids.shape[0])" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "## Link prediction\n", 148 | "\n", 149 | "Specify number of seed nodes to select for link prediction" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": 6, 155 | "metadata": {}, 156 | "outputs": [], 157 | "source": [ 158 | "scores={}\n", 159 | "L=100\n", 160 | "device = th.device('cpu')\n", 161 | "with th.no_grad():\n", 162 | " node_emb = th.tensor(node_emb).to(device)\n", 163 | " rel_emb = th.tensor(rel_emb).to(device)\n", 164 | " head_ids = th.tensor(head_ids).to(device)\n", 165 | " rel_ids = th.tensor(rel_ids).to(device)\n", 166 | 
" tail_ids = th.tensor(tail_ids).to(device)\n", 167 | "\n", 168 | " head_embedding = node_emb[head_ids]\n", 169 | " rel_embedding = rel_emb[rel_ids]\n", 170 | " tail_embedding = node_emb[tail_ids]\n", 171 | " # select L random heads\n", 172 | " \n", 173 | " perm = th.randperm(head_ids.shape[0])\n", 174 | " seeds = head_ids[perm[:L]]\n", 175 | " seed_heads = node_emb[seeds]" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": {}, 181 | "source": [ 182 | "Predict the scores per link type among the selected seed nodes and all other nodes" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": 7, 188 | "metadata": {}, 189 | "outputs": [], 190 | "source": [ 191 | "for rel in rel2id.keys():\n", 192 | " rel_id=rel2id[rel]\n", 193 | " rel_embedding=((rel_emb[rel_id]).repeat(node_emb.shape[0],1))\n", 194 | " #print(rel_embedding.shape)\n", 195 | " scores[rel] =[transE_l2((seed_heads[i].repeat(node_emb.shape[0],1)), rel_embedding, node_emb) for i in range(seed_heads.shape[0])]# for i in range()]\n", 196 | " #print(scores[rel])\n", 197 | " " 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "### Top K link predicition\n", 205 | "Specify the number of top scoring neighbors to evaluate the proposed ovelap of link prediction." 
206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": 8, 211 | "metadata": {}, 212 | "outputs": [], 213 | "source": [ 214 | "K=10\n", 215 | "top_neighbors={}\n", 216 | "for rel in scores.keys():\n", 217 | " top_neighbors[rel]=[th.argsort(score, descending=True)[:K] for score in scores[rel]]" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "## Overlap among predicted neighbors\n", 225 | "Calculate the overlap of predicted neighboring nodes for each pair of relation types" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": 9, 231 | "metadata": {}, 232 | "outputs": [], 233 | "source": [ 234 | "overlap_of_predicted_neighbors=[]\n", 235 | "keys=list(scores.keys()) \n", 236 | "for i in range(len(keys)):\n", 237 | " for j in range(i+1,len(keys)):\n", 238 | " e1=keys[i]\n", 239 | " e2=keys[j]\n", 240 | " n_1=top_neighbors[e1]\n", 241 | " n_2=top_neighbors[e2]\n", 242 | " jacard=0\n", 243 | " for l in range(len(n_1)):\n", 244 | " n1=list(n_1[l].cpu().numpy())\n", 245 | " n2=list(n_2[l].cpu().numpy())\n", 246 | " jacard+=float(len(set(n1).intersection(set(n2)))/len(set(n1).union(set(n2))))\n", 247 | " jacard=jacard/len(n_1)\n", 248 | " overlap_of_predicted_neighbors.append([e1,e2,jacard])" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "Store sorted overlap file" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": 10, 261 | "metadata": {}, 262 | "outputs": [], 263 | "source": [ 264 | "overlap_of_predicted_neighbors_sort=(sorted(overlap_of_predicted_neighbors,key=lambda x: float(x[2])))[::-1]\n", 265 | "\n", 266 | "overlap_of_predicted_neighbors_store=[\"{}\\t{}\\t{}\\n\".format(j[0], j[1], j[2]) for j in overlap_of_predicted_neighbors_sort]\n", 267 | "\n", 268 | "overlap_of_predicted_neighbors_store=[\"edge_type1\\tedge_type2\\tpercentage of overlapping predicted 
edges\\n\"]+overlap_of_predicted_neighbors_store\n", 269 | "entity_file = \"percentage_of_overlapping_predicted_edges_per_edge_pair\"+str(K)+\"v1.tsv\"\n", 270 | "with open(entity_file, 'w+') as f:\n", 271 | " f.writelines(overlap_of_predicted_neighbors_store)" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": null, 277 | "metadata": {}, 278 | "outputs": [], 279 | "source": [] 280 | } 281 | ], 282 | "metadata": { 283 | "kernelspec": { 284 | "display_name": "Python 3", 285 | "language": "python", 286 | "name": "python3" 287 | }, 288 | "language_info": { 289 | "codemirror_mode": { 290 | "name": "ipython", 291 | "version": 3 292 | }, 293 | "file_extension": ".py", 294 | "mimetype": "text/x-python", 295 | "name": "python", 296 | "nbconvert_exporter": "python", 297 | "pygments_lexer": "ipython3", 298 | "version": "3.7.6" 299 | } 300 | }, 301 | "nbformat": 4, 302 | "nbformat_minor": 2 303 | } 304 | -------------------------------------------------------------------------------- /embedding_analysis/Readme.md: -------------------------------------------------------------------------------- 1 | # Knowledge Graph Embedding Based Analysis of DRKG 2 | We analyze the extracted DRKG by learning a TransE KGE model that utilizes the $\ell_2$ distance. As DRKG combines information from different data sources, we want to verify that meaningful entity and relation embeddings can be generated using knowledge graph embedding technology. 3 | 4 | ## Train Knowledge Graph Embedding 5 | Before doing the analysis, we first need to train the knowledge graph embeddings. Here, we split the edge triplets into training, validation, and test sets (90%, 5%, and 5%, respectively) and train the KGE model as shown in the following notebook: 6 | 7 | - [Train_embeddings.ipynb](Train_embeddings.ipynb) 8 | 9 | ## Analyze the Relation Embedding Similarity 10 | We analyze the relation embedding similarity in [Relation_similarity_analysis.ipynb](Relation_similarity_analysis.ipynb). 
We first use t-SNE to map the relation embeddings to a 2D space to show their distribution and then plot the pair-wise similarity between different relation types. 11 | 12 | ## Analyze the Entity Embedding Similarity 13 | We analyze the entity embedding similarity in [Entity_similarity_analysis.ipynb](Entity_similarity_analysis.ipynb). We first use t-SNE to map the entity embeddings to a 2D space to show their distribution, then plot the embedding distribution of the Drugbank drug entities, and finally show the pair-wise similarity between different entities. 14 | 15 | ## Analyze Edge Score 16 | We analyze whether the learned KGE model can predict the edges of DRKG in [Edge_score_analysis.ipynb](Edge_score_analysis.ipynb). To avoid the possible bias of over-fitting the triplets in the training set, we split the whole DRKG into 10 equal folds and train 10 KGE models, each using one fold as the test set and the remaining nine folds as the training set. The score for each triplet is then computed by the model for which that triplet was in the test set. Finally, we show how the edge scores are distributed. 17 | 18 | ## Analyze Link Type Recommendation Similarity 19 | We analyze how similar the predicted links are among different relation types in [Edge_similarity_based_on_link_recommendation_results.ipynb](Edge_similarity_based_on_link_recommendation_results.ipynb). This task examines the similarity across relation types for the link prediction task. For seed node $n^{k}_i$ we find the top 10 neighbors under relation $r_j$ with the highest link prediction score. Next, we repeat the same prediction for relation $r_{j'}$ and calculate the Jaccard similarity coefficient between the predicted sets of top 10 neighbors for $r_j$ and $r_{j'}$. 
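
The top-K overlap described above can be sketched in a few lines. This is a minimal illustration, not the notebook's code: the score matrices are made-up toy values, the helper names `top_k_neighbors` and `mean_jaccard` are hypothetical, and NumPy is used in place of the PyTorch tensors the notebook works with. Each row of a score matrix is assumed to hold the link prediction scores of one seed node against every candidate neighbor under a single relation.

```python
import numpy as np


def top_k_neighbors(scores: np.ndarray, k: int) -> np.ndarray:
    """For each seed node (row), return the indices of the k highest-scoring candidates."""
    # Sorting the negated scores in ascending order yields descending score order.
    return np.argsort(-scores, axis=1)[:, :k]


def mean_jaccard(top_a: np.ndarray, top_b: np.ndarray) -> float:
    """Average, over seed nodes, the Jaccard coefficient of the two predicted neighbor sets."""
    total = 0.0
    for row_a, row_b in zip(top_a, top_b):
        set_a, set_b = set(row_a.tolist()), set(row_b.tolist())
        total += len(set_a & set_b) / len(set_a | set_b)
    return total / len(top_a)


# Toy scores (assumed values): 2 seed nodes x 5 candidates, for two hypothetical relations.
scores_r1 = np.array([[0.9, 0.8, 0.1, 0.2, 0.3],
                      [0.1, 0.9, 0.8, 0.2, 0.3]])
scores_r2 = np.array([[0.9, 0.1, 0.8, 0.2, 0.3],
                      [0.1, 0.9, 0.8, 0.2, 0.3]])

top_r1 = top_k_neighbors(scores_r1, k=2)
top_r2 = top_k_neighbors(scores_r2, k=2)
overlap = mean_jaccard(top_r1, top_r2)
print(overlap)
```

With `k=2`, the first seed node shares one of three distinct predicted neighbors across the two relations and the second seed node shares both, so the averaged coefficient lies between the two per-seed values. The analysis in the notebook uses `K=10` and repeats this for every pair of relation types.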
-------------------------------------------------------------------------------- /embedding_analysis/Relation_similarity_analysis.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# DRKG Relation Embedding Similarity Analysis\n", 8 | "This notebook shows how to analyze the trained relation embeddings. \n", 9 | "\n", 10 | "In this example, we first load the trained embeddings and map them back to the original relation names. We then apply three methods to analyze these embeddings:\n", 11 | " - Project the embeddings into a low-dimensional space and visualize their distribution.\n", 12 | " - Use cosine similarity to analyze the similarity between each pair of relations.\n", 13 | " - Use Frobenius distance to analyze the similarity between each pair of relations." 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 1, 19 | "metadata": {}, 20 | "outputs": [], 21 | "source": [ 22 | "import pandas as pd\n", 23 | "import numpy as np\n", 24 | "import os\n", 25 | "import csv" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "## Loading Relation ID Mapping" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 2, 38 | "metadata": {}, 39 | "outputs": [ 40 | { 41 | "name": "stdout", 42 | "output_type": "stream", 43 | "text": [ 44 | "Number of relations: 107\n" 45 | ] 46 | } 47 | ], 48 | "source": [ 49 | "rel2id = {}\n", 50 | "id2rel = {}\n", 51 | "with open(\"./train/relations.tsv\", newline='', encoding='utf-8') as csvfile:\n", 52 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['rel','id'])\n", 53 | "    for row_val in reader:\n", 54 | "        id = row_val['id']\n", 55 | "        relation = row_val['rel']\n", 56 | "\n", 57 | "        rel2id[relation] = int(id)\n", 58 | "        id2rel[int(id)] = relation\n", 59 | "\n", 60 | "print(\"Number of relations: {}\".format(len(rel2id)))" 61 | ] 62 | 
}, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "## Loading Relation Embeddings" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 3, 73 | "metadata": {}, 74 | "outputs": [ 75 | { 76 | "name": "stdout", 77 | "output_type": "stream", 78 | "text": [ 79 | "(107, 400)\n" 80 | ] 81 | } 82 | ], 83 | "source": [ 84 | "rel_emb = np.load('./ckpts/TransE_l2_DRKG_0/DRKG_TransE_l2_relation.npy')\n", 85 | "print(rel_emb.shape)" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "## General Relation Embedding Clustering\n", 93 | "Here we use t-SNE to project the relation embeddings into a low-dimensional space and visualize their distribution." 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 9, 99 | "metadata": {}, 100 | "outputs": [ 101 | { 102 | "name": "stderr", 103 | "output_type": "stream", 104 | "text": [ 105 | "The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.\n", 106 | "The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.\n", 107 | "The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.\n", 108 | "The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.\n" 109 | ] 110 | }, 111 | { 112 | "data": { 113 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAdUAAAD4CAYAAAC6/HyrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3de0CUVf4/8PeZGe4gchNk5CLpyEUMEskss3Iz2zIt16xdtWzF3da+baWVfd2srXWzbxfb0rbUdNFyV+1qbj+zMt3KvICgpCC6pCCCV+SqwMxzfn/gICj3eWaeGXi//rF55plnzswuvDnnPOdzhJQSREREZDud1g0gIiLqLhiqREREKmGoEhERqYShSkREpBKGKhERkUoMWrxpcHCwjI6O1uKtiYhcVmZm5mkpZYjW7aDWaRKq0dHRyMjI0OKtiYhclhDiqNZtoLZx+JeIiEglDFUiIiKVMFSJiIhUosmcKhERqSMzM7OPwWBYDmAw2FGyNwXAT2azecbQoUNPtnQCQ5WIyIUZDIblYWFhcSEhIWU6nY7F3O1IURRx6tSp+NLS0uUA7mrpHP5VQ0Tk2gaHhIRUMFDtT6fTyZCQkHI0jAq0fI4D20NEbSnaBXz3WsO/RB2nY6A6zsXvutXs5PAvkTMo2gWk3wVY6gC9O/DABiAiVetWEVEnsadK5AyOfNcQqNLS8O+R77RuEVGHHTx40H3gwIEJlx+fPHlyVGZmpqcWbdIKe6pEziB6ZEMP1dpTjR6pdYuIbLZ27VpVKkCZzWYYDK4RV+ypEjmDiNSGId9b5nHol+zu+8OnfV7elBf2/eHTPmpd02w245577ok2mUzxY8eOjamsrNSlpqYO+s9//uMNAO+++26gyWSKHzhwYMLDDz9stL7uN7/5TeTgwYPjBgwYkPD444+HW48bjcbEOXPm9B06dOigpUuXBg4ePDhu48aNfgAwa9Ys4//8z/8Yr2yF9lwj+ol6gohUhinZ3feHT/s89I/dJrNF0b33/c/KigeH5d8wILja1useOXLE89133z0yZsyY6kmTJkW/8sorIU2ec3v++eeNmZmZuSEhIeaRI0eaVq9e3Xvq1KnnXn/99eLQ0FCL2WzGiBEjBu3cudPr2muvPQ8Anp6eSmZm5kEASE1Nrbn33nuvqq+vL9qyZYt/VlZWrq1ttgf2VImcEe8EJjv54fBpP7NF0SkSMFsU3Q+HT/upcd2wsLC6MWPGVAPA1KlTz2zfvt3X+tz333/vM3z48Mrw8HCzm5sbJk+efHbbtm2+AJCenh4YHx8fFx8fH3/o0CHPvXv3Ns7BTps2rcz63ykpKRfuvffeM/fdd9+A5cuX/+zp6emUdzyzp0qkpaJdDTclRY+81EvlncBkR9cPCK587/ufFbNF0Rn0OuX6AcGValxXCNHqYylbzr+8vDz3xYsXh17swVomTpwYfeHChcbOnp+fn9L0/P3793v5+flZSkpK3ACcV6PdamNPlUgr1vDcsqDhX2uvtNmdwLXA1pfYYyXV3DAguHrFg8PyfzfqqmK1hn4BoKSkxP3rr7/2AYA1a9YEjhgxosr63I033li9c+dOv5KSEoPZbMb69esDb7rppqqysjK9l5eXEhgYaCkqKjJs3brVv7Xrp6en9z579qxhy5YteXPmzIk8ffq0Xo12q42hSqSV1pbRWO8EFjpAKsB/tzYPXVtwWJnQEKxPj40tVStQASAmJubCihUrgkwmU3xZWZlhzpw5p6zPRUVF1c+fP7941KhRpri4uIQhQ4bUTJky5dx11113fvDgwTUDBw5MmDp1avTQoUOrWrp2SUmJ4bnnnuuXnp5+ZMiQIbUzZsw4OXPmzAi12q4m0Vq33J5SUlIkNymnHq+tYd6iXQ091P9uBaAAQt9wZ/DI2fZ5P3IJQohMKWVK02N79+49cvXVV5/Wqk090d69e4Ovvvrq6JaeY0+VqD326t21tYwmIhW46Rn
A4NEQqF1Zu3p5uztTYII9WqIuUeVGJSHE4wBmAJAAcgBMl1JeUOPaRJqyd++urWU01tA98h3gFXQpBDvy/i21u6MFJtijJeoym3uqQggjgEcBpEgpBwPQA7jP1usSOQWtywdGpDYE6hezgS1/6fjcakvt7miBCa0/M5ELU2tJjQGAlxCiHoA3gOMqXZdIW1qXDyza1RCoirnhsaX2UkC2pbV2d6TAhNafmciF2RyqUspiIcSrAArRsG5os5Rys80tI3IGTYdgm64ldZQj3wGK5dJjoetYyNnSbq0/M5ELszlUhRABAMYD6A/gHID1QogpUsr3LztvJoCZABAZGWnr2xI5jpblA72C0HCrwkXXPdLxttjSbpZMJOoSNe7+/QWAn6WUp6SU9QA+BjDi8pOklEullClSypSQkJArLkJELTh/Bpd+THWAZy8tW0PUIr1ePzQ2NjZ+wIABCYMGDYp//vnnQy2WSyMs3377rXdqauqgqKiowfHx8XE33XTTgF27dnkBwBNPPBE+f/78UACYOHFitNFoTBw0aFB8dHT04Lvvvjv6559/drNex2g0JppMpvjY2Nh4k8kU//777/d2+IdthxqhWghguBDCWzTUpRoNwCkLHRO5nOiRl5bVGDxsX1ZDZAceHh5KXl7egcOHD+/fsmVL/ubNm/3nzJkTDgBFRUWGKVOmXLVgwYJjR48e/enAgQO5zzzzTOnBgwc9WrrWX/7yl2MHDx48UFBQ8FNSUlLNzTffPOjChQuNNQ+3bduWn5eXd2D9+vX/feqpp5yuAITNoSql3AngQwB70LCcRgdgqa3XJeqRLg9BW7aEa60MIlHBVh98/XwYCraqtvWbldFoNC9fvvzIypUr+yiKgldffbXPvffee+bWW29trN502223VU2dOvVcW9fR6XR47rnnTgYHB9d/+OGHV5QvPHfunL5Xr16Wll6rJVXu/pVSPgfgOTWuRdRjtbY+tKvzm60tq6GerWCrDz641wSlXocf31bwm3X5iLlJtXKFABAfH1+nKAqKi4sNubm5XtOmTTvT1WsNGTKkJjc3t3HnmlGjRpmklOLYsWPuK1asKFCnxephRSUiZ6H2+tDGGsJdrMhE3VPBVj8o9TpIBVDqdSjYqsrWb5drrQTukCFDYmNiYhKmT5/eoaHby6+zbdu2/EOHDu3PyMg4MGfOnMjy8nKnyjGnagxRj6Z2CNoydEzdV8xNldC5KRA6QOemIOYmVbZ+a+rAgQPuer0eRqPRHBcXdz4zM9Pb+ty+ffvynn322eMVFRUd2mUmJyfHOz4+/opt3hISEmqDgoLq9+zZ49nS67TC/VSJnIU91odyaQxdLuamavxmXT4Ktvoh5qZKtYd+jx8/bkhLS4uaPn36SZ1Oh9mzZ58cPnx43C9/+cty67xqdXV1ux06RVHw17/+tc+pU6fcJk6cWHH588XFxYZjx455DBgwoE7N9tuKoUrkTBiC5AgxN1WrGaa1tbW62NjYeLPZLPR6vZw8efKZ55577gQAREZGmlevXl0wd+7cfr/97W/dgoKCzAEBAebnn3++xcp7f/rTn/otXLiw74ULF3TJycnVW7ZsOejp6dk4Bjxq1CiTTqeD2WwW8+fPPxYREWFW63OogVu/ERG5CG795hy49RsREZEDMFSJiIhUwlAlIiJSCUOViIhIJQxVIiIilTBUiYiIVMJQJSIim7S19dvGjRv9/Pz8kuLi4uL79++fMHPmzH7W1zXd9s3KaDQmlpSUGICGHW7GjRvXv1+/fokJCQlxSUlJsatWrerd9LrWbeBGjBhhKi4ublZ7YfTo0VclJSXFNj32xBNPhHt5eSU3Pdfb2zu5pf9eu3atf1RU1OBDhw65d/S7YKgSEZFN2tr6DQBSUlKqcnNzD+Tk5Bz46quv/Ddv3tzu7jiKomDcuHEDRo4cWXXs2LGc/fv3565bt66gqKioMeBSUlKq8vLyDuTn5x9ITk6ufvXVV/tYnzt9+rR
+//79PhUVFfq8vLxmodi7d2/zX/7yl2ZhfrnPPvvMb86cORFffPHFoYEDB3a4ahNDlYioh9lxfIfPG5lvhO04vsPuW7815evrKxMSEs4XFha22/P7/PPP/dzc3ORTTz11ynrMZDLVzZs37+Tl5yqKgsrKSn1AQEBjdaXVq1cH/OIXvzh39913n01PTw9sev79999/ZsOGDYEnTpxosf7wpk2bfGfNmhW9YcOGwwkJCbUd+NiNGKpERD3IjuM7fGZ9M8u08qeVxlnfzDLZI1ibbv3W9PipU6f0P//8s8eYMWPaLeKfk5PjNWTIkJq2zsnIyPCNjY2NDw8PH/Ldd9/5PfLII42VpdavXx84ZcqUsw888MDZjz76qFmo+vr6Wu6///7TCxcuvKK3WldXJyZPnjzgo48+OpycnHyh/U/bHEOViKgH2VGyw8+smHUKFJgVs25HyQ67b/2WkZHhazKZ4o1G49Vjxowpj4yMNAOAEKLFOrktHZ86dWrkoEGD4gcPHhxnPWYd/i0tLd3361//+swjjzzSD2iYiz169KjHmDFjqoYMGVJrMBjk7t27m+1mM3fu3JPr1q0LOnv2bLMcdHNzk9dcc03VO++8E9yVz81QJSLqQYb3HV5p0BkUHXQw6AzK8L7D7br1G9AQfvn5+QcyMjL2p6enh2zfvt0LAIKCgsxlZWXNerPV1dX64OBgS2Ji4vl9+/Y1bhm3evXqwq1bt+Zffr7VxIkTz+3cudMPANLT0wMrKir0ERERiUajMbG4uNhj9erVzXqrwcHBlrvvvvts03lYABBCYMOGDQXZ2dk+c+fODevsZ2eoEhH1IMPDh1cvGb0kf/rg6cVLRi/JHx4+3K5bvzU1ZMiQ2j/+8Y8lL730UhgAjB49uurLL7/0Lysr0wFAenp679jY2BqDwYBx48ZV1tbWipdffjnE+vqqqqpWM+vbb7/1jYqKqgWADz/8MPCTTz45VFxcnFNcXJyzc+fOA59++mng5a+ZN2/eifT09BCLxSKaHvfz81M2bdp06MMPPwxatGhRp3qs3PqNiKiHGR4+vFrNMG1r67fLzZ49+1RMTExYXl6e+7XXXns+LS3t5PDhw2OFEAgKCqpfsWLFEQDQ6XT4/PPP/ztr1qyIN998MywwMNDs7e1tef75549Zr2WdU5VSws/Pz7JixYojBw8edD9+/Lj7Lbfc0vj5YmNj63x9fS1btmxpNn/ct29f8+2331723nvvXTG3Ghoaatm0aVP+qFGjYkNCQsxTpkw515Hvglu/ERG5CG795hy49RsREZEDMFSJiIhUwlAlIiJSiSqhKoToLYT4UAiRJ4TIFUJcp8Z1iYiIXIlad//+DcAmKeWvhBDuALzbewEREVF3Y3OoCiF6AbgRwIMAIKWsA9Dh4sNERETdhRrDvzEATgFYKYTIEkIsF0JcUUtSCDFTCJEhhMg4derUlVchIiKX1NoWbRs3bvQTQgxds2aNv/Xcm2++ecDGjRv9ACA1NXVQdHT04NjY2PiYmJiEV199tbHQgtFoTDSZTPEmkyl+2LBhg/Lz8zu8/ZqW1AhVA4BrAPxdSpkMoBrA3MtPklIulVKmSClTQkJCLn+aiIhcUHtbtIWGhta//PLLfVt7/apVqwry8vIO/Pjjj3kvvPBCvwsXLjRWN9q2bVt+fn7+gRtuuKFy/vz5rV7DmagRqscAHJNS7rz4+EM0hCx1U6UF5cjcdASlBeVaN4WIuqB6+48+J197Pax6+48271DT3hZtcXFxNX5+fpZPPvmkV1vXqaio0Ht5eSkGg+GKikTXX399VUlJiZutbXUEm+dUpZSlQogiIcQgKeVBAKMBHLC9aeSMSgvK8dmiLFjMCvQGHcY/noywGP/2X0hETqF6+48+Rb//vUmazbqz6elKxDvv5PuMuK7LJQs7skXbn/70p5Jnn33WePfdd1dc/ty0adNi3N3dlcLCQs8XX3yx0GC4Mpa++OIL/3HjxnWoTKDW1Fq
n+j8APhBC7AOQBOCvKl2XnExxfhksZgVSAhaLguL8Mq2bRESdUP3jj37SbNZBUSDNZl31jz+quvVbS1u0jR07tgpo2Pz78vNXrVpVkJ+ff6CgoGDf4sWLw5rOnY4aNcoUGBh49XfffdcrLS3trJrttBdVQlVKmX1xvnSIlHKClJK/abspoykAeoMOQgfo9ToYTQFaN4mIOsHnuusqhcGgQKeDMBgUn+uus2nrt45u0fbMM8+ULFiwoNV50fDwcPPgwYNr/vOf/zQOSW/bti2/sLBwn8lkOj979uxwW9rpKKyoRJ0SFuOP8Y8n49q7Yjj0S+SCfEZcVx3xzjv5Qb/9bbGtQ78AOrxF2z333FNRXl6uz83NbbGOQWVlpW7//v3egwYNqm163NfXV7799ttFH330UdCJEyf0trTVEbj1G3VaWIx/tw7T0oJyFOeXwWgK6Nafk3ounxHXVdsaplYd2aLN6umnny6ZMmXKgKbHpk2bFuPp6anU1dWJ++677/TIkSOvmJ+Nioqqv+uuu86++uqrfV555ZUSNdptL9z6jagJ3ohFzoxbvzkHbv1G1EG8EYuIbMFQJWqCN2IRkS04p0rUhPVGLM6pElFXMFSJLtPdb8Qi9dVkZaFm1254pw6Dd3Ky1s0hDTFUiYhsUJOVhcLpD0HW1UG4uyNy5QoGaw/GOdVuLvtkNpbnLEf2yWytm0LULdXs2g1ZVwcoCmR9PWp27da6SaQhhmo3ln0yG2mb0/DWnreQtjmNwUpkB/re/oBOB+h0EHo96o8fR01W1hXn1WRl4fS7S1t8ztV5e3s365q/+eabQdOmTYts6zUbN270++qrrxqrJ/3f//1fyOLFi4PUbNfBgwfd33nnnUA1r9keDv86mCPnXjJOZKDOUgcFCuqVemScyEBSnyS7vqc9lBaUI29Hw3rv2OF9Od9JDtPaz6v1uL63P068tBCwWACdDlJRcG7tWpxbvx6BD02H3+jRKP/0M5hPn0bVd98BZjOHiC/asmWLn6+vr+XWW2+tBoCmu9yo5dChQx5r164N/P3vf++wusEMVQdy9NxLSmgK3PXuqFfq4aZzQ0poSvsvcjKlBeX49PU9sJgbipTk/VCCCbOvYbCS3bX289r0OHQ6QFEAKS/9CwCKgrPL38PZlf9oCNwmrEPEWoZqUe5Zn2N5ZX79YgMqI+ICVams1Jrjx48bpk+fHlVcXOwOAK+//nphVFRU/apVq0J0Op1ct25d0BtvvFG4efPmXr6+vpYXXnjhxPbt270efvjhqPPnz+uioqJq16xZcyQkJMSSmpo6aOjQoVXff/99r8rKSv0777xzZOzYsVVmsxmzZs3q98MPP/jV1dWJtLS0k08++eTpefPmGQsKCjxjY2Pj77///tPPPffcSXt+VoCh6lAtzb3Y8wcrqU8Slo1ZhowTGUgJTXHJXmpDMYZLVb8sikRxfhlDleyutZ/XpscBNASrEBB6PWR9/aVgBa4IVAgB4eYG79RhjvsglynKPevz7yV7TRaL1O39plC5Y9bV+bYGa21trS42Njbe+ri8vFx/6623lgPA7373u4gnnnjixG233VZ16NAh99tuu21gQUHB/mnTpp2yhigAbN68uXG/1QcffLD/okWLCu+4446qxx57LPzpp58OX7FiRREAmM1mkZOTk7t27Vr/F154IXzs2LH5b7zxRrC/v7/lp59+yj1//rwYNmxY7Lhx4yoWLFhQ/Nprr4V+++23h235fJ3BUHUg79RhEO7ukPX1DvvBSuqT5JJhatVQjEE0BqteJ1iQgRyitZ/Xy4+HPjMXlnPl8E4dhspvvsHZ5e9duohefylYDQb0njgR/hPGa9pLPZZX5mexSB0koFik7lhemZ+toerh4aHk5eU17qP95ptvBmVkZPgAwA8//NDr0KFDXtbnqqqq9GVlZa3ez3PmzBl9ZWWl/o477qgCgLS0tDOTJk2KsT4/adKkMgAYMWJE9ZNPPukOAF9//XW
vvLw87w0bNgQAQGVlpf7AgQOe7u7uDq/Dy1B1IO/kZESuXMH1bJ0QFuOPCU9cwzlVcrjWfl7b+jn2Tk6Ge0QEKjd/Bb8xt8LDZEL5p58BgOZhatUvNqBy7zeFimKROp1eKP1iA2za+q09UkpkZGTk+vr6qhJwnp6eEgAMBgMsFou4+B7itddeK5w4cWKzTdA3btyo6l6xHcFQdTDv5GSn+MFyJSzGQFpp7ee1rZ/jgMmTETB5crNznUlEXGD1HbOuznfUnOoNN9xQ8fLLL/d58cUXTwDA9u3bvUaMGHHez8/PUlFRccVWbkFBQZZevXpZNm3a5Dt27Niq9957L+i6666raus9br311vK///3vIXfeeWelh4eH3Ldvn0d0dHS9v7+/paqqyqHbxXFJDRFRDxMRF1h93d1Xldo7UAFg6dKlRXv27PExmUzxV111VcLixYtDAGDixInn/v3vf/eOjY2N37Rpk2/T16xcufLnp59+up/JZIrft2+f18KFC4+39R6PP/746djY2AuJiYlxAwcOTEhLS4uqr68Xqamp5w0Ggxw0aFD8n//85z72/JxW3PqNiMhFcOs358Ct36hLao9WoOLbItQerWj/ZGpXaUE5MjcdQWlBudZNISI74Zwqtaj2aAVOL8+BNCsQBh2CZyTCI6pX+y9UQWlBebfbJYabnxP1DAxValFtQTmkWQEkIM0KagvKHRKq3TV8Wtr8vDt8LiJqjsO/1CKPGH8Igw4QgDDo4OGgAGgpfLoDbn5O1DOo1lMVQugBZAAollLeqdZ1SRseUb0QPCOxoYca4++woV9r+FgsSrcKH25+TtQzqDn8+0cAuQAc89uX7M4jqpfDwtSqO4cP19sSdX+qDP8KIfoBuAPAcjWupxXenekcwmL8MXRsNAOIyEVYt347ePCguxBi6IIFCxrXhE6bNi3yzTffDJo6dWpkbGxs/FVXXZXg6el5TWxsbHxsbGz8ypUrAwCgvr4eAQEBV8+aNcvY9Nq1tbXiD3/4gzEqKmrwwIEDExITE+PWrVvXa8iQIbGxsbHxffv2TQwICLjaer2DBw+6O/bTN6dWT/UNAE8BaLUklBBiJoCZABAZ2eY2e5rorjfIEBE5UmBgoPndd9/tM3v27FPWkoIAsHr16kKgIXjvvPPOgU1rBQPAxx9/7N+/f//aDRs2BLz11lvFOl1Dn+/xxx8PLy0tdcvLy9vv5eUli4qKDF9++aXfvn378oBLdYZXrVpV6MCP2Sqbe6pCiDsBnJRSZrZ1npRyqZQyRUqZEhISYuvbqq673iDT03BtLVH7juZk+3y35h9hR3Oyfdo/u3MCAwPNN9xwQ+WSJUs6teH4P//5z8A//OEPJ8LDw+u2bNniAwCVlZW6NWvWhCxfvrzQy8tLAkBERIR5xowZTvsLWo3h3+sB3CWEOALgXwBuEUK8r8J1HYp3Z7o+69rais1HcHp5DoOVqAVHc7J9Pnn5z6ZdGz4yfvLyn032CNb58+eXLF68ONRsNnfo/KqqKrF9+3a/yZMnl0+aNOns+++/HwgABw4c8Ojbt29dYGCgonYb7cXmUJVSPiOl7CeljAZwH4AtUsopNrfMwaw3yFx7V4xDh34dMY9r795bTVYWTr+7FDVZWXa5fke1tLaWiJorzMn2s5jNOkgJxWzRFeZkq76TS2xsbF1SUlL1u+++G9iR89etW9d7+PDhlX5+fsqUKVPKNm3aFNDRQHY2LP7QhKPvznTEPK69KyPVZGWhcPpDkHV1EO7uiFy5ot1dOWqysuyyHZZ1ba31s3Z2bW3t0QqHLyEicrTIxKTKzC8+UxSzRacz6JXIxCS7bP02f/780nvvvfeqa6+9tt3r/+tf/wrMzMz0NRqNiUDDJucbN270Gz16dHVJSYl7WVmZLiAgwCV6q6oWf5BSbuUa1Y5zxDyuvXtvNbt2Q9bVAYoCWV+Pml272z4/KwtHH3gQ59auxbm1a1H4wIOq9XCta2t7jYnu9B8
PHDqmniIqMan67qefyx921z3Fdz/9XH5UYpJddqpJTk6+MHDgwPPffPNNm3/dnj17VpeRkeF77NixfcXFxTnFxcU5CxcuLFyzZk2gn5+fct99951OS0uLvHDhggCAo0ePur399tsd6gFrgRWVNOSIeVxbKiO1NTSdebQMS749jKLIWAh3d0Cvh3Bzg3fqsDavWbNrN1Bf3/i4I0Hc7PXtDDV7RPVCr5sjOt3T5NAx9SRRiUnVI3/9YKm9AtXq2WefLTlx4kSbS1zef//9gBEjRlRab0QCgPvuu+/cV1991fv8+fPijTfeKA4ODjabTKaEgQMHJowbN+6q0NBQpx0b5tZvGnNE8fiuDGu2NTSdebQMv1m+A3VmBe4GHf45wgcRhXnwTh3WoaHfow88CNTVAUDDkHH6Pzo0BNyVoeaO0nIDAaKO4tZvzqGtrd84p6oxR8zjdqUyUlsF4HcUnEGdWYEigXqzgu2e4Zj1uxs7dF3v5GREpf+jS3OqLQ01qzYfq1FZRiLqXhiq1KK2avAOjwmCu0GHerMCN4MOw2M6tRwNAOAWHt6hnm1T3qnDINzdIevrOzTU3FlalGUkou6FoUotaqsG79CoAHwwYzh2FJzB8JggDI3q+FxwR4dwj+fnomh/DiISEhFuigPQ0MuNXLmioYfayUAmInIEhiq1qq2h6aFRAZ0KU6uODOEez8/F+hfnwWI2Q28wYNKzC5oFqzOEaUuhT0TEUCWH6sgQbtH+HFjMZkhFgcVsRtH+HKcKrrZCn4h6NoYqAVDvLuSarKw2h2c7MoQbkZAIvcHQGFoRCYktvpdWvUVnD30i0g5DlTpU2akjAdbR+dJzPp4o6tMbET6e8G7hOuGmOEx6dkGb76dlb7Gjoe8I2SezkXEiAymhKUjqk6RZO6hne/rpp8M++uijIJ1OJ3U6Hfz9/c3l5eWGmpoaXVlZmcFoNNYBwFtvvXV03rx5/U6ePOnm4eGhuLm5yaVLlx4ZMWLEeQAwGo2JGRkZuX379jULIYbOmDHjxLJly44BwPz580Orqqr0r7/++nEAePvttwPfeOONMEVRhF6vl0lJSdVLliw5FhwcbNHum2CoEtpePgN0PMBsnS9tKszUN7QAABQSSURBVNwU12ZI2tJbtLWH25HQd4Tsk9lI25yGOksd3PXuWDZmGYOVHO7rr7/2+fLLL3vn5OQc8PLykiUlJYba2loRHR1dv3HjRr/XXnst9Ntvvz1sPX/evHlYtWpVwY033ljzt7/9LWjOnDn9tm/ffujy67q7u8svvvgioKSkpLRv377Nij18+OGHvZYsWRL65ZdfHurfv3+92WzG4sWLg4qLiw1ahyorKlG7lZ1aCrCWWOdL26qu1NFrtcfaWxQ6Xad6i9ZQ/2Hd+1j/4jwcz8/t0vuHm+Jw7d33ajrsm3EiA3WWOihQUK/UI+MEC6pQx1w4XOZTvunnsAuHy2zeoaa4uNgtMDDQbK2I1LdvX3N0dHR9e68DgBtvvLG6tYpLer1eTps27dRf//rX0Mufe+mll/ouXLjwWP/+/esBwGAw4LHHHjtz9dVX19ryWdTAUO3GOro7TXs79HQ0wKzzpSGPPtrq0G9L1+rqLjrxN45G4i23dWroV61QdwYpoSlw17tDL/Rw07khJTSl/RdRj3fhcJnP6X/sN1VuO2Y8/Y/9JluDdcKECRXHjx93j46OHjxlypTIf//7374dfe3nn3/e6/bbbz/X2vNPPvnkyY8//jjwzJkz+qbHDx8+7DVixIgaW9ptLxz+VUnm0bIurdu0l86W3Wtr+UxnhjvbW/Jy+bWCPIydLg94+RBywqhb2jy/KWeaD7VVUp8kLBuzjHOq1Cm1h8/5wSJ1kAAsUld7+Jyf54CALtcA9vf3V3766acDmzZt8vvmm2/8Hnjggavmz59/7NFHHz3T2mumTZsWc/78eZ2iKMjIyGh1uCgwMFCZNGnSmYULF/bx8vJqcZeaXbt2eU2bNq1/dXW
1bv78+cVpaWmabmDOnqoKrLVwX9t8EL9ZvgOZR7XflF7tAvFqDnc2vVZX2mlLb9Ma6tffO6VbLIVJ6pOEGYkzGKjUYR4DeldCLxQIAHqheAzobfPWbwaDAXfeeWflokWLjr/yyiuFn376aZs9i1WrVhUUFhbmTJgw4WxaWlpkW+c+88wzJ9asWRNcXV3dmFcDBgw4v337dm8ASE1NPZ+Xl3fg5ptvrjh//rzmmaZ5A7qDy2vh7iho9Q80h7FldxpH6ko7uzqfauUM86FEWvEcEFAd/GBCvt+ofsXBDybk29JLBYC9e/d65OTkeFgfZ2VlefXr16+uvdd5eHjIRYsWFWdnZ/vs2bPHs7XzQkNDLePGjStbs2ZNsPXYU089VTp37tx+//3vf92sx6xbw2mNw78qUKMWrto6UiBeq3Wel69l7Wwhe2e5+5bIVXkOCKi2NUytKioq9I8++mhkRUWFXq/Xy+jo6Nr09PSjHXmtr6+vfPjhh08sXLgwdN26da2+Zt68eaXp6ekh1seTJ08uP3nypOH2228faLFYRK9evSyxsbHnx48fr/lGyNz6TSXONqfans6u8+xqAF8eoJevZQ19Zi4s58pZy5eoA7j1m3Pg1m8O0NVauFrpzDrPrhZaaKkYRLO1rHV1KH3xL4CiqL4/KhGRFjin2kN1Zl6yqzcGtVgMoslaVuh0gMXS7PmuOp6fi52frOvyulMiIjWwp9pDdWZesqvLUFoqnt+09q++tz9OvLTQ5v1RWeCeejhFURSh0+kcP5fXAymKIgC0uLwHYKj2aO2VAmx6XlduDGqteH7TtaweJpPN+6OywD31cD+dOnUqPiQkpJzBal+KoohTp075A/iptXMYqtQhHQ3gy7VXDEKN/VG7U0EHos4ym80zSktLl5eWlg4Gp/TsTQHwk9lsntHaCTbf/SuEiACwCkDYxTdcKqX8W1uv6Y53/5K2uGk49QQt3f1LzkWNnqoZwGwp5R4hhB+ATCHEV1LKAypcm6hDutqTJiJSk81DBVLKEinlnov/XQkgF4DR1utSz9HVgvquIvNoGZZ8e9gpylcSkX2pOqcqhIgGkAxgZwvPzQQwEwAiI9ss9Ug9SGcL/7saa13oOrMCd4MOH8wY7lLrmYmoc1Sb1BZC+AL4CMBjUsoruhxSyqVSyhQpZUpISMiVFyCnZq/epNqF/52NM9aFJiL7UaWnKoRwQ0OgfiCl/FiNa5LzsGdv0lpQ33ptZy3831XOWBeaiOzH5lAVQggA7wHIlVK+bnuTyNm01JtULVQ7UPjflQ2NCsAHM4a7VF1oIuo6NXqq1wOYCiBHCJF98dj/Sim/UOHa3VJpQTmK88tgNAW0ujG4M7F3b9Ijqle3C9OmXK0uNBF1nc2hKqX8HoBT7GPnCkoLyvHZoixYzAr0Bh3GP57s9MHa3XuTRERqYUUlByvOL4PFrEBKwGJRUJxf5vShCnT/3iSR2rJPZiPjRAZSQlOQ1CdJ6+aQg7hkqLra8GlTRlMA9AYdLBYFer0ORhOHBYm6m+yT2UjbnIY6Sx3c9e5YNmYZg7WHcLlQdcXh06bCYvwx/vFkl/2joLtztc3myTllnMhAnaUOChTUK/XIOJHBUO0hXC5UXXX4tKmwGH+Xa3NPwEINpJaU0BS4691Rr9TDTeeGlFCW6+0pXC5UOXxK9tJSoQaGKnVFUp8kLBuzjHOqPZDLhSqHT8leWKiB1JTUJ4lh2gO5XKgCHD4l+2ChBiKylUuGKpG9sFADEdmCu8QTEdlZTVYWTr+7FDVZWVo3heyMPVVyCa68Npl6tpqsLBROfwiyrg7C3R2RK1fAOzlZ62aRnbhcT7W0oByZm46gtJttEUats65N3vlZAT5blMX/7cml1OzaDVlXBygK5IULKJn3J/ZYuzGXClX+cu2ZWlqbTOQqvFOHAbpLv2rrCgpwdOo0Bms35VKhmrejBOb6hl+uZjN/ufYU1rXJQgeuTSaX452cDENoaPODZjN
qdu3WpkFkVy4zp1paUI7c749fOiABTx837RrUA2lVwo9rk8nV6Tw8rjjmnTpMg5aQvblMqBbnl0FRmh+7UF2vTWN6IK1L+HFtcufx5i7n4XvLzThbUND4OHDGb3mzUjflMqFqNAVArxewWCQAQG8QHAZ0IJbwcy3W+w/MZgU6IXDj/SYkjDRq3aweqSYrC2Xvf9DwQKdD4EPTETpnjraNIrtxmVANi/HHhNnXIG9HCQAgdnhf/vXtQCzh51qK88tgNiuABBQp8Z9/5iPI6MufGQ003v0LAEJA78d9ibszlwlVgEOAWmIJP9diNAVAJwQU2TCyo0jpkjs6dQfeqcMg3N0h6+sh3Nw4l9rNuVSokrZYws91hMX448b7TfjPP/OhSAmDgXdNa8U7ORmRK1egZtdueKcO41xqN8dQJeqmEkYaEWT05c1KTsA7OZlh2kMwVIm6MU6ZEDmWSxV/ICIicmaqhKoQYqwQ4qAQ4rAQYq4a1yQiInI1NoeqEEIPYAmA2wHEA7hfCBFv63WJiIhcjRo91VQAh6WUBVLKOgD/AjBehesSERG5FDVC1QigqMnjYxePNSOEmCmEyBBCZJw6dUqFtyUiInIuaoSqaOGYvOKAlEullClSypSQkBAV3paIiMi5qBGqxwBENHncD8DxVs4lIiLqttQI1d0ABgoh+gsh3AHcB2CDCtclIiJyKTYXf5BSmoUQjwD4EoAewAop5X6bW0ZERORiVKmoJKX8AsAXalyLiIjIVbGiEhERkUoYqkRERCphqBIREamEoUpERKQShioREZFKuJ+qE9v+8WEUZJ9ETFIfjLhngNbNISKidjBUndT2jw8ja3MhADT+y2AlInJuHP51UgXZJ9t8TEREzoeh6qRikvq0+ZiIiJwPh3+dlHWol3OqRESug6HqxEbcM4BhSkTkQjj8S0REpBKGKhERkUoYqkRERCphqBIREamEoUpERKQShioREZFKGKpEREQqYagSERGphKFKRESkEoYqERGRShiqREREKrEpVIUQrwgh8oQQ+4QQnwgheqvVMCIiIldja0/1KwCDpZRDAOQDeMb2JhHZV2lBObauycPWNXkoLSjXujlE1I3YtEuNlHJzk4c7APzKtuYQ2VdpQTk+fX0PLGYJAMj7oQQTZl+DsBh/jVvm+mqyslCzazf0vf1hOVcO79Rh8E5O1rpZRA6l5tZvDwFYq+L1iFRXnF/WGKgAYFEkivPLGKpdYA1R79RhAIDC6Q9B1tUBigIIAeHhgciVKxis1KO0G6pCiK8BhLXw1Dwp5WcXz5kHwAzggzauMxPATACIjIzsUmOJbGU0BUBvEI3BqtcJGE0BGrfK9dRkZTWGqHB3h//48ZcCFQCkhKyvbwhdhir1IO2GqpTyF209L4R4AMCdAEZLKWVr50kplwJYCgApKSmtnkdkT2Ex/pjwxDXI21ECAIgd3pe91C6o2bW7MURlfT0AQLi7Nw9WRYGlskLDVhI5nk3Dv0KIsQCeBjBKSlmjTpOI7Cssxp9BaiPv1GENIVpfD+HmBv8J4+E/YTzKP/0M59avbwhWKXF2+Xtwj4hAwOTJWjeZyCFsvft3MQA/AF8JIbKFEO+o0CYicnLeycmIXLkCIY8+2jhv2jjMa+2pXlS5+SsNWkikDVvv/h2gVkOIyLU0C1I0zLOWf/LJFed5xMU6sllEmmJFJSJSRc2u3ZBmc/ODQkDv10ubBhFpgKFKRKrQ9/ZvPvR7cVmNdclNUzVZWTj97lLUZGU5sIVE9qfmOlUi6sEs58oBIQApASHgM2IEgh+ZdcWSmsuX43AtK3Un7KkSkSq8U4dBeHgAej2Eh0eLgQpcuRynZtduDVpLZB/sqRKRKqx3BFurLLXW+7x8OU5Lw8NErkq0Ua/BblJSUmRGRobD35eInEPTEocc+u04IUSmlDJF63ZQ69hTJXIhx/NzUbQ/BxEJiQg3xWndnC67fDkOUXfBUCVyAcfzc7F/2xbs3/o1FMU
CvcGASc8ucOlgJeqOGKpETu54fi7WvzgP5vr6hjtrAVjMZhTtz2GoEjkZ3v1L5OSK9ufAYjY3BiqEgN5gQERCorYNI6IrsKdK5OQiEhKhNxhgMZuh0+mQcNOtSBh1C3upRE6IoUrkALbcYBRuisOkZxd0ixuUiLo7hiqRnVnnRC1mc5dvMAo3xTFMiVwA51SJ7Mw6JyoVpfEGIyLqnhiqRHZmnRMVOh1vMCLq5jj8S2RnnBMl6jkYqkQOwDlRop6BoUrkwkoLylGcXwajKQBhMf7IPFqGT3N/gMH7Z4yPHYmkPklaN5GoR2GoErmo0oJyfLYoCxazAr1Bh0H3DcDDWzdCH74UEGZsOJqO925bzmAlciDeqETkoorzy2AxK5ASsFgU7M8+AcX9MCDMEEKiXqlHxgnuBkXkSAxVIhdlNAVAb9BB6AC9XoeEpFDo6gYA0gApBdx0bkgJbdglLPtkNpbnLEf2yWyNW03UvXH4l8hFhcX4Y/zjyc3mVN/vNxmf5vZrNqeafTIbaZvTUGepg0FnwIQBEzDuqnEcFiayA25STtTNLc9Zjrf2vAUFCgBAQMBD74FlY5YxWF0MNyl3fqoM/woh5gghpBAiWI3rEZF6UkJT4K53h4AAAEhwvpXIXmwOVSFEBIBbARTa3hwiUltSnyQsG7MMvzL9Cu46d+iFvtl8KxGpR4051UUAngLwmQrXIiI7SOqThKQ+SbjrqruQcSIDKaEpHPolsgObQlUIcReAYinlXiFEe+fOBDATACIjI215WyLqImu4EpF9tBuqQoivAYS18NQ8AP8LYExH3khKuRTAUqDhRqVOtJGIiMgltBuqUspftHRcCJEIoD8Aay+1H4A9QohUKWWpqq0kIiJyAV0e/pVS5gDoY30shDgCIEVKeVqFdhEREbkcVlQiIiJSiWoVlaSU0Wpdi4iIyBWxp0pERKQSTcoUCiFOATjq8De+JBgA534b8Lu4hN9Fc/w+LnGW7yJKShmidSOodZqEqtaEEBmsn9mA38Ul/C6a4/dxCb8L6igO/xIREamEoUpERKSSnhqqS7VugBPhd3EJv4vm+H1cwu+COqRHzqkSERHZQ0/tqRIREamOoUpERKSSHh+qQog5QggphAjWui1aEUK8IoTIE0LsE0J8IoTorXWbHE0IMVYIcVAIcVgIMVfr9mhFCBEhhPhWCJErhNgvhPij1m3SmhBCL4TIEkJs1Lot5Px6dKgKISIA3AqgUOu2aOwrAIOllEMA5AN4RuP2OJQQQg9gCYDbAcQDuF8IEa9tqzRjBjBbShkHYDiAWT34u7D6I4BcrRtBrqFHhyqARQCeAtCj79aSUm6WUpovPtyBhm38epJUAIellAVSyjoA/wIwXuM2aUJKWSKl3HPxvyvRECZGbVulHSFEPwB3AFiudVvINfTYUBVC3AWgWEq5V+u2OJmHAPw/rRvhYEYARU0eH0MPDhIrIUQ0gGQAO7VtiabeQMMf3orWDSHXoNouNc5ICPE1gLAWnpoH4H8BjHFsi7TT1nchpfzs4jnz0DD894Ej2+YERAvHevTohRDCF8BHAB6TUlZo3R4tCCHuBHBSSpkphLhJ6/aQa+jWoSql/EVLx4UQiQD6A9grhAAahjv3CCFSpZSlDmyiw7T2XVgJIR4AcCeA0bLnLV4+BiCiyeN+AI5r1BbNCSHc0BCoH0gpP9a6PRq6HsBdQohfAvAE0EsI8b6UcorG7SInxuIPAIQQRwCkSCmdYRcKhxNCjAXwOoBRUspTWrfH0YQQBjTcoDUaQDGA3QB+LaXcr2nDNCAa/spMB3BWSvmY1u1xFhd7qnOklHdq3RZybj12TpWaWQzAD8BXQohsIcQ7WjfIkS7epPUIgC/RcGPOup4YqBddD2AqgFsu/n8h+2JPjYg6gD1VIiIilbCnSkREpBKGKhERkUoYqkRERCphqBIREamEoUpERKQShioREZFKGKpEREQq+f/oMbvoMgYzqQAAAABJRU5
ErkJggg==\n", 114 | "text/plain": [ 115 | "
" 116 | ] 117 | }, 118 | "metadata": { 119 | "needs_background": "light" 120 | }, 121 | "output_type": "display_data" 122 | } 123 | ], 124 | "source": [ 125 | "from matplotlib import cm\n", 126 | "import matplotlib.pyplot as plt\n", 127 | "from sklearn.utils import check_random_state\n", 128 | "from sklearn.manifold import TSNE\n", 129 | "\n", 130 | "dataset_id = {}\n", 131 | "for rel_name, i in rel2id.items():\n", 132 | " rel_key = rel_name.split('::')[0]\n", 133 | " if dataset_id.get(rel_key, None) is None:\n", 134 | " dataset_id[rel_key] = []\n", 135 | " dataset_id[rel_key].append(i)\n", 136 | "\n", 137 | "X_embedded = TSNE(n_components=2).fit_transform(rel_emb).T\n", 138 | "p = cm.rainbow(int(255/2 * 1))\n", 139 | "fig = plt.figure()\n", 140 | "ax = fig.add_subplot(111)\n", 141 | "for key, val in dataset_id.items():\n", 142 | " val = np.asarray(val, dtype=np.long)\n", 143 | "\n", 144 | " ax.plot(X_embedded[0][val], X_embedded[1][val], '.', label=key)\n", 145 | "\n", 146 | "lgd = ax.legend(bbox_to_anchor=(1.0, 1.0))\n", 147 | "plt.savefig('relation.eps', bbox_extra_artists=(lgd,), bbox_inches='tight', format='eps')" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "## Pair-wise Relation Embedding Cosine Similarity\n", 155 | "We calculate the pair-wise embedding similarity using cosine distance and output the top10 most similar pairs." 
156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 16, 161 | "metadata": {}, 162 | "outputs": [ 163 | { 164 | "data": { 165 | "text/plain": [ 166 | "[('GNBR::E::Compound:Gene', 'GNBR::K::Compound:Gene', 0.98599356),\n", 167 | " ('GNBR::E::Compound:Gene', 'GNBR::E+::Compound:Gene', 0.98297095),\n", 168 | " ('GNBR::N::Compound:Gene', 'GNBR::E-::Compound:Gene', 0.96987224),\n", 169 | " ('GNBR::E::Compound:Gene', 'GNBR::E-::Compound:Gene', 0.96532124),\n", 170 | " ('GNBR::K::Compound:Gene', 'GNBR::E+::Compound:Gene', 0.9564862),\n", 171 | " ('GNBR::E+::Compound:Gene', 'GNBR::E-::Compound:Gene', 0.95019233),\n", 172 | " ('GNBR::L::Gene:Disease', 'GNBR::G::Gene:Disease', 0.9419448),\n", 173 | " ('GNBR::K::Compound:Gene', 'GNBR::E-::Compound:Gene', 0.94074607),\n", 174 | " ('GNBR::J::Gene:Disease', 'GNBR::Md::Gene:Disease', 0.9319676),\n", 175 | " ('GNBR::J::Gene:Disease', 'GNBR::Te::Gene:Disease', 0.93183714)]" 176 | ] 177 | }, 178 | "execution_count": 16, 179 | "metadata": {}, 180 | "output_type": "execute_result" 181 | } 182 | ], 183 | "source": [ 184 | "from sklearn.metrics.pairwise import cosine_similarity\n", 185 | "\n", 186 | "similarity = cosine_similarity(rel_emb)\n", 187 | "idx = np.flip(np.argsort(similarity), axis=1)\n", 188 | "\n", 189 | "max_pairs = []\n", 190 | "for i in range(idx.shape[0]):\n", 191 | "    j = 1\n", 192 | "    while (similarity[i][idx[i][j]] > 0.9):\n", 193 | "        max_pairs.append((id2rel[idx[i][0]], id2rel[idx[i][j]], similarity[i][idx[i][j]]))\n", 194 | "        j += 1\n", 195 | "\n", 196 | "def sort_score(pair):\n", 197 | "    return pair[2]\n", 198 | "\n", 199 | "max_pairs.sort(reverse=True, key=sort_score)\n", 200 | "sim_pairs = []\n", 201 | "for i, pair in enumerate(max_pairs):\n", 202 | "    if i % 2 == 0:\n", 203 | "        sim_pairs.append(pair)\n", 204 | "\n", 205 | "sim_pairs[:10]" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "Then we draw a histogram of how the pair-wise 
similarity scores are distributed." 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 17, 218 | "metadata": {}, 219 | "outputs": [ 220 | { 221 | "name": "stdout", 222 | "output_type": "stream", 223 | "text": [ 224 | "(11449,)\n" 225 | ] 226 | }, 227 | { 228 | "data": { 229 | "image/png": 
WCXpL4ArgTHAD8p8uKSN0v7fj4gfAUTEcxGxKiLeAK5g9a2mHmD73O7bAYsblJuZWYuUSRZvRMRK4Bjg3yLifwMji3aSJLLk8lhEXJwrz+97DPBIWp4OTJC0iaQxwFjgAeBBYKykMZI2JmsEn14ibjMza5IyDdyvSzoBmAh8KJVtVGK/fYF/AOZLmpvKvgicIGkc2a2kRcCnACJigaQbyBquVwKnRsQqAEmnAXcAQ4CpEbGgxPHNzKxJyiSLk4FPA+dFxNPpW/9/FO0UEb+gfnvDbQ32OQ84r075bY32MzOzajVMFpKGAF+MiI/VyiLiaeCCqgMzM7PO0bDNIt0GGpHaCszMbJAqcxtqEfBLSdOBV2qF+UZrMzMb2Moki8XptQEwtNpwzMysExUmi4j4WisCMTOzztVnspD0bxFxhqQfU/+J6aMqjczMzDpGoyuLa9P7NxrUMTOzQaDPZBERc9L7Pa0Lx8zMOlFhm4WkscD5wM7AprXyiNixwrjMzKyDlBkb6ntkI8WuBA4ErmH1LSozMxsEyiSLzSJiJqCI+G1EnA0cVG1YZmbWSco8Z/GapA2AhWlAv2eBrasNy8zMOkmZK4szgLcAnwX2IBtJdmKVQZmZWWcp81DegwDp6uKzEbG88qjMzKyjlJlWtUvSfGAe2dwUD0vao/rQzMysU5Rps5gK/FNE/BxA0n5kPaR2rTIwMzPrHGXaLJbXEgX8eVIj34oyMxtEylxZPCDpu8B1ZGNEfRS4W9LuABHx6wrjMzOzDlAmWYxL72f1Kn8fWfLwMxdmZgNcmd5QB7YiEDMz61xl2izMzGyQc7IwM7NCThZmZlaoTAM3kt4HjM7Xj4hrKorJzMw6TJn5LK4FdgLmAqtScZANVW5mZoNAmSuLLmDniFhjHu5GJG1PllDeAbwBTImIb0raErie7EplEXB8RLwgScA3gSOBV4GP157hkDQR+HL66H+JiKvXJhYzM1s3ZdosHiH7g7+2VgKfi4h3A/sAp0raGZgMzIyIscDMtA5wBDA2vSaRTbhESi5nAXsDewFnSRrej3jMzKyfylxZbAU8KukBYEWtMCKOarRTRCwBlqTl5ZIeA0YB44EDUrWrgbuBL6Tya9IVzH2ShkkamerOiIjnASTNAA4ne6LczMxaoEyyOHtdDyJpNLAbcD+wTUokRMQSSbWJlEYBz+R260llfZX3PsYksisSdthhh3UN2czMcgpvQ0XEPcDjwND0eiyVlSJpc+Am4IyIeLlR1XqHb1DeO84pEdEVEV0jRowoG56ZmZVQZj6L44EHgOOA44H7JR1b5sMlbUSWKL4fET9Kxc+l20uk96WpvAfYPrf7dsDiBuVmZtYiZRq4vwTsGRETI+IkskbmrxTtlHo3XUl2JXJxbtN0Vk/LOhG4NVd+kjL7AC+l21V3AIdKGp4atg9NZWZm1iJl2iw2iIilufU/Ui7J7Es2X/d8SXNT2ReBC4AbJJ0C/I7sigXgNrJus91kXWdPBoiI5yWdCzyY6p1Ta+w2M7PWKJMsfibpDlb3Pvoo2R/2htIkSfXaGwAOrlM/gFP7+KypZDP2mZlZG5QZovz/SPoI2ZWCyB6uu7nyyMzMrGOUGhsqIm4ia6g2M7NBqM9kIekXEbGfpOW8uauqyO4abVF5dGZm1hH6TBYRsV96H9q6cMzMrBOVec7i2jJlZmY2cJXpArtLfkXShsAe1YRjZmadqM9kIenM1F6xq6SX02s58ByrH6QzM7NBoM9kERHnp/aKr0fEFuk1NCLeHhFntjBGMzNrszLPWZyZhtkYC2yaK7+3ysDMzKxzlJlW9RPA6WQD+M0lm8joV8BB1YZmZmadokwD9+nAnsBvI+JAsnkpllUalZmZdZQyyeK1iHgNQNImEfE48JfVhmVmZp2kzHAfPZKGAbcAMyS9gOeTMDMbVMo0cB+TFs+WNAt4G/CzSqMyM7OO0mhsqC3rFM9P75sDnlPCzGyQaHRlMYc158CurQew
AffVu1XV6/PfyK2/QYMvdGlAws8DB0fErmQj8G6aq/JKX/v28u9ks9P5qsL65GRhA8ldwCaSPlkrkLSnpP2Be4GPShoiaQRZgnhA2QQ2SyPiCuBKsukqC0XEy8DTko5Lx5Gk9/auJ2mniJgfERcCs4F6yaK/tiBLCC9J2gY4ouR+y4GhtZWIuJ9sSOsTafGsb7b+cLKwASOyrn3HAIekrrMLyEYHXUw2nPk8shFB7wL+b0T8nmx8/7mSHiJri/jmWhzy74FTJD1MdlVTb/rNM1LD+cNk7RW316nTLxHxMPBQOvZUslteZUwBbpc0K1d2A/DLyKbbNFuDu86aGZJ+AlwSETPbHYt1Jl9ZmA1ikoZJ+g3w304U1oivLMzMrJCvLMzMrJCThZmZFXKyMDOzQk4WZmZWyMnCzMwK/X94hwsebRWmdwAAAABJRU5ErkJggg==\n", 230 | "text/plain": [ 231 | "
" 232 | ] 233 | }, 234 | "metadata": { 235 | "needs_background": "light" 236 | }, 237 | "output_type": "display_data" 238 | } 239 | ], 240 | "source": [ 241 | "similarity=similarity.flatten()\n", 242 | "print(similarity.shape)\n", 243 | "\n", 244 | "# cleanup self-compare and dup-compare\n", 245 | "s = similarity < 0.99\n", 246 | "s = np.unique(similarity[s])\n", 247 | "plt.xlabel('Cosine similarity')\n", 248 | "plt.ylabel('Number of relation pairs')\n", 249 | "plt.hist(s)\n", 250 | "plt.savefig('relation-sim.eps', format='eps')" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": {}, 256 | "source": [ 257 | "## Pair-wise Relation Embedding Frobenius Similarity\n", 258 | "We calculate the pair-wise embedding similarity using L2 distance and output the top10 most similar pairs." 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": 7, 264 | "metadata": {}, 265 | "outputs": [ 266 | { 267 | "data": { 268 | "text/plain": [ 269 | "[('GNBR::E::Compound:Gene', 'GNBR::K::Compound:Gene', 1.6984596),\n", 270 | " ('GNBR::E::Compound:Gene', 'GNBR::E+::Compound:Gene', 1.8601348),\n", 271 | " ('GNBR::N::Compound:Gene', 'GNBR::E-::Compound:Gene', 2.3698092),\n", 272 | " ('GNBR::E::Compound:Gene', 'GNBR::E-::Compound:Gene', 2.606229),\n", 273 | " ('GNBR::K::Compound:Gene', 'GNBR::E+::Compound:Gene', 2.9946468),\n", 274 | " ('GNBR::E+::Compound:Gene', 'GNBR::E-::Compound:Gene', 3.1560013),\n", 275 | " ('GNBR::L::Gene:Disease', 'GNBR::G::Gene:Disease', 3.4119256),\n", 276 | " ('GNBR::K::Compound:Gene', 'GNBR::E-::Compound:Gene', 3.454293),\n", 277 | " ('GNBR::J::Gene:Disease', 'GNBR::Md::Gene:Disease', 3.6071572),\n", 278 | " ('GNBR::J::Gene:Disease', 'GNBR::Te::Gene:Disease', 3.624401)]" 279 | ] 280 | }, 281 | "execution_count": 7, 282 | "metadata": {}, 283 | "output_type": "execute_result" 284 | } 285 | ], 286 | "source": [ 287 | "from sklearn.metrics.pairwise import euclidean_distances\n", 288 | "\n", 289 | "similarity = 
euclidean_distances(rel_emb)\n", 290 | "idx = np.argsort(similarity)\n", 291 | "\n", 292 | "min_pairs = []\n", 293 | "for i in range(idx.shape[0]):\n", 294 | "    j = 1\n", 295 | "    while j < idx.shape[1] and similarity[i][idx[i][j]] < 5:\n", 296 | "        min_pairs.append((id2rel[idx[i][0]], id2rel[idx[i][j]], similarity[i][idx[i][j]]))\n", 297 | "        j += 1\n", 298 | "\n", 299 | "def sort_score(pair):\n", 300 | "    return pair[2]\n", 301 | "\n", 302 | "min_pairs.sort(key=sort_score)\n", 303 | "sim_pairs = []\n", 304 | "for i, pair in enumerate(min_pairs):\n", 305 | "    if i % 2 == 0:  # every pair occurs twice, as (a, b) and (b, a); keep one of each\n", 306 | "        sim_pairs.append(pair)\n", 307 | "\n", 308 | "sim_pairs[:10]" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | "Then we draw a histogram showing how the pair-wise distance scores are distributed." 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": 8, 321 | "metadata": {}, 322 | "outputs": [ 323 | { 324 | "name": "stdout", 325 | "output_type": "stream", 326 | "text": [ 327 | "(11449,)\n" 328 | ] 329 | }, 330 | { 331 | "data": { 332 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYsAAAEGCAYAAACUzrmNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAeb0lEQVR4nO3dfZwcVZ3v8c+XAIoCEiSwMQkG2Ogu+BDDiCiKARR58BJgfQBXiMg16oULvNa9a0BXUJYLroK7rIoblqzAYhDFQNQoRK7CZS8hTCDkgYfNBKIMickovCCIogm/+0edhmLS3VUzmerumfm+X69+dfWpU1W/VDr9S9U5dY4iAjMzs2a2a3cAZmbW+ZwszMyskJOFmZkVcrIwM7NCThZmZlZo+3YHUJU99tgjJk+e3O4wzMyGjaVLl/4mIsbVWzdik8XkyZPp7u5udxhmZsOGpF82WufbUGZmVsjJwszMCjlZmJlZIScLMzMr5GRhZmaFnCzMzKyQk4WZmRVysjAzs0JOFmZmVmjEPsFt1qkmz/5xW4679pJj23JcGxl8ZWFmZoWcLMzMrJCThZmZFXKyMDOzQk4WZmZWyMnCzMwKOVmYmVkhJwszMytUWbKQNEnSzyU9KGmVpLNT+e6SFkland7HpnJJulxSj6Tlkqbl9jUz1V8taWZVMZuZWX1VXllsBj4TEX8JHAycIWl/YDZwW0RMAW5LnwGOBqak1yzgCsiSC3A+8DbgIOD8WoIxM7PWqCxZRMT6iLg3LW8CHgQmADOAq1O1q4Hj0/IM4JrILAZ2kzQeeB+wKCKeiIgngUXAUVXFbWZmW2tJm4WkycBbgLuBvSJiPWQJBdgzVZsAPJbbrDeVNSqvd5xZkroldff19Q3lH8HMbFSrPFlI2hm4ETgnIp5uVrVOWTQp37owYk5EdEVE17hx4wYerJmZ1VVpspC0A1miuC4ifpCKN6TbS6T3jam8F5iU23wisK5JuZmZtUiVvaEEXAU8GBGX5VYtAGo9mmYCN+fKT029og4Gnkq3qW4BjpQ0NjVsH5nKzMysRaqcz+IQ4BRghaRlqew84BLgBkmnA78CPpjWLQSOAXqAZ4HTACLiCUkXAvekel+KiCcqjNvMzPqpLFlExJ3Ub28AOKJO/QDOaLCvucDcoYvOzMwGwk9wm5lZIScLMzMr5GRhZmaFnCzMzKyQk4WZmRVysjAzs0JOFmZmVsjJwszMCjlZmJlZIScLMzMr5GRhZmaFnCzMzKyQk4WZmRVysjAzs0JOFmZmVqjKmfLmStooaWWu7LuSlqXX2tqkSJImS/p9bt23ctscKGmFpB5Jl6cZ+MzMrIWqnCnv28DXgWtqBRHx4dqypEuBp3L110TE1Dr7uQKYBSwmm03vKOAnFcRrZmYNVHZlERF3AHWnP01XBx8C5jXbh6TxwK4RcVeaSe8a4PihjtXMzJprV5vFu4ANEbE6V7aPpPsk3S7pXalsAtCbq9ObyuqSNEtSt6Tuvr6+oY/azGyUaleyOJmXXlWsB/aOiLcAfwN8R9Ku1J/DOxrtNCLmRERXRHSNGzduSAM2MxvNqmyzqEvS9sCJwIG1soh4DnguLS+VtAZ4HdmVxMTc5hOBda2L1szMoMSVhaT9JL0sLU+XdJak3bbhmO8BHoqIF24vSRonaUxa3heYAjwSEeuBTZIOTu0cpwI3b8OxzcxsEMrchroR2CLpz4GrgH2A7xRtJGkecBfwekm9kk5Pq05i64btQ4Hlku4Hvg98KiJqjeOfBv4N6AHW4J5QZmYtV+Y21PMRsVnSCcA/RcS/SLqvaKOIOLlB+cfqlN1IlpTq1e8G3lAiTjMzq0iZK4s/SToZmAn8KJXtUF1IZmbWacoki9OAtwMXRcSjkvYB/qPasMzMrJM0vQ2VGp3Pi4iP1soi4lHgkqoDMzOzztH0yiIitgDjJO3YonjMzKwDlWngXgv8p6QFwO9qhRFxWVVBmZl
+CNTaa5YC703LhwOnwguJ6ilJpwAHAvek87ET9Qeps1HAVxY20vwut1zvx7amf2NdpPqL0v/Qp0bE/hFxeq7Oc+l9C/X/o7WZ9G8qDfq2I7wwuc6hwOPAtZJOLfpDpNs9bySb4ObFIEvsK7XJvAd4e7pSuA94eVr9p3ixobLRn+OFXQFX587H6yPigqLYbWRysrCR7A7gw5LGSBpH9iO7JK07KI1ouh3ZrZY7yQYrPETSnwNIeoWk1w3geGvJ/icO2exlO6T9vBbYGBFXko0a2nS+83TVcDHwWEQs77eu0b7+lLYDeBXwZEQ8m9pWDi4R+23Ap9MxxkjaNZV9QNKeqXz3dHwbhZwsbCSbTzZC5/3A/wH+LiJ+ndbdBVxCNqT0o8D8iOgDPgbMk7ScLHn0bzNo5krg3ZKWAG/jxauc6cAySfcBfwX8c4Ptr0vHXUnWtlBvGtFG+5oDLE8N3D8Ftk/7ujD9OYqcDRwmaQXZ7akDIuIBsjacW9O+FpHNA22jkLvOmplZIV9ZmJlZIScLMzMr5GRhZmaFnCzMzKyQk4WZmRVysjAzs0JOFmZmVuj/A0Kq1zSCggDyAAAAAElFTkSuQmCC\n", 333 | "text/plain": [ 334 | "
" 335 | ] 336 | }, 337 | "metadata": { 338 | "needs_background": "light" 339 | }, 340 | "output_type": "display_data" 341 | } 342 | ], 343 | "source": [ 344 | "similarity=similarity.flatten()\n", 345 | "print(similarity.shape)\n", 346 | "\n", 347 | "# cleanup self-compare and dup-compare\n", 348 | "s = similarity > 0\n", 349 | "s = np.unique(similarity[s])\n", 350 | "plt.xlabel('Frobenius Distance')\n", 351 | "plt.ylabel('Number of relation pairs')\n", 352 | "plt.hist(s)\n", 353 | "plt.show()" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": null, 359 | "metadata": {}, 360 | "outputs": [], 361 | "source": [] 362 | } 363 | ], 364 | "metadata": { 365 | "kernelspec": { 366 | "display_name": "Python 3", 367 | "language": "python", 368 | "name": "python3" 369 | }, 370 | "language_info": { 371 | "codemirror_mode": { 372 | "name": "ipython", 373 | "version": 3 374 | }, 375 | "file_extension": ".py", 376 | "mimetype": "text/x-python", 377 | "name": "python", 378 | "nbconvert_exporter": "python", 379 | "pygments_lexer": "ipython3", 380 | "version": "3.7.6" 381 | } 382 | }, 383 | "nbformat": 4, 384 | "nbformat_minor": 2 385 | } 386 | -------------------------------------------------------------------------------- /embedding_analysis/Train_embeddings.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Training DRKG Using TransE_L2\n", 8 | "This notebook shows how to train DRKG embeddings using TransE_L2\n", 9 | "\n", 10 | "Before training the model, you need to download the original DRKG source file into your local storage, e.g., ./data/drkg.tsv" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "## Install DGL-KE\n", 18 | "Before training the model, we need to install dgl and dgl-ke packages as well as other dependencies. 
" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "!pip3 install torch\n", 28 | "!pip3 install dgl==0.4.3post2 \n", 29 | "!pip3 install dglke" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "## Prepare train/valid/test set\n", 37 | "Before training, we need to split the original drkg into train/valid/test set with a 9:0.5:0.5 manner." 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": null, 43 | "metadata": {}, 44 | "outputs": [], 45 | "source": [ 46 | "import pandas as pd\n", 47 | "import numpy as np\n", 48 | "import sys\n", 49 | "sys.path.insert(1, '../utils')\n", 50 | "from utils import download_and_extract\n", 51 | "download_and_extract()\n", 52 | "drkg_file = '../data/drkg/drkg.tsv'\n", 53 | "\n", 54 | "df = pd.read_csv(drkg_file, sep=\"\\t\", header=None)\n", 55 | "triples = df.values.tolist()" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "We get 5,869,293 triples, now we will split them into three files" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "num_triples = len(triples)\n", 72 | "num_triples" 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": null, 78 | "metadata": {}, 79 | "outputs": [], 80 | "source": [ 81 | "# Please make sure the output directory exist.\n", 82 | "seed = np.arange(num_triples)\n", 83 | "np.random.shuffle(seed)\n", 84 | "\n", 85 | "train_cnt = int(num_triples * 0.9)\n", 86 | "valid_cnt = int(num_triples * 0.05)\n", 87 | "train_set = seed[:train_cnt]\n", 88 | "train_set = train_set.tolist()\n", 89 | "valid_set = seed[train_cnt:train_cnt+valid_cnt].tolist()\n", 90 | "test_set = seed[train_cnt+valid_cnt:].tolist()\n", 91 | "\n", 92 | "with open(\"train/drkg_train.tsv\", 'w+') as f:\n", 93 | " for idx in 
train_set:\n", 94 | "        f.write(\"{}\\t{}\\t{}\\n\".format(triples[idx][0], triples[idx][1], triples[idx][2]))\n", 95 | " \n", 96 | "with open(\"train/drkg_valid.tsv\", 'w+') as f:\n", 97 | "    for idx in valid_set:\n", 98 | "        f.write(\"{}\\t{}\\t{}\\n\".format(triples[idx][0], triples[idx][1], triples[idx][2]))\n", 99 | "\n", 100 | "with open(\"train/drkg_test.tsv\", 'w+') as f:\n", 101 | "    for idx in test_set:\n", 102 | "        f.write(\"{}\\t{}\\t{}\\n\".format(triples[idx][0], triples[idx][1], triples[idx][2]))" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "## Training the TransE_l2 model\n", 110 | "We can train the TransE_l2 model simply by using the DGL-KE command line. For more information about using DGL-KE, please refer to https://github.com/awslabs/dgl-ke.\n", 111 | "\n", 112 | "Here we train the model using 8 GPUs on an AWS p3.16xlarge instance." 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": {}, 119 | "outputs": [], 120 | "source": [ 121 | "!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./train --data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' --model_name TransE_l2 --batch_size 2048 \\\n", 122 | "--neg_sample_size 256 --hidden_dim 400 --gamma 12.0 --lr 0.1 --max_step 100000 --log_interval 1000 --batch_size_eval 16 -adv --regularization_coef 1.00E-07 --test --num_thread 1 --gpu 0 1 2 3 4 5 6 7 --num_proc 8 --neg_sample_size_eval 10000 --async_update" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "## Get Entity and Relation Embeddings\n", 130 | "The resulting model, i.e., the entity and relation embeddings, can be found under ./ckpts. 
(Please refer to the first line of the training log for the specific location.)\n", 131 | "\n", 132 | "The overall process will generate four important files:\n", 133 | "\n", 134 | " - Entity embedding: ./ckpts/__/xxx\\_entity.npy\n", 135 | " - Relation embedding: ./ckpts/__/xxx\\_relation.npy\n", 136 | " - The entity id mapping, formatted as pairs: /entities.tsv\n", 137 | " - The relation id mapping, formatted as pairs: /relations.tsv" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "!ls ./ckpts/TransE_l2_DRKG_0/\n", 147 | "!ls ./train/" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "## A Glance at the Entity and Relation Embeddings" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": null, 160 | "metadata": {}, 161 | "outputs": [], 162 | "source": [ 163 | "node_emb = np.load('./ckpts/TransE_l2_DRKG_0/DRKG_TransE_l2_entity.npy')\n", 164 | "relation_emb = np.load('./ckpts/TransE_l2_DRKG_0/DRKG_TransE_l2_relation.npy')\n", 165 | "\n", 166 | "print(node_emb.shape)\n", 167 | "print(relation_emb.shape)" 168 | ] 169 | } 170 | ], 171 | "metadata": { 172 | "kernelspec": { 173 | "display_name": "Python 3", 174 | "language": "python", 175 | "name": "python3" 176 | }, 177 | "language_info": { 178 | "codemirror_mode": { 179 | "name": "ipython", 180 | "version": 3 181 | }, 182 | "file_extension": ".py", 183 | "mimetype": "text/x-python", 184 | "name": "python", 185 | "nbconvert_exporter": "python", 186 | "pygments_lexer": "ipython3", 187 | "version": "3.6.9" 188 | } 189 | }, 190 | "nbformat": 4, 191 | "nbformat_minor": 2 192 | } 193 |
--------------------------------------------------------------------------------
/licenses/Readme.md:
--------------------------------------------------------------------------------
1 | # Source licenses
2 | The following table provides the licenses for resources 
contributing nodes or edges to the network. Source indicates the date when and location where the license information was retrieved. See the following table for more details. The bibliography-extracted data does not come with a license, but we have requested permission for non-commercial usage.
3 |
4 | | | Resource | License | Source |
5 | | --- | -------- | ------- | ------ |
6 | | 1 | DrugBank | [Creative Commons Attribution-NonCommercial 4.0 International License](http://creativecommons.org/licenses/by-nc/4.0/legalcode) | [drugbank](https://www.drugbank.ca/) |
7 | | 2 | GNBR | [Creative Commons Attribution-NonCommercial 4.0 International License](http://creativecommons.org/licenses/by-nc/4.0/legalcode) | [GNBR](https://zenodo.org/record/1134693#.XqZLZpMza8U) |
8 | | 3 | Hetionet | [Mixed License](https://github.com/hetio/hetionet#License) | [Hetionet](https://het.io/) |
9 | | 4 | String | [Creative Commons Attribution-NonCommercial 4.0 International License](http://creativecommons.org/licenses/by-nc/4.0/legalcode) | [String](https://string-db.org/) |
10 | | 5 | IntAct | Apache License, Version 2.0 | [IntAct](https://www.ebi.ac.uk/intact/) |
11 | | 6 | DGIdb | [Mixed License](http://www.dgidb.org/downloads) | [DGIdb](http://www.dgidb.org/) |
12 | | 7 | Bibliography extracted data | Non-Commercial | |
13 |
--------------------------------------------------------------------------------
/raw_graph_analysis/Readme.md:
--------------------------------------------------------------------------------
1 | # Basic Graph Analysis of DRKG
2 | Here we evaluate the structural similarity between a pair of relation types by computing their Jaccard similarity coefficient, and the overlap between the two edge types via the overlap coefficient, in [Jaccard_scores_among_all_edge_types_in_DRKG.ipynb](Jaccard_scores_among_all_edge_types_in_DRKG.ipynb).
--------------------------------------------------------------------------------
/utils/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import tarfile
3 |
4 | def download_and_extract():
5 |     import shutil
6 |     import requests
7 |
8 |     url = "https://s3.us-west-2.amazonaws.com/dgl-data/dataset/DRKG/drkg.tar.gz"
9 |     path = "../data/"
10 |     filename = "drkg.tar.gz"
11 |     fn = os.path.join(path, filename)
12 |     if os.path.exists("../data/drkg/drkg.tsv"):
13 |         return
14 |
15 |     opener, mode = tarfile.open, 'r:gz'
16 |     os.makedirs(path, exist_ok=True)
17 |     cwd = os.getcwd()
18 |     os.chdir(path)
19 |     while True:
20 |         try:
21 |             file = opener(filename, mode)
22 |             try: file.extractall()
23 |             finally: file.close()
24 |             break
25 |         except Exception:
26 |             f_remote = requests.get(url, stream=True)
27 |             sz = f_remote.headers.get('content-length')
28 |             assert f_remote.status_code == 200, 'fail to open {}'.format(url)
29 |             with open(filename, 'wb') as writer:
30 |                 for chunk in f_remote.iter_content(chunk_size=1024*1024):
31 |                     writer.write(chunk)
32 |             print('Download finished. Unzipping the file...')
33 |     os.chdir(cwd)
34 |
--------------------------------------------------------------------------------
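The train/valid/test split in Train_embeddings.ipynb shuffles without a fixed seed and expects the ./train directory to exist before the three files are written. A minimal sketch of a helper that could sit next to `download_and_extract` is given below. The names `split_triples` and `write_tsv`, their parameters, and the fixed seed are assumptions made for illustration; they are not part of the DRKG repository.

```python
import os
import random


def split_triples(triples, train_frac=0.9, valid_frac=0.05, seed=0):
    """Shuffle triples and split them into (train, valid, test) lists.

    Hypothetical helper, not part of the DRKG repository. The test set
    receives whatever is left after the train and valid portions.
    """
    if train_frac <= 0 or valid_frac < 0 or train_frac + valid_frac >= 1:
        raise ValueError("train_frac and valid_frac must leave room for a test set")
    order = list(range(len(triples)))
    random.Random(seed).shuffle(order)  # seeded so the split is reproducible
    train_cnt = int(len(order) * train_frac)
    valid_cnt = int(len(order) * valid_frac)
    train = [triples[i] for i in order[:train_cnt]]
    valid = [triples[i] for i in order[train_cnt:train_cnt + valid_cnt]]
    test = [triples[i] for i in order[train_cnt + valid_cnt:]]
    return train, valid, test


def write_tsv(path, triples):
    """Write (head, relation, tail) triples as tab-separated lines."""
    out_dir = os.path.dirname(path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)  # create e.g. ./train on demand
    with open(path, "w") as f:
        for head, rel, tail in triples:
            f.write("{}\t{}\t{}\n".format(head, rel, tail))
```

With `triples = df.values.tolist()` as in the notebook, calling `train, valid, test = split_triples(triples)` and then `write_tsv("train/drkg_train.tsv", train)` (and likewise for the valid and test files) should yield the same three-file layout that the `dglke_train` command reads, although the concrete assignment of triples to files will differ from a NumPy-based shuffle.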