├── .gitignore
├── DRKG Drug Repurposing Knowledge Graph.pdf
├── LICENSE
├── Readme.md
├── connectivity.png
├── drkg_with_dgl
│   ├── .ipynb_checkpoints
│   │   └── loading_drkg_in_dgl-checkpoint.ipynb
│   ├── Readme.md
│   └── loading_drkg_in_dgl.ipynb
├── drug_repurpose
│   ├── COVID-19_drug_repurposing.ipynb
│   ├── COVID-19_drug_repurposing_via_genes.ipynb
│   ├── COVID19_clinical_trial_drugs.tsv
│   ├── Readme.md
│   ├── coronavirus-related-host-genes.tsv
│   ├── covid19-host-genes.tsv
│   └── infer_drug.tsv
├── drugbank_info
│   ├── README.md
│   ├── drugbank_biotech.txt
│   ├── drugbank_small_molecule.txt
│   ├── drugbank_smiles.txt
│   └── drugbank_weight.txt
├── embedding_analysis
│   ├── Edge_score_analysis.ipynb
│   ├── Edge_similarity_based_on_link_recommendation_results.ipynb
│   ├── Entity_similarity_analysis.ipynb
│   ├── Readme.md
│   ├── Relation_similarity_analysis.ipynb
│   └── Train_embeddings.ipynb
├── licenses
│   └── Readme.md
├── raw_graph_analysis
│   ├── Jaccard_scores_among_all_edge_types_in_DRKG.ipynb
│   └── Readme.md
└── utils
    └── utils.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .idea
--------------------------------------------------------------------------------
/DRKG Drug Repurposing Knowledge Graph.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gnn4dr/DRKG/d4bb1974312013c4bd79a13e42c1d9492033f8c7/DRKG Drug Repurposing Knowledge Graph.pdf
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/Readme.md:
--------------------------------------------------------------------------------
1 | # Drug Repurposing Knowledge Graph (DRKG)
2 | Drug Repurposing Knowledge Graph (DRKG) is a comprehensive biological knowledge graph relating genes, compounds, diseases, biological processes, side effects and symptoms. DRKG includes information from six existing databases (DrugBank, Hetionet, GNBR, String, IntAct and DGIdb) and data collected from recent publications, particularly those related to Covid19. It includes 97,238 entities belonging to 13 entity-types and 5,874,261 triplets belonging to 107 edge-types. Each of these 107 edge-types describes a type of interaction between one of the 17 entity-type pairs (multiple types of interactions are possible between the same entity-pair), as depicted in the figure below. The repository also includes a set of notebooks showing how to explore and analyze DRKG using statistical methods or machine learning methods such as knowledge graph embedding.
3 |
4 |
5 |
6 |
7 |
8 | Figure: Interactions in the DRKG. The number next to an edge indicates the number of relation-types for that entity-pair in DRKG.
9 |
10 |
11 | ## Statistics of DRKG
12 | The type-wise distribution of the entities in DRKG and their original data source(s) is shown in the following table.
13 |
14 | | Entity type | Drugbank | GNBR | Hetionet | STRING | IntAct | DGIdb | Bibliography | Total Entities |
15 | |:--------------------|---------:|------:|---------:|-------:|-------:|------:|-------------:|---------------:|
16 | | Anatomy | \- | \- | 400 | \- | \- | \- | \- | 400 |
17 | | Atc | 4,048 | \- | \- | \- | \- | \- | \- | 4,048 |
18 | | Biological Process | \- | \- | 11,381 | \- | \- | \- | \- | 11,381 |
19 | | Cellular Component | \- | \- | 1,391 | \- | \- | \- | \- | 1,391 |
20 | | Compound | 9,708 | 11,961 | 1,538 | \- | 153 | 6,348 | 6,250 | 24,313 |
21 | | Disease | 1,182 | 4,746 | 257 | \- | \- | \- | 33 | 5,103 |
22 | | Gene | 4,973 | 27,111 | 19,145 | 18,316 | 16,321 | 2,551 | 3,181 | 39,220 |
23 | | Molecular Function | \- | \- | 2,884 | \- | \- | \- | \- | 2,884 |
24 | | Pathway | \- | \- | 1,822 | \- | \- | \- | \- | 1,822 |
25 | | Pharmacologic Class | \- | \- | 345 | \- | \- | \- | \- | 345 |
26 | | Side Effect | \- | \- | 5,701 | \- | \- | \- | \- | 5,701 |
27 | | Symptom | \- | \- | 415 | \- | \- | \- | \- | 415 |
28 | | Tax | \- | 215 | \- | \- | \- | \- | \- | 215 |
29 | | Total | 19,911 | 44,033 | 45,279 | 18,316 | 16,474 | 8,899 | 9,464 | 97,238 |
30 |
31 |
32 | The following table shows the number of triplets between different entity-type pairs, both for DRKG as a whole and for each data source.
33 |
34 | | Entity\-type pair | Drugbank | GNBR | Hetionet | STRING | IntAct | DGIdb | Bibliography | Total interactions |
35 | |:--------------------------------------|-----------:|-------:|---------:|--------:|-------:|------:|-------------:|-------------------:|
36 | | \(Gene, Gene\) | \- | 66,722 | 474,526 | 1,496,708 | 254,346 | \- | 58,629 | 2,350,931 |
37 | | \(Compound, Gene\) | 24,801 | 80,803 | 51,429 | \- | 1,805 | 26,290 | 25,666 | 210,794 |
38 | | \(Disease, Gene\) | \- | 95,399 | 27,977 | \- | \- | \- | 461 | 123,837 |
39 | | \(Atc, Compound\) | 15,750 | \- | \- | \- | \- | \- | \- | 15,750 |
40 | | \(Compound, Compound\) | 1,379,271 | \- | 6,486 | \- | \- | \- | \- | 1,385,757 |
41 | | \(Compound, Disease\) | 4,968 | 77,782 | 1,145 | \- | \- | \- | \- | 83,895 |
42 | | \(Gene, Tax\) | \- | 14,663 | \- | \- | \- | \- | \- | 14,663 |
43 | | \(Biological Process, Gene\) | \- | \- | 559,504 | \- | \- | \- | \- | 559,504 |
44 | | \(Disease, Symptom\) | \- | \- | 3,357 | \- | \- | \- | \- | 3,357 |
45 | | \(Anatomy, Disease\) | \- | \- | 3,602 | \- | \- | \- | \- | 3,602 |
46 | | \(Disease, Disease\) | \- | \- | 543 | \- | \- | \- | \- | 543 |
47 | | \(Anatomy, Gene\) | \- | \- | 726,495 | \- | \- | \- | \- | 726,495 |
48 | | \(Gene, Molecular Function\) | \- | \- | 97,222 | \- | \- | \- | \- | 97,222 |
49 | | \(Compound, Pharmacologic Class\) | \- | \- | 1,029 | \- | \- | \- | \- | 1,029 |
50 | | \(Cellular Component, Gene\) | \- | \- | 73,566 | \- | \- | \- | \- | 73,566 |
51 | | \(Gene, Pathway\) | \- | \- | 84,372 | \- | \- | \- | \- | 84,372 |
52 | | \(Compound, Side Effect\) | \- | \- | 138,944 | \- | \- | \- | \- | 138,944 |
53 | | Total | 1,424,790 | 335,369 | 2,250,197 | 1,496,708 | 256,151 | 26,290 | 84,756 | 5,874,261 |
54 |
55 |
56 | ## Download DRKG
57 | To analyze DRKG, you can download it directly with the following command:
58 | ```
59 | wget https://dgl-data.s3-us-west-2.amazonaws.com/dataset/DRKG/drkg.tar.gz
60 | ```
61 | If you use our notebooks provided in this repository, you don't need to download the file manually. The notebooks can automatically download the file for you.
62 |
63 | When you untar `drkg.tar.gz`, you will see the following files:
64 |
65 | ```
66 | ./drkg.tsv
67 | ./entity2src.tsv
68 | ./relation_glossary.tsv
69 | ./embed
70 | ./embed/DRKG_TransE_l2_relation.npy
71 | ./embed/relations.tsv
72 | ./embed/entities.tsv
73 | ./embed/Readme.md
74 | ./embed/DRKG_TransE_l2_entity.npy
75 | ./embed/mol_contextpred.npy
76 | ./embed/mol_masking.npy
77 | ./embed/mol_infomax.npy
78 | ./embed/mol_edgepred.npy
79 | ```
80 |
81 | ### DRKG dataset
82 | The whole dataset contains four parts:
83 | - drkg.tsv, a tsv file containing the original drkg in the format of (h, r, t) triplets.
84 | - embed, a folder containing the pretrained Knowledge Graph Embedding using the entire drkg.tsv as the training set and pretrained GNN-based molecule embeddings from [molecule SMILES](./drugbank_info/drugbank_smiles.txt)
85 | - entity2src.tsv, a file mapping entities in drkg to their original sources.
86 | - relation_glossary.tsv, a file containing the glossary of the relations in DRKG, and other associated information with sources (if available).
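
As a quick sanity check on `drkg.tsv`, the (h, r, t) triplets can be read with the standard library and the entities grouped by type. The sketch below is a minimal illustration, not part of the repository: it assumes the file is tab-separated with no header row and that entity names follow the `Type::ID` pattern (as the DGL notebook's `split('::')` suggests). The helper names `read_triplets` and `count_entity_types` are hypothetical, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def read_triplets(handle):
    """Yield (head, relation, tail) tuples from a tab-separated stream."""
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) == 3:  # skip blank or malformed lines
            yield tuple(row)


def count_entity_types(triplets):
    """Count distinct entities per type, reading the type from the 'Type::ID' prefix."""
    seen = set()
    for head, _, tail in triplets:
        seen.add(head)
        seen.add(tail)
    return Counter(name.split("::", 1)[0] for name in seen)


# Tiny in-memory stand-in for drkg.tsv (made-up rows, not real DRKG data).
sample = io.StringIO(
    "Gene::1\tREL_A\tGene::2\n"
    "Compound::X\tREL_B\tGene::1\n"
)
triplets = list(read_triplets(sample))
type_counts = count_entity_types(triplets)
print(type_counts)
```

To run this on the real data, replace `sample` with `open('./drkg.tsv', newline='')`.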
87 |
88 | ### Pretrained DRKG embedding
88 | The DRKG embedding is trained using the TransE\_l2 model with a dimension size of 400. There are four files:
90 |
91 | - DRKG\_TransE\_l2\_entity.npy, NumPy binary data, storing the entity embedding
92 | - DRKG\_TransE\_l2\_relation.npy, NumPy binary data, storing the relation embedding
93 | - entities.tsv, mapping from entity\_name to entity\_id.
94 | - relations.tsv, mapping from relation\_name to relation\_id
95 |
96 | To use the pretrained embedding, one can use np.load to load the entity embeddings and relation embeddings separately:
97 |
98 | ```
99 | import numpy as np
100 | entity_emb = np.load('./embed/DRKG_TransE_l2_entity.npy')
101 | rel_emb = np.load('./embed/DRKG_TransE_l2_relation.npy')
102 | ```
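
To find the embedding row for a particular entity, the name-to-id mapping in `entities.tsv` is needed as well. The following is a hedged sketch rather than the repository's own loader: it assumes `entities.tsv` has two tab-separated columns in the order name, id, with no header row (swap the columns if your copy stores the id first). `load_name_to_id` is a hypothetical helper, and the sample rows are made up.

```python
import csv
import io


def load_name_to_id(handle):
    """Read a two-column TSV (name, id) into a dict mapping name -> integer id."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) == 2:  # skip blank or malformed lines
            name, idx = row
            mapping[name] = int(idx)
    return mapping


# Tiny in-memory stand-in for ./embed/entities.tsv (made-up rows).
sample_entities = io.StringIO("Gene::1\t0\nCompound::X\t1\n")
entity_ids = load_name_to_id(sample_entities)
row = entity_ids["Compound::X"]
print(row)
```

With the real files, `entity_emb[row]` then selects that entity's vector from the array loaded with `np.load`.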
103 |
104 | ### Pretrained Molecule Embedding
105 |
106 | We also provide molecule embeddings for most small-molecule drugs in DrugBank using pre-trained GNNs. In particular,
107 | [Strategies for Pre-training Graph Neural Networks](https://arxiv.org/abs/1905.12265) develops multiple approaches for
108 | pre-training GNN-based molecular representations, combining supervised molecular property prediction with
109 | self-supervised learning approaches. We employ their method to compute four variants of molecule embeddings
110 | using [DGL-LifeSci](https://github.com/awslabs/dgl-lifesci/tree/master/examples/molecule_embeddings).
111 |
112 | - `mol_contextpred.npy`: From a model pre-trained to predict surrounding graph structures of molecular subgraphs
113 | - `mol_infomax.npy`: From a model pre-trained to maximize the mutual information between local node representations
114 | and a global graph representation
115 | - `mol_edgepred.npy`: From a model pre-trained to encourage nearby nodes to have similar representations and to enforce
116 | disparate nodes to have distinct representations
117 | - `mol_masking.npy`: From a model pre-trained to predict randomly masked node and edge attributes
118 |
119 | ## Tools to analyze DRKG
120 | We analyze DRKG with some deep learning frameworks, including [DGL](https://github.com/dmlc/dgl) (a framework for graph neural networks) and [DGL-KE](https://github.com/awslabs/dgl-ke) (a library for computing knowledge graph embeddings). Please follow the instructions below to install the deep learning frameworks.
121 |
122 | ### Install PyTorch
123 | Currently all notebooks use PyTorch as the deep learning backend. To install other versions of PyTorch, please go to [Install PyTorch](https://pytorch.org/).
124 | ```
125 | sudo pip3 install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
126 | ```
127 |
128 | ### Install DGL
129 | Please install [DGL](https://www.dgl.ai/) (a framework for graph neural networks) with the following command. It installs DGL with CUDA support.
130 | ```
131 | sudo pip3 install dgl-cu101
132 | ```
133 | For installing other versions of DGL, please go to [Install DGL](https://docs.dgl.ai/en/latest/install/index.html)
134 |
135 | ### Install DGL-KE
136 | If you want to train the model with the notebooks (e.g., using Train_embeddings.ipynb or Edge_score_analysis.ipynb) in [Knowledge Graph Embedding Based Analysis of DRKG](#knowledge-graph-embedding-based-analysis-of-drkg), you need to install both DGL and the [DGL-KE](https://github.com/awslabs/dgl-ke) package.
137 | DGL-KE can work with DGL >= 0.4.3 (either CPU or GPU).
138 | ```
139 | sudo pip3 install dglke
140 | ```
141 |
142 | ## Notebooks for analyzing DRKG
143 | We provide a set of notebooks to analyze DRKG. Some of the notebooks use the tools installed in the previous section.
144 |
145 | ### Basic Graph Analysis of DRKG
146 | To evaluate the structural similarity between a pair of relation types, we compute their Jaccard similarity coefficient and the overlap between the two edge types via the overlap coefficient. This analysis is given in:
147 | - [Jaccard_scores_among_all_edge_types_in_DRKG.ipynb](raw_graph_analysis/Jaccard_scores_among_all_edge_types_in_DRKG.ipynb)
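
For reference, both coefficients can be sketched over plain Python sets. This is an illustration of the two standard definitions, not the notebook's implementation. It assumes each edge type is represented by its set of (head, tail) pairs, which may differ from how the notebook builds its sets, and the example pairs are made up.

```python
def jaccard(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|) (0.0 when either set is empty)."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Two hypothetical edge types, each given as a set of (head, tail) pairs.
edges_a = {(1, 2), (2, 3), (3, 4)}
edges_b = {(2, 3), (3, 4), (4, 5), (5, 6)}
print(jaccard(edges_a, edges_b), overlap_coefficient(edges_a, edges_b))
```

The overlap coefficient reaches 1 whenever the smaller edge set is contained in the larger one, so it picks up subset relationships that the Jaccard score understates.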
148 |
149 | ### Knowledge Graph Embedding Based Analysis of DRKG
150 | We analyze the extracted DRKG by learning a TransE KGE model that utilizes the L2 distance. As DRKG combines information from different data sources, we want to verify that meaningful entity and relation embeddings can be generated using knowledge graph embedding technology.
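
TransE treats a relation as a translation in embedding space, so a plausible triplet (h, r, t) should satisfy h + r ≈ t. The sketch below scores a triplet as a margin minus the L2 distance between h + r and t. It is pure Python for illustration only; the function name and the margin value `gamma=12.0` are assumptions, not values stated in this README.

```python
import math


def transe_l2_score(head, rel, tail, gamma=12.0):
    """Score a triplet as gamma - ||head + rel - tail||_2 (higher is more plausible).

    gamma is an assumed margin value used here only for illustration.
    """
    dist = math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))
    return gamma - dist


# A triplet where head + rel lands exactly on tail gets the maximum score.
print(transe_l2_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))
# A triplet whose tail is far from head + rel scores lower.
print(transe_l2_score([0.0, 0.0], [0.0, 0.0], [3.0, 4.0]))
```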
151 |
152 | We split the edge triplets into training, validation and test sets with a 90%, 5%, and 5% split, and train the KGE model as shown in the following notebook:
153 | - [Train_embeddings.ipynb](embedding_analysis/Train_embeddings.ipynb)
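
A 90%/5%/5% split of this kind can be sketched as a seeded shuffle followed by slicing. This is a minimal illustration and may differ from what the notebook does. The function name, the seed, and the stand-in triplets are all hypothetical.

```python
import random


def split_triplets(triplets, train_frac=0.9, valid_frac=0.05, seed=0):
    """Shuffle the triplets and cut them into train/validation/test lists.

    Whatever remains after the train and validation slices goes to the test set.
    """
    items = list(triplets)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = int(len(items) * train_frac)
    n_valid = int(len(items) * valid_frac)
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]
    return train, valid, test


# 100 made-up triplets standing in for the rows of drkg.tsv.
fake_triplets = [("h{}".format(i), "rel", "t{}".format(i)) for i in range(100)]
train, valid, test = split_triplets(fake_triplets)
print(len(train), len(valid), len(test))
```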
154 |
155 | Finally, we obtain the entity and relation embeddings for DRKG. Various embedding-based analyses are provided in the following notebooks:
156 | - [Relation_similarity_analysis.ipynb](embedding_analysis/Relation_similarity_analysis.ipynb), analyzing the similarity of the generated relation embeddings.
157 | - [Entity_similarity_analysis.ipynb](embedding_analysis/Entity_similarity_analysis.ipynb), analyzing the similarity of the generated entity embeddings.
158 | - [Edge_score_analysis.ipynb](embedding_analysis/Edge_score_analysis.ipynb), evaluating whether the learned KGE model can predict the edges of DRKG.
159 | - [Edge_similarity_based_on_link_recommendation_results.ipynb](embedding_analysis/Edge_similarity_based_on_link_recommendation_results.ipynb), evaluating how similar the predicted links are among different relation types.
160 |
161 | ### Drug Repurposing Using Pretrained Model for COVID-19
162 | We present an example of using the pretrained DRKG model for drug repurposing for COVID-19. In the example, we directly use the pretrained model provided at [DRKG dataset](#drkg-dataset) and propose 100 drugs for COVID-19. The following notebook provides the details:
163 |
164 | - [COVID-19_drug_repurposing.ipynb](drug_repurpose/COVID-19_drug_repurposing.ipynb)
165 |
166 | ### DRKG with DGL
167 | We provide a notebook with an example of using DRKG with Deep Graph Library (DGL).
168 |
169 | The following notebook provides an example of building a heterograph from DRKG in DGL, along with some example queries on the DGL heterograph:
170 | - [loading_drkg_in_dgl.ipynb](drkg_with_dgl/loading_drkg_in_dgl.ipynb)
171 |
172 | ## Additional Information for DrugBank
173 |
174 | Some additional information about compounds from DrugBank is included in [drugbank_info](/drugbank_info), including the
175 | type and weight of drugs, and the SMILES of small-molecule drugs.
176 |
177 | ## Licence
178 | This project is licensed under the Apache-2.0 License. However, DRKG integrates data from many resources, and users should consider the licensing of each source (see this [table](https://github.com/shuix007/COVID-19-KG/blob/master/licenses/Readme.md)). We apply a license attribute on a per-node and per-edge basis for sources with defined licenses.
179 |
180 | ## Cite
181 |
182 | Please cite our dataset if you use this code and data in your work.
183 |
184 | ```
185 | @misc{drkg2020,
186 | author = {Ioannidis, Vassilis N. and Song, Xiang and Manchanda, Saurav and Li, Mufei and Pan, Xiaoqin
187 | and Zheng, Da and Ning, Xia and Zeng, Xiangxiang and Karypis, George},
188 | title = {DRKG - Drug Repurposing Knowledge Graph for Covid-19},
189 | howpublished = "\url{https://github.com/gnn4dr/DRKG/}",
190 | year = {2020}
191 | }
192 | ```
193 | A preprint describing this work will be available soon.
194 |
195 |
--------------------------------------------------------------------------------
/connectivity.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gnn4dr/DRKG/d4bb1974312013c4bd79a13e42c1d9492033f8c7/connectivity.png
--------------------------------------------------------------------------------
/drkg_with_dgl/.ipynb_checkpoints/loading_drkg_in_dgl-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Working with DRKG in Deep Graph Library (DGL)\n",
8 | "This notebook provides an example of building a heterograph from DRKG in DGL, along with some example queries on the DGL heterograph. For more information about using DGL, please refer to https://www.dgl.ai/"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "metadata": {},
15 | "outputs": [],
16 | "source": [
17 | "import pandas as pd\n",
18 | "import numpy as np\n",
19 | "import dgl\n",
20 | "drkg_file = '~/covid/data/processed/filtered/drkg.tsv'\n",
21 | "df = pd.read_csv(drkg_file, sep =\"\\t\")\n",
22 | "triplets = df.values.tolist()"
23 | ]
24 | },
25 | {
26 | "cell_type": "markdown",
27 | "metadata": {},
28 | "source": [
29 | "Assign an ID to each node (entity): create a dictionary keyed by node-type, where each value is itself a dictionary mapping a node to its ID."
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 2,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "entity_dictionary = {}\n",
39 | "def insert_entry(entry, ent_type, dic):\n",
40 | " if ent_type not in dic:\n",
41 | " dic[ent_type] = {}\n",
42 | " ent_n_id = len(dic[ent_type])\n",
43 | " if entry not in dic[ent_type]:\n",
44 | " dic[ent_type][entry] = ent_n_id\n",
45 | " return dic\n",
46 | "\n",
47 | "for triple in triplets:\n",
48 | " src = triple[0]\n",
49 | " split_src = src.split('::')\n",
50 | " src_type = split_src[0]\n",
51 | " dest = triple[2]\n",
52 | " split_dest = dest.split('::')\n",
53 | " dest_type = split_dest[0]\n",
54 | " insert_entry(src,src_type,entity_dictionary)\n",
55 | " insert_entry(dest,dest_type,entity_dictionary)"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "metadata": {},
61 | "source": [
62 | "Create a dictionary of relations: the key is the relation and the value is the list of (source node ID, destination node ID) tuples."
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 3,
68 | "metadata": {},
69 | "outputs": [],
70 | "source": [
71 | "edge_dictionary={}\n",
72 | "for triple in triplets:\n",
73 | " src = triple[0]\n",
74 | " split_src = src.split('::')\n",
75 | " src_type = split_src[0]\n",
76 | " dest = triple[2]\n",
77 | " split_dest = dest.split('::')\n",
78 | " dest_type = split_dest[0]\n",
79 | " \n",
80 | " src_int_id = entity_dictionary[src_type][src]\n",
81 | " dest_int_id = entity_dictionary[dest_type][dest]\n",
82 | " \n",
83 | " pair = (src_int_id,dest_int_id)\n",
84 | " etype = (src_type,triple[1],dest_type)\n",
85 | " if etype in edge_dictionary:\n",
86 | " edge_dictionary[etype] += [pair]\n",
87 | " else:\n",
88 | " edge_dictionary[etype] = [pair]"
89 | ]
90 | },
91 | {
92 | "cell_type": "markdown",
93 | "metadata": {},
94 | "source": [
95 | "## Create a DGL heterograph using the dictionary of relations"
96 | ]
97 | },
98 | {
99 | "cell_type": "code",
100 | "execution_count": 4,
101 | "metadata": {},
102 | "outputs": [],
103 | "source": [
104 | "graph = dgl.heterograph(edge_dictionary);"
105 | ]
106 | },
107 | {
108 | "cell_type": "markdown",
109 | "metadata": {},
110 | "source": [
111 | "## Print the statistics of the created graph"
112 | ]
113 | },
114 | {
115 | "cell_type": "markdown",
116 | "metadata": {},
117 | "source": [
118 | "Number of nodes for each node-type"
119 | ]
120 | },
121 | {
122 | "cell_type": "code",
123 | "execution_count": 5,
124 | "metadata": {},
125 | "outputs": [
126 | {
127 | "name": "stdout",
128 | "output_type": "stream",
129 | "text": [
130 | "Anatomy \t 400\n",
131 | "Atc \t 4048\n",
132 | "Biological Process \t 11381\n",
133 | "Cellular Component \t 1391\n",
134 | "Compound \t 24313\n",
135 | "Disease \t 4920\n",
136 | "Gene \t 39220\n",
137 | "Molecular Function \t 2884\n",
138 | "Pathway \t 1822\n",
139 | "Pharmacologic Class \t 345\n",
140 | "Side Effect \t 5701\n",
141 | "Symptom \t 415\n",
142 | "Tax \t 215\n",
143 | "Graph contains 97055 nodes from 13 node-types.\n"
144 | ]
145 | }
146 | ],
147 | "source": [
148 | "total_nodes = 0;\n",
149 | "for ntype in graph.ntypes:\n",
150 | " print(ntype, '\\t', graph.number_of_nodes(ntype));\n",
151 | " total_nodes += graph.number_of_nodes(ntype);\n",
152 | "print(\"Graph contains {} nodes from {} node-types.\".format(total_nodes, len(graph.ntypes)))"
153 | ]
154 | },
155 | {
156 | "cell_type": "markdown",
157 | "metadata": {},
158 | "source": [
159 | "Number of edges for each relation (edge-type)"
160 | ]
161 | },
162 | {
163 | "cell_type": "code",
164 | "execution_count": 6,
165 | "metadata": {},
166 | "outputs": [
167 | {
168 | "name": "stdout",
169 | "output_type": "stream",
170 | "text": [
171 | "bioarx::HumGenHumGen:Gene:Gene \t 58093\n",
172 | "bioarx::VirGenHumGen:Gene:Gene \t 535\n",
173 | "bioarx::DrugVirGen:Compound:Gene \t 1165\n",
174 | "bioarx::DrugHumGen:Compound:Gene \t 24501\n",
175 | "Covid2_acc_host_gene::Disease:Gene \t 332\n",
176 | "Coronavirus_ass_host_gene::Disease:Gene \t 129\n",
177 | "DGIDB::INHIBITOR::Gene:Compound \t 5971\n",
178 | "DGIDB::ANTAGONIST::Gene:Compound \t 3006\n",
179 | "DGIDB::OTHER::Gene:Compound \t 11070\n",
180 | "DGIDB::AGONIST::Gene:Compound \t 3012\n",
181 | "DGIDB::BINDER::Gene:Compound \t 143\n",
182 | "DGIDB::MODULATOR::Gene:Compound \t 243\n",
183 | "DGIDB::BLOCKER::Gene:Compound \t 979\n",
184 | "DGIDB::CHANNEL BLOCKER::Gene:Compound \t 352\n",
185 | "DGIDB::ANTIBODY::Gene:Compound \t 188\n",
186 | "DGIDB::POSITIVE ALLOSTERIC MODULATOR::Gene:Compound \t 618\n",
187 | "DGIDB::ALLOSTERIC MODULATOR::Gene:Compound \t 317\n",
188 | "DGIDB::ACTIVATOR::Gene:Compound \t 316\n",
189 | "DGIDB::PARTIAL AGONIST::Gene:Compound \t 75\n",
190 | "DRUGBANK::drugbank_vocabulary:x-atc::Compound:Atc \t 15750\n",
191 | "DRUGBANK::drugbank_vocabulary:ddi-interactor-in::Compound:Compound \t 1379271\n",
192 | "DRUGBANK::drugbank_vocabulary:target::Compound:Gene \t 19158\n",
193 | "DRUGBANK::drugbank_vocabulary:enzyme::Compound:Gene \t 4923\n",
194 | "DRUGBANK::drugbank_vocabulary:carrier::Compound:Gene \t 720\n",
195 | "GNBR::E::Compound:Gene \t 32743\n",
196 | "GNBR::A+::Compound:Gene \t 1568\n",
197 | "GNBR::N::Compound:Gene \t 12521\n",
198 | "GNBR::K::Compound:Gene \t 12411\n",
199 | "GNBR::A-::Compound:Gene \t 1108\n",
200 | "GNBR::E+::Compound:Gene \t 1970\n",
201 | "GNBR::B::Compound:Gene \t 7170\n",
202 | "GNBR::E-::Compound:Gene \t 2918\n",
203 | "GNBR::O::Compound:Gene \t 5573\n",
204 | "GNBR::Z::Compound:Gene \t 2821\n",
205 | "GNBR::T::Compound:Disease \t 54020\n",
206 | "GNBR::C::Compound:Disease \t 1739\n",
207 | "GNBR::Sa::Compound:Disease \t 16923\n",
208 | "GNBR::Pa::Compound:Disease \t 2619\n",
209 | "GNBR::Mp::Compound:Disease \t 495\n",
210 | "GNBR::Pr::Compound:Disease \t 966\n",
211 | "GNBR::J::Compound:Disease \t 1020\n",
212 | "GNBR::L::Gene:Disease \t 48385\n",
213 | "GNBR::U::Gene:Disease \t 6432\n",
214 | "GNBR::Y::Gene:Disease \t 1948\n",
215 | "GNBR::J::Gene:Disease \t 30234\n",
216 | "GNBR::Te::Gene:Disease \t 2836\n",
217 | "GNBR::Md::Gene:Disease \t 1279\n",
218 | "GNBR::G::Gene:Disease \t 2055\n",
219 | "GNBR::D::Gene:Disease \t 500\n",
220 | "GNBR::X::Gene:Disease \t 1324\n",
221 | "GNBR::Ud::Gene:Disease \t 407\n",
222 | "GNBR::V+::Gene:Gene \t 8689\n",
223 | "GNBR::Q::Gene:Gene \t 19372\n",
224 | "GNBR::Rg::Gene:Gene \t 11018\n",
225 | "GNBR::B::Gene:Gene \t 8164\n",
226 | "GNBR::I::Gene:Gene \t 5434\n",
227 | "GNBR::E+::Gene:Gene \t 10838\n",
228 | "GNBR::H::Gene:Gene \t 2509\n",
229 | "GNBR::W::Gene:Gene \t 280\n",
230 | "GNBR::E::Gene:Gene \t 418\n",
231 | "GNBR:in_tax::Gene:Tax \t 14663\n",
232 | "Hetionet::GpBP::Gene:Biological Process \t 559504\n",
233 | "Hetionet::GiG::Gene:Gene \t 147164\n",
234 | "Hetionet::CrC::Compound:Compound \t 6486\n",
235 | "Hetionet::DdG::Disease:Gene \t 7623\n",
236 | "Hetionet::DpS::Disease:Symptom \t 3357\n",
237 | "Hetionet::DlA::Disease:Anatomy \t 3602\n",
238 | "Hetionet::CtD::Compound:Disease \t 755\n",
239 | "Hetionet::CbG::Compound:Gene \t 11571\n",
240 | "Hetionet::CuG::Compound:Gene \t 18756\n",
241 | "Hetionet::DrD::Disease:Disease \t 543\n",
242 | "Hetionet::DaG::Disease:Gene \t 12623\n",
243 | "Hetionet::CpD::Compound:Disease \t 390\n",
244 | "Hetionet::AdG::Anatomy:Gene \t 102240\n",
245 | "Hetionet::AuG::Anatomy:Gene \t 97848\n",
246 | "Hetionet::GcG::Gene:Gene \t 61690\n",
247 | "Hetionet::GpMF::Gene:Molecular Function \t 97222\n",
248 | "Hetionet::PCiC::Pharmacologic Class:Compound \t 1029\n",
249 | "Hetionet::GpCC::Gene:Cellular Component \t 73566\n",
250 | "Hetionet::Gr>G::Gene:Gene \t 265672\n",
251 | "Hetionet::CdG::Compound:Gene \t 21102\n",
252 | "Hetionet::DuG::Disease:Gene \t 7731\n",
253 | "Hetionet::GpPW::Gene:Pathway \t 84372\n",
254 | "Hetionet::CcSE::Compound:Side Effect \t 138944\n",
255 | "Hetionet::AeG::Anatomy:Gene \t 526407\n",
256 | "INTACT::PHYSICAL ASSOCIATION::Compound:Gene \t 203\n",
257 | "INTACT::DIRECT INTERACTION::Compound:Gene \t 155\n",
258 | "INTACT::ASSOCIATION::Compound:Gene \t 1447\n",
259 | "INTACT::ASSOCIATION::Gene:Gene \t 112390\n",
260 | "INTACT::PHYSICAL ASSOCIATION::Gene:Gene \t 129318\n",
261 | "INTACT::COLOCALIZATION::Gene:Gene \t 3468\n",
262 | "INTACT::DEPHOSPHORYLATION REACTION::Gene:Gene \t 303\n",
263 | "INTACT::CLEAVAGE REACTION::Gene:Gene \t 93\n",
264 | "INTACT::DIRECT INTERACTION::Gene:Gene \t 6950\n",
265 | "INTACT::PHOSPHORYLATION REACTION::Gene:Gene \t 1328\n",
266 | "INTACT::ADP RIBOSYLATION REACTION::Gene:Gene \t 58\n",
267 | "INTACT::UBIQUITINATION REACTION::Gene:Gene \t 371\n",
268 | "INTACT::PROTEIN CLEAVAGE::Gene:Gene \t 67\n",
269 | "STRING::REACTION::Gene:Gene \t 400426\n",
270 | "STRING::CATALYSIS::Gene:Gene \t 343533\n",
271 | "STRING::ACTIVATION::Gene:Gene \t 81355\n",
272 | "STRING::INHIBITION::Gene:Gene \t 28959\n",
273 | "STRING::OTHER::Gene:Gene \t 310690\n",
274 | "STRING::BINDING::Gene:Gene \t 315875\n",
275 | "STRING::PTMOD::Gene:Gene \t 15113\n",
276 | "STRING::EXPRESSION::Gene:Gene \t 757\n",
277 | "Graph contains 5869293 edges from 106 edge-types.\n"
278 | ]
279 | }
280 | ],
281 | "source": [
282 | "total_edges = 0\n",
283 | "for etype in graph.etypes:\n",
284 | "    print(etype, '\\t', graph.number_of_edges(etype))\n",
285 | "    total_edges += graph.number_of_edges(etype)\n",
286 | "print(\"Graph contains {} edges from {} edge-types.\".format(total_edges, len(graph.etypes)))"
287 | ]
288 | },
289 | {
290 | "cell_type": "markdown",
291 | "metadata": {},
292 | "source": [
293 | "Printing the graph directly with `print(graph)` also shows a summary of the graph."
294 | ]
295 | },
296 | {
297 | "cell_type": "code",
298 | "execution_count": 7,
299 | "metadata": {},
300 | "outputs": [
301 | {
302 | "name": "stdout",
303 | "output_type": "stream",
304 | "text": [
305 | "Graph(num_nodes={'Anatomy': 400, 'Atc': 4048, 'Biological Process': 11381, 'Cellular Component': 1391, 'Compound': 24313, 'Disease': 4920, 'Gene': 39220, 'Molecular Function': 2884, 'Pathway': 1822, 'Pharmacologic Class': 345, 'Side Effect': 5701, 'Symptom': 415, 'Tax': 215},\n",
306 | " num_edges={('Gene', 'bioarx::HumGenHumGen:Gene:Gene', 'Gene'): 58093, ('Gene', 'bioarx::VirGenHumGen:Gene:Gene', 'Gene'): 535, ('Compound', 'bioarx::DrugVirGen:Compound:Gene', 'Gene'): 1165, ('Compound', 'bioarx::DrugHumGen:Compound:Gene', 'Gene'): 24501, ('Disease', 'Covid2_acc_host_gene::Disease:Gene', 'Gene'): 332, ('Disease', 'Coronavirus_ass_host_gene::Disease:Gene', 'Gene'): 129, ('Gene', 'DGIDB::INHIBITOR::Gene:Compound', 'Compound'): 5971, ('Gene', 'DGIDB::ANTAGONIST::Gene:Compound', 'Compound'): 3006, ('Gene', 'DGIDB::OTHER::Gene:Compound', 'Compound'): 11070, ('Gene', 'DGIDB::AGONIST::Gene:Compound', 'Compound'): 3012, ('Gene', 'DGIDB::BINDER::Gene:Compound', 'Compound'): 143, ('Gene', 'DGIDB::MODULATOR::Gene:Compound', 'Compound'): 243, ('Gene', 'DGIDB::BLOCKER::Gene:Compound', 'Compound'): 979, ('Gene', 'DGIDB::CHANNEL BLOCKER::Gene:Compound', 'Compound'): 352, ('Gene', 'DGIDB::ANTIBODY::Gene:Compound', 'Compound'): 188, ('Gene', 'DGIDB::POSITIVE ALLOSTERIC MODULATOR::Gene:Compound', 'Compound'): 618, ('Gene', 'DGIDB::ALLOSTERIC MODULATOR::Gene:Compound', 'Compound'): 317, ('Gene', 'DGIDB::ACTIVATOR::Gene:Compound', 'Compound'): 316, ('Gene', 'DGIDB::PARTIAL AGONIST::Gene:Compound', 'Compound'): 75, ('Compound', 'DRUGBANK::drugbank_vocabulary:x-atc::Compound:Atc', 'Atc'): 15750, ('Compound', 'DRUGBANK::drugbank_vocabulary:ddi-interactor-in::Compound:Compound', 'Compound'): 1379271, ('Compound', 'DRUGBANK::drugbank_vocabulary:target::Compound:Gene', 'Gene'): 19158, ('Compound', 'DRUGBANK::drugbank_vocabulary:enzyme::Compound:Gene', 'Gene'): 4923, ('Compound', 'DRUGBANK::drugbank_vocabulary:carrier::Compound:Gene', 'Gene'): 720, ('Compound', 'GNBR::E::Compound:Gene', 'Gene'): 32743, ('Compound', 'GNBR::A+::Compound:Gene', 'Gene'): 1568, ('Compound', 'GNBR::N::Compound:Gene', 'Gene'): 12521, ('Compound', 'GNBR::K::Compound:Gene', 'Gene'): 12411, ('Compound', 'GNBR::A-::Compound:Gene', 'Gene'): 1108, ('Compound', 'GNBR::E+::Compound:Gene', 'Gene'): 
1970, ('Compound', 'GNBR::B::Compound:Gene', 'Gene'): 7170, ('Compound', 'GNBR::E-::Compound:Gene', 'Gene'): 2918, ('Compound', 'GNBR::O::Compound:Gene', 'Gene'): 5573, ('Compound', 'GNBR::Z::Compound:Gene', 'Gene'): 2821, ('Compound', 'GNBR::T::Compound:Disease', 'Disease'): 54020, ('Compound', 'GNBR::C::Compound:Disease', 'Disease'): 1739, ('Compound', 'GNBR::Sa::Compound:Disease', 'Disease'): 16923, ('Compound', 'GNBR::Pa::Compound:Disease', 'Disease'): 2619, ('Compound', 'GNBR::Mp::Compound:Disease', 'Disease'): 495, ('Compound', 'GNBR::Pr::Compound:Disease', 'Disease'): 966, ('Compound', 'GNBR::J::Compound:Disease', 'Disease'): 1020, ('Gene', 'GNBR::L::Gene:Disease', 'Disease'): 48385, ('Gene', 'GNBR::U::Gene:Disease', 'Disease'): 6432, ('Gene', 'GNBR::Y::Gene:Disease', 'Disease'): 1948, ('Gene', 'GNBR::J::Gene:Disease', 'Disease'): 30234, ('Gene', 'GNBR::Te::Gene:Disease', 'Disease'): 2836, ('Gene', 'GNBR::Md::Gene:Disease', 'Disease'): 1279, ('Gene', 'GNBR::G::Gene:Disease', 'Disease'): 2055, ('Gene', 'GNBR::D::Gene:Disease', 'Disease'): 500, ('Gene', 'GNBR::X::Gene:Disease', 'Disease'): 1324, ('Gene', 'GNBR::Ud::Gene:Disease', 'Disease'): 407, ('Gene', 'GNBR::V+::Gene:Gene', 'Gene'): 8689, ('Gene', 'GNBR::Q::Gene:Gene', 'Gene'): 19372, ('Gene', 'GNBR::Rg::Gene:Gene', 'Gene'): 11018, ('Gene', 'GNBR::B::Gene:Gene', 'Gene'): 8164, ('Gene', 'GNBR::I::Gene:Gene', 'Gene'): 5434, ('Gene', 'GNBR::E+::Gene:Gene', 'Gene'): 10838, ('Gene', 'GNBR::H::Gene:Gene', 'Gene'): 2509, ('Gene', 'GNBR::W::Gene:Gene', 'Gene'): 280, ('Gene', 'GNBR::E::Gene:Gene', 'Gene'): 418, ('Gene', 'GNBR:in_tax::Gene:Tax', 'Tax'): 14663, ('Gene', 'Hetionet::GpBP::Gene:Biological Process', 'Biological Process'): 559504, ('Gene', 'Hetionet::GiG::Gene:Gene', 'Gene'): 147164, ('Compound', 'Hetionet::CrC::Compound:Compound', 'Compound'): 6486, ('Disease', 'Hetionet::DdG::Disease:Gene', 'Gene'): 7623, ('Disease', 'Hetionet::DpS::Disease:Symptom', 'Symptom'): 3357, ('Disease', 
'Hetionet::DlA::Disease:Anatomy', 'Anatomy'): 3602, ('Compound', 'Hetionet::CtD::Compound:Disease', 'Disease'): 755, ('Compound', 'Hetionet::CbG::Compound:Gene', 'Gene'): 11571, ('Compound', 'Hetionet::CuG::Compound:Gene', 'Gene'): 18756, ('Disease', 'Hetionet::DrD::Disease:Disease', 'Disease'): 543, ('Disease', 'Hetionet::DaG::Disease:Gene', 'Gene'): 12623, ('Compound', 'Hetionet::CpD::Compound:Disease', 'Disease'): 390, ('Anatomy', 'Hetionet::AdG::Anatomy:Gene', 'Gene'): 102240, ('Anatomy', 'Hetionet::AuG::Anatomy:Gene', 'Gene'): 97848, ('Gene', 'Hetionet::GcG::Gene:Gene', 'Gene'): 61690, ('Gene', 'Hetionet::GpMF::Gene:Molecular Function', 'Molecular Function'): 97222, ('Pharmacologic Class', 'Hetionet::PCiC::Pharmacologic Class:Compound', 'Compound'): 1029, ('Gene', 'Hetionet::GpCC::Gene:Cellular Component', 'Cellular Component'): 73566, ('Gene', 'Hetionet::Gr>G::Gene:Gene', 'Gene'): 265672, ('Compound', 'Hetionet::CdG::Compound:Gene', 'Gene'): 21102, ('Disease', 'Hetionet::DuG::Disease:Gene', 'Gene'): 7731, ('Gene', 'Hetionet::GpPW::Gene:Pathway', 'Pathway'): 84372, ('Compound', 'Hetionet::CcSE::Compound:Side Effect', 'Side Effect'): 138944, ('Anatomy', 'Hetionet::AeG::Anatomy:Gene', 'Gene'): 526407, ('Compound', 'INTACT::PHYSICAL ASSOCIATION::Compound:Gene', 'Gene'): 203, ('Compound', 'INTACT::DIRECT INTERACTION::Compound:Gene', 'Gene'): 155, ('Compound', 'INTACT::ASSOCIATION::Compound:Gene', 'Gene'): 1447, ('Gene', 'INTACT::ASSOCIATION::Gene:Gene', 'Gene'): 112390, ('Gene', 'INTACT::PHYSICAL ASSOCIATION::Gene:Gene', 'Gene'): 129318, ('Gene', 'INTACT::COLOCALIZATION::Gene:Gene', 'Gene'): 3468, ('Gene', 'INTACT::DEPHOSPHORYLATION REACTION::Gene:Gene', 'Gene'): 303, ('Gene', 'INTACT::CLEAVAGE REACTION::Gene:Gene', 'Gene'): 93, ('Gene', 'INTACT::DIRECT INTERACTION::Gene:Gene', 'Gene'): 6950, ('Gene', 'INTACT::PHOSPHORYLATION REACTION::Gene:Gene', 'Gene'): 1328, ('Gene', 'INTACT::ADP RIBOSYLATION REACTION::Gene:Gene', 'Gene'): 58, ('Gene', 'INTACT::UBIQUITINATION 
REACTION::Gene:Gene', 'Gene'): 371, ('Gene', 'INTACT::PROTEIN CLEAVAGE::Gene:Gene', 'Gene'): 67, ('Gene', 'STRING::REACTION::Gene:Gene', 'Gene'): 400426, ('Gene', 'STRING::CATALYSIS::Gene:Gene', 'Gene'): 343533, ('Gene', 'STRING::ACTIVATION::Gene:Gene', 'Gene'): 81355, ('Gene', 'STRING::INHIBITION::Gene:Gene', 'Gene'): 28959, ('Gene', 'STRING::OTHER::Gene:Gene', 'Gene'): 310690, ('Gene', 'STRING::BINDING::Gene:Gene', 'Gene'): 315875, ('Gene', 'STRING::PTMOD::Gene:Gene', 'Gene'): 15113, ('Gene', 'STRING::EXPRESSION::Gene:Gene', 'Gene'): 757},\n",
307 | " metagraph=[('Anatomy', 'Gene'), ('Anatomy', 'Gene'), ('Anatomy', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Tax'), ('Gene', 'Biological Process'), ('Gene', 'Molecular Function'), ('Gene', 'Cellular Component'), ('Gene', 'Pathway'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Atc'), ('Compound', 'Compound'), ('Compound', 'Compound'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), 
('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Side Effect'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Symptom'), ('Disease', 'Anatomy'), ('Disease', 'Disease'), ('Pharmacologic Class', 'Compound')])\n"
308 | ]
309 | }
310 | ],
311 | "source": [
312 | "print(graph)"
313 | ]
314 | },
315 | {
316 | "cell_type": "code",
317 | "execution_count": null,
318 | "metadata": {},
319 | "outputs": [],
320 | "source": []
321 | }
322 | ],
323 | "metadata": {
324 | "kernelspec": {
325 | "display_name": "Python 3",
326 | "language": "python",
327 | "name": "python3"
328 | },
329 | "language_info": {
330 | "codemirror_mode": {
331 | "name": "ipython",
332 | "version": 3
333 | },
334 | "file_extension": ".py",
335 | "mimetype": "text/x-python",
336 | "name": "python",
337 | "nbconvert_exporter": "python",
338 | "pygments_lexer": "ipython3",
339 | "version": "3.6.9"
340 | }
341 | },
342 | "nbformat": 4,
343 | "nbformat_minor": 2
344 | }
345 |
--------------------------------------------------------------------------------
/drkg_with_dgl/Readme.md:
--------------------------------------------------------------------------------
1 | # Query Basic DRKG Graph Information with DGL
2 | This folder shows how to load the DRKG dataset into a DGL heterograph and gives several example queries for inspecting the graph structure of DRKG in [loading_drkg_in_dgl.ipynb](loading_drkg_in_dgl.ipynb).
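
The dictionary-building step that the notebook performs before creating the heterograph can be sketched on a few toy triplets. This is a minimal illustration, not the notebook's exact code: the function name `build_dictionaries` and the toy entity and relation names are hypothetical, and the only assumption about the data is that each entity string starts with its node type followed by `::`, as in the notebook. It needs only the Python standard library.

```python
from collections import defaultdict


def build_dictionaries(triplets):
    """Map (head, relation, tail) triplets to per-type node IDs and per-relation edge lists."""
    entity_ids = defaultdict(dict)  # node type -> {entity name -> integer ID}
    edges = defaultdict(list)       # (src type, relation, dst type) -> [(src ID, dst ID), ...]

    def node_id(entity):
        # The node type is the prefix before the first '::'.
        ntype = entity.split('::')[0]
        ids = entity_ids[ntype]
        # First sighting gets the next free ID within its node type; later sightings reuse it.
        return ntype, ids.setdefault(entity, len(ids))

    for head, relation, tail in triplets:
        src_type, src_id = node_id(head)
        dst_type, dst_id = node_id(tail)
        edges[(src_type, relation, dst_type)].append((src_id, dst_id))
    return dict(entity_ids), dict(edges)


# Hypothetical toy triplets written in the DRKG naming style (not taken from drkg.tsv).
toy_triplets = [
    ("Compound::C1", "Toy::binds::Compound:Gene", "Gene::G1"),
    ("Compound::C2", "Toy::binds::Compound:Gene", "Gene::G1"),
    ("Gene::G1", "Toy::interacts::Gene:Gene", "Gene::G2"),
]
entity_ids, edges = build_dictionaries(toy_triplets)
for etype, pairs in edges.items():
    print(etype, len(pairs))
```

In the notebook, a dictionary of this shape (keyed by `(source type, relation, destination type)`) is what gets passed to `dgl.heterograph`. Node IDs are local to each node type, so `Compound` and `Gene` both start counting from 0.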
--------------------------------------------------------------------------------
/drkg_with_dgl/loading_drkg_in_dgl.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Working with DRKG in Deep Graph Library (DGL)\n",
8 | "This notebook builds a heterograph from DRKG in DGL and shows some example queries on that heterograph. For more information about using DGL, please refer to https://www.dgl.ai/"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "metadata": {},
15 | "outputs": [
16 | {
17 | "name": "stderr",
18 | "output_type": "stream",
19 | "text": [
20 | "Using backend: pytorch\n"
21 | ]
22 | }
23 | ],
24 | "source": [
25 | "import pandas as pd\n",
26 | "import numpy as np\n",
27 | "import dgl\n",
28 | "import sys\n",
29 | "sys.path.insert(1, '../utils')\n",
30 | "from utils import download_and_extract\n",
31 | "download_and_extract()\n",
32 | "drkg_file = '../data/drkg/drkg.tsv'\n",
33 | "df = pd.read_csv(drkg_file, sep =\"\\t\", header=None)\n",
34 | "triplets = df.values.tolist()"
35 | ]
36 | },
37 | {
38 | "cell_type": "markdown",
39 | "metadata": {},
40 | "source": [
41 | "Assign an integer ID to each node (entity): build a dictionary keyed by node type, where each value is itself a dictionary mapping a node to its ID. IDs are assigned per node type, starting from 0."
42 | ]
43 | },
44 | {
45 | "cell_type": "code",
46 | "execution_count": 2,
47 | "metadata": {},
48 | "outputs": [],
49 | "source": [
50 | "entity_dictionary = {}\n",
51 | "def insert_entry(entry, ent_type, dic):\n",
52 | "    if ent_type not in dic:\n",
53 | "        dic[ent_type] = {}\n",
54 | "    ent_n_id = len(dic[ent_type])\n",
55 | "    if entry not in dic[ent_type]:\n",
56 | "        dic[ent_type][entry] = ent_n_id\n",
57 | "    return dic\n",
58 | "\n",
59 | "for triple in triplets:\n",
60 | "    src = triple[0]\n",
61 | "    split_src = src.split('::')\n",
62 | "    src_type = split_src[0]\n",
63 | "    dest = triple[2]\n",
64 | "    split_dest = dest.split('::')\n",
65 | "    dest_type = split_dest[0]\n",
66 | "    insert_entry(src, src_type, entity_dictionary)\n",
67 | "    insert_entry(dest, dest_type, entity_dictionary)"
68 | ]
69 | },
70 | {
71 | "cell_type": "markdown",
72 | "metadata": {},
73 | "source": [
74 | "Create a dictionary of relations: the key is the (source node type, relation, destination node type) tuple and the value is the list of (source node ID, destination node ID) tuples."
75 | ]
76 | },
77 | {
78 | "cell_type": "code",
79 | "execution_count": 3,
80 | "metadata": {},
81 | "outputs": [],
82 | "source": [
83 | "edge_dictionary = {}\n",
84 | "for triple in triplets:\n",
85 | "    src = triple[0]\n",
86 | "    split_src = src.split('::')\n",
87 | "    src_type = split_src[0]\n",
88 | "    dest = triple[2]\n",
89 | "    split_dest = dest.split('::')\n",
90 | "    dest_type = split_dest[0]\n",
91 | "\n",
92 | "    src_int_id = entity_dictionary[src_type][src]\n",
93 | "    dest_int_id = entity_dictionary[dest_type][dest]\n",
94 | "\n",
95 | "    pair = (src_int_id, dest_int_id)\n",
96 | "    etype = (src_type, triple[1], dest_type)\n",
97 | "    if etype in edge_dictionary:\n",
98 | "        edge_dictionary[etype] += [pair]\n",
99 | "    else:\n",
100 | "        edge_dictionary[etype] = [pair]"
101 | ]
102 | },
103 | {
104 | "cell_type": "markdown",
105 | "metadata": {},
106 | "source": [
107 | "## Create a DGL heterograph using the dictionary of relations"
108 | ]
109 | },
110 | {
111 | "cell_type": "code",
112 | "execution_count": 4,
113 | "metadata": {},
114 | "outputs": [],
115 | "source": [
116 | "graph = dgl.heterograph(edge_dictionary)"
117 | ]
118 | },
119 | {
120 | "cell_type": "markdown",
121 | "metadata": {},
122 | "source": [
123 | "## Print the statistics of the created graph"
124 | ]
125 | },
126 | {
127 | "cell_type": "markdown",
128 | "metadata": {},
129 | "source": [
130 | "Number of nodes for each node-type"
131 | ]
132 | },
133 | {
134 | "cell_type": "code",
135 | "execution_count": 5,
136 | "metadata": {},
137 | "outputs": [
138 | {
139 | "name": "stdout",
140 | "output_type": "stream",
141 | "text": [
142 | "Anatomy \t 400\n",
143 | "Atc \t 4048\n",
144 | "Biological Process \t 11381\n",
145 | "Cellular Component \t 1391\n",
146 | "Compound \t 24313\n",
147 | "Disease \t 5103\n",
148 | "Gene \t 39220\n",
149 | "Molecular Function \t 2884\n",
150 | "Pathway \t 1822\n",
151 | "Pharmacologic Class \t 345\n",
152 | "Side Effect \t 5701\n",
153 | "Symptom \t 415\n",
154 | "Tax \t 215\n",
155 | "Graph contains 97238 nodes from 13 node-types.\n"
156 | ]
157 | }
158 | ],
159 | "source": [
160 | "total_nodes = 0\n",
161 | "for ntype in graph.ntypes:\n",
162 | "    print(ntype, '\\t', graph.number_of_nodes(ntype))\n",
163 | "    total_nodes += graph.number_of_nodes(ntype)\n",
164 | "print(\"Graph contains {} nodes from {} node-types.\".format(total_nodes, len(graph.ntypes)))"
165 | ]
166 | },
167 | {
168 | "cell_type": "markdown",
169 | "metadata": {},
170 | "source": [
171 | "Number of edges for each relation (edge-type)"
172 | ]
173 | },
174 | {
175 | "cell_type": "code",
176 | "execution_count": 6,
177 | "metadata": {},
178 | "outputs": [
179 | {
180 | "name": "stdout",
181 | "output_type": "stream",
182 | "text": [
183 | "bioarx::HumGenHumGen:Gene:Gene \t 58094\n",
184 | "bioarx::VirGenHumGen:Gene:Gene \t 535\n",
185 | "bioarx::DrugVirGen:Compound:Gene \t 1165\n",
186 | "bioarx::DrugHumGen:Compound:Gene \t 24501\n",
187 | "bioarx::Covid2_acc_host_gene::Disease:Gene \t 332\n",
188 | "bioarx::Coronavirus_ass_host_gene::Disease:Gene \t 129\n",
189 | "DGIDB::INHIBITOR::Gene:Compound \t 5971\n",
190 | "DGIDB::ANTAGONIST::Gene:Compound \t 3006\n",
191 | "DGIDB::OTHER::Gene:Compound \t 11070\n",
192 | "DGIDB::AGONIST::Gene:Compound \t 3012\n",
193 | "DGIDB::BINDER::Gene:Compound \t 143\n",
194 | "DGIDB::MODULATOR::Gene:Compound \t 243\n",
195 | "DGIDB::BLOCKER::Gene:Compound \t 979\n",
196 | "DGIDB::CHANNEL BLOCKER::Gene:Compound \t 352\n",
197 | "DGIDB::ANTIBODY::Gene:Compound \t 188\n",
198 | "DGIDB::POSITIVE ALLOSTERIC MODULATOR::Gene:Compound \t 618\n",
199 | "DGIDB::ALLOSTERIC MODULATOR::Gene:Compound \t 317\n",
200 | "DGIDB::ACTIVATOR::Gene:Compound \t 316\n",
201 | "DGIDB::PARTIAL AGONIST::Gene:Compound \t 75\n",
202 | "DRUGBANK::x-atc::Compound:Atc \t 15750\n",
203 | "DRUGBANK::ddi-interactor-in::Compound:Compound \t 1379271\n",
204 | "DRUGBANK::target::Compound:Gene \t 19158\n",
205 | "DRUGBANK::enzyme::Compound:Gene \t 4923\n",
206 | "DRUGBANK::carrier::Compound:Gene \t 720\n",
207 | "DRUGBANK::treats::Compound:Disease \t 4968\n",
208 | "GNBR::E::Compound:Gene \t 32743\n",
209 | "GNBR::A+::Compound:Gene \t 1568\n",
210 | "GNBR::N::Compound:Gene \t 12521\n",
211 | "GNBR::K::Compound:Gene \t 12411\n",
212 | "GNBR::A-::Compound:Gene \t 1108\n",
213 | "GNBR::E+::Compound:Gene \t 1970\n",
214 | "GNBR::B::Compound:Gene \t 7170\n",
215 | "GNBR::E-::Compound:Gene \t 2918\n",
216 | "GNBR::O::Compound:Gene \t 5573\n",
217 | "GNBR::Z::Compound:Gene \t 2821\n",
218 | "GNBR::T::Compound:Disease \t 54020\n",
219 | "GNBR::C::Compound:Disease \t 1739\n",
220 | "GNBR::Sa::Compound:Disease \t 16923\n",
221 | "GNBR::Pa::Compound:Disease \t 2619\n",
222 | "GNBR::Mp::Compound:Disease \t 495\n",
223 | "GNBR::Pr::Compound:Disease \t 966\n",
224 | "GNBR::J::Compound:Disease \t 1020\n",
225 | "GNBR::L::Gene:Disease \t 48384\n",
226 | "GNBR::U::Gene:Disease \t 6432\n",
227 | "GNBR::Y::Gene:Disease \t 1948\n",
228 | "GNBR::J::Gene:Disease \t 30234\n",
229 | "GNBR::Te::Gene:Disease \t 2836\n",
230 | "GNBR::Md::Gene:Disease \t 1279\n",
231 | "GNBR::G::Gene:Disease \t 2055\n",
232 | "GNBR::D::Gene:Disease \t 500\n",
233 | "GNBR::X::Gene:Disease \t 1324\n",
234 | "GNBR::Ud::Gene:Disease \t 407\n",
235 | "GNBR::V+::Gene:Gene \t 8689\n",
236 | "GNBR::Q::Gene:Gene \t 19372\n",
237 | "GNBR::Rg::Gene:Gene \t 11018\n",
238 | "GNBR::B::Gene:Gene \t 8164\n",
239 | "GNBR::I::Gene:Gene \t 5434\n",
240 | "GNBR::E+::Gene:Gene \t 10838\n",
241 | "GNBR::H::Gene:Gene \t 2509\n",
242 | "GNBR::W::Gene:Gene \t 280\n",
243 | "GNBR::E::Gene:Gene \t 418\n",
244 | "GNBR::in_tax::Gene:Tax \t 14663\n",
245 | "Hetionet::GpBP::Gene:Biological Process \t 559504\n",
246 | "Hetionet::GiG::Gene:Gene \t 147164\n",
247 | "Hetionet::CrC::Compound:Compound \t 6486\n",
248 | "Hetionet::DdG::Disease:Gene \t 7623\n",
249 | "Hetionet::DpS::Disease:Symptom \t 3357\n",
250 | "Hetionet::DlA::Disease:Anatomy \t 3602\n",
251 | "Hetionet::CtD::Compound:Disease \t 755\n",
252 | "Hetionet::CbG::Compound:Gene \t 11571\n",
253 | "Hetionet::CuG::Compound:Gene \t 18756\n",
254 | "Hetionet::DrD::Disease:Disease \t 543\n",
255 | "Hetionet::DaG::Disease:Gene \t 12623\n",
256 | "Hetionet::CpD::Compound:Disease \t 390\n",
257 | "Hetionet::AdG::Anatomy:Gene \t 102240\n",
258 | "Hetionet::AuG::Anatomy:Gene \t 97848\n",
259 | "Hetionet::GcG::Gene:Gene \t 61690\n",
260 | "Hetionet::GpMF::Gene:Molecular Function \t 97222\n",
261 | "Hetionet::PCiC::Pharmacologic Class:Compound \t 1029\n",
262 | "Hetionet::GpCC::Gene:Cellular Component \t 73566\n",
263 | "Hetionet::Gr>G::Gene:Gene \t 265672\n",
264 | "Hetionet::CdG::Compound:Gene \t 21102\n",
265 | "Hetionet::DuG::Disease:Gene \t 7731\n",
266 | "Hetionet::GpPW::Gene:Pathway \t 84372\n",
267 | "Hetionet::CcSE::Compound:Side Effect \t 138944\n",
268 | "Hetionet::AeG::Anatomy:Gene \t 526407\n",
269 | "INTACT::PHYSICAL ASSOCIATION::Compound:Gene \t 203\n",
270 | "INTACT::DIRECT INTERACTION::Compound:Gene \t 155\n",
271 | "INTACT::ASSOCIATION::Compound:Gene \t 1447\n",
272 | "INTACT::ASSOCIATION::Gene:Gene \t 112390\n",
273 | "INTACT::PHYSICAL ASSOCIATION::Gene:Gene \t 129318\n",
274 | "INTACT::COLOCALIZATION::Gene:Gene \t 3468\n",
275 | "INTACT::DEPHOSPHORYLATION REACTION::Gene:Gene \t 303\n",
276 | "INTACT::CLEAVAGE REACTION::Gene:Gene \t 93\n",
277 | "INTACT::DIRECT INTERACTION::Gene:Gene \t 6950\n",
278 | "INTACT::PHOSPHORYLATION REACTION::Gene:Gene \t 1328\n",
279 | "INTACT::ADP RIBOSYLATION REACTION::Gene:Gene \t 58\n",
280 | "INTACT::UBIQUITINATION REACTION::Gene:Gene \t 371\n",
281 | "INTACT::PROTEIN CLEAVAGE::Gene:Gene \t 67\n",
282 | "STRING::REACTION::Gene:Gene \t 400426\n",
283 | "STRING::CATALYSIS::Gene:Gene \t 343533\n",
284 | "STRING::ACTIVATION::Gene:Gene \t 81355\n",
285 | "STRING::INHIBITION::Gene:Gene \t 28959\n",
286 | "STRING::OTHER::Gene:Gene \t 310690\n",
287 | "STRING::BINDING::Gene:Gene \t 315875\n",
288 | "STRING::PTMOD::Gene:Gene \t 15113\n",
289 | "STRING::EXPRESSION::Gene:Gene \t 757\n",
290 | "Graph contains 5874261 edges from 107 edge-types.\n"
291 | ]
292 | }
293 | ],
294 | "source": [
295 | "total_edges = 0\n",
296 | "for etype in graph.etypes:\n",
297 | "    print(etype, '\\t', graph.number_of_edges(etype))\n",
298 | "    total_edges += graph.number_of_edges(etype)\n",
299 | "print(\"Graph contains {} edges from {} edge-types.\".format(total_edges, len(graph.etypes)))"
300 | ]
301 | },
302 | {
303 | "cell_type": "markdown",
304 | "metadata": {},
305 | "source": [
306 | "Printing the graph directly with `print(graph)` also shows a summary of the graph."
307 | ]
308 | },
309 | {
310 | "cell_type": "code",
311 | "execution_count": 7,
312 | "metadata": {},
313 | "outputs": [
314 | {
315 | "name": "stdout",
316 | "output_type": "stream",
317 | "text": [
318 | "Graph(num_nodes={'Anatomy': 400, 'Atc': 4048, 'Biological Process': 11381, 'Cellular Component': 1391, 'Compound': 24313, 'Disease': 5103, 'Gene': 39220, 'Molecular Function': 2884, 'Pathway': 1822, 'Pharmacologic Class': 345, 'Side Effect': 5701, 'Symptom': 415, 'Tax': 215},\n",
319 | " num_edges={('Gene', 'bioarx::HumGenHumGen:Gene:Gene', 'Gene'): 58094, ('Gene', 'bioarx::VirGenHumGen:Gene:Gene', 'Gene'): 535, ('Compound', 'bioarx::DrugVirGen:Compound:Gene', 'Gene'): 1165, ('Compound', 'bioarx::DrugHumGen:Compound:Gene', 'Gene'): 24501, ('Disease', 'bioarx::Covid2_acc_host_gene::Disease:Gene', 'Gene'): 332, ('Disease', 'bioarx::Coronavirus_ass_host_gene::Disease:Gene', 'Gene'): 129, ('Gene', 'DGIDB::INHIBITOR::Gene:Compound', 'Compound'): 5971, ('Gene', 'DGIDB::ANTAGONIST::Gene:Compound', 'Compound'): 3006, ('Gene', 'DGIDB::OTHER::Gene:Compound', 'Compound'): 11070, ('Gene', 'DGIDB::AGONIST::Gene:Compound', 'Compound'): 3012, ('Gene', 'DGIDB::BINDER::Gene:Compound', 'Compound'): 143, ('Gene', 'DGIDB::MODULATOR::Gene:Compound', 'Compound'): 243, ('Gene', 'DGIDB::BLOCKER::Gene:Compound', 'Compound'): 979, ('Gene', 'DGIDB::CHANNEL BLOCKER::Gene:Compound', 'Compound'): 352, ('Gene', 'DGIDB::ANTIBODY::Gene:Compound', 'Compound'): 188, ('Gene', 'DGIDB::POSITIVE ALLOSTERIC MODULATOR::Gene:Compound', 'Compound'): 618, ('Gene', 'DGIDB::ALLOSTERIC MODULATOR::Gene:Compound', 'Compound'): 317, ('Gene', 'DGIDB::ACTIVATOR::Gene:Compound', 'Compound'): 316, ('Gene', 'DGIDB::PARTIAL AGONIST::Gene:Compound', 'Compound'): 75, ('Compound', 'DRUGBANK::x-atc::Compound:Atc', 'Atc'): 15750, ('Compound', 'DRUGBANK::ddi-interactor-in::Compound:Compound', 'Compound'): 1379271, ('Compound', 'DRUGBANK::target::Compound:Gene', 'Gene'): 19158, ('Compound', 'DRUGBANK::enzyme::Compound:Gene', 'Gene'): 4923, ('Compound', 'DRUGBANK::carrier::Compound:Gene', 'Gene'): 720, ('Compound', 'DRUGBANK::treats::Compound:Disease', 'Disease'): 4968, ('Compound', 'GNBR::E::Compound:Gene', 'Gene'): 32743, ('Compound', 'GNBR::A+::Compound:Gene', 'Gene'): 1568, ('Compound', 'GNBR::N::Compound:Gene', 'Gene'): 12521, ('Compound', 'GNBR::K::Compound:Gene', 'Gene'): 12411, ('Compound', 'GNBR::A-::Compound:Gene', 'Gene'): 1108, ('Compound', 'GNBR::E+::Compound:Gene', 'Gene'): 1970, 
('Compound', 'GNBR::B::Compound:Gene', 'Gene'): 7170, ('Compound', 'GNBR::E-::Compound:Gene', 'Gene'): 2918, ('Compound', 'GNBR::O::Compound:Gene', 'Gene'): 5573, ('Compound', 'GNBR::Z::Compound:Gene', 'Gene'): 2821, ('Compound', 'GNBR::T::Compound:Disease', 'Disease'): 54020, ('Compound', 'GNBR::C::Compound:Disease', 'Disease'): 1739, ('Compound', 'GNBR::Sa::Compound:Disease', 'Disease'): 16923, ('Compound', 'GNBR::Pa::Compound:Disease', 'Disease'): 2619, ('Compound', 'GNBR::Mp::Compound:Disease', 'Disease'): 495, ('Compound', 'GNBR::Pr::Compound:Disease', 'Disease'): 966, ('Compound', 'GNBR::J::Compound:Disease', 'Disease'): 1020, ('Gene', 'GNBR::L::Gene:Disease', 'Disease'): 48384, ('Gene', 'GNBR::U::Gene:Disease', 'Disease'): 6432, ('Gene', 'GNBR::Y::Gene:Disease', 'Disease'): 1948, ('Gene', 'GNBR::J::Gene:Disease', 'Disease'): 30234, ('Gene', 'GNBR::Te::Gene:Disease', 'Disease'): 2836, ('Gene', 'GNBR::Md::Gene:Disease', 'Disease'): 1279, ('Gene', 'GNBR::G::Gene:Disease', 'Disease'): 2055, ('Gene', 'GNBR::D::Gene:Disease', 'Disease'): 500, ('Gene', 'GNBR::X::Gene:Disease', 'Disease'): 1324, ('Gene', 'GNBR::Ud::Gene:Disease', 'Disease'): 407, ('Gene', 'GNBR::V+::Gene:Gene', 'Gene'): 8689, ('Gene', 'GNBR::Q::Gene:Gene', 'Gene'): 19372, ('Gene', 'GNBR::Rg::Gene:Gene', 'Gene'): 11018, ('Gene', 'GNBR::B::Gene:Gene', 'Gene'): 8164, ('Gene', 'GNBR::I::Gene:Gene', 'Gene'): 5434, ('Gene', 'GNBR::E+::Gene:Gene', 'Gene'): 10838, ('Gene', 'GNBR::H::Gene:Gene', 'Gene'): 2509, ('Gene', 'GNBR::W::Gene:Gene', 'Gene'): 280, ('Gene', 'GNBR::E::Gene:Gene', 'Gene'): 418, ('Gene', 'GNBR::in_tax::Gene:Tax', 'Tax'): 14663, ('Gene', 'Hetionet::GpBP::Gene:Biological Process', 'Biological Process'): 559504, ('Gene', 'Hetionet::GiG::Gene:Gene', 'Gene'): 147164, ('Compound', 'Hetionet::CrC::Compound:Compound', 'Compound'): 6486, ('Disease', 'Hetionet::DdG::Disease:Gene', 'Gene'): 7623, ('Disease', 'Hetionet::DpS::Disease:Symptom', 'Symptom'): 3357, ('Disease', 
'Hetionet::DlA::Disease:Anatomy', 'Anatomy'): 3602, ('Compound', 'Hetionet::CtD::Compound:Disease', 'Disease'): 755, ('Compound', 'Hetionet::CbG::Compound:Gene', 'Gene'): 11571, ('Compound', 'Hetionet::CuG::Compound:Gene', 'Gene'): 18756, ('Disease', 'Hetionet::DrD::Disease:Disease', 'Disease'): 543, ('Disease', 'Hetionet::DaG::Disease:Gene', 'Gene'): 12623, ('Compound', 'Hetionet::CpD::Compound:Disease', 'Disease'): 390, ('Anatomy', 'Hetionet::AdG::Anatomy:Gene', 'Gene'): 102240, ('Anatomy', 'Hetionet::AuG::Anatomy:Gene', 'Gene'): 97848, ('Gene', 'Hetionet::GcG::Gene:Gene', 'Gene'): 61690, ('Gene', 'Hetionet::GpMF::Gene:Molecular Function', 'Molecular Function'): 97222, ('Pharmacologic Class', 'Hetionet::PCiC::Pharmacologic Class:Compound', 'Compound'): 1029, ('Gene', 'Hetionet::GpCC::Gene:Cellular Component', 'Cellular Component'): 73566, ('Gene', 'Hetionet::Gr>G::Gene:Gene', 'Gene'): 265672, ('Compound', 'Hetionet::CdG::Compound:Gene', 'Gene'): 21102, ('Disease', 'Hetionet::DuG::Disease:Gene', 'Gene'): 7731, ('Gene', 'Hetionet::GpPW::Gene:Pathway', 'Pathway'): 84372, ('Compound', 'Hetionet::CcSE::Compound:Side Effect', 'Side Effect'): 138944, ('Anatomy', 'Hetionet::AeG::Anatomy:Gene', 'Gene'): 526407, ('Compound', 'INTACT::PHYSICAL ASSOCIATION::Compound:Gene', 'Gene'): 203, ('Compound', 'INTACT::DIRECT INTERACTION::Compound:Gene', 'Gene'): 155, ('Compound', 'INTACT::ASSOCIATION::Compound:Gene', 'Gene'): 1447, ('Gene', 'INTACT::ASSOCIATION::Gene:Gene', 'Gene'): 112390, ('Gene', 'INTACT::PHYSICAL ASSOCIATION::Gene:Gene', 'Gene'): 129318, ('Gene', 'INTACT::COLOCALIZATION::Gene:Gene', 'Gene'): 3468, ('Gene', 'INTACT::DEPHOSPHORYLATION REACTION::Gene:Gene', 'Gene'): 303, ('Gene', 'INTACT::CLEAVAGE REACTION::Gene:Gene', 'Gene'): 93, ('Gene', 'INTACT::DIRECT INTERACTION::Gene:Gene', 'Gene'): 6950, ('Gene', 'INTACT::PHOSPHORYLATION REACTION::Gene:Gene', 'Gene'): 1328, ('Gene', 'INTACT::ADP RIBOSYLATION REACTION::Gene:Gene', 'Gene'): 58, ('Gene', 'INTACT::UBIQUITINATION 
REACTION::Gene:Gene', 'Gene'): 371, ('Gene', 'INTACT::PROTEIN CLEAVAGE::Gene:Gene', 'Gene'): 67, ('Gene', 'STRING::REACTION::Gene:Gene', 'Gene'): 400426, ('Gene', 'STRING::CATALYSIS::Gene:Gene', 'Gene'): 343533, ('Gene', 'STRING::ACTIVATION::Gene:Gene', 'Gene'): 81355, ('Gene', 'STRING::INHIBITION::Gene:Gene', 'Gene'): 28959, ('Gene', 'STRING::OTHER::Gene:Gene', 'Gene'): 310690, ('Gene', 'STRING::BINDING::Gene:Gene', 'Gene'): 315875, ('Gene', 'STRING::PTMOD::Gene:Gene', 'Gene'): 15113, ('Gene', 'STRING::EXPRESSION::Gene:Gene', 'Gene'): 757},\n",
320 | " metagraph=[('Anatomy', 'Gene'), ('Anatomy', 'Gene'), ('Anatomy', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Gene'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Compound'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Disease'), ('Gene', 'Tax'), ('Gene', 'Biological Process'), ('Gene', 'Molecular Function'), ('Gene', 'Cellular Component'), ('Gene', 'Pathway'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Gene'), ('Compound', 'Atc'), ('Compound', 'Compound'), ('Compound', 'Compound'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), 
('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Disease'), ('Compound', 'Side Effect'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Gene'), ('Disease', 'Symptom'), ('Disease', 'Anatomy'), ('Disease', 'Disease'), ('Pharmacologic Class', 'Compound')])\n"
321 | ]
322 | }
323 | ],
324 | "source": [
325 | "print(graph)"
326 | ]
327 | },
328 | {
329 | "cell_type": "code",
330 | "execution_count": null,
331 | "metadata": {},
332 | "outputs": [],
333 | "source": []
334 | }
335 | ],
336 | "metadata": {
337 | "kernelspec": {
338 | "display_name": "Python 3",
339 | "language": "python",
340 | "name": "python3"
341 | },
342 | "language_info": {
343 | "codemirror_mode": {
344 | "name": "ipython",
345 | "version": 3
346 | },
347 | "file_extension": ".py",
348 | "mimetype": "text/x-python",
349 | "name": "python",
350 | "nbconvert_exporter": "python",
351 | "pygments_lexer": "ipython3",
352 | "version": "3.7.6"
353 | }
354 | },
355 | "nbformat": 4,
356 | "nbformat_minor": 2
357 | }
358 |
--------------------------------------------------------------------------------
/drug_repurpose/COVID-19_drug_repurposing.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# COVID-19 Drug Repurposing via Disease-Compound Relations\n",
8 | "This example shows how to do drug repurposing with DRKG using only the pretrained model."
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "## Collecting COVID-19 related diseases\n",
16 | "First, we collect the list of coronavirus (COV) diseases in DRKG, identified by the Disease IDs that DRKG uses to encode them. Here we take all of the COV diseases as targets."
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": 1,
22 | "metadata": {},
23 | "outputs": [],
24 | "source": [
25 | "COV_disease_list = [\n",
26 | "'Disease::SARS-CoV2 E',\n",
27 | "'Disease::SARS-CoV2 M',\n",
28 | "'Disease::SARS-CoV2 N',\n",
29 | "'Disease::SARS-CoV2 Spike',\n",
30 | "'Disease::SARS-CoV2 nsp1',\n",
31 | "'Disease::SARS-CoV2 nsp10',\n",
32 | "'Disease::SARS-CoV2 nsp11',\n",
33 | "'Disease::SARS-CoV2 nsp12',\n",
34 | "'Disease::SARS-CoV2 nsp13',\n",
35 | "'Disease::SARS-CoV2 nsp14',\n",
36 | "'Disease::SARS-CoV2 nsp15',\n",
37 | "'Disease::SARS-CoV2 nsp2',\n",
38 | "'Disease::SARS-CoV2 nsp4',\n",
39 | "'Disease::SARS-CoV2 nsp5',\n",
40 | "'Disease::SARS-CoV2 nsp5_C145A',\n",
41 | "'Disease::SARS-CoV2 nsp6',\n",
42 | "'Disease::SARS-CoV2 nsp7',\n",
43 | "'Disease::SARS-CoV2 nsp8',\n",
44 | "'Disease::SARS-CoV2 nsp9',\n",
45 | "'Disease::SARS-CoV2 orf10',\n",
46 | "'Disease::SARS-CoV2 orf3a',\n",
47 | "'Disease::SARS-CoV2 orf3b',\n",
48 | "'Disease::SARS-CoV2 orf6',\n",
49 | "'Disease::SARS-CoV2 orf7a',\n",
50 | "'Disease::SARS-CoV2 orf8',\n",
51 | "'Disease::SARS-CoV2 orf9b',\n",
52 | "'Disease::SARS-CoV2 orf9c',\n",
53 | "'Disease::MESH:D045169',\n",
54 | "'Disease::MESH:D045473',\n",
55 | "'Disease::MESH:D001351',\n",
56 | "'Disease::MESH:D065207',\n",
57 | "'Disease::MESH:D028941',\n",
58 | "'Disease::MESH:D058957',\n",
59 | "'Disease::MESH:D006517'\n",
60 | "]"
61 | ]
62 | },
63 | {
64 | "cell_type": "markdown",
65 | "metadata": {},
66 | "source": [
67 | "## Candidate drugs\n",
68 | "We use FDA-approved drugs in DrugBank as candidate drugs, excluding drugs with molecular weight < 250. The drug list is in infer\\_drug.tsv."
69 | ]
70 | },
71 | {
72 | "cell_type": "code",
73 | "execution_count": 2,
74 | "metadata": {},
75 | "outputs": [],
76 | "source": [
77 | "import csv\n",
78 | "\n",
79 | "# Load the candidate drug list\n",
80 | "drug_list = []\n",
81 | "with open(\"./infer_drug.tsv\", newline='', encoding='utf-8') as csvfile:\n",
82 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['drug','ids'])\n",
83 | "    for row_val in reader:\n",
84 | "        drug_list.append(row_val['drug'])"
85 | ]
86 | },
87 | {
88 | "cell_type": "code",
89 | "execution_count": 3,
90 | "metadata": {},
91 | "outputs": [
92 | {
93 | "data": {
94 | "text/plain": [
95 | "8104"
96 | ]
97 | },
98 | "execution_count": 3,
99 | "metadata": {},
100 | "output_type": "execute_result"
101 | }
102 | ],
103 | "source": [
104 | "len(drug_list)"
105 | ]
106 | },
107 | {
108 | "cell_type": "markdown",
109 | "metadata": {},
110 | "source": [
111 | "## Treatment relation"
112 | ]
113 | },
114 | {
115 | "cell_type": "markdown",
116 | "metadata": {},
117 | "source": [
118 | "We consider two treatment relations in this context, one from Hetionet and one from GNBR."
119 | ]
120 | },
121 | {
122 | "cell_type": "code",
123 | "execution_count": 4,
124 | "metadata": {},
125 | "outputs": [],
126 | "source": [
127 | "treatment = ['Hetionet::CtD::Compound:Disease','GNBR::T::Compound:Disease']"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "## Get pretrained model\n",
135 | "We can directly use the pretrained model to do drug repurposing."
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": 5,
141 | "metadata": {},
142 | "outputs": [
143 | {
144 | "name": "stdout",
145 | "output_type": "stream",
146 | "text": [
147 | "Download finished. Unzipping the file...\n"
148 | ]
149 | }
150 | ],
151 | "source": [
152 | "import pandas as pd\n",
153 | "import numpy as np\n",
154 | "import sys\n",
155 | "sys.path.insert(1, '../utils')\n",
156 | "from utils import download_and_extract\n",
157 | "download_and_extract()"
158 | ]
159 | },
160 | {
161 | "cell_type": "code",
162 | "execution_count": 6,
163 | "metadata": {},
164 | "outputs": [],
165 | "source": [
166 | "entity_idmap_file = '../data/drkg/embed/entities.tsv'\n",
167 | "relation_idmap_file = '../data/drkg/embed/relations.tsv'"
168 | ]
169 | },
170 | {
171 | "cell_type": "markdown",
172 | "metadata": {},
173 | "source": [
174 | "## Get embeddings for diseases and drugs"
175 | ]
176 | },
177 | {
178 | "cell_type": "code",
179 | "execution_count": 7,
180 | "metadata": {},
181 | "outputs": [],
182 | "source": [
183 | "# Get drugname/disease name to entity ID mappings\n",
184 | "entity_map = {}\n",
185 | "entity_id_map = {}\n",
186 | "relation_map = {}\n",
187 | "with open(entity_idmap_file, newline='', encoding='utf-8') as csvfile:\n",
188 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['name','id'])\n",
189 | "    for row_val in reader:\n",
190 | "        entity_map[row_val['name']] = int(row_val['id'])\n",
191 | "        entity_id_map[int(row_val['id'])] = row_val['name']\n",
192 | "\n",
193 | "with open(relation_idmap_file, newline='', encoding='utf-8') as csvfile:\n",
194 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['name','id'])\n",
195 | "    for row_val in reader:\n",
196 | "        relation_map[row_val['name']] = int(row_val['id'])\n",
197 | "\n",
198 | "# handle the ID mapping\n",
199 | "drug_ids = []\n",
200 | "disease_ids = []\n",
201 | "for drug in drug_list:\n",
202 | "    drug_ids.append(entity_map[drug])\n",
203 | "\n",
204 | "for disease in COV_disease_list:\n",
205 | "    disease_ids.append(entity_map[disease])\n",
206 | "\n",
207 | "treatment_rid = [relation_map[treat] for treat in treatment]"
208 | ]
209 | },
210 | {
211 | "cell_type": "code",
212 | "execution_count": 8,
213 | "metadata": {},
214 | "outputs": [],
215 | "source": [
216 | "# Load embeddings\n",
217 | "import torch as th\n",
218 | "entity_emb = np.load('../data/drkg/embed/DRKG_TransE_l2_entity.npy')\n",
219 | "rel_emb = np.load('../data/drkg/embed/DRKG_TransE_l2_relation.npy')\n",
220 | "\n",
221 | "drug_ids = th.tensor(drug_ids).long()\n",
222 | "disease_ids = th.tensor(disease_ids).long()\n",
223 | "treatment_rid = th.tensor(treatment_rid)\n",
224 | "\n",
225 | "drug_emb = th.tensor(entity_emb[drug_ids])\n",
226 | "treatment_embs = [th.tensor(rel_emb[rid]) for rid in treatment_rid]"
227 | ]
228 | },
229 | {
230 | "cell_type": "markdown",
231 | "metadata": {},
232 | "source": [
233 | "## Drug Repurposing Based on Edge Score\n",
234 | "We use the following algorithm to calculate the edge score. Note that we apply logsigmoid so that all scores are < 0. The larger the score, the more likely it is that $h$ has relation $r$ with $t$.\n",
235 | "\n",
236 | "$\\mathbf{d} = \\gamma - ||\\mathbf{h}+\\mathbf{r}-\\mathbf{t}||_{2}$\n",
237 | "\n",
238 | "$\\mathbf{score} = \\log\\left(\\frac{1}{1+\\exp(\\mathbf{-d})}\\right)$\n",
239 | "\n",
240 | "When doing drug repurposing, we only use the treatment related relations."
241 | ]
242 | },
243 | {
244 | "cell_type": "code",
245 | "execution_count": 9,
246 | "metadata": {},
247 | "outputs": [],
248 | "source": [
249 | "import torch.nn.functional as fn\n",
250 | "\n",
251 | "gamma=12.0\n",
252 | "def transE_l2(head, rel, tail):\n",
253 | "    score = head + rel - tail\n",
254 | "    return gamma - th.norm(score, p=2, dim=-1)\n",
255 | "\n",
256 | "scores_per_disease = []\n",
257 | "dids = []\n",
258 | "for rid in range(len(treatment_embs)):\n",
259 | "    treatment_emb=treatment_embs[rid]\n",
260 | "    for disease_id in disease_ids:\n",
261 | "        disease_emb = th.tensor(entity_emb[disease_id])\n",
262 | "        score = fn.logsigmoid(transE_l2(drug_emb, treatment_emb, disease_emb))\n",
263 | "        scores_per_disease.append(score)\n",
264 | "        dids.append(drug_ids)\n",
265 | "scores = th.cat(scores_per_disease)\n",
266 | "dids = th.cat(dids)\n"
267 | ]
268 | },
269 | {
270 | "cell_type": "code",
271 | "execution_count": 10,
272 | "metadata": {},
273 | "outputs": [],
274 | "source": [
275 | "# sort scores in descending order\n",
276 | "idx = th.flip(th.argsort(scores), dims=[0])\n",
277 | "scores = scores[idx].numpy()\n",
278 | "dids = dids[idx].numpy()"
279 | ]
280 | },
281 | {
282 | "cell_type": "markdown",
283 | "metadata": {},
284 | "source": [
285 | "### Now we output proposed treatments"
286 | ]
287 | },
288 | {
289 | "cell_type": "code",
290 | "execution_count": 11,
291 | "metadata": {},
292 | "outputs": [],
293 | "source": [
294 | "_, unique_indices = np.unique(dids, return_index=True)\n",
295 | "topk=100\n",
296 | "topk_indices = np.sort(unique_indices)[:topk]\n",
297 | "proposed_dids = dids[topk_indices]\n",
298 | "proposed_scores = scores[topk_indices]"
299 | ]
300 | },
301 | {
302 | "cell_type": "markdown",
303 | "metadata": {},
304 | "source": [
305 | "Now we list the proposed drugs in the form (drug, score). \n",
306 | "\n",
307 | "We select the top K relevant drugs according to the edge score."
308 | ]
309 | },
310 | {
311 | "cell_type": "code",
312 | "execution_count": 12,
313 | "metadata": {},
314 | "outputs": [
315 | {
316 | "name": "stdout",
317 | "output_type": "stream",
318 | "text": [
319 | "Compound::DB00811\t-0.21416784822940826\n",
320 | "Compound::DB00993\t-0.8350892663002014\n",
321 | "Compound::DB00635\t-0.8974801898002625\n",
322 | "Compound::DB01082\t-0.9854875802993774\n",
323 | "Compound::DB01234\t-0.9984006881713867\n",
324 | "Compound::DB00982\t-1.0160722732543945\n",
325 | "Compound::DB00563\t-1.0189464092254639\n",
326 | "Compound::DB00290\t-1.064104437828064\n",
327 | "Compound::DB01394\t-1.080674648284912\n",
328 | "Compound::DB01222\t-1.084547519683838\n",
329 | "Compound::DB00415\t-1.0853980779647827\n",
330 | "Compound::DB01004\t-1.096668004989624\n",
331 | "Compound::DB00860\t-1.1004775762557983\n",
332 | "Compound::DB00681\t-1.1011559963226318\n",
333 | "Compound::DB00688\t-1.125687599182129\n",
334 | "Compound::DB00624\t-1.1428285837173462\n",
335 | "Compound::DB00959\t-1.1618402004241943\n",
336 | "Compound::DB00115\t-1.1868144273757935\n",
337 | "Compound::DB00091\t-1.1906721591949463\n",
338 | "Compound::DB01024\t-1.2051165103912354\n",
339 | "Compound::DB00741\t-1.2147064208984375\n",
340 | "Compound::DB00441\t-1.2320444583892822\n",
341 | "Compound::DB00158\t-1.2346539497375488\n",
342 | "Compound::DB00499\t-1.2525147199630737\n",
343 | "Compound::DB00929\t-1.2730510234832764\n",
344 | "Compound::DB00770\t-1.2825534343719482\n",
345 | "Compound::DB01331\t-1.2960500717163086\n",
346 | "Compound::DB00958\t-1.2967796325683594\n",
347 | "Compound::DB02527\t-1.303438663482666\n",
348 | "Compound::DB00196\t-1.3053392171859741\n",
349 | "Compound::DB00537\t-1.3131829500198364\n",
350 | "Compound::DB00644\t-1.3131871223449707\n",
351 | "Compound::DB01048\t-1.3267226219177246\n",
352 | "Compound::DB00552\t-1.3272088766098022\n",
353 | "Compound::DB00328\t-1.3286101818084717\n",
354 | "Compound::DB00171\t-1.3300385475158691\n",
355 | "Compound::DB01212\t-1.3330755233764648\n",
356 | "Compound::DB09093\t-1.3382999897003174\n",
357 | "Compound::DB00783\t-1.338560938835144\n",
358 | "Compound::DB09341\t-1.3396968841552734\n",
359 | "Compound::DB00558\t-1.3425884246826172\n",
360 | "Compound::DB05382\t-1.3575129508972168\n",
361 | "Compound::DB01112\t-1.3584508895874023\n",
362 | "Compound::DB00515\t-1.3608112335205078\n",
363 | "Compound::DB01101\t-1.381548523902893\n",
364 | "Compound::DB01165\t-1.3838160037994385\n",
365 | "Compound::DB01183\t-1.3862146139144897\n",
366 | "Compound::DB00815\t-1.3863483667373657\n",
367 | "Compound::DB00755\t-1.3881785869598389\n",
368 | "Compound::DB00198\t-1.3885014057159424\n",
369 | "Compound::DB00480\t-1.3935325145721436\n",
370 | "Compound::DB00806\t-1.3996552228927612\n",
371 | "Compound::DB01656\t-1.3999741077423096\n",
372 | "Compound::DB00759\t-1.404650092124939\n",
373 | "Compound::DB00917\t-1.4116020202636719\n",
374 | "Compound::DB01181\t-1.4148889780044556\n",
375 | "Compound::DB01039\t-1.4176580905914307\n",
376 | "Compound::DB00512\t-1.4207379817962646\n",
377 | "Compound::DB01233\t-1.4211887121200562\n",
378 | "Compound::DB11996\t-1.425789475440979\n",
379 | "Compound::DB00738\t-1.4274098873138428\n",
380 | "Compound::DB00716\t-1.4327492713928223\n",
381 | "Compound::DB03461\t-1.437927484512329\n",
382 | "Compound::DB00591\t-1.4404338598251343\n",
383 | "Compound::DB01327\t-1.4408743381500244\n",
384 | "Compound::DB00131\t-1.4446886777877808\n",
385 | "Compound::DB00693\t-1.4460749626159668\n",
386 | "Compound::DB00369\t-1.4505752325057983\n",
387 | "Compound::DB04630\t-1.453115463256836\n",
388 | "Compound::DB00878\t-1.456466555595398\n",
389 | "Compound::DB08818\t-1.4633680582046509\n",
390 | "Compound::DB00682\t-1.4691765308380127\n",
391 | "Compound::DB01068\t-1.4700121879577637\n",
392 | "Compound::DB00446\t-1.4720206260681152\n",
393 | "Compound::DB01115\t-1.4729849100112915\n",
394 | "Compound::DB00355\t-1.4770021438598633\n",
395 | "Compound::DB01030\t-1.485068678855896\n",
396 | "Compound::DB00620\t-1.4973516464233398\n",
397 | "Compound::DB00396\t-1.4976921081542969\n",
398 | "Compound::DB01073\t-1.4987037181854248\n",
399 | "Compound::DB00640\t-1.5026229619979858\n",
400 | "Compound::DB00999\t-1.5034282207489014\n",
401 | "Compound::DB01060\t-1.504364252090454\n",
402 | "Compound::DB00493\t-1.5072362422943115\n",
403 | "Compound::DB01240\t-1.5090957880020142\n",
404 | "Compound::DB00364\t-1.509944200515747\n",
405 | "Compound::DB01263\t-1.511993169784546\n",
406 | "Compound::DB00746\t-1.513066053390503\n",
407 | "Compound::DB00718\t-1.5183149576187134\n",
408 | "Compound::DB01065\t-1.5207160711288452\n",
409 | "Compound::DB01205\t-1.521277904510498\n",
410 | "Compound::DB01137\t-1.5229592323303223\n",
411 | "Compound::DB08894\t-1.5239660739898682\n",
412 | "Compound::DB00813\t-1.5308701992034912\n",
413 | "Compound::DB01157\t-1.5316557884216309\n",
414 | "Compound::DB04570\t-1.5430843830108643\n",
415 | "Compound::DB00459\t-1.5503207445144653\n",
416 | "Compound::DB01752\t-1.5541703701019287\n",
417 | "Compound::DB00775\t-1.5559712648391724\n",
418 | "Compound::DB01610\t-1.5563474893569946\n"
419 | ]
420 | }
421 | ],
422 | "source": [
423 | "for i in range(topk):\n",
424 | "    drug = int(proposed_dids[i])\n",
425 | "    score = proposed_scores[i]\n",
426 | "\n",
427 | "    print(\"{}\\t{}\".format(entity_id_map[drug], score))"
428 | ]
429 | },
430 | {
431 | "cell_type": "markdown",
432 | "metadata": {},
433 | "source": [
434 | "### Check Clinical Trial Drugs\n",
435 | "Six clinical trial drugs appear in the top 100. (Note: Ribavirin exists in DRKG as a treatment for SARS.)"
436 | ]
437 | },
438 | {
439 | "cell_type": "code",
440 | "execution_count": 13,
441 | "metadata": {},
442 | "outputs": [
443 | {
444 | "name": "stdout",
445 | "output_type": "stream",
446 | "text": [
447 | "[0]\tRibavirin\t-0.21416784822940826\n",
448 | "[4]\tDexamethasone\t-0.9984006881713867\n",
449 | "[8]\tColchicine\t-1.080674648284912\n",
450 | "[16]\tMethylprednisolone\t-1.1618402004241943\n",
451 | "[49]\tOseltamivir\t-1.3885014057159424\n",
452 | "[87]\tDeferoxamine\t-1.513066053390503\n"
453 | ]
454 | }
455 | ],
456 | "source": [
457 | "clinical_drugs_file = './COVID19_clinical_trial_drugs.tsv'\n",
458 | "clinical_drug_map = {}\n",
459 | "with open(clinical_drugs_file, newline='', encoding='utf-8') as csvfile:\n",
460 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['id', 'drug_name','drug_id'])\n",
461 | "    for row_val in reader:\n",
462 | "        clinical_drug_map[row_val['drug_id']] = row_val['drug_name']\n",
463 | "\n",
464 | "for i in range(topk):\n",
465 | "    drug = entity_id_map[int(proposed_dids[i])][10:17]  # drop the 'Compound::' prefix, keep the 7-character DrugBank ID\n",
466 | "    if clinical_drug_map.get(drug, None) is not None:\n",
467 | "        score = proposed_scores[i]\n",
468 | "        print(\"[{}]\\t{}\\t{}\".format(i, clinical_drug_map[drug], score))"
469 | ]
470 | },
471 | {
472 | "cell_type": "code",
473 | "execution_count": 14,
474 | "metadata": {},
475 | "outputs": [
476 | {
477 | "data": {
478 | "text/plain": [
479 | "32"
480 | ]
481 | },
482 | "execution_count": 14,
483 | "metadata": {},
484 | "output_type": "execute_result"
485 | }
486 | ],
487 | "source": [
488 | "len(clinical_drug_map)"
489 | ]
490 | },
491 | {
492 | "cell_type": "code",
493 | "execution_count": null,
494 | "metadata": {},
495 | "outputs": [],
496 | "source": []
497 | }
498 | ],
499 | "metadata": {
500 | "kernelspec": {
501 | "display_name": "Python 3",
502 | "language": "python",
503 | "name": "python3"
504 | },
505 | "language_info": {
506 | "codemirror_mode": {
507 | "name": "ipython",
508 | "version": 3
509 | },
510 | "file_extension": ".py",
511 | "mimetype": "text/x-python",
512 | "name": "python",
513 | "nbconvert_exporter": "python",
514 | "pygments_lexer": "ipython3",
515 | "version": "3.7.6"
516 | }
517 | },
518 | "nbformat": 4,
519 | "nbformat_minor": 4
520 | }
521 |
--------------------------------------------------------------------------------
/drug_repurpose/COVID-19_drug_repurposing_via_genes.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# COVID-19 Drug Repurposing via Gene-Compound Relations\n",
8 | "This example shows how to do drug repurposing with DRKG using only the pretrained model."
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "## Collecting COVID-19 related genes\n",
16 | "First, we collect the list of host genes associated with coronavirus (COV) in DRKG."
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": 1,
22 | "metadata": {},
23 | "outputs": [
24 | {
25 | "name": "stdout",
26 | "output_type": "stream",
27 | "text": [
28 | "442\n"
29 | ]
30 | }
31 | ],
32 | "source": [
33 | "import pandas as pd\n",
34 | "import numpy as np\n",
35 | "file='coronavirus-related-host-genes.tsv'\n",
36 | "df = pd.read_csv(file, sep=\"\\t\")\n",
37 | "cov_genes = np.unique(df.values[:,2]).tolist()\n",
38 | "file='covid19-host-genes.tsv'\n",
39 | "df = pd.read_csv(file, sep=\"\\t\")\n",
40 | "cov2_genes = np.unique(df.values[:,2]).tolist()\n",
41 | "# keep unique related genes\n",
42 | "\n",
43 | "cov_related_genes=list(set(cov_genes+cov2_genes))\n",
44 | "#cov_related_genes=list(set(cov2_genes))\n",
45 | "print(len(cov_related_genes))"
46 | ]
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "metadata": {},
51 | "source": [
52 | "## Candidate drugs\n",
53 | "We use FDA-approved drugs in DrugBank as candidate drugs, excluding drugs with molecular weight < 250. The drug list is in infer\\_drug.tsv."
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": 2,
59 | "metadata": {},
60 | "outputs": [],
61 | "source": [
62 | "import csv\n",
63 | "\n",
64 | "# Load the candidate drug list\n",
65 | "drug_list = []\n",
66 | "with open(\"./infer_drug.tsv\", newline='', encoding='utf-8') as csvfile:\n",
67 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['drug','ids'])\n",
68 | "    for row_val in reader:\n",
69 | "        drug_list.append(row_val['drug'])"
70 | ]
71 | },
72 | {
73 | "cell_type": "code",
74 | "execution_count": 3,
75 | "metadata": {},
76 | "outputs": [
77 | {
78 | "data": {
79 | "text/plain": [
80 | "8104"
81 | ]
82 | },
83 | "execution_count": 3,
84 | "metadata": {},
85 | "output_type": "execute_result"
86 | }
87 | ],
88 | "source": [
89 | "len(drug_list)"
90 | ]
91 | },
92 | {
93 | "cell_type": "markdown",
94 | "metadata": {},
95 | "source": [
96 | "## Inhibits relation"
97 | ]
98 | },
99 | {
100 | "cell_type": "markdown",
101 | "metadata": {},
102 | "source": [
103 | "We use one inhibition relation in this context."
104 | ]
105 | },
106 | {
107 | "cell_type": "code",
108 | "execution_count": 4,
109 | "metadata": {},
110 | "outputs": [],
111 | "source": [
112 | "treatment = ['GNBR::N::Compound:Gene']#'DRUGBANK::target::Compound:Gene','DGIDB::INHIBITOR::Gene:Compound']"
113 | ]
114 | },
115 | {
116 | "cell_type": "markdown",
117 | "metadata": {},
118 | "source": [
119 | "## Get pretrained model\n",
120 | "We can directly use the pretrained model to do drug repurposing."
121 | ]
122 | },
123 | {
124 | "cell_type": "code",
125 | "execution_count": 5,
126 | "metadata": {},
127 | "outputs": [],
128 | "source": [
129 | "import pandas as pd\n",
130 | "import numpy as np\n",
131 | "import sys\n",
132 | "import csv\n",
133 | "sys.path.insert(1, '../utils')\n",
134 | "from utils import download_and_extract\n",
135 | "download_and_extract()"
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": 6,
141 | "metadata": {},
142 | "outputs": [],
143 | "source": [
144 | "entity_idmap_file = '../data/drkg/embed/entities.tsv'\n",
145 | "relation_idmap_file = '../data/drkg/embed/relations.tsv'"
146 | ]
147 | },
148 | {
149 | "cell_type": "markdown",
150 | "metadata": {},
151 | "source": [
152 | "## Get embeddings for genes and drugs"
153 | ]
154 | },
155 | {
156 | "cell_type": "code",
157 | "execution_count": 7,
158 | "metadata": {},
159 | "outputs": [],
160 | "source": [
161 | "# Get drug name/gene name to entity ID mappings\n",
162 | "entity_map = {}\n",
163 | "entity_id_map = {}\n",
164 | "relation_map = {}\n",
165 | "with open(entity_idmap_file, newline='', encoding='utf-8') as csvfile:\n",
166 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['name','id'])\n",
167 | "    for row_val in reader:\n",
168 | "        entity_map[row_val['name']] = int(row_val['id'])\n",
169 | "        entity_id_map[int(row_val['id'])] = row_val['name']\n",
170 | "\n",
171 | "with open(relation_idmap_file, newline='', encoding='utf-8') as csvfile:\n",
172 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['name','id'])\n",
173 | "    for row_val in reader:\n",
174 | "        relation_map[row_val['name']] = int(row_val['id'])\n",
175 | "\n",
176 | "# handle the ID mapping\n",
177 | "drug_ids = []\n",
178 | "gene_ids = []\n",
179 | "for drug in drug_list:\n",
180 | "    drug_ids.append(entity_map[drug])\n",
181 | "\n",
182 | "for gene in cov_related_genes:\n",
183 | "    gene_ids.append(entity_map[gene])\n",
184 | "\n",
185 | "treatment_rid = [relation_map[treat] for treat in treatment]"
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": 8,
191 | "metadata": {},
192 | "outputs": [],
193 | "source": [
194 | "# Load embeddings\n",
195 | "import torch as th\n",
196 | "entity_emb = np.load('../data/drkg/embed/DRKG_TransE_l2_entity.npy')\n",
197 | "rel_emb = np.load('../data/drkg/embed/DRKG_TransE_l2_relation.npy')\n",
198 | "\n",
199 | "drug_ids = th.tensor(drug_ids).long()\n",
200 | "gene_ids = th.tensor(gene_ids).long()\n",
201 | "treatment_rid = th.tensor(treatment_rid)\n",
202 | "\n",
203 | "drug_emb = th.tensor(entity_emb[drug_ids])\n",
204 | "treatment_embs = [th.tensor(rel_emb[rid]) for rid in treatment_rid]"
205 | ]
206 | },
207 | {
208 | "cell_type": "markdown",
209 | "metadata": {},
210 | "source": [
211 | "## Drug Repurposing Based on Edge Score\n",
212 | "We use the following algorithm to calculate the edge score. Note that we apply logsigmoid so that all scores are < 0. The larger the score, the more likely it is that $h$ has relation $r$ with $t$.\n",
213 | "\n",
214 | "$\\mathbf{d} = \\gamma - ||\\mathbf{h}+\\mathbf{r}-\\mathbf{t}||_{2}$\n",
215 | "\n",
216 | "$\\mathbf{score} = \\log\\left(\\frac{1}{1+\\exp(\\mathbf{-d})}\\right)$\n",
217 | "\n",
218 | "When doing drug repurposing here, we only use the inhibition relation."
219 | ]
220 | },
221 | {
222 | "cell_type": "code",
223 | "execution_count": 9,
224 | "metadata": {},
225 | "outputs": [],
226 | "source": [
227 | "import torch.nn.functional as fn\n",
228 | "\n",
229 | "gamma=12.0\n",
230 | "def transE_l2(head, rel, tail):\n",
231 | "    score = head + rel - tail\n",
232 | "    return gamma - th.norm(score, p=2, dim=-1)\n",
233 | "\n",
234 | "scores_per_gene = []\n",
235 | "dids_per_gene = []\n",
236 | "for rid in range(len(treatment_embs)):\n",
237 | "    treatment_emb=treatment_embs[rid]\n",
238 | "    for gene_id in gene_ids:\n",
239 | "        gene_emb = th.tensor(entity_emb[gene_id])\n",
240 | "        if treatment[rid]=='DGIDB::INHIBITOR::Gene:Compound':\n",
241 | "            score = fn.logsigmoid(transE_l2(gene_emb, treatment_emb,\n",
242 | "                                            drug_emb))\n",
243 | "        else:\n",
244 | "            score = fn.logsigmoid(transE_l2(drug_emb, treatment_emb,\n",
245 | "                                            gene_emb))\n",
246 | "        scores_per_gene.append(score)\n",
247 | "        dids_per_gene.append(drug_ids)\n",
248 | "scores = th.cat(scores_per_gene)\n",
249 | "dids = th.cat(dids_per_gene)\n"
250 | ]
251 | },
252 | {
253 | "cell_type": "markdown",
254 | "metadata": {},
255 | "source": [
256 | "### Check clinical trial drugs per gene\n",
257 | "Here we load the clinical trial drugs"
258 | ]
259 | },
260 | {
261 | "cell_type": "code",
262 | "execution_count": 10,
263 | "metadata": {},
264 | "outputs": [],
265 | "source": [
266 | "clinical_drugs_file = './COVID19_clinical_trial_drugs.tsv'\n",
267 | "clinical_drug_map = {}\n",
268 | "with open(clinical_drugs_file, newline='', encoding='utf-8') as csvfile:\n",
269 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['id', 'drug_name','drug_id'])\n",
270 | "    for row_val in reader:\n",
271 | "        clinical_drug_map[row_val['drug_id']] = row_val['drug_name']"
272 | ]
273 | },
274 | {
275 | "cell_type": "markdown",
276 | "metadata": {},
277 | "source": [
278 | "Next we measure some statistics per gene."
279 | ]
280 | },
281 | {
282 | "cell_type": "code",
283 | "execution_count": 13,
284 | "metadata": {
285 | "scrolled": true
286 | },
287 | "outputs": [
288 | {
289 | "name": "stdout",
290 | "output_type": "stream",
291 | "text": [
292 | "Gene::6441\t9\t\n",
293 | "[0]Dexamethasone\n",
294 | "[29]Methylprednisolone\n",
295 | "[30]Ribavirin\n",
296 | "[40]Thalidomide\n",
297 | "[46]Chloroquine\n",
298 | "[77]Losartan\n",
299 | "[86]Sargramostim\n",
300 | "[88]Azithromycin\n",
301 | "[90]Hydroxychloroquine\n",
302 | "\n",
303 | "DB01234\tDexamethasone\t401\t17.424322932617844\n",
304 | "DB01041\tThalidomide\t336\t9.52602832899466\n",
305 | "DB00608\tChloroquine\t258\t5.281556104219857\n",
306 | "DB00746\tDeferoxamine\t111\t2.3803197362314727\n",
307 | "DB01394\tColchicine\t108\t1.9397152439066307\n",
308 | "DB00959\tMethylprednisolone\t105\t1.6800667504790185\n",
309 | "DB00678\tLosartan\t92\t1.9905909204249115\n",
310 | "DB00811\tRibavirin\t92\t2.0302922908647756\n",
311 | "DB08877\tRuxolitinib\t47\t0.7744534092963637\n",
312 | "DB08895\tTofacitinib\t33\t0.46233716095307054\n",
313 | "DB01611\tHydroxychloroquine\t14\t0.20167135495496702\n",
314 | "DB05511\tPiclidenoson\t6\t0.1513038675225646\n",
315 | "DB00207\tAzithromycin\t5\t0.05829492617697397\n",
316 | "DB00198\tOseltamivir\t1\t0.2\n",
317 | "DB00020\tSargramostim\t1\t0.011494252873563218\n"
318 | ]
319 | }
320 | ],
321 | "source": [
322 | "maxhit=0\n",
323 | "drugs_in_top_k={}\n",
324 | "drugsfr_in_top_k={}\n",
325 | "for i in range(len(scores_per_gene)):\n",
326 | "    score=scores_per_gene[i]\n",
327 | "    did=dids_per_gene[i]\n",
328 | "    idx = th.flip(th.argsort(score), dims=[0])\n",
329 | "    score = score[idx].numpy()\n",
330 | "    did = did[idx].numpy()\n",
331 | "    #print(did)\n",
332 | "    _, unique_indices = np.unique(did, return_index=True)\n",
333 | "    topk=100\n",
334 | "    topk_indices = np.sort(unique_indices)[:topk]\n",
335 | "    proposed_did = did[topk_indices]\n",
336 | "    proposed_score = score[topk_indices]\n",
337 | "    found_in_top_k=0\n",
338 | "    found_drugs=\"\\n\"\n",
339 | "    for j in range(topk):\n",
340 | "        drug = entity_id_map[int(proposed_did[j])][10:17]\n",
341 | "        if clinical_drug_map.get(drug, None) is not None:\n",
342 | "            found_in_top_k+=1\n",
343 | "            score = proposed_score[j]\n",
344 | "            if drug in drugs_in_top_k:\n",
345 | "                drugs_in_top_k[drug]+=1\n",
346 | "                drugsfr_in_top_k[drug]+=1/(j+1)\n",
347 | "            else:\n",
348 | "                drugs_in_top_k[drug]=1\n",
349 | "                drugsfr_in_top_k[drug]=1/(j+1)\n",
350 | "            found_drugs+=\"[{}]{}\\n\".format(j, clinical_drug_map[drug])\n",
351 | "            #print(\"[{}]{}\".format(j, clinical_drug_map[drug]))\n",
352 | "    #print(\"{}\\t{}\".format(cov_related_genes[i], found_in_top_k))\n",
353 | "    if maxhit< found_in_top_k:\n",
354 | "        maxhit=found_in_top_k\n",
355 | "        maxgene=cov_related_genes[i]\n",
356 | "        max_drugs=found_drugs\n",
357 | "print(\"{}\\t{}\\t{}\".format(maxgene, maxhit,max_drugs))\n",
358 | "\n",
359 | "res=[[drug, clinical_drug_map[drug] ,drugs_in_top_k[drug],drugsfr_in_top_k[drug]] for drug in drugs_in_top_k.keys()]\n",
360 | "res=reversed(sorted(res, key=lambda x : x[2]))\n",
361 | "for drug in res:\n",
362 | "    print(\"{}\\t{}\\t{}\\t{}\".format(drug[0], drug[1] ,drug[2],drug[3]))\n",
363 | " "
364 | ]
365 | },
366 | {
367 | "cell_type": "code",
368 | "execution_count": null,
369 | "metadata": {},
370 | "outputs": [],
371 | "source": []
372 | }
373 | ],
374 | "metadata": {
375 | "kernelspec": {
376 | "display_name": "Python 3",
377 | "language": "python",
378 | "name": "python3"
379 | },
380 | "language_info": {
381 | "codemirror_mode": {
382 | "name": "ipython",
383 | "version": 3
384 | },
385 | "file_extension": ".py",
386 | "mimetype": "text/x-python",
387 | "name": "python",
388 | "nbconvert_exporter": "python",
389 | "pygments_lexer": "ipython3",
390 | "version": "3.7.6"
391 | }
392 | },
393 | "nbformat": 4,
394 | "nbformat_minor": 4
395 | }
396 |
--------------------------------------------------------------------------------
/drug_repurpose/COVID19_clinical_trial_drugs.tsv:
--------------------------------------------------------------------------------
1 | 1 Deferoxamine DB00746
2 | 2 Piclidenoson DB05511
3 | 3 Losartan DB00678
4 | 4 Ibuprofen DB01050
5 | 5 Favipiravir DB12466
6 | 6 Ruxolitinib DB08877
7 | 7 Dexamethasone DB01234
8 | 8 Thalidomide DB01041
9 | 9 Tranexamic acid DB00302
10 | 10 Tocilizumab DB06273
11 | 11 Sarilumab DB11767
12 | 12 Tradipitant DB12580
13 | 13 Angiotensin 1-7 DB11720
14 | 14 Oseltamivir DB00198
15 | 15 Baricitinib DB11817
16 | 16 Sargramostim DB00020
17 | 17 Chloroquine DB00608
18 | 18 Anakinra DB00026
19 | 19 Mavrilimumab DB12534
20 | 20 Azithromycin DB00207
21 | 21 Tetrandrine DB14066
22 | 22 Ribavirin DB00811
23 | 23 Tofacitinib DB08895
24 | 24 Siltuximab DB09036
25 | 25 Nivolumab DB09035
26 | 26 Nitric Oxide DB00435
27 | 27 Colchicine DB01394
28 | 28 Remdesivir DB14761
29 | 29 Hydroxychloroquine DB01611
30 | 30 Eculizumab DB01257
31 | 31 Methylprednisolone DB00959
32 | 32 Bevacizumab DB00112
--------------------------------------------------------------------------------
/drug_repurpose/Readme.md:
--------------------------------------------------------------------------------
1 | # Drug Repurposing Examples
2 | These examples show how to do drug repurposing using the pretrained knowledge graph embeddings of DRKG.
3 |
4 | ## Drug Repurposing for Covid-19 via Disease
5 | [COVID-19_drug_repurposing.ipynb](COVID-19_drug_repurposing.ipynb) shows how to do drug repurposing for Covid-19 by predicting links between the disease entities and the drug entities in DRKG. The target disease entities are listed in the notebook, and the candidate drug entities are listed in [infer_drug.tsv](infer_drug.tsv). The drugs are all from DrugBank; we exclude drugs with a molecular weight of less than 250 daltons, which leaves 8104 candidates. Two edge types are used here: 'Hetionet::CtD::Compound:Disease' and 'GNBR::T::Compound:Disease', both of which represent a treatment relationship between a drug and a disease. Because there is no treatment for Covid-19 right now, we evaluate the repurposed drugs by comparing them with the drugs in clinical trials. The list of clinical trial drugs is given in [COVID19_clinical_trial_drugs.tsv](COVID19_clinical_trial_drugs.tsv), which was collected from http://www.covid19-trials.com/.
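
As a rough illustration of the link-prediction step described above, the sketch below ranks candidate drugs for a disease under a TransE-style score. It is not the notebook's code: the scoring function, the `gamma` margin, the function names, and the toy 2-dimensional embeddings are all assumptions standing in for the pretrained DRKG embeddings.

```python
import numpy as np


def transe_l2_score(head, rel, tail, gamma=12.0):
    """TransE-style plausibility (assumed model): larger means more plausible."""
    return gamma - np.linalg.norm(head + rel - tail, ord=2, axis=-1)


def rank_drugs(drug_emb, rel_embs, disease_embs, topk=3):
    """Score every (drug, relation, disease) triple and keep the top-k unique drugs."""
    scores, drug_ids = [], []
    for rel in rel_embs:
        for dis in disease_embs:
            scores.append(transe_l2_score(drug_emb, rel, dis))
            drug_ids.append(np.arange(drug_emb.shape[0]))
    scores = np.concatenate(scores)
    drug_ids = np.concatenate(drug_ids)

    # Sort all triples from best to worst score.
    order = np.argsort(-scores, kind="stable")
    ranked_ids = drug_ids[order]
    ranked_scores = scores[order]

    # Keep only the best-scoring occurrence of each drug, in ranking order.
    _, first_pos = np.unique(ranked_ids, return_index=True)
    keep = np.sort(first_pos)[:topk]
    return ranked_ids[keep], ranked_scores[keep]


# Toy data (hypothetical): 3 candidate drugs, 2 treatment relations, 1 disease.
drug_emb = np.array([[5.0, 5.0], [0.0, 1.0], [3.0, 3.0]])
rel_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
disease_embs = np.array([[1.0, 1.0]])

top_ids, top_scores = rank_drugs(drug_emb, rel_embs, disease_embs, topk=3)
print(top_ids, top_scores)
```

With real data, the rows of `drug_emb` would be the embeddings of the candidates in `infer_drug.tsv`, and the relation rows would correspond to the two treatment edge types.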
6 |
7 | ## Drug Repurposing for Covid-19 via Disease Related Host Genes
8 | [COVID-19_drug_repurposing_via_genes.ipynb](COVID-19_drug_repurposing_via_genes.ipynb) shows how to do drug repurposing for Covid-19 by predicting links between the disease-related host gene entities and the drug entities in DRKG. The target host gene entities are listed in [covid19-host-genes.tsv](covid19-host-genes.tsv) and [coronavirus-related-host-genes.tsv](coronavirus-related-host-genes.tsv). The drugs are all from DrugBank; we exclude drugs with a molecular weight of less than 250 daltons, which leaves 8104 candidates. The edge type used here is GNBR::N::Compound:Gene. One can also use DRUGBANK::target::Compound:Gene, DGIDB::INHIBITOR::Gene:Compound, or a combination of them. To evaluate the repurposed drugs, we again compare them with the clinical trial drugs.
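
The evaluation against clinical trial drugs can be sketched as a per-gene tally: for each host gene, take the top-k drugs and record how often each clinical trial drug appears and at what rank. The snippet below is a simplified illustration under assumed inputs; the function name, the toy scores, and the ID `DB99999` are hypothetical, and real scores would come from the embedding model.

```python
from collections import defaultdict

import numpy as np


def tally_clinical_hits(scores_per_gene, drug_ids, clinical_drugs, topk=2):
    """Count top-k appearances and summed reciprocal ranks of clinical trial drugs."""
    hits = defaultdict(int)
    recip_rank = defaultdict(float)
    for scores in scores_per_gene:
        # Indices of the top-k drugs for this gene, best score first.
        order = np.argsort(-np.asarray(scores), kind="stable")[:topk]
        for rank, idx in enumerate(order):
            drug = drug_ids[idx]
            if drug in clinical_drugs:
                hits[drug] += 1
                recip_rank[drug] += 1.0 / (rank + 1)
    return dict(hits), dict(recip_rank)


# Toy inputs: DB99999 is a made-up ID that is not in the clinical trial list.
drug_ids = ["DB01234", "DB00608", "DB99999", "DB00811"]
clinical_drugs = {"DB01234", "DB00608", "DB00811"}
scores_per_gene = [
    [0.9, 0.1, 0.5, 0.3],  # gene 1: DB01234 ranks first
    [0.6, 0.8, 0.1, 0.2],  # gene 2: DB00608 first, DB01234 second
]

hits, recip_rank = tally_clinical_hits(scores_per_gene, drug_ids, clinical_drugs, topk=2)
print(hits, recip_rank)
```

A drug that shows up in the top-k list of many genes gets a high hit count, while the reciprocal-rank sum additionally rewards appearing near the top of each list.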
9 |
--------------------------------------------------------------------------------
/drug_repurpose/coronavirus-related-host-genes.tsv:
--------------------------------------------------------------------------------
1 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::2931
2 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::2932
3 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::26986
4 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::8761
5 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3178
6 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3181
7 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4869
8 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10146
9 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::9908
10 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6143
11 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::142
12 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4691
13 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::1653
14 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::23429
15 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5478
16 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::408050
17 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::2280
18 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::9360
19 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4140
20 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10465
21 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::11123
22 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::9146
23 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7917
24 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::23564
25 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::819
26 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::25978
27 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::116841
28 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::23609
29 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7177
30 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::710
31 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7681
32 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5683
33 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::81926
34 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5204
35 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::57409
36 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4705
37 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::79001
38 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::81887
39 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::55506
40 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6224
41 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::11200
42 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7013
43 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10540
44 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::1655
45 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::119710
46 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::8665
47 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::1915
48 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::857
49 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3551
50 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7329
51 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6449
52 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::9550
53 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::689
54 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::22809
55 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4539
56 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4513
57 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3178
58 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::59272
59 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::339390
60 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::30835
61 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10332
62 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3661
63 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3838
64 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6441
65 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5478
66 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::598
67 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::599
68 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4170
69 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::597
70 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::596
71 | Disease::MESH:D065207 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6502
72 | Disease::MESH:D065207 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3840
73 | Disease::MESH:D065207 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::8575
74 | Disease::MESH:D065207 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::928
75 | Disease::MESH:D065207 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7113
76 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3551
77 | Disease::MESH:D028941 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::290
78 | Disease::MESH:D058957 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::59272
79 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::85437
80 | Disease::MESH:D065207 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::1803
81 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3178
82 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10492
83 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5725
84 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::634
85 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::85437
86 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::59272
87 | Disease::MESH:D001351 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::302
88 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3181
89 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::220988
90 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::50
91 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3337
92 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3329
93 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3313
94 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::9276
95 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::23521
96 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3646
97 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::8668
98 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4836
99 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::128866
100 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::8665
101 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::8729
102 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6241
103 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3832
104 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5707
105 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6729
106 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::84955
107 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4666
108 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::51429
109 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::689
110 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::23256
111 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5701
112 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7942
113 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::11344
114 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10652
115 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3837
116 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6811
117 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6776
118 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::51187
119 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::91452
120 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5245
121 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::3725
122 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::6774
123 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::5499
124 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::92521
125 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10875
126 | Disease::MESH:D045473 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7514
127 | Disease::MESH:D028941 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::7040
128 | Disease::MESH:D028941 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::10875
129 | Disease::MESH:D006517 bioarx::Coronavirus_ass_host_gene::Disease:Gene Gene::4088
130 |
--------------------------------------------------------------------------------
/drug_repurpose/covid19-host-genes.tsv:
--------------------------------------------------------------------------------
1 | Disease::SARS-CoV2 E bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8546
2 | Disease::SARS-CoV2 E bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23476
3 | Disease::SARS-CoV2 E bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6046
4 | Disease::SARS-CoV2 E bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10283
5 | Disease::SARS-CoV2 E bioarx::Covid2_acc_host_gene::Disease:Gene Gene::124245
6 | Disease::SARS-CoV2 E bioarx::Covid2_acc_host_gene::Disease:Gene Gene::57153
7 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9512
8 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10897
9 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::481
10 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::34
11 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2108
12 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2040
13 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2677
14 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::523
15 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5714
16 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7905
17 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23203
18 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::196527
19 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10531
20 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10463
21 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::60493
22 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::148867
23 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10426
24 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79934
25 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::113174
26 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::92840
27 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::92105
28 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::89874
29 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10844
30 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::80222
31 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::57142
32 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51439
33 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10157
34 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26993
35 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::25980
36 | Disease::SARS-CoV2 M bioarx::Covid2_acc_host_gene::Disease:Gene Gene::28969
37 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9136
38 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26986
39 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1459
40 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1460
41 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10146
42 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8761
43 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23367
44 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::25940
45 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79753
46 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5976
47 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4343
48 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9908
49 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9188
50 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55131
51 | Disease::SARS-CoV2 N bioarx::Covid2_acc_host_gene::Disease:Gene Gene::25873
52 | Disease::SARS-CoV2 Spike bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51125
53 | Disease::SARS-CoV2 Spike bioarx::Covid2_acc_host_gene::Disease:Gene Gene::25921
54 | Disease::SARS-CoV2 nsp1 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5422
55 | Disease::SARS-CoV2 nsp1 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5557
56 | Disease::SARS-CoV2 nsp1 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5558
57 | Disease::SARS-CoV2 nsp1 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23649
58 | Disease::SARS-CoV2 nsp1 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79709
59 | Disease::SARS-CoV2 nsp1 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5318
60 | Disease::SARS-CoV2 nsp10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::161
61 | Disease::SARS-CoV2 nsp10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2671
62 | Disease::SARS-CoV2 nsp10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::57222
63 | Disease::SARS-CoV2 nsp10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1173
64 | Disease::SARS-CoV2 nsp10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::80273
65 | Disease::SARS-CoV2 nsp11 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6902
66 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55206
67 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10295
68 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10270
69 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23077
70 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10569
71 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8737
72 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9898
73 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::219743
74 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51248
75 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::84726
76 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55833
77 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::24149
78 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::64784
79 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::159195
80 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::29066
81 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23185
82 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55285
83 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6938
84 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::53938
85 | Disease::SARS-CoV2 nsp12 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::54477
86 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9755
87 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::11113
88 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::3281
89 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5116
90 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::11116
91 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5576
92 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5566
93 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5577
94 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5962
95 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1063
96 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7088
97 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7090
98 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::166
99 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2802
100 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2801
101 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2804
102 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::56850
103 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9857
104 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9659
105 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9662
106 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23177
107 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::11064
108 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23085
109 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9648
110 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79745
111 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51199
112 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::201134
113 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::145282
114 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8975
115 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79571
116 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::152789
117 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55755
118 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10142
119 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::64689
120 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79443
121 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79078
122 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::11190
123 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::29110
124 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51361
125 | Disease::SARS-CoV2 nsp13 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::22981
126 | Disease::SARS-CoV2 nsp14 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2717
127 | Disease::SARS-CoV2 nsp14 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::3615
128 | Disease::SARS-CoV2 nsp14 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23408
129 | Disease::SARS-CoV2 nsp15 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10204
130 | Disease::SARS-CoV2 nsp15 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::382
131 | Disease::SARS-CoV2 nsp15 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10193
132 | Disease::SARS-CoV2 nsp2 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::11001
133 | Disease::SARS-CoV2 nsp2 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9470
134 | Disease::SARS-CoV2 nsp2 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5447
135 | Disease::SARS-CoV2 nsp2 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5910
136 | Disease::SARS-CoV2 nsp2 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23325
137 | Disease::SARS-CoV2 nsp2 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23307
138 | Disease::SARS-CoV2 nsp2 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26058
139 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::3416
140 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26519
141 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::440138
142 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23225
143 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::90580
144 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55735
145 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26515
146 | Disease::SARS-CoV2 nsp4 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26520
147 | Disease::SARS-CoV2 nsp5 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::3066
148 | Disease::SARS-CoV2 nsp5_C145A bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2876
149 | Disease::SARS-CoV2 nsp5_C145A bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55621
150 | Disease::SARS-CoV2 nsp6 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10632
151 | Disease::SARS-CoV2 nsp6 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::537
152 | Disease::SARS-CoV2 nsp6 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10280
153 | Disease::SARS-CoV2 nsp6 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79572
154 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8540
155 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::80777
156 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2181
157 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1727
158 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5898
159 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1312
160 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5878
161 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7879
162 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4218
163 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5862
164 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10890
165 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51552
166 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::387
167 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5861
168 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2782
169 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2787
170 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10960
171 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7841
172 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26092
173 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::64757
174 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::169714
175 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9653
176 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::91942
177 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51097
178 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::949
179 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::57106
180 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79877
181 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26355
182 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::131118
183 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55829
184 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::80142
185 | Disease::SARS-CoV2 nsp7 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::22931
186 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10199
187 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6731
188 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::11101
189 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7468
190 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6728
191 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6729
192 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::64432
193 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1662
194 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51574
195 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::56257
196 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::25983
197 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::11340
198 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79731
199 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79954
200 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79080
201 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51091
202 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::56915
203 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51010
204 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26574
205 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::25831
206 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51116
207 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::64969
208 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23404
209 | Disease::SARS-CoV2 nsp8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23107
210 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2963
211 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2200
212 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2201
213 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8021
214 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23636
215 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10238
216 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7458
217 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::53371
218 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::57534
219 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23111
220 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::91754
221 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::84858
222 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4927
223 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9818
224 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::27430
225 | Disease::SARS-CoV2 nsp9 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10516
226 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5538
227 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8453
228 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55700
229 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79178
230 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79699
231 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::26521
232 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9978
233 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6921
234 | Disease::SARS-CoV2 orf10 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6923
235 | Disease::SARS-CoV2 orf3a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::3162
236 | Disease::SARS-CoV2 orf3a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::286827
237 | Disease::SARS-CoV2 orf3a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::151188
238 | Disease::SARS-CoV2 orf3a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23339
239 | Disease::SARS-CoV2 orf3a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23155
240 | Disease::SARS-CoV2 orf3a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55823
241 | Disease::SARS-CoV2 orf3a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::25777
242 | Disease::SARS-CoV2 orf3a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::29880
243 | Disease::SARS-CoV2 orf3b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::30968
244 | Disease::SARS-CoV2 orf6 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4928
245 | Disease::SARS-CoV2 orf6 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8480
246 | Disease::SARS-CoV2 orf6 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23787
247 | Disease::SARS-CoV2 orf7a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55027
248 | Disease::SARS-CoV2 orf7a bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23195
249 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5352
250 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1861
251 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8614
252 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5327
253 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::3688
254 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::284106
255 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1291
256 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5817
257 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::1786
258 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4015
259 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::5046
260 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::83729
261 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10577
262 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4240
263 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10956
264 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4884
265 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79070
266 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::143888
267 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::56605
268 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23646
269 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::80020
270 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79586
271 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::126789
272 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23065
273 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8836
274 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::27248
275 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23765
276 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55768
277 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::90161
278 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::6388
279 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4758
280 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9518
281 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::80213
282 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23071
283 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::80267
284 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::64374
285 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23509
286 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::64093
287 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79666
288 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::54850
289 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55757
290 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::54480
291 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9510
292 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::10525
293 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51661
294 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8754
295 | Disease::SARS-CoV2 orf8 bioarx::Covid2_acc_host_gene::Disease:Gene Gene::60681
296 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9368
297 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::27243
298 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7812
299 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9868
300 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4140
301 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2011
302 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51611
303 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79077
304 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4139
305 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::58155
306 | Disease::SARS-CoV2 orf9b bioarx::Covid2_acc_host_gene::Disease:Gene Gene::9529
307 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7993
308 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::8733
309 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::7466
310 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4363
311 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::2150
312 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::22937
313 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::23333
314 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::27346
315 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55676
316 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::202018
317 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79956
318 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79671
319 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::162427
320 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::84720
321 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55711
322 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51295
323 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::79053
324 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::55116
325 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::27069
326 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::28976
327 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::54602
328 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::617
329 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::51103
330 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::50999
331 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::4715
332 | Disease::SARS-CoV2 orf9c bioarx::Covid2_acc_host_gene::Disease:Gene Gene::94005
333 |
--------------------------------------------------------------------------------
/drugbank_info/README.md:
--------------------------------------------------------------------------------
1 | # Information about Compounds from DrugBank
2 |
3 | The KG currently has 9708 compounds from DrugBank. DB05697, DB06517, and DB15351 have been deprecated from DrugBank
4 | in the sense that they no longer have associated pages. DB02709 is a duplicate of DB05073 and should be merged with it.
5 |
6 | Of the remaining 9704 IDs, 8968 are characterized as "small-molecule" typed drugs in DrugBank and 736 are
7 | characterized as "Biotech" typed drugs in DrugBank. Full lists of them are stored separately in `drugbank_small_molecule.txt`
8 | and `drugbank_biotech.txt`.
9 |
10 | For the 8968 "small-molecule" typed drugs, we managed to extract SMILES
11 | (simplified molecular-input line-entry system) strings for 8807 of them from DrugBank, PubChem, KEGG, ChEMBL, and
12 | ChemSpider. The SMILES can be found in `drugbank_smiles.txt`. For the remaining "small-molecule" typed drugs, SMILES are not
13 | available, either because their structural information is not available or because they are substances with multiple
14 | ingredients.
15 |
16 | We also include the weights of compounds available in DrugBank in `drugbank_weight.txt`. For compounds whose weight
17 | information is missing, we use -1 as a placeholder. The weight information is available for 7158 compounds
18 | and missing for 2547 compounds.
19 |
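The -1 placeholder is easy to mistake for a real value when averaging or filtering weights. The following is a minimal sketch of how the placeholder could be turned into an explicit missing value. It assumes each line of `drugbank_weight.txt` holds a DrugBank ID and a weight separated by whitespace; that layout is an assumption, and the helper name `parse_weights` and the sample records are hypothetical.

```python
from typing import Dict, Iterable, Optional

MISSING_WEIGHT = -1.0  # placeholder described above for unknown weights


def parse_weights(lines: Iterable[str]) -> Dict[str, Optional[float]]:
    """Map IDs to weights, turning the -1 placeholder into None.

    Assumes one "<ID><whitespace><weight>" record per line (an assumed layout).
    """
    weights: Dict[str, Optional[float]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        drug_id, raw = parts
        value = float(raw)
        weights[drug_id] = None if value == MISSING_WEIGHT else value
    return weights


# Made-up records for illustration only; not real DrugBank entries.
sample = ["DB_EXAMPLE_1\t100.5", "DB_EXAMPLE_2\t-1", ""]
weights = parse_weights(sample)
known = {k: v for k, v in weights.items() if v is not None}
print(len(known), "known,", len(weights) - len(known), "missing")
```

If the file does follow this layout, passing an open file handle (`parse_weights(open("drugbank_weight.txt"))`) would work the same way, since the function only iterates over lines.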
--------------------------------------------------------------------------------
/drugbank_info/drugbank_biotech.txt:
--------------------------------------------------------------------------------
1 | DB00001
2 | DB00002
3 | DB00003
4 | DB00004
5 | DB00005
6 | DB00008
7 | DB00009
8 | DB00010
9 | DB00011
10 | DB00012
11 | DB00013
12 | DB00015
13 | DB00016
14 | DB00017
15 | DB00018
16 | DB00019
17 | DB00020
18 | DB00022
19 | DB00023
20 | DB00024
21 | DB00025
22 | DB00026
23 | DB00028
24 | DB00029
25 | DB00030
26 | DB00031
27 | DB00032
28 | DB00033
29 | DB00034
30 | DB00036
31 | DB00038
32 | DB00039
33 | DB00040
34 | DB00041
35 | DB00042
36 | DB00043
37 | DB00044
38 | DB00045
39 | DB00046
40 | DB00047
41 | DB00048
42 | DB00049
43 | DB00051
44 | DB00052
45 | DB00053
46 | DB00054
47 | DB00055
48 | DB00056
49 | DB00057
50 | DB00058
51 | DB00059
52 | DB00060
53 | DB00061
54 | DB00062
55 | DB00063
56 | DB00065
57 | DB00066
58 | DB00068
59 | DB00069
60 | DB00070
61 | DB00071
62 | DB00072
63 | DB00073
64 | DB00074
65 | DB00075
66 | DB00076
67 | DB00078
68 | DB00081
69 | DB00082
70 | DB00083
71 | DB00085
72 | DB00086
73 | DB00087
74 | DB00088
75 | DB00089
76 | DB00090
77 | DB00092
78 | DB00094
79 | DB00095
80 | DB00097
81 | DB00098
82 | DB00099
83 | DB00100
84 | DB00102
85 | DB00103
86 | DB00105
87 | DB00107
88 | DB00108
89 | DB00109
90 | DB00110
91 | DB00111
92 | DB00112
93 | DB00113
94 | DB01257
95 | DB01269
96 | DB01270
97 | DB01271
98 | DB01272
99 | DB01276
100 | DB01277
101 | DB01279
102 | DB01281
103 | DB01285
104 | DB01306
105 | DB01307
106 | DB01309
107 | DB01363
108 | DB01381
109 | DB03404
110 | DB04895
111 | DB04897
112 | DB04899
113 | DB04900
114 | DB04901
115 | DB04907
116 | DB04914
117 | DB04919
118 | DB04925
119 | DB04932
120 | DB04949
121 | DB04950
122 | DB04956
123 | DB04958
124 | DB04959
125 | DB04962
126 | DB04964
127 | DB04973
128 | DB04985
129 | DB04988
130 | DB05006
131 | DB05084
132 | DB05097
133 | DB05099
134 | DB05101
135 | DB05110
136 | DB05111
137 | DB05121
138 | DB05136
139 | DB05139
140 | DB05144
141 | DB05164
142 | DB05202
143 | DB05209
144 | DB05218
145 | DB05222
146 | DB05254
147 | DB05258
148 | DB05259
149 | DB05276
150 | DB05299
151 | DB05304
152 | DB05305
153 | DB05306
154 | DB05311
155 | DB05319
156 | DB05321
157 | DB05325
158 | DB05332
159 | DB05336
160 | DB05337
161 | DB05386
162 | DB05396
163 | DB05405
164 | DB05429
165 | DB05437
166 | DB05459
167 | DB05471
168 | DB05481
169 | DB05496
170 | DB05530
171 | DB05545
172 | DB05550
173 | DB05555
174 | DB05578
175 | DB05595
176 | DB05656
177 | DB05675
178 | DB05679
179 | DB05685
180 | DB05718
181 | DB05739
182 | DB05758
183 | DB05771
184 | DB05773
185 | DB05777
186 | DB05787
187 | DB05793
188 | DB05794
189 | DB05797
190 | DB05798
191 | DB05819
192 | DB05829
193 | DB05848
194 | DB05860
195 | DB05867
196 | DB05879
197 | DB05883
198 | DB05889
199 | DB05892
200 | DB05895
201 | DB05906
202 | DB05915
203 | DB05916
204 | DB05941
205 | DB05942
206 | DB05945
207 | DB05996
208 | DB06014
209 | DB06022
210 | DB06043
211 | DB06049
212 | DB06050
213 | DB06081
214 | DB06101
215 | DB06116
216 | DB06162
217 | DB06168
218 | DB06171
219 | DB06186
220 | DB06192
221 | DB06205
222 | DB06226
223 | DB06241
224 | DB06245
225 | DB06271
226 | DB06273
227 | DB06285
228 | DB06304
229 | DB06305
230 | DB06310
231 | DB06317
232 | DB06318
233 | DB06322
234 | DB06324
235 | DB06342
236 | DB06343
237 | DB06356
238 | DB06360
239 | DB06366
240 | DB06371
241 | DB06372
242 | DB06379
243 | DB06400
244 | DB06404
245 | DB06423
246 | DB06467
247 | DB06474
248 | DB06475
249 | DB06492
250 | DB06495
251 | DB06503
252 | DB06511
253 | DB06534
254 | DB06550
255 | DB06552
256 | DB06557
257 | DB06560
258 | DB06584
259 | DB06599
260 | DB06602
261 | DB06606
262 | DB06607
263 | DB06611
264 | DB06612
265 | DB06642
266 | DB06643
267 | DB06647
268 | DB06650
269 | DB06655
270 | DB06662
271 | DB06674
272 | DB06679
273 | DB06681
274 | DB06688
275 | DB06692
276 | DB06719
277 | DB06720
278 | DB06752
279 | DB06759
280 | DB06760
281 | DB06761
282 | DB06773
283 | DB08870
284 | DB08876
285 | DB08879
286 | DB08885
287 | DB08886
288 | DB08888
289 | DB08894
290 | DB08898
291 | DB08900
292 | DB08902
293 | DB08904
294 | DB08935
295 | DB09005
296 | DB09029
297 | DB09033
298 | DB09035
299 | DB09036
300 | DB09037
301 | DB09043
302 | DB09045
303 | DB09046
304 | DB09051
305 | DB09052
306 | DB09057
307 | DB09066
308 | DB09077
309 | DB09098
310 | DB09100
311 | DB09103
312 | DB09105
313 | DB09107
314 | DB09108
315 | DB09109
316 | DB09113
317 | DB09122
318 | DB09126
319 | DB09141
320 | DB09208
321 | DB09222
322 | DB09228
323 | DB09264
324 | DB09302
325 | DB09303
326 | DB09310
327 | DB09312
328 | DB09328
329 | DB09329
330 | DB09331
331 | DB09334
332 | DB09456
333 | DB09527
334 | DB09532
335 | DB09559
336 | DB09564
337 | DB10062
338 | DB10076
339 | DB10276
340 | DB10283
341 | DB10317
342 | DB10318
343 | DB10342
344 | DB10343
345 | DB10349
346 | DB10351
347 | DB10353
348 | DB10365
349 | DB10366
350 | DB10370
351 | DB10379
352 | DB10389
353 | DB10394
354 | DB10395
355 | DB10400
356 | DB10401
357 | DB10403
358 | DB10404
359 | DB10489
360 | DB10494
361 | DB10532
362 | DB10583
363 | DB10584
364 | DB10600
365 | DB10675
366 | DB10676
367 | DB10713
368 | DB10714
369 | DB10715
370 | DB10716
371 | DB10717
372 | DB10719
373 | DB10769
374 | DB10770
375 | DB10772
376 | DB10794
377 | DB10803
378 | DB10804
379 | DB10805
380 | DB10836
381 | DB10879
382 | DB10949
383 | DB10962
384 | DB10963
385 | DB10964
386 | DB10966
387 | DB10982
388 | DB10989
389 | DB10990
390 | DB11003
391 | DB11038
392 | DB11040
393 | DB11041
394 | DB11044
395 | DB11050
396 | DB11051
397 | DB11065
398 | DB11066
399 | DB11095
400 | DB11097
401 | DB11113
402 | DB11131
403 | DB11166
404 | DB11198
405 | DB11199
406 | DB11294
407 | DB11300
408 | DB11311
409 | DB11312
410 | DB11330
411 | DB11338
412 | DB11358
413 | DB11563
414 | DB11567
415 | DB11568
416 | DB11569
417 | DB11571
418 | DB11572
419 | DB11580
420 | DB11595
421 | DB11597
422 | DB11598
423 | DB11601
424 | DB11603
425 | DB11604
426 | DB11606
427 | DB11607
428 | DB11608
429 | DB11621
430 | DB11624
431 | DB11626
432 | DB11627
433 | DB11631
434 | DB11639
435 | DB11646
436 | DB11657
437 | DB11661
438 | DB11668
439 | DB11680
440 | DB11685
441 | DB11714
442 | DB11715
443 | DB11727
444 | DB11731
445 | DB11746
446 | DB11756
447 | DB11767
448 | DB11771
449 | DB11776
450 | DB11803
451 | DB11826
452 | DB11834
453 | DB11840
454 | DB11849
455 | DB11850
456 | DB11854
457 | DB11856
458 | DB11857
459 | DB11862
460 | DB11866
461 | DB11884
462 | DB11914
463 | DB11930
464 | DB11945
465 | DB11959
466 | DB11972
467 | DB11976
468 | DB11988
469 | DB12023
470 | DB12034
471 | DB12053
472 | DB12056
473 | DB12077
474 | DB12089
475 | DB12090
476 | DB12102
477 | DB12104
478 | DB12118
479 | DB12119
480 | DB12142
481 | DB12152
482 | DB12157
483 | DB12159
484 | DB12169
485 | DB12182
486 | DB12189
487 | DB12202
488 | DB12205
489 | DB12213
490 | DB12240
491 | DB12246
492 | DB12250
493 | DB12258
494 | DB12261
495 | DB12274
496 | DB12281
497 | DB12296
498 | DB12317
499 | DB12331
500 | DB12335
501 | DB12342
502 | DB12344
503 | DB12363
504 | DB12366
505 | DB12374
506 | DB12386
507 | DB12396
508 | DB12409
509 | DB12413
510 | DB12431
511 | DB12456
512 | DB12489
513 | DB12498
514 | DB12520
515 | DB12525
516 | DB12530
517 | DB12534
518 | DB12560
519 | DB12568
520 | DB12584
521 | DB12589
522 | DB12609
523 | DB12656
524 | DB12683
525 | DB12688
526 | DB12698
527 | DB12699
528 | DB12701
529 | DB12718
530 | DB12726
531 | DB12734
532 | DB12747
533 | DB12768
534 | DB12773
535 | DB12775
536 | DB12797
537 | DB12807
538 | DB12814
539 | DB12815
540 | DB12820
541 | DB12826
542 | DB12844
543 | DB12845
544 | DB12849
545 | DB12872
546 | DB12891
547 | DB12893
548 | DB12909
549 | DB12917
550 | DB12943
551 | DB12950
552 | DB12976
553 | DB13007
554 | DB13017
555 | DB13037
556 | DB13045
557 | DB13073
558 | DB13127
559 | DB13133
560 | DB13140
561 | DB13144
562 | DB13147
563 | DB13148
564 | DB13149
565 | DB13150
566 | DB13151
567 | DB13152
568 | DB13161
569 | DB13173
570 | DB13189
571 | DB13192
572 | DB13193
573 | DB13194
574 | DB13195
575 | DB13196
576 | DB13197
577 | DB13198
578 | DB13199
579 | DB13200
580 | DB13201
581 | DB13241
582 | DB13260
583 | DB13263
584 | DB13281
585 | DB13287
586 | DB13375
587 | DB13521
588 | DB13535
589 | DB13646
590 | DB13761
591 | DB13864
592 | DB13881
593 | DB13884
594 | DB13886
595 | DB13892
596 | DB13893
597 | DB13894
598 | DB13896
599 | DB13897
600 | DB13898
601 | DB13899
602 | DB13900
603 | DB13901
604 | DB13902
605 | DB13903
606 | DB13906
607 | DB13915
608 | DB13923
609 | DB13924
610 | DB13926
611 | DB13932
612 | DB13933
613 | DB13961
614 | DB13975
615 | DB13976
616 | DB13979
617 | DB13998
618 | DB13999
619 | DB14004
620 | DB14009
621 | DB14012
622 | DB14016
623 | DB14022
624 | DB14039
625 | DB14040
626 | DB14041
627 | DB14042
628 | DB14211
629 | DB14240
630 | DB14256
631 | DB14337
632 | DB14384
633 | DB14385
634 | DB14394
635 | DB14409
636 | DB14443
637 | DB14445
638 | DB14449
639 | DB14473
640 | DB14562
641 | DB14580
642 | DB14582
643 | DB14585
644 | DB14597
645 | DB14619
646 | DB14620
647 | DB14685
648 | DB14700
649 | DB14707
650 | DB14711
651 | DB14712
652 | DB14713
653 | DB14724
654 | DB14730
655 | DB14731
656 | DB14738
657 | DB14740
658 | DB14749
659 | DB14751
660 | DB14762
661 | DB14776
662 | DB14778
663 | DB14784
664 | DB14809
665 | DB14811
666 | DB14824
667 | DB14843
668 | DB14864
669 | DB14865
670 | DB14871
671 | DB14877
672 | DB14891
673 | DB14897
674 | DB14905
675 | DB14907
676 | DB14908
677 | DB14919
678 | DB14947
679 | DB14952
680 | DB14959
681 | DB14962
682 | DB14967
683 | DB14988
684 | DB14997
685 | DB14999
686 | DB15014
687 | DB15022
688 | DB15044
689 | DB15045
690 | DB15066
691 | DB15076
692 | DB15089
693 | DB15090
694 | DB15093
695 | DB15095
696 | DB15101
697 | DB15104
698 | DB15112
699 | DB15113
700 | DB15118
701 | DB15119
702 | DB15131
703 | DB15135
704 | DB15160
705 | DB15171
706 | DB15172
707 | DB15217
708 | DB15225
709 | DB15252
710 | DB15253
711 | DB15263
712 | DB15271
713 | DB15274
714 | DB15277
715 | DB15290
716 | DB15332
717 | DB15336
718 | DB15349
719 | DB15354
720 | DB15363
721 | DB15383
722 | DB15397
723 | DB15409
724 | DB15415
725 | DB15428
726 | DB15432
727 | DB15441
728 | DB15443
729 | DB15453
730 | DB15461
731 | DB15478
732 | DB15483
733 | DB15528
734 | DB15559
735 | DB15593
736 | DB15595
737 |
--------------------------------------------------------------------------------
/embedding_analysis/Edge_similarity_based_on_link_recommendation_results.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# DRKG Relation Similarity Analysis based on link recommendations\n",
8 | "\n",
9 | "This notebook performs a similarity analysis of the different link types in DRKG based on their recommendation outcomes. Specifically, for a given node we predict the K highest-scoring neighbors under a given link type, and repeat this prediction for every link type. Link types whose predicted neighbors overlap substantially are considered more similar."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 3,
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import pandas as pd\n",
19 | "import numpy as np\n",
20 | "import os\n",
21 | "import csv\n",
22 | "import sys\n",
23 | "import torch as th\n",
24 | "sys.path.insert(1, '../utils')\n",
25 | "from utils import download_and_extract\n",
26 | "download_and_extract()"
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {},
32 | "source": [
33 | "Define the function used for scoring the edges. This should coincide with the function used to learn the embeddings."
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": 4,
39 | "metadata": {},
40 | "outputs": [],
41 | "source": [
42 | "def transE_l2(head, rel, tail):\n",
43 | "    gamma = 12.0  # margin; should match the value used when training the embeddings\n",
44 | "    score = head + rel - tail\n",
45 | "    return gamma - th.norm(score, p=2, dim=-1)"
46 | ]
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "metadata": {},
51 | "source": [
52 | "## Loading Mapping files\n",
53 | "\n",
54 | "Load the mapping files that give the ids used by the embedding models for the corresponding DRKG ids. Load the entity and relation embeddings as well. Adjust the input and embedding file paths below as needed."
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 6,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "name": "stdout",
64 | "output_type": "stream",
65 | "text": [
66 | "97238\n",
67 | "107\n"
68 | ]
69 | }
70 | ],
71 | "source": [
72 | "# folders holding data\n",
73 | "folder_with_training_part='../data/drkg/embed/'\n",
74 | "folder_with_embeddings='../data/drkg/embed/'\n",
75 | "\n",
76 | "ids = []\n",
77 | "entity2id = {}\n",
78 | "with open(folder_with_training_part+\"entities.tsv\", newline='', encoding='utf-8') as csvfile:\n",
79 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['entity','id'])\n",
80 | "    for row_val in reader:\n",
81 | "        id = row_val['id']\n",
82 | "\n",
83 | "        entity2id[row_val['entity']] = int(id)\n",
84 | "\n",
85 | "print(len(entity2id))\n",
86 | "\n",
87 | "rel2id = {}\n",
88 | "with open(folder_with_training_part+\"relations.tsv\", newline='', encoding='utf-8') as csvfile:\n",
89 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['entity','id'])\n",
90 | "    for row_val in reader:\n",
91 | "        id = row_val['id']\n",
92 | "\n",
93 | "        rel2id[row_val['entity']] = int(id)\n",
94 | "\n",
95 | "print(len(rel2id))\n",
96 | "\n",
97 | "node_emb = np.load(folder_with_embeddings+'DRKG_TransE_l2_entity.npy')\n",
98 | "rel_emb = np.load(folder_with_embeddings+'DRKG_TransE_l2_relation.npy')"
99 | ]
100 | },
101 | {
102 | "cell_type": "markdown",
103 | "metadata": {},
104 | "source": [
105 | "## Loading triplets\n",
106 | "\n",
107 | "Load the triplets and map their entities and relations to the integer ids used by the embedding model."
108 | ]
109 | },
110 | {
111 | "cell_type": "code",
112 | "execution_count": null,
113 | "metadata": {},
114 | "outputs": [],
115 | "source": [
116 | "head_ids = []\n",
117 | "rel_ids = []\n",
118 | "tail_ids = []\n",
119 | "p0_rows = []\n",
120 | "# Map each (head, rel, tail) triplet to the integer ids used by the embeddings\n",
121 | "with open(\"../data/drkg/drkg.tsv\", newline='', encoding='utf-8') as csvfile:\n",
122 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['head', 'rel', 'tail'])\n",
123 | "    for row_val in reader:\n",
124 | "        head = row_val['head']\n",
125 | "        rel = row_val['rel']\n",
126 | "        tail = row_val['tail']\n",
127 | "\n",
128 | "        head_id = entity2id[head]\n",
129 | "        rel_id = rel2id[rel]\n",
130 | "        tail_id = entity2id[tail]\n",
131 | "\n",
132 | "        head_ids.append(head_id)\n",
133 | "        rel_ids.append(rel_id)\n",
134 | "        tail_ids.append(tail_id)\n",
135 | "        p0_rows.append((head, rel, tail))\n",
136 | " \n",
137 | "head_ids = np.array(head_ids)\n",
138 | "rel_ids = np.array(rel_ids)\n",
139 | "tail_ids = np.array(tail_ids)\n",
140 | "triple_ids = np.arange(head_ids.shape[0])"
141 | ]
142 | },
143 | {
144 | "cell_type": "markdown",
145 | "metadata": {},
146 | "source": [
147 | "## Link prediction\n",
148 | "\n",
149 | "Specify the number of seed nodes, L, to select for link prediction."
150 | ]
151 | },
152 | {
153 | "cell_type": "code",
154 | "execution_count": 6,
155 | "metadata": {},
156 | "outputs": [],
157 | "source": [
158 | "scores={}\n",
159 | "L=100\n",
160 | "device = th.device('cpu')\n",
161 | "with th.no_grad():\n",
162 | "    node_emb = th.tensor(node_emb).to(device)\n",
163 | "    rel_emb = th.tensor(rel_emb).to(device)\n",
164 | "    head_ids = th.tensor(head_ids).to(device)\n",
165 | "    rel_ids = th.tensor(rel_ids).to(device)\n",
166 | "    tail_ids = th.tensor(tail_ids).to(device)\n",
167 | "\n",
168 | "    head_embedding = node_emb[head_ids]\n",
169 | "    rel_embedding = rel_emb[rel_ids]\n",
170 | "    tail_embedding = node_emb[tail_ids]\n",
171 | "    # select L random heads\n",
172 | "\n",
173 | "    perm = th.randperm(head_ids.shape[0])\n",
174 | "    seeds = head_ids[perm[:L]]\n",
175 | "    seed_heads = node_emb[seeds]"
176 | ]
177 | },
178 | {
179 | "cell_type": "markdown",
180 | "metadata": {},
181 | "source": [
182 | "Predict the scores per link type between each selected seed node and every node in the graph."
183 | ]
184 | },
185 | {
186 | "cell_type": "code",
187 | "execution_count": 7,
188 | "metadata": {},
189 | "outputs": [],
190 | "source": [
191 | "for rel in rel2id.keys():\n",
192 | "    rel_id = rel2id[rel]\n",
193 | "    # repeat the relation embedding once per candidate tail node\n",
194 | "    rel_embedding = rel_emb[rel_id].repeat(node_emb.shape[0], 1)\n",
195 | "    # score every node as a candidate tail for each seed head;\n",
196 | "    # scores[rel] holds one score vector per seed\n",
197 | "    scores[rel] = [transE_l2(seed_heads[i].repeat(node_emb.shape[0], 1), rel_embedding, node_emb) for i in range(seed_heads.shape[0])]"
198 | ]
199 | },
200 | {
201 | "cell_type": "markdown",
202 | "metadata": {},
203 | "source": [
204 | "### Top K link prediction\n",
205 | "Specify the number of top-scoring neighbors, K, used to evaluate the overlap of link predictions."
206 | ]
207 | },
208 | {
209 | "cell_type": "code",
210 | "execution_count": 8,
211 | "metadata": {},
212 | "outputs": [],
213 | "source": [
214 | "K=10\n",
215 | "top_neighbors={}\n",
216 | "for rel in scores.keys():\n",
217 | "    top_neighbors[rel]=[th.argsort(score, descending=True)[:K] for score in scores[rel]]"
218 | ]
219 | },
220 | {
221 | "cell_type": "markdown",
222 | "metadata": {},
223 | "source": [
224 | "## Overlap among predicted neighbors\n",
225 | "Calculate the overlap of predicted neighboring nodes for each pair of relation types."
226 | ]
227 | },
228 | {
229 | "cell_type": "code",
230 | "execution_count": 9,
231 | "metadata": {},
232 | "outputs": [],
233 | "source": [
234 | "overlap_of_predicted_neighbors=[]\n",
235 | "keys=list(scores.keys()) \n",
236 | "for i in range(len(keys)):\n",
237 | "    for j in range(i+1,len(keys)):\n",
238 | "        e1=keys[i]\n",
239 | "        e2=keys[j]\n",
240 | "        n_1=top_neighbors[e1]\n",
241 | "        n_2=top_neighbors[e2]\n",
242 | "        jaccard=0\n",
243 | "        for l in range(len(n_1)):\n",
244 | "            n1=list(n_1[l].cpu().numpy())\n",
245 | "            n2=list(n_2[l].cpu().numpy())\n",
246 | "            jaccard+=float(len(set(n1).intersection(set(n2)))/len(set(n1).union(set(n2))))\n",
247 | "        jaccard=jaccard/len(n_1)\n",
248 | "        overlap_of_predicted_neighbors.append([e1,e2,jaccard])"
249 | ]
250 | },
251 | {
252 | "cell_type": "markdown",
253 | "metadata": {},
254 | "source": [
255 | "Store the overlap scores, sorted in descending order, in a TSV file."
256 | ]
257 | },
258 | {
259 | "cell_type": "code",
260 | "execution_count": 10,
261 | "metadata": {},
262 | "outputs": [],
263 | "source": [
264 | "overlap_of_predicted_neighbors_sort=(sorted(overlap_of_predicted_neighbors,key=lambda x: float(x[2])))[::-1]\n",
265 | "\n",
266 | "overlap_of_predicted_neighbors_store=[\"{}\\t{}\\t{}\\n\".format(j[0], j[1], j[2]) for j in overlap_of_predicted_neighbors_sort]\n",
267 | "\n",
268 | "overlap_of_predicted_neighbors_store=[\"edge_type1\\tedge_type2\\tpercentage of overlapping predicted edges\\n\"]+overlap_of_predicted_neighbors_store\n",
269 | "entity_file = \"percentage_of_overlapping_predicted_edges_per_edge_pair\"+str(K)+\"v1.tsv\"\n",
270 | "with open(entity_file, 'w+') as f:\n",
271 | " f.writelines(overlap_of_predicted_neighbors_store)"
272 | ]
273 | },
274 | {
275 | "cell_type": "code",
276 | "execution_count": null,
277 | "metadata": {},
278 | "outputs": [],
279 | "source": []
280 | }
281 | ],
282 | "metadata": {
283 | "kernelspec": {
284 | "display_name": "Python 3",
285 | "language": "python",
286 | "name": "python3"
287 | },
288 | "language_info": {
289 | "codemirror_mode": {
290 | "name": "ipython",
291 | "version": 3
292 | },
293 | "file_extension": ".py",
294 | "mimetype": "text/x-python",
295 | "name": "python",
296 | "nbconvert_exporter": "python",
297 | "pygments_lexer": "ipython3",
298 | "version": "3.7.6"
299 | }
300 | },
301 | "nbformat": 4,
302 | "nbformat_minor": 2
303 | }
304 |
--------------------------------------------------------------------------------
/embedding_analysis/Readme.md:
--------------------------------------------------------------------------------
1 | # Knowledge Graph Embedding Based Analysis of DRKG
2 | We analyze the extracted DRKG by learning a TransE KGE model that utilizes the $\ell_2$ distance. As DRKG combines information from different data sources, we want to verify that meaningful entity and relation embeddings can be generated using knowledge graph embedding technology.
3 |
4 | ## Train Knowledge Graph Embedding
5 | Before doing the analysis, we first need to train the knowledge graph embeddings. Here, we split the edge triplets into training, validation, and test sets (90%, 5%, and 5%, respectively) and train the KGE model as shown in the following notebook:
6 |
7 | - [Train_embeddings.ipynb](Train_embeddings.ipynb)
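
The split described above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the helper name `split_indices`, its parameters, and the fixed random seed are assumptions introduced here.

```python
import numpy as np

def split_indices(num_triples, train_frac=0.9, valid_frac=0.05, seed=0):
    """Shuffle triple indices and cut them into train/valid/test parts.

    Hypothetical helper: the test set receives whatever remains after
    the train and validation parts are taken.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_triples)
    train_cnt = int(num_triples * train_frac)
    valid_cnt = int(num_triples * valid_frac)
    train = perm[:train_cnt]
    valid = perm[train_cnt:train_cnt + valid_cnt]
    test = perm[train_cnt + valid_cnt:]
    return train, valid, test

train_idx, valid_idx, test_idx = split_indices(1000)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Fixing the seed makes the split reproducible, which matters when embeddings trained on one split are later analyzed in other notebooks.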
8 |
9 | ## Analyze the Relation Embedding Similarity
10 | We analyze the relation embedding similarity in [Relation_similarity_analysis.ipynb](Relation_similarity_analysis.ipynb). We first use t-SNE to map the relation embeddings to a 2D space to show their distribution, and then plot the pair-wise similarity between different relation types.
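
As a rough sketch of the pair-wise similarity step, the snippet below ranks unique relation pairs by cosine similarity. The function name `top_similar_pairs` and the toy embeddings are assumptions for illustration; the notebook itself uses scikit-learn's `cosine_similarity` on the trained relation embeddings.

```python
import numpy as np

def top_similar_pairs(emb, names, k=10):
    """Return the k most similar (name_i, name_j, cosine) pairs with i < j.

    Hypothetical helper: rows of `emb` are embeddings, `names` labels them.
    """
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = normed @ normed.T
    # Upper triangle without the diagonal: each unordered pair appears once
    # and self-comparisons are skipped.
    i_idx, j_idx = np.triu_indices(len(names), k=1)
    order = np.argsort(-sim[i_idx, j_idx])[:k]
    return [(names[i_idx[o]], names[j_idx[o]], float(sim[i_idx[o], j_idx[o]]))
            for o in order]

toy_emb = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
pairs = top_similar_pairs(toy_emb, ["a", "b", "c"], k=3)
print(pairs[0])
```

Taking only the upper triangle avoids having to deduplicate symmetric pairs after sorting.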
11 |
12 | ## Analyze the Entity Embedding Similarity
13 | We analyze the entity embedding similarity in [Entity_similarity_analysis.ipynb](Entity_similarity_analysis.ipynb). We first use t-SNE to map the entity embeddings to a 2D space to show their distribution, then plot the embedding distribution of DrugBank drug entities, and finally show the pair-wise similarity between different entities.
14 |
15 | ## Analyze Edge Score
16 | We analyze whether the learned KGE model can predict the edges of DRKG in [Edge_score_analysis.ipynb](Edge_score_analysis.ipynb). To avoid the bias of over-fitting the triplets in the training set, we split the whole DRKG into 10 equal folds and train 10 KGE models, each using one fold as the test set and the remaining nine folds as the training set. The score of each triplet is then computed by the model for which that triplet was in the test set. Finally, we show how the edge scores are distributed.
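
The fold construction can be sketched as below. This is an illustrative sketch under assumptions: `kfold_indices` is a hypothetical helper, and training a KGE model per fold is left out.

```python
import numpy as np

def kfold_indices(num_triples, num_folds=10, seed=0):
    """Yield (train, test) index arrays so each triple is held out exactly once.

    Hypothetical helper: folds are near-equal when num_triples is not
    divisible by num_folds.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(num_triples), num_folds)
    for k in range(num_folds):
        train = np.concatenate([folds[i] for i in range(num_folds) if i != k])
        yield train, folds[k]

splits = list(kfold_indices(25, num_folds=10))
held_out = np.concatenate([test for _, test in splits])
print(len(splits), len(held_out))
```

Because every index lands in exactly one test fold, each triplet receives exactly one score from a model that never saw it during training.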
17 |
18 | ## Analyze Link Type Recommendation Similarity
19 | We analyze how similar the predicted links are across different relation types in [Edge_similarity_based_on_link_recommendation_results.ipynb](Edge_similarity_based_on_link_recommendation_results.ipynb). For a seed node $n^{k}_i$, we find the 10 neighbors with the highest link prediction score under relation $r_j$. We then repeat the prediction for relation $r_{j'}$ and calculate the Jaccard similarity coefficient between the two predicted sets of top 10 neighbors.
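
A small sketch of this comparison on random data follows. It assumes the TransE score has the form $\gamma - \lVert h + r - t \rVert_2$ with $\gamma = 12$ (the value passed to training); the function names and the toy embeddings are hypothetical, and the notebook works with PyTorch tensors instead of NumPy arrays.

```python
import numpy as np

def transe_l2_scores(head, rel, tails, gamma=12.0):
    """Score every candidate tail for (head, rel); higher means more plausible.

    Assumed form: gamma - ||head + rel - tail||_2.
    """
    return gamma - np.linalg.norm(head + rel - tails, axis=1)

def top_k(scores, k=10):
    """Indices of the k highest-scoring candidates, as a set."""
    return set(np.argsort(-scores)[:k].tolist())

def jaccard(a, b):
    """Jaccard similarity coefficient |a & b| / |a | b| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

rng = np.random.default_rng(0)
node_emb = rng.normal(size=(50, 8))        # toy entity embeddings
rel_a = rng.normal(size=8)                 # toy relation r_j
rel_b = rel_a + 0.01 * rng.normal(size=8)  # a nearby relation r_j'
seed_head = node_emb[0]

top_a = top_k(transe_l2_scores(seed_head, rel_a, node_emb))
top_b = top_k(transe_l2_scores(seed_head, rel_b, node_emb))
print(jaccard(top_a, top_b))
```

In the notebook this coefficient is averaged over all seed nodes for each pair of relation types.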
--------------------------------------------------------------------------------
/embedding_analysis/Relation_similarity_analysis.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# DRKG Relation Embedding Similarity Analysis\n",
8 | "This notebook shows how to analyze the trained relation embeddings. \n",
9 | "\n",
10 | "In this example, we first load the trained embeddings and map them back to the original relation names. We then apply three methodologies to analyze these embeddings:\n",
11 | " - Project the embeddings into a low-dimensional space and visualize their distribution.\n",
12 | " - Use cosine similarity to analyze the similarity between each pair of relations.\n",
13 | " - Use the Frobenius distance to analyze the similarity between each pair of relations."
14 | ]
15 | },
16 | {
17 | "cell_type": "code",
18 | "execution_count": 1,
19 | "metadata": {},
20 | "outputs": [],
21 | "source": [
22 | "import pandas as pd\n",
23 | "import numpy as np\n",
24 | "import os\n",
25 | "import csv"
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {},
31 | "source": [
32 | "## Loading Relation ID Mapping"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": 2,
38 | "metadata": {},
39 | "outputs": [
40 | {
41 | "name": "stdout",
42 | "output_type": "stream",
43 | "text": [
44 | "Number of relations: 107\n"
45 | ]
46 | }
47 | ],
48 | "source": [
49 | "rel2id = {}\n",
50 | "id2rel = {}\n",
51 | "with open(\"./train/relations.tsv\", newline='', encoding='utf-8') as csvfile:\n",
52 | "    reader = csv.DictReader(csvfile, delimiter='\\t', fieldnames=['rel','id'])\n",
53 | "    for row_val in reader:\n",
54 | "        id = row_val['id']\n",
55 | "        relation = row_val['rel']\n",
56 | "\n",
57 | "        rel2id[relation] = int(id)\n",
58 | "        id2rel[int(id)] = relation\n",
59 | "\n",
60 | "print(\"Number of relations: {}\".format(len(rel2id)))"
61 | ]
62 | },
63 | {
64 | "cell_type": "markdown",
65 | "metadata": {},
66 | "source": [
67 | "## Loading Relation Embeddings"
68 | ]
69 | },
70 | {
71 | "cell_type": "code",
72 | "execution_count": 3,
73 | "metadata": {},
74 | "outputs": [
75 | {
76 | "name": "stdout",
77 | "output_type": "stream",
78 | "text": [
79 | "(107, 400)\n"
80 | ]
81 | }
82 | ],
83 | "source": [
84 | "rel_emb = np.load('./ckpts/TransE_l2_DRKG_0/DRKG_TransE_l2_relation.npy')\n",
85 | "print(rel_emb.shape)"
86 | ]
87 | },
88 | {
89 | "cell_type": "markdown",
90 | "metadata": {},
91 | "source": [
92 | "## General Relation Embedding Clustering\n",
93 | "Here we use t-SNE to project the relation embeddings into a low-dimensional space and visualize their distribution."
94 | ]
95 | },
96 | {
97 | "cell_type": "code",
98 | "execution_count": 9,
99 | "metadata": {},
100 | "outputs": [
101 | {
102 | "name": "stderr",
103 | "output_type": "stream",
104 | "text": [
105 | "The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.\n",
106 | "The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.\n",
107 | "The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.\n",
108 | "The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.\n"
109 | ]
110 | },
111 | {
112 | "data": {
113 | "image/png": "\n",
114 | "text/plain": [
115 | ""
116 | ]
117 | },
118 | "metadata": {
119 | "needs_background": "light"
120 | },
121 | "output_type": "display_data"
122 | }
123 | ],
124 | "source": [
125 | "from matplotlib import cm\n",
126 | "import matplotlib.pyplot as plt\n",
127 | "from sklearn.utils import check_random_state\n",
128 | "from sklearn.manifold import TSNE\n",
129 | "\n",
130 | "dataset_id = {}\n",
131 | "for rel_name, i in rel2id.items():\n",
132 | "    rel_key = rel_name.split('::')[0]\n",
133 | "    if dataset_id.get(rel_key, None) is None:\n",
134 | "        dataset_id[rel_key] = []\n",
135 | "    dataset_id[rel_key].append(i)\n",
136 | "\n",
137 | "X_embedded = TSNE(n_components=2).fit_transform(rel_emb).T\n",
138 | "p = cm.rainbow(int(255/2 * 1))\n",
139 | "fig = plt.figure()\n",
140 | "ax = fig.add_subplot(111)\n",
141 | "for key, val in dataset_id.items():\n",
142 | "    val = np.asarray(val, dtype=np.int64)  # np.long was removed in NumPy 1.24\n",
143 | "\n",
144 | "    ax.plot(X_embedded[0][val], X_embedded[1][val], '.', label=key)\n",
145 | "\n",
146 | "lgd = ax.legend(bbox_to_anchor=(1.0, 1.0))\n",
147 | "plt.savefig('relation.eps', bbox_extra_artists=(lgd,), bbox_inches='tight', format='eps')"
148 | ]
149 | },
150 | {
151 | "cell_type": "markdown",
152 | "metadata": {},
153 | "source": [
154 | "## Pair-wise Relation Embedding Cosine Similarity\n",
155 | "We calculate the pair-wise embedding similarity using cosine similarity and output the 10 most similar pairs."
156 | ]
157 | },
158 | {
159 | "cell_type": "code",
160 | "execution_count": 16,
161 | "metadata": {},
162 | "outputs": [
163 | {
164 | "data": {
165 | "text/plain": [
166 | "[('GNBR::E::Compound:Gene', 'GNBR::K::Compound:Gene', 0.98599356),\n",
167 | " ('GNBR::E::Compound:Gene', 'GNBR::E+::Compound:Gene', 0.98297095),\n",
168 | " ('GNBR::N::Compound:Gene', 'GNBR::E-::Compound:Gene', 0.96987224),\n",
169 | " ('GNBR::E::Compound:Gene', 'GNBR::E-::Compound:Gene', 0.96532124),\n",
170 | " ('GNBR::K::Compound:Gene', 'GNBR::E+::Compound:Gene', 0.9564862),\n",
171 | " ('GNBR::E+::Compound:Gene', 'GNBR::E-::Compound:Gene', 0.95019233),\n",
172 | " ('GNBR::L::Gene:Disease', 'GNBR::G::Gene:Disease', 0.9419448),\n",
173 | " ('GNBR::K::Compound:Gene', 'GNBR::E-::Compound:Gene', 0.94074607),\n",
174 | " ('GNBR::J::Gene:Disease', 'GNBR::Md::Gene:Disease', 0.9319676),\n",
175 | " ('GNBR::J::Gene:Disease', 'GNBR::Te::Gene:Disease', 0.93183714)]"
176 | ]
177 | },
178 | "execution_count": 16,
179 | "metadata": {},
180 | "output_type": "execute_result"
181 | }
182 | ],
183 | "source": [
184 | "from sklearn.metrics.pairwise import cosine_similarity\n",
185 | "\n",
186 | "similarity = cosine_similarity(rel_emb)\n",
187 | "idx = np.flip(np.argsort(similarity), axis=1)\n",
188 | "\n",
189 | "max_pairs = []\n",
190 | "for i in range(idx.shape[0]):\n",
191 | "    j = 1\n",
192 | "    while (similarity[i][idx[i][j]] > 0.9):\n",
193 | "        max_pairs.append((id2rel[idx[i][0]], id2rel[idx[i][j]], similarity[i][idx[i][j]]))\n",
194 | "        j += 1\n",
195 | "\n",
196 | "def sort_score(pair):\n",
197 | "    return pair[2]\n",
198 | "\n",
199 | "max_pairs.sort(reverse=True, key=sort_score)\n",
200 | "sim_pairs = []\n",
201 | "for i, pair in enumerate(max_pairs):\n",
202 | "    if i % 2 == 0:\n",
203 | "        sim_pairs.append(pair)\n",
204 | "\n",
205 | "sim_pairs[:10]"
206 | ]
207 | },
208 | {
209 | "cell_type": "markdown",
210 | "metadata": {},
211 | "source": [
212 | "Then we draw a histogram of how the pair-wise similarity scores are distributed."
213 | ]
214 | },
215 | {
216 | "cell_type": "code",
217 | "execution_count": 17,
218 | "metadata": {},
219 | "outputs": [
220 | {
221 | "name": "stdout",
222 | "output_type": "stream",
223 | "text": [
224 | "(11449,)\n"
225 | ]
226 | },
227 | {
228 | "data": {
229 | "image/png": "\n",
230 | "text/plain": [
231 | ""
232 | ]
233 | },
234 | "metadata": {
235 | "needs_background": "light"
236 | },
237 | "output_type": "display_data"
238 | }
239 | ],
240 | "source": [
241 | "similarity=similarity.flatten()\n",
242 | "print(similarity.shape)\n",
243 | "\n",
244 | "# cleanup self-compare and dup-compare\n",
245 | "s = similarity < 0.99\n",
246 | "s = np.unique(similarity[s])\n",
247 | "plt.xlabel('Cosine similarity')\n",
248 | "plt.ylabel('Number of relation pairs')\n",
249 | "plt.hist(s)\n",
250 | "plt.savefig('relation-sim.eps', format='eps')"
251 | ]
252 | },
253 | {
254 | "cell_type": "markdown",
255 | "metadata": {},
256 | "source": [
257 | "## Pair-wise Relation Embedding Frobenius Distance\n",
258 | "We calculate the pair-wise embedding distance using the L2 (Frobenius) distance and output the 10 closest pairs."
259 | ]
260 | },
261 | {
262 | "cell_type": "code",
263 | "execution_count": 7,
264 | "metadata": {},
265 | "outputs": [
266 | {
267 | "data": {
268 | "text/plain": [
269 | "[('GNBR::E::Compound:Gene', 'GNBR::K::Compound:Gene', 1.6984596),\n",
270 | " ('GNBR::E::Compound:Gene', 'GNBR::E+::Compound:Gene', 1.8601348),\n",
271 | " ('GNBR::N::Compound:Gene', 'GNBR::E-::Compound:Gene', 2.3698092),\n",
272 | " ('GNBR::E::Compound:Gene', 'GNBR::E-::Compound:Gene', 2.606229),\n",
273 | " ('GNBR::K::Compound:Gene', 'GNBR::E+::Compound:Gene', 2.9946468),\n",
274 | " ('GNBR::E+::Compound:Gene', 'GNBR::E-::Compound:Gene', 3.1560013),\n",
275 | " ('GNBR::L::Gene:Disease', 'GNBR::G::Gene:Disease', 3.4119256),\n",
276 | " ('GNBR::K::Compound:Gene', 'GNBR::E-::Compound:Gene', 3.454293),\n",
277 | " ('GNBR::J::Gene:Disease', 'GNBR::Md::Gene:Disease', 3.6071572),\n",
278 | " ('GNBR::J::Gene:Disease', 'GNBR::Te::Gene:Disease', 3.624401)]"
279 | ]
280 | },
281 | "execution_count": 7,
282 | "metadata": {},
283 | "output_type": "execute_result"
284 | }
285 | ],
286 | "source": [
287 | "from sklearn.metrics.pairwise import euclidean_distances\n",
288 | "\n",
289 | "similarity = euclidean_distances(rel_emb)\n",
290 | "idx = np.argsort(similarity)\n",
291 | "\n",
292 | "min_pairs = []\n",
293 | "for i in range(idx.shape[0]):\n",
294 | "    j = 1\n",
295 | "    while (similarity[i][idx[i][j]] < 5):\n",
296 | "        min_pairs.append((id2rel[idx[i][0]], id2rel[idx[i][j]], similarity[i][idx[i][j]]))\n",
297 | "        j += 1\n",
298 | "\n",
299 | "def sort_score(pair):\n",
300 | "    return pair[2]\n",
301 | "\n",
302 | "min_pairs.sort(key=sort_score)\n",
303 | "sim_pairs = []\n",
304 | "for i, pair in enumerate(min_pairs):\n",
305 | "    if i % 2 == 0:\n",
306 | "        sim_pairs.append(pair)\n",
307 | "\n",
308 | "sim_pairs[:10]"
309 | ]
310 | },
311 | {
312 | "cell_type": "markdown",
313 | "metadata": {},
314 | "source": [
315 | "Then we draw a histogram of how the pair-wise distances are distributed."
316 | ]
317 | },
318 | {
319 | "cell_type": "code",
320 | "execution_count": 8,
321 | "metadata": {},
322 | "outputs": [
323 | {
324 | "name": "stdout",
325 | "output_type": "stream",
326 | "text": [
327 | "(11449,)\n"
328 | ]
329 | },
330 | {
331 | "data": {
332 | "image/png": "\n",
333 | "text/plain": [
334 | ""
335 | ]
336 | },
337 | "metadata": {
338 | "needs_background": "light"
339 | },
340 | "output_type": "display_data"
341 | }
342 | ],
343 | "source": [
344 | "similarity=similarity.flatten()\n",
345 | "print(similarity.shape)\n",
346 | "\n",
347 | "# cleanup self-compare and dup-compare\n",
348 | "s = similarity > 0\n",
349 | "s = np.unique(similarity[s])\n",
350 | "plt.xlabel('Frobenius Distance')\n",
351 | "plt.ylabel('Number of relation pairs')\n",
352 | "plt.hist(s)\n",
353 | "plt.show()"
354 | ]
355 | },
356 | {
357 | "cell_type": "code",
358 | "execution_count": null,
359 | "metadata": {},
360 | "outputs": [],
361 | "source": []
362 | }
363 | ],
364 | "metadata": {
365 | "kernelspec": {
366 | "display_name": "Python 3",
367 | "language": "python",
368 | "name": "python3"
369 | },
370 | "language_info": {
371 | "codemirror_mode": {
372 | "name": "ipython",
373 | "version": 3
374 | },
375 | "file_extension": ".py",
376 | "mimetype": "text/x-python",
377 | "name": "python",
378 | "nbconvert_exporter": "python",
379 | "pygments_lexer": "ipython3",
380 | "version": "3.7.6"
381 | }
382 | },
383 | "nbformat": 4,
384 | "nbformat_minor": 2
385 | }
386 |
--------------------------------------------------------------------------------
/embedding_analysis/Train_embeddings.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Training DRKG Using TransE_L2\n",
8 | "This notebook shows how to train DRKG embeddings using TransE_L2.\n",
9 | "\n",
10 | "Before training the model, you need to download the original DRKG source file into your local storage, e.g., ./data/drkg.tsv"
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {},
16 | "source": [
17 | "## Install DGL-KE\n",
18 | "Before training the model, we need to install dgl and dgl-ke packages as well as other dependencies. "
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": null,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "!pip3 install torch\n",
28 | "!pip3 install dgl==0.4.3post2 \n",
29 | "!pip3 install dglke"
30 | ]
31 | },
32 | {
33 | "cell_type": "markdown",
34 | "metadata": {},
35 | "source": [
36 | "## Prepare train/valid/test set\n",
37 | "Before training, we need to split the original DRKG into train/valid/test sets with a 9:0.5:0.5 ratio."
38 | ]
39 | },
40 | {
41 | "cell_type": "code",
42 | "execution_count": null,
43 | "metadata": {},
44 | "outputs": [],
45 | "source": [
46 | "import pandas as pd\n",
47 | "import numpy as np\n",
48 | "import sys\n",
49 | "sys.path.insert(1, '../utils')\n",
50 | "from utils import download_and_extract\n",
51 | "download_and_extract()\n",
52 | "drkg_file = '../data/drkg/drkg.tsv'\n",
53 | "\n",
54 | "df = pd.read_csv(drkg_file, sep=\"\\t\", header=None)\n",
55 | "triples = df.values.tolist()"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "metadata": {},
61 | "source": [
62 | "We get 5,869,293 triples. Now we will split them into three files."
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": null,
68 | "metadata": {},
69 | "outputs": [],
70 | "source": [
71 | "num_triples = len(triples)\n",
72 | "num_triples"
73 | ]
74 | },
75 | {
76 | "cell_type": "code",
77 | "execution_count": null,
78 | "metadata": {},
79 | "outputs": [],
80 | "source": [
81 | "# Please make sure the output directory exists.\n",
82 | "seed = np.arange(num_triples)\n",
83 | "np.random.shuffle(seed)\n",
84 | "\n",
85 | "train_cnt = int(num_triples * 0.9)\n",
86 | "valid_cnt = int(num_triples * 0.05)\n",
87 | "train_set = seed[:train_cnt]\n",
88 | "train_set = train_set.tolist()\n",
89 | "valid_set = seed[train_cnt:train_cnt+valid_cnt].tolist()\n",
90 | "test_set = seed[train_cnt+valid_cnt:].tolist()\n",
91 | "\n",
92 | "with open(\"train/drkg_train.tsv\", 'w+') as f:\n",
93 | "    for idx in train_set:\n",
94 | "        f.writelines(\"{}\\t{}\\t{}\\n\".format(triples[idx][0], triples[idx][1], triples[idx][2]))\n",
95 | "\n",
96 | "with open(\"train/drkg_valid.tsv\", 'w+') as f:\n",
97 | "    for idx in valid_set:\n",
98 | "        f.writelines(\"{}\\t{}\\t{}\\n\".format(triples[idx][0], triples[idx][1], triples[idx][2]))\n",
99 | "\n",
100 | "with open(\"train/drkg_test.tsv\", 'w+') as f:\n",
101 | "    for idx in test_set:\n",
102 | "        f.writelines(\"{}\\t{}\\t{}\\n\".format(triples[idx][0], triples[idx][1], triples[idx][2]))"
103 | ]
104 | },
105 | {
106 | "cell_type": "markdown",
107 | "metadata": {},
108 | "source": [
109 | "## Training TransE_l2 model\n",
110 | "We can train the TransE_l2 model by simply using the DGL-KE command line. For more information about using DGL-KE, please refer to https://github.com/awslabs/dgl-ke.\n",
111 | "\n",
112 | "Here we train the model using 8 GPUs on an AWS p3.16xlarge instance."
113 | ]
114 | },
115 | {
116 | "cell_type": "code",
117 | "execution_count": null,
118 | "metadata": {},
119 | "outputs": [],
120 | "source": [
121 | "!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./train --data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' --model_name TransE_l2 --batch_size 2048 \\\n",
122 | "--neg_sample_size 256 --hidden_dim 400 --gamma 12.0 --lr 0.1 --max_step 100000 --log_interval 1000 --batch_size_eval 16 -adv --regularization_coef 1.00E-07 --test --num_thread 1 --gpu 0 1 2 3 4 5 6 7 --num_proc 8 --neg_sample_size_eval 10000 --async_update"
123 | ]
124 | },
125 | {
126 | "cell_type": "markdown",
127 | "metadata": {},
128 | "source": [
129 | "## Get Entity and Relation Embeddings\n",
130 | "The resulting model, i.e., the entity and relation embeddings, can be found under ./ckpts. (Please refer to the first line of the training log for the specific location.)\n",
131 | "\n",
132 | "The overall process will generate 4 important files:\n",
133 | "\n",
134 | " - Entity embedding: ./ckpts/__/xxx\\_entity.npy\n",
135 | " - Relation embedding: ./ckpts/__/xxx\\_relation.npy\n",
136 | " - The entity id mapping, formatted as pairs: /entities.tsv\n",
137 | " - The relation id mapping, formatted as pairs: /relations.tsv"
138 | ]
139 | },
140 | {
141 | "cell_type": "code",
142 | "execution_count": null,
143 | "metadata": {},
144 | "outputs": [],
145 | "source": [
146 | "!ls ./ckpts/TransE_l2_DRKG_0/\n",
147 | "!ls ./train/"
148 | ]
149 | },
150 | {
151 | "cell_type": "markdown",
152 | "metadata": {},
153 | "source": [
154 | "## A Glance of the Entity and Relation Embeddings"
155 | ]
156 | },
157 | {
158 | "cell_type": "code",
159 | "execution_count": null,
160 | "metadata": {},
161 | "outputs": [],
162 | "source": [
163 | "node_emb = np.load('./ckpts/TransE_l2_DRKG_0/DRKG_TransE_l2_entity.npy')\n",
164 | "relation_emb = np.load('./ckpts/TransE_l2_DRKG_0/DRKG_TransE_l2_relation.npy')\n",
165 | "\n",
166 | "print(node_emb.shape)\n",
167 | "print(relation_emb.shape)"
168 | ]
169 | }
170 | ],
171 | "metadata": {
172 | "kernelspec": {
173 | "display_name": "Python 3",
174 | "language": "python",
175 | "name": "python3"
176 | },
177 | "language_info": {
178 | "codemirror_mode": {
179 | "name": "ipython",
180 | "version": 3
181 | },
182 | "file_extension": ".py",
183 | "mimetype": "text/x-python",
184 | "name": "python",
185 | "nbconvert_exporter": "python",
186 | "pygments_lexer": "ipython3",
187 | "version": "3.6.9"
188 | }
189 | },
190 | "nbformat": 4,
191 | "nbformat_minor": 2
192 | }
193 |
--------------------------------------------------------------------------------
/licenses/Readme.md:
--------------------------------------------------------------------------------
1 | # Source licenses
2 | The following table provides the licenses for resources contributing nodes or edges to the network. Source indicates the date when and location where the license information was retrieved. The bibliography-extracted data does not come with a license, but we have requested permission for non-commercial usage.
3 |
4 | | | Resource | License | Source |
5 | | --- | -------- | ------- | ------ |
6 | | 1 | DrugBank | [Creative Commons Attribution-NonCommercial 4.0 International License.](http://creativecommons.org/licenses/by-nc/4.0/legalcode) | [drugbank](https://www.drugbank.ca/) |
7 | | 2 | GNBR | [Creative Commons Attribution-NonCommercial 4.0 International License.](http://creativecommons.org/licenses/by-nc/4.0/legalcode) | [GNBR](https://zenodo.org/record/1134693#.XqZLZpMza8U) |
8 | | 3 | Hetionet | [Mixed License](https://github.com/hetio/hetionet#License) | [Hetionet](https://het.io/) |
9 | | 4 | String | [Creative Commons Attribution-NonCommercial 4.0 International License.](http://creativecommons.org/licenses/by-nc/4.0/legalcode) | [String](https://string-db.org/) |
10 | | 5 | IntAct | Apache License, Version 2.0 | [IntAct](https://www.ebi.ac.uk/intact/) |
11 | | 6 | DGIdb | [Mixed License](http://www.dgidb.org/downloads) | [DGIdb](http://www.dgidb.org/) |
12 | | 7 | Bibliography extracted data | Non-Commercial | |
13 |
--------------------------------------------------------------------------------
/raw_graph_analysis/Readme.md:
--------------------------------------------------------------------------------
1 | # Basic Graph Analysis of DRKG
2 | Here we evaluate the structural similarity between each pair of relation types by computing their Jaccard similarity coefficient, and the overlap between the two edge types via the overlap coefficient, in [Jaccard_scores_among_all_edge_types_in_DRKG.ipynb](Jaccard_scores_among_all_edge_types_in_DRKG.ipynb).
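
The two coefficients can be illustrated on toy edge sets as follows. This is a sketch under assumptions: each edge type is represented as a set of (head, tail) pairs, and the function names and example sets are hypothetical, not taken from the notebook.

```python
def jaccard_coefficient(edges_a, edges_b):
    """|A & B| / |A | B| for two sets of (head, tail) pairs."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0

def overlap_coefficient(edges_a, edges_b):
    """|A & B| / min(|A|, |B|); equals 1.0 when one set contains the other."""
    smaller = min(len(edges_a), len(edges_b))
    return len(edges_a & edges_b) / smaller if smaller else 0.0

# Toy edge sets for two hypothetical relation types.
edges_r1 = {(1, 2), (2, 3), (3, 4)}
edges_r2 = {(2, 3), (3, 4), (4, 5), (5, 6)}

print(jaccard_coefficient(edges_r1, edges_r2))
print(overlap_coefficient(edges_r1, edges_r2))
```

The overlap coefficient is useful next to Jaccard because it stays high when a small edge type is largely contained in a much larger one, a case where Jaccard would be close to zero.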
--------------------------------------------------------------------------------
/utils/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import tarfile
3 |
4 | def download_and_extract():
5 |     import shutil
6 |     import requests
7 |
8 |     url = "https://s3.us-west-2.amazonaws.com/dgl-data/dataset/DRKG/drkg.tar.gz"
9 |     path = "../data/"
10 |     filename = "drkg.tar.gz"
11 |     fn = os.path.join(path, filename)
12 |     if os.path.exists("../data/drkg/drkg.tsv"):
13 |         return
14 |
15 |     opener, mode = tarfile.open, 'r:gz'
16 |     os.makedirs(path, exist_ok=True)
17 |     cwd = os.getcwd()
18 |     os.chdir(path)
19 |     while True:
20 |         try:
21 |             file = opener(filename, mode)
22 |             try: file.extractall()
23 |             finally: file.close()
24 |             break
25 |         except Exception:
26 |             f_remote = requests.get(url, stream=True)
27 |             sz = f_remote.headers.get('content-length')
28 |             assert f_remote.status_code == 200, 'fail to open {}'.format(url)
29 |             with open(filename, 'wb') as writer:
30 |                 for chunk in f_remote.iter_content(chunk_size=1024*1024):
31 |                     writer.write(chunk)
32 |             print('Download finished. Unzipping the file...')
33 |     os.chdir(cwd)
34 |
--------------------------------------------------------------------------------