├── .gitattributes ├── .gitignore ├── README.md ├── cooccur-topics.ipynb ├── cooccurrence.py ├── data ├── DO-slim-to-mesh.tsv ├── disease-disease-cooccurrence.tsv ├── disease-pmids-topic.tsv.gz ├── disease-pmids.tsv.gz ├── disease-symptom-cooccurrence.tsv ├── disease-uberon-cooccurrence.tsv ├── mesh-nxo-node-link.json.gz ├── mesh-term-topics-noexp.jsonl.gz ├── symptom-pmids.tsv.gz └── uberon-pmids.tsv.gz ├── diseases.ipynb ├── download-topics.ipynb ├── environment.yml ├── eutility.py ├── symptoms.ipynb └── tissues.ipynb /.gitattributes: -------------------------------------------------------------------------------- 1 | *.xz filter=lfs diff=lfs merge=lfs -text 2 | *.gz filter=lfs diff=lfs merge=lfs -text 3 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Python 2 | __pycache__/ 3 | *.egg-info/ 4 | pip-wheel-metadata/ 5 | .ipynb_checkpoints 6 | .cache 7 | .pytest_cache/ 8 | build/ 9 | dist/ 10 | 11 | # System specific files 12 | 13 | ## Linux 14 | *~ 15 | .Trash-* 16 | 17 | ## macOS 18 | .DS_Store 19 | ._* 20 | .Trashes 21 | 22 | ## Windows 23 | Thumbs.db 24 | [Dd]esktop.ini 25 | 26 | ## Text Editors 27 | .vscode 28 | .idea/ -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Computing term cooccurrence in MEDLINE 2 | 3 | This repository quantifies term cooccurrence in MEDLINE. 4 | It's designed for computing the cooccurrence of all pairs between two MeSH termsets. 5 | The repository computes MEDLINE cooccurrences for the Rephetio hetnet. 6 | See the corresponding [Thinklab discussion](https://doi.org/10.15363/thinklab.d67 "Mining knowledge from MEDLINE articles and their indexed MeSH terms") for more information. 
7 | 8 | ## Modules 9 | 10 | + [`eutility.py`](eutility.py) defines an `esearch_query` function for retrieving PubMed IDs matching a user-defined query. 11 | + [`cooccurrence.py`](cooccurrence.py) computes the cooccurrences between two termsets, 12 | whose associated PubMed IDs have been retrieved. 13 | 14 | ## Notebooks 15 | 16 | The following notebooks were used to compute relationships for Hetionet v1.0 by [Project Rephetio](https://git.dhimmel.com/rephetio-manuscript/): 17 | 18 | + [`diseases.ipynb`](diseases.ipynb) computes disease-disease cooccurrence 19 | + [`symptoms.ipynb`](symptoms.ipynb) computes symptom-disease cooccurrence 20 | + [`tissues.ipynb`](tissues.ipynb) computes anatomy-disease cooccurrence. 21 | This notebook depends on `data/disease-pmids.tsv.gz`, 22 | a dataset created by `symptoms.ipynb`. 23 | 24 | The following notebooks are for a more general analysis to support custom user queries: 25 | 26 | - [`download-topics.ipynb`](download-topics.ipynb) downloads the PubMed IDs for all MeSH descriptors and supplementary disease concepts and saves them to [`data/mesh-term-topics-noexp.jsonl.gz`](data/mesh-term-topics-noexp.jsonl.gz). 27 | - [`cooccur-topics.ipynb`](cooccur-topics.ipynb) reads `mesh-term-topics-noexp.jsonl.gz` to compute cooccurrence between a user-selected term and all other MeSH terms. 28 | 29 | ## Environment 30 | 31 | ```shell 32 | # create environment 33 | conda env create --file=environment.yml 34 | 35 | # update environment 36 | conda env update --file=environment.yml 37 | 38 | # activate environment 39 | conda activate medline 40 | 41 | # run jupyter lab for notebook development 42 | jupyter lab 43 | ``` 44 | 45 | ## History 46 | 47 | On 2021-04-09, ownership of this repository on GitHub was changed from `dhimmel/medline` to `hetio/medline`. 48 | The `hetio` organization has GitHub LFS quota, 49 | providing a more convenient way to store large compressed files. 
50 | 51 | At the time of the transfer, the default (and only) branch was `gh-pages`. 52 | The `gh-pages` branch was renamed to `pre-lfs-archive`. 53 | A new default branch `main` was created, whose history has been migrated to use Git LFS. 54 | For the version of this repository used by Project Rephetio to create Hetionet v1.0, 55 | refer to the [v1.0 release](https://github.com/hetio/medline/releases/tag/v1.0). 56 | 57 | ## Comparison to MRCOC 58 | 59 | MEDLINE [produces co-occurrence files](https://ii.nlm.nih.gov/MRCOC.shtml) under the codename MRCOC. 60 | More information is available in the 2016 report [Building an Updated MEDLINE Co-Occurrences (MRCOC) File](https://ii.nlm.nih.gov/MRCOC/MRCOC_Doc_2016.pdf). 61 | These files might be a viable alternative to the analyses in this repository for certain applications. 62 | However, they don't appear to contain topics for supplemental concept records 63 | (for example MeSH term [`C000591739`](https://id.nlm.nih.gov/mesh/2020/C000591739.html)). 64 | Feel free to open an issue with additional insights on or comparisons to MRCOC. 65 | 66 | ## License 67 | 68 | This repository is released under [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/ "CC0 1.0 Universal: Public Domain Dedication"). 
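
## Example: cooccurrence metrics

The following is a minimal sketch of the arithmetic behind `cooccurrence_metrics` in `cooccurrence.py`, using only the standard library. The function name `cooccurrence_sketch` and the toy PubMed IDs are illustrative assumptions and are not part of the repository. The actual module also reports an odds ratio and a one-sided Fisher's exact test p-value via `scipy.stats.fisher_exact`, which this sketch omits.

```python
from typing import Any, Dict, Set


def cooccurrence_sketch(
    source_pmids: Set[str], target_pmids: Set[str], total_pmids: int
) -> Dict[str, Any]:
    """Illustrative re-statement of the counts used for cooccurrence scoring."""
    # 2x2 contingency table over the corpus of total_pmids articles
    a = len(source_pmids & target_pmids)  # articles with both terms
    b = len(source_pmids) - a             # source term only
    c = len(target_pmids) - a             # target term only
    d = total_pmids - (a + b + c)         # neither term
    # expected cooccurrence if the two terms were assigned independently
    expected = len(source_pmids) * len(target_pmids) / total_pmids
    # assumption: report NaN instead of dividing by zero when a set is empty
    enrichment = a / expected if expected else float("nan")
    return {
        "contingency_table": [[a, b], [c, d]],
        "cooccurrence": a,
        "expected": expected,
        "enrichment": enrichment,
    }


# toy data: 4 source articles, 3 target articles, 2 shared, corpus of 20
source = {"1", "2", "3", "4"}
target = {"3", "4", "5"}
result = cooccurrence_sketch(source, target, total_pmids=20)
print(result)
```

An enrichment above 1 means the two terms share more articles than expected under independence. The guard against an empty termset is a choice made for this sketch only.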
69 | -------------------------------------------------------------------------------- /cooccur-topics.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "6e78410f-f668-4e79-8983-0ac747c58d6d", 6 | "metadata": {}, 7 | "source": [ 8 | "# Cooccurrence of a user-selected term against all MeSH terms with citations" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "fe5814b0-9d56-482f-8990-309f9c9b2db2", 15 | "metadata": {}, 16 | "outputs": [ 17 | { 18 | "name": "stdout", 19 | "output_type": "stream", 20 | "text": [ 21 | "\u001b[33m7744a88\u001b[m\u001b[33m (\u001b[m\u001b[1;36mHEAD -> \u001b[m\u001b[1;32mmain\u001b[m\u001b[33m, \u001b[m\u001b[1;31morigin/main\u001b[m\u001b[33m)\u001b[m Query pubmed with quoted MeSH terms and [nm]\n" 22 | ] 23 | } 24 | ], 25 | "source": [ 26 | "! git log -1 --oneline" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 2, 32 | "id": "135775b5-8fae-4a1d-81d0-96fc9989dd13", 33 | "metadata": {}, 34 | "outputs": [], 35 | "source": [ 36 | "import datetime\n", 37 | "import gzip\n", 38 | "import pathlib\n", 39 | "from typing import List, Set\n", 40 | "\n", 41 | "import scipy.stats\n", 42 | "import tqdm\n", 43 | "import jsonlines\n", 44 | "import tqdm\n", 45 | "import pandas as pd\n", 46 | "from nxontology import NXOntology\n", 47 | "\n", 48 | "from cooccurrence import cooccurrence_metrics" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 3, 54 | "id": "bb3b6ec1-7e21-46b6-a8f5-c1c9fa065a3c", 55 | "metadata": {}, 56 | "outputs": [ 57 | { 58 | "data": { 59 | "text/plain": [ 60 | "300093" 61 | ] 62 | }, 63 | "execution_count": 3, 64 | "metadata": {}, 65 | "output_type": "execute_result" 66 | } 67 | ], 68 | "source": [ 69 | "# read the MeSH ontology\n", 70 | "nxo = NXOntology.read_node_link_json(\"data/mesh-nxo-node-link.json.gz\")\n", 71 | "nxo.freeze()\n", 72 | 
"nxo.n_nodes" 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": 4, 78 | "id": "6ee9d0ef-9212-4f2e-8d67-ce6edead3546", 79 | "metadata": {}, 80 | "outputs": [ 81 | { 82 | "data": { 83 | "text/plain": [ 84 | "35533" 85 | ] 86 | }, 87 | "execution_count": 4, 88 | "metadata": {}, 89 | "output_type": "execute_result" 90 | } 91 | ], 92 | "source": [ 93 | "# Read the jsonlines file\n", 94 | "path = pathlib.Path('data/mesh-term-topics-noexp.jsonl.gz')\n", 95 | "with jsonlines.Reader(gzip.open(path, \"rt\")) as reader:\n", 96 | " lines = list(reader)\n", 97 | "for line in lines:\n", 98 | " line[\"pumbed_ids\"] = set(line[\"pubmed_ids\"])\n", 99 | "len(lines)" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 5, 105 | "id": "5e9a2205-2528-4c15-8309-39c3f5a37cfd", 106 | "metadata": {}, 107 | "outputs": [], 108 | "source": [ 109 | "# filter topics without mesh_ids since cooccurrence cannot be computed\n", 110 | "mesh_id_to_line = {line[\"mesh_id\"]: line for line in lines if line[\"pubmed_ids\"]}" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 6, 116 | "id": "03403778-2573-4f0b-9106-1f2608c26c46", 117 | "metadata": {}, 118 | "outputs": [ 119 | { 120 | "data": { 121 | "text/plain": [ 122 | "27698253" 123 | ] 124 | }, 125 | "execution_count": 6, 126 | "metadata": {}, 127 | "output_type": "execute_result" 128 | } 129 | ], 130 | "source": [ 131 | "all_pmids: Set[str] = set()\n", 132 | "for line in lines:\n", 133 | " all_pmids |= set(line[\"pubmed_ids\"])\n", 134 | "len(all_pmids)" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 7, 140 | "id": "d76747d0-6640-4b9b-9d1c-305433fd29ee", 141 | "metadata": {}, 142 | "outputs": [], 143 | "source": [ 144 | "def explode_pubmid_ids(nxo: NXOntology, mesh_id_to_line: dict, topic: str):\n", 145 | " exploded_pubmed_ids = set()\n", 146 | " for descendant in nxo.node_info(topic).descendants:\n", 147 | " if descendant not in 
mesh_id_to_line:\n", 148 | " continue\n", 149 | " exploded_pubmed_ids |= set(mesh_id_to_line[descendant][\"pubmed_ids\"])\n", 150 | " return exploded_pubmed_ids\n", 151 | "\n", 152 | "def cooccurrence_result(source_mesh_id: str, target_mesh_id: str, nxo: NXOntology, mesh_id_to_line: dict, total_pmids: int) -> dict:\n", 153 | " source_pmids = explode_pubmid_ids(nxo, mesh_id_to_line, source_mesh_id)\n", 154 | " target_pmids = explode_pubmid_ids(nxo, mesh_id_to_line, target_mesh_id)\n", 155 | " result = {\n", 156 | " \"source_mesh_id\": source_mesh_id,\n", 157 | " \"target_mesh_id\": target_mesh_id,\n", 158 | " \"source_mesh_label\": nxo.node_info(source_mesh_id).label,\n", 159 | " \"target_mesh_label\": nxo.node_info(target_mesh_id).label,\n", 160 | " }\n", 161 | " result.update(cooccurrence_metrics(source_pmids, target_pmids, total_pmids=total_pmids))\n", 162 | " return result" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 8, 168 | "id": "bc9ce436-c33c-4027-8005-01c0da8e3972", 169 | "metadata": {}, 170 | "outputs": [ 171 | { 172 | "data": { 173 | "text/plain": [ 174 | "{'source_mesh_id': 'D005357',\n", 175 | " 'target_mesh_id': 'D009103',\n", 176 | " 'source_mesh_label': 'Fibrous Dysplasia of Bone',\n", 177 | " 'target_mesh_label': 'Multiple Sclerosis',\n", 178 | " 'cooccurrence': 0,\n", 179 | " 'expected': 10.945734736410992,\n", 180 | " 'enrichment': 0.0,\n", 181 | " 'odds_ratio': 0.0,\n", 182 | " 'p_fisher': 1.0,\n", 183 | " 'n_source': 4985,\n", 184 | " 'n_target': 60818}" 185 | ] 186 | }, 187 | "execution_count": 8, 188 | "metadata": {}, 189 | "output_type": "execute_result" 190 | } 191 | ], 192 | "source": [ 193 | "source_mesh_id = \"D005357\" # Fibrous Dysplasia of Bone\n", 194 | "target_mesh_id = \"D009103\"\n", 195 | "cooccurrence_result(source_mesh_id, target_mesh_id, nxo, mesh_id_to_line, total_pmids=len(all_pmids))" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 9, 201 | "id": 
"77341fd8-e927-41a0-bbe1-7692a69d4560", 202 | "metadata": {}, 203 | "outputs": [ 204 | { 205 | "name": "stderr", 206 | "output_type": "stream", 207 | "text": [ 208 | "100%|██████████| 32568/32568 [05:11<00:00, 104.58it/s] \n" 209 | ] 210 | } 211 | ], 212 | "source": [ 213 | "source_mesh_id = \"D005357\" # Fibrous Dysplasia of Bone\n", 214 | "\n", 215 | "rows = list()\n", 216 | "for target_mesh_id in tqdm.tqdm(mesh_id_to_line):\n", 217 | " # for development\n", 218 | "# if len(rows) > 1000:\n", 219 | "# break\n", 220 | " row = cooccurrence_result(source_mesh_id, target_mesh_id, nxo, mesh_id_to_line, total_pmids=len(all_pmids))\n", 221 | " rows.append(row)" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 10, 227 | "id": "fc6360dd-9138-44ba-aca6-ee144aa3095f", 228 | "metadata": {}, 229 | "outputs": [ 230 | { 231 | "data": { 232 | "text/html": [ 233 | "
[HTML rendering of the 15-row cooccurrence table; the markup was lost in extraction, and the same rows and columns appear intact in the text/plain output of this cell]
" 478 | ], 479 | "text/plain": [ 480 | " source_mesh_id target_mesh_id source_mesh_label \\\n", 481 | "8235 D005357 D002636 Fibrous Dysplasia of Bone \n", 482 | "10805 D005357 D005357 Fibrous Dysplasia of Bone \n", 483 | "10806 D005357 D005358 Fibrous Dysplasia of Bone \n", 484 | "10807 D005357 D005359 Fibrous Dysplasia of Bone \n", 485 | "15098 D005357 D010002 Fibrous Dysplasia of Bone \n", 486 | "15105 D005357 D010009 Fibrous Dysplasia of Bone \n", 487 | "15096 D005357 D010000 Fibrous Dysplasia of Bone \n", 488 | "23170 D005357 D019205 Fibrous Dysplasia of Bone \n", 489 | "7503 D005357 D001848 Fibrous Dysplasia of Bone \n", 490 | "16640 D005357 D011629 Fibrous Dysplasia of Bone \n", 491 | "7504 D005357 D001849 Fibrous Dysplasia of Bone \n", 492 | "15112 D005357 D010016 Fibrous Dysplasia of Bone \n", 493 | "27191 D005357 D044385 Fibrous Dysplasia of Bone \n", 494 | "13657 D005357 D008439 Fibrous Dysplasia of Bone \n", 495 | "7500 D005357 D001845 Fibrous Dysplasia of Bone \n", 496 | "\n", 497 | " target_mesh_label cooccurrence expected \\\n", 498 | "8235 Cherubism 432 0.077749 \n", 499 | "10805 Fibrous Dysplasia of Bone 4985 0.897177 \n", 500 | "10806 Fibrous Dysplasia, Monostotic 455 0.081889 \n", 501 | "10807 Fibrous Dysplasia, Polyostotic 1446 0.260244 \n", 502 | "15098 Osteitis Fibrosa Cystica 570 0.291920 \n", 503 | "15105 Osteochondrodysplasias 4985 5.510482 \n", 504 | "15096 Osteitis 508 0.711802 \n", 505 | "23170 GTP-Binding Protein alpha Subunits, Gs 218 0.385507 \n", 506 | "7503 Bone Diseases, Developmental 4985 14.657724 \n", 507 | "16640 Puberty, Precocious 190 0.841564 \n", 508 | "7504 Bone Diseases, Endocrine 660 3.252513 \n", 509 | "15112 Osteoma 200 1.073912 \n", 510 | "27191 GTP-Binding Protein alpha Subunits 222 1.412806 \n", 511 | "13657 Maxillary Diseases 220 1.591701 \n", 512 | "7500 Bone Cysts 282 2.398530 \n", 513 | "\n", 514 | " enrichment odds_ratio p_fisher n_source n_target \n", 515 | "8235 5556.319559 inf 0.0 4985 432 \n", 516 | "10805 
5556.319559 inf 0.0 4985 4985 \n", 517 | "10806 5556.319559 inf 0.0 4985 455 \n", 518 | "10807 5556.319559 inf 0.0 4985 1446 \n", 519 | "15098 1952.590720 3398.490955 0.0 4985 1622 \n", 520 | "15105 904.639526 inf 0.0 4985 30618 \n", 521 | "15096 713.681501 911.497502 0.0 4985 3955 \n", 522 | "23170 565.489105 658.188528 0.0 4985 2142 \n", 523 | "7503 340.093722 inf 0.0 4985 81443 \n", 524 | "16640 225.770042 244.573598 0.0 4985 4676 \n", 525 | "7504 202.920037 242.554998 0.0 4985 18072 \n", 526 | "15112 186.234944 200.669728 0.0 4985 5967 \n", 527 | "27191 157.134133 169.167245 0.0 4985 7850 \n", 528 | "13657 138.216904 148.214254 0.0 4985 8844 \n", 529 | "7500 117.572005 127.232960 0.0 4985 13327 " 530 | ] 531 | }, 532 | "execution_count": 10, 533 | "metadata": {}, 534 | "output_type": "execute_result" 535 | } 536 | ], 537 | "source": [ 538 | "cooccur_df = pd.DataFrame(rows)\n", 539 | "cooccur_df = cooccur_df.sort_values(by=[\"p_fisher\", \"enrichment\"], ascending=[True, False])\n", 540 | "cooccur_df.head(15)" 541 | ] 542 | }, 543 | { 544 | "cell_type": "code", 545 | "execution_count": 11, 546 | "id": "8e0ad8d5-6414-45ec-aad7-1ca73c0c2d3a", 547 | "metadata": {}, 548 | "outputs": [], 549 | "source": [ 550 | "# cooccur_df.head(1000).to_excel(\"data/medline-cooccurrence.xlsx\", index=False, freeze_panes=(0, 1))" 551 | ] 552 | } 553 | ], 554 | "metadata": { 555 | "kernelspec": { 556 | "display_name": "Python 3", 557 | "language": "python", 558 | "name": "python3" 559 | }, 560 | "language_info": { 561 | "codemirror_mode": { 562 | "name": "ipython", 563 | "version": 3 564 | }, 565 | "file_extension": ".py", 566 | "mimetype": "text/x-python", 567 | "name": "python", 568 | "nbconvert_exporter": "python", 569 | "pygments_lexer": "ipython3", 570 | "version": "3.9.2" 571 | } 572 | }, 573 | "nbformat": 4, 574 | "nbformat_minor": 5 575 | } 576 | -------------------------------------------------------------------------------- /cooccurrence.py: 
-------------------------------------------------------------------------------- 1 | import itertools 2 | from typing import Any, Dict, List, Set 3 | 4 | import scipy.stats 5 | import pandas 6 | 7 | 8 | def read_pmids_tsv(path, key, min_articles = 1): 9 | term_to_pmids = dict() 10 | pmids_df = pandas.read_table(path, compression='gzip') 11 | pmids_df = pmids_df[pmids_df.n_articles >= min_articles] 12 | for i, row in pmids_df.iterrows(): 13 | term = row[key] 14 | pmids = row.pubmed_ids.split('|') 15 | term_to_pmids[term] = set(pmids) 16 | pmids_df.drop('pubmed_ids', axis=1, inplace=True) 17 | return pmids_df, term_to_pmids 18 | 19 | def score_pmid_cooccurrence(term0_to_pmids, term1_to_pmids, term0_name='term_0', term1_name='term_1', verbose=True): 20 | """ 21 | Find pubmed cooccurrence between topics of two classes. 22 | 23 | term0_to_pmids -- a dictionary that returns the pubmed_ids for each term of class 0 24 | term1_to_pmids -- a dictionary that returns the pubmed_ids for each term of class 1 25 | """ 26 | all_pmids0 = set.union(*term0_to_pmids.values()) 27 | all_pmids1 = set.union(*term1_to_pmids.values()) 28 | pmids_in_both = all_pmids0 & all_pmids1 29 | total_pmids = len(pmids_in_both) 30 | if verbose: 31 | print('Total articles containing a {}: {}'.format(term0_name, len(all_pmids0))) 32 | print('Total articles containing a {}: {}'.format(term1_name, len(all_pmids1))) 33 | print('Total articles containing both a {} and {}: {}'.format(term0_name, term1_name, total_pmids)) 34 | 35 | term0_to_pmids = term0_to_pmids.copy() 36 | term1_to_pmids = term1_to_pmids.copy() 37 | for d in term0_to_pmids, term1_to_pmids: 38 | for key, value in list(d.items()): 39 | d[key] = value & pmids_in_both 40 | if not d[key]: 41 | del d[key] 42 | 43 | if verbose: 44 | print('\nAfter removing terms without any cooccurrences:') 45 | print('+ {} {}s remain'.format(len(term0_to_pmids), term0_name)) 46 | print('+ {} {}s remain'.format(len(term1_to_pmids), term1_name)) 47 | 48 | rows = 
list() 49 | for term0, term1 in itertools.product(term0_to_pmids, term1_to_pmids): 50 | pmids0 = term0_to_pmids[term0] 51 | pmids1 = term1_to_pmids[term1] 52 | row = { 53 | term0_name: term0, 54 | term1_name: term1, 55 | **cooccurrence_metrics(pmids0, pmids1, total_pmids=total_pmids) 56 | } 57 | rows.append(row) 58 | df = pandas.DataFrame(rows) 59 | 60 | if verbose: 61 | print('\nCooccurrence scores calculated for {} {} -- {} pairs'.format(len(df), term0_name, term1_name)) 62 | return df 63 | 64 | 65 | def cooccurrence_metrics(source_pmids: Set[str], target_pmids: Set[str], total_pmids: int) -> Dict[str, Any]: 66 | """ 67 | Compute metrics of cooccurrence between two sets of pubmed ids. 68 | Requires providing the total number of pubmed ids in the corpus. 69 | """ 70 | a = len(source_pmids & target_pmids) 71 | b = len(source_pmids) - a 72 | c = len(target_pmids) - a 73 | d = total_pmids - (a + b + c) 74 | contingency_table = [[a, b], [c, d]] 75 | # discussion on this formula in https://github.com/hetio/medline/issues/1 76 | expected = len(source_pmids) * len(target_pmids) / total_pmids 77 | enrichment = a / expected 78 | odds_ratio, p_fisher = scipy.stats.fisher_exact(contingency_table, alternative='greater') 79 | return { 80 | "cooccurrence": a, 81 | "expected": expected, 82 | "enrichment": enrichment, 83 | "odds_ratio": odds_ratio, 84 | "p_fisher": p_fisher, 85 | "n_source": len(source_pmids), 86 | "n_target": len(target_pmids), 87 | } 88 | -------------------------------------------------------------------------------- /data/DO-slim-to-mesh.tsv: -------------------------------------------------------------------------------- 1 | doid_code doid_name mesh_id mesh_name 2 | DOID:2531 hematologic cancer D019337 Hematologic Neoplasms 3 | DOID:1319 brain cancer D001932 Brain Neoplasms 4 | DOID:1324 lung cancer D008175 Lung Neoplasms 5 | DOID:263 kidney cancer D007680 Kidney Neoplasms 6 | DOID:1793 pancreatic cancer D010190 Pancreatic Neoplasms 7 | DOID:4159 skin cancer 
D012878 Skin Neoplasms 8 | DOID:184 bone cancer D001859 Bone Neoplasms 9 | DOID:0060119 pharynx cancer D010610 Pharyngeal Neoplasms 10 | DOID:2394 ovarian cancer D010051 Ovarian Neoplasms 11 | DOID:1612 breast cancer D001943 Breast Neoplasms 12 | DOID:3070 malignant glioma D005910 Glioma 13 | DOID:363 uterine cancer D014594 Uterine Neoplasms 14 | DOID:3953 adrenal gland cancer D000310 Adrenal Gland Neoplasms 15 | DOID:5041 esophageal cancer D004938 Esophageal Neoplasms 16 | DOID:8850 salivary gland cancer D012468 Salivary Gland Neoplasms 17 | DOID:10283 prostate cancer D011471 Prostatic Neoplasms 18 | DOID:10534 stomach cancer D013274 Stomach Neoplasms 19 | DOID:11054 urinary bladder cancer D001749 Urinary Bladder Neoplasms 20 | DOID:1192 peripheral nervous system neoplasm D010524 Peripheral Nervous System Neoplasms 21 | DOID:1781 thyroid cancer D013964 Thyroid Neoplasms 22 | DOID:3571 liver cancer D008113 Liver Neoplasms 23 | DOID:4362 cervical cancer D002583 Uterine Cervical Neoplasms 24 | DOID:119 vaginal cancer D014625 Vaginal Neoplasms 25 | DOID:11934 head and neck cancer D006258 Head and Neck Neoplasms 26 | DOID:1993 rectum cancer D012004 Rectal Neoplasms 27 | DOID:2174 ocular cancer D005134 Eye Neoplasms 28 | DOID:219 colon cancer D003110 Colonic Neoplasms 29 | DOID:2596 larynx cancer D007822 Laryngeal Neoplasms 30 | DOID:2994 germ cell cancer D009373 Neoplasms, Germ Cell and Embryonal 31 | DOID:3277 thymus cancer D013953 Thymus Neoplasms 32 | DOID:4045 muscle cancer D009217 Myosarcoma 33 | DOID:10021 duodenum cancer D004379 Duodenal Neoplasms 34 | DOID:10153 ileum cancer D007078 Ileal Neoplasms 35 | DOID:1115 sarcoma D012509 Sarcoma 36 | DOID:11239 appendix cancer D001063 Appendiceal Neoplasms 37 | DOID:11615 penile cancer D010412 Penile Neoplasms 38 | DOID:11819 ureter cancer D014516 Ureteral Neoplasms 39 | DOID:11920 tracheal cancer D014134 Tracheal Neoplasms 40 | DOID:1245 vulva cancer D014846 Vulvar Neoplasms 41 | DOID:13499 jejunal cancer D007580 
Jejunal Neoplasms 42 | DOID:1725 peritoneum cancer D010534 Peritoneal Neoplasms 43 | DOID:175 vascular cancer D019043 Vascular Neoplasms 44 | DOID:1790 malignant mesothelioma D008654 Mesothelioma 45 | DOID:1909 melanoma D008545 Melanoma 46 | DOID:1964 fallopian tube cancer D005185 Fallopian Tube Neoplasms 47 | DOID:2998 testicular cancer D013736 Testicular Neoplasms 48 | DOID:3121 gallbladder cancer D005706 Gallbladder Neoplasms 49 | DOID:3565 meningioma D008577 Meningeal Neoplasms 50 | DOID:4606 bile duct cancer D001650 Bile Duct Neoplasms 51 | DOID:5559 mediastinal cancer D008479 Mediastinal Neoplasms 52 | DOID:5612 spinal cancer D013120 Spinal Cord Neoplasms 53 | DOID:5875 retroperitoneal cancer D012186 Retroperitoneal Neoplasms 54 | DOID:8778 Crohn's disease D003424 Crohn Disease 55 | DOID:2377 multiple sclerosis D009103 Multiple Sclerosis 56 | DOID:9352 type 2 diabetes mellitus D003924 Diabetes Mellitus, Type 2 57 | DOID:8577 ulcerative colitis D003093 Colitis, Ulcerative 58 | DOID:9744 type 1 diabetes mellitus D003922 Diabetes Mellitus, Type 1 59 | DOID:7148 rheumatoid arthritis D001172 Arthritis, Rheumatoid 60 | DOID:3393 coronary artery disease D003324 Coronary Artery Disease 61 | DOID:3393 coronary artery disease D003327 Coronary Disease 62 | DOID:3393 coronary artery disease D017202 Myocardial Ischemia 63 | DOID:9970 obesity D009765 Obesity 64 | DOID:10608 celiac disease D002446 Celiac Disease 65 | DOID:9074 systemic lupus erythematosus D008180 Lupus Erythematosus, Systemic 66 | DOID:9835 refractive error D012030 Refractive Errors 67 | DOID:12236 primary biliary cirrhosis D008105 Liver Cirrhosis, Biliary 68 | DOID:12306 vitiligo D014820 Vitiligo 69 | DOID:10871 age related macular degeneration D008268 Macular Degeneration 70 | DOID:14221 metabolic syndrome X D024821 Metabolic Syndrome X 71 | DOID:2841 asthma D001249 Asthma 72 | DOID:8893 psoriasis D011565 Psoriasis 73 | DOID:5419 schizophrenia D012559 Schizophrenia 74 | DOID:6364 migraine D008881 Migraine 
Disorders 75 | DOID:10652 Alzheimer's disease D000544 Alzheimer Disease 76 | DOID:12361 Graves' disease D006111 Graves Disease 77 | DOID:14330 Parkinson's disease D010300 Parkinson Disease 78 | DOID:3310 atopic dermatitis D003876 Dermatitis, Atopic 79 | DOID:3312 bipolar disorder D001714 Bipolar Disorder 80 | DOID:7147 ankylosing spondylitis D013167 Spondylitis, Ankylosing 81 | DOID:11612 polycystic ovary syndrome D011085 Polycystic Ovary Syndrome 82 | DOID:10763 hypertension D006973 Hypertension 83 | DOID:418 systemic scleroderma D012595 Scleroderma, Systemic 84 | DOID:13241 Behcet's disease D001528 Behcet Syndrome 85 | DOID:5408 Paget's disease of bone D010001 Osteitis Deformans 86 | DOID:1024 leprosy D007918 Leprosy 87 | DOID:10941 intracranial aneurysm D002532 Intracranial Aneurysm 88 | DOID:1686 glaucoma D005901 Glaucoma 89 | DOID:332 amyotrophic lateral sclerosis D000690 Amyotrophic Lateral Sclerosis 90 | DOID:0050425 restless legs syndrome D012148 Restless Legs Syndrome 91 | DOID:13378 Kawasaki disease D009080 Mucocutaneous Lymph Node Syndrome 92 | DOID:1936 atherosclerosis D050197 Atherosclerosis 93 | DOID:986 alopecia areata D000506 Alopecia Areata 94 | DOID:11476 osteoporosis D010024 Osteoporosis 95 | DOID:1459 hypothyroidism D007037 Hypothyroidism 96 | DOID:2986 IgA glomerulonephritis D005922 Glomerulonephritis, IGA 97 | DOID:0050741 alcohol dependence D000437 Alcoholism 98 | DOID:11949 Creutzfeldt-Jakob disease D007562 Creutzfeldt-Jakob Syndrome 99 | DOID:14227 azoospermia D053713 Azoospermia 100 | DOID:1826 epilepsy syndrome D004827 Epilepsy 101 | DOID:2043 hepatitis B D006509 Hepatitis B 102 | DOID:3083 chronic obstructive pulmonary disease D029424 Pulmonary Disease, Chronic Obstructive 103 | DOID:7693 abdominal aortic aneurysm D017544 Aortic Aneurysm, Abdominal 104 | DOID:784 chronic kidney failure D007676 Kidney Failure, Chronic 105 | DOID:8398 osteoarthritis D010003 Osteoarthritis 106 | DOID:9008 psoriatic arthritis D015535 Arthritis, Psoriatic 107 
| DOID:0050742 nicotine dependence D014029 Tobacco Use Disorder 108 | DOID:10976 membranous glomerulonephritis D015433 Glomerulonephritis, Membranous 109 | DOID:11714 gestational diabetes D016640 Diabetes, Gestational 110 | DOID:12365 malaria D008288 Malaria 111 | DOID:12849 autistic disorder D001321 Autistic Disorder 112 | DOID:12930 dilated cardiomyopathy D002311 Cardiomyopathy, Dilated 113 | DOID:13189 gout D015210 Arthritis, Gouty 114 | DOID:13223 uterine fibroid D007889 Leiomyoma 115 | DOID:14268 sclerosing cholangitis D015209 Cholangitis, Sclerosing 116 | DOID:8986 narcolepsy D009290 Narcolepsy 117 | DOID:90 degenerative disc disease D055959 Intervertebral Disc Degeneration 118 | DOID:9296 cleft lip D002971 Cleft Lip 119 | DOID:0050156 idiopathic pulmonary fibrosis D054990 Idiopathic Pulmonary Fibrosis 120 | DOID:1094 attention deficit hyperactivity disorder D001289 Attention Deficit Disorder with Hyperactivity 121 | DOID:11119 Gilles de la Tourette syndrome D005879 Tourette Syndrome 122 | DOID:14004 thoracic aortic aneurysm D017545 Aortic Aneurysm, Thoracic 123 | DOID:1595 endogenous depression D003866 Depressive Disorder 124 | DOID:4481 allergic rhinitis D065631 Rhinitis, Allergic 125 | DOID:4989 pancreatitis D010195 Pancreatitis 126 | DOID:585 nephrolithiasis D053040 Nephrolithiasis 127 | DOID:824 periodontitis D010518 Periodontitis 128 | DOID:9206 Barrett's esophagus D001471 Barrett Esophagus 129 | DOID:11555 Fuchs' endothelial dystrophy D005642 Fuchs' Endothelial Dystrophy 130 | DOID:12185 otosclerosis D010040 Otosclerosis 131 | DOID:12995 conduct disorder D019955 Conduct Disorder 132 | DOID:1312 focal segmental glomerulosclerosis D005923 Glomerulosclerosis, Focal Segmental 133 | DOID:216 dental caries D003731 Dental Caries 134 | DOID:2355 anemia D000740 Anemia 135 | DOID:594 panic disorder D016584 Panic Disorder 136 | DOID:635 acquired immunodeficiency syndrome D000163 Acquired Immunodeficiency Syndrome 137 | 
-------------------------------------------------------------------------------- /data/disease-pmids-topic.tsv.gz: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:acb1d6322425aa7e57d6f8a2cc185d9a837b3cd45d26f534f021a866b76cf98e 3 | size 16822856 4 | -------------------------------------------------------------------------------- /data/disease-pmids.tsv.gz: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:29408bee2b7c81da5774b5ab31896ac91845380d8071d105902bff16c87f2697 3 | size 13432263 4 | -------------------------------------------------------------------------------- /data/mesh-nxo-node-link.json.gz: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:c0221b2d057af5b1f018224eebf42ce0ec31907d0cfb276f03120eaa592d9c16 3 | size 9092183 4 | -------------------------------------------------------------------------------- /data/mesh-term-topics-noexp.jsonl.gz: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:c0b28c6d4b5e5b6384156b450c0145a658c5cbbb10084de14b37fe22cb7d865d 3 | size 849544061 4 | -------------------------------------------------------------------------------- /data/symptom-pmids.tsv.gz: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:a87a99ef60bf5553e49702202e20b3e70bb82b148156d1725be6c8954ef9b4a1 3 | size 7398164 4 | -------------------------------------------------------------------------------- /data/uberon-pmids.tsv.gz: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid 
sha256:03247bfa21e5f078226a795bede0bec0fe91e798c95131dfa304a0f8bdaa0cbd 3 | size 21074726 4 | -------------------------------------------------------------------------------- /diseases.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Compute disease-disease-cooccurrence for Hetionet" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": { 14 | "collapsed": true, 15 | "jupyter": { 16 | "outputs_hidden": true 17 | } 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "import io\n", 22 | "import gzip\n", 23 | "\n", 24 | "import pandas\n", 25 | "import requests\n", 26 | "import networkx\n", 27 | "\n", 28 | "import eutility\n", 29 | "import cooccurrence" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 2, 35 | "metadata": { 36 | "collapsed": false, 37 | "jupyter": { 38 | "outputs_hidden": false 39 | } 40 | }, 41 | "outputs": [ 42 | { 43 | "data": { 44 | "text/html": [ 45 | "
\n", 46 | "\n", 47 | " \n", 48 | " \n", 49 | " \n", 50 | " \n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | "
| | doid_code | doid_name | mesh_id | mesh_name |
|---|---|---|---|---|
| 0 | DOID:2531 | hematologic cancer | D019337 | Hematologic Neoplasms |
| 1 | DOID:1319 | brain cancer | D001932 | Brain Neoplasms |
| 2 | DOID:1324 | lung cancer | D008175 | Lung Neoplasms |
| 3 | DOID:263 | kidney cancer | D007680 | Kidney Neoplasms |
| 4 | DOID:1793 | pancreatic cancer | D010190 | Pancreatic Neoplasms |
\n", 94 | "
" 95 | ], 96 | "text/plain": [ 97 | " doid_code doid_name mesh_id mesh_name\n", 98 | "0 DOID:2531 hematologic cancer D019337 Hematologic Neoplasms\n", 99 | "1 DOID:1319 brain cancer D001932 Brain Neoplasms\n", 100 | "2 DOID:1324 lung cancer D008175 Lung Neoplasms\n", 101 | "3 DOID:263 kidney cancer D007680 Kidney Neoplasms\n", 102 | "4 DOID:1793 pancreatic cancer D010190 Pancreatic Neoplasms" 103 | ] 104 | }, 105 | "execution_count": 2, 106 | "metadata": {}, 107 | "output_type": "execute_result" 108 | } 109 | ], 110 | "source": [ 111 | "# Read mappings for DO Slim terms\n", 112 | "url = 'https://raw.githubusercontent.com/dhimmel/disease-ontology/72614ade9f1cc5a5317b8f6836e1e464b31d5587/data/xrefs-slim.tsv'\n", 113 | "disease_df = pandas.read_table(url)\n", 114 | "disease_df = disease_df.query('resource == \"MSH\"').drop(columns='resource')\n", 115 | "disease_df = disease_df.rename(columns={'resource_id': 'mesh_id'})\n", 116 | "\n", 117 | "# Read MeSH terms to MeSH names\n", 118 | "url = 'https://raw.githubusercontent.com/dhimmel/mesh/e561301360e6de2140dedeaa7c7e17ce4714eb7f/data/terms.tsv'\n", 119 | "mesh_df = pandas.read_table(url)\n", 120 | "disease_df = disease_df.merge(mesh_df)\n", 121 | "\n", 122 | "# Manually remove problematic xrefs\n", 123 | "# https://github.com/obophenotype/human-disease-ontology/issues/45\n", 124 | "disease_df = disease_df.query(\"mesh_id != 'D003327' and mesh_id != 'D017202'\")\n", 125 | "disease_df.head()" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "## Query PubMed" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 3, 138 | "metadata": { 139 | "collapsed": false, 140 | "jupyter": { 141 | "outputs_hidden": false 142 | } 143 | }, 144 | "outputs": [ 145 | { 146 | "name": "stdout", 147 | "output_type": "stream", 148 | "text": [ 149 | "10320 articles for Hematologic Neoplasms\n", 150 | "122727 articles for Brain Neoplasms\n", 151 | "180844 articles for Lung
Neoplasms\n", 152 | "60494 articles for Kidney Neoplasms\n", 153 | "57863 articles for Pancreatic Neoplasms\n", 154 | "100038 articles for Skin Neoplasms\n", 155 | "104535 articles for Bone Neoplasms\n", 156 | "27302 articles for Pharyngeal Neoplasms\n", 157 | "65991 articles for Ovarian Neoplasms\n", 158 | "226835 articles for Breast Neoplasms\n", 159 | "63189 articles for Glioma\n", 160 | "107447 articles for Uterine Neoplasms\n", 161 | "24447 articles for Adrenal Gland Neoplasms\n", 162 | "40010 articles for Esophageal Neoplasms\n", 163 | "14552 articles for Salivary Gland Neoplasms\n", 164 | "97203 articles for Prostatic Neoplasms\n", 165 | "77286 articles for Stomach Neoplasms\n", 166 | "45208 articles for Urinary Bladder Neoplasms\n", 167 | "18495 articles for Peripheral Nervous System Neoplasms\n", 168 | "40519 articles for Thyroid Neoplasms\n", 169 | "130963 articles for Liver Neoplasms\n", 170 | "60840 articles for Uterine Cervical Neoplasms\n", 171 | "4780 articles for Vaginal Neoplasms\n", 172 | "249626 articles for Head and Neck Neoplasms\n", 173 | "38987 articles for Rectal Neoplasms\n", 174 | "34076 articles for Eye Neoplasms\n", 175 | "68917 articles for Colonic Neoplasms\n", 176 | "24448 articles for Laryngeal Neoplasms\n", 177 | "283101 articles for Neoplasms, Germ Cell and Embryonal\n", 178 | "9735 articles for Thymus Neoplasms\n", 179 | "11737 articles for Myosarcoma\n", 180 | "5565 articles for Duodenal Neoplasms\n", 181 | "2617 articles for Ileal Neoplasms\n", 182 | "117808 articles for Sarcoma\n", 183 | "2355 articles for Appendiceal Neoplasms\n", 184 | "4612 articles for Penile Neoplasms\n", 185 | "4139 articles for Ureteral Neoplasms\n", 186 | "3249 articles for Tracheal Neoplasms\n", 187 | "7161 articles for Vulvar Neoplasms\n", 188 | "1940 articles for Jejunal Neoplasms\n", 189 | "12425 articles for Peritoneal Neoplasms\n", 190 | "2738 articles for Vascular Neoplasms\n", 191 | "11841 articles for Mesothelioma\n", 192 | "76390 articles for 
Melanoma\n", 193 | "2380 articles for Fallopian Tube Neoplasms\n", 194 | "22613 articles for Testicular Neoplasms\n", 195 | "7358 articles for Gallbladder Neoplasms\n", 196 | "20327 articles for Meningeal Neoplasms\n", 197 | "14052 articles for Bile Duct Neoplasms\n", 198 | "12274 articles for Mediastinal Neoplasms\n", 199 | "9418 articles for Spinal Cord Neoplasms\n", 200 | "7944 articles for Retroperitoneal Neoplasms\n", 201 | "31533 articles for Crohn Disease\n", 202 | "46287 articles for Multiple Sclerosis\n", 203 | "91140 articles for Diabetes Mellitus, Type 2\n", 204 | "28289 articles for Colitis, Ulcerative\n", 205 | "62862 articles for Diabetes Mellitus, Type 1\n", 206 | "95295 articles for Arthritis, Rheumatoid\n", 207 | "40786 articles for Coronary Artery Disease\n", 208 | "148894 articles for Obesity\n", 209 | "16725 articles for Celiac Disease\n", 210 | "49965 articles for Lupus Erythematosus, Systemic\n", 211 | "26855 articles for Refractive Errors\n", 212 | "7065 articles for Liver Cirrhosis, Biliary\n", 213 | "4078 articles for Vitiligo\n", 214 | "16971 articles for Macular Degeneration\n", 215 | "21070 articles for Metabolic Syndrome X\n", 216 | "108236 articles for Asthma\n", 217 | "30896 articles for Psoriasis\n", 218 | "87056 articles for Schizophrenia\n", 219 | "22222 articles for Migraine Disorders\n", 220 | "69752 articles for Alzheimer Disease\n", 221 | "14577 articles for Graves Disease\n", 222 | "49349 articles for Parkinson Disease\n", 223 | "14898 articles for Dermatitis, Atopic\n", 224 | "32534 articles for Bipolar Disorder\n", 225 | "12161 articles for Spondylitis, Ankylosing\n", 226 | "10757 articles for Polycystic Ovary Syndrome\n", 227 | "214731 articles for Hypertension\n", 228 | "17041 articles for Scleroderma, Systemic\n", 229 | "7686 articles for Behcet Syndrome\n", 230 | "4802 articles for Osteitis Deformans\n", 231 | "20395 articles for Leprosy\n", 232 | "22378 articles for Intracranial Aneurysm\n", 233 | "43355 articles for 
Glaucoma\n", 234 | "13589 articles for Amyotrophic Lateral Sclerosis\n", 235 | "2744 articles for Restless Legs Syndrome\n", 236 | "4773 articles for Mucocutaneous Lymph Node Syndrome\n", 237 | "24584 articles for Atherosclerosis\n", 238 | "2509 articles for Alopecia Areata\n", 239 | "45971 articles for Osteoporosis\n", 240 | "28909 articles for Hypothyroidism\n", 241 | "4960 articles for Glomerulonephritis, IGA\n", 242 | "67451 articles for Alcoholism\n", 243 | "5771 articles for Creutzfeldt-Jakob Syndrome\n", 244 | "1206 articles for Azoospermia\n", 245 | "132583 articles for Epilepsy\n", 246 | "47571 articles for Hepatitis B\n", 247 | "38605 articles for Pulmonary Disease, Chronic Obstructive\n", 248 | "14411 articles for Aortic Aneurysm, Abdominal\n", 249 | "79638 articles for Kidney Failure, Chronic\n", 250 | "45631 articles for Osteoarthritis\n", 251 | "3974 articles for Arthritis, Psoriatic\n", 252 | "8353 articles for Tobacco Use Disorder\n", 253 | "2448 articles for Glomerulonephritis, Membranous\n", 254 | "7669 articles for Diabetes, Gestational\n", 255 | "52704 articles for Malaria\n", 256 | "16500 articles for Autistic Disorder\n", 257 | "13355 articles for Cardiomyopathy, Dilated\n", 258 | "920 articles for Arthritis, Gouty\n", 259 | "17621 articles for Leiomyoma\n", 260 | "3033 articles for Cholangitis, Sclerosing\n", 261 | "3065 articles for Narcolepsy\n", 262 | "1884 articles for Intervertebral Disc Degeneration\n", 263 | "12123 articles for Cleft Lip\n", 264 | "1442 articles for Idiopathic Pulmonary Fibrosis\n", 265 | "21145 articles for Attention Deficit Disorder with Hyperactivity\n", 266 | "3636 articles for Tourette Syndrome\n", 267 | "8889 articles for Aortic Aneurysm, Thoracic\n", 268 | "83521 articles for Depressive Disorder\n", 269 | "17875 articles for Rhinitis, Allergic\n", 270 | "44312 articles for Pancreatitis\n", 271 | "16146 articles for Nephrolithiasis\n", 272 | "24223 articles for Periodontitis\n", 273 | "6418 articles for Barrett 
Esophagus\n", 274 | "782 articles for Fuchs' Endothelial Dystrophy\n", 275 | "4768 articles for Otosclerosis\n", 276 | "2277 articles for Conduct Disorder\n", 277 | "4440 articles for Glomerulosclerosis, Focal Segmental\n", 278 | "37451 articles for Dental Caries\n", 279 | "138233 articles for Anemia\n", 280 | "6096 articles for Panic Disorder\n", 281 | "72916 articles for Acquired Immunodeficiency Syndrome\n" 282 | ] 283 | } 284 | ], 285 | "source": [ 286 | "rows_out = list()\n", 287 | "\n", 288 | "for i, row in disease_df.iterrows():\n", 289 | " term_query = '{disease}[MeSH Terms]'.format(disease = row.mesh_name.lower())\n", 290 | " payload = {'db': 'pubmed', 'term': term_query}\n", 291 | " pmids = eutility.esearch_query(payload, retmax = 10000)\n", 292 | " row['term_query'] = term_query\n", 293 | " row['n_articles'] = len(pmids)\n", 294 | " row['pubmed_ids'] = '|'.join(pmids)\n", 295 | " rows_out.append(row)\n", 296 | " print('{} articles for {}'.format(len(pmids), row.mesh_name))\n", 297 | "\n", 298 | "disease_pmids_df = pandas.DataFrame(rows_out)" 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": 4, 304 | "metadata": { 305 | "collapsed": true, 306 | "jupyter": { 307 | "outputs_hidden": true 308 | } 309 | }, 310 | "outputs": [], 311 | "source": [ 312 | "with gzip.open('data/disease-pmids-topic.tsv.gz', 'wt') as write_file:\n", 313 | " disease_pmids_df.to_csv(write_file, sep='\\t', index=False)" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": {}, 319 | "source": [ 320 | "## Analyze data" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": 5, 326 | "metadata": { 327 | "collapsed": true, 328 | "jupyter": { 329 | "outputs_hidden": true 330 | } 331 | }, 332 | "outputs": [], 333 | "source": [ 334 | "disease_df, disease_to_pmids = cooccurrence.read_pmids_tsv('data/disease-pmids-topic.tsv.gz', key='doid_code')" 335 | ] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": 6, 
340 | "metadata": { 341 | "collapsed": false, 342 | "jupyter": { 343 | "outputs_hidden": false 344 | } 345 | }, 346 | "outputs": [ 347 | { 348 | "name": "stdout", 349 | "output_type": "stream", 350 | "text": [ 351 | "Total articles containing a doid_code_0: 4161769\n", 352 | "Total articles containing a doid_code_1: 4161769\n", 353 | "Total articles containing both a doid_code_0 and doid_code_1: 4161769\n", 354 | "\n", 355 | "After removing terms without any cooccurences:\n", 356 | "+ 133 doid_code_0s remain\n", 357 | "+ 133 doid_code_1s remain\n", 358 | "\n", 359 | "Cooccurrence scores calculated for 17689 doid_code_0 -- doid_code_1 pairs\n" 360 | ] 361 | } 362 | ], 363 | "source": [ 364 | "cooc_df = cooccurrence.score_pmid_cooccurrence(disease_to_pmids, disease_to_pmids, 'doid_code_0', 'doid_code_1')" 365 | ] 366 | }, 367 | { 368 | "cell_type": "code", 369 | "execution_count": 7, 370 | "metadata": { 371 | "collapsed": false, 372 | "jupyter": { 373 | "outputs_hidden": false 374 | } 375 | }, 376 | "outputs": [ 377 | { 378 | "data": { 379 | "text/html": [ 380 | "
\n", 381 | "\n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | "
| | doid_code | doid_name | mesh_id | mesh_name | term_query | n_articles |
|---|---|---|---|---|---|---|
| 0 | DOID:2531 | hematologic cancer | D019337 | Hematologic Neoplasms | hematologic neoplasms[MeSH Terms] | 10320 |
| 1 | DOID:1319 | brain cancer | D001932 | Brain Neoplasms | brain neoplasms[MeSH Terms] | 122727 |
| 2 | DOID:1324 | lung cancer | D008175 | Lung Neoplasms | lung neoplasms[MeSH Terms] | 180844 |
| 3 | DOID:263 | kidney cancer | D007680 | Kidney Neoplasms | kidney neoplasms[MeSH Terms] | 60494 |
| 4 | DOID:1793 | pancreatic cancer | D010190 | Pancreatic Neoplasms | pancreatic neoplasms[MeSH Terms] | 57863 |
\n", 441 | "
" 442 | ], 443 | "text/plain": [ 444 | " doid_code doid_name mesh_id mesh_name \\\n", 445 | "0 DOID:2531 hematologic cancer D019337 Hematologic Neoplasms \n", 446 | "1 DOID:1319 brain cancer D001932 Brain Neoplasms \n", 447 | "2 DOID:1324 lung cancer D008175 Lung Neoplasms \n", 448 | "3 DOID:263 kidney cancer D007680 Kidney Neoplasms \n", 449 | "4 DOID:1793 pancreatic cancer D010190 Pancreatic Neoplasms \n", 450 | "\n", 451 | " term_query n_articles \n", 452 | "0 hematologic neoplasms[MeSH Terms] 10320 \n", 453 | "1 brain neoplasms[MeSH Terms] 122727 \n", 454 | "2 lung neoplasms[MeSH Terms] 180844 \n", 455 | "3 kidney neoplasms[MeSH Terms] 60494 \n", 456 | "4 pancreatic neoplasms[MeSH Terms] 57863 " 457 | ] 458 | }, 459 | "execution_count": 7, 460 | "metadata": {}, 461 | "output_type": "execute_result" 462 | } 463 | ], 464 | "source": [ 465 | "disease_df.head()" 466 | ] 467 | }, 468 | { 469 | "cell_type": "code", 470 | "execution_count": 8, 471 | "metadata": { 472 | "collapsed": false, 473 | "jupyter": { 474 | "outputs_hidden": false 475 | } 476 | }, 477 | "outputs": [ 478 | { 479 | "data": { 480 | "text/html": [ 481 | "
\n", 482 | "\n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 511 | " \n", 512 | " \n", 513 | " \n", 514 | " \n", 515 | " \n", 516 | " \n", 517 | " \n", 518 | " \n", 519 | " \n", 520 | " \n", 521 | " \n", 522 | " \n", 523 | " \n", 524 | " \n", 525 | " \n", 526 | " \n", 527 | " \n", 528 | " \n", 529 | " \n", 530 | " \n", 531 | " \n", 532 | " \n", 533 | " \n", 534 | " \n", 535 | " \n", 536 | " \n", 537 | " \n", 538 | " \n", 539 | " \n", 540 | " \n", 541 | " \n", 542 | " \n", 543 | " \n", 544 | " \n", 545 | " \n", 546 | " \n", 547 | "
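| | doid_code_0 | doid_code_1 | cooccurrence | expected | enrichment | odds_ratio | p_fisher |
|---|---|---|---|---|---|---|---|
| 0 | DOID:11615 | DOID:11615 | 4612 | 5.110938 | 902.378361 | inf | 0.000000 |
| 1 | DOID:11615 | DOID:8577 | 1 | 31.349378 | 0.031899 | 0.031654 | 1.000000 |
| 2 | DOID:11615 | DOID:5612 | 2 | 10.436864 | 0.191628 | 0.191106 | 0.999669 |
| 3 | DOID:11615 | DOID:14330 | 0 | 54.687703 | 0.000000 | 0.000000 | 1.000000 |
| 4 | DOID:11615 | DOID:0050425 | 0 | 3.040853 | 0.000000 | 0.000000 | 1.000000 |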
\n", 548 | "
" 549 | ], 550 | "text/plain": [ 551 | " doid_code_0 doid_code_1 cooccurrence expected enrichment odds_ratio \\\n", 552 | "0 DOID:11615 DOID:11615 4612 5.110938 902.378361 inf \n", 553 | "1 DOID:11615 DOID:8577 1 31.349378 0.031899 0.031654 \n", 554 | "2 DOID:11615 DOID:5612 2 10.436864 0.191628 0.191106 \n", 555 | "3 DOID:11615 DOID:14330 0 54.687703 0.000000 0.000000 \n", 556 | "4 DOID:11615 DOID:0050425 0 3.040853 0.000000 0.000000 \n", 557 | "\n", 558 | " p_fisher \n", 559 | "0 0.000000 \n", 560 | "1 1.000000 \n", 561 | "2 0.999669 \n", 562 | "3 1.000000 \n", 563 | "4 1.000000 " 564 | ] 565 | }, 566 | "execution_count": 8, 567 | "metadata": {}, 568 | "output_type": "execute_result" 569 | } 570 | ], 571 | "source": [ 572 | "cooc_df.head()" 573 | ] 574 | }, 575 | { 576 | "cell_type": "code", 577 | "execution_count": 9, 578 | "metadata": { 579 | "collapsed": false, 580 | "jupyter": { 581 | "outputs_hidden": false 582 | } 583 | }, 584 | "outputs": [], 585 | "source": [ 586 | "cooc_df = cooc_df[cooc_df['doid_code_0'] != cooc_df['doid_code_1']]\n", 587 | "doid_name_df = disease_df[['doid_code', 'doid_name']].drop_duplicates()\n", 588 | "cooc_df = doid_name_df.rename(columns={'doid_code': 'doid_code_1', 'doid_name': 'doid_name_1'}).merge(cooc_df)\n", 589 | "cooc_df = doid_name_df.rename(columns={'doid_code': 'doid_code_0', 'doid_name': 'doid_name_0'}).merge(cooc_df)\n", 590 | "cooc_df = cooc_df.sort_values(by=['doid_name_0', 'p_fisher'])" 591 | ] 592 | }, 593 | { 594 | "cell_type": "code", 595 | "execution_count": 10, 596 | "metadata": { 597 | "collapsed": false, 598 | "jupyter": { 599 | "outputs_hidden": false 600 | } 601 | }, 602 | "outputs": [ 603 | { 604 | "data": { 605 | "text/html": [ 606 | "
\n", 607 | "\n", 608 | " \n", 609 | " \n", 610 | " \n", 611 | " \n", 612 | " \n", 613 | " \n", 614 | " \n", 615 | " \n", 616 | " \n", 617 | " \n", 618 | " \n", 619 | " \n", 620 | " \n", 621 | " \n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | " \n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | " \n", 660 | " \n", 661 | " \n", 662 | " \n", 663 | " \n", 664 | " \n", 665 | " \n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 679 | " \n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | "
| | doid_code_0 | doid_name_0 | doid_code_1 | doid_name_1 | cooccurrence | expected | enrichment | odds_ratio | p_fisher |
|---|---|---|---|---|---|---|---|---|---|
| 9444 | DOID:10652 | Alzheimer's disease | DOID:14330 | Parkinson's disease | 2760 | 827.098152 | 3.336968 | 3.577398 | 0.000000e+00 |
| 9465 | DOID:10652 | Alzheimer's disease | DOID:11949 | Creutzfeldt-Jakob disease | 332 | 96.723002 | 3.432482 | 3.593306 | 3.377672e-80 |
| 9456 | DOID:10652 | Alzheimer's disease | DOID:332 | amyotrophic lateral sclerosis | 451 | 227.754094 | 1.980206 | 2.020452 | 5.524978e-40 |
| 9496 | DOID:10652 | Alzheimer's disease | DOID:11555 | Fuchs' endothelial dystrophy | 1 | 13.106461 | 0.076298 | 0.075102 | 9.999982e-01 |
| 9490 | DOID:10652 | Alzheimer's disease | DOID:1595 | endogenous depression | 1221 | 1399.827043 | 0.872251 | 0.868045 | 9.999997e-01 |
\n", 685 | "
" 686 | ], 687 | "text/plain": [ 688 | " doid_code_0 doid_name_0 doid_code_1 \\\n", 689 | "9444 DOID:10652 Alzheimer's disease DOID:14330 \n", 690 | "9465 DOID:10652 Alzheimer's disease DOID:11949 \n", 691 | "9456 DOID:10652 Alzheimer's disease DOID:332 \n", 692 | "9496 DOID:10652 Alzheimer's disease DOID:11555 \n", 693 | "9490 DOID:10652 Alzheimer's disease DOID:1595 \n", 694 | "\n", 695 | " doid_name_1 cooccurrence expected enrichment \\\n", 696 | "9444 Parkinson's disease 2760 827.098152 3.336968 \n", 697 | "9465 Creutzfeldt-Jakob disease 332 96.723002 3.432482 \n", 698 | "9456 amyotrophic lateral sclerosis 451 227.754094 1.980206 \n", 699 | "9496 Fuchs' endothelial dystrophy 1 13.106461 0.076298 \n", 700 | "9490 endogenous depression 1221 1399.827043 0.872251 \n", 701 | "\n", 702 | " odds_ratio p_fisher \n", 703 | "9444 3.577398 0.000000e+00 \n", 704 | "9465 3.593306 3.377672e-80 \n", 705 | "9456 2.020452 5.524978e-40 \n", 706 | "9496 0.075102 9.999982e-01 \n", 707 | "9490 0.868045 9.999997e-01 " 708 | ] 709 | }, 710 | "execution_count": 10, 711 | "metadata": {}, 712 | "output_type": "execute_result" 713 | } 714 | ], 715 | "source": [ 716 | "cooc_df.head()" 717 | ] 718 | }, 719 | { 720 | "cell_type": "code", 721 | "execution_count": 11, 722 | "metadata": { 723 | "collapsed": false, 724 | "jupyter": { 725 | "outputs_hidden": false 726 | } 727 | }, 728 | "outputs": [ 729 | { 730 | "data": { 731 | "text/plain": [ 732 | "17556" 733 | ] 734 | }, 735 | "execution_count": 11, 736 | "metadata": {}, 737 | "output_type": "execute_result" 738 | } 739 | ], 740 | "source": [ 741 | "len(cooc_df)" 742 | ] 743 | }, 744 | { 745 | "cell_type": "code", 746 | "execution_count": 12, 747 | "metadata": { 748 | "collapsed": false, 749 | "jupyter": { 750 | "outputs_hidden": false 751 | } 752 | }, 753 | "outputs": [ 754 | { 755 | "data": { 756 | "text/plain": [ 757 | "1086" 758 | ] 759 | }, 760 | "execution_count": 12, 761 | "metadata": {}, 762 | "output_type": "execute_result" 763 | } 
764 | ], 765 | "source": [ 766 | "len(cooc_df[cooc_df.p_fisher <= 0.005])" 767 | ] 768 | }, 769 | { 770 | "cell_type": "code", 771 | "execution_count": 13, 772 | "metadata": { 773 | "collapsed": true, 774 | "jupyter": { 775 | "outputs_hidden": true 776 | } 777 | }, 778 | "outputs": [], 779 | "source": [ 780 | "cooc_df.to_csv('data/disease-disease-cooccurrence.tsv', index=False, sep='\\t')" 781 | ] 782 | } 783 | ], 784 | "metadata": { 785 | "kernelspec": { 786 | "display_name": "Python 3", 787 | "language": "python", 788 | "name": "python3" 789 | }, 790 | "language_info": { 791 | "codemirror_mode": { 792 | "name": "ipython", 793 | "version": 3 794 | }, 795 | "file_extension": ".py", 796 | "mimetype": "text/x-python", 797 | "name": "python", 798 | "nbconvert_exporter": "python", 799 | "pygments_lexer": "ipython3", 800 | "version": "3.9.2" 801 | } 802 | }, 803 | "nbformat": 4, 804 | "nbformat_minor": 4 805 | } 806 | -------------------------------------------------------------------------------- /download-topics.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "59a31d15-b048-4518-a56a-394d456d57a6", 6 | "metadata": {}, 7 | "source": [ 8 | "# Download MEDLINE topics for all MeSH Topical Descriptors and SCR Diseases" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0fd925e3-aba3-434b-92b1-b268f6a7799b", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import datetime\n", 19 | "import gzip\n", 20 | "import pathlib\n", 21 | "\n", 22 | "import tenacity\n", 23 | "import jsonlines\n", 24 | "import tqdm\n", 25 | "import pandas as pd\n", 26 | "from pubmedpy.eutilities import esearch_query\n", 27 | "from nxontology import NXOntology" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 2, 33 | "id": "ea711065-ce1f-4aac-b6c6-abd0a504d80c", 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | 
"@tenacity.retry(wait=tenacity.wait_exponential(min=2, max=2**10))\n", 38 | "def query_topic(mesh_term: str, scr: bool = False) -> dict:\n", 39 | " \"\"\"\n", 40 | " mesh_term is the name/label of a MeSH Term.\n", 41 | " scr: whether the MeSH term is a supplementary concept.\n", 42 | " See https://github.com/hetio/medline/issues/4.\n", 43 | " \"\"\"\n", 44 | " result = {}\n", 45 | " # https://pubmed.ncbi.nlm.nih.gov/help/#pubmed-format\n", 46 | " term_query = f'\"{mesh_term}\" [{\"Supplementary Concept\" if scr else \"MeSH Terms\"}:noexp]'\n", 47 | " result[\"pubmed_search\"] = term_query\n", 48 | " payload = {'db': 'pubmed', 'term': term_query}\n", 49 | " result[\"timestamp\"] = datetime.datetime.utcnow().isoformat(timespec=\"seconds\")\n", 50 | " result[\"pubmed_ids\"] = sorted(esearch_query(payload, retmax = 5000, tqdm=None))\n", 51 | " return result" 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": 3, 57 | "id": "df08612e-df1c-411b-9e5f-42cba0192574", 58 | "metadata": { 59 | "tags": [] 60 | }, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "{'pubmed_search': '\"Tabatznik syndrome\" [Supplementary Concept:noexp]',\n", 66 | " 'timestamp': '2021-04-12T20:05:16',\n", 67 | " 'pubmed_ids': []}" 68 | ] 69 | }, 70 | "execution_count": 3, 71 | "metadata": {}, 72 | "output_type": "execute_result" 73 | } 74 | ], 75 | "source": [ 76 | "# example query\n", 77 | "query_topic(\"Tabatznik syndrome\", scr=True)" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "id": "f50387db-d61f-417d-9316-664f2c42c510", 83 | "metadata": {}, 84 | "source": [ 85 | "## Load MeSH Ontology" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 4, 91 | "id": "967432ee-1e49-4236-a076-ff922bf3a071", 92 | "metadata": {}, 93 | "outputs": [ 94 | { 95 | "data": { 96 | "text/plain": [ 97 | "300093" 98 | ] 99 | }, 100 | "execution_count": 4, 101 | "metadata": {}, 102 | "output_type": "execute_result" 103 | } 104 | ], 105 | "source": [ 106 
| "# read the MeSH ontology\n", 107 | "nxo = NXOntology.read_node_link_json(\"data/mesh-nxo-node-link.json.gz\")\n", 108 | "nxo.n_nodes" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": 5, 114 | "id": "ad017896-60a7-4dfc-ae78-8e89c339c68d", 115 | "metadata": {}, 116 | "outputs": [ 117 | { 118 | "data": { 119 | "text/html": [ 120 | "
\n", 121 | "\n", 134 | "\n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | "
| | mesh_id | mesh_class | mesh_uri | mesh_label | tree_numbers |
|---|---|---|---|---|---|
| 0 | D005260 | CheckTag | http://id.nlm.nih.gov/mesh/2020/D005260 | Female | NaN |
| 1 | D008297 | CheckTag | http://id.nlm.nih.gov/mesh/2020/D008297 | Male | NaN |
\n", 164 | "
" 165 | ], 166 | "text/plain": [ 167 | " mesh_id mesh_class mesh_uri mesh_label \\\n", 168 | "0 D005260 CheckTag http://id.nlm.nih.gov/mesh/2020/D005260 Female \n", 169 | "1 D008297 CheckTag http://id.nlm.nih.gov/mesh/2020/D008297 Male \n", 170 | "\n", 171 | " tree_numbers \n", 172 | "0 NaN \n", 173 | "1 NaN " 174 | ] 175 | }, 176 | "execution_count": 5, 177 | "metadata": {}, 178 | "output_type": "execute_result" 179 | } 180 | ], 181 | "source": [ 182 | "nodes_data = [data for node, data in nxo.graph.nodes(data=True)]\n", 183 | "nodes_data.sort(key=lambda x: (x[\"mesh_class\"], x[\"mesh_id\"]))\n", 184 | "term_df = pd.DataFrame(nodes_data)\n", 185 | "term_df.head(2)" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": 6, 191 | "id": "755c6040-83e4-4752-aebb-ff0de812eef9", 192 | "metadata": {}, 193 | "outputs": [ 194 | { 195 | "data": { 196 | "text/plain": [ 197 | "SCR_Chemical 243740\n", 198 | "TopicalDescriptor 29054\n", 199 | "SCR_Organism 19019\n", 200 | "SCR_Disease 6479\n", 201 | "SCR_Protocol 1215\n", 202 | "GeographicalDescriptor 397\n", 203 | "PublicationType 187\n", 204 | "CheckTag 2\n", 205 | "Name: mesh_class, dtype: int64" 206 | ] 207 | }, 208 | "execution_count": 6, 209 | "metadata": {}, 210 | "output_type": "execute_result" 211 | } 212 | ], 213 | "source": [ 214 | "term_df.mesh_class.value_counts()" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 7, 220 | "id": "d2026b7b-32e8-437a-8a13-0902cc7f13f9", 221 | "metadata": {}, 222 | "outputs": [ 223 | { 224 | "data": { 225 | "text/plain": [ 226 | "35533" 227 | ] 228 | }, 229 | "execution_count": 7, 230 | "metadata": {}, 231 | "output_type": "execute_result" 232 | } 233 | ], 234 | "source": [ 235 | "# filter to classes of interest\n", 236 | "keep_classes = {\"TopicalDescriptor\", \"SCR_Disease\"}\n", 237 | "nodes_data = [info for info in nodes_data if info[\"mesh_class\"] in keep_classes]\n", 238 | "mesh_ids = [x[\"mesh_id\"] for x in nodes_data]\n", 239 
| "len(nodes_data)" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": 8, 245 | "id": "bf487760-6978-4509-bdea-9bbd719d6d08", 246 | "metadata": {}, 247 | "outputs": [ 248 | { 249 | "data": { 250 | "text/plain": [ 251 | "{'mesh_id': 'C000591739',\n", 252 | " 'mesh_class': 'SCR_Disease',\n", 253 | " 'mesh_uri': 'http://id.nlm.nih.gov/mesh/2020/C000591739',\n", 254 | " 'mesh_label': 'Familial gynecomastia, due to increased aromatase activity'}" 255 | ] 256 | }, 257 | "execution_count": 8, 258 | "metadata": {}, 259 | "output_type": "execute_result" 260 | } 261 | ], 262 | "source": [ 263 | "nodes_data[0]" 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "id": "c5c5f986-b705-4ffa-a362-7690458f3659", 269 | "metadata": {}, 270 | "source": [ 271 | "## Perform queries" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": 9, 277 | "id": "d5eb4727-d219-4635-88f7-659be5e66746", 278 | "metadata": {}, 279 | "outputs": [ 280 | { 281 | "name": "stdout", 282 | "output_type": "stream", 283 | "text": [ 284 | "35,533 total mesh_ids: 0 already queried, 35,533 new\n" 285 | ] 286 | } 287 | ], 288 | "source": [ 289 | "# read already queried MeSH terms\n", 290 | "path = pathlib.Path('data/mesh-term-topics-noexp.jsonl.gz')\n", 291 | "lines = jsonlines.Reader(gzip.open(path, \"rt\")) if path.exists() else []\n", 292 | "existing = {row['mesh_id'] for row in lines}\n", 293 | "new = sorted(set(mesh_ids) - existing)\n", 294 | "print(f\"{len(mesh_ids):,} total mesh_ids: {len(existing):,} already queried, {len(new):,} new\")" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": 10, 300 | "id": "f7cf722e-5239-49ce-9c1d-8090c497ac11", 301 | "metadata": {}, 302 | "outputs": [ 303 | { 304 | "name": "stderr", 305 | "output_type": "stream", 306 | "text": [ 307 | "100%|██████████| 35533/35533 [15:30:42<00:00, 1.57s/it] \n" 308 | ] 309 | } 310 | ], 311 | "source": [ 312 | "# query new MeSH terms and append to JSON
Lines file\n", 313 | "write_file = gzip.GzipFile(filename=path, mode=\"ab\", mtime=0)\n", 314 | "with write_file:\n", 315 | " with jsonlines.Writer(write_file) as writer:\n", 316 | " for mesh_id in tqdm.tqdm(new):\n", 317 | " result = nxo.graph.nodes[mesh_id].copy()\n", 318 | " result.update(query_topic(result[\"mesh_label\"], result[\"mesh_class\"] != \"TopicalDescriptor\"))\n", 319 | " writer.write(result)" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 11, 325 | "id": "14ce8898-f1be-4e17-b78b-128bfe08676a", 326 | "metadata": {}, 327 | "outputs": [ 328 | { 329 | "data": { 330 | "text/plain": [ 331 | "35533" 332 | ] 333 | }, 334 | "execution_count": 11, 335 | "metadata": {}, 336 | "output_type": "execute_result" 337 | } 338 | ], 339 | "source": [ 340 | "# Read the jsonlines file\n", 341 | "with jsonlines.Reader(gzip.open(path, \"rt\")) as reader:\n", 342 | " lines = list(reader)\n", 343 | "len(lines)" 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": 12, 349 | "id": "02a69105-a4b0-48cb-a834-465e93e0156c", 350 | "metadata": {}, 351 | "outputs": [ 352 | { 353 | "data": { 354 | "text/plain": [ 355 | "['mesh_id',\n", 356 | " 'mesh_class',\n", 357 | " 'mesh_uri',\n", 358 | " 'mesh_label',\n", 359 | " 'pubmed_search',\n", 360 | " 'timestamp',\n", 361 | " 'pubmed_ids']" 362 | ] 363 | }, 364 | "execution_count": 12, 365 | "metadata": {}, 366 | "output_type": "execute_result" 367 | } 368 | ], 369 | "source": [ 370 | "# Show keys for a single line\n", 371 | "list(lines[0])" 372 | ] 373 | } 374 | ], 375 | "metadata": { 376 | "kernelspec": { 377 | "display_name": "Python 3", 378 | "language": "python", 379 | "name": "python3" 380 | }, 381 | "language_info": { 382 | "codemirror_mode": { 383 | "name": "ipython", 384 | "version": 3 385 | }, 386 | "file_extension": ".py", 387 | "mimetype": "text/x-python", 388 | "name": "python", 389 | "nbconvert_exporter": "python", 390 | "pygments_lexer": "ipython3", 391 | "version": 
"3.9.2" 392 | } 393 | }, 394 | "nbformat": 4, 395 | "nbformat_minor": 5 396 | } 397 | -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: medline 2 | 3 | channels: 4 | - conda-forge 5 | 6 | dependencies: 7 | - ipywidgets=7.6.3 8 | - jsonlines=2.0.0 9 | - jupyterlab=3.0.13 10 | - lxml=4.6.3 11 | - networkx=2.5.1 12 | - numpy=1.20.2 13 | - pandas=1.2.3 14 | - python=3.9.2 15 | - requests=2.25.1 16 | - scipy=1.6.2 17 | - tenacity=7.0.0 18 | - tqdm=4.60.0 19 | - pip 20 | - pip: 21 | - git+https://github.com/dhimmel/pubmedpy.git@9d716768f5ab798ec448154588e4fd99afd7584a 22 | - nxontology==0.1.4 23 | 24 | -------------------------------------------------------------------------------- /eutility.py: -------------------------------------------------------------------------------- 1 | import time 2 | 3 | import xml.etree.ElementTree as ET 4 | 5 | import requests 6 | 7 | def esearch_query(payload, retmax = 100, sleep=2): 8 | """ 9 | Query the esearch E-utility. 10 | NOTE: use `pubmedpy.eutilities.esearch_query` instead. 11 | This function might be deleted in the future. 
12 | """ 13 | url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi' 14 | payload['retmax'] = retmax 15 | payload['retstart'] = 0 16 | ids = list() 17 | count = 1 18 | while payload['retstart'] < count: 19 | response = requests.get(url, params=payload) 20 | xml = ET.fromstring(response.content) 21 | count = int(xml.findtext('Count')) 22 | ids += [xml_id.text for xml_id in xml.findall('IdList/Id')] 23 | payload['retstart'] += retmax 24 | time.sleep(sleep) 25 | return ids 26 | -------------------------------------------------------------------------------- /symptoms.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Compute symptom-disease cooccurrence for Hetionet" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": { 14 | "collapsed": false, 15 | "jupyter": { 16 | "outputs_hidden": false 17 | } 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "import io\n", 22 | "import gzip\n", 23 | "\n", 24 | "import pandas\n", 25 | "import requests\n", 26 | "import networkx\n", 27 | "\n", 28 | "import eutility\n", 29 | "import cooccurrence" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 2, 35 | "metadata": { 36 | "collapsed": false, 37 | "jupyter": { 38 | "outputs_hidden": false 39 | } 40 | }, 41 | "outputs": [ 42 | { 43 | "data": { 44 | "text/html": [ 45 | "
\n", 46 | "\n", 47 | " \n", 48 | " \n", 49 | " \n", 50 | " \n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | "
\n", 94 | "
" 95 | ], 96 | "text/plain": [ 97 | " doid_code doid_name mesh_id mesh_name\n", 98 | "0 DOID:2531 hematologic cancer D019337 Hematologic Neoplasms\n", 99 | "1 DOID:1319 brain cancer D001932 Brain Neoplasms\n", 100 | "2 DOID:1324 lung cancer D008175 Lung Neoplasms\n", 101 | "3 DOID:263 kidney cancer D007680 Kidney Neoplasms\n", 102 | "4 DOID:1793 pancreatic cancer D010190 Pancreatic Neoplasms" 103 | ] 104 | }, 105 | "execution_count": 2, 106 | "metadata": {}, 107 | "output_type": "execute_result" 108 | } 109 | ], 110 | "source": [ 111 | "# Read mappings for DO Slim terms\n", 112 | "url = 'https://raw.githubusercontent.com/dhimmel/disease-ontology/72614ade9f1cc5a5317b8f6836e1e464b31d5587/data/xrefs-slim.tsv'\n", 113 | "disease_df = pandas.read_table(url)\n", 114 | "disease_df = disease_df.query('resource == \"MSH\"').drop(columns='resource')\n", 115 | "disease_df = disease_df.rename(columns={'resource_id': 'mesh_id'})\n", 116 | "\n", 117 | "# Map MeSH term IDs to MeSH names\n", 118 | "url = 'https://raw.githubusercontent.com/dhimmel/mesh/e561301360e6de2140dedeaa7c7e17ce4714eb7f/data/terms.tsv'\n", 119 | "mesh_df = pandas.read_table(url)\n", 120 | "disease_df = disease_df.merge(mesh_df)\n", 121 | "\n", 122 | "# Manually remove problematic xrefs\n", 123 | "# https://github.com/obophenotype/human-disease-ontology/issues/45\n", 124 | "disease_df = disease_df.query(\"mesh_id != 'D003327' and mesh_id != 'D017202'\")\n", 125 | "disease_df.head()" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "# Diseases" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 3, 138 | "metadata": { 139 | "collapsed": false, 140 | "jupyter": { 141 | "outputs_hidden": false 142 | }, 143 | "scrolled": true 144 | }, 145 | "outputs": [ 146 | { 147 | "name": "stdout", 148 | "output_type": "stream", 149 | "text": [ 150 | "7382 articles for Hematologic Neoplasms\n", 151 | "99586 articles for Brain Neoplasms\n", 152 | "139299 
articles for Lung Neoplasms\n", 153 | "49515 articles for Kidney Neoplasms\n", 154 | "46298 articles for Pancreatic Neoplasms\n", 155 | "85654 articles for Skin Neoplasms\n", 156 | "83658 articles for Bone Neoplasms\n", 157 | "22125 articles for Pharyngeal Neoplasms\n", 158 | "53989 articles for Ovarian Neoplasms\n", 159 | "188908 articles for Breast Neoplasms\n", 160 | "49539 articles for Glioma\n", 161 | "89055 articles for Uterine Neoplasms\n", 162 | "18514 articles for Adrenal Gland Neoplasms\n", 163 | "33421 articles for Esophageal Neoplasms\n", 164 | "12158 articles for Salivary Gland Neoplasms\n", 165 | "83257 articles for Prostatic Neoplasms\n", 166 | "64512 articles for Stomach Neoplasms\n", 167 | "37569 articles for Urinary Bladder Neoplasms\n", 168 | "14765 articles for Peripheral Nervous System Neoplasms\n", 169 | "33286 articles for Thyroid Neoplasms\n", 170 | "97650 articles for Liver Neoplasms\n", 171 | "50220 articles for Uterine Cervical Neoplasms\n", 172 | "3507 articles for Vaginal Neoplasms\n", 173 | "210249 articles for Head and Neck Neoplasms\n", 174 | "32809 articles for Rectal Neoplasms\n", 175 | "28761 articles for Eye Neoplasms\n", 176 | "50799 articles for Colonic Neoplasms\n", 177 | "19451 articles for Laryngeal Neoplasms\n", 178 | "225331 articles for Neoplasms, Germ Cell and Embryonal\n", 179 | "7330 articles for Thymus Neoplasms\n", 180 | "8568 articles for Myosarcoma\n", 181 | "4375 articles for Duodenal Neoplasms\n", 182 | "2161 articles for Ileal Neoplasms\n", 183 | "90305 articles for Sarcoma\n", 184 | "2002 articles for Appendiceal Neoplasms\n", 185 | "3898 articles for Penile Neoplasms\n", 186 | "3458 articles for Ureteral Neoplasms\n", 187 | "2526 articles for Tracheal Neoplasms\n", 188 | "5866 articles for Vulvar Neoplasms\n", 189 | "1649 articles for Jejunal Neoplasms\n", 190 | "9852 articles for Peritoneal Neoplasms\n", 191 | "2469 articles for Vascular Neoplasms\n", 192 | "9785 articles for Mesothelioma\n", 193 | "60090 
articles for Melanoma\n", 194 | "2040 articles for Fallopian Tube Neoplasms\n", 195 | "18463 articles for Testicular Neoplasms\n", 196 | "5919 articles for Gallbladder Neoplasms\n", 197 | "15236 articles for Meningeal Neoplasms\n", 198 | "11129 articles for Bile Duct Neoplasms\n", 199 | "9591 articles for Mediastinal Neoplasms\n", 200 | "7736 articles for Spinal Cord Neoplasms\n", 201 | "6254 articles for Retroperitoneal Neoplasms\n", 202 | "24975 articles for Crohn Disease\n", 203 | "39550 articles for Multiple Sclerosis\n", 204 | "72794 articles for Diabetes Mellitus, Type 2\n", 205 | "22055 articles for Colitis, Ulcerative\n", 206 | "50883 articles for Diabetes Mellitus, Type 1\n", 207 | "76622 articles for Arthritis, Rheumatoid\n", 208 | "33214 articles for Coronary Artery Disease\n", 209 | "105366 articles for Obesity\n", 210 | "13742 articles for Celiac Disease\n", 211 | "39741 articles for Lupus Erythematosus, Systemic\n", 212 | "19765 articles for Refractive Errors\n", 213 | "5304 articles for Liver Cirrhosis, Biliary\n", 214 | "3492 articles for Vitiligo\n", 215 | "13479 articles for Macular Degeneration\n", 216 | "16426 articles for Metabolic Syndrome X\n", 217 | "88631 articles for Asthma\n", 218 | "25313 articles for Psoriasis\n", 219 | "69283 articles for Schizophrenia\n", 220 | "18093 articles for Migraine Disorders\n", 221 | "55360 articles for Alzheimer Disease\n", 222 | "10965 articles for Graves Disease\n", 223 | "40397 articles for Parkinson Disease\n", 224 | "11782 articles for Dermatitis, Atopic\n", 225 | "24524 articles for Bipolar Disorder\n", 226 | "9401 articles for Spondylitis, Ankylosing\n", 227 | "8888 articles for Polycystic Ovary Syndrome\n", 228 | "155557 articles for Hypertension\n", 229 | "14044 articles for Scleroderma, Systemic\n", 230 | "6764 articles for Behcet Syndrome\n", 231 | "3814 articles for Osteitis Deformans\n", 232 | "18561 articles for Leprosy\n", 233 | "18785 articles for Intracranial Aneurysm\n", 234 | "35366 
articles for Glaucoma\n", 235 | "11500 articles for Amyotrophic Lateral Sclerosis\n", 236 | "2296 articles for Restless Legs Syndrome\n", 237 | "4319 articles for Mucocutaneous Lymph Node Syndrome\n", 238 | "18009 articles for Atherosclerosis\n", 239 | "2125 articles for Alopecia Areata\n", 240 | "32547 articles for Osteoporosis\n", 241 | "20300 articles for Hypothyroidism\n", 242 | "4202 articles for Glomerulonephritis, IGA\n", 243 | "49443 articles for Alcoholism\n", 244 | "4464 articles for Creutzfeldt-Jakob Syndrome\n", 245 | "864 articles for Azoospermia\n", 246 | "102949 articles for Epilepsy\n", 247 | "36716 articles for Hepatitis B\n", 248 | "30665 articles for Pulmonary Disease, Chronic Obstructive\n", 249 | "12886 articles for Aortic Aneurysm, Abdominal\n", 250 | "54875 articles for Kidney Failure, Chronic\n", 251 | "33398 articles for Osteoarthritis\n", 252 | "2999 articles for Arthritis, Psoriatic\n", 253 | "6354 articles for Tobacco Use Disorder\n", 254 | "1918 articles for Glomerulonephritis, Membranous\n", 255 | "6056 articles for Diabetes, Gestational\n", 256 | "43489 articles for Malaria\n", 257 | "13959 articles for Autistic Disorder\n", 258 | "10108 articles for Cardiomyopathy, Dilated\n", 259 | "724 articles for Arthritis, Gouty\n", 260 | "14343 articles for Leiomyoma\n", 261 | "2309 articles for Cholangitis, Sclerosing\n", 262 | "2374 articles for Narcolepsy\n", 263 | "1561 articles for Intervertebral Disc Degeneration\n", 264 | "9599 articles for Cleft Lip\n", 265 | "1277 articles for Idiopathic Pulmonary Fibrosis\n", 266 | "16912 articles for Attention Deficit Disorder with Hyperactivity\n", 267 | "3143 articles for Tourette Syndrome\n", 268 | "7893 articles for Aortic Aneurysm, Thoracic\n", 269 | "63783 articles for Depressive Disorder\n", 270 | "13894 articles for Rhinitis, Allergic\n", 271 | "35263 articles for Pancreatitis\n", 272 | "12217 articles for Nephrolithiasis\n", 273 | "16409 articles for Periodontitis\n", 274 | "5256 articles 
for Barrett Esophagus\n", 275 | "550 articles for Fuchs' Endothelial Dystrophy\n", 276 | "3870 articles for Otosclerosis\n", 277 | "1486 articles for Conduct Disorder\n", 278 | "2979 articles for Glomerulosclerosis, Focal Segmental\n", 279 | "25969 articles for Dental Caries\n", 280 | "105132 articles for Anemia\n", 281 | "4634 articles for Panic Disorder\n", 282 | "58396 articles for Acquired Immunodeficiency Syndrome\n" 283 | ] 284 | } 285 | ], 286 | "source": [ 287 | "rows_out = list()\n", 288 | "\n", 289 | "for i, row in disease_df.iterrows():\n", 290 | " term_query = '{disease}[MeSH Major Topic]'.format(disease = row.mesh_name.lower())\n", 291 | " payload = {'db': 'pubmed', 'term': term_query}\n", 292 | " pmids = eutility.esearch_query(payload, retmax = 10000)\n", 293 | " row['term_query'] = term_query\n", 294 | " row['n_articles'] = len(pmids)\n", 295 | " row['pubmed_ids'] = '|'.join(pmids)\n", 296 | " rows_out.append(row)\n", 297 | " print('{} articles for {}'.format(len(pmids), row.mesh_name))\n", 298 | "\n", 299 | "disease_pmids_df = pandas.DataFrame(rows_out)" 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": 4, 305 | "metadata": { 306 | "collapsed": false, 307 | "jupyter": { 308 | "outputs_hidden": false 309 | } 310 | }, 311 | "outputs": [], 312 | "source": [ 313 | "with gzip.open('data/disease-pmids.tsv.gz', 'w') as write_file:\n", 314 | " write_file = io.TextIOWrapper(write_file)\n", 315 | " disease_pmids_df.to_csv(write_file, sep='\\t', index=False)" 316 | ] 317 | }, 318 | { 319 | "cell_type": "markdown", 320 | "metadata": {}, 321 | "source": [ 322 | "# Symptoms" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": 5, 328 | "metadata": { 329 | "collapsed": false, 330 | "jupyter": { 331 | "outputs_hidden": false 332 | } 333 | }, 334 | "outputs": [ 335 | { 336 | "data": { 337 | "text/html": [ 338 | "
\n", 339 | "\n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | "
\n", 381 | "
" 382 | ], 383 | "text/plain": [ 384 | " mesh_id mesh_name in_hsdn\n", 385 | "0 D000006 Abdomen, Acute 1\n", 386 | "1 D000270 Adie Syndrome 0\n", 387 | "2 D000326 Adrenoleukodystrophy 0\n", 388 | "3 D000334 Aerophagy 1\n", 389 | "4 D000370 Ageusia 1" 390 | ] 391 | }, 392 | "execution_count": 5, 393 | "metadata": {}, 394 | "output_type": "execute_result" 395 | } 396 | ], 397 | "source": [ 398 | "# Read MeSH Symptoms\n", 399 | "url = 'https://raw.githubusercontent.com/dhimmel/mesh/e561301360e6de2140dedeaa7c7e17ce4714eb7f/data/symptoms.tsv'\n", 400 | "symptom_df = pandas.read_table(url)\n", 401 | "symptom_df.head()" 402 | ] 403 | }, 404 | { 405 | "cell_type": "code", 406 | "execution_count": 6, 407 | "metadata": { 408 | "collapsed": false, 409 | "jupyter": { 410 | "outputs_hidden": false 411 | } 412 | }, 413 | "outputs": [ 414 | { 415 | "name": "stdout", 416 | "output_type": "stream", 417 | "text": [ 418 | "8496 articles for Abdomen, Acute\n", 419 | "313 articles for Adie Syndrome\n", 420 | "1508 articles for Adrenoleukodystrophy\n", 421 | "261 articles for Aerophagy\n", 422 | "222 articles for Ageusia\n", 423 | "2049 articles for Agnosia\n", 424 | "849 articles for Agraphia\n", 425 | "12310 articles for Albuminuria\n", 426 | "1118 articles for Alcohol Amnestic Disorder\n", 427 | "846 articles for Alkalosis, Respiratory\n", 428 | "5803 articles for Amblyopia\n", 429 | "6330 articles for Amnesia\n", 430 | "824 articles for Amnesia, Retrograde\n", 431 | "30785 articles for Angina Pectoris\n", 432 | "1864 articles for Angina Pectoris, Variant\n", 433 | "8277 articles for Angina, Unstable\n", 434 | "933 articles for Anomia\n", 435 | "4130 articles for Anorexia\n", 436 | "3055 articles for Olfaction Disorders\n", 437 | "53502 articles for Anoxia\n", 438 | "8561 articles for Aphasia\n", 439 | "1439 articles for Aphasia, Broca\n", 440 | "825 articles for Aphasia, Wernicke\n", 441 | "270 articles for Aphonia\n", 442 | "6310 articles for Apnea\n", 443 | "2355 articles for 
Apraxias\n", 444 | "1544 articles for Articulation Disorders\n", 445 | "1439 articles for Asthenia\n", 446 | "6601 articles for Ataxia\n", 447 | "2930 articles for Ataxia Telangiectasia\n", 448 | "1284 articles for Athetosis\n", 449 | "1041 articles for Auditory Perceptual Disorders\n", 450 | "15025 articles for Back Pain\n", 451 | "33305 articles for Birth Weight\n", 452 | "6468 articles for Urinary Bladder, Neurogenic\n", 453 | "16978 articles for Blindness\n", 454 | "0 articles for Body Temperature Changes\n", 455 | "164793 articles for Body Weight\n", 456 | "5 articles for Body Weight Changes\n", 457 | "7382 articles for Brain Death\n", 458 | "4920 articles for Bulimia\n", 459 | "4023 articles for Cachexia\n", 460 | "5278 articles for Cardiac Output, Low\n", 461 | "2347 articles for Catalepsy\n", 462 | "767 articles for Cataplexy\n", 463 | "1977 articles for Catatonia\n", 464 | "564 articles for Causalgia\n", 465 | "3625 articles for Cerebellar Ataxia\n", 466 | "981 articles for Cerebrospinal Fluid Otorrhea\n", 467 | "2619 articles for Cerebrospinal Fluid Rhinorrhea\n", 468 | "9786 articles for Chest Pain\n", 469 | "625 articles for Cheyne-Stokes Respiration\n", 470 | "3746 articles for Chorea\n", 471 | "361 articles for Choroid Hemorrhage\n", 472 | "3597 articles for Colic\n", 473 | "3593 articles for Color Vision Defects\n", 474 | "10956 articles for Coma\n", 475 | "1687 articles for Communication Disorders\n", 476 | "3771 articles for Confusion\n", 477 | "2178 articles for Consciousness Disorders\n", 478 | "11088 articles for Constipation\n", 479 | "12559 articles for Cough\n", 480 | "609 articles for Cri-du-Chat Syndrome\n", 481 | "4077 articles for Cyanosis\n", 482 | "627 articles for De Lange Syndrome\n", 483 | "23208 articles for Deafness\n", 484 | "1750 articles for Hearing Loss, Sudden\n", 485 | "4708 articles for Decerebrate State\n", 486 | "6161 articles for Delirium\n", 487 | "39716 articles for Diarrhea\n", 488 | "6500 articles for Diarrhea, 
Infantile\n", 489 | "4370 articles for Diplopia\n", 490 | "11880 articles for Dizziness\n", 491 | "21187 articles for Down Syndrome\n", 492 | "1906 articles for Dysarthria\n", 493 | "288 articles for Dysgeusia\n", 494 | "6293 articles for Dyskinesia, Drug-Induced\n", 495 | "6411 articles for Dyslexia\n", 496 | "825 articles for Dyslexia, Acquired\n", 497 | "3213 articles for Dysmenorrhea\n", 498 | "7353 articles for Dyspepsia\n", 499 | "15978 articles for Dyspnea\n", 500 | "322 articles for Dyspnea, Paroxysmal\n", 501 | "6503 articles for Dystonia\n", 502 | "614 articles for Earache\n", 503 | "1075 articles for Ecchymosis\n", 504 | "211 articles for Echolalia\n", 505 | "33279 articles for Edema\n", 506 | "555 articles for Edema, Cardiac\n", 507 | "607 articles for Emaciation\n", 508 | "599 articles for Encopresis\n", 509 | "310 articles for Eructation\n", 510 | "817 articles for Eye Hemorrhage\n", 511 | "3146 articles for Eye Manifestations\n", 512 | "5163 articles for Facial Pain\n", 513 | "10449 articles for Facial Paralysis\n", 514 | "1785 articles for Failure to Thrive\n", 515 | "598 articles for Fasciculation\n", 516 | "20620 articles for Fatigue\n", 517 | "1172 articles for Mental Fatigue\n", 518 | "534 articles for Feminization\n", 519 | "2763 articles for Fetal Hypoxia\n", 520 | "2988 articles for Fetal Distress\n", 521 | "1878 articles for Fetal Macrosomia\n", 522 | "31658 articles for Fever\n", 523 | "3742 articles for Fever of Unknown Origin\n", 524 | "1233 articles for Flatulence\n", 525 | "1084 articles for Flushing\n", 526 | "4151 articles for Fragile X Syndrome\n", 527 | "416 articles for Gagging\n", 528 | "188 articles for Gerstmann Syndrome\n", 529 | "2562 articles for Gingival Hemorrhage\n", 530 | "251 articles for Glossalgia\n", 531 | "1128 articles for Halitosis\n", 532 | "9275 articles for Hallucinations\n", 533 | "22956 articles for Headache\n", 534 | "13453 articles for Hearing Disorders\n", 535 | "1745 articles for Hearing Loss, 
Bilateral\n", 536 | "481 articles for Hearing Loss, Central\n", 537 | "3022 articles for Hearing Loss, Conductive\n", 538 | "158 articles for Hearing Loss, Functional\n", 539 | "868 articles for Hearing Loss, High-Frequency\n", 540 | "6101 articles for Hearing Loss, Noise-Induced\n", 541 | "13197 articles for Hearing Loss, Sensorineural\n", 542 | "3035 articles for Heart Murmurs\n", 543 | "1707 articles for Heartburn\n", 544 | "2030 articles for Hematemesis\n", 545 | "2493 articles for Hemianopsia\n", 546 | "10439 articles for Hemiplegia\n", 547 | "1147 articles for Hemoglobinuria\n", 548 | "5102 articles for Hemoptysis\n", 549 | "1904 articles for Oral Hemorrhage\n", 550 | "939 articles for Hiccup\n", 551 | "3403 articles for Hirsutism\n", 552 | "1731 articles for Hoarseness\n", 553 | "1713 articles for Horner Syndrome\n", 554 | "9641 articles for Huntington Disease\n", 555 | "8030 articles for Hyperalgesia\n", 556 | "7386 articles for Hypercapnia\n", 557 | "1254 articles for Hyperemesis Gravidarum\n", 558 | "831 articles for Hyperesthesia\n", 559 | "3377 articles for Hypergammaglobulinemia\n", 560 | "3698 articles for Hyperkinesis\n", 561 | "2475 articles for Hyperphagia\n", 562 | "2742 articles for Disorders of Excessive Somnolence\n", 563 | "5160 articles for Hyperventilation\n", 564 | "2373 articles for Hypesthesia\n", 565 | "1260 articles for Hyphema\n", 566 | "4944 articles for Hypotension, Orthostatic\n", 567 | "8729 articles for Hypothermia\n", 568 | "1680 articles for Hypoventilation\n", 569 | "4413 articles for Illusions\n", 570 | "9308 articles for Sleep Initiation and Maintenance Disorders\n", 571 | "319 articles for Insulin Coma\n", 572 | "7081 articles for Intermittent Claudication\n", 573 | "9452 articles for Jaundice\n", 574 | "596 articles for Kearns-Sayre Syndrome\n", 575 | "916 articles for Menkes Kinky Hair Syndrome\n", 576 | "5025 articles for Language Development Disorders\n", 577 | "5692 articles for Language Disorders\n", 578 | "12767 
articles for Learning Disorders\n", 579 | "1144 articles for Lesch-Nyhan Syndrome\n", 580 | "311 articles for Lipoid Proteinosis of Urbach and Wiethe\n", 581 | "15869 articles for Memory Disorders\n", 582 | "238 articles for Meningism\n", 583 | "47833 articles for Intellectual Disability\n", 584 | "781 articles for Monoclonal Gammopathy of Undetermined Significance\n", 585 | "2279 articles for Motion Sickness\n", 586 | "1119 articles for Mouth Breathing\n", 587 | "1948 articles for Muscle Cramp\n", 588 | "777 articles for Muscle Hypertonia\n", 589 | "2650 articles for Muscle Hypotonia\n", 590 | "1789 articles for Muscle Rigidity\n", 591 | "6967 articles for Muscle Spasticity\n", 592 | "8969 articles for Muscular Atrophy\n", 593 | "906 articles for Mutism\n", 594 | "4660 articles for Myoclonus\n", 595 | "1040 articles for Myotonia\n", 596 | "2842 articles for Narcolepsy\n", 597 | "13292 articles for Nausea\n", 598 | "9009 articles for Neuralgia\n", 599 | "7314 articles for Neurologic Manifestations\n", 600 | "1287 articles for Night Blindness\n", 601 | "132455 articles for Obesity\n", 602 | "12170 articles for Obesity, Morbid\n", 603 | "1037 articles for Oliguria\n", 604 | "7392 articles for Ophthalmoplegia\n", 605 | "4188 articles for Optical Illusions\n", 606 | "1484 articles for Oral Manifestations\n", 607 | "111258 articles for Pain\n", 608 | "5588 articles for Pain, Intractable\n", 609 | "28672 articles for Pain, Postoperative\n", 610 | "269 articles for Pallor\n", 611 | "18293 articles for Paralysis\n", 612 | "11424 articles for Paraplegia\n", 613 | "5291 articles for Paresis\n", 614 | "5248 articles for Paresthesia\n", 615 | "5499 articles for Perceptual Disorders\n", 616 | "1560 articles for Phantom Limb\n", 617 | "643 articles for Obesity Hypoventilation Syndrome\n", 618 | "1821 articles for Polyuria\n", 619 | "2356 articles for Prader-Willi Syndrome\n", 620 | "1165 articles for Presbycusis\n", 621 | "20966 articles for Proteinuria\n", 622 | "9095 articles 
for Pruritus\n", 623 | "366 articles for Pruritus Ani\n", 624 | "314 articles for Pruritus Vulvae\n", 625 | "3697 articles for Psychomotor Agitation\n", 626 | "4816 articles for Psychomotor Disorders\n", 627 | "17476 articles for Psychophysiologic Disorders\n", 628 | "766 articles for Pupil Disorders\n", 629 | "4655 articles for Purpura\n", 630 | "243 articles for Purpura, Hyperglobulinemic\n", 631 | "3611 articles for Purpura, Schoenlein-Henoch\n", 632 | "5798 articles for Purpura, Thrombocytopenic\n", 633 | "3792 articles for Purpura, Thrombotic Thrombocytopenic\n", 634 | "7154 articles for Quadriplegia\n", 635 | "393 articles for Hyperacusis\n", 636 | "4453 articles for Reflex, Abnormal\n", 637 | "1479 articles for Respiratory Paralysis\n", 638 | "7110 articles for Respiratory Sounds\n", 639 | "2744 articles for Restless Legs Syndrome\n", 640 | "4507 articles for Retinal Hemorrhage\n", 641 | "417 articles for Rubinstein-Taybi Syndrome\n", 642 | "4261 articles for Sciatica\n", 643 | "2517 articles for Scotoma\n", 644 | "42647 articles for Seizures\n", 645 | "4095 articles for Sensation Disorders\n", 646 | "0 articles for Signs and Symptoms, Digestive\n", 647 | "0 articles for Signs and Symptoms, Respiratory\n", 648 | "2723 articles for Skin Manifestations\n", 649 | "12373 articles for Sleep Apnea Syndromes\n", 650 | "7447 articles for Sleep Deprivation\n", 651 | "16262 articles for Sleep Disorders\n", 652 | "781 articles for Sneezing\n", 653 | "3366 articles for Snoring\n", 654 | "529 articles for Somnambulism\n", 655 | "6248 articles for Spasm\n", 656 | "10007 articles for Speech Disorders\n", 657 | "3113 articles for Stuttering\n", 658 | "1910 articles for Supranuclear Palsy, Progressive\n", 659 | "9376 articles for Syncope\n", 660 | "1377 articles for Taste Disorders\n", 661 | "2277 articles for Tetany\n", 662 | "3905 articles for Thinness\n", 663 | "1203 articles for Tinea Pedis\n", 664 | "6152 articles for Tinnitus\n", 665 | "2362 articles for Toothache\n", 
666 | "3088 articles for Torticollis\n", 667 | "8147 articles for Tremor\n", 668 | "1300 articles for Trismus\n", 669 | "3711 articles for Unconsciousness\n", 670 | "18321 articles for Urinary Incontinence\n", 671 | "9276 articles for Urinary Incontinence, Stress\n", 672 | "8668 articles for Vertigo\n", 673 | "1893 articles for Virilism\n", 674 | "22639 articles for Vision Disorders\n", 675 | "1583 articles for Vitreous Hemorrhage\n", 676 | "5144 articles for Vocal Cord Paralysis\n", 677 | "4620 articles for Voice Disorders\n", 678 | "19894 articles for Vomiting\n", 679 | "221 articles for Vomiting, Anticipatory\n", 680 | "422 articles for Waterhouse-Friderichsen Syndrome\n", 681 | "349 articles for Wolfram Syndrome\n", 682 | "1908 articles for Hydrops Fetalis\n", 683 | "362 articles for Pyruvate Dehydrogenase Complex Deficiency Disease\n", 684 | "2343 articles for Vision, Low\n", 685 | "23709 articles for Weight Gain\n", 686 | "25680 articles for Weight Loss\n", 687 | "1947 articles for Rett Syndrome\n", 688 | "15002 articles for Abdominal Pain\n", 689 | "124 articles for Tonic Pupil\n", 690 | "265 articles for Anisocoria\n", 691 | "363 articles for Miosis\n", 692 | "514 articles for Mydriasis\n", 693 | "786 articles for Mucopolysaccharidosis II\n", 694 | "167 articles for Cardiac Output, High\n", 695 | "4732 articles for Purpura, Thrombocytopenic, Idiopathic\n", 696 | "762 articles for Hypocapnia\n", 697 | "1306 articles for Akathisia, Drug-Induced\n", 698 | "15043 articles for Low Back Pain\n", 699 | "441 articles for Ophthalmoplegia, Chronic Progressive External\n", 700 | "9826 articles for Pain Threshold\n", 701 | "897 articles for Microvascular Angina\n", 702 | "150 articles for Kleine-Levin Syndrome\n", 703 | "82 articles for WAGR Syndrome\n", 704 | "3734 articles for Pelvic Pain\n", 705 | "629 articles for Machado-Joseph Disease\n", 706 | "268 articles for Brown-Sequard Syndrome\n", 707 | "2572 articles for Persistent Vegetative State\n", 708 | "1046 
articles for Hypokinesia\n", 709 | "189 articles for Space Motion Sickness\n", 710 | "2667 articles for Hyperoxia\n", 711 | "1228 articles for Gastroparesis\n", 712 | "22 articles for Sweating Sickness\n", 713 | "5442 articles for Arthralgia\n", 714 | "60 articles for Aphasia, Conduction\n", 715 | "409 articles for Aphasia, Primary Progressive\n", 716 | "11068 articles for Muscle Weakness\n", 717 | "1324 articles for Williams Syndrome\n", 718 | "318 articles for Cafe-au-Lait Spots\n", 719 | "1557 articles for Syncope, Vasovagal\n", 720 | "4803 articles for Neck Pain\n", 721 | "751 articles for Hemifacial Spasm\n", 722 | "457 articles for Blindness, Cortical\n", 723 | "2471 articles for Hot Flashes\n", 724 | "758 articles for Aging, Premature\n", 725 | "1678 articles for Pseudophakia\n", 726 | "148 articles for Schnitzler Syndrome\n", 727 | "129 articles for Neurobehavioral Manifestations\n", 728 | "3129 articles for Shoulder Pain\n", 729 | "498 articles for Neurogenic Inflammation\n", 730 | "20 articles for Chorea Gravidarum\n", 731 | "45 articles for Hypersomnolence, Idiopathic\n", 732 | "1410 articles for Sleep Disorders, Circadian Rhythm\n", 733 | "310 articles for Jet Lag Syndrome\n", 734 | "12292 articles for Sleep Apnea, Obstructive\n", 735 | "928 articles for Sleep Apnea, Central\n", 736 | "45 articles for Nocturnal Paroxysmal Dystonia\n", 737 | "112 articles for Night Terrors\n", 738 | "320 articles for Sleep Bruxism\n", 739 | "706 articles for REM Sleep Behavior Disorder\n", 740 | "103 articles for Sleep Paralysis\n", 741 | "493 articles for Nocturnal Myoclonus Syndrome\n", 742 | "94 articles for Coma, Post-Head Injury\n", 743 | "4298 articles for Gait Disorders, Neurologic\n", 744 | "394 articles for Gait Ataxia\n", 745 | "101 articles for Gait Apraxia\n", 746 | "316 articles for Amnesia, Transient Global\n", 747 | "75 articles for Alexia, Pure\n", 748 | "412 articles for Prosopagnosia\n", 749 | "179 articles for Apraxia, Ideomotor\n", 750 | "3000 
articles for Postoperative Nausea and Vomiting\n", 751 | "225 articles for Alcohol Withdrawal Seizures\n", 752 | "671 articles for Tics\n", 753 | "221 articles for Amnesia, Anterograde\n", 754 | "608 articles for Paraparesis\n", 755 | "325 articles for Paraparesis, Spastic\n", 756 | "118 articles for Myokymia\n", 757 | "328 articles for Parasomnias\n", 758 | "1247 articles for Fetal Weight\n", 759 | "1610 articles for Spinocerebellar Ataxias\n", 760 | "245 articles for Amaurosis Fugax\n", 761 | "495 articles for Photophobia\n", 762 | "1384 articles for Dyskinesias\n", 763 | "95 articles for Pseudobulbar Palsy\n", 764 | "0 articles for Neuromuscular Manifestations\n", 765 | "991 articles for Somatosensory Disorders\n", 766 | "348 articles for Korsakoff Syndrome\n", 767 | "223 articles for Sleep Disorders, Intrinsic\n", 768 | "293 articles for Dyssomnias\n", 769 | "129 articles for Sleep Arousal Disorders\n", 770 | "123 articles for Sleep-Wake Transition Disorders\n", 771 | "73 articles for REM Sleep Parasomnias\n", 772 | "0 articles for Urological Manifestations\n", 773 | "428 articles for Flank Pain\n", 774 | "156 articles for Chills\n", 775 | "123 articles for Insomnia, Fatal Familial\n", 776 | "9079 articles for Hearing Loss\n", 777 | "142 articles for Metatarsalgia\n", 778 | "510 articles for Mental Retardation, X-Linked\n", 779 | "72 articles for Coffin-Lowry Syndrome\n", 780 | "2706 articles for Jaundice, Obstructive\n", 781 | "44 articles for Reticulocytosis\n", 782 | "436 articles for Hearing Loss, Unilateral\n", 783 | "222 articles for Hearing Loss, Mixed Conductive-Sensorineural\n", 784 | "110 articles for Synkinesis\n", 785 | "656 articles for Labor Pain\n", 786 | "116 articles for Morning Sickness\n", 787 | "13641 articles for Overweight\n", 788 | "2618 articles for Mobility Limitation\n", 789 | "687 articles for Neuralgia, Postherpetic\n", 790 | "91 articles for Glycogen Storage Disease Type IIb\n", 791 | "299 articles for Usher Syndromes\n", 792 | "475 
articles for Nocturia\n", 793 | "224 articles for Dysuria\n", 794 | "2760 articles for Urinary Bladder, Overactive\n", 795 | "622 articles for Urinary Incontinence, Urge\n", 796 | "423 articles for Prostatism\n", 797 | "463 articles for Hypercalciuria\n", 798 | "166 articles for Opsoclonus-Myoclonus Syndrome\n", 799 | "74 articles for Urinoma\n", 800 | "231 articles for Pain, Referred\n", 801 | "97 articles for Stupor\n", 802 | "226 articles for Lethargy\n", 803 | "8724 articles for Acute Coronary Syndrome\n", 804 | "66 articles for Deaf-Blind Disorders\n", 805 | "165 articles for Livedo Reticularis\n", 806 | "121 articles for Mevalonate Kinase Deficiency\n", 807 | "48 articles for Systolic Murmurs\n", 808 | "99 articles for Classical Lissencephalies and Subcortical Band Heterotopias\n", 809 | "102 articles for Neuroacanthocytosis\n", 810 | "188 articles for Orthostatic Intolerance\n", 811 | "209 articles for Postural Orthostatic Tachycardia Syndrome\n", 812 | "154 articles for Failed Back Surgery Syndrome\n", 813 | "770 articles for Dysphonia\n", 814 | "136 articles for Purpura Fulminans\n", 815 | "1103 articles for Sarcopenia\n", 816 | "93 articles for Susac Syndrome\n", 817 | "57 articles for Piriformis Muscle Syndrome\n", 818 | "35 articles for Alien Hand Syndrome\n", 819 | "21 articles for Slit Ventricle Syndrome\n", 820 | "1770 articles for Obesity, Abdominal\n", 821 | "185 articles for Renal Colic\n", 822 | "166 articles for Ideal Body Weight\n", 823 | "65 articles for Primary Progressive Nonfluent Aphasia\n", 824 | "16 articles for Infantile Apparent Life-Threatening Event\n", 825 | "27 articles for Post-Exercise Hypotension\n", 826 | "67 articles for Striae Distensae\n", 827 | "293 articles for Eye Pain\n", 828 | "29 articles for Necrolytic Migratory Erythema\n", 829 | "280 articles for Nociceptive Pain\n", 830 | "32 articles for Transient Tachypnea of the Newborn\n", 831 | "65 articles for Tachypnea\n", 832 | "222 articles for Visceral Pain\n", 833 | 
"4756 articles for Chronic Pain\n", 834 | "1127 articles for Musculoskeletal Pain\n", 835 | "56 articles for Mastodynia\n", 836 | "46 articles for Pelvic Girdle Pain\n", 837 | "146 articles for Breakthrough Pain\n", 838 | "997 articles for Lower Urinary Tract Symptoms\n", 839 | "300 articles for Anhedonia\n", 840 | "65 articles for Polydipsia\n", 841 | "20 articles for Polydipsia, Psychogenic\n", 842 | "775 articles for Acute Pain\n", 843 | "542 articles for Angina, Stable\n", 844 | "18 articles for Ophthalmoplegic Migraine\n", 845 | "38 articles for Pudendal Neuralgia\n", 846 | "83 articles for Dyscalculia\n", 847 | "13 articles for Alice in Wonderland Syndrome\n", 848 | "441 articles for Prodromal Symptoms\n", 849 | "1354 articles for Pediatric Obesity\n", 850 | "327 articles for Myalgia\n", 851 | "12 articles for Hypertriglyceridemic Waist\n", 852 | "388 articles for Cerebrospinal Fluid Leak\n", 853 | "296 articles for Benign Paroxysmal Positional Vertigo\n", 854 | "15 articles for Hyperlactatemia\n", 855 | "0 articles for Allesthesia\n" 856 | ] 857 | } 858 | ], 859 | "source": [ 860 | "rows_out = list()\n", 861 | "\n", 862 | "for i, row in symptom_df.iterrows():\n", 863 | " term_query = '{symptom}[MeSH Terms:noexp]'.format(symptom = row.mesh_name.lower())\n", 864 | " payload = {'db': 'pubmed', 'term': term_query}\n", 865 | " pmids = eutility.esearch_query(payload, retmax = 5000, sleep=2)\n", 866 | " row['term_query'] = term_query\n", 867 | " row['n_articles'] = len(pmids)\n", 868 | " row['pubmed_ids'] = '|'.join(pmids)\n", 869 | " rows_out.append(row)\n", 870 | " print('{} articles for {}'.format(len(pmids), row.mesh_name))" 871 | ] 872 | }, 873 | { 874 | "cell_type": "code", 875 | "execution_count": 7, 876 | "metadata": { 877 | "collapsed": false, 878 | "jupyter": { 879 | "outputs_hidden": false 880 | } 881 | }, 882 | "outputs": [ 883 | { 884 | "data": { 885 | "text/html": [ 886 | "
   mesh_id  mesh_name             in_hsdn  term_query                              n_articles  pubmed_ids
0  D000006  Abdomen, Acute        1        abdomen, acute[MeSH Terms:noexp]        8496        25742249|25669229|25650451|25619050|25608417|2...
1  D000270  Adie Syndrome         0        adie syndrome[MeSH Terms:noexp]         313         25138821|24995781|24625775|24533698|24215593|2...
2  D000326  Adrenoleukodystrophy  0        adrenoleukodystrophy[MeSH Terms:noexp]  1508        25860611|25583825|25393703|25378668|25297370|2...
3  D000334  Aerophagy             1        aerophagy[MeSH Terms:noexp]             261         25073665|24796405|24280810|23772202|23772201|2...
4  D000370  Ageusia               1        ageusia[MeSH Terms:noexp]               222         24999669|24999665|24825557|24782205|24191925|2...
" 948 | ], 949 | "text/plain": [ 950 | " mesh_id mesh_name in_hsdn \\\n", 951 | "0 D000006 Abdomen, Acute 1 \n", 952 | "1 D000270 Adie Syndrome 0 \n", 953 | "2 D000326 Adrenoleukodystrophy 0 \n", 954 | "3 D000334 Aerophagy 1 \n", 955 | "4 D000370 Ageusia 1 \n", 956 | "\n", 957 | " term_query n_articles \\\n", 958 | "0 abdomen, acute[MeSH Terms:noexp] 8496 \n", 959 | "1 adie syndrome[MeSH Terms:noexp] 313 \n", 960 | "2 adrenoleukodystrophy[MeSH Terms:noexp] 1508 \n", 961 | "3 aerophagy[MeSH Terms:noexp] 261 \n", 962 | "4 ageusia[MeSH Terms:noexp] 222 \n", 963 | "\n", 964 | " pubmed_ids \n", 965 | "0 25742249|25669229|25650451|25619050|25608417|2... \n", 966 | "1 25138821|24995781|24625775|24533698|24215593|2... \n", 967 | "2 25860611|25583825|25393703|25378668|25297370|2... \n", 968 | "3 25073665|24796405|24280810|23772202|23772201|2... \n", 969 | "4 24999669|24999665|24825557|24782205|24191925|2... " 970 | ] 971 | }, 972 | "execution_count": 7, 973 | "metadata": {}, 974 | "output_type": "execute_result" 975 | } 976 | ], 977 | "source": [ 978 | "symptom_pmids_df = pandas.DataFrame(rows_out)\n", 979 | "\n", 980 | "with gzip.open('data/symptom-pmids.tsv.gz', 'w') as write_file:\n", 981 | " write_file = io.TextIOWrapper(write_file)\n", 982 | " symptom_pmids_df.to_csv(write_file, sep='\\t', index=False)\n", 983 | "\n", 984 | "symptom_pmids_df.head()" 985 | ] 986 | }, 987 | { 988 | "cell_type": "markdown", 989 | "metadata": {}, 990 | "source": [ 991 | "# Cooccurrence" 992 | ] 993 | }, 994 | { 995 | "cell_type": "code", 996 | "execution_count": 8, 997 | "metadata": { 998 | "collapsed": false, 999 | "jupyter": { 1000 | "outputs_hidden": false 1001 | } 1002 | }, 1003 | "outputs": [], 1004 | "source": [ 1005 | "symptom_df, symptom_to_pmids = cooccurrence.read_pmids_tsv('data/symptom-pmids.tsv.gz', key='mesh_id')\n", 1006 | "disease_df, disease_to_pmids = cooccurrence.read_pmids_tsv('data/disease-pmids.tsv.gz', key='doid_code')" 1007 | ] 1008 | }, 1009 | { 1010 | "cell_type": 
"code", 1011 | "execution_count": 9, 1012 | "metadata": { 1013 | "collapsed": false, 1014 | "jupyter": { 1015 | "outputs_hidden": false 1016 | } 1017 | }, 1018 | "outputs": [ 1019 | { 1020 | "data": { 1021 | "text/plain": [ 1022 | "1759475" 1023 | ] 1024 | }, 1025 | "execution_count": 9, 1026 | "metadata": {}, 1027 | "output_type": "execute_result" 1028 | } 1029 | ], 1030 | "source": [ 1031 | "symptom_pmids = set.union(*symptom_to_pmids.values())\n", 1032 | "len(symptom_pmids)" 1033 | ] 1034 | }, 1035 | { 1036 | "cell_type": "code", 1037 | "execution_count": 10, 1038 | "metadata": { 1039 | "collapsed": false, 1040 | "jupyter": { 1041 | "outputs_hidden": false 1042 | } 1043 | }, 1044 | "outputs": [ 1045 | { 1046 | "data": { 1047 | "text/plain": [ 1048 | "3478558" 1049 | ] 1050 | }, 1051 | "execution_count": 10, 1052 | "metadata": {}, 1053 | "output_type": "execute_result" 1054 | } 1055 | ], 1056 | "source": [ 1057 | "disease_pmids = set.union(*disease_to_pmids.values())\n", 1058 | "len(disease_pmids)" 1059 | ] 1060 | }, 1061 | { 1062 | "cell_type": "code", 1063 | "execution_count": 11, 1064 | "metadata": { 1065 | "collapsed": false, 1066 | "jupyter": { 1067 | "outputs_hidden": false 1068 | } 1069 | }, 1070 | "outputs": [ 1071 | { 1072 | "name": "stdout", 1073 | "output_type": "stream", 1074 | "text": [ 1075 | "Total articles containing a doid_code: 3478558\n", 1076 | "Total articles containing a mesh_id: 1759475\n", 1077 | "Total articles containing both a doid_code and mesh_id: 363928\n", 1078 | "\n", 1079 | "After removing terms without any cooccurences:\n", 1080 | "+ 133 doid_codes remain\n", 1081 | "+ 426 mesh_ids remain\n", 1082 | "\n", 1083 | "Cooccurrence scores calculated for 56658 doid_code -- mesh_id pairs\n" 1084 | ] 1085 | } 1086 | ], 1087 | "source": [ 1088 | "cooc_df = cooccurrence.score_pmid_cooccurrence(disease_to_pmids, symptom_to_pmids, 'doid_code', 'mesh_id')" 1089 | ] 1090 | }, 1091 | { 1092 | "cell_type": "code", 1093 | "execution_count": 12, 
1094 | "metadata": { 1095 | "collapsed": false, 1096 | "jupyter": { 1097 | "outputs_hidden": false 1098 | } 1099 | }, 1100 | "outputs": [ 1101 | { 1102 | "data": { 1103 | "text/html": [ 1104 | "
       doid_code   doid_name            mesh_id  mesh_name              cooccurrence  expected   enrichment  odds_ratio  p_fisher
30318  DOID:10652  Alzheimer's disease  D004314  Down Syndrome          800           35.619601  22.459544   39.918352   0.000000e+00
30408  DOID:10652  Alzheimer's disease  D008569  Memory Disorders       1593          76.580532  20.801631   41.885877   0.000000e+00
30452  DOID:10652  Alzheimer's disease  D011595  Psychomotor Agitation  334           15.235665  21.922247   35.277329   0.000000e+00
30257  DOID:10652  Alzheimer's disease  D000647  Amnesia                307           14.061215  21.833106   34.890099   4.277452e-314
30381  DOID:10652  Alzheimer's disease  D006816  Huntington Disease     255           12.130614  21.021195   32.630035   8.215868e-256
" 1184 | ], 1185 | "text/plain": [ 1186 | " doid_code doid_name mesh_id mesh_name \\\n", 1187 | "30318 DOID:10652 Alzheimer's disease D004314 Down Syndrome \n", 1188 | "30408 DOID:10652 Alzheimer's disease D008569 Memory Disorders \n", 1189 | "30452 DOID:10652 Alzheimer's disease D011595 Psychomotor Agitation \n", 1190 | "30257 DOID:10652 Alzheimer's disease D000647 Amnesia \n", 1191 | "30381 DOID:10652 Alzheimer's disease D006816 Huntington Disease \n", 1192 | "\n", 1193 | " cooccurrence expected enrichment odds_ratio p_fisher \n", 1194 | "30318 800 35.619601 22.459544 39.918352 0.000000e+00 \n", 1195 | "30408 1593 76.580532 20.801631 41.885877 0.000000e+00 \n", 1196 | "30452 334 15.235665 21.922247 35.277329 0.000000e+00 \n", 1197 | "30257 307 14.061215 21.833106 34.890099 4.277452e-314 \n", 1198 | "30381 255 12.130614 21.021195 32.630035 8.215868e-256 " 1199 | ] 1200 | }, 1201 | "execution_count": 12, 1202 | "metadata": {}, 1203 | "output_type": "execute_result" 1204 | } 1205 | ], 1206 | "source": [ 1207 | "cooc_df = symptom_df[['mesh_id', 'mesh_name']].drop_duplicates().merge(cooc_df)\n", 1208 | "cooc_df = disease_df[['doid_code', 'doid_name']].drop_duplicates().merge(cooc_df)\n", 1209 | "cooc_df = cooc_df.sort_values(by=['doid_name', 'p_fisher'])\n", 1210 | "cooc_df.to_csv('data/disease-symptom-cooccurrence.tsv', index=False, sep='\\t')\n", 1211 | "cooc_df.head()" 1212 | ] 1213 | }, 1214 | { 1215 | "cell_type": "markdown", 1216 | "metadata": {}, 1217 | "source": [ 1218 | "## Visualization" 1219 | ] 1220 | }, 1221 | { 1222 | "cell_type": "code", 1223 | "execution_count": 13, 1224 | "metadata": { 1225 | "collapsed": false, 1226 | "jupyter": { 1227 | "outputs_hidden": false 1228 | } 1229 | }, 1230 | "outputs": [], 1231 | "source": [ 1232 | "import numpy\n", 1233 | "import scipy\n", 1234 | "import seaborn\n", 1235 | "import matplotlib.pyplot as plt\n", 1236 | "\n", 1237 | "%matplotlib inline" 1238 | ] 1239 | }, 1240 | { 1241 | "cell_type": "code", 1242 | 
"execution_count": 14, 1243 | "metadata": { 1244 | "collapsed": false, 1245 | "jupyter": { 1246 | "outputs_hidden": false 1247 | } 1248 | }, 1249 | "outputs": [ 1250 | { 1251 | "data": { 1252 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXYAAAECCAYAAADq7fyyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFGxJREFUeJzt3X+MZWV9x/H3guwuyw6j0FmwZVlSlG82bbBC1ZYafqQK\n0h/SmCZNY20lFSQSXJMKqatd+2Mpptu1EX+lsipqbCUQadcSFUsN4MZIaa2Vil9BdHdICI4z4+wO\nyzq7O9M/7h2ZZe6P+XHnnnuffb+SDXOfc+bcLzN3Pvfc5znneVbNzMwgSSrHCVUXIEnqLINdkgpj\nsEtSYQx2SSqMwS5JhTHYJakwL2i1MSJOBG4DzgNmgOuAnwK3A9PAI8D1mTkTEdcA1wJHgO2Zec8K\n1i1JaqLdGfvvANOZ+WrgPcDfAjuBrZl5MbAKuCoizgRuAC4CrgBuiYjVK1e2JKmZlsGemf8KvLX+\n8BxgHLgwMx+ot30ReA3wCmBPZh7OzP3A48D5K1KxJKmltn3smXk0Im4HPgB8ltpZ+qwDwCBwKjDR\noF2S1GULGjzNzDcDAewC1s7ZdCrwE2A/MDCnfYDa2b0kqcvaDZ6+CTgrM28BngWOAg9HxCWZeT9w\nJXAf8BBwc0SsoRb8m6kNrDY1MzMzs2rVqla7SJLmaxucq1pNAhYRJ1O7AuZM4CTgFuC71K6UWQ18\nB7imflXMW6hdFXMCcHNm3t3muWdGRg4s4P+hWkNDA1hn51hn5/RDjWCdnTY0NNA22FuesWfms8Af\nNNh0aYN9d1HrqpEkVcgblCSpMAa7JBXGYJekwrTsY9d8U1NTDA/vbbp948ZNrF7tTbeSqmOwL9Lw\n8F627NjNusEN87YdnPgRH7jx9Zx77ksrqEySagz2JVg3uIH1L/qFqsuQpIbsY5ekwhjsklQYu2Ka\nmB0kHR9fz9jY5M/a9+1rPnAqSb3AYG+i2SDp6JOPcvpZmyuqSpLaM9hbaDRIenDi6YqqkaSFsY9d\nkgpjsEtSYeyK6YJWd6t6p6qkTjPYu6DZQKx3qkpaCQZ7l3i3qqRusY9dkgpjsEtSYQx2SSqMwS5J\nhTHYJakwBrskFcZgl6TCGOySVBiDXZIKY7BLUmEMdkkqjMEuSYVxErAOmj56pOGaqK6TKqmbWgZ7\nRJwEfALYBKwBtgNPAv8GfK++20cy886IuAa4FjgCbM/Me1as6h51aHKUnXeMsW7wqWPaXSdVUje1\nO2N/IzCSmW+KiBcB3wL+CtiZme+f3SkizgRuAC4ETga+FhFfycypFaq7Z7lOqqSqtQv2O4G76l+f\nABymFt4REVcBjwHvAF4J7MnMw8DhiHgcOB94eEWqliQ11XLwNDOfyczJiBigFvLvBh4C3pmZlwBP\nAO8FBoCJOd96ABhcmZIlSa20vSomIjYC/wF8OjM/B9ydmd+sb74beDmwn1q4zxoAxjtcqyRpAdoN\nnp4B3Au8LTO/Wm/+UkS8PTP/E3gNte6Wh4CbI2INsBbYDDzS7smHhgba7VKZ8fH1K/4c00ePMDEx\n0vS5zjnnnEUtdN3LP8+5rLNz+qFGsM5ua9fHvpVal8q2iNhWb3sH8A8RcRh4Cri23l1zK/AgtU8B\nWxcycDoycmDpla+wsbHJFX+OQ5OjbPvY11k3+P152ybHn+LGP7yAs8/eNG/bxo2b5gX+0NBAT/88\nZ1ln5/RDjWCdnbaQN5+WwZ6ZW4AtDTa9usG+u4BdCy1ONc0WuT448TQ77/jWvEsnD078iA/c+HrO
\nPfel3SpRUp/xBqUe1iz0JamV4z7Yp6amGB72blFJ5Tjug314eC9bduxm3eCGY9q9W1RSvzrugx28\nW1RSWZzdUZIKY7BLUmEMdkkqjMEuSYUx2CWpMAa7JBXGYJekwhjsklQYg12SCmOwS1JhDHZJKozB\nLkmFcRKwPjN99EjDKYXHx9dzyimnL2opPUllMtj7zKHJUXbeMebKSpKaMtj7kCsrSWrFPnZJKozB\nLkmFMdglqTD2sRei2dUyszZu3OQVM9Jx4rgI9qmpKYaHG4deqzDsJ82ulgGvmJGON8dFsA8P72XL\njt2sG9wwb9vok49y+lmbK6iq87xaRhIcJ8EOzUPv4MTTFVQjSSvHwVNJKozBLkmFMdglqTAGuyQV\npuXgaUScBHwC2ASsAbYDjwK3A9PAI8D1mTkTEdcA1wJHgO2Zec8K1i1JaqLdGfsbgZHMvBh4HfBh\nYCewtd62CrgqIs4EbgAuAq4AbokI74aRpAq0u9zxTuCu+tcnAIeBCzLzgXrbF4HLgaPAnsw8DByO\niMeB84GHO1+yJKmVlsGemc8ARMQAtZB/D/D3c3Y5AAwCpwITDdolSV3W9galiNgIfB74cGb+c0T8\n3ZzNpwI/AfYDA3PaB4DxdsceGhpot0tHjI+v78rz9Krpo0eYmBhp+HM455xzuj6HTLd+78vVD3X2\nQ41gnd3WbvD0DOBe4G2Z+dV68zcj4pLMvB+4ErgPeAi4OSLWAGuBzdQGVlsaGTmwnNoXbGxssivP\n06sOTY6y7WNfZ93g949pr2IOmaGhga793pejH+rshxrBOjttIW8+7c7Yt1LrUtkWEdvqbVuAW+uD\no98B7qpfFXMr8CC1vvitmTm15MrVcc4jIx0/2vWxb6EW5M93aYN9dwG7OlOWJGmpvEFJkgpjsEtS\nYQx2SSqMwS5JhTHYJakwBrskFcZgl6TCGOySVBiDXZIKY7BLUmEMdkkqjMEuSYUx2CWpMAa7JBWm\n7QpK/WRqaorh4b3z2vftm98mSaUqKtiHh/eyZcdu1g1uOKZ99MlHOf2szRVVJUndVVSwQ+OVgg5O\nPF1RNZLUffaxS1Jhijtj18JNHz3SdPxh48ZNrF69ussVSeoEg/04dmhylJ13jLFu8Klj2g9O/IgP\n3Ph6zj33pRVVJmk5DPbjXKMxiVZn8uDZvNTrDHbN0+xMHjybl/qBwa6GGp3JS+oPXhUjSYUx2CWp\nMAa7JBXGYJekwhjsklQYg12SCrOgyx0j4lXA+zLzsoh4OfAF4LH65o9k5p0RcQ1wLXAE2J6Z96xI\nxZKkltoGe0TcBPwRMFlvuhB4f2a+f84+ZwI31LedDHwtIr6SmVOdL1mS1MpCztgfB94AfKb++ELg\nvIi4itpZ+zuAVwJ7MvMwcDgiHgfOBx7ufMmSpFba9rFn5uepda/M+gbwzsy8BHgCeC8wAEzM2ecA\nMNjBOiVJC7SUKQXuzszZEL8b+CDwALVwnzUAjLc70NDQQLtdFmV8fH1Hj6fGTjtt/bJ+d53+va+U\nfqizH2oE6+y2pQT7lyLi7Zn5n8BrqHW3PATcHBFrgLXAZuCRdgcaGTmwhKdvbmxssv1OWraxsckl\n/+6GhgY6/ntfCf1QZz/UCNbZaQt581lMsM/U/3sd8OGIOAw8BVybmZMRcSvwILXuna0OnEpSNRYU\n7Jn5Q+Ci+tffAl7dYJ9dwK5OFidJWjxvUJKkwhjsklQYg12SCmOwS1JhXBpPK2pqaorh4ecWxh4f\nX3/MZakujC11nsGuFTU8vJctO3azbnDDvG0ujC2tDINdK86FsaXuso9dkgpjsEtSYQx2SSqMfezq\niOdf/TJr3775bZJWlsGujmh29cvok49y+lmbK6pKOj4Z7OqYRle/HJx4uqJqpOOXfeySVBiDXZIK\nY1eMFmX66JGGA6IOkkq9w2DXohyaHGXnHWOsG3zqmHYHSaXe
YbBr0RwklXqbfeySVBiDXZIKY1eM\nKtNsIBacp11aDoNdlWk2EOs87dLyGOyqlHO1S51nH7skFcZgl6TCGOySVBiDXZIKY7BLUmEMdkkq\nzIIud4yIVwHvy8zLIuIlwO3ANPAIcH1mzkTENcC1wBFge2bes0I1S5JaaHvGHhE3AbcBa+pN7we2\nZubFwCrgqog4E7gBuAi4ArglIrxtUJIqsJCumMeBN1ALcYALMvOB+tdfBF4DvALYk5mHM3N//XvO\n73SxkqT22gZ7Zn6eWvfKrFVzvj4ADAKnAhMN2iVJXbaUKQWm53x9KvATYD8wMKd9ABhvd6ChoYF2\nuyzK+Pj6jh5P1TnttPUdf30sVtXPvxD9UCNYZ7ctJdi/GRGXZOb9wJXAfcBDwM0RsQZYC2ymNrDa\n0sjIgSU8fXNjY5MdPZ6qMzY22fHXx2IMDQ1U+vwL0Q81gnV22kLefBYT7DP1//4ZcFt9cPQ7wF31\nq2JuBR6k1r2zNTOnFlnvgk1NTTE87LqbktTIgoI9M39I7YoXMvMx4NIG++wCdnWwtqaGh/eyZcdu\n1g1uOKbddTclqY+n7XXdTUlqrG+DXeVqtbISuLqS1I7Brp7TbGUlcHUlaSEMdvUkV1aSls5JwCSp\nMAa7JBXGYJekwhjsklQYg12SCmOwS1JhDHZJKozBLkmFMdglqTAGuyQVxmCXpMIY7JJUGINdkgpj\nsEtSYZy2V32l1SIcLsAh1Rjs6ivNFuFwAQ7pOQa7+o6LcEit2ccuSYXxjF1FsO9deo7BriLY9y49\nx2BXMex7l2rsY5ekwnjGrqK16nsH+99VJoNdRWvW9w72v6tcBruKZ9+7jjdLDvaI+G9gov7wCeAW\n4HZgGngEuD4zZ5ZboCRpcZYU7BGxFiAzL5vTthvYmpkPRMRHgauAf+lIlZKkBVvqGfvLgHUR8eX6\nMd4NXJCZD9S3fxG4HINdkrpuqZc7PgPsyMwrgOuAzz5v+yQwuJzCJElLs9Rg/x71MM/Mx4BR4Iw5\n2weAnyyvNEnSUiy1K+Zq4Hzg+oj4eWpBfm9EXJKZ9wNXAve1O8jQ0MCSnnx8fP2Svk96vtNOW9/0\ndbjU12c39UONYJ3dttRg/zjwyYiY7VO/mtpZ+20RsRr4DnBXu4OMjBxY0pOPjU0u6fuk5xsbm2z4\nOhwaGljy67Nb+qFGsM5OW8ibz5KCPTOPAG9qsOnSpRyvkampKYaHG98x2OpOQmmhWt2VOjj4y12u\nRuqcnr1BaXh4L1t27Gbd4IZ520affJTTz9pcQVUqSasZIT9zy3pe9KIXV1SZtDw9G+zQ/I7BgxNP\nV1CNSuRdqSpRTwe71EtadQ+CE4qpdxjs0gK16h50QjH1EoNdWgS7btQPDHbpeaaPHuEHP/jBvMtq\nvRpL/cJgl57n0OQo2z729XldLl6NpX5hsEsNNOpy8Wos9QvXPJWkwhjsklQYg12SCmMfu9QBread\n8cYldZvBLnVAq3lnvHFJ3WawSx3izUvqFQa7tILsolEVDHZpBdlFoyoY7NIKa9RF0+pMHjyb1/IY\n7FIFmp3Jg2fzWj6DXaqIg61aKd6gJEmFMdglqTB2xUg9xksktVwGu9Rjmg2sTo4/xY1/eAFnn70J\ngPHx9T9bDMTA11yVB/uX//2rfOUbj81r3z8+AieeXUFFUvWazQe/845veU282qo82EdGxxg78SXz\n2idPPLmCaqTe5pU0WojKg13SypmammJ42P76443BLhVseHgvW3bsnrd+q903ZTPYpT7X6iqaffv2\nLnpKA8/k+5/BLvW5VtMTjD75KKeftXnB3+OZfBk6GuwRcQLwEeB84KfAWzLz+518DknzNRtUPTjx\n9KK/R/2v02fsvweszsyLIuJVwM56m6Q+0MlZJ2cHbudeb7/UY2lxOh3svwF8CSAzvxERv9rh40ta\nQZ2cdbLZwC3Mv9lqLgN/
+Tod7KcC++c8PhoRJ2TmdIefR9IKadZF0+xs/vDhwwCcdNJJx7Q3G7iF\n7txs1epSz0Y1z36y6NQbS5WXmnY62PcDA3Metw311S84kenRb89rn574MYdOeGHD73n2wBiwatnt\nHqu/jlX18x/vxxp/6jG23/Zd1q4/7Zj2iaefYM0pL2zY/sIXn9e0rpMHTm+4rVVX0GLs27eX7bd9\nZV5ds7U1qvnQ5Bjvuea1DT9JdOr5D02O8bG/ecuKDlCvmpmZ6djBIuINwO9m5tUR8WvAX2Tmb3fs\nCSRJbXX6jP1u4LURsaf++OoOH1+S1EZHz9glSdVzoQ1JKozBLkmFMdglqTAGuyQVpuuTgPXbfDL1\nqRHel5mXVV1LIxFxEvAJYBOwBtiemV+otqpjRcSJwG3AecAMcF1m/l+1VTUXERuA/wJ+MzO/V3U9\njUTEfwMT9YdPZOafVllPMxHxLuB3gZOAD2XmpyouaZ6I+BPgzfWHJwMvA87IzP1Nv6kC9ezcRe3v\naBq4JjOz0b5VnLH/bD4Z4M+pzSfTkyLiJmqBtKbqWlp4IzCSmRcDrwM+VHE9jfwOMJ2ZrwbeA9xc\ncT1N1d8o/xF4pupamomItQCZeVn9X6+G+qXAr9f/1i8FfrHSgprIzE/N/iyBh4Ebei3U6y4HTqn/\nHf01Lf6Oqgj2Y+aTAXp5PpnHgTfQ7Fa83nAnsK3+9QnAkQpraSgz/xV4a/3hOcB4ddW0tQP4KDB/\nspTe8TJgXUR8OSLuq3+q7EWXA9+OiH8BvgDsrrielupzW/1SZu6qupYmngUGI2IVMAhMNduximBv\nOJ9MBXW0lZmfpweDcq7MfCYzJyNigFrIv7vqmhrJzKMRcTtwK/BPFZfTUES8mdqnn3vrTb36hv4M\nsCMzrwCuAz7bo39DQ8CFwO9Tr7PactraCvxl1UW0sAdYC3yX2qfKDzbbsYoXw6Lnk1FrEbER+A/g\n05n5uarraSYz30ytf/C2iOjF1cqvpnbn9FeBXwE+FRFnVFxTI9+jHpKZ+RgwCry40ooa+zFwb2Ye\nqY9VHIqIn6u6qEYi4oXAeZl5f9W1tHATsCczg+denw1nEqsi2PcAvwVQn0/mfyuooRj14LkXuCkz\nb6+4nIYi4k31QTSofZycrv/rKZl5SWZeWu9r/R/gjzOz+UoV1bma+thURPw8tU/Bvdh19DVq4z6z\ndZ5C7U2oF10M3Fd1EW2cwnO9HePUBqRPbLRjFUvj9eN8Mr0878JWav1t2yJitq/9ysw8VGFNz3cX\ncHtE3E/txbglM39acU397OPAJyPigfrjq3vxU29m3hMRF0fEQ9ROIt+Wmb36t3Qe0LNX59XtoPZ7\nf5Da39G7MvPZRjs6V4wkFaYXB1wkSctgsEtSYQx2SSqMwS5JhTHYJakwBrskFcZgl6TCGOySVJj/\nBwG0X6gKM3flAAAAAElFTkSuQmCC\n", 1253 | "text/plain": [ 1254 | "" 1255 | ] 1256 | }, 1257 | "metadata": {}, 1258 | "output_type": "display_data" 1259 | } 1260 | ], 1261 | "source": [ 1262 | "sig_df = cooc_df[cooc_df.p_fisher < 0.05]\n", 1263 | "plt.hist(list(numpy.log(sig_df.enrichment)), bins = 50);" 1264 | ] 1265 | } 1266 | ], 1267 | "metadata": { 1268 | "kernelspec": { 1269 | "display_name": "Python 3", 1270 | "language": "python", 1271 | "name": "python3" 1272 | }, 1273 | "language_info": { 1274 | "codemirror_mode": { 1275 | 
"name": "ipython", 1276 | "version": 3 1277 | }, 1278 | "file_extension": ".py", 1279 | "mimetype": "text/x-python", 1280 | "name": "python", 1281 | "nbconvert_exporter": "python", 1282 | "pygments_lexer": "ipython3", 1283 | "version": "3.9.2" 1284 | } 1285 | }, 1286 | "nbformat": 4, 1287 | "nbformat_minor": 4 1288 | } 1289 | -------------------------------------------------------------------------------- /tissues.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Compute anatomy-disease cooccurrence for Hetionet" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": { 14 | "collapsed": true, 15 | "jupyter": { 16 | "outputs_hidden": true 17 | } 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "import io\n", 22 | "import gzip\n", 23 | "\n", 24 | "import pandas\n", 25 | "import requests\n", 26 | "import networkx\n", 27 | "\n", 28 | "import eutility\n", 29 | "import cooccurrence" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "# Tissues" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 2, 42 | "metadata": { 43 | "collapsed": false, 44 | "jupyter": { 45 | "outputs_hidden": false 46 | } 47 | }, 48 | "outputs": [ 49 | { 50 | "data": { 51 | "text/html": [ 52 | "
   uberon_id       uberon_name       mesh_id  mesh_name
0  UBERON:0001716  secondary palate  D010159  Palate
1  UBERON:0001908  optic tract       D014795  Visual Pathways
2  UBERON:0002286  third ventricle   D020542  Third Ventricle
3  UBERON:0002349  myocardium        D009206  Myocardium
4  UBERON:0000978  leg               D035002  Lower Extremity
" 102 | ], 103 | "text/plain": [ 104 | " uberon_id uberon_name mesh_id mesh_name\n", 105 | "0 UBERON:0001716 secondary palate D010159 Palate\n", 106 | "1 UBERON:0001908 optic tract D014795 Visual Pathways\n", 107 | "2 UBERON:0002286 third ventricle D020542 Third Ventricle\n", 108 | "3 UBERON:0002349 myocardium D009206 Myocardium\n", 109 | "4 UBERON:0000978 leg D035002 Lower Extremity" 110 | ] 111 | }, 112 | "execution_count": 2, 113 | "metadata": {}, 114 | "output_type": "execute_result" 115 | } 116 | ], 117 | "source": [ 118 | "# Read MeSH UBERON Anatomical structures\n", 119 | "url = 'https://raw.githubusercontent.com/dhimmel/uberon/86a9b754871e5ce7d91d2ef15bcc8f6a0ef6cda1/data/hetio-slim.tsv'\n", 120 | "uberon_df = pandas.read_table(url)\n", 121 | "uberon_df.head()" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 3, 127 | "metadata": { 128 | "collapsed": false, 129 | "jupyter": { 130 | "outputs_hidden": false 131 | } 132 | }, 133 | "outputs": [ 134 | { 135 | "name": "stdout", 136 | "output_type": "stream", 137 | "text": [ 138 | "9284 articles for Palate\n", 139 | "15786 articles for Visual Pathways\n", 140 | "1359 articles for Third Ventricle\n", 141 | "139687 articles for Myocardium\n", 142 | "8596 articles for Lower Extremity\n", 143 | "39066 articles for Cerebellum\n", 144 | "2471 articles for Arachnoid\n", 145 | "382125 articles for Liver\n", 146 | "3960 articles for Dermis\n", 147 | "3295 articles for Sweat\n", 148 | "16125 articles for Optic Nerve\n", 149 | "13305 articles for Gallbladder\n", 150 | "11676 articles for Parotid Gland\n", 151 | "265 articles for Manubrium\n", 152 | "6438 articles for Vena Cava, Superior\n", 153 | "47991 articles for Arteries\n", 154 | "26790 articles for Arm\n", 155 | "28944 articles for Aorta, Thoracic\n", 156 | "60440 articles for Pancreas\n", 157 | "16194 articles for Mesencephalon\n", 158 | "9469 articles for Common Bile Duct\n", 159 | "4456 articles for Choroid Plexus\n", 160 | "5769 articles 
for Nails\n", 161 | "13016 articles for Joints\n", 162 | "399 articles for Bulbourethral Glands\n", 163 | "158666 articles for Skin\n", 164 | "530 articles for Incus\n", 165 | "14579 articles for Forearm\n", 166 | "8129 articles for Trigeminal Nerve\n", 167 | "1167 articles for Axillary Vein\n", 168 | "3943 articles for Peroneal Nerve\n", 169 | "465 articles for Stapedius\n", 170 | "20418 articles for Vagus Nerve\n", 171 | "24984 articles for Femoral Artery\n", 172 | "7003 articles for Ligaments\n", 173 | "19976 articles for Extremities\n", 174 | "7780 articles for Thumb\n", 175 | "32275 articles for Trachea\n", 176 | "4117 articles for Subclavian Vein\n", 177 | "11826 articles for Iris\n", 178 | "5612 articles for Epiphyses\n", 179 | "5318 articles for Hemolymph\n", 180 | "13982 articles for Wrist\n", 181 | "2056 articles for Sense Organs\n", 182 | "3271 articles for Sweat Glands\n", 183 | "2276 articles for Axillary Artery\n", 184 | "6069 articles for Basilar Artery\n", 185 | "12065 articles for Lymphoid Tissue\n", 186 | "16279 articles for Medulla Oblongata\n", 187 | "7256 articles for Gonads\n", 188 | "15748 articles for Penis\n", 189 | "33282 articles for Heart Atria\n", 190 | "1138 articles for Basilar Membrane\n", 191 | "212 articles for Zona Reticularis\n", 192 | "7331 articles for Femoral Vein\n", 193 | "15273 articles for Tongue\n", 194 | "9959 articles for Tunica Intima\n", 195 | "7380 articles for Seminal Vesicles\n", 196 | "17376 articles for Mouth\n", 197 | "16444 articles for Thoracic Vertebrae\n", 198 | "3420 articles for Semicircular Canals\n", 199 | "754 articles for Ulnar Artery\n", 200 | "1692 articles for Cranial Sutures\n", 201 | "12999 articles for Carotid Artery, Internal\n", 202 | "17868 articles for Cardiovascular System\n", 203 | "8207 articles for Vertebral Artery\n", 204 | "7214 articles for Papillary Muscles\n", 205 | "1566 articles for Palate, Hard\n", 206 | "11061 articles for Ankle Joint\n", 207 | "1838 articles for Ethmoid Bone\n", 
208 | "4409 articles for Locus Coeruleus\n", 209 | "415 articles for Malleus\n", 210 | "138163 articles for Muscles\n", 211 | "24440 articles for Maxilla\n", 212 | "3333 articles for Sebaceous Glands\n", 213 | "5278 articles for Ganglia, Autonomic\n", 214 | "1753 articles for Uvea\n", 215 | "2250 articles for Pia Mater\n", 216 | "313 articles for Oval Window, Ear\n", 217 | "62855 articles for Retina\n", 218 | "2802 articles for Purkinje Fibers\n", 219 | "10795 articles for Ear, External\n", 220 | "7283 articles for Prosencephalon\n", 221 | "15487 articles for Head\n", 222 | "10011 articles for Submandibular Gland\n", 223 | "45 articles for Metencephalon\n", 224 | "11258 articles for Pulmonary Veins\n", 225 | "4959 articles for Neck Muscles\n", 226 | "4934 articles for Decidua\n", 227 | "1974 articles for Loop of Henle\n", 228 | "2487 articles for Zygoma\n", 229 | "2774 articles for Intestinal Secretions\n", 230 | "1151 articles for Eyelashes\n", 231 | "2388 articles for Brachiocephalic Trunk\n", 232 | "12103 articles for Umbilical Veins\n", 233 | "60128 articles for Testis\n", 234 | "22830 articles for Pulmonary Alveoli\n", 235 | "28862 articles for Skull\n", 236 | "368 articles for Geniculate Ganglion\n", 237 | "1560 articles for Cochlear Nucleus\n", 238 | "7456 articles for Ganglia\n", 239 | "53374 articles for Uterus\n", 240 | "5274 articles for Umbilical Arteries\n", 241 | "2980 articles for Ethmoid Sinus\n", 242 | "2110 articles for Stellate Ganglion\n", 243 | "3345 articles for Palate, Soft\n", 244 | "5092 articles for Scapula\n", 245 | "8415 articles for Median Nerve\n", 246 | "816 articles for Maxillary Artery\n", 247 | "50886 articles for Thyroid Gland\n", 248 | "70886 articles for Lymph Nodes\n", 249 | "9377 articles for Elbow Joint\n", 250 | "3943 articles for Nipples\n", 251 | "6072 articles for Atrioventricular Node\n", 252 | "594 articles for Anterior Cerebral Artery\n", 253 | "3299 articles for Skull Base\n", 254 | "38733 articles for Mandible\n", 
255 | "38733 articles for Mandible\n", 256 | "8666 articles for Endocrine Glands\n", 257 | "13807 articles for Immune System\n", 258 | "444 articles for Vestibular Aqueduct\n", 259 | "22218 articles for Cartilage\n", 260 | "70968 articles for Spinal Cord\n", 261 | "2860 articles for Radial Nerve\n", 262 | "7956 articles for Vas Deferens\n", 263 | "4298 articles for Vestibulocochlear Nerve\n", 264 | "1018 articles for Sternoclavicular Joint\n", 265 | "17244 articles for Mammary Glands, Animal\n", 266 | "2818 articles for Exocrine Glands\n", 267 | "3322 articles for Fovea Centralis\n", 268 | "18490 articles for Aorta, Abdominal\n", 269 | "3535 articles for Endocrine System\n", 270 | "4302 articles for Ear Canal\n", 271 | "848 articles for Bile Canaliculi\n", 272 | "430 articles for Cochlear Duct\n", 273 | "49555 articles for Muscle, Smooth\n", 274 | "23210 articles for Endometrium\n", 275 | "882 articles for Acromion\n", 276 | "6664 articles for Lacrimal Apparatus\n", 277 | "39662 articles for Pulmonary Artery\n", 278 | "20823 articles for Foot\n", 279 | "1819 articles for Olfactory Nerve\n", 280 | "1000 articles for Medial Forebrain Bundle\n", 281 | "10831 articles for Parathyroid Glands\n", 282 | "11571 articles for Bile Ducts\n", 283 | "23513 articles for Neck\n", 284 | "72662 articles for Bone and Bones\n", 285 | "2449 articles for Nasal Bone\n", 286 | "6464 articles for Kidney Medulla\n", 287 | "5374 articles for Hepatic Veins\n", 288 | "3782 articles for Masseter Muscle\n", 289 | "121682 articles for Heart\n", 290 | "4958 articles for Endothelium, Corneal\n", 291 | "3739 articles for Tibial Nerve\n", 292 | "7079 articles for Tears\n", 293 | "18399 articles for Gastric Juice\n", 294 | "2475 articles for Musculoskeletal System\n", 295 | "3679 articles for Hematopoietic System\n", 296 | "4587 articles for Elastic Tissue\n", 297 | "1078 articles for Stapes\n", 298 | "856 articles for Zona Glomerulosa\n", 299 | "12726 articles for Pericardium\n", 300 | "7595 
articles for Arterioles\n", 301 | "3811 articles for Myenteric Plexus\n", 302 | "2157 articles for Enteric Nervous System\n", 303 | "905 articles for Celiac Plexus\n", 304 | "21094 articles for Ureter\n", 305 | "4224 articles for Pulmonary Valve\n", 306 | "6662 articles for Pyramidal Tracts\n", 307 | "10362 articles for Kidney Cortex\n", 308 | "15782 articles for Renal Artery\n", 309 | "2332 articles for Petrous Bone\n", 310 | "66514 articles for Epithelium\n", 311 | "20764 articles for Hair\n", 312 | "2413 articles for Ophthalmic Artery\n", 313 | "10484 articles for Fallopian Tubes\n", 314 | "272 articles for Endolymphatic Duct\n", 315 | "4235 articles for Epithelium, Corneal\n", 316 | "10160 articles for Pons\n", 317 | "1674 articles for Bronchial Arteries\n", 318 | "5314 articles for Colostrum\n", 319 | "868 articles for Neuropil\n", 320 | "21982 articles for Digestive System\n", 321 | "11662 articles for Eyelids\n", 322 | "2474 articles for Lumbosacral Plexus\n", 323 | "158 articles for Truncus Arteriosus\n", 324 | "7471 articles for Macula Lutea\n", 325 | "13490 articles for Salivary Glands\n", 326 | "14568 articles for Elbow\n", 327 | "222 articles for Acoustic Maculae\n", 328 | "1188 articles for Sebum\n", 329 | "41482 articles for Pituitary Gland\n", 330 | "2389 articles for Femoral Nerve\n", 331 | "6717 articles for Renal Veins\n", 332 | "6673 articles for Subclavian Artery\n", 333 | "3161 articles for Trigeminal Ganglion\n", 334 | "42968 articles for Embryo, Mammalian\n", 335 | "18090 articles for Endothelium\n", 336 | "18347 articles for Portal Vein\n", 337 | "4601 articles for Clavicle\n", 338 | "50256 articles for Blood\n", 339 | "1219 articles for Zygapophyseal Joint\n", 340 | "1882 articles for Retinal Vein\n", 341 | "20645 articles for Urethra\n", 342 | "11286 articles for Synovial Fluid\n", 343 | "28147 articles for Cervical Vertebrae\n", 344 | "39275 articles for Central Nervous System\n", 345 | "887 articles for Sesamoid Bones\n", 346 | "1766 
articles for Otolithic Membrane\n", 347 | "29346 articles for Prostate\n", 348 | "8731 articles for Brachial Artery\n", 349 | "6440 articles for Facial Muscles\n", 350 | "2580 articles for Cerebellar Nuclei\n", 351 | "486 articles for Ejaculatory Ducts\n", 352 | "1676 articles for Nasolacrimal Duct\n", 353 | "1035 articles for Round Window, Ear\n", 354 | "67285 articles for Heart Ventricles\n", 355 | "12568 articles for Ear, Middle\n", 356 | "20945 articles for Autonomic Nervous System\n", 357 | "1193 articles for Serous Membrane\n", 358 | "3208 articles for Cranial Nerves\n", 359 | "777 articles for Trochlear Nerve\n", 360 | "2901 articles for Occipital Bone\n", 361 | "5188 articles for Carotid Artery, Common\n", 362 | "5890 articles for Paranasal Sinuses\n", 363 | "2866 articles for Oculomotor Nerve\n", 364 | "2574 articles for Hepatic Duct, Common\n", 365 | "11292 articles for Ear, Inner\n", 366 | "2976 articles for Peripheral Nervous System\n", 367 | "28343 articles for Hindlimb\n", 368 | "3319 articles for Sacroiliac Joint\n", 369 | "7290 articles for Perineum\n", 370 | "700 articles for Cerumen\n", 371 | "7040 articles for Tricuspid Valve\n", 372 | "3474 articles for Seminiferous Tubules\n", 373 | "14570 articles for Vena Cava, Inferior\n", 374 | "21373 articles for Tendons\n", 375 | "3202 articles for Optic Chiasm\n", 376 | "8320 articles for Parasympathetic Nervous System\n", 377 | "2226 articles for Follicular Fluid\n", 378 | "4265 articles for Nephrons\n", 379 | "9982 articles for Hip\n", 380 | "8989 articles for Dura Mater\n", 381 | "33485 articles for Saliva\n", 382 | "4545 articles for Hair Follicle\n", 383 | "8669 articles for Kidney Pelvis\n", 384 | "253 articles for Ultimobranchial Body\n", 385 | "24382 articles for Cervix Uteri\n", 386 | "480 articles for Internal Capsule\n", 387 | "3271 articles for Vestibular Nuclei\n", 388 | "29115 articles for Eye\n", 389 | "3742 articles for Celiac Artery\n", 390 | "8402 articles for Pleura\n", 391 | "5731 
articles for Sinoatrial Node\n", 392 | "1892 articles for Chromaffin Cells\n", 393 | "15068 articles for Synovial Membrane\n", 394 | "715 articles for Spinothalamic Tracts\n", 395 | "218 articles for Tensor Tympani\n", 396 | "7331 articles for Reticular Formation\n", 397 | "56200 articles for Ovary\n", 398 | "5004 articles for Cerebellar Cortex\n", 399 | "1606 articles for Trigeminal Nuclei\n", 400 | "2947 articles for Retinal Artery\n", 401 | "10206 articles for Shoulder\n", 402 | "7475 articles for Pancreatic Ducts\n", 403 | "19257 articles for Epidermis\n", 404 | "8584 articles for Maxillary Sinus\n", 405 | "18673 articles for Basement Membrane\n", 406 | "9330 articles for Nasal Cavity\n", 407 | "39454 articles for Cornea\n", 408 | "14737 articles for Respiratory System\n", 409 | "9830 articles for Temporal Bone\n", 410 | "10127 articles for Corpus Luteum\n", 411 | "1571 articles for Superior Cervical Ganglion\n", 412 | "32909 articles for Face\n", 413 | "6536 articles for Cheek\n", 414 | "11521 articles for Hepatic Artery\n", 415 | "35393 articles for Sympathetic Nervous System\n", 416 | "391093 articles for Brain\n", 417 | "16739 articles for Semen\n", 418 | "8922 articles for Sclera\n", 419 | "3358 articles for Frontal Sinus\n", 420 | "5619 articles for Biliary Tract\n", 421 | "5184 articles for Endocardium\n", 422 | "892 articles for Meibomian Glands\n", 423 | "118 articles for Diagonal Band of Broca\n", 424 | "1575 articles for Intercostal Muscles\n", 425 | "3514 articles for Sphenoid Bone\n", 426 | "10040 articles for Genitalia, Female\n", 427 | "1235 articles for Skeleton\n", 428 | "429 articles for Scala Tympani\n", 429 | "29363 articles for Blood Vessels\n", 430 | "9860 articles for Aqueous Humor\n", 431 | "33165 articles for Islets of Langerhans\n", 432 | "161 articles for Para-Aortic Bodies\n", 433 | "9149 articles for Ribs\n", 434 | "21783 articles for Hip Joint\n", 435 | "17297 articles for Sciatic Nerve\n", 436 | "5419 articles for Spinal 
Nerves\n", 437 | "23545 articles for Bile\n", 438 | "18713 articles for Nose\n", 439 | "14785 articles for Orbit\n", 440 | "1947 articles for Glossopharyngeal Nerve\n", 441 | "2651 articles for Turbinates\n", 442 | "1034 articles for Diaphyses\n", 443 | "7338 articles for Jaw\n", 444 | "23702 articles for Spine\n", 445 | "12877 articles for Adrenal Cortex\n", 446 | "7442 articles for Ciliary Body\n", 447 | "40260 articles for Adrenal Glands\n", 448 | "1439 articles for Azygos Vein\n", 449 | "3276 articles for Bundle of His\n", 450 | "8269 articles for Popliteal Artery\n", 451 | "823 articles for Stria Vascularis\n", 452 | "34461 articles for Urine\n", 453 | "4302 articles for Corneal Stroma\n", 454 | "23289 articles for Aortic Valve\n", 455 | "200 articles for Area Postrema\n", 456 | "387 articles for Neurilemma\n", 457 | "1886 articles for Axis\n", 458 | "932 articles for Salivary Ducts\n", 459 | "17828 articles for Diaphragm\n", 460 | "181304 articles for Lung\n", 461 | "2212 articles for Carotid Artery, External\n", 462 | "3792 articles for Oviducts\n", 463 | "9569 articles for Growth Plate\n", 464 | "8384 articles for Facial Nerve\n", 465 | "50094 articles for Knee\n", 466 | "4815 articles for Meninges\n", 467 | "4618 articles for Periosteum\n", 468 | "27105 articles for Veins\n", 469 | "7875 articles for Forelimb\n", 470 | "656 articles for Fourth Ventricle\n", 471 | "2817 articles for Mesenteric Artery, Superior\n", 472 | "1790 articles for Abducens Nerve\n", 473 | "24297 articles for Mitral Valve\n", 474 | "5586 articles for Telencephalon\n", 475 | "839 articles for Accessory Nerve\n", 476 | "1563 articles for Saccule and Utricle\n", 477 | "4045 articles for Vulva\n", 478 | "7473 articles for Brachial Plexus\n", 479 | "1133 articles for Endolymphatic Sac\n", 480 | "1710 articles for Hyoid Bone\n", 481 | "693 articles for Trigeminal Nucleus, Spinal\n", 482 | "234281 articles for Kidney\n", 483 | "14926 articles for Plasma\n", 484 | "20874 articles for 
Peripheral Nerves\n", 485 | "19503 articles for Nervous System\n", 486 | "2910 articles for Sphenoid Sinus\n", 487 | "398 articles for Mesenteric Artery, Inferior\n", 488 | "1316 articles for Sublingual Gland\n", 489 | "8982 articles for Ear\n", 490 | "5827 articles for Adipose Tissue, Brown\n", 491 | "6326 articles for Ulnar Nerve\n", 492 | "28976 articles for Brain Stem\n", 493 | "7950 articles for Pupil\n", 494 | "25665 articles for Fingers\n", 495 | "7456 articles for Heart Septum\n", 496 | "2905 articles for Hypoglossal Nerve\n", 497 | "11207 articles for Pineal Gland\n", 498 | "7808 articles for Optic Disk\n", 499 | "4657 articles for Cochlear Nerve\n", 500 | "7849 articles for Sternum\n", 501 | "3487 articles for Splenic Artery\n", 502 | "2059 articles for Juxtaglomerular Apparatus\n", 503 | "7755 articles for Scrotum\n", 504 | "14802 articles for Shoulder Joint\n", 505 | "1622 articles for Joint Capsule\n", 506 | "37166 articles for Bronchi\n", 507 | "34463 articles for Hand\n", 508 | "29131 articles for Vagina\n", 509 | "39872 articles for Knee Joint\n", 510 | "13964 articles for Larynx\n", 511 | "529 articles for Posterior Cerebral Artery\n", 512 | "3832 articles for Kidney Tubules, Collecting\n", 513 | "4097 articles for Tunica Media\n", 514 | "415 articles for Zona Fasciculata\n", 515 | "9454 articles for Choroid\n", 516 | "6330 articles for Myometrium\n", 517 | "1541 articles for Uvula\n", 518 | "1377 articles for Clitoris\n", 519 | "2054 articles for Temporal Muscle\n", 520 | "4811 articles for Heart Valves\n", 521 | "21040 articles for Kidney Tubules\n", 522 | "4609 articles for Radial Artery\n", 523 | "3766 articles for Middle Cerebral Artery\n", 524 | "8773 articles for Lip\n", 525 | "58594 articles for Bone Marrow\n", 526 | "63407 articles for Adipose Tissue\n", 527 | "9228 articles for Abdominal Muscles\n", 528 | "1739 articles for Rectus Abdominis\n", 529 | "40413 articles for Lumbar Vertebrae\n", 530 | "4744 articles for Glottis\n", 531 | "2472 
articles for Rhombencephalon\n", 532 | "15689 articles for Ovarian Follicle\n", 533 | "72824 articles for Feces\n", 534 | "6049 articles for Phrenic Nerve\n", 535 | "25711 articles for Cochlea\n", 536 | "19935 articles for Connective Tissue\n", 537 | "1666 articles for Popliteal Vein\n", 538 | "1473 articles for Lateral Ventricles\n", 539 | "6089 articles for Urinary Tract\n" 540 | ] 541 | } 542 | ], 543 | "source": [ 544 | "rows_out = list()\n", 545 | "\n", 546 | "for i, row in uberon_df.iterrows():\n", 547 | " term_query = '{tissue}[MeSH Terms:noexp]'.format(tissue = row.mesh_name.lower())\n", 548 | " payload = {'db': 'pubmed', 'term': term_query}\n", 549 | " pmids = eutility.esearch_query(payload, retmax = 5000, sleep=2)\n", 550 | " row['term_query'] = term_query\n", 551 | " row['n_articles'] = len(pmids)\n", 552 | " row['pubmed_ids'] = '|'.join(pmids)\n", 553 | " rows_out.append(row)\n", 554 | " print('{} articles for {}'.format(len(pmids), row.mesh_name))\n", 555 | "\n", 556 | "uberon_pmids_df = pandas.DataFrame(rows_out)" 557 | ] 558 | }, 559 | { 560 | "cell_type": "code", 561 | "execution_count": 4, 562 | "metadata": { 563 | "collapsed": false, 564 | "jupyter": { 565 | "outputs_hidden": false 566 | } 567 | }, 568 | "outputs": [ 569 | { 570 | "data": { 571 | "text/html": [ 572 | "
<div>\n", 573 | "<table>\n", 574 | "  <thead>\n", 575 | "    <tr>\n",
576 | "      <th></th>\n", 577 | "      <th>uberon_id</th>\n", 578 | "      <th>uberon_name</th>\n", 579 | "      <th>mesh_id</th>\n", 580 | "      <th>mesh_name</th>\n", 581 | "      <th>term_query</th>\n", 582 | "      <th>n_articles</th>\n", 583 | "      <th>pubmed_ids</th>\n", 584 | "    </tr>\n", 585 | "  </thead>\n", 586 | "  <tbody>\n",
587 | "    <tr>\n", 588 | "      <th>0</th>\n", 589 | "      <td>UBERON:0001716</td>\n", 590 | "      <td>secondary palate</td>\n", 591 | "      <td>D010159</td>\n", 592 | "      <td>Palate</td>\n", 593 | "      <td>palate[MeSH Terms:noexp]</td>\n", 594 | "      <td>9284</td>\n", 595 | "      <td>26023113|25975064|25895319|25872295|25869559|2...</td>\n", 596 | "    </tr>\n",
597 | "    <tr>\n", 598 | "      <th>1</th>\n", 599 | "      <td>UBERON:0001908</td>\n", 600 | "      <td>optic tract</td>\n", 601 | "      <td>D014795</td>\n", 602 | "      <td>Visual Pathways</td>\n", 603 | "      <td>visual pathways[MeSH Terms:noexp]</td>\n", 604 | "      <td>15786</td>\n", 605 | "      <td>26113723|26089513|26080589|26080584|25972183|2...</td>\n", 606 | "    </tr>\n",
607 | "    <tr>\n", 608 | "      <th>2</th>\n", 609 | "      <td>UBERON:0002286</td>\n", 610 | "      <td>third ventricle</td>\n", 611 | "      <td>D020542</td>\n", 612 | "      <td>Third Ventricle</td>\n", 613 | "      <td>third ventricle[MeSH Terms:noexp]</td>\n", 614 | "      <td>1359</td>\n", 615 | "      <td>26120619|26023696|25723723|25723303|25723298|2...</td>\n", 616 | "    </tr>\n",
617 | "    <tr>\n", 618 | "      <th>3</th>\n", 619 | "      <td>UBERON:0002349</td>\n", 620 | "      <td>myocardium</td>\n", 621 | "      <td>D009206</td>\n", 622 | "      <td>Myocardium</td>\n", 623 | "      <td>myocardium[MeSH Terms:noexp]</td>\n", 624 | "      <td>139687</td>\n", 625 | "      <td>26072537|26062198|26040042|26040041|26039915|2...</td>\n", 626 | "    </tr>\n",
627 | "    <tr>\n", 628 | "      <th>4</th>\n", 629 | "      <td>UBERON:0000978</td>\n", 630 | "      <td>leg</td>\n", 631 | "      <td>D035002</td>\n", 632 | "      <td>Lower Extremity</td>\n", 633 | "      <td>lower extremity[MeSH Terms:noexp]</td>\n", 634 | "      <td>8596</td>\n", 635 | "      <td>26118216|26072540|26062181|26047150|26047149|2...</td>\n", 636 | "    </tr>\n",
637 | "  </tbody>\n", 638 | "</table>\n", 639 | "</div>
" 640 | ], 641 | "text/plain": [ 642 | " uberon_id uberon_name mesh_id mesh_name \\\n", 643 | "0 UBERON:0001716 secondary palate D010159 Palate \n", 644 | "1 UBERON:0001908 optic tract D014795 Visual Pathways \n", 645 | "2 UBERON:0002286 third ventricle D020542 Third Ventricle \n", 646 | "3 UBERON:0002349 myocardium D009206 Myocardium \n", 647 | "4 UBERON:0000978 leg D035002 Lower Extremity \n", 648 | "\n", 649 | " term_query n_articles \\\n", 650 | "0 palate[MeSH Terms:noexp] 9284 \n", 651 | "1 visual pathways[MeSH Terms:noexp] 15786 \n", 652 | "2 third ventricle[MeSH Terms:noexp] 1359 \n", 653 | "3 myocardium[MeSH Terms:noexp] 139687 \n", 654 | "4 lower extremity[MeSH Terms:noexp] 8596 \n", 655 | "\n", 656 | " pubmed_ids \n", 657 | "0 26023113|25975064|25895319|25872295|25869559|2... \n", 658 | "1 26113723|26089513|26080589|26080584|25972183|2... \n", 659 | "2 26120619|26023696|25723723|25723303|25723298|2... \n", 660 | "3 26072537|26062198|26040042|26040041|26039915|2... \n", 661 | "4 26118216|26072540|26062181|26047150|26047149|2... 
" 662 | ] 663 | }, 664 | "execution_count": 4, 665 | "metadata": {}, 666 | "output_type": "execute_result" 667 | } 668 | ], 669 | "source": [ 670 | "with gzip.open('data/uberon-pmids.tsv.gz', 'w') as write_file:\n", 671 | " write_file = io.TextIOWrapper(write_file)\n", 672 | " uberon_pmids_df.to_csv(write_file, sep='\\t', index=False)\n", 673 | "\n", 674 | "uberon_pmids_df.head()" 675 | ] 676 | }, 677 | { 678 | "cell_type": "markdown", 679 | "metadata": {}, 680 | "source": [ 681 | "# Tissue-Disease Cooccurrence" 682 | ] 683 | }, 684 | { 685 | "cell_type": "code", 686 | "execution_count": 9, 687 | "metadata": { 688 | "collapsed": false, 689 | "jupyter": { 690 | "outputs_hidden": false 691 | } 692 | }, 693 | "outputs": [], 694 | "source": [ 695 | "uberon_df, uberon_to_pmids = cooccurrence.read_pmids_tsv('data/uberon-pmids.tsv.gz', key='uberon_id')\n", 696 | "disease_df, disease_to_pmids = cooccurrence.read_pmids_tsv('data/disease-pmids.tsv.gz', key='doid_code')" 697 | ] 698 | }, 699 | { 700 | "cell_type": "code", 701 | "execution_count": 10, 702 | "metadata": { 703 | "collapsed": false, 704 | "jupyter": { 705 | "outputs_hidden": false 706 | } 707 | }, 708 | "outputs": [ 709 | { 710 | "name": "stdout", 711 | "output_type": "stream", 712 | "text": [ 713 | "Total articles containing a doid_code: 3686312\n", 714 | "Total articles containing a uberon_id: 4697277\n", 715 | "Total articles containing both a doid_code and uberon_id: 696252\n", 716 | "\n", 717 | "After removing terms without any cooccurences:\n", 718 | "+ 133 doid_codes remain\n", 719 | "+ 401 uberon_ids remain\n", 720 | "\n", 721 | "Cooccurrence scores calculated for 53333 doid_code -- uberon_id pairs\n" 722 | ] 723 | } 724 | ], 725 | "source": [ 726 | "cooc_df = cooccurrence.score_pmid_cooccurrence(disease_to_pmids, uberon_to_pmids, 'doid_code', 'uberon_id')" 727 | ] 728 | }, 729 | { 730 | "cell_type": "code", 731 | "execution_count": 11, 732 | "metadata": { 733 | "collapsed": false, 734 | "jupyter": { 735 
| "outputs_hidden": false 736 | } 737 | }, 738 | "outputs": [ 739 | { 740 | "data": { 741 | "text/html": [ 742 | "
<div>\n", 743 | "<table>\n", 744 | "  <thead>\n", 745 | "    <tr>\n",
746 | "      <th></th>\n", 747 | "      <th>doid_code</th>\n", 748 | "      <th>doid_name</th>\n", 749 | "      <th>uberon_id</th>\n", 750 | "      <th>uberon_name</th>\n", 751 | "      <th>cooccurrence</th>\n", 752 | "      <th>expected</th>\n", 753 | "      <th>enrichment</th>\n", 754 | "      <th>odds_ratio</th>\n", 755 | "      <th>p_fisher</th>\n", 756 | "    </tr>\n", 757 | "  </thead>\n", 758 | "  <tbody>\n",
759 | "    <tr>\n", 760 | "      <th>28748</th>\n", 761 | "      <td>DOID:10652</td>\n", 762 | "      <td>Alzheimer's disease</td>\n", 763 | "      <td>UBERON:0000955</td>\n", 764 | "      <td>brain</td>\n", 765 | "      <td>11209</td>\n", 766 | "      <td>1182.634069</td>\n", 767 | "      <td>9.477995</td>\n", 768 | "      <td>74.210761</td>\n", 769 | "      <td>0.000000e+00</td>\n", 770 | "    </tr>\n",
771 | "    <tr>\n", 772 | "      <th>28553</th>\n", 773 | "      <td>DOID:10652</td>\n", 774 | "      <td>Alzheimer's disease</td>\n", 775 | "      <td>UBERON:0001890</td>\n", 776 | "      <td>forebrain</td>\n", 777 | "      <td>114</td>\n", 778 | "      <td>7.326350</td>\n", 779 | "      <td>15.560272</td>\n", 780 | "      <td>21.733764</td>\n", 781 | "      <td>5.971023e-99</td>\n", 782 | "    </tr>\n",
783 | "    <tr>\n", 784 | "      <th>28476</th>\n", 785 | "      <td>DOID:10652</td>\n", 786 | "      <td>Alzheimer's disease</td>\n", 787 | "      <td>UBERON:0002037</td>\n", 788 | "      <td>cerebellum</td>\n", 789 | "      <td>303</td>\n", 790 | "      <td>86.548368</td>\n", 791 | "      <td>3.500933</td>\n", 792 | "      <td>3.740149</td>\n", 793 | "      <td>3.504584e-76</td>\n", 794 | "    </tr>\n",
795 | "    <tr>\n", 796 | "      <th>28541</th>\n", 797 | "      <td>DOID:10652</td>\n", 798 | "      <td>Alzheimer's disease</td>\n", 799 | "      <td>UBERON:0002148</td>\n", 800 | "      <td>locus ceruleus</td>\n", 801 | "      <td>97</td>\n", 802 | "      <td>8.450598</td>\n", 803 | "      <td>11.478477</td>\n", 804 | "      <td>14.449700</td>\n", 805 | "      <td>1.183699e-70</td>\n", 806 | "    </tr>\n",
807 | "    <tr>\n", 808 | "      <th>28708</th>\n", 809 | "      <td>DOID:10652</td>\n", 810 | "      <td>Alzheimer's disease</td>\n", 811 | "      <td>UBERON:0000011</td>\n", 812 | "      <td>parasympathetic nervous system</td>\n", 813 | "      <td>103</td>\n", 814 | "      <td>14.727650</td>\n", 815 | "      <td>6.993648</td>\n", 816 | "      <td>7.952412</td>\n", 817 | "      <td>3.985211e-53</td>\n", 818 | "    </tr>\n",
819 | "  </tbody>\n", 820 | "</table>\n", 821 | "</div>
" 822 | ], 823 | "text/plain": [ 824 | " doid_code doid_name uberon_id \\\n", 825 | "28748 DOID:10652 Alzheimer's disease UBERON:0000955 \n", 826 | "28553 DOID:10652 Alzheimer's disease UBERON:0001890 \n", 827 | "28476 DOID:10652 Alzheimer's disease UBERON:0002037 \n", 828 | "28541 DOID:10652 Alzheimer's disease UBERON:0002148 \n", 829 | "28708 DOID:10652 Alzheimer's disease UBERON:0000011 \n", 830 | "\n", 831 | " uberon_name cooccurrence expected enrichment \\\n", 832 | "28748 brain 11209 1182.634069 9.477995 \n", 833 | "28553 forebrain 114 7.326350 15.560272 \n", 834 | "28476 cerebellum 303 86.548368 3.500933 \n", 835 | "28541 locus ceruleus 97 8.450598 11.478477 \n", 836 | "28708 parasympathetic nervous system 103 14.727650 6.993648 \n", 837 | "\n", 838 | " odds_ratio p_fisher \n", 839 | "28748 74.210761 0.000000e+00 \n", 840 | "28553 21.733764 5.971023e-99 \n", 841 | "28476 3.740149 3.504584e-76 \n", 842 | "28541 14.449700 1.183699e-70 \n", 843 | "28708 7.952412 3.985211e-53 " 844 | ] 845 | }, 846 | "execution_count": 11, 847 | "metadata": {}, 848 | "output_type": "execute_result" 849 | } 850 | ], 851 | "source": [ 852 | "cooc_df = uberon_df[['uberon_id', 'uberon_name']].drop_duplicates().merge(cooc_df)\n", 853 | "cooc_df = disease_df[['doid_code', 'doid_name']].drop_duplicates().merge(cooc_df)\n", 854 | "cooc_df = cooc_df.sort_values(by=['doid_name', 'p_fisher'])\n", 855 | "cooc_df.head()" 856 | ] 857 | }, 858 | { 859 | "cell_type": "code", 860 | "execution_count": 12, 861 | "metadata": { 862 | "collapsed": true, 863 | "jupyter": { 864 | "outputs_hidden": true 865 | } 866 | }, 867 | "outputs": [], 868 | "source": [ 869 | "cooc_df.to_csv('data/disease-uberon-cooccurrence.tsv', index=False, sep='\\t')" 870 | ] 871 | } 872 | ], 873 | "metadata": { 874 | "kernelspec": { 875 | "display_name": "Python 3", 876 | "language": "python", 877 | "name": "python3" 878 | }, 879 | "language_info": { 880 | "codemirror_mode": { 881 | "name": "ipython", 882 | "version": 3 883 | 
}, 884 | "file_extension": ".py", 885 | "mimetype": "text/x-python", 886 | "name": "python", 887 | "nbconvert_exporter": "python", 888 | "pygments_lexer": "ipython3", 889 | "version": "3.9.2" 890 | } 891 | }, 892 | "nbformat": 4, 893 | "nbformat_minor": 4 894 | } 895 | --------------------------------------------------------------------------------