├── .gitignore ├── README.md ├── SPARQL_pandas.ipynb └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | venv/ 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Tutorial: accessing SPARQL endpoints in Python with Pandas 2 | 3 | ## Installation 4 | 5 | 1. Clone the repository. 6 | 2. Set up a virtual environment: `python3 -m venv venv` 7 | 3. Load the virtual environment: `source venv/bin/activate` 8 | 4. Install necessary dependencies: `pip install -r requirements.txt` 9 | 5. Start a Jupyter notebook server: `jupyter notebook` 10 | 6. Run the notebook. 11 | -------------------------------------------------------------------------------- /SPARQL_pandas.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Querying a SPARQL endpoint (knowledge graph) in Python\n", 8 | "The code below contains everything needed to execute a SPARQL query, manipulate the data it returns, and store the data in a file suitable for rendering with Cytoscape. Your goal is to get this to work locally and attempt to extend it by, for example, altering the query, sorting the results differently, or working with the output in Cytoscape to build an informative visualization. \n", 9 | "\n", 10 | "## Assignment\n", 11 | "Hand in your version of this notebook, a short report describing what you attempted to do with it, and a visualization of your results (e.g. as an image exported from Cytoscape). \n", 12 | "\n", 13 | "## Necessary imports\n", 14 | "\n", 15 | "Note that you will need to install (e.g. 
pip install) pandas (which should have come with Anaconda) and SPARQLWrapper on your machine before this will work." 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 1, 21 | "metadata": { 22 | "collapsed": true 23 | }, 24 | "outputs": [], 25 | "source": [ 26 | "import pandas as pd\n", 27 | "\n", 28 | "from pandas import json_normalize  # top-level since pandas 1.0; pandas.io.json.json_normalize is deprecated\n", 29 | "from SPARQLWrapper import SPARQLWrapper, JSON" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "## Create query function\n", 37 | "\n", 38 | "The function below takes a SPARQL query string, queries a SPARQL service, and returns the result as a pandas DataFrame (a table)." 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 2, 44 | "metadata": { 45 | "collapsed": true 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "def query_wikidata(sparql_query, sparql_service_url):\n", 50 | "    \"\"\"\n", 51 | "    Query the endpoint with the given query string and return the results as a pandas DataFrame.\n", 52 | "    \"\"\"\n", 53 | "    # create the connection to the endpoint\n", 54 | "    # Wikidata now enforces a strict User-Agent policy, so we need to specify the agent\n", 55 | "    # See here https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2019/07#problems_with_query_API\n", 56 | "    # https://meta.wikimedia.org/wiki/User-Agent_policy\n", 57 | "    sparql = SPARQLWrapper(sparql_service_url, agent=\"Sparql Wrapper on Jupyter example\")\n", 58 | "\n", 59 | "    sparql.setQuery(sparql_query)\n", 60 | "    sparql.setReturnFormat(JSON)\n", 61 | "\n", 62 | "    # ask for the result\n", 63 | "    result = sparql.query().convert()\n", 64 | "    return json_normalize(result[\"results\"][\"bindings\"])" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "## Run our SPARQL query\n", 72 | "The example here is the drug -> interacts with protein -> encoded by -> gene -> GWAS -> disease query pattern. 
\n", 73 | "Test it here http://tinyurl.com/hwm9388 and explore the results to see how you might adapt it. Other example queries for the Wikidata service: http://tinyurl.com/j222k6g , http://tinyurl.com/gpfr9kj " 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 3, 79 | "metadata": { 80 | "collapsed": false 81 | }, 82 | "outputs": [], 83 | "source": [ 84 | "sparql_query = \"\"\"SELECT ?drug ?drugLabel ?gene ?geneLabel ?entrez_id ?disease ?diseaseLabel WHERE {\n", 85 | " ?drug wdt:P129 ?gene_product . # drug interacts with a gene_product \n", 86 | " ?gene_product wdt:P702 ?gene . # gene_product is encoded by a gene\n", 87 | " ?gene wdt:P2293 ?disease . # gene is genetically associated with a disease \n", 88 | " ?gene wdt:P351 ?entrez_id . # get the Entrez Gene ID for the gene \n", 89 | " # add labels\n", 90 | " SERVICE wikibase:label {\n", 91 | " bd:serviceParam wikibase:language \"en\" .\n", 92 | " }\n", 93 | " }\n", 94 | " limit 1000\n", 95 | " \"\"\"\n", 96 | "# To query another endpoint, change the URL for the service and the query\n", 97 | "sparql_service_url = \"https://query.wikidata.org/sparql\"\n", 98 | "result_table = query_wikidata(sparql_query, sparql_service_url)" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "## From here on we are using the Python Data Analysis Library \"Pandas\"\n", 106 | "Look for an introduction like this http://synesthesiam.com/posts/an-introduction-to-pandas.html if you get stuck." 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "Now look at the results of our SPARQL query. \"shape\" shows the dimensions (n rows by n cols). 
head() shows the column headers (which will be important later) and a sample of the first few rows of data" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 4, 119 | "metadata": { 120 | "collapsed": false 121 | }, 122 | "outputs": [ 123 | { 124 | "data": { 125 | "text/plain": [ 126 | "(595, 17)" 127 | ] 128 | }, 129 | "execution_count": 4, 130 | "metadata": {}, 131 | "output_type": "execute_result" 132 | } 133 | ], 134 | "source": [ 135 | "result_table.shape" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 5, 141 | "metadata": { 142 | "collapsed": false, 143 | "scrolled": false 144 | }, 145 | "outputs": [ 146 | { 147 | "data": { 148 | "text/html": [ 149 | "
" 277 | ], 278 | "text/plain": [ 279 | " disease.type disease.value diseaseLabel.type \\\n", 280 | "0 uri http://www.wikidata.org/entity/Q12174 literal \n", 281 | "1 uri http://www.wikidata.org/entity/Q12174 literal \n", 282 | "2 uri http://www.wikidata.org/entity/Q12174 literal \n", 283 | "3 uri http://www.wikidata.org/entity/Q12174 literal \n", 284 | "4 uri http://www.wikidata.org/entity/Q12174 literal \n", 285 | "\n", 286 | " diseaseLabel.value diseaseLabel.xml:lang drug.type \\\n", 287 | "0 obesity en uri \n", 288 | "1 obesity en uri \n", 289 | "2 obesity en uri \n", 290 | "3 obesity en uri \n", 291 | "4 obesity en uri \n", 292 | "\n", 293 | " drug.value drugLabel.type drugLabel.value \\\n", 294 | "0 http://www.wikidata.org/entity/Q60235 literal Caffeine \n", 295 | "1 http://www.wikidata.org/entity/Q190012 literal Adenosine \n", 296 | "2 http://www.wikidata.org/entity/Q407308 literal Theophylline \n", 297 | "3 http://www.wikidata.org/entity/Q729213 literal Nicardipine \n", 298 | "4 http://www.wikidata.org/entity/Q905783 literal Istradefylline \n", 299 | "\n", 300 | " drugLabel.xml:lang entrez_id.type entrez_id.value gene.type \\\n", 301 | "0 en literal 140 uri \n", 302 | "1 en literal 140 uri \n", 303 | "2 en literal 140 uri \n", 304 | "3 en literal 140 uri \n", 305 | "4 en literal 140 uri \n", 306 | "\n", 307 | " gene.value geneLabel.type \\\n", 308 | "0 http://www.wikidata.org/entity/Q4682275 literal \n", 309 | "1 http://www.wikidata.org/entity/Q4682275 literal \n", 310 | "2 http://www.wikidata.org/entity/Q4682275 literal \n", 311 | "3 http://www.wikidata.org/entity/Q4682275 literal \n", 312 | "4 http://www.wikidata.org/entity/Q4682275 literal \n", 313 | "\n", 314 | " geneLabel.value geneLabel.xml:lang \n", 315 | "0 adenosine A3 receptor en \n", 316 | "1 adenosine A3 receptor en \n", 317 | "2 adenosine A3 receptor en \n", 318 | "3 adenosine A3 receptor en \n", 319 | "4 adenosine A3 receptor en " 320 | ] 321 | }, 322 | "execution_count": 5, 323 | "metadata": 
{}, 324 | "output_type": "execute_result" 325 | } 326 | ], 327 | "source": [ 328 | "result_table.head()" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "Compare the column names to the first line of the SPARQL query:\n", 336 | "\n", 337 | "SELECT ?drug ?drugLabel ?gene ?geneLabel ?entrez_id ?disease ?diseaseLabel\n", 338 | "\n", 339 | "Each of the SELECTed items (e.g. ?drug) results in 2 columns: one for its data type (either uri or literal) and one for its value (the thing you are probably looking for). String literal values for labels have an additional column indicating which language they are in. \n", 340 | "\n", 341 | "For the purposes of this exercise, we can simplify this to focus only on the string names of the drug, disease and gene in each row in the results. As follows." 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": {}, 347 | "source": [ 348 | "## Extract only the columns we care about" 349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": 6, 354 | "metadata": { 355 | "collapsed": true 356 | }, 357 | "outputs": [], 358 | "source": [ 359 | "simple_table = result_table[[\"drugLabel.value\", \"diseaseLabel.value\", \"geneLabel.value\"]]" 360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": 7, 365 | "metadata": { 366 | "collapsed": false 367 | }, 368 | "outputs": [ 369 | { 370 | "data": { 371 | "text/html": [ 372 | "
" 416 | ], 417 | "text/plain": [ 418 | " drugLabel.value diseaseLabel.value geneLabel.value\n", 419 | "0 Caffeine obesity adenosine A3 receptor\n", 420 | "1 Adenosine obesity adenosine A3 receptor\n", 421 | "2 Theophylline obesity adenosine A3 receptor\n", 422 | "3 Nicardipine obesity adenosine A3 receptor\n", 423 | "4 Istradefylline obesity adenosine A3 receptor" 424 | ] 425 | }, 426 | "execution_count": 7, 427 | "metadata": {}, 428 | "output_type": "execute_result" 429 | } 430 | ], 431 | "source": [ 432 | "simple_table.head()" 433 | ] 434 | }, 435 | { 436 | "cell_type": "markdown", 437 | "metadata": {}, 438 | "source": [ 439 | "### Rename the columns of our simple table\n", 440 | "\n", 441 | "Let's delete the \"Label.value\" portion from the column names." 442 | ] 443 | }, 444 | { 445 | "cell_type": "code", 446 | "execution_count": 8, 447 | "metadata": { 448 | "collapsed": false, 449 | "scrolled": true 450 | }, 451 | "outputs": [], 452 | "source": [ 453 | "simple_table = simple_table.rename(columns = lambda col: col.replace(\"Label.value\", \"\"))" 454 | ] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": 9, 459 | "metadata": { 460 | "collapsed": false 461 | }, 462 | "outputs": [ 463 | { 464 | "data": { 465 | "text/html": [ 466 | "
" 510 | ], 511 | "text/plain": [ 512 | " drug disease gene\n", 513 | "0 Caffeine obesity adenosine A3 receptor\n", 514 | "1 Adenosine obesity adenosine A3 receptor\n", 515 | "2 Theophylline obesity adenosine A3 receptor\n", 516 | "3 Nicardipine obesity adenosine A3 receptor\n", 517 | "4 Istradefylline obesity adenosine A3 receptor" 518 | ] 519 | }, 520 | "execution_count": 9, 521 | "metadata": {}, 522 | "output_type": "execute_result" 523 | } 524 | ], 525 | "source": [ 526 | "simple_table.head()" 527 | ] 528 | }, 529 | { 530 | "cell_type": "markdown", 531 | "metadata": {}, 532 | "source": [ 533 | "We have just grabbed the three columns we care about and renamed them." 534 | ] 535 | }, 536 | { 537 | "cell_type": "markdown", 538 | "metadata": {}, 539 | "source": [ 540 | "## Count the number of genes per (drug, disease) pair\n", 541 | "\n", 542 | "How many genes link each unique (drug, disease) pair? After counting, order the (drug, disease) pairs in descending order of number of linking genes. This is one simple method for finding stronger connections between A and C concepts - based on the number of shared B concepts. 
" 543 | ] 544 | }, 545 | { 546 | "cell_type": "markdown", 547 | "metadata": {}, 548 | "source": [ 549 | "### Make unique (drug, disease) groups and count the number of genes for this group" 550 | ] 551 | }, 552 | { 553 | "cell_type": "code", 554 | "execution_count": 10, 555 | "metadata": { 556 | "collapsed": true 557 | }, 558 | "outputs": [], 559 | "source": [ 560 | "counts = simple_table.groupby([\"drug\", \"disease\"]).size()" 561 | ] 562 | }, 563 | { 564 | "cell_type": "code", 565 | "execution_count": 11, 566 | "metadata": { 567 | "collapsed": false 568 | }, 569 | "outputs": [ 570 | { 571 | "data": { 572 | "text/plain": [ 573 | "drug disease \n", 574 | "(2S,3S)-2-amino-3-phenylmethoxybutanedioic acid essential tremor 1\n", 575 | " schizophrenia 1\n", 576 | "1,9-Pyrazoloanthrone peripheral artery disease 1\n", 577 | "2-Aminoethoxydiphenyl borate obesity 1\n", 578 | "2-methyl-6-(phenylethynyl)pyridine Rheumatoid Arthritis 1\n", 579 | "dtype: int64" 580 | ] 581 | }, 582 | "execution_count": 11, 583 | "metadata": {}, 584 | "output_type": "execute_result" 585 | } 586 | ], 587 | "source": [ 588 | "counts.head()" 589 | ] 590 | }, 591 | { 592 | "cell_type": "markdown", 593 | "metadata": {}, 594 | "source": [ 595 | "### Turn result into a Pandas data frame (fancy smart table)" 596 | ] 597 | }, 598 | { 599 | "cell_type": "code", 600 | "execution_count": 12, 601 | "metadata": { 602 | "collapsed": true 603 | }, 604 | "outputs": [], 605 | "source": [ 606 | "counts = counts.to_frame(\"gene_count\")" 607 | ] 608 | }, 609 | { 610 | "cell_type": "code", 611 | "execution_count": 13, 612 | "metadata": { 613 | "collapsed": false 614 | }, 615 | "outputs": [ 616 | { 617 | "data": { 618 | "text/html": [ 619 | "
" 661 | ], 662 | "text/plain": [ 663 | " gene_count\n", 664 | "drug disease \n", 665 | "(2S,3S)-2-amino-3-phenylmethoxybutanedioic acid essential tremor 1\n", 666 | " schizophrenia 1\n", 667 | "1,9-Pyrazoloanthrone peripheral artery disease 1\n", 668 | "2-Aminoethoxydiphenyl borate obesity 1\n", 669 | "2-methyl-6-(phenylethynyl)pyridine Rheumatoid Arthritis 1" 670 | ] 671 | }, 672 | "execution_count": 13, 673 | "metadata": {}, 674 | "output_type": "execute_result" 675 | } 676 | ], 677 | "source": [ 678 | "counts.head()" 679 | ] 680 | }, 681 | { 682 | "cell_type": "markdown", 683 | "metadata": {}, 684 | "source": [ 685 | "### Put the drug and disease labels back into columns" 686 | ] 687 | }, 688 | { 689 | "cell_type": "code", 690 | "execution_count": 14, 691 | "metadata": { 692 | "collapsed": true 693 | }, 694 | "outputs": [], 695 | "source": [ 696 | "counts = counts.reset_index()" 697 | ] 698 | }, 699 | { 700 | "cell_type": "code", 701 | "execution_count": 15, 702 | "metadata": { 703 | "collapsed": false 704 | }, 705 | "outputs": [ 706 | { 707 | "data": { 708 | "text/html": [ 709 | "
" 753 | ], 754 | "text/plain": [ 755 | " drug disease \\\n", 756 | "0 (2S,3S)-2-amino-3-phenylmethoxybutanedioic acid essential tremor \n", 757 | "1 (2S,3S)-2-amino-3-phenylmethoxybutanedioic acid schizophrenia \n", 758 | "2 1,9-Pyrazoloanthrone peripheral artery disease \n", 759 | "3 2-Aminoethoxydiphenyl borate obesity \n", 760 | "4 2-methyl-6-(phenylethynyl)pyridine Rheumatoid Arthritis \n", 761 | "\n", 762 | " gene_count \n", 763 | "0 1 \n", 764 | "1 1 \n", 765 | "2 1 \n", 766 | "3 1 \n", 767 | "4 1 " 768 | ] 769 | }, 770 | "execution_count": 15, 771 | "metadata": {}, 772 | "output_type": "execute_result" 773 | } 774 | ], 775 | "source": [ 776 | "counts.head()" 777 | ] 778 | }, 779 | { 780 | "cell_type": "markdown", 781 | "metadata": {}, 782 | "source": [ 783 | "### Sort the table in descending order of genes" 784 | ] 785 | }, 786 | { 787 | "cell_type": "code", 788 | "execution_count": 16, 789 | "metadata": { 790 | "collapsed": true 791 | }, 792 | "outputs": [], 793 | "source": [ 794 | "counts = counts.sort_values(\"gene_count\", ascending = False)" 795 | ] 796 | }, 797 | { 798 | "cell_type": "code", 799 | "execution_count": 17, 800 | "metadata": { 801 | "collapsed": false 802 | }, 803 | "outputs": [ 804 | { 805 | "data": { 806 | "text/html": [ 807 | "
" 851 | ], 852 | "text/plain": [ 853 | " drug disease gene_count\n", 854 | "127 Caffeine obesity 3\n", 855 | "261 Givinostat obesity 2\n", 856 | "529 Trichostatin A obesity 2\n", 857 | "319 Linoleic acid diabetes mellitus type 2 2\n", 858 | "393 Panobinostat obesity 2" 859 | ] 860 | }, 861 | "execution_count": 17, 862 | "metadata": {}, 863 | "output_type": "execute_result" 864 | } 865 | ], 866 | "source": [ 867 | "counts.head()" 868 | ] 869 | }, 870 | { 871 | "cell_type": "markdown", 872 | "metadata": {}, 873 | "source": [ 874 | "### Number the results from 0 onwards" 875 | ] 876 | }, 877 | { 878 | "cell_type": "code", 879 | "execution_count": 18, 880 | "metadata": { 881 | "collapsed": true 882 | }, 883 | "outputs": [], 884 | "source": [ 885 | "counts = counts.reset_index(drop = True)" 886 | ] 887 | }, 888 | { 889 | "cell_type": "code", 890 | "execution_count": 19, 891 | "metadata": { 892 | "collapsed": false 893 | }, 894 | "outputs": [ 895 | { 896 | "data": { 897 | "text/html": [ 898 | "
" 942 | ], 943 | "text/plain": [ 944 | " drug disease gene_count\n", 945 | "0 Caffeine obesity 3\n", 946 | "1 Givinostat obesity 2\n", 947 | "2 Trichostatin A obesity 2\n", 948 | "3 Linoleic acid diabetes mellitus type 2 2\n", 949 | "4 Panobinostat obesity 2" 950 | ] 951 | }, 952 | "execution_count": 19, 953 | "metadata": {}, 954 | "output_type": "execute_result" 955 | } 956 | ], 957 | "source": [ 958 | "counts.head()" 959 | ] 960 | }, 961 | { 962 | "cell_type": "markdown", 963 | "metadata": {}, 964 | "source": [ 965 | "---" 966 | ] 967 | }, 968 | { 969 | "cell_type": "markdown", 970 | "metadata": {}, 971 | "source": [ 972 | "## Add the genes used to link each (drug, disease) pair\n", 973 | "\n", 974 | "Now add another column containing the actual genes linking each (drug, disease) pair." 975 | ] 976 | }, 977 | { 978 | "cell_type": "markdown", 979 | "metadata": {}, 980 | "source": [ 981 | "### Create a dictionary containing the linking genes for each unique pair" 982 | ] 983 | }, 984 | { 985 | "cell_type": "code", 986 | "execution_count": 20, 987 | "metadata": { 988 | "collapsed": true 989 | }, 990 | "outputs": [], 991 | "source": [ 992 | "linking_genes = dict()\n", 993 | "for (drug, disease), small_table in simple_table.groupby([\"drug\", \"disease\"]):\n", 994 | " linking_genes[(drug, disease)] = list(small_table[\"gene\"])" 995 | ] 996 | }, 997 | { 998 | "cell_type": "markdown", 999 | "metadata": {}, 1000 | "source": [ 1001 | "### Example: retrieve the linking genes for (caffeine, obesity)" 1002 | ] 1003 | }, 1004 | { 1005 | "cell_type": "code", 1006 | "execution_count": 21, 1007 | "metadata": { 1008 | "collapsed": false 1009 | }, 1010 | "outputs": [ 1011 | { 1012 | "data": { 1013 | "text/plain": [ 1014 | "['adenosine A3 receptor',\n", 1015 | " 'inositol 1,4,5-trisphosphate receptor, type 1',\n", 1016 | " 'RYR2']" 1017 | ] 1018 | }, 1019 | "execution_count": 21, 1020 | "metadata": {}, 1021 | "output_type": "execute_result" 1022 | } 1023 | ], 1024 | "source": [ 
1025 | "linking_genes[(\"Caffeine\", \"obesity\")]" 1026 | ] 1027 | }, 1028 | { 1029 | "cell_type": "markdown", 1030 | "metadata": {}, 1031 | "source": [ 1032 | "### Make a new column containing the linking genes" 1033 | ] 1034 | }, 1035 | { 1036 | "cell_type": "code", 1037 | "execution_count": 22, 1038 | "metadata": { 1039 | "collapsed": true 1040 | }, 1041 | "outputs": [], 1042 | "source": [ 1043 | "counts[\"genes\"] = counts[[\"drug\", \"disease\"]].apply(\n", 1044 | " lambda row: linking_genes[(row[\"drug\"], row[\"disease\"])],\n", 1045 | " axis = 1\n", 1046 | ")" 1047 | ] 1048 | }, 1049 | { 1050 | "cell_type": "code", 1051 | "execution_count": 23, 1052 | "metadata": { 1053 | "collapsed": false 1054 | }, 1055 | "outputs": [ 1056 | { 1057 | "data": { 1058 | "text/html": [ 1059 | "
" 1109 | ], 1110 | "text/plain": [ 1111 | " drug disease gene_count \\\n", 1112 | "0 Caffeine obesity 3 \n", 1113 | "1 Givinostat obesity 2 \n", 1114 | "2 Trichostatin A obesity 2 \n", 1115 | "3 Linoleic acid diabetes mellitus type 2 2 \n", 1116 | "4 Panobinostat obesity 2 \n", 1117 | "\n", 1118 | " genes \n", 1119 | "0 [adenosine A3 receptor, inositol 1,4,5-trispho... \n", 1120 | "1 [histone deacetylase 9, histone deacetylase 7] \n", 1121 | "2 [histone deacetylase 9, histone deacetylase 7] \n", 1122 | "3 [hepatocyte nuclear factor 4, alpha, peroxisom... \n", 1123 | "4 [histone deacetylase 9, histone deacetylase 7] " 1124 | ] 1125 | }, 1126 | "execution_count": 23, 1127 | "metadata": {}, 1128 | "output_type": "execute_result" 1129 | } 1130 | ], 1131 | "source": [ 1132 | "counts.head()" 1133 | ] 1134 | }, 1135 | { 1136 | "cell_type": "markdown", 1137 | "metadata": {}, 1138 | "source": [ 1139 | "## Save to file" 1140 | ] 1141 | }, 1142 | { 1143 | "cell_type": "code", 1144 | "execution_count": 24, 1145 | "metadata": { 1146 | "collapsed": true 1147 | }, 1148 | "outputs": [], 1149 | "source": [ 1150 | "counts.to_csv(\"drug_disease_count.tsv\", sep = '\\t', index = False, encoding = 'utf-8')" 1151 | ] 1152 | }, 1153 | { 1154 | "cell_type": "markdown", 1155 | "metadata": {}, 1156 | "source": [ 1157 | "---" 1158 | ] 1159 | }, 1160 | { 1161 | "cell_type": "markdown", 1162 | "metadata": {}, 1163 | "source": [ 1164 | "## Make cytoscape file\n", 1165 | "\n", 1166 | "For the example query, the sorting based on shared gene count seems unsatisfying. Now, lets look at all the results in a network view to see if any interesting patterns emerge that might shed some light on the data and how we might process it more effectively. \n", 1167 | "\n", 1168 | "Below we will create a file suitable for loading into cytoscape. 
It will contain the three edge types of interest, linking drugs to genes, diseases to genes, and drugs to diseases e.g.:\n", 1169 | "\n", 1170 | "source_node\tsource_type\tedge_type\ttarget_node\ttarget_type\n" 1171 | ] 1172 | }, 1173 | { 1174 | "cell_type": "markdown", 1175 | "metadata": {}, 1176 | "source": [ 1177 | "### Start with the drug and gene pairs" 1178 | ] 1179 | }, 1180 | { 1181 | "cell_type": "code", 1182 | "execution_count": 25, 1183 | "metadata": { 1184 | "collapsed": true 1185 | }, 1186 | "outputs": [], 1187 | "source": [ 1188 | "drug_gene_links = simple_table[[\"drug\", \"gene\"]]" 1189 | ] 1190 | }, 1191 | { 1192 | "cell_type": "code", 1193 | "execution_count": 26, 1194 | "metadata": { 1195 | "collapsed": false 1196 | }, 1197 | "outputs": [ 1198 | { 1199 | "data": { 1200 | "text/html": [ 1201 | "
" 1239 | ], 1240 | "text/plain": [ 1241 | " drug gene\n", 1242 | "0 Caffeine adenosine A3 receptor\n", 1243 | "1 Adenosine adenosine A3 receptor\n", 1244 | "2 Theophylline adenosine A3 receptor\n", 1245 | "3 Nicardipine adenosine A3 receptor\n", 1246 | "4 Istradefylline adenosine A3 receptor" 1247 | ] 1248 | }, 1249 | "execution_count": 26, 1250 | "metadata": {}, 1251 | "output_type": "execute_result" 1252 | } 1253 | ], 1254 | "source": [ 1255 | "drug_gene_links.head()" 1256 | ] 1257 | }, 1258 | { 1259 | "cell_type": "markdown", 1260 | "metadata": {}, 1261 | "source": [ 1262 | "### Rename the columns" 1263 | ] 1264 | }, 1265 | { 1266 | "cell_type": "code", 1267 | "execution_count": 27, 1268 | "metadata": { 1269 | "collapsed": true 1270 | }, 1271 | "outputs": [], 1272 | "source": [ 1273 | "drug_gene_links = drug_gene_links.rename(columns = {\"drug\": \"source_node\", \"gene\": \"target_node\"})" 1274 | ] 1275 | }, 1276 | { 1277 | "cell_type": "code", 1278 | "execution_count": 28, 1279 | "metadata": { 1280 | "collapsed": false 1281 | }, 1282 | "outputs": [ 1283 | { 1284 | "data": { 1285 | "text/html": [ 1286 | "
\n", 1287 | "\n", 1288 | " \n", 1289 | " \n", 1290 | " \n", 1291 | " \n", 1292 | " \n", 1293 | " \n", 1294 | " \n", 1295 | " \n", 1296 | " \n", 1297 | " \n", 1298 | " \n", 1299 | " \n", 1300 | " \n", 1301 | " \n", 1302 | " \n", 1303 | " \n", 1304 | " \n", 1305 | " \n", 1306 | " \n", 1307 | " \n", 1308 | " \n", 1309 | " \n", 1310 | " \n", 1311 | " \n", 1312 | " \n", 1313 | " \n", 1314 | " \n", 1315 | " \n", 1316 | " \n", 1317 | " \n", 1318 | " \n", 1319 | " \n", 1320 | " \n", 1321 | " \n", 1322 | "
   source_node     target_node
0  Caffeine        adenosine A3 receptor
1  Adenosine       adenosine A3 receptor
2  Theophylline    adenosine A3 receptor
3  Nicardipine     adenosine A3 receptor
4  Istradefylline  adenosine A3 receptor
\n", 1323 | "
" 1324 | ], 1325 | "text/plain": [ 1326 | " source_node target_node\n", 1327 | "0 Caffeine adenosine A3 receptor\n", 1328 | "1 Adenosine adenosine A3 receptor\n", 1329 | "2 Theophylline adenosine A3 receptor\n", 1330 | "3 Nicardipine adenosine A3 receptor\n", 1331 | "4 Istradefylline adenosine A3 receptor" 1332 | ] 1333 | }, 1334 | "execution_count": 28, 1335 | "metadata": {}, 1336 | "output_type": "execute_result" 1337 | } 1338 | ], 1339 | "source": [ 1340 | "drug_gene_links.head()" 1341 | ] 1342 | }, 1343 | { 1344 | "cell_type": "markdown", 1345 | "metadata": {}, 1346 | "source": [ 1347 | "### Create a new column specifying the source node type" 1348 | ] 1349 | }, 1350 | { 1351 | "cell_type": "code", 1352 | "execution_count": 29, 1353 | "metadata": { 1354 | "collapsed": true 1355 | }, 1356 | "outputs": [], 1357 | "source": [ 1358 | "drug_gene_links[\"source_type\"] = \"drug\"" 1359 | ] 1360 | }, 1361 | { 1362 | "cell_type": "code", 1363 | "execution_count": 30, 1364 | "metadata": { 1365 | "collapsed": false 1366 | }, 1367 | "outputs": [ 1368 | { 1369 | "data": { 1370 | "text/html": [ 1371 | "
\n", 1372 | "\n", 1373 | " \n", 1374 | " \n", 1375 | " \n", 1376 | " \n", 1377 | " \n", 1378 | " \n", 1379 | " \n", 1380 | " \n", 1381 | " \n", 1382 | " \n", 1383 | " \n", 1384 | " \n", 1385 | " \n", 1386 | " \n", 1387 | " \n", 1388 | " \n", 1389 | " \n", 1390 | " \n", 1391 | " \n", 1392 | " \n", 1393 | " \n", 1394 | " \n", 1395 | " \n", 1396 | " \n", 1397 | " \n", 1398 | " \n", 1399 | " \n", 1400 | " \n", 1401 | " \n", 1402 | " \n", 1403 | " \n", 1404 | " \n", 1405 | " \n", 1406 | " \n", 1407 | " \n", 1408 | " \n", 1409 | " \n", 1410 | " \n", 1411 | " \n", 1412 | " \n", 1413 | "
   source_node     target_node            source_type
0  Caffeine        adenosine A3 receptor  drug
1  Adenosine       adenosine A3 receptor  drug
2  Theophylline    adenosine A3 receptor  drug
3  Nicardipine     adenosine A3 receptor  drug
4  Istradefylline  adenosine A3 receptor  drug
\n", 1414 | "
" 1415 | ], 1416 | "text/plain": [ 1417 | " source_node target_node source_type\n", 1418 | "0 Caffeine adenosine A3 receptor drug\n", 1419 | "1 Adenosine adenosine A3 receptor drug\n", 1420 | "2 Theophylline adenosine A3 receptor drug\n", 1421 | "3 Nicardipine adenosine A3 receptor drug\n", 1422 | "4 Istradefylline adenosine A3 receptor drug" 1423 | ] 1424 | }, 1425 | "execution_count": 30, 1426 | "metadata": {}, 1427 | "output_type": "execute_result" 1428 | } 1429 | ], 1430 | "source": [ 1431 | "drug_gene_links.head()" 1432 | ] 1433 | }, 1434 | { 1435 | "cell_type": "markdown", 1436 | "metadata": {}, 1437 | "source": [ 1438 | "### Create a new column containing the edge type" 1439 | ] 1440 | }, 1441 | { 1442 | "cell_type": "code", 1443 | "execution_count": 31, 1444 | "metadata": { 1445 | "collapsed": true 1446 | }, 1447 | "outputs": [], 1448 | "source": [ 1449 | "drug_gene_links[\"edge_type\"] = \"interacts_with\"" 1450 | ] 1451 | }, 1452 | { 1453 | "cell_type": "code", 1454 | "execution_count": 32, 1455 | "metadata": { 1456 | "collapsed": false 1457 | }, 1458 | "outputs": [ 1459 | { 1460 | "data": { 1461 | "text/html": [ 1462 | "
\n", 1463 | "\n", 1464 | " \n", 1465 | " \n", 1466 | " \n", 1467 | " \n", 1468 | " \n", 1469 | " \n", 1470 | " \n", 1471 | " \n", 1472 | " \n", 1473 | " \n", 1474 | " \n", 1475 | " \n", 1476 | " \n", 1477 | " \n", 1478 | " \n", 1479 | " \n", 1480 | " \n", 1481 | " \n", 1482 | " \n", 1483 | " \n", 1484 | " \n", 1485 | " \n", 1486 | " \n", 1487 | " \n", 1488 | " \n", 1489 | " \n", 1490 | " \n", 1491 | " \n", 1492 | " \n", 1493 | " \n", 1494 | " \n", 1495 | " \n", 1496 | " \n", 1497 | " \n", 1498 | " \n", 1499 | " \n", 1500 | " \n", 1501 | " \n", 1502 | " \n", 1503 | " \n", 1504 | " \n", 1505 | " \n", 1506 | " \n", 1507 | " \n", 1508 | " \n", 1509 | " \n", 1510 | "
   source_node     target_node            source_type  edge_type
0  Caffeine        adenosine A3 receptor  drug         interacts_with
1  Adenosine       adenosine A3 receptor  drug         interacts_with
2  Theophylline    adenosine A3 receptor  drug         interacts_with
3  Nicardipine     adenosine A3 receptor  drug         interacts_with
4  Istradefylline  adenosine A3 receptor  drug         interacts_with
\n", 1511 | "
" 1512 | ], 1513 | "text/plain": [ 1514 | " source_node target_node source_type edge_type\n", 1515 | "0 Caffeine adenosine A3 receptor drug interacts_with\n", 1516 | "1 Adenosine adenosine A3 receptor drug interacts_with\n", 1517 | "2 Theophylline adenosine A3 receptor drug interacts_with\n", 1518 | "3 Nicardipine adenosine A3 receptor drug interacts_with\n", 1519 | "4 Istradefylline adenosine A3 receptor drug interacts_with" 1520 | ] 1521 | }, 1522 | "execution_count": 32, 1523 | "metadata": {}, 1524 | "output_type": "execute_result" 1525 | } 1526 | ], 1527 | "source": [ 1528 | "drug_gene_links.head()" 1529 | ] 1530 | }, 1531 | { 1532 | "cell_type": "markdown", 1533 | "metadata": {}, 1534 | "source": [ 1535 | "### Create a new column containing the target type" 1536 | ] 1537 | }, 1538 | { 1539 | "cell_type": "code", 1540 | "execution_count": 33, 1541 | "metadata": { 1542 | "collapsed": true 1543 | }, 1544 | "outputs": [], 1545 | "source": [ 1546 | "drug_gene_links[\"target_type\"] = \"gene\"" 1547 | ] 1548 | }, 1549 | { 1550 | "cell_type": "code", 1551 | "execution_count": 34, 1552 | "metadata": { 1553 | "collapsed": false 1554 | }, 1555 | "outputs": [ 1556 | { 1557 | "data": { 1558 | "text/html": [ 1559 | "
\n", 1560 | "\n", 1561 | " \n", 1562 | " \n", 1563 | " \n", 1564 | " \n", 1565 | " \n", 1566 | " \n", 1567 | " \n", 1568 | " \n", 1569 | " \n", 1570 | " \n", 1571 | " \n", 1572 | " \n", 1573 | " \n", 1574 | " \n", 1575 | " \n", 1576 | " \n", 1577 | " \n", 1578 | " \n", 1579 | " \n", 1580 | " \n", 1581 | " \n", 1582 | " \n", 1583 | " \n", 1584 | " \n", 1585 | " \n", 1586 | " \n", 1587 | " \n", 1588 | " \n", 1589 | " \n", 1590 | " \n", 1591 | " \n", 1592 | " \n", 1593 | " \n", 1594 | " \n", 1595 | " \n", 1596 | " \n", 1597 | " \n", 1598 | " \n", 1599 | " \n", 1600 | " \n", 1601 | " \n", 1602 | " \n", 1603 | " \n", 1604 | " \n", 1605 | " \n", 1606 | " \n", 1607 | " \n", 1608 | " \n", 1609 | " \n", 1610 | " \n", 1611 | " \n", 1612 | " \n", 1613 | "
   source_node     target_node            source_type  edge_type       target_type
0  Caffeine        adenosine A3 receptor  drug         interacts_with  gene
1  Adenosine       adenosine A3 receptor  drug         interacts_with  gene
2  Theophylline    adenosine A3 receptor  drug         interacts_with  gene
3  Nicardipine     adenosine A3 receptor  drug         interacts_with  gene
4  Istradefylline  adenosine A3 receptor  drug         interacts_with  gene
\n", 1614 | "
" 1615 | ], 1616 | "text/plain": [ 1617 | " source_node target_node source_type edge_type \\\n", 1618 | "0 Caffeine adenosine A3 receptor drug interacts_with \n", 1619 | "1 Adenosine adenosine A3 receptor drug interacts_with \n", 1620 | "2 Theophylline adenosine A3 receptor drug interacts_with \n", 1621 | "3 Nicardipine adenosine A3 receptor drug interacts_with \n", 1622 | "4 Istradefylline adenosine A3 receptor drug interacts_with \n", 1623 | "\n", 1624 | " target_type \n", 1625 | "0 gene \n", 1626 | "1 gene \n", 1627 | "2 gene \n", 1628 | "3 gene \n", 1629 | "4 gene " 1630 | ] 1631 | }, 1632 | "execution_count": 34, 1633 | "metadata": {}, 1634 | "output_type": "execute_result" 1635 | } 1636 | ], 1637 | "source": [ 1638 | "drug_gene_links.head()" 1639 | ] 1640 | }, 1641 | { 1642 | "cell_type": "markdown", 1643 | "metadata": {}, 1644 | "source": [ 1645 | "## Repeat for disease gene pairs" 1646 | ] 1647 | }, 1648 | { 1649 | "cell_type": "code", 1650 | "execution_count": 35, 1651 | "metadata": { 1652 | "collapsed": true 1653 | }, 1654 | "outputs": [], 1655 | "source": [ 1656 | "disease_gene_links = simple_table[[\"disease\", \"gene\"]]" 1657 | ] 1658 | }, 1659 | { 1660 | "cell_type": "code", 1661 | "execution_count": 36, 1662 | "metadata": { 1663 | "collapsed": false 1664 | }, 1665 | "outputs": [ 1666 | { 1667 | "data": { 1668 | "text/html": [ 1669 | "
\n", 1670 | "\n", 1671 | " \n", 1672 | " \n", 1673 | " \n", 1674 | " \n", 1675 | " \n", 1676 | " \n", 1677 | " \n", 1678 | " \n", 1679 | " \n", 1680 | " \n", 1681 | " \n", 1682 | " \n", 1683 | " \n", 1684 | " \n", 1685 | " \n", 1686 | " \n", 1687 | " \n", 1688 | " \n", 1689 | " \n", 1690 | " \n", 1691 | " \n", 1692 | " \n", 1693 | " \n", 1694 | " \n", 1695 | " \n", 1696 | " \n", 1697 | " \n", 1698 | " \n", 1699 | " \n", 1700 | " \n", 1701 | " \n", 1702 | " \n", 1703 | " \n", 1704 | " \n", 1705 | "
   disease  gene
0  obesity  adenosine A3 receptor
1  obesity  adenosine A3 receptor
2  obesity  adenosine A3 receptor
3  obesity  adenosine A3 receptor
4  obesity  adenosine A3 receptor
\n", 1706 | "
" 1707 | ], 1708 | "text/plain": [ 1709 | " disease gene\n", 1710 | "0 obesity adenosine A3 receptor\n", 1711 | "1 obesity adenosine A3 receptor\n", 1712 | "2 obesity adenosine A3 receptor\n", 1713 | "3 obesity adenosine A3 receptor\n", 1714 | "4 obesity adenosine A3 receptor" 1715 | ] 1716 | }, 1717 | "execution_count": 36, 1718 | "metadata": {}, 1719 | "output_type": "execute_result" 1720 | } 1721 | ], 1722 | "source": [ 1723 | "disease_gene_links.head()" 1724 | ] 1725 | }, 1726 | { 1727 | "cell_type": "markdown", 1728 | "metadata": {}, 1729 | "source": [ 1730 | "## Rename the columns" 1731 | ] 1732 | }, 1733 | { 1734 | "cell_type": "code", 1735 | "execution_count": 37, 1736 | "metadata": { 1737 | "collapsed": true 1738 | }, 1739 | "outputs": [], 1740 | "source": [ 1741 | "disease_gene_links = disease_gene_links.rename(columns = {\"disease\": \"source_node\", \"gene\": \"target_node\"})" 1742 | ] 1743 | }, 1744 | { 1745 | "cell_type": "code", 1746 | "execution_count": 38, 1747 | "metadata": { 1748 | "collapsed": false 1749 | }, 1750 | "outputs": [ 1751 | { 1752 | "data": { 1753 | "text/html": [ 1754 | "
\n", 1755 | "\n", 1756 | " \n", 1757 | " \n", 1758 | " \n", 1759 | " \n", 1760 | " \n", 1761 | " \n", 1762 | " \n", 1763 | " \n", 1764 | " \n", 1765 | " \n", 1766 | " \n", 1767 | " \n", 1768 | " \n", 1769 | " \n", 1770 | " \n", 1771 | " \n", 1772 | " \n", 1773 | " \n", 1774 | " \n", 1775 | " \n", 1776 | " \n", 1777 | " \n", 1778 | " \n", 1779 | " \n", 1780 | " \n", 1781 | " \n", 1782 | " \n", 1783 | " \n", 1784 | " \n", 1785 | " \n", 1786 | " \n", 1787 | " \n", 1788 | " \n", 1789 | " \n", 1790 | "
   source_node  target_node
0  obesity      adenosine A3 receptor
1  obesity      adenosine A3 receptor
2  obesity      adenosine A3 receptor
3  obesity      adenosine A3 receptor
4  obesity      adenosine A3 receptor
\n", 1791 | "
" 1792 | ], 1793 | "text/plain": [ 1794 | " source_node target_node\n", 1795 | "0 obesity adenosine A3 receptor\n", 1796 | "1 obesity adenosine A3 receptor\n", 1797 | "2 obesity adenosine A3 receptor\n", 1798 | "3 obesity adenosine A3 receptor\n", 1799 | "4 obesity adenosine A3 receptor" 1800 | ] 1801 | }, 1802 | "execution_count": 38, 1803 | "metadata": {}, 1804 | "output_type": "execute_result" 1805 | } 1806 | ], 1807 | "source": [ 1808 | "disease_gene_links.head()" 1809 | ] 1810 | }, 1811 | { 1812 | "cell_type": "markdown", 1813 | "metadata": {}, 1814 | "source": [ 1815 | "### Create the new columns" 1816 | ] 1817 | }, 1818 | { 1819 | "cell_type": "code", 1820 | "execution_count": 39, 1821 | "metadata": { 1822 | "collapsed": true 1823 | }, 1824 | "outputs": [], 1825 | "source": [ 1826 | "disease_gene_links[\"source_type\"] = \"disease\"\n", 1827 | "disease_gene_links[\"edge_type\"] = \"associated_with\"\n", 1828 | "disease_gene_links[\"target_type\"] = \"gene\"" 1829 | ] 1830 | }, 1831 | { 1832 | "cell_type": "code", 1833 | "execution_count": 40, 1834 | "metadata": { 1835 | "collapsed": false 1836 | }, 1837 | "outputs": [ 1838 | { 1839 | "data": { 1840 | "text/html": [ 1841 | "
\n", 1842 | "\n", 1843 | " \n", 1844 | " \n", 1845 | " \n", 1846 | " \n", 1847 | " \n", 1848 | " \n", 1849 | " \n", 1850 | " \n", 1851 | " \n", 1852 | " \n", 1853 | " \n", 1854 | " \n", 1855 | " \n", 1856 | " \n", 1857 | " \n", 1858 | " \n", 1859 | " \n", 1860 | " \n", 1861 | " \n", 1862 | " \n", 1863 | " \n", 1864 | " \n", 1865 | " \n", 1866 | " \n", 1867 | " \n", 1868 | " \n", 1869 | " \n", 1870 | " \n", 1871 | " \n", 1872 | " \n", 1873 | " \n", 1874 | " \n", 1875 | " \n", 1876 | " \n", 1877 | " \n", 1878 | " \n", 1879 | " \n", 1880 | " \n", 1881 | " \n", 1882 | " \n", 1883 | " \n", 1884 | " \n", 1885 | " \n", 1886 | " \n", 1887 | " \n", 1888 | " \n", 1889 | " \n", 1890 | " \n", 1891 | " \n", 1892 | " \n", 1893 | " \n", 1894 | " \n", 1895 | "
   source_node  target_node            source_type  edge_type        target_type
0  obesity      adenosine A3 receptor  disease      associated_with  gene
1  obesity      adenosine A3 receptor  disease      associated_with  gene
2  obesity      adenosine A3 receptor  disease      associated_with  gene
3  obesity      adenosine A3 receptor  disease      associated_with  gene
4  obesity      adenosine A3 receptor  disease      associated_with  gene
\n", 1896 | "
" 1897 | ], 1898 | "text/plain": [ 1899 | " source_node target_node source_type edge_type target_type\n", 1900 | "0 obesity adenosine A3 receptor disease associated_with gene\n", 1901 | "1 obesity adenosine A3 receptor disease associated_with gene\n", 1902 | "2 obesity adenosine A3 receptor disease associated_with gene\n", 1903 | "3 obesity adenosine A3 receptor disease associated_with gene\n", 1904 | "4 obesity adenosine A3 receptor disease associated_with gene" 1905 | ] 1906 | }, 1907 | "execution_count": 40, 1908 | "metadata": {}, 1909 | "output_type": "execute_result" 1910 | } 1911 | ], 1912 | "source": [ 1913 | "disease_gene_links.head()" 1914 | ] 1915 | }, 1916 | { 1917 | "cell_type": "code", 1918 | "execution_count": 41, 1919 | "metadata": { 1920 | "collapsed": true 1921 | }, 1922 | "outputs": [], 1923 | "source": [ 1924 | "drug_disease_links = (simple_table\n", 1925 | " [[\"drug\", \"disease\"]]\n", 1926 | " .assign(\n", 1927 | " source_type = \"drug\",\n", 1928 | " edge_type = \"may treat\",\n", 1929 | " target_type = \"disease\"\n", 1930 | " )\n", 1931 | " .rename(columns = {\"drug\": \"source_node\", \"disease\": \"target_node\"})\n", 1932 | ")" 1933 | ] 1934 | }, 1935 | { 1936 | "cell_type": "markdown", 1937 | "metadata": {}, 1938 | "source": [ 1939 | "# Join the (disease, gene), (drug, gene), and (drug, disease) tables together" 1940 | ] 1941 | }, 1942 | { 1943 | "cell_type": "markdown", 1944 | "metadata": {}, 1945 | "source": [ 1946 | "### Number of rows of each table" 1947 | ] 1948 | }, 1949 | { 1950 | "cell_type": "code", 1951 | "execution_count": 42, 1952 | "metadata": { 1953 | "collapsed": false 1954 | }, 1955 | "outputs": [ 1956 | { 1957 | "data": { 1958 | "text/plain": [ 1959 | "595" 1960 | ] 1961 | }, 1962 | "execution_count": 42, 1963 | "metadata": {}, 1964 | "output_type": "execute_result" 1965 | } 1966 | ], 1967 | "source": [ 1968 | "len(drug_gene_links)" 1969 | ] 1970 | }, 1971 | { 1972 | "cell_type": "code", 1973 | "execution_count": 43, 
1974 | "metadata": { 1975 | "collapsed": false 1976 | }, 1977 | "outputs": [ 1978 | { 1979 | "data": { 1980 | "text/plain": [ 1981 | "595" 1982 | ] 1983 | }, 1984 | "execution_count": 43, 1985 | "metadata": {}, 1986 | "output_type": "execute_result" 1987 | } 1988 | ], 1989 | "source": [ 1990 | "len(disease_gene_links)" 1991 | ] 1992 | }, 1993 | { 1994 | "cell_type": "code", 1995 | "execution_count": 44, 1996 | "metadata": { 1997 | "collapsed": false 1998 | }, 1999 | "outputs": [ 2000 | { 2001 | "data": { 2002 | "text/plain": [ 2003 | "595" 2004 | ] 2005 | }, 2006 | "execution_count": 44, 2007 | "metadata": {}, 2008 | "output_type": "execute_result" 2009 | } 2010 | ], 2011 | "source": [ 2012 | "len(drug_disease_links)" 2013 | ] 2014 | }, 2015 | { 2016 | "cell_type": "markdown", 2017 | "metadata": {}, 2018 | "source": [ 2019 | "### Join all three tables together" 2020 | ] 2021 | }, 2022 | { 2023 | "cell_type": "code", 2024 | "execution_count": 45, 2025 | "metadata": { 2026 | "collapsed": false 2027 | }, 2028 | "outputs": [], 2029 | "source": [ 2030 | "cytoscape_edges = pd.concat([drug_gene_links, disease_gene_links, drug_disease_links])" 2031 | ] 2032 | }, 2033 | { 2034 | "cell_type": "code", 2035 | "execution_count": 46, 2036 | "metadata": { 2037 | "collapsed": false 2038 | }, 2039 | "outputs": [ 2040 | { 2041 | "data": { 2042 | "text/html": [ 2043 | "
\n", 2044 | "\n", 2045 | " \n", 2046 | " \n", 2047 | " \n", 2048 | " \n", 2049 | " \n", 2050 | " \n", 2051 | " \n", 2052 | " \n", 2053 | " \n", 2054 | " \n", 2055 | " \n", 2056 | " \n", 2057 | " \n", 2058 | " \n", 2059 | " \n", 2060 | " \n", 2061 | " \n", 2062 | " \n", 2063 | " \n", 2064 | " \n", 2065 | " \n", 2066 | " \n", 2067 | " \n", 2068 | " \n", 2069 | " \n", 2070 | " \n", 2071 | " \n", 2072 | " \n", 2073 | " \n", 2074 | " \n", 2075 | " \n", 2076 | " \n", 2077 | " \n", 2078 | " \n", 2079 | " \n", 2080 | " \n", 2081 | " \n", 2082 | " \n", 2083 | " \n", 2084 | " \n", 2085 | " \n", 2086 | " \n", 2087 | " \n", 2088 | " \n", 2089 | " \n", 2090 | " \n", 2091 | " \n", 2092 | " \n", 2093 | " \n", 2094 | " \n", 2095 | " \n", 2096 | " \n", 2097 | "
   edge_type       source_node     source_type  target_node            target_type
0  interacts_with  Caffeine        drug         adenosine A3 receptor  gene
1  interacts_with  Adenosine       drug         adenosine A3 receptor  gene
2  interacts_with  Theophylline    drug         adenosine A3 receptor  gene
3  interacts_with  Nicardipine     drug         adenosine A3 receptor  gene
4  interacts_with  Istradefylline  drug         adenosine A3 receptor  gene
\n", 2098 | "
" 2099 | ], 2100 | "text/plain": [ 2101 | " edge_type source_node source_type target_node \\\n", 2102 | "0 interacts_with Caffeine drug adenosine A3 receptor \n", 2103 | "1 interacts_with Adenosine drug adenosine A3 receptor \n", 2104 | "2 interacts_with Theophylline drug adenosine A3 receptor \n", 2105 | "3 interacts_with Nicardipine drug adenosine A3 receptor \n", 2106 | "4 interacts_with Istradefylline drug adenosine A3 receptor \n", 2107 | "\n", 2108 | " target_type \n", 2109 | "0 gene \n", 2110 | "1 gene \n", 2111 | "2 gene \n", 2112 | "3 gene \n", 2113 | "4 gene " 2114 | ] 2115 | }, 2116 | "execution_count": 46, 2117 | "metadata": {}, 2118 | "output_type": "execute_result" 2119 | } 2120 | ], 2121 | "source": [ 2122 | "cytoscape_edges.head()" 2123 | ] 2124 | }, 2125 | { 2126 | "cell_type": "code", 2127 | "execution_count": 47, 2128 | "metadata": { 2129 | "collapsed": false 2130 | }, 2131 | "outputs": [ 2132 | { 2133 | "data": { 2134 | "text/plain": [ 2135 | "1785" 2136 | ] 2137 | }, 2138 | "execution_count": 47, 2139 | "metadata": {}, 2140 | "output_type": "execute_result" 2141 | } 2142 | ], 2143 | "source": [ 2144 | "len(cytoscape_edges)" 2145 | ] 2146 | }, 2147 | { 2148 | "cell_type": "markdown", 2149 | "metadata": {}, 2150 | "source": [ 2151 | "Note that the final result has 1785 (= 595 * 3) rows. 
(595 was the original number of results returned)" 2152 | ] 2153 | }, 2154 | { 2155 | "cell_type": "markdown", 2156 | "metadata": {}, 2157 | "source": [ 2158 | "### Reorder columns" 2159 | ] 2160 | }, 2161 | { 2162 | "cell_type": "code", 2163 | "execution_count": 48, 2164 | "metadata": { 2165 | "collapsed": true 2166 | }, 2167 | "outputs": [], 2168 | "source": [ 2169 | "cytoscape_edges = cytoscape_edges[[\"source_node\", \"source_type\", \"edge_type\", \"target_node\", \"target_type\"]]" 2170 | ] 2171 | }, 2172 | { 2173 | "cell_type": "code", 2174 | "execution_count": 49, 2175 | "metadata": { 2176 | "collapsed": false 2177 | }, 2178 | "outputs": [ 2179 | { 2180 | "data": { 2181 | "text/html": [ 2182 | "
\n", 2183 | "\n", 2184 | " \n", 2185 | " \n", 2186 | " \n", 2187 | " \n", 2188 | " \n", 2189 | " \n", 2190 | " \n", 2191 | " \n", 2192 | " \n", 2193 | " \n", 2194 | " \n", 2195 | " \n", 2196 | " \n", 2197 | " \n", 2198 | " \n", 2199 | " \n", 2200 | " \n", 2201 | " \n", 2202 | " \n", 2203 | " \n", 2204 | " \n", 2205 | " \n", 2206 | " \n", 2207 | " \n", 2208 | " \n", 2209 | " \n", 2210 | " \n", 2211 | " \n", 2212 | " \n", 2213 | " \n", 2214 | " \n", 2215 | " \n", 2216 | " \n", 2217 | " \n", 2218 | " \n", 2219 | " \n", 2220 | " \n", 2221 | " \n", 2222 | " \n", 2223 | " \n", 2224 | " \n", 2225 | " \n", 2226 | " \n", 2227 | " \n", 2228 | " \n", 2229 | " \n", 2230 | " \n", 2231 | " \n", 2232 | " \n", 2233 | " \n", 2234 | " \n", 2235 | " \n", 2236 | "
   source_node     source_type  edge_type       target_node            target_type
0  Caffeine        drug         interacts_with  adenosine A3 receptor  gene
1  Adenosine       drug         interacts_with  adenosine A3 receptor  gene
2  Theophylline    drug         interacts_with  adenosine A3 receptor  gene
3  Nicardipine     drug         interacts_with  adenosine A3 receptor  gene
4  Istradefylline  drug         interacts_with  adenosine A3 receptor  gene
\n", 2237 | "
" 2238 | ], 2239 | "text/plain": [ 2240 | " source_node source_type edge_type target_node \\\n", 2241 | "0 Caffeine drug interacts_with adenosine A3 receptor \n", 2242 | "1 Adenosine drug interacts_with adenosine A3 receptor \n", 2243 | "2 Theophylline drug interacts_with adenosine A3 receptor \n", 2244 | "3 Nicardipine drug interacts_with adenosine A3 receptor \n", 2245 | "4 Istradefylline drug interacts_with adenosine A3 receptor \n", 2246 | "\n", 2247 | " target_type \n", 2248 | "0 gene \n", 2249 | "1 gene \n", 2250 | "2 gene \n", 2251 | "3 gene \n", 2252 | "4 gene " 2253 | ] 2254 | }, 2255 | "execution_count": 49, 2256 | "metadata": {}, 2257 | "output_type": "execute_result" 2258 | } 2259 | ], 2260 | "source": [ 2261 | "cytoscape_edges.head()" 2262 | ] 2263 | }, 2264 | { 2265 | "cell_type": "markdown", 2266 | "metadata": {}, 2267 | "source": [ 2268 | "## Save file to disk" 2269 | ] 2270 | }, 2271 | { 2272 | "cell_type": "code", 2273 | "execution_count": 50, 2274 | "metadata": { 2275 | "collapsed": true 2276 | }, 2277 | "outputs": [], 2278 | "source": [ 2279 | "cytoscape_edges.to_csv(\"drug_gene_disease_network.txt\", sep = '\\t', index = False, encoding = 'utf-8')" 2280 | ] 2281 | }, 2282 | { 2283 | "cell_type": "markdown", 2284 | "metadata": {}, 2285 | "source": [ 2286 | "#open a new network in cytoscape and load the file by: File..Import..Network\n", 2287 | "#use Cytoscape to develop a visualization \n", 2288 | "#submit the tab-delimited output files, an image of your network, and an explanation of what you did as your assignment\n", 2289 | "#Are there any notable hubs in the network?\n", 2290 | "#Could you extend the code to identify them automatically?\n", 2291 | "#can you adapt the code to search for a specific drug or a specific disease?" 
2292 | ] 2293 | } 2294 | ], 2295 | "metadata": { 2296 | "kernelspec": { 2297 | "display_name": "Python 3", 2298 | "language": "python", 2299 | "name": "python3" 2300 | }, 2301 | "language_info": { 2302 | "codemirror_mode": { 2303 | "name": "ipython", 2304 | "version": 3 2305 | }, 2306 | "file_extension": ".py", 2307 | "mimetype": "text/x-python", 2308 | "name": "python", 2309 | "nbconvert_exporter": "python", 2310 | "pygments_lexer": "ipython3", 2311 | "version": "3.4.3" 2312 | } 2313 | }, 2314 | "nbformat": 4, 2315 | "nbformat_minor": 0 2316 | } -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | backports-abc==0.4 2 | backports.shutil-get-terminal-size==1.0.0 3 | decorator==4.0.9 4 | entrypoints==0.2.2 5 | ipykernel==4.3.1 6 | ipython==4.2.0 7 | ipython-genutils==0.1.0 8 | ipywidgets==5.1.5 9 | isodate==0.5.4 10 | Jinja2==2.8 11 | jsonschema==2.5.1 12 | jupyter==1.0.0 13 | jupyter-client==4.2.2 14 | jupyter-console==4.1.1 15 | jupyter-core==4.1.0 16 | keepalive==0.5 17 | MarkupSafe==0.23 18 | mistune==0.8.1 19 | nbconvert==4.2.0 20 | nbformat==4.0.1 21 | notebook==5.7.8 22 | numpy==1.11.0 23 | pandas==0.18.1 24 | pexpect==4.1.0 25 | pickleshare==0.7.2 26 | ptyprocess==0.5.1 27 | Pygments==2.1.3 28 | pyparsing==2.1.4 29 | python-dateutil==2.5.3 30 | pytz==2016.4 31 | pyzmq==15.2.0 32 | qtconsole==4.2.1 33 | rdflib==4.2.1 34 | simplegeneric==0.8.1 35 | six==1.10.0 36 | SPARQLWrapper==1.7.6 37 | terminado==0.6 38 | tornado==4.3 39 | traitlets==4.2.1 40 | widgetsnbextension==1.2.3 41 | --------------------------------------------------------------------------------