├── .gitignore
├── README.md
├── SPARQL_pandas.ipynb
└── requirements.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | venv/
2 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Tutorial: accessing SPARQL endpoints in Python with Pandas
2 |
3 | ## Installation
4 |
5 | 1. Clone the repository.
6 | 2. Set up a virtual environment: `python3 -m venv venv`
7 | 3. Load the virtual environment: `source venv/bin/activate`
8 | 4. Install necessary dependencies: `pip install -r requirements.txt`
9 | 5. Start a Jupyter notebook server: `jupyter notebook`
10 | 6. Run the notebook.
11 |
--------------------------------------------------------------------------------
/SPARQL_pandas.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Querying a SPARQL endpoint (knowledge graph) in Python\n",
8 | "The code below contains everything needed to execute a SPARQL query, manipulate the data it returns, and store the data in a file suitable for rendering with Cytoscape. Your goal is to get this to work locally and then extend it, for example by altering the query, sorting the results differently, or working with the output in Cytoscape to build an informative visualization. \n",
9 | "\n",
10 | "## Assignment\n",
11 | "Hand in your version of this notebook, a short report describing what you attempted to do with it, and a visualization of your results (e.g. as an image exported from Cytoscape). \n",
12 | "\n",
13 | "## Necessary imports\n",
14 | "\n",
15 | "Note: you need to install pandas (which should have come with Anaconda) and SPARQLWrapper (e.g. with `pip install pandas SPARQLWrapper`) before the code below will work."
16 | ]
17 | },
18 | {
19 | "cell_type": "code",
20 | "execution_count": 1,
21 | "metadata": {
22 | "collapsed": true
23 | },
24 | "outputs": [],
25 | "source": [
26 | "import pandas as pd\n",
27 | "\n",
28 | "from pandas import json_normalize  # top-level function since pandas 1.0\n",
29 | "from SPARQLWrapper import SPARQLWrapper, JSON"
30 | ]
31 | },
32 | {
33 | "cell_type": "markdown",
34 | "metadata": {},
35 | "source": [
36 | "## Create query function\n",
37 | "\n",
38 | "The function below takes a SPARQL query string, queries a SPARQL service, and returns the result as a pandas DataFrame (a table)."
39 | ]
40 | },
41 | {
42 | "cell_type": "code",
43 | "execution_count": 2,
44 | "metadata": {
45 | "collapsed": true
46 | },
47 | "outputs": [],
48 | "source": [
49 | "def query_wikidata(sparql_query, sparql_service_url):\n",
50 | " \"\"\"\n",
51 | " Query the endpoint with the given query string and return the results as a pandas DataFrame.\n",
52 | " \"\"\"\n",
53 | " # create the connection to the endpoint\n",
54 | " # Wikidata now enforces a strict User-Agent policy, so we need to specify the agent\n",
55 | " # See here https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2019/07#problems_with_query_API\n",
56 | " # https://meta.wikimedia.org/wiki/User-Agent_policy\n",
57 | " sparql = SPARQLWrapper(sparql_service_url, agent=\"Sparql Wrapper on Jupyter example\") \n",
58 | " \n",
59 | " sparql.setQuery(sparql_query)\n",
60 | " sparql.setReturnFormat(JSON)\n",
61 | "\n",
62 | " # ask for the result\n",
63 | " result = sparql.query().convert()\n",
64 | " return json_normalize(result[\"results\"][\"bindings\"])"
65 | ]
66 | },
67 | {
68 | "cell_type": "markdown",
69 | "metadata": {},
70 | "source": [
71 | "## Run our SPARQL query\n",
72 | "The example here is the drug -> interacts with protein -> encoded by -> gene -> GWAS -> disease query pattern. \n",
73 | "Test it here http://tinyurl.com/hwm9388 and explore the results to see how you might adapt it. Other example queries for the Wikidata service: http://tinyurl.com/j222k6g , http://tinyurl.com/gpfr9kj "
74 | ]
75 | },
76 | {
77 | "cell_type": "code",
78 | "execution_count": 3,
79 | "metadata": {
80 | "collapsed": false
81 | },
82 | "outputs": [],
83 | "source": [
84 | "sparql_query = \"\"\"SELECT ?drug ?drugLabel ?gene ?geneLabel ?entrez_id ?disease ?diseaseLabel WHERE {\n",
85 | " ?drug wdt:P129 ?gene_product . # drug interacts with a gene_product \n",
86 | " ?gene_product wdt:P702 ?gene . # gene_product is encoded by a gene\n",
87 | " ?gene wdt:P2293 ?disease . # gene is genetically associated with a disease \n",
88 | " ?gene wdt:P351 ?entrez_id . # get the entrez gene id for the gene \n",
89 | " # add labels\n",
90 | " SERVICE wikibase:label {\n",
91 | " bd:serviceParam wikibase:language \"en\" .\n",
92 | " }\n",
93 | " }\n",
94 | " limit 1000\n",
95 | " \"\"\"\n",
96 | "# To query another endpoint, change the service URL and the query\n",
97 | "sparql_service_url = \"https://query.wikidata.org/sparql\"\n",
98 | "result_table = query_wikidata(sparql_query, sparql_service_url)"
99 | ]
100 | },
101 | {
102 | "cell_type": "markdown",
103 | "metadata": {},
104 | "source": [
105 | "## From here on we are using the Python Data Analysis Library \"Pandas\"\n",
106 | "If you get stuck, look for an introduction to pandas such as this one: http://synesthesiam.com/posts/an-introduction-to-pandas.html"
107 | ]
108 | },
109 | {
110 | "cell_type": "markdown",
111 | "metadata": {},
112 | "source": [
113 | "Now look at the results of our SPARQL query. `shape` gives the dimensions as (number of rows, number of columns). `head()` shows the column headers (which will be important later) and the first five rows of data."
114 | ]
115 | },
116 | {
117 | "cell_type": "code",
118 | "execution_count": 4,
119 | "metadata": {
120 | "collapsed": false
121 | },
122 | "outputs": [
123 | {
124 | "data": {
125 | "text/plain": [
126 | "(595, 17)"
127 | ]
128 | },
129 | "execution_count": 4,
130 | "metadata": {},
131 | "output_type": "execute_result"
132 | }
133 | ],
134 | "source": [
135 | "result_table.shape"
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": 5,
141 | "metadata": {
142 | "collapsed": false,
143 | "scrolled": false
144 | },
145 | "outputs": [
146 | {
147 | "data": {
278 | "text/plain": [
279 | " disease.type disease.value diseaseLabel.type \\\n",
280 | "0 uri http://www.wikidata.org/entity/Q12174 literal \n",
281 | "1 uri http://www.wikidata.org/entity/Q12174 literal \n",
282 | "2 uri http://www.wikidata.org/entity/Q12174 literal \n",
283 | "3 uri http://www.wikidata.org/entity/Q12174 literal \n",
284 | "4 uri http://www.wikidata.org/entity/Q12174 literal \n",
285 | "\n",
286 | " diseaseLabel.value diseaseLabel.xml:lang drug.type \\\n",
287 | "0 obesity en uri \n",
288 | "1 obesity en uri \n",
289 | "2 obesity en uri \n",
290 | "3 obesity en uri \n",
291 | "4 obesity en uri \n",
292 | "\n",
293 | " drug.value drugLabel.type drugLabel.value \\\n",
294 | "0 http://www.wikidata.org/entity/Q60235 literal Caffeine \n",
295 | "1 http://www.wikidata.org/entity/Q190012 literal Adenosine \n",
296 | "2 http://www.wikidata.org/entity/Q407308 literal Theophylline \n",
297 | "3 http://www.wikidata.org/entity/Q729213 literal Nicardipine \n",
298 | "4 http://www.wikidata.org/entity/Q905783 literal Istradefylline \n",
299 | "\n",
300 | " drugLabel.xml:lang entrez_id.type entrez_id.value gene.type \\\n",
301 | "0 en literal 140 uri \n",
302 | "1 en literal 140 uri \n",
303 | "2 en literal 140 uri \n",
304 | "3 en literal 140 uri \n",
305 | "4 en literal 140 uri \n",
306 | "\n",
307 | " gene.value geneLabel.type \\\n",
308 | "0 http://www.wikidata.org/entity/Q4682275 literal \n",
309 | "1 http://www.wikidata.org/entity/Q4682275 literal \n",
310 | "2 http://www.wikidata.org/entity/Q4682275 literal \n",
311 | "3 http://www.wikidata.org/entity/Q4682275 literal \n",
312 | "4 http://www.wikidata.org/entity/Q4682275 literal \n",
313 | "\n",
314 | " geneLabel.value geneLabel.xml:lang \n",
315 | "0 adenosine A3 receptor en \n",
316 | "1 adenosine A3 receptor en \n",
317 | "2 adenosine A3 receptor en \n",
318 | "3 adenosine A3 receptor en \n",
319 | "4 adenosine A3 receptor en "
320 | ]
321 | },
322 | "execution_count": 5,
323 | "metadata": {},
324 | "output_type": "execute_result"
325 | }
326 | ],
327 | "source": [
328 | "result_table.head()"
329 | ]
330 | },
331 | {
332 | "cell_type": "markdown",
333 | "metadata": {},
334 | "source": [
335 | "Compare the column names to the first line of the SPARQL query:\n",
336 | "\n",
337 | "SELECT ?drug ?drugLabel ?gene ?geneLabel ?entrez_id ?disease ?diseaseLabel\n",
338 | "\n",
339 | "Each of the SELECTed items (e.g. ?drug) results in two columns: one for its data type (either uri or literal) and one for its value (the thing you are probably looking for). String literal values for labels have an additional column indicating which language they are in. \n",
340 | "\n",
341 | "For the purposes of this exercise, we can simplify the table to keep only the string names of the drug, disease, and gene in each row of the results, as follows."
342 | ]
343 | },
344 | {
345 | "cell_type": "markdown",
346 | "metadata": {},
347 | "source": [
348 | "## Extract only the columns we care about"
349 | ]
350 | },
351 | {
352 | "cell_type": "code",
353 | "execution_count": 6,
354 | "metadata": {
355 | "collapsed": true
356 | },
357 | "outputs": [],
358 | "source": [
359 | "simple_table = result_table[[\"drugLabel.value\", \"diseaseLabel.value\", \"geneLabel.value\"]]"
360 | ]
361 | },
362 | {
363 | "cell_type": "code",
364 | "execution_count": 7,
365 | "metadata": {
366 | "collapsed": false
367 | },
368 | "outputs": [
369 | {
370 | "data": {
417 | "text/plain": [
418 | " drugLabel.value diseaseLabel.value geneLabel.value\n",
419 | "0 Caffeine obesity adenosine A3 receptor\n",
420 | "1 Adenosine obesity adenosine A3 receptor\n",
421 | "2 Theophylline obesity adenosine A3 receptor\n",
422 | "3 Nicardipine obesity adenosine A3 receptor\n",
423 | "4 Istradefylline obesity adenosine A3 receptor"
424 | ]
425 | },
426 | "execution_count": 7,
427 | "metadata": {},
428 | "output_type": "execute_result"
429 | }
430 | ],
431 | "source": [
432 | "simple_table.head()"
433 | ]
434 | },
435 | {
436 | "cell_type": "markdown",
437 | "metadata": {},
438 | "source": [
439 | "### Rename the columns of our simple table\n",
440 | "\n",
441 | "Let's delete the \"Label.value\" portion from the column names."
442 | ]
443 | },
444 | {
445 | "cell_type": "code",
446 | "execution_count": 8,
447 | "metadata": {
448 | "collapsed": false,
449 | "scrolled": true
450 | },
451 | "outputs": [],
452 | "source": [
453 | "simple_table = simple_table.rename(columns = lambda col: col.replace(\"Label.value\", \"\"))"
454 | ]
455 | },
456 | {
457 | "cell_type": "code",
458 | "execution_count": 9,
459 | "metadata": {
460 | "collapsed": false
461 | },
462 | "outputs": [
463 | {
464 | "data": {
511 | "text/plain": [
512 | " drug disease gene\n",
513 | "0 Caffeine obesity adenosine A3 receptor\n",
514 | "1 Adenosine obesity adenosine A3 receptor\n",
515 | "2 Theophylline obesity adenosine A3 receptor\n",
516 | "3 Nicardipine obesity adenosine A3 receptor\n",
517 | "4 Istradefylline obesity adenosine A3 receptor"
518 | ]
519 | },
520 | "execution_count": 9,
521 | "metadata": {},
522 | "output_type": "execute_result"
523 | }
524 | ],
525 | "source": [
526 | "simple_table.head()"
527 | ]
528 | },
529 | {
530 | "cell_type": "markdown",
531 | "metadata": {},
532 | "source": [
533 | "We have just grabbed the three columns we care about and renamed them."
534 | ]
535 | },
536 | {
537 | "cell_type": "markdown",
538 | "metadata": {},
539 | "source": [
540 | "## Count the number of genes per (drug, disease) pair\n",
541 | "\n",
542 | "How many genes link each unique (drug, disease) pair? After counting, order the (drug, disease) pairs in descending order of the number of linking genes. This is one simple way to find stronger connections between A and C concepts (here, drugs and diseases): rank them by the number of B concepts (here, genes) they share. "
543 | ]
544 | },
545 | {
546 | "cell_type": "markdown",
547 | "metadata": {},
548 | "source": [
549 | "### Make unique (drug, disease) groups and count the number of genes for each group"
550 | ]
551 | },
552 | {
553 | "cell_type": "code",
554 | "execution_count": 10,
555 | "metadata": {
556 | "collapsed": true
557 | },
558 | "outputs": [],
559 | "source": [
560 | "counts = simple_table.groupby([\"drug\", \"disease\"]).size()"
561 | ]
562 | },
563 | {
564 | "cell_type": "code",
565 | "execution_count": 11,
566 | "metadata": {
567 | "collapsed": false
568 | },
569 | "outputs": [
570 | {
571 | "data": {
572 | "text/plain": [
573 | "drug disease \n",
574 | "(2S,3S)-2-amino-3-phenylmethoxybutanedioic acid essential tremor 1\n",
575 | " schizophrenia 1\n",
576 | "1,9-Pyrazoloanthrone peripheral artery disease 1\n",
577 | "2-Aminoethoxydiphenyl borate obesity 1\n",
578 | "2-methyl-6-(phenylethynyl)pyridine Rheumatoid Arthritis 1\n",
579 | "dtype: int64"
580 | ]
581 | },
582 | "execution_count": 11,
583 | "metadata": {},
584 | "output_type": "execute_result"
585 | }
586 | ],
587 | "source": [
588 | "counts.head()"
589 | ]
590 | },
591 | {
592 | "cell_type": "markdown",
593 | "metadata": {},
594 | "source": [
595 | "### Turn result into a Pandas data frame (fancy smart table)"
596 | ]
597 | },
598 | {
599 | "cell_type": "code",
600 | "execution_count": 12,
601 | "metadata": {
602 | "collapsed": true
603 | },
604 | "outputs": [],
605 | "source": [
606 | "counts = counts.to_frame(\"gene_count\")"
607 | ]
608 | },
609 | {
610 | "cell_type": "code",
611 | "execution_count": 13,
612 | "metadata": {
613 | "collapsed": false
614 | },
615 | "outputs": [
616 | {
617 | "data": {
662 | "text/plain": [
663 | " gene_count\n",
664 | "drug disease \n",
665 | "(2S,3S)-2-amino-3-phenylmethoxybutanedioic acid essential tremor 1\n",
666 | " schizophrenia 1\n",
667 | "1,9-Pyrazoloanthrone peripheral artery disease 1\n",
668 | "2-Aminoethoxydiphenyl borate obesity 1\n",
669 | "2-methyl-6-(phenylethynyl)pyridine Rheumatoid Arthritis 1"
670 | ]
671 | },
672 | "execution_count": 13,
673 | "metadata": {},
674 | "output_type": "execute_result"
675 | }
676 | ],
677 | "source": [
678 | "counts.head()"
679 | ]
680 | },
681 | {
682 | "cell_type": "markdown",
683 | "metadata": {},
684 | "source": [
685 | "### Put the drug and disease labels back into columns"
686 | ]
687 | },
688 | {
689 | "cell_type": "code",
690 | "execution_count": 14,
691 | "metadata": {
692 | "collapsed": true
693 | },
694 | "outputs": [],
695 | "source": [
696 | "counts = counts.reset_index()"
697 | ]
698 | },
699 | {
700 | "cell_type": "code",
701 | "execution_count": 15,
702 | "metadata": {
703 | "collapsed": false
704 | },
705 | "outputs": [
706 | {
707 | "data": {
754 | "text/plain": [
755 | " drug disease \\\n",
756 | "0 (2S,3S)-2-amino-3-phenylmethoxybutanedioic acid essential tremor \n",
757 | "1 (2S,3S)-2-amino-3-phenylmethoxybutanedioic acid schizophrenia \n",
758 | "2 1,9-Pyrazoloanthrone peripheral artery disease \n",
759 | "3 2-Aminoethoxydiphenyl borate obesity \n",
760 | "4 2-methyl-6-(phenylethynyl)pyridine Rheumatoid Arthritis \n",
761 | "\n",
762 | " gene_count \n",
763 | "0 1 \n",
764 | "1 1 \n",
765 | "2 1 \n",
766 | "3 1 \n",
767 | "4 1 "
768 | ]
769 | },
770 | "execution_count": 15,
771 | "metadata": {},
772 | "output_type": "execute_result"
773 | }
774 | ],
775 | "source": [
776 | "counts.head()"
777 | ]
778 | },
779 | {
780 | "cell_type": "markdown",
781 | "metadata": {},
782 | "source": [
783 | "### Sort the table in descending order of gene count"
784 | ]
785 | },
786 | {
787 | "cell_type": "code",
788 | "execution_count": 16,
789 | "metadata": {
790 | "collapsed": true
791 | },
792 | "outputs": [],
793 | "source": [
794 | "counts = counts.sort_values(\"gene_count\", ascending = False)"
795 | ]
796 | },
797 | {
798 | "cell_type": "code",
799 | "execution_count": 17,
800 | "metadata": {
801 | "collapsed": false
802 | },
803 | "outputs": [
804 | {
805 | "data": {
852 | "text/plain": [
853 | " drug disease gene_count\n",
854 | "127 Caffeine obesity 3\n",
855 | "261 Givinostat obesity 2\n",
856 | "529 Trichostatin A obesity 2\n",
857 | "319 Linoleic acid diabetes mellitus type 2 2\n",
858 | "393 Panobinostat obesity 2"
859 | ]
860 | },
861 | "execution_count": 17,
862 | "metadata": {},
863 | "output_type": "execute_result"
864 | }
865 | ],
866 | "source": [
867 | "counts.head()"
868 | ]
869 | },
870 | {
871 | "cell_type": "markdown",
872 | "metadata": {},
873 | "source": [
874 | "### Number the results from 0 onwards"
875 | ]
876 | },
877 | {
878 | "cell_type": "code",
879 | "execution_count": 18,
880 | "metadata": {
881 | "collapsed": true
882 | },
883 | "outputs": [],
884 | "source": [
885 | "counts = counts.reset_index(drop = True)"
886 | ]
887 | },
888 | {
889 | "cell_type": "code",
890 | "execution_count": 19,
891 | "metadata": {
892 | "collapsed": false
893 | },
894 | "outputs": [
895 | {
896 | "data": {
943 | "text/plain": [
944 | " drug disease gene_count\n",
945 | "0 Caffeine obesity 3\n",
946 | "1 Givinostat obesity 2\n",
947 | "2 Trichostatin A obesity 2\n",
948 | "3 Linoleic acid diabetes mellitus type 2 2\n",
949 | "4 Panobinostat obesity 2"
950 | ]
951 | },
952 | "execution_count": 19,
953 | "metadata": {},
954 | "output_type": "execute_result"
955 | }
956 | ],
957 | "source": [
958 | "counts.head()"
959 | ]
960 | },
961 | {
962 | "cell_type": "markdown",
963 | "metadata": {},
964 | "source": [
965 | "---"
966 | ]
967 | },
968 | {
969 | "cell_type": "markdown",
970 | "metadata": {},
971 | "source": [
972 | "## Add the genes used to link each (drug, disease) pair\n",
973 | "\n",
974 | "Now add another column containing the actual genes linking each (drug, disease) pair."
975 | ]
976 | },
977 | {
978 | "cell_type": "markdown",
979 | "metadata": {},
980 | "source": [
981 | "### Create a dictionary containing the linking genes for each unique pair"
982 | ]
983 | },
984 | {
985 | "cell_type": "code",
986 | "execution_count": 20,
987 | "metadata": {
988 | "collapsed": true
989 | },
990 | "outputs": [],
991 | "source": [
992 | "linking_genes = dict()\n",
993 | "for (drug, disease), small_table in simple_table.groupby([\"drug\", \"disease\"]):\n",
994 | " linking_genes[(drug, disease)] = list(small_table[\"gene\"])"
995 | ]
996 | },
997 | {
998 | "cell_type": "markdown",
999 | "metadata": {},
1000 | "source": [
1001 | "### Example: retrieve the linking genes for (Caffeine, obesity)"
1002 | ]
1003 | },
1004 | {
1005 | "cell_type": "code",
1006 | "execution_count": 21,
1007 | "metadata": {
1008 | "collapsed": false
1009 | },
1010 | "outputs": [
1011 | {
1012 | "data": {
1013 | "text/plain": [
1014 | "['adenosine A3 receptor',\n",
1015 | " 'inositol 1,4,5-trisphosphate receptor, type 1',\n",
1016 | " 'RYR2']"
1017 | ]
1018 | },
1019 | "execution_count": 21,
1020 | "metadata": {},
1021 | "output_type": "execute_result"
1022 | }
1023 | ],
1024 | "source": [
1025 | "linking_genes[(\"Caffeine\", \"obesity\")]"
1026 | ]
1027 | },
1028 | {
1029 | "cell_type": "markdown",
1030 | "metadata": {},
1031 | "source": [
1032 | "### Make a new column containing the linking genes"
1033 | ]
1034 | },
1035 | {
1036 | "cell_type": "code",
1037 | "execution_count": 22,
1038 | "metadata": {
1039 | "collapsed": true
1040 | },
1041 | "outputs": [],
1042 | "source": [
1043 | "counts[\"genes\"] = counts[[\"drug\", \"disease\"]].apply(\n",
1044 | " lambda row: linking_genes[(row[\"drug\"], row[\"disease\"])],\n",
1045 | " axis = 1\n",
1046 | ")"
1047 | ]
1048 | },
1049 | {
1050 | "cell_type": "code",
1051 | "execution_count": 23,
1052 | "metadata": {
1053 | "collapsed": false
1054 | },
1055 | "outputs": [
1056 | {
1057 | "data": {
1110 | "text/plain": [
1111 | " drug disease gene_count \\\n",
1112 | "0 Caffeine obesity 3 \n",
1113 | "1 Givinostat obesity 2 \n",
1114 | "2 Trichostatin A obesity 2 \n",
1115 | "3 Linoleic acid diabetes mellitus type 2 2 \n",
1116 | "4 Panobinostat obesity 2 \n",
1117 | "\n",
1118 | " genes \n",
1119 | "0 [adenosine A3 receptor, inositol 1,4,5-trispho... \n",
1120 | "1 [histone deacetylase 9, histone deacetylase 7] \n",
1121 | "2 [histone deacetylase 9, histone deacetylase 7] \n",
1122 | "3 [hepatocyte nuclear factor 4, alpha, peroxisom... \n",
1123 | "4 [histone deacetylase 9, histone deacetylase 7] "
1124 | ]
1125 | },
1126 | "execution_count": 23,
1127 | "metadata": {},
1128 | "output_type": "execute_result"
1129 | }
1130 | ],
1131 | "source": [
1132 | "counts.head()"
1133 | ]
1134 | },
1135 | {
1136 | "cell_type": "markdown",
1137 | "metadata": {},
1138 | "source": [
1139 | "## Save to file"
1140 | ]
1141 | },
1142 | {
1143 | "cell_type": "code",
1144 | "execution_count": 24,
1145 | "metadata": {
1146 | "collapsed": true
1147 | },
1148 | "outputs": [],
1149 | "source": [
1150 | "counts.to_csv(\"drug_disease_count.tsv\", sep = '\\t', index = False, encoding = 'utf-8')"
1151 | ]
1152 | },
1153 | {
1154 | "cell_type": "markdown",
1155 | "metadata": {},
1156 | "source": [
1157 | "---"
1158 | ]
1159 | },
1160 | {
1161 | "cell_type": "markdown",
1162 | "metadata": {},
1163 | "source": [
1164 | "## Make Cytoscape file\n",
1165 | "\n",
1166 | "For the example query, sorting by shared gene count seems unsatisfying. Now, let's look at all the results in a network view to see whether any interesting patterns emerge that might shed some light on the data and how we might process it more effectively. \n",
1167 | "\n",
1168 | "Below we will create a file suitable for loading into Cytoscape. It will contain the three edge types of interest, linking drugs to genes, diseases to genes, and drugs to diseases, e.g.:\n",
1169 | "\n",
1170 | "source_node\tsource_type\tedge_type\ttarget_node\ttarget_type\n"
1171 | ]
1172 | },
1173 | {
1174 | "cell_type": "markdown",
1175 | "metadata": {},
1176 | "source": [
1177 | "### Start with the drug and gene pairs"
1178 | ]
1179 | },
1180 | {
1181 | "cell_type": "code",
1182 | "execution_count": 25,
1183 | "metadata": {
1184 | "collapsed": true
1185 | },
1186 | "outputs": [],
1187 | "source": [
1188 | "drug_gene_links = simple_table[[\"drug\", \"gene\"]]"
1189 | ]
1190 | },
1191 | {
1192 | "cell_type": "code",
1193 | "execution_count": 26,
1194 | "metadata": {
1195 | "collapsed": false
1196 | },
1197 | "outputs": [
1198 | {
1199 | "data": {
1240 | "text/plain": [
1241 | " drug gene\n",
1242 | "0 Caffeine adenosine A3 receptor\n",
1243 | "1 Adenosine adenosine A3 receptor\n",
1244 | "2 Theophylline adenosine A3 receptor\n",
1245 | "3 Nicardipine adenosine A3 receptor\n",
1246 | "4 Istradefylline adenosine A3 receptor"
1247 | ]
1248 | },
1249 | "execution_count": 26,
1250 | "metadata": {},
1251 | "output_type": "execute_result"
1252 | }
1253 | ],
1254 | "source": [
1255 | "drug_gene_links.head()"
1256 | ]
1257 | },
1258 | {
1259 | "cell_type": "markdown",
1260 | "metadata": {},
1261 | "source": [
1262 | "### Rename the columns"
1263 | ]
1264 | },
1265 | {
1266 | "cell_type": "code",
1267 | "execution_count": 27,
1268 | "metadata": {
1269 | "collapsed": true
1270 | },
1271 | "outputs": [],
1272 | "source": [
1273 | "drug_gene_links = drug_gene_links.rename(columns = {\"drug\": \"source_node\", \"gene\": \"target_node\"})"
1274 | ]
1275 | },
1276 | {
1277 | "cell_type": "code",
1278 | "execution_count": 28,
1279 | "metadata": {
1280 | "collapsed": false
1281 | },
1282 | "outputs": [
1283 | {
1284 | "data": {
1325 | "text/plain": [
1326 | " source_node target_node\n",
1327 | "0 Caffeine adenosine A3 receptor\n",
1328 | "1 Adenosine adenosine A3 receptor\n",
1329 | "2 Theophylline adenosine A3 receptor\n",
1330 | "3 Nicardipine adenosine A3 receptor\n",
1331 | "4 Istradefylline adenosine A3 receptor"
1332 | ]
1333 | },
1334 | "execution_count": 28,
1335 | "metadata": {},
1336 | "output_type": "execute_result"
1337 | }
1338 | ],
1339 | "source": [
1340 | "drug_gene_links.head()"
1341 | ]
1342 | },
1343 | {
1344 | "cell_type": "markdown",
1345 | "metadata": {},
1346 | "source": [
1347 | "### Create a new column specifying the source node type"
1348 | ]
1349 | },
1350 | {
1351 | "cell_type": "code",
1352 | "execution_count": 29,
1353 | "metadata": {
1354 | "collapsed": true
1355 | },
1356 | "outputs": [],
1357 | "source": [
1358 | "drug_gene_links[\"source_type\"] = \"drug\""
1359 | ]
1360 | },
1361 | {
1362 | "cell_type": "code",
1363 | "execution_count": 30,
1364 | "metadata": {
1365 | "collapsed": false
1366 | },
1367 | "outputs": [
1368 | {
1369 | "data": {
1416 | "text/plain": [
1417 | " source_node target_node source_type\n",
1418 | "0 Caffeine adenosine A3 receptor drug\n",
1419 | "1 Adenosine adenosine A3 receptor drug\n",
1420 | "2 Theophylline adenosine A3 receptor drug\n",
1421 | "3 Nicardipine adenosine A3 receptor drug\n",
1422 | "4 Istradefylline adenosine A3 receptor drug"
1423 | ]
1424 | },
1425 | "execution_count": 30,
1426 | "metadata": {},
1427 | "output_type": "execute_result"
1428 | }
1429 | ],
1430 | "source": [
1431 | "drug_gene_links.head()"
1432 | ]
1433 | },
1434 | {
1435 | "cell_type": "markdown",
1436 | "metadata": {},
1437 | "source": [
1438 | "### Create a new column containing the edge type"
1439 | ]
1440 | },
1441 | {
1442 | "cell_type": "code",
1443 | "execution_count": 31,
1444 | "metadata": {
1445 | "collapsed": true
1446 | },
1447 | "outputs": [],
1448 | "source": [
1449 | "drug_gene_links[\"edge_type\"] = \"interacts_with\""
1450 | ]
1451 | },
1452 | {
1453 | "cell_type": "code",
1454 | "execution_count": 32,
1455 | "metadata": {
1456 | "collapsed": false
1457 | },
1458 | "outputs": [
1459 | {
1460 | "data": {
1513 | "text/plain": [
1514 | " source_node target_node source_type edge_type\n",
1515 | "0 Caffeine adenosine A3 receptor drug interacts_with\n",
1516 | "1 Adenosine adenosine A3 receptor drug interacts_with\n",
1517 | "2 Theophylline adenosine A3 receptor drug interacts_with\n",
1518 | "3 Nicardipine adenosine A3 receptor drug interacts_with\n",
1519 | "4 Istradefylline adenosine A3 receptor drug interacts_with"
1520 | ]
1521 | },
1522 | "execution_count": 32,
1523 | "metadata": {},
1524 | "output_type": "execute_result"
1525 | }
1526 | ],
1527 | "source": [
1528 | "drug_gene_links.head()"
1529 | ]
1530 | },
1531 | {
1532 | "cell_type": "markdown",
1533 | "metadata": {},
1534 | "source": [
1535 | "### Create a new column containing the target type"
1536 | ]
1537 | },
1538 | {
1539 | "cell_type": "code",
1540 | "execution_count": 33,
1541 | "metadata": {
1542 | "collapsed": true
1543 | },
1544 | "outputs": [],
1545 | "source": [
1546 | "drug_gene_links[\"target_type\"] = \"gene\""
1547 | ]
1548 | },
1549 | {
1550 | "cell_type": "code",
1551 | "execution_count": 34,
1552 | "metadata": {
1553 | "collapsed": false
1554 | },
1555 | "outputs": [
1556 | {
1557 | "data": {
1616 | "text/plain": [
1617 | " source_node target_node source_type edge_type \\\n",
1618 | "0 Caffeine adenosine A3 receptor drug interacts_with \n",
1619 | "1 Adenosine adenosine A3 receptor drug interacts_with \n",
1620 | "2 Theophylline adenosine A3 receptor drug interacts_with \n",
1621 | "3 Nicardipine adenosine A3 receptor drug interacts_with \n",
1622 | "4 Istradefylline adenosine A3 receptor drug interacts_with \n",
1623 | "\n",
1624 | " target_type \n",
1625 | "0 gene \n",
1626 | "1 gene \n",
1627 | "2 gene \n",
1628 | "3 gene \n",
1629 | "4 gene "
1630 | ]
1631 | },
1632 | "execution_count": 34,
1633 | "metadata": {},
1634 | "output_type": "execute_result"
1635 | }
1636 | ],
1637 | "source": [
1638 | "drug_gene_links.head()"
1639 | ]
1640 | },
1641 | {
1642 | "cell_type": "markdown",
1643 | "metadata": {},
1644 | "source": [
1645 | "### Repeat for the disease and gene pairs"
1646 | ]
1647 | },
1648 | {
1649 | "cell_type": "code",
1650 | "execution_count": 35,
1651 | "metadata": {
1652 | "collapsed": true
1653 | },
1654 | "outputs": [],
1655 | "source": [
1656 | "disease_gene_links = simple_table[[\"disease\", \"gene\"]]"
1657 | ]
1658 | },
1659 | {
1660 | "cell_type": "code",
1661 | "execution_count": 36,
1662 | "metadata": {
1663 | "collapsed": false
1664 | },
1665 | "outputs": [
1666 | {
1667 | "data": {
1708 | "text/plain": [
1709 | " disease gene\n",
1710 | "0 obesity adenosine A3 receptor\n",
1711 | "1 obesity adenosine A3 receptor\n",
1712 | "2 obesity adenosine A3 receptor\n",
1713 | "3 obesity adenosine A3 receptor\n",
1714 | "4 obesity adenosine A3 receptor"
1715 | ]
1716 | },
1717 | "execution_count": 36,
1718 | "metadata": {},
1719 | "output_type": "execute_result"
1720 | }
1721 | ],
1722 | "source": [
1723 | "disease_gene_links.head()"
1724 | ]
1725 | },
1726 | {
1727 | "cell_type": "markdown",
1728 | "metadata": {},
1729 | "source": [
1730 | "### Rename the columns"
1731 | ]
1732 | },
1733 | {
1734 | "cell_type": "code",
1735 | "execution_count": 37,
1736 | "metadata": {
1737 | "collapsed": true
1738 | },
1739 | "outputs": [],
1740 | "source": [
1741 | "disease_gene_links = disease_gene_links.rename(columns = {\"disease\": \"source_node\", \"gene\": \"target_node\"})"
1742 | ]
1743 | },
1744 | {
1745 | "cell_type": "code",
1746 | "execution_count": 38,
1747 | "metadata": {
1748 | "collapsed": false
1749 | },
1750 | "outputs": [
1751 | {
1752 | "data": {
1793 | "text/plain": [
1794 | " source_node target_node\n",
1795 | "0 obesity adenosine A3 receptor\n",
1796 | "1 obesity adenosine A3 receptor\n",
1797 | "2 obesity adenosine A3 receptor\n",
1798 | "3 obesity adenosine A3 receptor\n",
1799 | "4 obesity adenosine A3 receptor"
1800 | ]
1801 | },
1802 | "execution_count": 38,
1803 | "metadata": {},
1804 | "output_type": "execute_result"
1805 | }
1806 | ],
1807 | "source": [
1808 | "disease_gene_links.head()"
1809 | ]
1810 | },
1811 | {
1812 | "cell_type": "markdown",
1813 | "metadata": {},
1814 | "source": [
1815 | "### Create the new columns"
1816 | ]
1817 | },
1818 | {
1819 | "cell_type": "code",
1820 | "execution_count": 39,
1821 | "metadata": {
1822 | "collapsed": true
1823 | },
1824 | "outputs": [],
1825 | "source": [
1826 | "disease_gene_links[\"source_type\"] = \"disease\"\n",
1827 | "disease_gene_links[\"edge_type\"] = \"associated_with\"\n",
1828 | "disease_gene_links[\"target_type\"] = \"gene\""
1829 | ]
1830 | },
1831 | {
1832 | "cell_type": "code",
1833 | "execution_count": 40,
1834 | "metadata": {
1835 | "collapsed": false
1836 | },
1837 | "outputs": [
1838 | {
1839 | "data": {
1898 | "text/plain": [
1899 | " source_node target_node source_type edge_type target_type\n",
1900 | "0 obesity adenosine A3 receptor disease associated_with gene\n",
1901 | "1 obesity adenosine A3 receptor disease associated_with gene\n",
1902 | "2 obesity adenosine A3 receptor disease associated_with gene\n",
1903 | "3 obesity adenosine A3 receptor disease associated_with gene\n",
1904 | "4 obesity adenosine A3 receptor disease associated_with gene"
1905 | ]
1906 | },
1907 | "execution_count": 40,
1908 | "metadata": {},
1909 | "output_type": "execute_result"
1910 | }
1911 | ],
1912 | "source": [
1913 | "disease_gene_links.head()"
1914 | ]
1915 | },
1916 | {
1917 | "cell_type": "code",
1918 | "execution_count": 41,
1919 | "metadata": {
1920 | "collapsed": true
1921 | },
1922 | "outputs": [],
1923 | "source": [
1924 | "drug_disease_links = (simple_table\n",
1925 | " [[\"drug\", \"disease\"]]\n",
1926 | " .assign(\n",
1927 | " source_type = \"drug\",\n",
1928 | " edge_type = \"may treat\",\n",
1929 | " target_type = \"disease\"\n",
1930 | " )\n",
1931 | " .rename(columns = {\"drug\": \"source_node\", \"disease\": \"target_node\"})\n",
1932 | ")"
1933 | ]
1934 | },
1935 | {
1936 | "cell_type": "markdown",
1937 | "metadata": {},
1938 | "source": [
1939 | "## Join the (disease, gene), (drug, gene), and (drug, disease) tables together"
1940 | ]
1941 | },
1942 | {
1943 | "cell_type": "markdown",
1944 | "metadata": {},
1945 | "source": [
1946 | "### Number of rows of each table"
1947 | ]
1948 | },
1949 | {
1950 | "cell_type": "code",
1951 | "execution_count": 42,
1952 | "metadata": {
1953 | "collapsed": false
1954 | },
1955 | "outputs": [
1956 | {
1957 | "data": {
1958 | "text/plain": [
1959 | "595"
1960 | ]
1961 | },
1962 | "execution_count": 42,
1963 | "metadata": {},
1964 | "output_type": "execute_result"
1965 | }
1966 | ],
1967 | "source": [
1968 | "len(drug_gene_links)"
1969 | ]
1970 | },
1971 | {
1972 | "cell_type": "code",
1973 | "execution_count": 43,
1974 | "metadata": {
1975 | "collapsed": false
1976 | },
1977 | "outputs": [
1978 | {
1979 | "data": {
1980 | "text/plain": [
1981 | "595"
1982 | ]
1983 | },
1984 | "execution_count": 43,
1985 | "metadata": {},
1986 | "output_type": "execute_result"
1987 | }
1988 | ],
1989 | "source": [
1990 | "len(disease_gene_links)"
1991 | ]
1992 | },
1993 | {
1994 | "cell_type": "code",
1995 | "execution_count": 44,
1996 | "metadata": {
1997 | "collapsed": false
1998 | },
1999 | "outputs": [
2000 | {
2001 | "data": {
2002 | "text/plain": [
2003 | "595"
2004 | ]
2005 | },
2006 | "execution_count": 44,
2007 | "metadata": {},
2008 | "output_type": "execute_result"
2009 | }
2010 | ],
2011 | "source": [
2012 | "len(drug_disease_links)"
2013 | ]
2014 | },
2015 | {
2016 | "cell_type": "markdown",
2017 | "metadata": {},
2018 | "source": [
2019 | "### Join all three tables together"
2020 | ]
2021 | },
2022 | {
2023 | "cell_type": "code",
2024 | "execution_count": 45,
2025 | "metadata": {
2026 | "collapsed": false
2027 | },
2028 | "outputs": [],
2029 | "source": [
2030 | "cytoscape_edges = pd.concat([drug_gene_links, disease_gene_links, drug_disease_links])"
2031 | ]
2032 | },
2033 | {
2034 | "cell_type": "code",
2035 | "execution_count": 46,
2036 | "metadata": {
2037 | "collapsed": false
2038 | },
2039 | "outputs": [
2040 | {
2041 | "data": {
2100 | "text/plain": [
2101 | " edge_type source_node source_type target_node \\\n",
2102 | "0 interacts_with Caffeine drug adenosine A3 receptor \n",
2103 | "1 interacts_with Adenosine drug adenosine A3 receptor \n",
2104 | "2 interacts_with Theophylline drug adenosine A3 receptor \n",
2105 | "3 interacts_with Nicardipine drug adenosine A3 receptor \n",
2106 | "4 interacts_with Istradefylline drug adenosine A3 receptor \n",
2107 | "\n",
2108 | " target_type \n",
2109 | "0 gene \n",
2110 | "1 gene \n",
2111 | "2 gene \n",
2112 | "3 gene \n",
2113 | "4 gene "
2114 | ]
2115 | },
2116 | "execution_count": 46,
2117 | "metadata": {},
2118 | "output_type": "execute_result"
2119 | }
2120 | ],
2121 | "source": [
2122 | "cytoscape_edges.head()"
2123 | ]
2124 | },
2125 | {
2126 | "cell_type": "code",
2127 | "execution_count": 47,
2128 | "metadata": {
2129 | "collapsed": false
2130 | },
2131 | "outputs": [
2132 | {
2133 | "data": {
2134 | "text/plain": [
2135 | "1785"
2136 | ]
2137 | },
2138 | "execution_count": 47,
2139 | "metadata": {},
2140 | "output_type": "execute_result"
2141 | }
2142 | ],
2143 | "source": [
2144 | "len(cytoscape_edges)"
2145 | ]
2146 | },
2147 | {
2148 | "cell_type": "markdown",
2149 | "metadata": {},
2150 | "source": [
2151 | "Note that the final result has 1785 (= 595 * 3) rows: each of the 595 rows returned by the original query contributes one edge of each of the three types."
2152 | ]
2153 | },
2154 | {
2155 | "cell_type": "markdown",
2156 | "metadata": {},
2157 | "source": [
2158 | "### Reorder columns"
2159 | ]
2160 | },
2161 | {
2162 | "cell_type": "code",
2163 | "execution_count": 48,
2164 | "metadata": {
2165 | "collapsed": true
2166 | },
2167 | "outputs": [],
2168 | "source": [
2169 | "cytoscape_edges = cytoscape_edges[[\"source_node\", \"source_type\", \"edge_type\", \"target_node\", \"target_type\"]]"
2170 | ]
2171 | },
2172 | {
2173 | "cell_type": "code",
2174 | "execution_count": 49,
2175 | "metadata": {
2176 | "collapsed": false
2177 | },
2178 | "outputs": [
2179 | {
2180 | "data": {
2239 | "text/plain": [
2240 | " source_node source_type edge_type target_node \\\n",
2241 | "0 Caffeine drug interacts_with adenosine A3 receptor \n",
2242 | "1 Adenosine drug interacts_with adenosine A3 receptor \n",
2243 | "2 Theophylline drug interacts_with adenosine A3 receptor \n",
2244 | "3 Nicardipine drug interacts_with adenosine A3 receptor \n",
2245 | "4 Istradefylline drug interacts_with adenosine A3 receptor \n",
2246 | "\n",
2247 | " target_type \n",
2248 | "0 gene \n",
2249 | "1 gene \n",
2250 | "2 gene \n",
2251 | "3 gene \n",
2252 | "4 gene "
2253 | ]
2254 | },
2255 | "execution_count": 49,
2256 | "metadata": {},
2257 | "output_type": "execute_result"
2258 | }
2259 | ],
2260 | "source": [
2261 | "cytoscape_edges.head()"
2262 | ]
2263 | },
2264 | {
2265 | "cell_type": "markdown",
2266 | "metadata": {},
2267 | "source": [
2268 | "## Save file to disk"
2269 | ]
2270 | },
2271 | {
2272 | "cell_type": "code",
2273 | "execution_count": 50,
2274 | "metadata": {
2275 | "collapsed": true
2276 | },
2277 | "outputs": [],
2278 | "source": [
2279 | "cytoscape_edges.to_csv(\"drug_gene_disease_network.txt\", sep = '\\t', index = False, encoding = 'utf-8')"
2280 | ]
2281 | },
2282 | {
2283 | "cell_type": "markdown",
2284 | "metadata": {},
2285 | "source": [
2286 | "- Open a new network in Cytoscape and load the file via File > Import > Network.\n",
2287 | "- Use Cytoscape to develop a visualization.\n",
2288 | "- Submit the tab-delimited output files, an image of your network, and an explanation of what you did as your assignment.\n",
2289 | "- Are there any notable hubs in the network?\n",
2290 | "- Could you extend the code to identify them automatically?\n",
2291 | "- Can you adapt the code to search for a specific drug or a specific disease?"
2292 | ]
2293 | }
2294 | ],
2295 | "metadata": {
2296 | "kernelspec": {
2297 | "display_name": "Python 3",
2298 | "language": "python",
2299 | "name": "python3"
2300 | },
2301 | "language_info": {
2302 | "codemirror_mode": {
2303 | "name": "ipython",
2304 | "version": 3
2305 | },
2306 | "file_extension": ".py",
2307 | "mimetype": "text/x-python",
2308 | "name": "python",
2309 | "nbconvert_exporter": "python",
2310 | "pygments_lexer": "ipython3",
2311 | "version": "3.4.3"
2312 | }
2313 | },
2314 | "nbformat": 4,
2315 | "nbformat_minor": 0
2316 | }
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | backports-abc==0.4
2 | backports.shutil-get-terminal-size==1.0.0
3 | decorator==4.0.9
4 | entrypoints==0.2.2
5 | ipykernel==4.3.1
6 | ipython==4.2.0
7 | ipython-genutils==0.1.0
8 | ipywidgets==5.1.5
9 | isodate==0.5.4
10 | Jinja2==2.8
11 | jsonschema==2.5.1
12 | jupyter==1.0.0
13 | jupyter-client==4.2.2
14 | jupyter-console==4.1.1
15 | jupyter-core==4.1.0
16 | keepalive==0.5
17 | MarkupSafe==0.23
18 | mistune==0.8.1
19 | nbconvert==4.2.0
20 | nbformat==4.0.1
21 | notebook==5.7.8
22 | numpy==1.11.0
23 | pandas==0.18.1
24 | pexpect==4.1.0
25 | pickleshare==0.7.2
26 | ptyprocess==0.5.1
27 | Pygments==2.1.3
28 | pyparsing==2.1.4
29 | python-dateutil==2.5.3
30 | pytz==2016.4
31 | pyzmq==15.2.0
32 | qtconsole==4.2.1
33 | rdflib==4.2.1
34 | simplegeneric==0.8.1
35 | six==1.10.0
36 | SPARQLWrapper==1.7.6
37 | terminado==0.6
38 | tornado==4.3
39 | traitlets==4.2.1
40 | widgetsnbextension==1.2.3
41 |
--------------------------------------------------------------------------------