├── .gitignore
├── README.md
├── notebooks
    ├── gwas_cat_rest.ipynb
    ├── sumstats_rest.ipynb
    └── workshop_01.ipynb
├── requirements.txt
└── slides
    ├── API_slides.pdf
    └── README.md


/.gitignore:
--------------------------------------------------------------------------------
1 | # Checkpoint folder:
2 | .ipynb_checkpoints/
3 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # GWAS Catalog and Summary Statistics REST API Workshop
 2 | 
 3 | This repository contains training materials for the GWAS Catalog and Summary Statistics REST API workshop. The workshop provides examples of how to extract data from the GWAS Catalog for the most frequent use cases. For complete details, see the [GWAS Catalog REST API documentation](https://www.ebi.ac.uk/gwas/rest/docs/api) and the [GWAS Catalog Summary Statistics REST API documentation](https://www.ebi.ac.uk/gwas/summary-statistics/docs/).
 4 | 
 5 | ### Repository contents:
 6 | 
 7 | * **slides**: the REST API presentation materials
 8 | * **notebooks**: Jupyter notebooks containing interactive code examples that demonstrate how to extract and parse data from the GWAS Catalog REST API and the GWAS Catalog Summary Statistics REST API.
 9 | 
10 | ## Startup
11 | 
12 | ### 1. If git and Jupyter are available on your local machine
13 | 
14 | Clone the repository to a local folder, start Jupyter Notebook, and open the workshop notebook.
15 | 
16 | ```bash
17 | git clone https://github.com/EBISPOT/GWAS_Catalog-workshop 
18 | cd GWAS_Catalog-workshop/notebooks
19 | jupyter notebook
20 | ```
21 | Then open the notebook you are interested in.
22 | 
23 | ### 2. If you have a Google account
24 | 
25 | With a Google account, you can open the interactive notebooks on a [Google Colab](https://colab.research.google.com) virtual machine.
26 | 
27 | Use the following links to start up virtual machines:
28 | 
29 | * [workshop_01](https://colab.research.google.com/github/EBISPOT/GWAS_Catalog-workshop/blob/master/notebooks/workshop_01.ipynb)
30 | 
31 | No modification will be saved in the repository, so there is no need to be careful! Have fun! 
32 | 
33 | ### 3. Without a Google account
34 | 
35 | Publicly stored Jupyter notebooks can be loaded and run on a remote server hosted by [Binder](https://mybinder.org/). The required Python packages are read from the `requirements.txt` file. The first build installs the required packages, so it takes longer; subsequent startups will be faster.
36 | 
37 | After the first build a direct link can also be used to access the virtual environment: 
38 | 
39 | * [workshop_01](https://mybinder.org/v2/gh/EBISPOT/GWAS_Catalog-workshop/master?filepath=notebooks%2Fworkshop_01.ipynb)
40 | 
41 | No modification will be saved, so there is no need to be careful! Have fun! 
42 | 
43 | ## Available notebooks:
44 | 
45 | ### workshop_01.ipynb
46 | 
47 | * Contains a short description of the data returned by the API.
48 | * Basic use-cases to fetch association data from the GWAS Catalog REST API and Summary Statistics API.
49 | 
50 | ### gwas_cat_rest.ipynb
51 | 
52 | * Basic use-cases to fetch association data from the GWAS Catalog REST API
53 | 
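As a rough sketch of the parsing done in this notebook: each association in a decoded response lists its traits under `efoTraits` and its p-value under `pvalue`. The `sample_response` below is a made-up stand-in with that layout (its values are invented, not real Catalog data), and `extract_traits_and_pvalues` is a hypothetical helper name used only for illustration.

```python
# Hypothetical, hand-written stand-in for a decoded API response.
# Only the key layout mirrors the notebook; the values are invented.
sample_response = {
    "_embedded": {
        "associations": [
            {"pvalue": 2e-08, "efoTraits": [{"trait": "trait A"}, {"trait": "trait B"}]},
            {"pvalue": 5e-10, "efoTraits": [{"trait": "trait C"}]},
        ]
    }
}


def extract_traits_and_pvalues(decoded):
    """Return one (traits, pvalue) tuple per association in a decoded response."""
    rows = []
    # .get() with defaults keeps the loop safe when no associations are present
    for association in decoded.get("_embedded", {}).get("associations", []):
        traits = ",".join(t["trait"] for t in association["efoTraits"])
        rows.append((traits, association["pvalue"]))
    return rows


for traits, pvalue in extract_traits_and_pvalues(sample_response):
    print("Trait: %s, p-value: %s" % (traits, pvalue))
```

In the notebook itself, the decoded dictionary comes from `requests.get(...).json()` instead of a hand-written sample.
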
54 | ### sumstats_rest.ipynb
55 | 
56 | * Introductory exercise to the GWAS Catalog's summary statistics API.
57 | 
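A minimal sketch of how a Summary Statistics request URL can be assembled, assuming the API root and the `p_upper` and `size` query parameters used in the notebook. The helper name `trait_associations_url` is hypothetical; the block only builds the URL and does not contact the API.

```python
from urllib.parse import urlencode

# API root used in the summary statistics notebook
SUMSTATS_API = "https://www.ebi.ac.uk/gwas/summary-statistics/api"


def trait_associations_url(trait, p_upper=None, size=None):
    """Build the associations URL for an EFO trait, adding optional filters."""
    url = "{api}/traits/{trait}/associations".format(api=SUMSTATS_API, trait=trait)
    # Keep only the parameters that were actually supplied
    params = {k: v for k, v in (("p_upper", p_upper), ("size", size)) if v is not None}
    if params:
        url += "?" + urlencode(params)
    return url


print(trait_associations_url("EFO_0001360", p_upper=1.0e-5, size=10))
```

The resulting string can be passed to `requests.get`, as the notebook does with its own `request_url`.
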
58 | ## Links
59 | 
60 | * [GWAS Catalog REST API](https://www.ebi.ac.uk/gwas/rest/api)
61 | * [GWAS Catalog REST API documentation](https://www.ebi.ac.uk/gwas/rest/docs)
62 | * [GWAS Catalog Summary Statistics REST API](https://www.ebi.ac.uk/gwas/summary-statistics/api)
63 | * [GWAS Catalog Summary Statistics REST API documentation](https://www.ebi.ac.uk/gwas/summary-statistics/docs/)
64 | 
65 | **Other readings:**
66 | 
67 | * Jupyter notebook tutorial: [link](https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook)
68 | * Binder documentation: [link](https://mybinder.readthedocs.io/en/latest/)
69 | * Opening Jupyter notebooks hosted on GitHub via Google Colab: [link](https://medium.com/@steve7an/how-to-test-jupyter-notebook-from-github-via-google-colab-7dc4b9b11a19)
70 | 
71 | ## Feedback
72 | 
73 | We would love to hear feedback from you! Please send your comments to [gwas-info@ebi.ac.uk](mailto:gwas-info@ebi.ac.uk).
74 | 


--------------------------------------------------------------------------------
/notebooks/gwas_cat_rest.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# GWAS Catalog REST API workshop\n",
  8 |     "\n",
  9 |     "* The following is a basic example of how to access and parse data from the GWAS Catalog through the REST API. \n",
 10 |     "* Although this example is written in Python, any other programming language works equally well.\n",
 11 |     "* Examples in other languages will be available soon.\n",
 12 |     "\n",
 13 |     "\n",
 14 |     "### Contents:\n",
 15 |     "\n",
 16 |     "* **Exercise 1**: fetching data from the API manually, via a browser\n",
 17 |     "* **Exercise 2**: fetching data programmatically for a single variant\n",
 18 |     "* **Exercise 3**: fetching data for a list of variants\n",
 19 |     "* **Exercise 4**: fetching and merging data from multiple endpoints"
 20 |    ]
 21 |   },
 22 |   {
 23 |    "cell_type": "markdown",
 24 |    "metadata": {},
 25 |    "source": [
 26 |     "## Exercise 1\n",
 27 |     "\n",
 28 |     "Fetching data of a single study with accession ID [GCST001795](https://www.ebi.ac.uk/gwas/studies/GCST001795) from the GWAS Catalog REST API using a browser.\n",
 29 |     "\n",
 30 |     "**Generating the URL:**\n",
 31 |     "\n",
 32 |     "* API URL: `https://www.ebi.ac.uk/gwas/rest/api`\n",
 33 |     "* Endpoint: `studies`\n",
 34 |     "* AccessionID: `GCST001795`\n",
 35 |     "\n",
 36 |     "**URL:**\n",
 37 |     "\n",
 38 |     "[https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795](https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795)"
 39 |    ]
 40 |   },
 41 |   {
 42 |    "cell_type": "markdown",
 43 |    "metadata": {},
 44 |    "source": [
 45 |     "### Understanding the returned data:\n",
 46 |     "\n",
 47 |     "* A number of simple key-value pairs, e.g.:\n",
 48 |     "\n",
 49 |     "```json\n",
 50 |     "    \"initialSampleSize\" : \"1,656 Han Chinese ancestry cases, 3,394 Han Chinese ancestry controls\",\n",
 51 |     "    \"snpCount\" : 2100739,\n",
 52 |     "    \"imputed\" : true,\n",
 53 |     "    \"accessionId\" : \"GCST001795\",\n",
 54 |     "```\n",
 55 |     "\n",
 56 |     "* List allowing multiple elements for a key:\n",
 57 |     "\n",
 58 |     "```json\n",
 59 |     "    \"genotypingTechnologies\" : [ {\n",
 60 |     "        \"genotypingTechnology\" : \"Genome-wide genotyping array\"\n",
 61 |     "    } ],\n",
 62 |     "```\n",
 63 |     "* List where the values are themselves complex objects, e.g. ancestries.\n",
 64 |     "\n",
 65 |     "\n",
 66 |     "* The returned data is highly structured and easy for a computer to read. \n",
 67 |     "* The same information is accessible via the UI.\n",
 68 |     "\n",
 69 |     "In the following examples we write small Python scripts that organize this data to make it easy for humans to read."
 70 |    ]
 71 |   },
 72 |   {
 73 |    "cell_type": "markdown",
 74 |    "metadata": {},
 75 |    "source": [
 76 |     "## Exercise 2\n",
 77 |     "\n",
 78 |     "Fetch the trait and p-value of all associations for a single rsID ([rs7329174](https://www.ebi.ac.uk/gwas/variants/rs7329174))"
 79 |    ]
 80 |   },
 81 |   {
 82 |    "cell_type": "code",
 83 |    "execution_count": null,
 84 |    "metadata": {
 85 |     "ExecuteTime": {
 86 |      "end_time": "2019-06-16T07:26:46.706202Z",
 87 |      "start_time": "2019-06-16T07:26:45.818867Z"
 88 |     }
 89 |    },
 90 |    "outputs": [],
 91 |    "source": [
 92 |     "# Importing required packages\n",
 93 |     "import requests     # Manages data transfer from the GWAS Catalog REST API\n",
 94 |     "import pandas as pd # Makes data handling easier\n",
 95 |     "import json         # Handling the returned data type called JSON\n",
 96 |     "from collections import OrderedDict\n",
 97 |     "\n",
 98 |     "print(\"[Info] Required libraries are loaded.\")"
 99 |    ]
100 |   },
101 |   {
102 |    "cell_type": "markdown",
103 |    "metadata": {},
104 |    "source": [
105 |     "### Return association data:"
106 |    ]
107 |   },
108 |   {
109 |    "cell_type": "code",
110 |    "execution_count": null,
111 |    "metadata": {
112 |     "ExecuteTime": {
113 |      "end_time": "2019-06-16T07:27:57.616887Z",
114 |      "start_time": "2019-06-16T07:27:56.894982Z"
115 |     },
116 |     "scrolled": true
117 |    },
118 |    "outputs": [],
119 |    "source": [
120 |     "# API Address:\n",
121 |     "apiUrl = 'https://www.ebi.ac.uk/gwas/rest/api'\n",
122 |     "\n",
123 |     "# Accessing data for a single variant:\n",
124 |     "variant = 'rs7329174'\n",
125 |     "requestUrl = '%s/singleNucleotidePolymorphisms/%s/associations?projection=associationBySnp' %(apiUrl, variant)\n",
126 |     "response = requests.get(requestUrl, headers={ \"Content-Type\" : \"application/json\"})\n",
127 |     "\n",
128 |     "# The returned response is a \"response\" object, from which we have to extract and parse the information:\n",
129 |     "decoded = response.json()\n",
130 |     "\n",
131 |     "# The returned information is parsed as a python dictionary. Take a look at the values:\n",
132 |     "print(json.dumps(decoded, indent = 2))"
133 |    ]
134 |   },
135 |   {
136 |    "cell_type": "markdown",
137 |    "metadata": {},
138 |    "source": [
139 |     "### Parsing returned data to get traits and p-values\n",
140 |     "\n",
141 |     "To find out how the returned data is structured, visit the API documentation [here](https://www.ebi.ac.uk/gwas/rest/docs/api)."
142 |    ]
143 |   },
144 |   {
145 |    "cell_type": "code",
146 |    "execution_count": null,
147 |    "metadata": {
148 |     "ExecuteTime": {
149 |      "end_time": "2019-06-16T07:28:29.353414Z",
150 |      "start_time": "2019-06-16T07:28:29.345736Z"
151 |     }
152 |    },
153 |    "outputs": [],
154 |    "source": [
155 |     "for association in decoded['_embedded']['associations']:\n",
156 |     "    trait = \",\".join([trait['trait'] for trait in association['efoTraits']])\n",
157 |     "    pvalue = association['pvalue']\n",
158 |     "    \n",
159 |     "    print(\"Trait: %s, p-value: %s\" %(trait, pvalue))\n"
160 |    ]
161 |   },
162 |   {
163 |    "cell_type": "markdown",
164 |    "metadata": {},
165 |    "source": [
166 |     "The same associations (trait and p-values) can be found in the UI: [rs7329174](https://www.ebi.ac.uk/gwas/variants/rs7329174)"
167 |    ]
168 |   },
169 |   {
170 |    "cell_type": "markdown",
171 |    "metadata": {
172 |     "ExecuteTime": {
173 |      "end_time": "2019-03-21T21:53:49.832293Z",
174 |      "start_time": "2019-03-21T21:53:49.826997Z"
175 |     }
176 |    },
177 |    "source": [
178 |     "## Exercise 3\n",
179 |     "\n",
180 |     "1. Fetch the trait and p-value of all associations for multiple rsIDs. \n",
181 |     "2. Organize the data in a table.\n",
182 |     "3. Be careful: not all rsIDs may have associations!"
183 |    ]
184 |   },
185 |   {
186 |    "cell_type": "code",
187 |    "execution_count": null,
188 |    "metadata": {
189 |     "ExecuteTime": {
190 |      "end_time": "2019-06-16T07:28:48.769982Z",
191 |      "start_time": "2019-06-16T07:28:45.594638Z"
192 |     }
193 |    },
194 |    "outputs": [],
195 |    "source": [
196 |     "\n",
197 |     "# List of variants:\n",
198 |     "variants = ['rs142968358', 'rs62402518', 'rs12199222', 'rs7329174', 'rs9879858765']\n",
199 |     "\n",
200 |     "# Store extracted data in this list:\n",
201 |     "extractedData = []\n",
202 |     "\n",
203 |     "# Iterating over all variants:\n",
204 |     "for variant in variants:\n",
205 |     "\n",
206 |     "    # Accessing data for a single variant:\n",
207 |     "    requestUrl = '%s/singleNucleotidePolymorphisms/%s/associations?projection=associationBySnp' %(apiUrl, variant)\n",
208 |     "    response = requests.get(requestUrl, headers={ \"Content-Type\" : \"application/json\"})\n",
209 |     "    \n",
210 |     "    # Testing if rsID exists:\n",
211 |     "    if not response.ok:\n",
212 |     "        print(\"[Warning] %s is not in the GWAS Catalog!!\" % variant)\n",
213 |     "        continue\n",
214 |     "    \n",
215 |     "    # Test if the returned data looks good:\n",
216 |     "    try:\n",
217 |     "        decoded = response.json()\n",
218 |     "    except ValueError:\n",
219 |     "        print(\"[Warning] Failed to decode data for %s\" % variant)\n",
220 |     "        continue\n",
221 |     "    \n",
222 |     "    for association in decoded['_embedded']['associations']:\n",
223 |     "        trait = \",\".join([trait['trait'] for trait in association['efoTraits']])\n",
224 |     "        pvalue = association['pvalue']\n",
225 |     "        \n",
226 |     "        extractedData.append(OrderedDict({'variant' : variant,\n",
227 |     "                              'trait' : trait,\n",
228 |     "                              'pvalue' : pvalue}))\n",
229 |     "\n",
230 |     "# Format data into a table:\n",
231 |     "table = pd.DataFrame.from_dict(extractedData)\n",
232 |     "table"
233 |    ]
234 |   },
235 |   {
236 |    "cell_type": "markdown",
237 |    "metadata": {
238 |     "ExecuteTime": {
239 |      "end_time": "2019-03-21T22:11:20.354795Z",
240 |      "start_time": "2019-03-21T22:11:20.281169Z"
241 |     }
242 |    },
243 |    "source": [
244 |     "## Exercise 4\n",
245 |     "\n",
246 |     "* Extend the previous table with pubmed ID and study accession of the associations. \n",
247 |     "* These pieces of information are not found in the association data; they have to be fetched from other endpoints.\n",
248 |     "\n",
249 |     "Use the links to related resources provided with each association:\n",
250 |     "\n",
251 |     "```json\n",
252 |     "\n",
253 |     "\"_links\": {\n",
254 |     "    \"self\": {\n",
255 |     "        \"href\": \"https://www.ebi.ac.uk/gwas/rest/api/associations/26384\"\n",
256 |     "    },\n",
257 |     "    \"association\": {\n",
258 |     "        \"href\": \"https://www.ebi.ac.uk/gwas/rest/api/associations/26384{?projection}\",\n",
259 |     "        \"templated\": true\n",
260 |     "    },\n",
261 |     "    \"snps\": {\n",
262 |     "        \"href\": \"https://www.ebi.ac.uk/gwas/rest/api/associations/26384/snps\"\n",
263 |     "    },\n",
264 |     "    \"efoTraits\": {\n",
265 |     "        \"href\": \"https://www.ebi.ac.uk/gwas/rest/api/associations/26384/efoTraits\"\n",
266 |     "    },\n",
267 |     "    \"study\": {\n",
268 |     "        \"href\": \"https://www.ebi.ac.uk/gwas/rest/api/associations/26384/study\"\n",
269 |     "    }\n",
270 |     "}\n",
271 |     "```"
272 |    ]
273 |   },
274 |   {
275 |    "cell_type": "markdown",
276 |    "metadata": {},
277 |    "source": [
278 |     "The small function below visits the link to the study and returns the accession ID and the pubmed ID of the study."
279 |    ]
280 |   },
281 |   {
282 |    "cell_type": "code",
283 |    "execution_count": null,
284 |    "metadata": {
285 |     "ExecuteTime": {
286 |      "end_time": "2019-06-16T07:29:03.569038Z",
287 |      "start_time": "2019-06-16T07:29:03.556961Z"
288 |     }
289 |    },
290 |    "outputs": [],
291 |    "source": [
292 |     "def getStudy(studyLink):\n",
293 |     "    # Accessing data for a single study:\n",
294 |     "    response = requests.get(studyLink, headers={ \"Content-Type\" : \"application/json\"})\n",
295 |     "    decoded = response.json()\n",
296 |     "    \n",
297 |     "    accessionID = decoded['accessionId']\n",
298 |     "    pubmedId = decoded['publicationInfo']['pubmedId']\n",
299 |     "    \n",
300 |     "    return (accessionID, pubmedId)"
301 |    ]
302 |   },
303 |   {
304 |    "cell_type": "code",
305 |    "execution_count": null,
306 |    "metadata": {
307 |     "ExecuteTime": {
308 |      "end_time": "2019-06-16T07:29:08.872082Z",
309 |      "start_time": "2019-06-16T07:29:04.853111Z"
310 |     }
311 |    },
312 |    "outputs": [],
313 |    "source": [
314 |     "extractedData = []\n",
315 |     "for variant in variants:\n",
316 |     "\n",
317 |     "    # Accessing data for a single variant:\n",
318 |     "    requestUrl = '%s/singleNucleotidePolymorphisms/%s/associations?projection=associationBySnp' %(apiUrl, variant)\n",
319 |     "    response = requests.get(requestUrl, headers={ \"Content-Type\" : \"application/json\"})\n",
320 |     "    \n",
321 |     "    # Testing if rsID exists:\n",
322 |     "    if not response.ok:\n",
323 |     "        print(\"[Warning] %s is not in the GWAS Catalog!!\" % variant)\n",
324 |     "        continue\n",
325 |     "    \n",
326 |     "    # Test if the returned data looks good:\n",
327 |     "    try:\n",
328 |     "        decoded = response.json()\n",
329 |     "    except ValueError:\n",
330 |     "        print(\"[Warning] Failed to decode data for %s\" % variant)\n",
331 |     "        continue\n",
332 |     "    \n",
333 |     "    for association in decoded['_embedded']['associations']:\n",
334 |     "        # extract study data:\n",
335 |     "        (accessionID, pubmedId) = getStudy(association['_links']['study']['href'])\n",
336 |     "        \n",
337 |     "        # Extract trait and p-value:\n",
338 |     "        trait = \",\".join([trait['trait'] for trait in association['efoTraits']])\n",
339 |     "        pvalue = association['pvalue']\n",
340 |     "        \n",
341 |     "        extractedData.append(OrderedDict({'variant' : variant,\n",
342 |     "                              'trait' : trait,\n",
343 |     "                              'pvalue' : pvalue,\n",
344 |     "                              'accessionID' : accessionID,\n",
345 |     "                              'pubmedID' : pubmedId\n",
346 |     "                             }))\n",
347 |     "        \n",
348 |     "table = pd.DataFrame.from_dict(extractedData)\n",
349 |     "table\n",
350 |     "\n",
351 |     "# The table can also be exported as an excel file:\n",
352 |     "# table.to_excel('workshop.xlsx')\n"
353 |    ]
354 |   },
355 |   {
356 |    "cell_type": "code",
357 |    "execution_count": null,
358 |    "metadata": {},
359 |    "outputs": [],
360 |    "source": []
361 |   }
362 |  ],
363 |  "metadata": {
364 |   "kernelspec": {
365 |    "display_name": "Python 3",
366 |    "language": "python",
367 |    "name": "python3"
368 |   },
369 |   "language_info": {
370 |    "codemirror_mode": {
371 |     "name": "ipython",
372 |     "version": 3
373 |    },
374 |    "file_extension": ".py",
375 |    "mimetype": "text/x-python",
376 |    "name": "python",
377 |    "nbconvert_exporter": "python",
378 |    "pygments_lexer": "ipython3",
379 |    "version": "3.6.5"
380 |   }
381 |  },
382 |  "nbformat": 4,
383 |  "nbformat_minor": 2
384 | }
385 | 


--------------------------------------------------------------------------------
/notebooks/sumstats_rest.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Summary Statistics REST API workshop\n",
  8 |     "\n",
  9 |     "The following example shows how to access and parse data from the GWAS Summary Statistics database using the REST API. The examples are illustrative rather than exhaustive. Please refer to the [documentation](https://www.ebi.ac.uk/gwas/summary-statistics/docs/) for more details.\n",
 10 |     "\n",
 11 |     "Version: `0.1`\n",
 12 |     "\n",
 13 |     "Date: `2019 June 04`\n",
 14 |     "\n",
 15 |     "REST is language-agnostic. Here we use Python, just for the purpose of demonstration."
 16 |    ]
 17 |   },
 18 |   {
 19 |    "cell_type": "code",
 20 |    "execution_count": null,
 21 |    "metadata": {},
 22 |    "outputs": [],
 23 |    "source": [
 24 |     "# Importing required packages\n",
 25 |     "\n",
 26 |     "import requests     # Manages data transfer from the GWAS Catalog REST API\n",
 27 |     "import pandas as pd # Makes data handling easier\n",
 28 |     "import json         # Handling the returned data type called JSON"
 29 |    ]
 30 |   },
 31 |   {
 32 |    "cell_type": "markdown",
 33 |    "metadata": {},
 34 |    "source": [
 35 |     "## Endpoints\n",
 36 |     "\n",
 37 |     "Run the following to see the endpoints (associations, traits, studies, chromosomes):"
 38 |    ]
 39 |   },
 40 |   {
 41 |    "cell_type": "code",
 42 |    "execution_count": null,
 43 |    "metadata": {},
 44 |    "outputs": [],
 45 |    "source": [
 46 |     "# API root address:\n",
 47 |     "api_url='https://www.ebi.ac.uk/gwas/summary-statistics/api'\n",
 48 |     "\n",
 49 |     "response = requests.get(api_url)\n",
 50 |     "\n",
 51 |     "# The returned response is a \"response\" object, from which we have to extract and parse the information:\n",
 52 |     "decoded = response.json()\n",
 53 |     "\n",
 54 |     "print(json.dumps(decoded, indent = 2))\n"
 55 |    ]
 56 |   },
 57 |   {
 58 |    "cell_type": "markdown",
 59 |    "metadata": {},
 60 |    "source": [
 61 |     "## Get associations for a given variant"
 62 |    ]
 63 |   },
 64 |   {
 65 |    "cell_type": "code",
 66 |    "execution_count": null,
 67 |    "metadata": {},
 68 |    "outputs": [],
 69 |    "source": [
 70 |     "# Accessing data for a single variant. Must be an rsID:\n",
 71 |     "variant = 'rs62402518'\n",
 72 |     "request_url = '{api}/associations/{variant}'.format(api=api_url, variant=variant)\n",
 73 |     "response = requests.get(request_url)\n",
 74 |     "decoded = response.json()\n",
 75 |     "\n",
 76 |     "print(json.dumps(decoded, indent = 2))"
 77 |    ]
 78 |   },
 79 |   {
 80 |    "cell_type": "markdown",
 81 |    "metadata": {},
 82 |    "source": [
 83 |     "### Interpreting the response\n",
 84 |     "From the returned JSON, you can see that it contains 20 associations, each from a different study. Each association carries values such as the p-value and beta, along with links to the associated trait, variant, study and self (the study and variant combination). The response is paginated and has `\"_links\"` at the bottom, showing the URLs for this page (`\"self\"`), the first page (`\"first\"`) and the next page (`\"next\"`). By default it shows 20 results per page, but this can be changed using the `size=` parameter, just as you see in the first and next links, i.e. `?size=20`. \n",
 85 |     "\n",
 86 |     "The same 'layout' of the data applies to all the following examples."
 87 |    ]
 88 |   },
 89 |   {
 90 |    "cell_type": "markdown",
 91 |    "metadata": {},
 92 |    "source": [
 93 |     "## Get a list of associations for a given trait"
 94 |    ]
 95 |   },
 96 |   {
 97 |    "cell_type": "code",
 98 |    "execution_count": null,
 99 |    "metadata": {},
100 |    "outputs": [],
101 |    "source": [
102 |     "# Accessing data for a specific trait (EFO term):\n",
103 |     "trait = 'EFO_0004466'\n",
104 |     "request_url = '{api}/traits/{trait}/associations'.format(api=api_url, trait=trait)\n",
105 |     "response = requests.get(request_url)\n",
106 |     "decoded = response.json()\n",
107 |     "\n",
108 |     "print(json.dumps(decoded, indent = 2))"
109 |    ]
110 |   },
111 |   {
112 |    "cell_type": "markdown",
113 |    "metadata": {},
114 |    "source": [
115 |     "## Get a list of associations for a given study\n"
116 |    ]
117 |   },
118 |   {
119 |    "cell_type": "code",
120 |    "execution_count": null,
121 |    "metadata": {},
122 |    "outputs": [],
123 |    "source": [
124 |     "# Accessing data for a specific study. Must be a GWAS Catalog study accession ID e.g. GCST000571:\n",
125 |     "study = 'GCST000571'\n",
126 |     "request_url = '{api}/studies/{study}/associations'.format(api=api_url, study=study)\n",
127 |     "response = requests.get(request_url)\n",
128 |     "decoded = response.json()\n",
129 |     "\n",
130 |     "print(json.dumps(decoded, indent = 2))"
131 |    ]
132 |   },
133 |   {
134 |    "cell_type": "markdown",
135 |    "metadata": {},
136 |    "source": [
137 |     "## Get a list of associations within a genomic region\n"
138 |    ]
139 |   },
140 |   {
141 |    "cell_type": "code",
142 |    "execution_count": null,
143 |    "metadata": {},
144 |    "outputs": [],
145 |    "source": [
146 |     "# Accessing data for a specific genomic region (e.g. chr9:132000000-133000000):\n",
147 |     "chromosome = 9\n",
148 |     "bp_lower = 132000000\n",
149 |     "bp_upper = 133000000\n",
150 |     "request_url = '{api}/chromosomes/{chrom}/associations?bp_lower={low}&bp_upper={high}'.format(api=api_url, \n",
151 |     "                                                                                             chrom=chromosome, \n",
152 |     "                                                                                             low=bp_lower,\n",
153 |     "                                                                                             high=bp_upper)\n",
154 |     "response = requests.get(request_url)\n",
155 |     "decoded = response.json()\n",
156 |     "\n",
157 |     "print(json.dumps(decoded, indent = 2))"
158 |    ]
159 |   },
160 |   {
161 |    "cell_type": "markdown",
162 |    "metadata": {},
163 |    "source": [
164 |     "## Get a list of associations below a p-value threshold\n",
165 |     "\n",
166 |     "You may want to keep only the associations below a p-value threshold. For example, for a given trait such as type II diabetes (EFO_0001360), you may want all associations with a p-value below 1.0e-5:"
167 |    ]
168 |   },
169 |   {
170 |    "cell_type": "code",
171 |    "execution_count": null,
172 |    "metadata": {},
173 |    "outputs": [],
174 |    "source": [
175 |     "# Accessing data for a specific trait (EFO term) below a p-value threshold:\n",
176 |     "trait = 'EFO_0001360'\n",
177 |     "pval_upper = 1.0e-5 # can be any valid float e.g. 0.00001\n",
178 |     "request_url = '{api}/traits/{trait}/associations?p_upper={high}'.format(api=api_url, \n",
179 |     "                                                                        trait=trait,\n",
180 |     "                                                                        high=pval_upper)\n",
181 |     "response = requests.get(request_url)\n",
182 |     "decoded = response.json()\n",
183 |     "\n",
184 |     "print(json.dumps(decoded, indent = 2))"
185 |    ]
186 |   },
187 |   {
188 |    "cell_type": "markdown",
189 |    "metadata": {},
190 |    "source": [
191 |     "Let's rewrite the above so that it returns the response as a pandas DataFrame."
192 |    ]
193 |   },
194 |   {
195 |    "cell_type": "code",
196 |    "execution_count": null,
197 |    "metadata": {},
198 |    "outputs": [],
199 |    "source": [
200 |     "# return a pandas dataframe of results for the above example:\n",
201 |     "\n",
202 |     "extracted_data = []\n",
203 |     "size = 10\n",
204 |     "\n",
205 |     "trait = 'EFO_0001360'\n",
206 |     "pval_upper = 1.0e-5 # can be any valid float e.g. 0.00001\n",
207 |     "request_url = '{api}/traits/{trait}/associations?p_upper={high}&size={size}'.format(api=api_url, \n",
208 |     "                                                                                    trait=trait,\n",
209 |     "                                                                                    high=pval_upper,\n",
210 |     "                                                                                    size=size)\n",
211 |     "response = requests.get(request_url)\n",
212 |     "decoded = response.json()\n",
213 |     "\n",
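214 |     "# Associations are keyed by string indices ('0', '1', ...); iterating over the values also works when fewer than `size` results are returned:\n",
215 |     "for association in decoded['_embedded']['associations'].values():\n",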
216 |     "\n",
217 |     "    pval = association['p_value']\n",
218 |     "    bp = association['base_pair_location']\n",
219 |     "    chrom = association['chromosome']\n",
220 |     "    ea = association['effect_allele']\n",
221 |     "    oa = association['other_allele']\n",
222 |     "    beta = association['beta']\n",
223 |     "    odds = association['odds_ratio']\n",
224 |     "    \n",
225 |     "    extracted_data.append({'trait': trait,\n",
226 |     "                           'pvalue': pval,\n",
227 |     "                           'position': bp,\n",
228 |     "                           'chromosome': chrom,\n",
229 |     "                           'effect_allele': ea,\n",
230 |     "                           'other_allele': oa,\n",
231 |     "                           'beta': beta,\n",
232 |     "                           'odds_ratio': odds\n",
233 |     "                          })\n",
234 |     "\n",
235 |     "\n",
236 |     "table = pd.DataFrame.from_dict(extracted_data)\n",
237 |     "table"
238 |    ]
239 |   }
240 |  ],
241 |  "metadata": {
242 |   "kernelspec": {
243 |    "display_name": "Python 3",
244 |    "language": "python",
245 |    "name": "python3"
246 |   },
247 |   "language_info": {
248 |    "codemirror_mode": {
249 |     "name": "ipython",
250 |     "version": 3
251 |    },
252 |    "file_extension": ".py",
253 |    "mimetype": "text/x-python",
254 |    "name": "python",
255 |    "nbconvert_exporter": "python",
256 |    "pygments_lexer": "ipython3",
257 |    "version": "3.6.5"
258 |   }
259 |  },
260 |  "nbformat": 4,
261 |  "nbformat_minor": 2
262 | }
263 | 


--------------------------------------------------------------------------------
/notebooks/workshop_01.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# GWAS Catalog and Summary Statistics REST API workshop\n",
  8 |     "\n",
  9 |     "* The following shows basic examples of how to access and parse data from the GWAS Catalog through the REST API. \n",
 10 |     "* Although this example is written in Python, any other programming language works equally well.\n",
 11 |     "* There are TWO REST APIs, the GWAS Catalog API and the Summary Statistics API.\n",
 12 |     "* Examples in other languages will be available soon.\n",
 13 |     "\n",
 14 |     "\n",
 15 |     "### Contents:\n",
 16 |     "\n",
 17 |     "* **Exercise 1**: fetching data from the API manually, via a browser\n",
 18 |     "* **Exercise 2**: fetching data for a list of variants (GWAS Catalog API)\n",
 19 |     "* **Exercise 3**: fetching summary statistics data for a genomic region (Summary Stats API)\n",
 20 |     "* **Exercise 4**: combining the two APIs"
 21 |    ]
 22 |   },
 23 |   {
 24 |    "cell_type": "markdown",
 25 |    "metadata": {},
 26 |    "source": [
 27 |     "## Exercise 1\n",
 28 |     "### _Requests are just URLs_\n",
 29 |     "\n",
 30 |     "Fetch data of a single study with accession ID [GCST001795](https://www.ebi.ac.uk/gwas/studies/GCST001795) from the GWAS Catalog REST API using a browser.   \n",
 31 |     "\n",
 32 |     "**Generate the URL:**\n",
 33 |     "\n",
 34 |     "* API URL: `https://www.ebi.ac.uk/gwas/rest/api`\n",
 35 |     "* Endpoint: `studies`\n",
 36 |     "* AccessionID: `GCST001795`\n",
 37 |     "\n",
 38 |     "**URL:**\n",
 39 |     "\n",
 40 |     "[https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795](https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795)\n",
 41 |     "\n",
 42 |     "Visit the URL in a browser to see the response from the REST API.\n",
 43 |     "\n",
 44 |     "### Understanding the returned data:\n",
 45 |     "\n",
 46 |     "* A number of simple key-value pairs, e.g.:\n",
 47 |     "\n",
 48 |     "```json\n",
 49 |     "    \"initialSampleSize\" : \"1,656 Han Chinese ancestry cases, 3,394 Han Chinese ancestry controls\",\n",
 50 |     "    \"snpCount\" : 2100739,\n",
 51 |     "    \"imputed\" : true,\n",
 52 |     "    \"accessionId\" : \"GCST001795\",\n",
 53 |     "```\n",
 54 |     "\n",
 55 |     "* A list allowing multiple elements for a key:\n",
 56 |     "\n",
 57 |     "```json\n",
 58 |     "    \"genotypingTechnologies\" : [ {\n",
 59 |     "        \"genotypingTechnology\" : \"Genome-wide genotyping array\"\n",
 60 |     "    } ],\n",
 61 |     "```\n",
 62 |     "* A list where the values are themselves complex objects, e.g. `ancestries`.\n",
 63 |     "\n",
 64 |     "\n",
 65 |     "* The returned data is highly structured and easy for a computer to read.\n",
 66 |     "* The same information is accessible via the UI.\n",
 67 |     "\n",
 68 |     "In the following examples we write small Python scripts that organize this data to make it easy for humans to read."
 69 |    ]
 70 |   },
 71 |   {
 72 |    "cell_type": "markdown",
 73 |    "metadata": {},
 74 |    "source": [
 75 |     "## Exercise 2\n",
 76 |     "\n",
 77 |     "### _Retrieve data for a list of variants_\n",
 78 |     "\n",
 79 |     "1. Fetch the trait and p-value of all associations for multiple rsIDs. \n",
 80 |     "2. Organize the data in a table.\n",
 81 |     "3. Be careful: not all rsIDs may have associations!"
 82 |    ]
 83 |   },
 84 |   {
 85 |    "cell_type": "code",
 86 |    "execution_count": null,
 87 |    "metadata": {},
 88 |    "outputs": [],
 89 |    "source": [
 90 |     "# Import required packages\n",
 91 |     "import requests     # HTTP library - manages data transfer from web resource (e.g. GWAS Catalog)\n",
 92 |     "import json         # Handling the json response\n",
 93 |     "import pandas as pd # Data analysis library, a bit like R for Python!\n",
 94 |     "\n",
 95 |     "\n",
 96 |     "# API Address:\n",
 97 |     "apiUrl = 'https://www.ebi.ac.uk/gwas/rest/api'\n",
 98 |     "\n",
 99 |     "# List of variants:\n",
100 |     "variants = ['rs142968358', 'rs62402518', 'rs12199222', 'rs7329174', 'rs9879858765']\n",
101 |     "\n",
102 |     "# Store extracted data in this list:\n",
103 |     "extractedData = []\n",
104 |     "\n",
105 |     "# Iterating over all variants:\n",
106 |     "for variant in variants:\n",
107 |     "\n",
108 |     "    # Accessing data for a single variant:\n",
109 |     "    requestUrl = '%s/singleNucleotidePolymorphisms/%s/associations?projection=associationBySnp' %(apiUrl, variant)\n",
110 |     "    response = requests.get(requestUrl, headers={ \"Content-Type\" : \"application/json\"})\n",
111 |     "    \n",
112 |     "    # Testing if rsID exists:\n",
113 |     "    if not response.ok:\n",
114 |     "        print(\"[Warning] %s is not in the GWAS Catalog!!\" % variant)\n",
115 |     "        continue\n",
116 |     "    \n",
117 |     "    # Check that the response body is valid JSON:\n",
118 |     "    try:\n",
119 |     "        decoded = response.json()\n",
120 |     "    except ValueError:\n",
121 |     "        print(\"[Warning] Failed to decode data for %s\" % variant)\n",
122 |     "        continue\n",
123 |     "    \n",
124 |     "    for association in decoded['_embedded']['associations']:\n",
125 |     "        trait = \",\".join([trait['trait'] for trait in association['efoTraits']])\n",
126 |     "        pvalue = association['pvalue']\n",
127 |     "        \n",
128 |     "        extractedData.append({'variant' : variant,\n",
129 |     "                              'trait' : trait,\n",
130 |     "                              'pvalue' : pvalue})\n",
131 |     "        \n",
132 |     "# Format data into a table (data frame):\n",
133 |     "table = pd.DataFrame.from_dict(extractedData)\n",
134 |     "table "
135 |    ]
136 |   },
137 |   {
138 |    "cell_type": "markdown",
139 |    "metadata": {},
140 |    "source": [
141 |     "## Exercise 3\n",
142 |     "### _Summary Statistics API_\n",
143 |     "\n",
144 |     "* Get all the associations for type II diabetes mellitus: EFO_0001360\n",
145 |     "* Only show the variant ID, study accession ID and p-value\n",
146 |     "* Filter by a p-value upper bound of 10<sup>-9</sup>"
147 |    ]
148 |   },
149 |   {
150 |    "cell_type": "code",
151 |    "execution_count": null,
152 |    "metadata": {},
153 |    "outputs": [],
154 |    "source": [
155 |     "# API Address:\n",
156 |     "apiUrl = 'https://www.ebi.ac.uk/gwas/summary-statistics/api'\n",
157 |     "\n",
158 |     "\n",
159 |     "trait = \"EFO_0001360\"\n",
160 |     "p_upper = \"0.000000001\"\n",
161 |     "\n",
162 |     "\n",
163 |     "requestUrl = '%s/traits/%s/associations?p_upper=%s&size=10' %(apiUrl, trait, p_upper)\n",
164 |     "response = requests.get(requestUrl, headers={ \"Content-Type\" : \"application/json\"})\n",
165 |     "\n",
166 |     "# The returned response is a \"response\" object, from which we have to extract and parse the information:\n",
167 |     "decoded = response.json()\n",
168 |     "extractedData = []\n",
169 |     "\n",
170 |     "for association in decoded['_embedded']['associations'].values():\n",
171 |     "    pvalue = association['p_value']\n",
172 |     "    variant = association['variant_id']\n",
173 |     "    studyID = association['study_accession']\n",
174 |     "    \n",
175 |     "    extractedData.append({'variant' : variant,\n",
176 |     "                          'studyID': studyID,\n",
177 |     "                          'pvalue' : pvalue})    \n",
178 |     "    \n",
179 |     "ssTable = pd.DataFrame.from_dict(extractedData)\n",
180 |     "ssTable \n"
181 |    ]
182 |   },
183 |   {
184 |    "cell_type": "markdown",
185 |    "metadata": {},
186 |    "source": [
187 |     "## Exercise 4\n",
188 |     "### _Combine the two APIs!_\n",
189 |     "\n",
190 |     "* Get all the associations for type II diabetes mellitus: EFO_0001360\n",
191 |     "* Only show the variant ID, study accession ID and p-value\n",
192 |     "* Filter by a p-value upper bound of 10<sup>-9</sup>\n",
193 |     "* Add the PubMed ID and trait name from the study information in the GWAS Catalog"
194 |    ]
195 |   },
196 |   {
197 |    "cell_type": "code",
198 |    "execution_count": null,
199 |    "metadata": {},
200 |    "outputs": [],
201 |    "source": [
202 |     "def getStudy(studyLink):\n",
203 |     "    # Fetch the study from the Summary Statistics API, then follow its link to the GWAS Catalog record:\n",
204 |     "    response = requests.get(studyLink, headers={ \"Content-Type\" : \"application/json\"})\n",
205 |     "    decoded = response.json()\n",
206 |     "    \n",
207 |     "    gwasData = requests.get(decoded['_links']['gwas_catalog']['href'], headers={ \"Content-Type\" : \"application/json\"})\n",
208 |     "    decodedGwasData = gwasData.json()\n",
209 |     "\n",
210 |     "    traitName = decodedGwasData['diseaseTrait']['trait']\n",
211 |     "    pubmedId = decodedGwasData['publicationInfo']['pubmedId']\n",
212 |     "    \n",
213 |     "    return traitName, pubmedId\n",
214 |     "\n",
215 |     "\n",
216 |     "extractedData = []\n",
217 |     "# Reuses `decoded`, the Summary Statistics API response fetched in Exercise 3:\n",
218 |     "for association in decoded['_embedded']['associations'].values():\n",
219 |     "    pvalue = association['p_value']\n",
220 |     "    variant = association['variant_id']\n",
221 |     "    studyID = association['study_accession']\n",
222 |     "    studyLink = association['_links']['study']['href']\n",
223 |     "    traitName, pubmedId = getStudy(studyLink)\n",
224 |     "    \n",
225 |     "    extractedData.append({'variant' : variant,\n",
226 |     "                          'studyID': studyID,\n",
227 |     "                          'pvalue' : pvalue,\n",
228 |     "                          'traitName': traitName,\n",
229 |     "                          'pubmedID': pubmedId}) \n",
230 |     "\n",
231 |     "    \n",
232 |     "ssWithGWASTable = pd.DataFrame.from_dict(extractedData)\n",
233 |     "ssWithGWASTable\n"
234 |    ]
235 |   }
236 |  ],
237 |  "metadata": {
238 |   "kernelspec": {
239 |    "display_name": "Python 3",
240 |    "language": "python",
241 |    "name": "python3"
242 |   },
243 |   "language_info": {
244 |    "codemirror_mode": {
245 |     "name": "ipython",
246 |     "version": 3
247 |    },
248 |    "file_extension": ".py",
249 |    "mimetype": "text/x-python",
250 |    "name": "python",
251 |    "nbconvert_exporter": "python",
252 |    "pygments_lexer": "ipython3",
253 |    "version": "3.6.5"
254 |   }
255 |  },
256 |  "nbformat": 4,
257 |  "nbformat_minor": 2
258 | }
259 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | pandas==0.24.2
2 | requests==2.21.0
3 | 


--------------------------------------------------------------------------------
/slides/API_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EBISPOT/GWAS_Catalog-workshop/361c7954a85bb047b3eefaf5cf64048db14e0276/slides/API_slides.pdf


--------------------------------------------------------------------------------
/slides/README.md:
--------------------------------------------------------------------------------
 1 | # Workshop presentation materials.
 2 | 
 3 | ## Outline of API_slides.pdf presentation
 4 | 
 5 | This presentation covers the following items:
 6 | 
 7 | * What an API is, and when to use it
 8 | * How to extract data via an API
 9 | * How to understand the returned data
10 | * GWAS Catalog API - access to the manually curated dataset via API
11 | * GWAS Catalog Summary stats API - access to the summary stats database.
12 | 
13 | 


--------------------------------------------------------------------------------