├── CONTRIBUTING.md ├── LICENSE ├── MAINTAINERS.md ├── README.md ├── diabetes-prediction.ipynb └── doc └── source └── images ├── flow.png ├── pixiedust_age_bmi.png ├── pixiedust_hdl_ldl.png └── pixiedust_systolic_diastolic.png /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing 2 | 3 | This is an open source project, and we appreciate your help! 4 | 5 | We use the GitHub issue tracker to discuss new features and non-trivial bugs. 6 | 7 | In addition to the issue tracker, [#journeys on 8 | Slack](https://dwopen.slack.com) is the best way to get into contact with the 9 | project's maintainers. 10 | 11 | To contribute code, documentation, or tests, please submit a pull request to 12 | the GitHub repository. Generally, we expect two maintainers to review your pull 13 | request before it is approved for merging. For more details, see the 14 | [MAINTAINERS](MAINTAINERS.md) page. 15 | 16 | Contributions are subject to the [Developer Certificate of Origin, Version 1.1](https://developercertificate.org/) and the [Apache License, Version 2](https://www.apache.org/licenses/LICENSE-2.0.txt). 17 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. 
For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright 2019 IBM Corp. 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /MAINTAINERS.md: -------------------------------------------------------------------------------- 1 | # Maintainers Guide 2 | 3 | This guide is intended for maintainers - anybody with commit access to one or 4 | more Code Pattern repositories. 5 | 6 | ## Methodology 7 | 8 | This repository does not have a traditional release management cycle, but 9 | should instead be maintained as a useful, working, and polished reference at 10 | all times. While all work can therefore be focused on the master branch, the 11 | quality of this branch should never be compromised. 12 | 13 | The remainder of this document details how to merge pull requests to the 14 | repositories. 15 | 16 | ## Merge approval 17 | 18 | The project maintainers use LGTM (Looks Good To Me) in comments on the pull 19 | request to indicate acceptance prior to merging. A change requires LGTMs from 20 | two project maintainers. If the code is written by a maintainer, the change 21 | only requires one additional LGTM. 
22 | 23 | ## Reviewing Pull Requests 24 | 25 | We recommend reviewing pull requests directly within GitHub. This allows 26 | public commentary on changes, providing transparency for all users. When 27 | providing feedback, be civil, courteous, and kind. Disagreement is fine, so long 28 | as the discourse is carried out politely. If we see a record of uncivil or 29 | abusive comments, we will revoke your commit privileges and invite you to leave 30 | the project. 31 | 32 | During your review, consider the following points: 33 | 34 | ### Does the change have positive impact? 35 | 36 | Some proposed changes may not represent a positive impact on the project. Ask 37 | whether the change will make understanding the code easier, or if it 38 | could simply be a personal preference on the part of the author (see 39 | [bikeshedding](https://en.wiktionary.org/wiki/bikeshedding)). 40 | 41 | Pull requests that do not have a clear positive impact should be closed without 42 | merging. 43 | 44 | ### Do the changes make sense? 45 | 46 | If you do not understand what the changes are or what they accomplish, ask the 47 | author for clarification. Ask the author to add comments and/or clarify test 48 | case names to make the intentions clear. 49 | 50 | At times, such clarification will reveal that the author may not be using the 51 | code correctly, or is unaware of features that accommodate their needs. If you 52 | feel this is the case, work up a code sample that would address the pull 53 | request for them, and feel free to close the pull request once they confirm. 54 | 55 | ### Does the change introduce a new feature? 56 | 57 | For any given pull request, ask yourself "is this a new feature?" If so, does 58 | the pull request (or associated issue) contain narrative indicating the need 59 | for the feature? If not, ask the author to provide that information. 60 | 61 | Are new unit tests in place that test all new behaviors introduced?
If not, do 62 | not merge the feature until they are! Is documentation in place for the new 63 | feature? (See the documentation guidelines.) If not, do not merge the feature 64 | until it is! Is the feature necessary for general use cases? Try to keep the 65 | scope of any given component narrow. If a proposed feature does not fit that 66 | scope, recommend to the user that they maintain the feature on their own, and 67 | close the request. You may also recommend that they see if the feature gains 68 | traction among other users, and suggest they re-submit when they can show such 69 | support. 70 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | > **DISCLAIMER**: This notebook is used for demonstrative and illustrative purposes only and does not constitute an offering that has gone through regulatory review. It is not intended to serve as a medical application. There is no representation as to the accuracy of the output of this application and it is presented without warranty. 2 | 3 | # Machine learning using synthesized patient health records 4 | 5 | This notebook explores how to train a machine learning model to predict type 2 diabetes using synthesized patient health records. 6 | The use of synthesized data allows us to learn about building a model without any concern about the privacy issues surrounding the use of real patient health records. 7 | 8 | When the reader has completed this Code Pattern, they will understand how to: 9 | 10 | * Prepare data using Apache Spark. 11 | * Visualize data relationships using PixieDust. 12 | * Train a machine learning model and publish it in the Watson Machine Learning (WML) repository. 13 | * Deploy the model as a web service and use it to make predictions. 14 | 15 | ## Flow 16 | 17 | ![flow](doc/source/images/flow.png) 18 | 19 | 1. Log in to IBM Watson Studio 20 | 2.
Load the provided notebook into Watson Studio 21 | 3. Load data in the notebook 22 | 4. Transform the data with Apache Spark 23 | 5. Create charts with PixieDust 24 | 6. Publish and deploy the model with Watson Machine Learning 25 | 26 | # Prerequisites 27 | 28 | This project is part of a series of code patterns pertaining to a fictional health care company called Example Health. 29 | This company stores electronic health records in a database on a z/OS server. 30 | Before running the notebook, the synthesized health records must be created and loaded into this database. 31 | Another project, https://github.com/IBM/example-health-synthea, provides the steps for doing this. 32 | The records are created using a tool called [Synthea](https://github.com/synthetichealth/synthea), then transformed and loaded into the database. 33 | 34 | If required, set up the [Secure Gateway service](https://console.bluemix.net/docs/services/SecureGateway/index.html#getting-started-with-sg) 35 | to provide you with a secure way to access your on-premises data source. 36 | 37 | # Steps 38 | 39 | ## Sign up for Watson Studio 40 | 41 | Sign up for [IBM Watson Studio](https://dataplatform.cloud.ibm.com). 42 | 43 | ## Create a project 44 | 45 | * Click the **Create a project** tile. 46 | * A list of project types appears. Click the **Data Science** project type. 47 | * Provide a name for the project (e.g. "diabetes-prediction") and click the **Create** button. 48 | * The project is saved in a Lite object storage instance in your account. 49 | 50 | ## Create a Watson Machine Learning instance 51 | 52 | * Click on the **Settings** tab of your project. 53 | * Scroll down to **Associated Services**. 54 | * Click **Add service** and select **Watson** from the drop-down menu. 55 | * Click **Add** on the **Machine Learning** tile. 56 | * Select the Lite plan and click the **Create** button. 57 | 58 | ## Add the notebook to your project 59 | 60 | * Click on the **Add to project** button.
61 | * Click on **Notebook**. 62 | * Click on **From URL**. 63 | * Fill in a name for the notebook (e.g. "diabetes-prediction"). 64 | * Copy and paste this URL into the notebook URL field: https://raw.githubusercontent.com/IBM/example-health-machine-learning/master/diabetes-prediction.ipynb 65 | * In the **Select runtime** drop-down box, choose the entry that begins with **Default Spark Python**. 66 | * Click the **Create Notebook** button. 67 | 68 | ## Run the notebook 69 | 70 | * Click on **Cell** in the menu bar and select **All Output** > **Clear** to clear out the existing notebook output. 71 | 72 | * Move your cursor to each code cell and run the code in it. Read the comments for each cell to understand what the code is doing. 73 | While the code in a cell is running, the label to its left shows `In [*]:`. 74 | Do not continue to the next cell until the code is finished running. 75 | 76 | * There are a couple of cells that you must update to provide your credentials. 77 | 78 | * At the top of the notebook is a cell for your database credentials. 79 | * Further on, you will encounter a cell for your Watson Machine Learning credentials. 80 | To find these, click on the hamburger menu at the top left of the screen and select **Watson Services**. 81 | Click on your machine learning instance and then click on the **Service Credentials** tab. 82 | Click on **View Credentials**. 83 | 84 | 85 | # Sample output 86 | 87 | The notebook uses PixieDust to visualize relationships in the data. 88 | Here are examples of scatter plots that it can produce. 89 | 90 | * HDL/LDL cholesterol for diabetics vs. non-diabetics. 91 | The diabetes simulation in Synthea uses a distinct range of HDL readings for diabetic vs. non-diabetic patients. 92 | This makes the correlation of cholesterol readings to diabetes abnormally high.
93 | 94 | ![cholesterol-chart](doc/source/images/pixiedust_hdl_ldl.png) 95 | 96 | * Systolic/diastolic blood pressure for diabetics vs. non-diabetics. 97 | The diabetes simulation in Synthea increases the chance of high blood pressure (hypertension) for diabetics, 98 | but non-diabetic patients can also have high blood pressure. Therefore, the correlation 99 | of high blood pressure to diabetes isn't very strong. 100 | 101 | ![bloodpressure-chart](doc/source/images/pixiedust_systolic_diastolic.png) 102 | 103 | * Body mass index for diabetics vs. non-diabetics. 104 | The diabetes simulation in Synthea does not change the weight of any diabetic patients, so BMI has no correlation with diabetes. 105 | 106 | ![bmi-chart](doc/source/images/pixiedust_age_bmi.png) 107 | 108 | 109 | ## License 110 | 111 | This code pattern is licensed under the Apache License, Version 2. 112 | Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. 113 | Contributions are subject to the [Developer Certificate of Origin, Version 1.1](https://developercertificate.org/) and the [Apache License, Version 2](https://www.apache.org/licenses/LICENSE-2.0.txt). 114 | 115 | [Apache License FAQ](https://www.apache.org/foundation/license-faq.html#WhatDoesItMEAN) 116 | -------------------------------------------------------------------------------- /diabetes-prediction.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Diabetes prediction using synthesized health records\n", 8 | "\n", 9 | "This notebook explores how to train a machine learning model to predict type 2 diabetes using synthesized patient health records.
The use of synthesized data allows us to learn about building a model without any concern about the privacy issues surrounding the use of real patient health records.\n", 10 | "\n", 11 | "## Prerequisites\n", 12 | "\n", 13 | "This project is part of a series of code patterns pertaining to a fictional health care company called Example Health. This company stores electronic health records in a database on a z/OS server. Before running the notebook, the synthesized health records must be created and loaded into this database. Another project, https://github.com/IBM/example-health-synthea, provides the steps for doing this. The records are created using a tool called Synthea (https://github.com/synthetichealth/synthea), transformed and loaded into the database." 14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": [ 20 | "## Load and prepare the data" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "### Set up the information needed for a JDBC connection to your database below\n", 28 | "The database must be set up by following the instructions in https://github.com/IBM/example-health-synthea." 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 5, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "credentials_1 = {\n", 38 | " 'host':'xxx.yyy.com',\n", 39 | " 'port':'nnnn',\n", 40 | " 'username':'user',\n", 41 | " 'password':'password',\n", 42 | " 'database':'location',\n", 43 | " 'schema':'SMHEALTH'\n", 44 | "}" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "### Define a function to load data from a database table into a Spark dataframe\n", 52 | "\n", 53 | "The partitionColumn, lowerBound, upperBound, and numPartitions options are used to load the data more quickly\n", 54 | "using multiple JDBC connections. The data is partitioned by patient id. 
It is assumed that there are approximately\n", 55 | "5000 patients in the database. If there are more or less patients, adjust the upperBound value appropriately." 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 6, 61 | "metadata": {}, 62 | "outputs": [], 63 | "source": [ 64 | "def load_data_from_database(table_name):\n", 65 | " return (\n", 66 | " spark.read.format(\"jdbc\").options(\n", 67 | " driver = \"com.ibm.db2.jcc.DB2Driver\",\n", 68 | " url = \"jdbc:db2://\" + credentials_1[\"host\"] + \":\" + credentials_1[\"port\"] + \"/\" + credentials_1[\"database\"],\n", 69 | " user = credentials_1[\"username\"], \n", 70 | " password = credentials_1[\"password\"], \n", 71 | " dbtable = credentials_1[\"schema\"] + \".\" + table_name,\n", 72 | " partitionColumn = \"patientid\",\n", 73 | " lowerBound = 1,\n", 74 | " upperBound = 5000,\n", 75 | " numPartitions = 10\n", 76 | " ).load()\n", 77 | " )" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "### Read patient observations from the database\n", 85 | "\n", 86 | "The observations include things like blood pressure and cholesterol readings which are potential features for our model." 
87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": 7, 92 | "metadata": {}, 93 | "outputs": [ 94 | { 95 | "name": "stdout", 96 | "output_type": "stream", 97 | "text": [ 98 | "+---------+-----------------+--------+--------------------+------------+--------------+-------+\n", 99 | "|PATIENTID|DATEOFOBSERVATION| CODE| DESCRIPTION|NUMERICVALUE|CHARACTERVALUE| UNITS|\n", 100 | "+---------+-----------------+--------+--------------------+------------+--------------+-------+\n", 101 | "| 222| 2019-01-26|8302-2 | Body Height| 49.00| | cm|\n", 102 | "| 222| 2019-01-26|72514-3 |Pain severity - 0...| 1.70| |{score}|\n", 103 | "| 222| 2019-01-26|29463-7 | Body Weight| 4.50| | kg|\n", 104 | "| 222| 2019-01-26|6690-2 |Leukocytes [#/vol...| 5.10| |10*3/uL|\n", 105 | "| 222| 2019-01-26|789-8 |Erythrocytes [#/v...| 5.10| |10*6/uL|\n", 106 | "+---------+-----------------+--------+--------------------+------------+--------------+-------+\n", 107 | "only showing top 5 rows\n", 108 | "\n" 109 | ] 110 | } 111 | ], 112 | "source": [ 113 | "observations_df = load_data_from_database(\"OBSERVATIONS\")\n", 114 | "\n", 115 | "observations_df.show(5)" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "### The observations table has a generalized format with a separate row per observation\n", 123 | "\n", 124 | "Let's collect the observations that may be of interest in making a diabetes prediction.\n", 125 | "First, select systolic blood pressure readings from the observations. These have code 8480-6." 
126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": 8, 131 | "metadata": {}, 132 | "outputs": [ 133 | { 134 | "name": "stdout", 135 | "output_type": "stream", 136 | "text": [ 137 | "+---------+-----------------+--------+\n", 138 | "|patientid|dateofobservation|systolic|\n", 139 | "+---------+-----------------+--------+\n", 140 | "| 222| 2019-03-02| 101.30|\n", 141 | "| 72| 2009-05-16| 122.70|\n", 142 | "| 72| 2010-05-22| 129.10|\n", 143 | "| 72| 2011-05-28| 109.00|\n", 144 | "| 72| 2012-06-02| 135.40|\n", 145 | "+---------+-----------------+--------+\n", 146 | "only showing top 5 rows\n", 147 | "\n" 148 | ] 149 | } 150 | ], 151 | "source": [ 152 | "from pyspark.sql.functions import col\n", 153 | "\n", 154 | "systolic_observations_df = (\n", 155 | " observations_df.select(\"patientid\", \"dateofobservation\", \"numericvalue\")\n", 156 | " .withColumnRenamed(\"numericvalue\", \"systolic\")\n", 157 | " .filter((col(\"code\") == \"8480-6\"))\n", 158 | " )\n", 159 | "\n", 160 | "\n", 161 | "systolic_observations_df.show(5)" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": {}, 167 | "source": [ 168 | "### Select other observations of potential interest\n", 169 | "\n", 170 | "* Select diastolic blood pressure readings (code 8462-4).\n", 171 | "* Select HDL cholesterol readings (code 2085-9).\n", 172 | "* Select LDL cholesterol readings (code 18262-6).\n", 173 | "* Select BMI (body mass index) readings (code 39156-5)." 
174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 9, 179 | "metadata": {}, 180 | "outputs": [], 181 | "source": [ 182 | "diastolic_observations_df = (\n", 183 | " observations_df.select(\"patientid\", \"dateofobservation\", \"numericvalue\")\n", 184 | " .withColumnRenamed('numericvalue', 'diastolic')\n", 185 | " .filter((col(\"code\") == \"8462-4\"))\n", 186 | " )\n", 187 | "\n", 188 | "hdl_observations_df = (\n", 189 | " observations_df.select(\"patientid\", \"dateofobservation\", \"numericvalue\")\n", 190 | " .withColumnRenamed('numericvalue', 'hdl')\n", 191 | " .filter((col(\"code\") == \"2085-9\"))\n", 192 | " )\n", 193 | "\n", 194 | "ldl_observations_df = (\n", 195 | " observations_df.select(\"patientid\", \"dateofobservation\", \"numericvalue\")\n", 196 | " .withColumnRenamed('numericvalue', 'ldl')\n", 197 | " .filter((col(\"code\") == \"18262-6\"))\n", 198 | " )\n", 199 | "\n", 200 | "bmi_observations_df = (\n", 201 | " observations_df.select(\"patientid\", \"dateofobservation\", \"numericvalue\")\n", 202 | " .withColumnRenamed('numericvalue', 'bmi')\n", 203 | " .filter((col(\"code\") == \"39156-5\"))\n", 204 | " )" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "### Join the observations for each patient by date into one dataframe" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 10, 217 | "metadata": {}, 218 | "outputs": [ 219 | { 220 | "name": "stdout", 221 | "output_type": "stream", 222 | "text": [ 223 | "+---------+-----------------+--------+---------+-----+------+-----+\n", 224 | "|patientid|dateofobservation|systolic|diastolic| hdl| ldl| bmi|\n", 225 | "+---------+-----------------+--------+---------+-----+------+-----+\n", 226 | "| 4| 2011-12-17| 105.10| 77.10|71.00| 86.50|57.70|\n", 227 | "| 157| 2014-07-16| 138.00| 83.70|21.10|181.40|37.90|\n", 228 | "| 230| 2010-04-23| 164.70| 117.90|26.20|147.90|35.20|\n", 229 | "| 244| 2015-04-01| 
119.00| 84.30|77.60| 96.20|25.50|\n", 230 | "| 290| 2018-08-21| 130.60| 70.90|73.90| 77.80|47.10|\n", 231 | "+---------+-----------------+--------+---------+-----+------+-----+\n", 232 | "only showing top 5 rows\n", 233 | "\n" 234 | ] 235 | } 236 | ], 237 | "source": [ 238 | "merged_observations_df = (\n", 239 | " systolic_observations_df.join(diastolic_observations_df, [\"patientid\", \"dateofobservation\"])\n", 240 | " .join(hdl_observations_df, [\"patientid\", \"dateofobservation\"])\n", 241 | " .join(ldl_observations_df, [\"patientid\", \"dateofobservation\"])\n", 242 | " .join(bmi_observations_df, [\"patientid\", \"dateofobservation\"])\n", 243 | ")\n", 244 | "\n", 245 | "merged_observations_df.show(5)" 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": {}, 251 | "source": [ 252 | "### Another possible feature is the patient's age at the time of observation\n", 253 | "\n", 254 | "Load the patients' birth dates from the database into a dataframe." 255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": 11, 260 | "metadata": {}, 261 | "outputs": [ 262 | { 263 | "name": "stdout", 264 | "output_type": "stream", 265 | "text": [ 266 | "+---------+-----------+\n", 267 | "|patientid|dateofbirth|\n", 268 | "+---------+-----------+\n", 269 | "| 1| 2017-07-04|\n", 270 | "| 2| 1965-04-14|\n", 271 | "| 3| 1996-09-14|\n", 272 | "| 4| 1958-11-29|\n", 273 | "| 5| 1979-01-28|\n", 274 | "+---------+-----------+\n", 275 | "only showing top 5 rows\n", 276 | "\n" 277 | ] 278 | } 279 | ], 280 | "source": [ 281 | "patients_df = load_data_from_database(\"PATIENT\").select(\"patientid\", \"dateofbirth\")\n", 282 | "\n", 283 | "patients_df.show(5)" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "Add a column containing the patient's age to the merged observations." 
291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": 12, 296 | "metadata": {}, 297 | "outputs": [ 298 | { 299 | "name": "stdout", 300 | "output_type": "stream", 301 | "text": [ 302 | "+---------+-----------------+--------+---------+-----+-----+-----+-----------------+\n", 303 | "|patientid|dateofobservation|systolic|diastolic| hdl| ldl| bmi| age|\n", 304 | "+---------+-----------------+--------+---------+-----+-----+-----+-----------------+\n", 305 | "| 463| 2016-02-13| 136.90| 81.10|66.60|76.20|35.80|55.57808219178082|\n", 306 | "| 463| 2013-01-26| 113.40| 77.50|77.30|91.40|35.80|52.52876712328767|\n", 307 | "| 463| 2019-03-02| 123.60| 71.60|73.80|95.50|35.80|58.62739726027397|\n", 308 | "| 463| 2010-01-09| 113.50| 70.60|71.20|76.00|35.80|49.47945205479452|\n", 309 | "| 471| 2017-07-12| 155.60| 99.00|59.00|83.70|38.30|35.19178082191781|\n", 310 | "+---------+-----------------+--------+---------+-----+-----+-----+-----------------+\n", 311 | "only showing top 5 rows\n", 312 | "\n" 313 | ] 314 | } 315 | ], 316 | "source": [ 317 | "from pyspark.sql.functions import datediff\n", 318 | "\n", 319 | "merged_observations_with_age_df = (\n", 320 | " merged_observations_df.join(patients_df, \"patientid\")\n", 321 | " .withColumn(\"age\", datediff(col(\"dateofobservation\"), col(\"dateofbirth\"))/365)\n", 322 | " .drop(\"dateofbirth\")\n", 323 | " )\n", 324 | "\n", 325 | "merged_observations_with_age_df.show(5)" 326 | ] 327 | }, 328 | { 329 | "cell_type": "markdown", 330 | "metadata": {}, 331 | "source": [ 332 | "### Find the patients that have been diagnosed with type 2 diabetes\n", 333 | "\n", 334 | "The conditions table contains the conditions that patients have and the date they were diagnosed.\n", 335 | "Load the patient conditions table and select the patients that have been diagnosed with type 2 diabetes.\n", 336 | "Keep the date they were diagnosed (\"start\" column)." 
337 | ] 338 | }, 339 | { 340 | "cell_type": "code", 341 | "execution_count": 13, 342 | "metadata": {}, 343 | "outputs": [ 344 | { 345 | "name": "stdout", 346 | "output_type": "stream", 347 | "text": [ 348 | "+---------+----------+\n", 349 | "|patientid| start|\n", 350 | "+---------+----------+\n", 351 | "| 66|2003-06-28|\n", 352 | "| 281|2012-07-20|\n", 353 | "| 230|2008-04-18|\n", 354 | "| 157|1994-12-28|\n", 355 | "| 251|2011-02-11|\n", 356 | "+---------+----------+\n", 357 | "only showing top 5 rows\n", 358 | "\n" 359 | ] 360 | } 361 | ], 362 | "source": [ 363 | "diabetics_df = (\n", 364 | " load_data_from_database(\"CONDITIONS\")\n", 365 | " .filter(col(\"description\") == \"Diabetes\")\n", 366 | " .select(\"patientid\", \"start\")\n", 367 | ")\n", 368 | "\n", 369 | "diabetics_df.show(5)" 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "metadata": {}, 375 | "source": [ 376 | "### Create a \"diabetic\" column which is the \"label\" for the model to predict\n", 377 | "\n", 378 | "Join the merged observations with the diabetic patients.\n", 379 | "This is a left join so that we keep all observations for both diabetic and non-diabetic patients.\n", 380 | "Create a new column with a binary value, 1=diabetic, 0=non-diabetic.\n", 381 | "This will be the label for the model (the value it is trying to predict)."
382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": 14, 387 | "metadata": {}, 388 | "outputs": [ 389 | { 390 | "name": "stdout", 391 | "output_type": "stream", 392 | "text": [ 393 | "+---------+-----------------+--------+---------+-----+-----+-----+-----------------+-----+--------+\n", 394 | "|patientid|dateofobservation|systolic|diastolic| hdl| ldl| bmi| age|start|diabetic|\n", 395 | "+---------+-----------------+--------+---------+-----+-----+-----+-----------------+-----+--------+\n", 396 | "| 463| 2013-01-26| 113.40| 77.50|77.30|91.40|35.80|52.52876712328767| null| 0|\n", 397 | "| 463| 2010-01-09| 113.50| 70.60|71.20|76.00|35.80|49.47945205479452| null| 0|\n", 398 | "| 463| 2016-02-13| 136.90| 81.10|66.60|76.20|35.80|55.57808219178082| null| 0|\n", 399 | "| 463| 2019-03-02| 123.60| 71.60|73.80|95.50|35.80|58.62739726027397| null| 0|\n", 400 | "| 471| 2017-07-12| 155.60| 99.00|59.00|83.70|38.30|35.19178082191781| null| 0|\n", 401 | "+---------+-----------------+--------+---------+-----+-----+-----+-----------------+-----+--------+\n", 402 | "only showing top 5 rows\n", 403 | "\n" 404 | ] 405 | } 406 | ], 407 | "source": [ 408 | "from pyspark.sql.functions import when\n", 409 | "\n", 410 | "observations_and_condition_df = (\n", 411 | " merged_observations_with_age_df.join(diabetics_df, \"patientid\", \"left_outer\")\n", 412 | " .withColumn(\"diabetic\", when(col(\"start\").isNotNull(), 1).otherwise(0))\n", 413 | ")\n", 414 | "\n", 415 | "observations_and_condition_df.show(5)" 416 | ] 417 | }, 418 | { 419 | "cell_type": "markdown", 420 | "metadata": {}, 421 | "source": [ 422 | "### Filter the observations for diabetics to remove those taken before diagnosis\n", 423 | "\n", 424 | "This is driven by the way that the diabetes simulation works in Synthea. The impact of the condition (diabetes) is not reflected in the observations until the patient is diagnosed with the condition in a wellness visit. 
Prior to that the patient's observations won't be any different from a non-diabetic patient's. Therefore we keep only the observations taken on or after the diagnosis date." 425 | ] 426 | }, 427 | { 428 | "cell_type": "code", 429 | "execution_count": 15, 430 | "metadata": {}, 431 | "outputs": [], 432 | "source": [ 433 | "observations_and_condition_df = (\n", 434 | " observations_and_condition_df.filter((col(\"diabetic\") == 0) | (col(\"dateofobservation\") >= col(\"start\")))\n", 435 | ")" 436 | ] 437 | }, 438 | { 439 | "cell_type": "markdown", 440 | "metadata": {}, 441 | "source": [ 442 | "### Reduce the observations to a single observation per patient (the earliest available observation)" 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "execution_count": 16, 448 | "metadata": {}, 449 | "outputs": [], 450 | "source": [ 451 | "from pyspark.sql.window import Window\n", 452 | "from pyspark.sql.functions import row_number\n", 453 | "\n", 454 | "w = Window.partitionBy(observations_and_condition_df[\"patientid\"]).orderBy(observations_and_condition_df[\"dateofobservation\"].asc())\n", 455 | "\n", 456 | "first_observation_df = observations_and_condition_df.withColumn(\"rn\", row_number().over(w)).where(col(\"rn\") == 1).drop(\"rn\")" 457 | ] 458 | }, 459 | { 460 | "cell_type": "markdown", 461 | "metadata": {}, 462 | "source": [ 463 | "## Visualize data\n", 464 | "\n", 465 | "At this point we have collected some observations which might be relevant to making a diabetes prediction. The next step is to look for relationships between those observations and having diabetes. There are many tools that help visualize data to look for relationships. One of the easiest ones to use is called PixieDust (https://github.com/pixiedust/pixiedust).\n", 466 | "\n", 467 | "Install the PixieDust visualization tool."
468 | ] 469 | }, 470 | { 471 | "cell_type": "code", 472 | "execution_count": 17, 473 | "metadata": {}, 474 | "outputs": [], 475 | "source": [ 476 | "# !pip install --upgrade pixiedust" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "metadata": {}, 482 | "source": [ 483 | "### Use Pixiedust to visualize whether observations correlate with diabetes\n", 484 | "\n", 485 | "The PixieDust interactive widget appears when you run this cell.\n", 486 | "* Click the chart button and choose Scatter Plot.\n", 487 | "* Click the chart options button. Drag \"ldl\" into the Keys box and drag \"hdl\" into the Values box.\n", 488 | "Set the # of Rows to Display to 5000. Click OK to close the chart options.\n", 489 | "* Select bokeh from the Renderer dropdown menu.\n", 490 | "* Select diabetic from the Color dropdown menu.\n", 491 | "\n", 492 | "The scatter plot chart appears.\n", 493 | "\n", 494 | "Click Options and try replacing \"ldl\" and \"hdl\" with other attributes." 495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "execution_count": 18, 500 | "metadata": { 501 | "pixiedust": { 502 | "displayParams": { 503 | "chartsize": "100", 504 | "color": "diabetic", 505 | "handlerId": "scatterPlot", 506 | "keyFields": "ldl", 507 | "rendererId": "bokeh", 508 | "rowCount": "1000", 509 | "valueFields": "hdl" 510 | } 511 | } 512 | }, 513 | "outputs": [ 514 | { 515 | "data": { 516 | "text/html": [ 517 | "
[PixieDust interactive scatter plot (Bokeh HTML output omitted; open the notebook in Jupyter to render it)]
" 759 | ], 760 | "text/plain": [ 761 | "" 762 | ] 763 | }, 764 | "metadata": {}, 765 | "output_type": "display_data" 766 | } 767 | ], 768 | "source": [ 769 | "import pixiedust\n", 770 | "\n", 771 | "display(first_observation_df)" 772 | ] 773 | }, 774 | { 775 | "cell_type": "markdown", 776 | "metadata": {}, 777 | "source": [ 778 | "## Build and train the model\n", 779 | "\n", 780 | "The visualization of the data showed that the strongest predictors of diabetes are the cholesterol observations. This is an artifact of the diabetes simulation used to create the synthesized data. The simulation uses a distinct range of HDL readings for diabetic vs. non-diabetic patients.\n", 781 | "\n", 782 | "The simulation increases the chance of high blood pressure (hypertension) for diabetics but the non-diabetic patients also can have high blood pressure. Therefore the correlation of high blood pressure to diabetes isn't very strong.\n", 783 | "\n", 784 | "The simulation does not change the weight of any diabetic patients so BMI has no correlation.\n", 785 | "\n", 786 | "Let's continue using HDL and systolic blood pressure as the features for the model. In reality more features would be needed to build a usable model.\n", 787 | "\n", 788 | "Create a pipeline that assembles the feature columns and runs a logistic regression algorithm. Then use the observation data to train the model." 
789 | ] 790 | }, 791 | { 792 | "cell_type": "code", 793 | "execution_count": 19, 794 | "metadata": {}, 795 | "outputs": [], 796 | "source": [ 797 | "from pyspark.ml.feature import VectorAssembler\n", 798 | "from pyspark.ml.classification import LogisticRegression\n", 799 | "from pyspark.ml import Pipeline\n", 800 | "\n", 801 | "vectorAssembler_features = VectorAssembler(inputCols=[\"hdl\", \"systolic\"], outputCol=\"features\")\n", 802 | "\n", 803 | "lr = LogisticRegression(featuresCol = 'features', labelCol = 'diabetic', maxIter=10)\n", 804 | "\n", 805 | "pipeline = Pipeline(stages=[vectorAssembler_features, lr])" 806 | ] 807 | }, 808 | { 809 | "cell_type": "markdown", 810 | "metadata": {}, 811 | "source": [ 812 | "### Split the observation data into two portions\n", 813 | "\n", 814 | "The larger portion (80% of the data) is used to train the model.\n", 815 | "The smaller portion (20% of the data) is used to test the model." 816 | ] 817 | }, 818 | { 819 | "cell_type": "code", 820 | "execution_count": 20, 821 | "metadata": {}, 822 | "outputs": [], 823 | "source": [ 824 | "split_data = first_observation_df.randomSplit([0.8, 0.2], 24)\n", 825 | "train_data = split_data[0]\n", 826 | "test_data = split_data[1]" 827 | ] 828 | }, 829 | { 830 | "cell_type": "markdown", 831 | "metadata": {}, 832 | "source": [ 833 | "### Train the model" 834 | ] 835 | }, 836 | { 837 | "cell_type": "code", 838 | "execution_count": 21, 839 | "metadata": {}, 840 | "outputs": [], 841 | "source": [ 842 | "model = pipeline.fit(train_data)" 843 | ] 844 | }, 845 | { 846 | "cell_type": "markdown", 847 | "metadata": {}, 848 | "source": [ 849 | "## Evaluate the model\n", 850 | "\n", 851 | "One way to evaluate the model is to plot a precision/recall curve.\n", 852 | "\n", 853 | "Precision measures the percentage of the predicted true outcomes that are actually true.\n", 854 | "\n", 855 | "Recall measures the percentage of the actual true conditions that are predicted as true.\n", 856 | "\n", 857 | 
"Ideally we want both precision and recall to be 100%.\n", 858 | "We want all of the diabetes predictions to actually have diabetes (precision = 1.0).\n", 859 | "We want all of the actual diabetics to be predicted to be diabetic (recall = 1.0).\n", 860 | "\n", 861 | "The model computes the probability of a true condition and then compares that to a threshold\n", 862 | "(by default 0.5) to make a final true of false determination. The precision/recall curve plots\n", 863 | "precision and recall at various threhold values." 864 | ] 865 | }, 866 | { 867 | "cell_type": "code", 868 | "execution_count": 22, 869 | "metadata": {}, 870 | "outputs": [ 871 | { 872 | "data": { 873 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAIABJREFUeJzt3Xl8VPW9//HXJ/tCEghJWAMRZBFRUALugnUpWiu1WrdatbXVbtrWaq/t7f3Va2+XW9vaq9ZWXKr1ttpqby0qlqpVQQUkFEFAdgKENWELJGT//P6YIU0hkAEycyaZ9/PxmEdmOTN5HxLmnfM9Z77H3B0RERGApKADiIhI/FApiIhIK5WCiIi0UimIiEgrlYKIiLRSKYiISCuVgoiItFIpiIhIK5WCiIi0Sgk6wJEqKCjwkpKSoGOIiHQp8+fPr3L3wo6W63KlUFJSQllZWdAxRES6FDNbF8lyGj4SEZFWKgUREWmlUhARkVYqBRERaaVSEBGRVlErBTN7wsy2mdniQzxuZvaAma0ys0Vmdmq0soiISGSiuaXwJDD5MI9fDAwLX24BfhXFLCIiEoGofU7B3WeaWclhFpkC/NZD5wOdY2Y9zayfu2+ORp555TuYtaLyiJ+XnprMNeOL6d0jPQqpRETiS5AfXhsAbGhzuyJ830GlYGa3ENqaYNCgQUf1zf6xbicPvrHqiJ/nDk+9W84vrh7LmccXHNX3FhHpKoIsBWvnPm9vQXefCkwFKC0tbXeZjtw6cSi3Thx6xM9buqma2575B59+fC5fmjiUb1w4nNRk7Z8Xke4pyHe3CqC4ze2BwKaAshzSqP65vHjb2VxdWszDb67mqkdms2FHbdCxRESiIshSmAbcED4K6XRgd7T2JxyrrLQUfnzFyTx47Sms2rqXS/5nFi8ujLv+EhE5ZlEbPjKzZ4BJQIGZVQDfA1IB3P3XwHTgEmAVUAt8NlpZOsvHx/RnbHFPbn92Abc9s4C3V1bxvctGkZUWm1G4puYWtu6pZ9OufWzatY+N4a+1Dc185vTBnDKoV0xyiEj3ZaGDf7qO0tJSD3qW1MbmFn7x2goefnM1xxVk8+C1p3Bi/7xjft3qusY2b/h1rddDlzq2VNfR3PKvP69eWak0tzjVdU1cenI/vvXRkQzqnXXMWUSkezGz+e5e2uFyKoWj9+6qKr7+h/fZVdvIdy4ZyY1nlmDW3v7zgzW3OB9urmZe+Q7Kyncyr3wH2/bU/8syqclGv7xM+vfMoH/PTAb0zKR/+BK6nkFWWgp765uY+t
Zqps5aQ3OLc8MZJdz2kePpmZUWjdWOipYWZ09dEztrG9hZ28Cu2kZqG5oZN7gXffMygo4n0uWpFGJk+9567np+EX9fto0LTijiJ1eOIT/74DfjfQ3NvL9hF/PKdzCvfAcL1u9ib30TAAN6ZlJa0otR/XIZ0Oufb/qFPdJJSoqsZAC27K7j568u57n5FeSkp3DbR4Zxw5mDSU9J7rT1jVRNfRPb9tSztbqO7Xv3v9E3sLO2kV21jeHroTf/nbUN7N7XSMshfhVPGdSTi0f35eLR/SjO11aQyNFQKcSQu/Pku+X8aPoyemWncv/VYxnZN5ey8h2UrdvJe2t3sHjjbppaHDMY0SeH0pJejC/Jp7QknwE9Mzs1z7It1fxo+jLeWlFJcX4md310JB8/uV/EWzGH4h4apqrcU8e26nq2hr9u2xO+VNe1fq1paG73NbLSkumVlUbPrFR6ZaWRl5VKr/D1nllp9MxMpVd2Kj2z0khJMmauqOSVxVtYsqkagBP753Lx6L5MHt2P44t6HNP6iCQSlUIAlmzazW3PLGBNZU3rfWnJSYwpzqO0JJ/xJb0YNyifvKzUmOSZtbKSH05fxoebqxkzMI/vXHICpw3pfdjn1DU2s35HLeVVNazfUcu67bWUb69hw45atlTXUdfYctBzMlOT6ZObTlFOBoW56fTJyaAoN52inHT65GZQ0COdXlmp5GWlHvVWy/rttfx1yWZeWbyFBet3AXB8UY9wQfRlVL/cYy49ke5MpRCQ2oYmHp25ltQUY3xJPicNyCMjNfbDN/s1tzh/XrCRn85YzpbqOi4c1YevXzCMlhYo317TWgDrdtSyfnvojb+tnIwUSnpnM6h3Fv1yM+iTG3rDLwy/4RflpNMjPSWmb8hbdtcxY8kWXlm8mffW7qDFYUhhNvdeNpqzh+lT5yLtUSnIv9jX0MwT76zlV2+ubt2XsV9hTjolvbMYlJ/N4N5Z4Us2g/Oz6JmVGtd/gVftrefVpVt5dNYa1lTW8JnTB/PtS0bG7DBhka5CpSDtqtpbz4wlW+idnU5JQRaD8rO6xRtoXWMz981YzhPvrGVQfhY//dQYxpfkBx1LJG6oFCQhzV2znTufX0jFzn184Zwh3HHh8ECH70TiRaSloJndpFs5bUhv/vq1c7luwiCmzlzDpQ++zaKKXUHHEukyVArS7WSnp/CDy0/iqc9NYG9dE5c//C4/f3UFDU0HHzklIv9KpSDd1sThhcz4xrlMGdufB15fyeUPv8PKrXuCjiUS11QK0q3lZaby86vG8shnxrFldx23/u98utp+NJFYUilIQvjoiX25++KRrKms4b21O4KOIxK3VAqSMD52cj9y0lN4dt6GjhcWSVAqBUkYWWkpTDmlP9M/2Mzu2sag44jEJZWCJJRrxg+ivqmFF97fGHQUkbikUpCEMnpAHqMH5PLMe+u1w1mkHSoFSTjXjB/Esi17WFixO+goInFHpSAJ57Kx/clMTeYP89YHHUUk7qgUJOHkZqTysZP7Me39TdQcMGOsSKJTKUhCunZCMTUNzby4cFPQUUTiikpBEtKpg3oxrKiHPrMgcgCVgiQkM+Pq8cW8v2EXy7ZUBx1HJG6oFCRhXXHqQFKTjRcWaAhJZD+VgiSsXtlpDOiZycZd+4KOIhI3VAqS0Ap6pFO1pz7oGCJxQ6UgCa13jzSq9qoURPZTKUhCK+iRzvaahqBjiMQNlYIktIIe6eysbaCpWafqFAGVgiS4gh5puMMObS2IACoFSXAFPdIBqNqrUhABlYIkuIKc/aWgnc0ioFKQBNc7Ow1QKYjsp1KQhLZ/S2G7ho9EAJWCJLic9BTSUpK0pSASFtVSMLPJZrbczFaZ2d3tPD7IzN4wswVmtsjMLolmHpEDmRkF2WlUqhREgCiWgpklA78ELgZGAdea2agDFvsu8Ed3PwW4Bng4WnlEDqUgJ11HH4mERXNLYQKwyt3XuHsD8Cww5YBlHMgNX88DNF2lxNyIPj
nMXl3Fu6urgo4iErholsIAoO0ZTCrC97V1D3C9mVUA04HbophHpF3fvXQUJb2zufXp+SzfsifoOCKBimYpWDv3+QG3rwWedPeBwCXA02Z2UCYzu8XMysysrLKyMgpRJZHlZaby5OcmkJmazGd/8x5bq+uCjiQSmGiWQgVQ3Ob2QA4eHroZ+COAu88GMoCCA1/I3ae6e6m7lxYWFkYpriSyAT0zeeKm8eze18hNv5nHnrrGoCOJBCKapTAPGGZmx5lZGqEdydMOWGY9cD6AmZ1AqBS0KSCBGD0gj4evH8eKrXv48u/+QaMmyZMEFLVScPcm4KvADOBDQkcZLTGze83ssvBi3wS+YGYLgWeAm9z9wCEmkZiZOLyQH11+ErNWVvGd//sA/TpKokmJ5ou7+3RCO5Db3vf/2lxfCpwVzQwiR+qq8cVU7NrHA6+vZFB+FredPyzoSCIxo080i7TjGxcM4xNj+3P/aytYsH5n0HFEYkalINIOM+PeT4ymT24Gdz63kLrG5qAjicSESkHkEHIzUvnxFSezurKG+19bEXQckZhQKYgcxsThhVwzvphHZ67hHxpGkgSgUhDpwL9/7AT65mZwl4aRJAGoFEQ6kNN2GOlVDSNJ96ZSEInAucMLuXbCIB6dtYbXlm7V5xek24rq5xREupPvXDKSd1dX8fnfljGkIJsrSwdyxakD6ZObEXQ0kU5jXe0vntLSUi8rKws6hiSomvompn+wmefKKnivfAdJBpNGFHFV6UA+MrIPaSna+Jb4ZGbz3b20w+VUCiJHZ21VDc/P38Dz8yvYWl1PfnYak0f35cyhvTntuN4Uhs//LBIPVAoiMdLc4sxcWclzZRt4a3klNQ2hI5SGFmZz+pDenDakN6cfl0+RhpkkQCoFkQA0NbeweFM1c9ZsZ+6a7cwr38ne+iYAhhRkc9bxBdx98Uiy07U7T2Ir0lLQb6ZIJ0pJTmJscU/GFvfkixOH0tTcwpJN1cxdu503l1fy9Jx1TBxeyAWj+gQdVaRd2ismEkUpyUmMKe7JLecO5aHrTgWgfHtNwKlEDk2lIBIjvbJSyclIYd322qCjiBySSkEkRsyMkt7Z2lKQuKZSEImhwb2ztKUgcU2lIBJDJb2zqdhZS0OTzv8s8UmlIBJDg3tn0eKwcde+oKOItEulIBJDJQXZgI5AkvilUhCJoZLeoVJ4vqxC52aQuKRSEImhwpx0vnb+MF7+YDNX/vpdNuzQTmeJLyoFkRj7xoXDeeyGUtZtr+XSB9/m78u2Bh1JpJVKQSQAF4zqw8u3ncPAXpl87skyfjpjOS0tXWseMumeVAoiARnUO4s/felMri4t5qE3VvHwm6uCjiSiCfFEgpSRmsyPrziJ2sZm7n9tJWcMLWDc4F5Bx5IEpi0FkYCZGT+4fDT98jL42rMLqK5rDDqSJDCVgkgcyM1I5YFrT2Hz7jq+++fFdLXznEj3oVIQiROnDurFNy4YxrSFm/jTPzYGHUcSVMSlYGYDzOxMMzt3/yWawUQS0ZcmHc+E4/L5/ktLqW1oCjqOJKCISsHM/ht4B/gucFf4cmcUc4kkpOQk498mj2D3vkb+NL8i6DiSgCI9+ugTwAh3r49mGBEJDSONKe7JE++U8+nTBpOUZEFHkgQS6fDRGiA1mkFEJMTMuPns41hbVcPfl20LOo4kmEi3FGqB983sdaB1a8Hdb49KKpEEd/HovvTPy+Cxt9dwwag+QceRBBJpKUwLX0QkBlKTk7jxzBJ+9Moylm2pZmTf3KAjSYKIaPjI3Z8CngHmhy+/D993WGY22cyWm9kqM7v7EMtcZWZLzWyJmf3+SMKLdGefOGUAAG+vrAo4iSSSiLYUzGwS8BRQDhhQbGY3uvvMwzwnGfglcCFQAcwzs2nuvrTNMsOAbwNnuftOMys62hUR6W765GYwsFcm/1i/M+gokkAiHT76GXCRuy8HMLPhhLYcxh3mOROAVe6+JvycZ4EpwNI2y3wB+KW77w
Rwd+1VE2mjdHAv3l29HXfHTEchSfRFevRR6v5CAHD3FXR8NNIAYEOb2xXh+9oaDgw3s3fMbI6ZTY4wj0hCGDe4F9v21FOxU+d0ltiItBTKzOxxM5sUvjxKaN/C4bT3Z82BE7qkAMOAScC1wGNm1vOgFzK7xczKzKyssrIywsgiXd+4wfkATP9gs+ZDkpiItBS+BCwBbge+RmgI6IsdPKcCKG5zeyCwqZ1l/uLuje6+FlhOqCT+hbtPdfdSdy8tLCyMMLJI1zeibw4nD8zjR68s49OPzWXJpt1BR5JuzqL114eZpQArgPOBjcA84Dp3X9JmmcnAte5+o5kVAAuAse6+/VCvW1pa6mVlZVHJLBKPGptb+P3c9dz/2gp272vkU+MGcudFIyjKzQg6mnQhZjbf3Us7Wu6wO5rN7I/ufpWZfcDBQz+4+8mHeq67N5nZV4EZQDLwhLsvMbN7gTJ3nxZ+7CIzWwo0A3cdrhBEEtH+zyx8YuwAHnpjJU++W86LCzdzzrACzhleyLnDChjcOzvomNJNHHZLwcz6uftmMxvc3uPuvi5qyQ5BWwqS6NZtr+GRmWt4a3klG3eFdkAPys/inGEFTBpRxPkjizRfkhwk0i2FiIaPzCwb2OfuLeHDUUcCr7h7zE8RpVIQCXF31lbVMGtlFbNWVjF7dRU1Dc08ffMEzhmmfW/yrzpl+KiNmcA5ZtYLeB0oA64GPn30EUXkWJgZQwp7MKSwBzeeWcIHFbv5+ENvs6+hOeho0oVFevSRuXst8EngQXe/HBgVvVgicqT02TbpDBGXgpmdQWjL4OXwfZFuZYiISBcRaSl8ndAcRX8OH0E0BHgjerFERCQIEf217+5vAW+1ub2G0AfZRCRO5GaEZp55anY5pSX55GenBRtIuqTDbimY2S/CX180s2kHXmITUUQiMah3Fj+8/CTmrd3Jxx6YpdlV5ah0tKXwdPjrT6MdRESO3XWnDeLkgXl86XfzuerXs/nixKGcNiSfkX1zKcxJDzqedAFH/DmF8O1kID18RFJM6XMKIh3bXdvIt/60kBlLtrbeV9AjjRF9cxjZN5cJx+Vz0ag+mo47gXT2h9fmABe4+97w7R7A39z9zGNOeoRUCiKR21HTwLIt1SzbvCf0dcselm/ZQ31TC2OLe/Ifl57QOhOrdG+d/eG1jP2FAODue80s66jTiUhM5GencebQAs4cWtB6X1NzC39esJH7Ziznil/N5mMn9+PbF49kYC/9l5bID0mtMbNT998ws3GAzvoh0gWlJCfxqdJi3rxrEl87fxivf7iVC38+k0feWk1jc0vQ8SRgkQ4fjQee5Z/nQ+gHXO3uHZ1op9Np+Eikc23ctY/v/WUJr324lZF9c/jhJ0/i1EG9go4lnaxT9ymEXzAVGEHojGrLgpgMD1QKItEyY8kW7pm2hM276/jUuIHcNXkERTk6Z0N3EWkpRDR8FN5/8G/A19z9A6DEzC49xowiEkc+emJfXr1jIrdOHMIL72/kvPve5Ndvraa+SRPsJZJI9yn8BmgAzgjfrgD+KyqJRCQwPdJT+PbFJ/C3b0zkjKEF/PiVZXzy4Xep2lsfdDSJkUhLYai7/wRoBHD3fYSGkUSkGzquIJvHbixl6mfGsbpyL1c9MptNu3RsSSKItBQazCyT8Ck5zWwooD8dRLq5i07sy9M3n0ZldT1X/updXlq0iZaW6JzXXeJDpKXwPeCvQLGZ/Y7QiXa+FbVUIhI3xpfk88wtp5OVnsJXf7+Ayf8zk5cXbVY5dFMdHn1koc/BDwRqgdMJDRvNcfeq6Mc7mI4+EglGc4vz8gebeeD1lazatpdLT+7HQ9ed2vETJS502tFHHmqNF9x9u7u/7O4vBVUIIhKc5CTjsjH9mfH1c7n57ON4adFm1lTu7fiJ0qVEOnw0J/wBNhFJcMlJxq0Th5CabPzvnPVBx5FOFuncR+cBXzSzcqCG0BCSu/vJ0QomIvGrKCeDyaP78dvZ5azfUc
OUsQO44IQ+ZKYlBx1NjlGkpXBxVFOISJfzn5edSN/cdKYt3MRrH24jOy2ZL593PF+eNFRTcndhh93RbGYZwBeB44EPgMfdvSlG2dqlHc0i8aW5xZm7djtPvVvOjCVbmTK2P/99xclkpGqrIZ501tTZTxH6wNosQlsLo4CvHXs8EekukpOMM4cWcMaQ3jz85mrum7GcHTUNPH3zaUFHk6PQUSmMcveTAMzsceC96EcSka7IzPjKecdTuaeep2aXBx1HjlJHRx+1zoQa9LCRiHQNuRmR7qqUeNTRT2+MmVWHrxuQGb69/+ij3KimE5EuyT10hreU5EiPepd4cdifmLsnu3tu+JLj7iltrqsQROQgYwf1BOB/56wLOIkcDdW4iHSq80YUcfbxBfz81RVU1wVyLi45BioFEelUZsb1pw+iuq6J9dtrg44jR0ilICKdLjlJby1dlX5yItLpUpJDn2h+YcFGnc6zi1EpiEinO2toAVPG9uext9dy6QNvs3RTdcdPkrigUhCRTpeWksT/XHMKv7lpPLv3NXLL02Xs0U7nLiGqpWBmk81suZmtMrO7D7PclWbmZtbhvBwi0nWcN7KIX11/Kpt27eOOPy5k+16dxTfeRa0UzCwZ+CX/nDPpWjMb1c5yOcDtwNxoZRGR4IwbnM93LjmBvy/bxqSfvsnUmavZsKOWjs76KMGI5ufRJwCr3H0NgJk9C0wBlh6w3PeBnwB3RjGLiATo8+cMYdKIQr7/0of8cPoyfjh9Gf3yMigtyeeKUwcwaURR0BElLJrDRwOADW1uV4Tva2VmpwDF7v7S4V7IzG4xszIzK6usrOz8pCISdccX5fDU5ybw16+fw71TTmTc4F7MWbOdzz9VxrzyHUHHk7BolkJ7Z9lo3V40syTgfuCbHb2Qu09191J3Ly0sLOzEiCISayP75nLDGSU8dN2pvHbHRAb2yuT2ZxbQ0qLhpHgQzVKoAIrb3B4IbGpzOwcYDbwZPs3n6cA07WwWSRx5malcOW4gm3fX0aRSiAvRLIV5wDAzO87M0oBrgGn7H3T33e5e4O4l7l4CzAEuc3edVk0kgejUnfElaqUQPv/CV4EZwIfAH919iZnda2aXRev7iojI0Yvq2TDcfTow/YD7/t8hlp0UzSwiEt+219TTLy8z6BgJT59oFpFAnX9CEVlpyXzuyTLqGjVPUtBUCiISqJF9c7nnshP5cHM1SzRHUuBUCiISuL65GQDaUogDKgURCdzIvjnkZabyHy8s1vxIAVMpiEjginIzePzGUjbu2senH5urYgiQSkFE4kJpST6P3zietVU1XPfoXKpUDIFQKYhI3Dh7WAFP3DSedTtq+OYfFwYdJyGpFEQkrpx1fAEfGVlExc7aoKMkJJWCiMSdfnmZrKmq4fG31+qIpBhTKYhI3LnroyM4b0QR339pKWPv/Ruff2oeizfuDjpWQlApiEjcyUhN5tEbSvnt5yZwdWkx72/YxVWPzOa1pVuDjtbtqRREJC4lJxnnDi/kP6eMZvrt5zC0sAef/20ZU2euDjpat6ZSEJG4V5SbwXNfPINxg3vxzHsbOn6CHDWVgoh0CRmpyYzun8uGHbU8V6ZiiBaVgoh0GXdcNILThuRz1/OLuP2ZBTQ1twQdqdtRKYhIl5GXmcqTn53A588+jmkLNzF/3c6gI3U7KgUR6VJSk5O4anzo9O8/mbFc8yR1MpWCiHQ5w/vk8LNPjWH+up288P6moON0KyoFEemSLj6pL4D2K3QylYKIdGnb9tSzs6YBdw86SregUhCRLiklKYmM1CQef3stp3z/VS74+VvMXFEZdKwuLyXoACIiRyMtJYnXvzmJpZuqKa+q4Xdz13HDE+9x4ag+DCvq0bpccX4Wnzx1AOkpyQGm7Tqsq21ylZaWellZWdAxRCTO1DU28+jMNUydtaZ1ZlV3aGpxTuiXy7SvnkVqcuIOjpjZfHcv7XA5lYKIdFfuzv2vruCBv6/i1olD+MI5Qy
jokR50rEBEWgqJW5si0u2ZGZ8+fTAXjerD1JlrOOvHf+eNZduCjhXXVAoi0q31yc1g6g2l/O3r51Lf1MKiCp2X4XC0o1lEEsLQwh5kpCbxyMzVLKzYRZ/cdApzMijKSacwJ52inHSKcjPok5NOSgLve1ApiEhCSEoyXvjKWTw+ay2LN1WzqGIX22saOHC36pjinvzlK2cFEzIOqBREJGGM7JvLfZ8a03q7qbmF7TUNbKuuZ9ueOh6btZZFFbtYuqmaUf1zA0wanMTdRhKRhJeSnESf3AxOGpjH+Sf04arxA2lqcS59cBY7axqCjhcIlYKISNjlpwzknstOpMWhpqEp6DiB0PCRiEgb2emht8VLH3ybj4woYnjfHPIyUzlpQB6jB+QFnC76VAoiIm187KR+pCUbf1uyldeXbeP/FmwEoE9uOnO/c0HA6aJPpSAi0kZykjF5dD8mj+6Hu7OvsZnv/WUJr364NehoMaF9CiIih2BmZKWlkJWWTH1jCx8kwAffoloKZjbZzJab2Sozu7udx+8ws6VmtsjMXjezwdHMIyJyNEb2y2VfYzMff+htfvDyUmq78U7oqJWCmSUDvwQuBkYB15rZqAMWWwCUuvvJwPPAT6KVR0TkaF07YRDPf/EMBvfO4tFZa5m5oiroSFETzS2FCcAqd1/j7g3As8CUtgu4+xvuXhu+OQcYGMU8IiJHrbQkn8dvDE0yet+MZcwr3xFwouiIZikMADa0uV0Rvu9QbgZeae8BM7vFzMrMrKyyUmdWEpFgHFfQg7s+OoLVlTVc/chs3l3V/bYYolkK1s597Z68wcyuB0qB+9p73N2nunupu5cWFhZ2YkQRkcglJxlfOe94bj77OFocrntsLuf/7E1Wbt0TdLROE81SqACK29weCGw6cCEzuwD4d+Ayd6+PYh4RkU7xH5eO4qXbzmZoYTarK2u48P6ZXP3IbN5Yvo3FG3dTU991d0RH7cxrZpYCrADOBzYC84Dr3H1Jm2VOIbSDebK7r4zkdXXmNRGJF+7O3LU7+K+Xl7J4Y3Xr/Rec0IfHbuzwJGcxFRen4zSzS4BfAMnAE+7+AzO7Fyhz92lm9hpwErA5/JT17n7Z4V5TpSAi8Wj+uh3srGnk/tdW8OHmagb3zmZoYQ9G9s3hC+cMIS8rNdB8cVEK0aBSEJF4tnRTNX9dvJmV2/YyZ812dtY2kpmazIWj+pBkkGQGBvWNLZw2JJ8+uRmMHpDHgJ6ZUc0VaSlomgsRkU40qn9u67kYGptb+Nbzi/hwc+ikPg60uLO3romdtY28/EFokOSMIb155pbTA0z9TyoFEZEoSU1O4v6rx7b72KZd+9hZ28B/vLCYfY3NMU52aCoFEZEA9O+ZSf+emfTISKV6X2PQcVppQjwREWmlUhARCdiW3XU8+c5aZq2sJOiDf1QKIiIBGjswjx01Ddzz4lI+8/h7rKmqCTSPSkFEJEB3XDSC5f81mfuvHgNAXcA7nVUKIiIB238yH4A7n1vE8i3BzaWkUhARiQPjS/KZMrY/H26uZsH6nYHlUCmIiMSB/Ow07r54JHCI6aRjRKUgIhInkpNCZxz49v99wD3TllDfFPv9CyoFEZE4UZSTwU+uPBmAJ98t55bfzqelJbbbDSoFEZE4clVpMdNvP4cxA/N4a0UlJ35vBuVVNWzbUxeT769SEBGJM6P653Lfp8bQNzeDfY3NTPrpm0z4wevMWLIl6t9bcx+JiMSh4X1yeO2bE5m5opJt1XXc8+JSKvdE/+SUKgURkTjVIz2FS07qx+7aRt4r38Gg/Kyof0+VgohInMsQCzr8AAAFx0lEQVTLSuXhT4+LyffSPgUREWmlUhARkVYqBRERaaVSEBGRVioFERFppVIQEZFWKgUREWmlUhARkVYW9Emij5SZVQLrjvLpBUBVJ8bpCrTOiUHrnBiOZZ0Hu3thRwt1uVI4FmZW5u6lQeeIJa1zYtA6J4ZYrLOGj0REpJVKQUREWiVaKU
wNOkAAtM6JQeucGKK+zgm1T0FERA4v0bYURETkMLplKZjZZDNbbmarzOzudh5PN7M/hB+fa2YlsU/ZuSJY5zvMbKmZLTKz181scBA5O1NH69xmuSvNzM2syx+pEsk6m9lV4Z/1EjP7fawzdrYIfrcHmdkbZrYg/Pt9SRA5O4uZPWFm28xs8SEeNzN7IPzvscjMTu3UAO7erS5AMrAaGAKkAQuBUQcs82Xg1+Hr1wB/CDp3DNb5PCArfP1LibDO4eVygJnAHKA06Nwx+DkPAxYAvcK3i4LOHYN1ngp8KXx9FFAedO5jXOdzgVOBxYd4/BLgFcCA04G5nfn9u+OWwgRglbuvcfcG4FlgygHLTAGeCl9/HjjfzCyGGTtbh+vs7m+4e2345hxgYIwzdrZIfs4A3wd+AtTFMlyURLLOXwB+6e47Adx9W4wzdrZI1tmB3PD1PGBTDPN1OnefCew4zCJTgN96yBygp5n166zv3x1LYQCwoc3tivB97S7j7k3AbqB3TNJFRyTr3NbNhP7S6Mo6XGczOwUodveXYhksiiL5OQ8HhpvZO2Y2x8wmxyxddESyzvcA15tZBTAduC020QJzpP/fj0h3PEdze3/xH3iIVSTLdCURr4+ZXQ+UAhOjmij6DrvOZpYE3A/cFKtAMRDJzzmF0BDSJEJbg7PMbLS774pytmiJZJ2vBZ5095+Z2RnA0+F1bol+vEBE9f2rO24pVADFbW4P5ODNydZlzCyF0Cbn4TbX4l0k64yZXQD8O3CZu9fHKFu0dLTOOcBo4E0zKyc09jqti+9sjvR3+y/u3ujua4HlhEqiq4pknW8G/gjg7rOBDEJzBHVXEf1/P1rdsRTmAcPM7DgzSyO0I3naActMA24MX78S+LuH9+B0UR2uc3go5RFChdDVx5mhg3V2993uXuDuJe5eQmg/ymXuXhZM3E4Rye/2C4QOKsDMCggNJ62JacrOFck6rwfOBzCzEwiVQmVMU8bWNOCG8FFIpwO73X1zZ714txs+cvcmM/sqMIPQkQtPuPsSM7sXKHP3acDjhDYxVxHaQrgmuMTHLsJ1vg/oATwX3qe+3t0vCyz0MYpwnbuVCNd5BnCRmS0FmoG73H17cKmPTYTr/E3gUTP7BqFhlJu68h95ZvYMoeG/gvB+ku8BqQDu/mtC+00uAVYBtcBnO/X7d+F/OxER6WTdcfhIRESOkkpBRERaqRRERKSVSkFERFqpFEREpJVKQeQAZtZsZu+b2WIze9HMenby699kZg+Fr99jZnd25uuLHAuVgsjB9rn7WHcfTehzLF8JOpBIrKgURA5vNm0mGzOzu8xsXnge+/9sc/8N4fsWmtnT4fs+Hj5fxwIze83M+gSQX+SIdLtPNIt0FjNLJjR9wuPh2xcRmkdoAqFJyaaZ2bnAdkJzSp3l7lVmlh9+ibeB093dzezzwLcIffpWJG6pFEQOlmlm7wMlwHzg1fD9F4UvC8K3exAqiTHA8+5eBeDu+ydXHAj8ITzXfRqwNibpRY6Bho9EDrbP3ccCgwm9me/fp2DAj8L7G8a6+/Hu/nj4/vbmi3kQeMjdTwJuJTRRm0hcUymIHIK77wZuB+40s1RCk7J9zsx6AJjZADMrAl4HrjKz3uH79w8f5QEbw9dvRKQL0PCRyGG4+wIzWwhc4+5Ph6dmnh2eaXYvcH141s4fAG+ZWTOh4aWbCJ0R7Dkz20ho6u7jglgHkSOhWVJFRKSVho9ERKSVSkFERFqpFEREpJVKQUREWqkURESklUpBRERaqRRERKSVSkFERFr9f0i/G+V6u5xxAAAAAElFTkSuQmCC\n", 874 | "text/plain": [ 875 | "" 876 | ] 877 | }, 878 | "metadata": {}, 879 | "output_type": "display_data" 880 | } 881 | ], 882 | "source": [ 883 | "# Plot the model's 
precision/recall curve.\n", 884 | "\n", 885 | "%matplotlib inline\n", 886 | "import matplotlib.pyplot as plt\n", 887 | "\n", 888 | "trainingSummary = model.stages[-1].summary\n", 889 | "\n", 890 | "pr = trainingSummary.pr.toPandas()\n", 891 | "plt.plot(pr['recall'],pr['precision'])\n", 892 | "plt.ylabel('Precision')\n", 893 | "plt.xlabel('Recall')\n", 894 | "plt.show()" 895 | ] 896 | }, 897 | { 898 | "cell_type": "markdown", 899 | "metadata": {}, 900 | "source": [ 901 | "Let's use the model to make predictions using the test data. We'll leave the threshold for deciding between a true or false result at the default value of 0.5." 902 | ] 903 | }, 904 | { 905 | "cell_type": "code", 906 | "execution_count": 23, 907 | "metadata": {}, 908 | "outputs": [], 909 | "source": [ 910 | "predictions = model.transform(test_data)" 911 | ] 912 | }, 913 | { 914 | "cell_type": "markdown", 915 | "metadata": {}, 916 | "source": [ 917 | "Compute recall and precision for the test predictions to see how well the model does." 
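As a standalone illustration of how the three counts below combine into the two metrics, here is the same computation in plain Python on made-up prediction and label lists (independent of the Spark pipeline):

```python
# Made-up predictions and ground-truth labels (1 = diabetic, 0 = non-diabetic);
# these values are illustrative only, not taken from the notebook's data.
predictions = [1, 1, 0, 1, 0, 0, 1, 0]
labels = [1, 0, 0, 1, 1, 0, 1, 0]

# Count true positives, false positives, and false negatives.
tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))

precision = tp / (tp + fp)  # fraction of predicted diabetics who are diabetic
recall = tp / (tp + fn)     # fraction of actual diabetics the model caught

print(tp, fp, fn, precision, recall)
```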
918 | ] 919 | }, 920 | { 921 | "cell_type": "code", 922 | "execution_count": 24, 923 | "metadata": {}, 924 | "outputs": [ 925 | { 926 | "name": "stdout", 927 | "output_type": "stream", 928 | "text": [ 929 | "True positives = 21\n", 930 | "False positives = 9\n", 931 | "False negatives = 29\n", 932 | "Recall = 0.42\n", 933 | "Precision = 0.7\n" 934 | ] 935 | } 936 | ], 937 | "source": [ 938 | "pred_and_label = predictions.select(\"prediction\", \"diabetic\").toPandas()\n", 939 | "\n", 940 | "tp = len(pred_and_label[(pred_and_label.prediction == 1) & (pred_and_label.diabetic == 1)])\n", 941 | "fp = len(pred_and_label[(pred_and_label.prediction == 1) & (pred_and_label.diabetic == 0)])\n", 942 | "fn = len(pred_and_label[(pred_and_label.prediction == 0) & (pred_and_label.diabetic == 1)])\n", 943 | "\n", 944 | "print(\"True positives = %s\" % tp)\n", 945 | "print(\"False positives = %s\" % fp)\n", 946 | "print(\"False negatives = %s\" % fn)\n", 947 | "\n", 948 | "print(\"Recall = %s\" % (tp / (tp + fn)))\n", 949 | "print(\"Precision = %s\" % (tp / (tp + fp)))" 950 | ] 951 | }, 952 | { 953 | "cell_type": "markdown", 954 | "metadata": {}, 955 | "source": [ 956 | "## Publish and deploy the model\n", 957 | "\n", 958 | "In this section you will learn how to store the model in the Watson Machine Learning repository by using the repository client.\n", 959 | "\n", 960 | "First install the client library."
961 | ] 962 | }, 963 | { 964 | "cell_type": "code", 965 | "execution_count": 25, 966 | "metadata": {}, 967 | "outputs": [ 968 | { 969 | "name": "stdout", 970 | "output_type": "stream", 971 | "text": [ 972 | "Collecting watson-machine-learning-client\n", 973 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/7b/d4/cdde5b202b1c38ef124c2b147bce32004635d0ea19c2807301b2f4ffa459/watson_machine_learning_client-1.0.363-py3-none-any.whl (935kB)\n", 974 | "\u001b[K 100% |################################| 942kB 3.1MB/s eta 0:00:01\n", 975 | "\u001b[?25hCollecting certifi (from watson-machine-learning-client)\n", 976 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/9f/e0/accfc1b56b57e9750eba272e24c4dddeac86852c2bebd1236674d7887e8a/certifi-2018.11.29-py2.py3-none-any.whl (154kB)\n", 977 | "\u001b[K 100% |################################| 163kB 4.2MB/s eta 0:00:01\n", 978 | "\u001b[?25hCollecting ibm-cos-sdk (from watson-machine-learning-client)\n", 979 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/b1/d4/7e1fe33819b80d47dafa5c02c905f7acbbdff7e6cca9af668aaeaa127990/ibm-cos-sdk-2.4.4.tar.gz (50kB)\n", 980 | "\u001b[K 100% |################################| 51kB 2.3MB/s eta 0:00:01\n", 981 | "\u001b[?25hCollecting urllib3 (from watson-machine-learning-client)\n", 982 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/62/00/ee1d7de624db8ba7090d1226aebefab96a2c71cd5cfa7629d6ad3f61b79e/urllib3-1.24.1-py2.py3-none-any.whl (118kB)\n", 983 | "\u001b[K 100% |################################| 122kB 3.7MB/s eta 0:00:01\n", 984 | "\u001b[?25hCollecting tqdm (from watson-machine-learning-client)\n", 985 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/6c/4b/c38b5144cf167c4f52288517436ccafefe9dc01b8d1c190e18a6b154cd4a/tqdm-4.31.1-py2.py3-none-any.whl (48kB)\n", 986 | "\u001b[K 100% |################################| 51kB 2.1MB/s eta 0:00:01\n", 987 | "\u001b[?25hCollecting tabulate (from 
watson-machine-learning-client)\n", 988 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/c2/fd/202954b3f0eb896c53b7b6f07390851b1fd2ca84aa95880d7ae4f434c4ac/tabulate-0.8.3.tar.gz (46kB)\n", 989 | "\u001b[K 100% |################################| 51kB 2.1MB/s eta 0:00:01\n", 990 | "\u001b[?25hCollecting pandas (from watson-machine-learning-client)\n", 991 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/e2/a3/c42cd52e40527ba35aed53a988c485ffeddbae0722b8b756da82464baa73/pandas-0.24.1-cp35-cp35m-manylinux1_x86_64.whl (10.0MB)\n", 992 | "\u001b[K 100% |################################| 10.0MB 1.1MB/s eta 0:00:01\n", 993 | "\u001b[?25hCollecting requests (from watson-machine-learning-client)\n", 994 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/7d/e3/20f3d364d6c8e5d2353c72a67778eb189176f08e873c9900e10c0287b84b/requests-2.21.0-py2.py3-none-any.whl (57kB)\n", 995 | "\u001b[K 100% |################################| 61kB 2.8MB/s eta 0:00:01\n", 996 | "\u001b[?25hCollecting lomond (from watson-machine-learning-client)\n", 997 | " Downloading https://files.pythonhosted.org/packages/0f/b1/02eebed49c754b01b17de7705caa8c4ceecfb4f926cdafc220c863584360/lomond-0.3.3-py2.py3-none-any.whl\n", 998 | "Collecting ibm-cos-sdk-core==2.*,>=2.0.0 (from ibm-cos-sdk->watson-machine-learning-client)\n", 999 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/85/72/99afcdf6b92840d47c8765533ef6093e43059424e3b35dd31049f09c8d7a/ibm-cos-sdk-core-2.4.4.tar.gz (1.1MB)\n", 1000 | "\u001b[K 100% |################################| 1.1MB 3.8MB/s eta 0:00:01\n", 1001 | "\u001b[?25hCollecting ibm-cos-sdk-s3transfer==2.*,>=2.0.0 (from ibm-cos-sdk->watson-machine-learning-client)\n", 1002 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/27/44/c71a4595d311772953775b3588307ac8dd5a36501b3dfda6324173b963cc/ibm-cos-sdk-s3transfer-2.4.4.tar.gz (214kB)\n", 1003 | "\u001b[K 100% |################################| 215kB 3.9MB/s 
eta 0:00:01\n", 1004 | "\u001b[?25hCollecting pytz>=2011k (from pandas->watson-machine-learning-client)\n", 1005 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/61/28/1d3920e4d1d50b19bc5d24398a7cd85cc7b9a75a490570d5a30c57622d34/pytz-2018.9-py2.py3-none-any.whl (510kB)\n", 1006 | "\u001b[K 100% |################################| 512kB 3.9MB/s eta 0:00:01\n", 1007 | "\u001b[?25hCollecting python-dateutil>=2.5.0 (from pandas->watson-machine-learning-client)\n", 1008 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/41/17/c62faccbfbd163c7f57f3844689e3a78bae1f403648a6afb1d0866d87fbb/python_dateutil-2.8.0-py2.py3-none-any.whl (226kB)\n", 1009 | "\u001b[K 100% |################################| 235kB 3.9MB/s eta 0:00:01\n", 1010 | "\u001b[?25hCollecting numpy>=1.12.0 (from pandas->watson-machine-learning-client)\n", 1011 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/e3/18/4f013c3c3051f4e0ffbaa4bf247050d6d5e527fe9cb1907f5975b172f23f/numpy-1.16.2-cp35-cp35m-manylinux1_x86_64.whl (17.2MB)\n", 1012 | "\u001b[K 100% |################################| 17.2MB 831kB/s eta 0:00:01\n", 1013 | "\u001b[?25hCollecting chardet<3.1.0,>=3.0.2 (from requests->watson-machine-learning-client)\n", 1014 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)\n", 1015 | "\u001b[K 100% |################################| 143kB 4.5MB/s eta 0:00:01\n", 1016 | "\u001b[?25hCollecting idna<2.9,>=2.5 (from requests->watson-machine-learning-client)\n", 1017 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl (58kB)\n", 1018 | "\u001b[K 100% |################################| 61kB 2.7MB/s eta 0:00:01\n", 1019 | "\u001b[?25hCollecting six>=1.10.0 (from lomond->watson-machine-learning-client)\n", 1020 | " Downloading 
https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl\n", 1021 | "Collecting jmespath<1.0.0,>=0.7.1 (from ibm-cos-sdk-core==2.*,>=2.0.0->ibm-cos-sdk->watson-machine-learning-client)\n", 1022 | " Downloading https://files.pythonhosted.org/packages/83/94/7179c3832a6d45b266ddb2aac329e101367fbdb11f425f13771d27f225bb/jmespath-0.9.4-py2.py3-none-any.whl\n", 1023 | "Collecting docutils>=0.10 (from ibm-cos-sdk-core==2.*,>=2.0.0->ibm-cos-sdk->watson-machine-learning-client)\n", 1024 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/36/fa/08e9e6e0e3cbd1d362c3bbee8d01d0aedb2155c4ac112b19ef3cae8eed8d/docutils-0.14-py3-none-any.whl (543kB)\n", 1025 | "\u001b[K 100% |################################| 552kB 4.2MB/s eta 0:00:01\n", 1026 | "\u001b[?25hBuilding wheels for collected packages: ibm-cos-sdk, tabulate, ibm-cos-sdk-core, ibm-cos-sdk-s3transfer\n", 1027 | " Running setup.py bdist_wheel for ibm-cos-sdk ... \u001b[?25ldone\n", 1028 | "\u001b[?25h Stored in directory: /home/spark/shared/.cache/pip/wheels/e5/dc/54/f601cc8263513665653fbf124f6989dcbaeb218fcf1a8fd4d1\n", 1029 | " Running setup.py bdist_wheel for tabulate ... \u001b[?25ldone\n", 1030 | "\u001b[?25h Stored in directory: /home/spark/shared/.cache/pip/wheels/2b/67/89/414471314a2d15de625d184d8be6d38a03ae1e983dbda91e84\n", 1031 | " Running setup.py bdist_wheel for ibm-cos-sdk-core ... \u001b[?25ldone\n", 1032 | "\u001b[?25h Stored in directory: /home/spark/shared/.cache/pip/wheels/43/73/3e/79ee45c864491743309c46837d617c0550e58978659b8f742e\n", 1033 | " Running setup.py bdist_wheel for ibm-cos-sdk-s3transfer ... 
\u001b[?25ldone\n", 1034 | "\u001b[?25h Stored in directory: /home/spark/shared/.cache/pip/wheels/45/52/14/5239d330c7bd818043a3c578329f1ecff4f1d09694b4c7aa41\n", 1035 | "Successfully built ibm-cos-sdk tabulate ibm-cos-sdk-core ibm-cos-sdk-s3transfer\n", 1036 | "\u001b[31mtensorflow 1.3.0 requires tensorflow-tensorboard<0.2.0,>=0.1.0, which is not installed.\u001b[0m\n", 1037 | "\u001b[31mpyspark 2.3.0 requires py4j==0.10.6, which is not installed.\u001b[0m\n", 1038 | "Installing collected packages: certifi, jmespath, docutils, urllib3, six, python-dateutil, ibm-cos-sdk-core, ibm-cos-sdk-s3transfer, ibm-cos-sdk, tqdm, tabulate, pytz, numpy, pandas, chardet, idna, requests, lomond, watson-machine-learning-client\n", 1039 | "Successfully installed certifi-2018.11.29 chardet-3.0.4 docutils-0.14 ibm-cos-sdk-2.4.4 ibm-cos-sdk-core-2.4.4 ibm-cos-sdk-s3transfer-2.4.4 idna-2.8 jmespath-0.9.4 lomond-0.3.3 numpy-1.16.2 pandas-0.24.1 python-dateutil-2.8.0 pytz-2018.9 requests-2.21.0 six-1.12.0 tabulate-0.8.3 tqdm-4.31.1 urllib3-1.24.1 watson-machine-learning-client-1.0.363\n" 1040 | ] 1041 | } 1042 | ], 1043 | "source": [ 1044 | "!rm -rf $PIP_BUILD/watson-machine-learning-client\n", 1045 | "!pip install watson-machine-learning-client --upgrade" 1046 | ] 1047 | }, 1048 | { 1049 | "cell_type": "markdown", 1050 | "metadata": {}, 1051 | "source": [ 1052 | "### Enter your Watson Machine Learning service instance credentials here\n", 1053 | "They can be found in the Service Credentials tab of the Watson Machine Learning service instance that you created on IBM Cloud." 
1054 | ] 1055 | }, 1056 | { 1057 | "cell_type": "code", 1058 | "execution_count": 26, 1059 | "metadata": {}, 1060 | "outputs": [], 1061 | "source": [ 1062 | "wml_credentials={\n", 1063 | " \"url\": \"https://xxx.ibm.com\",\n", 1064 | " \"username\": \"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\",\n", 1065 | " \"password\": \"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\",\n", 1066 | " \"instance_id\": \"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\"\n", 1067 | "}" 1068 | ] 1069 | }, 1070 | { 1071 | "cell_type": "markdown", 1072 | "metadata": {}, 1073 | "source": [ 1074 | "### Publish the model to the repository using the client" 1075 | ] 1076 | }, 1077 | { 1078 | "cell_type": "code", 1079 | "execution_count": 27, 1080 | "metadata": {}, 1081 | "outputs": [ 1082 | { 1083 | "name": "stdout", 1084 | "output_type": "stream", 1085 | "text": [ 1086 | "model_uid: e3be3fe1-3bd9-4670-b97a-03af983cdb40\n" 1087 | ] 1088 | } 1089 | ], 1090 | "source": [ 1091 | "from watson_machine_learning_client import WatsonMachineLearningAPIClient\n", 1092 | "\n", 1093 | "client = WatsonMachineLearningAPIClient(wml_credentials)\n", 1094 | "\n", 1095 | "model_props = {\n", 1096 | " client.repository.ModelMetaNames.NAME: \"diabetes-prediction-1\",\n", 1097 | "}\n", 1098 | "\n", 1099 | "stored_model_details = client.repository.store_model(model, meta_props=model_props, training_data=train_data, pipeline=pipeline)\n", 1100 | "\n", 1101 | "model_uid = client.repository.get_model_uid( stored_model_details )\n", 1102 | "print( \"model_uid: \", model_uid )" 1103 | ] 1104 | }, 1105 | { 1106 | "cell_type": "markdown", 1107 | "metadata": {}, 1108 | "source": [ 1109 | "### Deploy the model as a web service" 1110 | ] 1111 | }, 1112 | { 1113 | "cell_type": "code", 1114 | "execution_count": 28, 1115 | "metadata": {}, 1116 | "outputs": [ 1117 | { 1118 | "name": "stdout", 1119 | "output_type": "stream", 1120 | "text": [ 1121 | "\n", 1122 | "\n", 1123 | 
"#######################################################################################\n", 1124 | "\n", 1125 | "Synchronous deployment creation for uid: 'e3be3fe1-3bd9-4670-b97a-03af983cdb40' started\n", 1126 | "\n", 1127 | "#######################################################################################\n", 1128 | "\n", 1129 | "\n", 1130 | "INITIALIZING\n", 1131 | "DEPLOY_SUCCESS\n", 1132 | "\n", 1133 | "\n", 1134 | "------------------------------------------------------------------------------------------------\n", 1135 | "Successfully finished deployment creation, deployment_uid='f22520a9-8518-459f-8613-5b50e16b08f2'\n", 1136 | "------------------------------------------------------------------------------------------------\n", 1137 | "\n", 1138 | "\n", 1139 | "https://us-south.ml.cloud.ibm.com/v3/wml_instances/4625e647-f20e-4d7c-b23c-f287445a8f23/deployments/f22520a9-8518-459f-8613-5b50e16b08f2/online\n" 1140 | ] 1141 | } 1142 | ], 1143 | "source": [ 1144 | "deployment_details = client.deployments.create(model_uid, 'diabetes-prediction-1 deployment')\n", 1145 | "\n", 1146 | "scoring_endpoint = client.deployments.get_scoring_url(deployment_details)\n", 1147 | "print(scoring_endpoint)" 1148 | ] 1149 | }, 1150 | { 1151 | "cell_type": "markdown", 1152 | "metadata": {}, 1153 | "source": [ 1154 | "### Call the web service to make a prediction from some sample data" 1155 | ] 1156 | }, 1157 | { 1158 | "cell_type": "code", 1159 | "execution_count": 29, 1160 | "metadata": {}, 1161 | "outputs": [ 1162 | { 1163 | "name": "stdout", 1164 | "output_type": "stream", 1165 | "text": [ 1166 | "{'values': [[45.0, 156.6, [45.0, 156.6], [-0.3141354817235511, 0.3141354817235511], [0.4221056369793351, 0.5778943630206649], 1.0]], 'fields': ['hdl', 'systolic', 'features', 'rawPrediction', 'probability', 'prediction']}\n" 1167 | ] 1168 | } 1169 | ], 1170 | "source": [ 1171 | "scoring_payload = {\n", 1172 | " \"fields\": [\"hdl\", \"systolic\"],\n", 1173 | " \"values\": [[45.0, 
156.6]]\n", 1174 | "}\n", 1175 | "\n", 1176 | "score = client.deployments.score(scoring_endpoint, scoring_payload)\n", 1177 | "\n", 1178 | "print(str(score))" 1179 | ] 1180 | }, 1181 | { 1182 | "cell_type": "code", 1183 | "execution_count": null, 1184 | "metadata": {}, 1185 | "outputs": [], 1186 | "source": [] 1187 | } 1188 | ], 1189 | "metadata": { 1190 | "kernelspec": { 1191 | "display_name": "Python 3", 1192 | "language": "python", 1193 | "name": "python3" 1194 | }, 1195 | "language_info": { 1196 | "codemirror_mode": { 1197 | "name": "ipython", 1198 | "version": 3 1199 | }, 1200 | "file_extension": ".py", 1201 | "mimetype": "text/x-python", 1202 | "name": "python", 1203 | "nbconvert_exporter": "python", 1204 | "pygments_lexer": "ipython3", 1205 | "version": "3.7.2" 1206 | } 1207 | }, 1208 | "nbformat": 4, 1209 | "nbformat_minor": 1 1210 | } 1211 | -------------------------------------------------------------------------------- /doc/source/images/flow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/example-health-machine-learning/dd53ca5dd433cee4929bda14dad13af8f57e94b8/doc/source/images/flow.png -------------------------------------------------------------------------------- /doc/source/images/pixiedust_age_bmi.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/example-health-machine-learning/dd53ca5dd433cee4929bda14dad13af8f57e94b8/doc/source/images/pixiedust_age_bmi.png -------------------------------------------------------------------------------- /doc/source/images/pixiedust_hdl_ldl.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/example-health-machine-learning/dd53ca5dd433cee4929bda14dad13af8f57e94b8/doc/source/images/pixiedust_hdl_ldl.png -------------------------------------------------------------------------------- 
/doc/source/images/pixiedust_systolic_diastolic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/example-health-machine-learning/dd53ca5dd433cee4929bda14dad13af8f57e94b8/doc/source/images/pixiedust_systolic_diastolic.png --------------------------------------------------------------------------------