├── .DS_Store
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── images
│   ├── .DS_Store
│   ├── AutoEncoder.png
│   ├── Autoencoder_structure.png
│   ├── function.png
│   ├── image-01.png
│   ├── image-02.png
│   ├── image-03.png
│   ├── image-04.png
│   ├── image-05.png
│   ├── image-06.png
│   ├── image-07.png
│   ├── image-08.png
│   ├── image-09.png
│   ├── image-10.png
│   ├── image-11.png
│   ├── image-12.png
│   ├── image-13.png
│   └── image-14.png
└── notebooks
    ├── AutoEncoders.ipynb
    ├── NeuralNetOverSampling.ipynb
    ├── sagemaker_fraud_detection_xgb.ipynb
    └── train_nn.py
/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/.DS_Store
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | ## Code of Conduct
2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
4 | opensource-codeofconduct@amazon.com with any additional questions or comments.
5 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing Guidelines
2 |
3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
4 | documentation, we greatly value feedback and contributions from our community.
5 |
6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
7 | information to effectively respond to your bug report or contribution.
8 |
9 |
10 | ## Reporting Bugs/Feature Requests
11 |
12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features.
13 |
14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already
15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:
16 |
17 | * A reproducible test case or series of steps
18 | * The version of our code being used
19 | * Any modifications you've made relevant to the bug
20 | * Anything unusual about your environment or deployment
21 |
22 |
23 | ## Contributing via Pull Requests
24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:
25 |
26 | 1. You are working against the latest source on the *master* branch.
27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted.
29 |
30 | To send us a pull request, please:
31 |
32 | 1. Fork the repository.
33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
34 | 3. Ensure local tests pass.
35 | 4. Commit to your fork using clear commit messages.
36 | 5. Send us a pull request, answering any default questions in the pull request interface.
37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.
38 |
39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/).
41 |
42 |
43 | ## Finding contributions to work on
44 | Looking at the existing issues is a great way to find something to contribute to. As our projects use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.
45 |
46 |
47 | ## Code of Conduct
48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
50 | opensource-codeofconduct@amazon.com with any additional questions or comments.
51 |
52 |
53 | ## Security issue notifications
54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.
55 |
56 |
57 | ## Licensing
58 |
59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
60 |
61 | We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes.
62 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 |
3 | Permission is hereby granted, free of charge, to any person obtaining a copy of
4 | this software and associated documentation files (the "Software"), to deal in
5 | the Software without restriction, including without limitation the rights to
6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
7 | the Software, and to permit persons to whom the Software is furnished to do so.
8 |
9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
15 |
16 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## SageMaker Fraud Detection Workshop
2 |
3 | ### Lab description
4 |
5 | This lab demonstrates three different ML algorithms for identifying fraudulent transactions in the same dataset:
6 | - SageMaker XGBoost
7 | - AutoEncoders
8 | - Neural Networks
9 |
10 | ### Steps for launching the workshop environment using EVENT ENGINE
11 | Note: these steps were tested in the Chrome browser on macOS
12 | #### Open a browser and navigate to https://dashboard.eventengine.run/login
13 | #### Enter the 12-character "hash" provided to you by the workshop organizer.
14 | #### Click on "Accept Terms & Login"
15 | 
16 |
17 | #### Click on "AWS Console"
18 | 
19 |
20 | #### Please log off from any other AWS accounts you are currently logged in to
21 |
22 | #### Click on "Open AWS Console"
23 | 
24 |
25 | #### You should see a screen like this.
26 | #### We now need to select the correct IAM role for the workshop
27 | #### Type "IAM" into the search bar and click on IAM
28 | (Identity and Access Management).
29 | 
30 |
31 | #### Click on "Roles"
32 | 
33 |
34 | #### Scroll down past "Create Role" and Click on "TeamRole"
35 | 
36 |
37 | #### Copy "Role ARN" by selecting the copy icon on the right
38 | #### You may want to temporarily paste this role ARN into a notepad
39 | #### Once you have copied the TeamRole ARN, click on "Services" in the upper left corner
40 | 
41 |
42 | #### Enter "SageMaker" in the search bar and click on it
43 | 
44 |
45 | #### You should see a screen like this.
46 | #### Click on the orange button "Create Notebook Instance"
47 | 
48 |
49 | #### On the next webpage,
50 | #### - Give your notebook a name (no underscores, please)
51 | #### - Under Notebook instance type, select "ml.c5.2xlarge"
52 | #### - Under "Permission and encryption" select "Enter a custom IAM role ARN";
53 | #### - Paste your TeamRole ARN in the field labeled "Custom IAM role ARN"
54 | #### Note: your TeamRole ARN will have a different AWS account number than what you see here
55 | #### - Scroll down to the bottom of the page and click on "Create Notebook instance"
56 | 
57 |
58 | #### You should see your notebook being created. In a couple of minutes, its status will change
59 | #### from "Pending" to "In Service", at which point please click on "Open Jupyter"
60 | 
61 |
62 | #### In the Jupyter Notebook console, please click on 'New' -> 'Terminal' on the right-hand side
63 | 
64 |
65 | #### A new browser tab will open displaying a command prompt terminal
66 | #### In the terminal tab, please issue these two commands:
67 | #### $ cd SageMaker
68 | #### $ git clone https://github.com/aws-samples/amazon-sagemaker-fraud-detection
69 | #### You should see output similar to this:
70 | 
71 |
72 | #### You may now close the browser tab with the terminal,
73 | #### return to the Jupyter console, and navigate through the created folder structure to
74 | #### amazon-sagemaker-fraud-detection -> notebooks
75 | #### Launch and run each of the three Jupyter notebooks
76 | 
77 |
88 | #### Open the SageMaker Console by clicking on "Services" and searching for SageMaker
89 | 
90 |
91 | ## License
92 |
93 | This library is licensed under the MIT-0 License. See the LICENSE file.
94 |
95 |
--------------------------------------------------------------------------------
/images/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/.DS_Store
--------------------------------------------------------------------------------
/images/AutoEncoder.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/AutoEncoder.png
--------------------------------------------------------------------------------
/images/Autoencoder_structure.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/Autoencoder_structure.png
--------------------------------------------------------------------------------
/images/function.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/function.png
--------------------------------------------------------------------------------
/images/image-01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/image-01.png
--------------------------------------------------------------------------------
/images/image-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/image-02.png
--------------------------------------------------------------------------------
/images/image-03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/image-03.png
--------------------------------------------------------------------------------
/images/image-04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/image-04.png
--------------------------------------------------------------------------------
/images/image-05.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/image-05.png
--------------------------------------------------------------------------------
/images/image-06.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/image-06.png
--------------------------------------------------------------------------------
/images/image-07.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/image-07.png
--------------------------------------------------------------------------------
/images/image-08.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/image-08.png
--------------------------------------------------------------------------------
/images/image-09.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/image-09.png
--------------------------------------------------------------------------------
/images/image-10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/image-10.png
--------------------------------------------------------------------------------
/images/image-11.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/image-11.png
--------------------------------------------------------------------------------
/images/image-12.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/image-12.png
--------------------------------------------------------------------------------
/images/image-13.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/image-13.png
--------------------------------------------------------------------------------
/images/image-14.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-fraud-detection/66c88ab1f2b63686d052fe2febb9324b7847607d/images/image-14.png
--------------------------------------------------------------------------------
/notebooks/NeuralNetOverSampling.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Fraud Detection Using Neural Network - A Supervised Deep Learning Method"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "## Introduction\n",
15 | "In this lab, we are going to use a Neural Network to perform fraud detection, on the same credit card dataset as in the previous labs. \n",
16 | "\n",
17 | "From previous labs we know that our dataset is highly imbalanced. The Class column indicates whether or not a transaction is fraudulent. The majority of the data is non-fraudulent, with only $492$ ($0.173\\%$) of the examples corresponding to fraud.\n",
18 | "\n",
19 | "For imbalanced datasets like ours, where the positive (fraudulent) examples occur much less frequently than the negative (legitimate) examples, we may try over-sampling the minority class by generating synthetic data (read about SMOTE in Data Mining for Imbalanced Datasets: An Overview, https://link.springer.com/chapter/10.1007%2F0-387-25465-X_40) or under-sampling the majority class by using ensemble methods (see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.68.6858&rep=rep1&type=pdf).\n",
20 | "\n",
21 | "Let's start by installing one of the libraries that implements the SMOTE technique."
22 | ]
23 | },
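Before relying on a library, it helps to see what SMOTE actually does. The following is a minimal, hand-rolled sketch of SMOTE's core idea (interpolating between a minority sample and one of its nearest neighbors); the `smote_sketch` helper and its toy data are illustrative only, not imbalanced-learn's API, which the notebook itself uses.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, rng=None):
    """Simplified SMOTE: synthesize n_new minority samples by linear
    interpolation between a minority sample and one of its k nearest
    minority-class neighbors."""
    rng = rng or np.random.default_rng(0)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                # never pick a point as its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]          # indices of the k nearest neighbors
    base = rng.integers(0, n, n_new)           # pick a base minority sample
    nbr = nn[base, rng.integers(0, k, n_new)]  # pick one of its neighbors
    gap = rng.random((n_new, 1))               # interpolation factor in [0, 1)
    # new point lies on the segment between base and neighbor
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

rng = np.random.default_rng(0)
X_min = rng.normal(3.0, 1.0, (10, 4))          # 10 minority samples, 4 features
X_new = smote_sketch(X_min, n_new=90, k=5, rng=rng)
print(X_new.shape)  # (90, 4)
```

Because every synthetic point is an interpolation, it stays within the per-feature range of the original minority samples, which is why SMOTE densifies the minority region rather than inventing outliers.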
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "metadata": {},
28 | "outputs": [],
29 | "source": [
30 | "!pip install -U imbalanced-learn"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 2,
36 | "metadata": {},
37 | "outputs": [],
38 | "source": [
39 | "# creating directory structure\n",
40 | "!mkdir -p ../data\n",
41 | "!mkdir -p ../model\n",
42 | "!mkdir -p ../logs"
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 3,
48 | "metadata": {},
49 | "outputs": [
50 | {
51 | "name": "stderr",
52 | "output_type": "stream",
53 | "text": [
54 | "Using TensorFlow backend.\n"
55 | ]
56 | }
57 | ],
58 | "source": [
59 | "# first neural network with keras tutorial\n",
60 | "from numpy import loadtxt\n",
61 | "from keras.models import Sequential\n",
62 | "from keras.layers import Dense\n",
63 | "import pandas as pd\n",
64 | "import numpy as np\n",
65 | "from imblearn.over_sampling import SMOTE, ADASYN\n",
66 | "from sagemaker.tensorflow import TensorFlow\n",
67 | "from collections import Counter\n",
68 | "import matplotlib.pyplot as plt\n",
69 | "import seaborn as sns"
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {},
75 | "source": [
76 | "## Downloading data"
77 | ]
78 | },
79 | {
80 | "cell_type": "code",
81 | "execution_count": null,
82 | "metadata": {},
83 | "outputs": [],
84 | "source": [
85 | "!curl https://s3-us-west-2.amazonaws.com/sagemaker-e2e-solutions/fraud-detection/creditcardfraud.zip -o ../data/creditcardfraud.zip"
86 | ]
87 | },
88 | {
89 | "cell_type": "code",
90 | "execution_count": 5,
91 | "metadata": {},
92 | "outputs": [
93 | {
94 | "name": "stdout",
95 | "output_type": "stream",
96 | "text": [
97 | "Archive: ../data/creditcardfraud.zip\n",
98 | " inflating: ../data/creditcard.csv \n"
99 | ]
100 | }
101 | ],
102 | "source": [
103 | "!unzip -o ../data/creditcardfraud.zip -d ../data/"
104 | ]
105 | },
106 | {
107 | "cell_type": "markdown",
108 | "metadata": {},
109 | "source": [
110 | "## Load and Visualize"
111 | ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": 6,
116 | "metadata": {},
117 | "outputs": [],
118 | "source": [
119 | "data = pd.read_csv('../data/creditcard.csv', delimiter=',')"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": 7,
125 | "metadata": {},
126 | "outputs": [
127 | {
128 | "data": {
361 | "text/plain": [
362 | " Time V1 V2 V3 V4 V5 V6 V7 \\\n",
363 | "0 0.0 -1.359807 -0.072781 2.536347 1.378155 -0.338321 0.462388 0.239599 \n",
364 | "1 0.0 1.191857 0.266151 0.166480 0.448154 0.060018 -0.082361 -0.078803 \n",
365 | "2 1.0 -1.358354 -1.340163 1.773209 0.379780 -0.503198 1.800499 0.791461 \n",
366 | "3 1.0 -0.966272 -0.185226 1.792993 -0.863291 -0.010309 1.247203 0.237609 \n",
367 | "4 2.0 -1.158233 0.877737 1.548718 0.403034 -0.407193 0.095921 0.592941 \n",
368 | "5 2.0 -0.425966 0.960523 1.141109 -0.168252 0.420987 -0.029728 0.476201 \n",
369 | "6 4.0 1.229658 0.141004 0.045371 1.202613 0.191881 0.272708 -0.005159 \n",
370 | "7 7.0 -0.644269 1.417964 1.074380 -0.492199 0.948934 0.428118 1.120631 \n",
371 | "8 7.0 -0.894286 0.286157 -0.113192 -0.271526 2.669599 3.721818 0.370145 \n",
372 | "9 9.0 -0.338262 1.119593 1.044367 -0.222187 0.499361 -0.246761 0.651583 \n",
373 | "\n",
374 | " V8 V9 ... V21 V22 V23 V24 V25 \\\n",
375 | "0 0.098698 0.363787 ... -0.018307 0.277838 -0.110474 0.066928 0.128539 \n",
376 | "1 0.085102 -0.255425 ... -0.225775 -0.638672 0.101288 -0.339846 0.167170 \n",
377 | "2 0.247676 -1.514654 ... 0.247998 0.771679 0.909412 -0.689281 -0.327642 \n",
378 | "3 0.377436 -1.387024 ... -0.108300 0.005274 -0.190321 -1.175575 0.647376 \n",
379 | "4 -0.270533 0.817739 ... -0.009431 0.798278 -0.137458 0.141267 -0.206010 \n",
380 | "5 0.260314 -0.568671 ... -0.208254 -0.559825 -0.026398 -0.371427 -0.232794 \n",
381 | "6 0.081213 0.464960 ... -0.167716 -0.270710 -0.154104 -0.780055 0.750137 \n",
382 | "7 -3.807864 0.615375 ... 1.943465 -1.015455 0.057504 -0.649709 -0.415267 \n",
383 | "8 0.851084 -0.392048 ... -0.073425 -0.268092 -0.204233 1.011592 0.373205 \n",
384 | "9 0.069539 -0.736727 ... -0.246914 -0.633753 -0.120794 -0.385050 -0.069733 \n",
385 | "\n",
386 | " V26 V27 V28 Amount Class \n",
387 | "0 -0.189115 0.133558 -0.021053 149.62 0 \n",
388 | "1 0.125895 -0.008983 0.014724 2.69 0 \n",
389 | "2 -0.139097 -0.055353 -0.059752 378.66 0 \n",
390 | "3 -0.221929 0.062723 0.061458 123.50 0 \n",
391 | "4 0.502292 0.219422 0.215153 69.99 0 \n",
392 | "5 0.105915 0.253844 0.081080 3.67 0 \n",
393 | "6 -0.257237 0.034507 0.005168 4.99 0 \n",
394 | "7 -0.051634 -1.206921 -1.085339 40.80 0 \n",
395 | "8 -0.384157 0.011747 0.142404 93.20 0 \n",
396 | "9 0.094199 0.246219 0.083076 3.68 0 \n",
397 | "\n",
398 | "[10 rows x 31 columns]"
399 | ]
400 | },
401 | "execution_count": 27,
402 | "metadata": {},
403 | "output_type": "execute_result"
404 | }
405 | ],
406 | "source": [
407 | "print(data.columns)\n",
408 | "data[['Time', 'V1', 'V2', 'V27', 'V28', 'Amount', 'Class']].describe()\n",
409 | "data.head(10)"
410 | ]
411 | },
412 | {
413 | "cell_type": "markdown",
414 | "metadata": {},
415 | "source": [
416 | "The Class column indicates whether or not a transaction is fraudulent. The majority of the data is non-fraudulent, with only $492$ ($0.173\\%$) of the examples corresponding to fraud."
417 | ]
418 | },
419 | {
420 | "cell_type": "code",
421 | "execution_count": 28,
422 | "metadata": {},
423 | "outputs": [
424 | {
425 | "name": "stdout",
426 | "output_type": "stream",
427 | "text": [
428 | "Number of frauds: 492\n",
429 | "Number of non-frauds: 284315\n",
430 | "Percentage of fraudulent data: 0.1727485630620034\n"
431 | ]
432 | }
433 | ],
434 | "source": [
435 | "nonfrauds, frauds = data.groupby('Class').size()\n",
436 | "print('Number of frauds: ', frauds)\n",
437 | "print('Number of non-frauds: ', nonfrauds)\n",
438 | "print('Percentage of fraudulent data:', 100.*frauds/(frauds + nonfrauds))"
439 | ]
440 | },
441 | {
442 | "cell_type": "markdown",
443 | "metadata": {},
444 | "source": [
445 | "This dataset has $28$ columns of anonymized features, $V_i$ for $i=1,\\ldots,28$, along with columns for time, amount, and class. We already know that the columns $V_i$ have been normalized to have mean $0$ and unit standard deviation as the result of a PCA. \n",
446 | "\n",
447 | "Tip: For our dataset this amount of preprocessing gives reasonable accuracy, but note that there are additional preprocessing steps one could use to improve accuracy. For imbalanced datasets like ours, where the positive (fraudulent) examples occur much less frequently than the negative (legitimate) examples, we may try over-sampling the minority class by generating synthetic data (read about SMOTE in Data Mining for Imbalanced Datasets: An Overview, https://link.springer.com/chapter/10.1007%2F0-387-25465-X_40) or under-sampling the majority class by using ensemble methods (see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.68.6858&rep=rep1&type=pdf)."
448 | ]
449 | },
450 | {
451 | "cell_type": "code",
452 | "execution_count": 29,
453 | "metadata": {},
454 | "outputs": [],
455 | "source": [
456 | "feature_columns = data.columns[:-1]\n",
457 | "label_column = data.columns[-1]\n",
458 | "\n",
459 | "features = data[feature_columns].values.astype('float32')\n",
460 | "labels = (data[label_column].values).astype('float32')"
461 | ]
462 | },
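The feature/label split above assumes the label is the last column. The same pattern can be seen on a tiny made-up frame (the values below are illustrative stand-ins, not rows from the credit card dataset):

```python
import pandas as pd

# Toy frame shaped like the credit card data: feature columns first,
# the 'Class' label last.
df = pd.DataFrame({'V1': [0.1, -0.2], 'Amount': [10.0, 2.5], 'Class': [0, 1]})

# Same slicing as the notebook cell: all columns but the last are features,
# the last column is the label.
features = df[df.columns[:-1]].values.astype('float32')
labels = df[df.columns[-1]].values.astype('float32')
print(features.shape, labels.tolist())  # (2, 2) [0.0, 1.0]
```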
463 | {
464 | "cell_type": "markdown",
465 | "metadata": {},
466 | "source": [
467 | "Let's do some analysis and discuss how this data was preprocessed, along with other ways we could preprocess it."
468 | ]
469 | },
470 | {
471 | "cell_type": "markdown",
472 | "metadata": {},
473 | "source": [
474 | "## SageMaker XGB"
475 | ]
476 | },
477 | {
478 | "cell_type": "markdown",
479 | "metadata": {},
480 | "source": [
481 | "### Prepare Data and Upload to S3"
482 | ]
483 | },
484 | {
485 | "cell_type": "markdown",
486 | "metadata": {},
487 | "source": [
488 | "The Amazon common libraries provide utilities to convert NumPy n-dimensional arrays into the RecordIO format, which SageMaker uses for a concise representation of features and labels. The RecordIO format is implemented via protocol buffers, so serialization is very efficient."
489 | ]
490 | },
491 | {
492 | "cell_type": "code",
493 | "execution_count": 30,
494 | "metadata": {},
495 | "outputs": [
496 | {
497 | "data": {
666 | "text/plain": [
667 | " Class Time V1 V2 V3 V4 V5 V6 \\\n",
668 | "0 0 0.0 -1.359807 -0.072781 2.536347 1.378155 -0.338321 0.462388 \n",
669 | "1 0 0.0 1.191857 0.266151 0.166480 0.448154 0.060018 -0.082361 \n",
670 | "2 0 1.0 -1.358354 -1.340163 1.773209 0.379780 -0.503198 1.800499 \n",
671 | "3 0 1.0 -0.966272 -0.185226 1.792993 -0.863291 -0.010309 1.247203 \n",
672 | "4 0 2.0 -1.158233 0.877737 1.548718 0.403034 -0.407193 0.095921 \n",
673 | "\n",
674 | " V7 V8 ... V20 V21 V22 V23 V24 \\\n",
675 | "0 0.239599 0.098698 ... 0.251412 -0.018307 0.277838 -0.110474 0.066928 \n",
676 | "1 -0.078803 0.085102 ... -0.069083 -0.225775 -0.638672 0.101288 -0.339846 \n",
677 | "2 0.791461 0.247676 ... 0.524980 0.247998 0.771679 0.909412 -0.689281 \n",
678 | "3 0.237609 0.377436 ... -0.208038 -0.108300 0.005274 -0.190321 -1.175575 \n",
679 | "4 0.592941 -0.270533 ... 0.408542 -0.009431 0.798278 -0.137458 0.141267 \n",
680 | "\n",
681 | " V25 V26 V27 V28 Amount \n",
682 | "0 0.128539 -0.189115 0.133558 -0.021053 149.62 \n",
683 | "1 0.167170 0.125895 -0.008983 0.014724 2.69 \n",
684 | "2 -0.327642 -0.139097 -0.055353 -0.059752 378.66 \n",
685 | "3 0.647376 -0.221929 0.062723 0.061458 123.50 \n",
686 | "4 -0.206010 0.502292 0.219422 0.215153 69.99 \n",
687 | "\n",
688 | "[5 rows x 31 columns]"
689 | ]
690 | },
691 | "execution_count": 30,
692 | "metadata": {},
693 | "output_type": "execute_result"
694 | }
695 | ],
696 | "source": [
697 | "model_data = data\n",
698 | "model_data.head()\n",
699 | "model_data = pd.concat([model_data['Class'], model_data.drop(['Class'], axis=1)], axis=1)\n",
700 | "model_data.head()\n"
701 | ]
702 | },
703 | {
704 | "cell_type": "markdown",
705 | "metadata": {},
706 | "source": [
707 | "### Now we upload the data to S3 using boto3."
708 | ]
709 | },
710 | {
711 | "cell_type": "code",
712 | "execution_count": 32,
713 | "metadata": {},
714 | "outputs": [
715 | {
716 | "name": "stdout",
717 | "output_type": "stream",
718 | "text": [
719 | "Uploaded training data location: s3://sagemaker-us-east-1-282128611277/sagemaker/DEMO-xgboost-fraud/train/train.csv\n",
720 | "Uploaded training data location: s3://sagemaker-us-east-1-282128611277/sagemaker/DEMO-xgboost-fraud/validation/validation.csv\n",
721 | "Training artifacts will be uploaded to: s3://sagemaker-us-east-1-282128611277/sagemaker/DEMO-xgboost-fraud/output\n"
722 | ]
723 | }
724 | ],
725 | "source": [
726 | "import boto3\n",
727 | "import os\n",
728 | "import sagemaker\n",
729 | "\n",
730 | "session = sagemaker.Session()\n",
731 | "\n",
732 | "bucket = session.default_bucket()\n",
733 | "sagemaker_iam_role = sagemaker.get_execution_role()\n",
734 | "\n",
735 | "prefix = 'sagemaker/DEMO-xgboost-fraud'\n",
736 | "\n",
737 | "train_data, validation_data, test_data = np.split(model_data.sample(frac=1, random_state=1729), \n",
738 | " [int(0.7 * len(model_data)), int(0.9 * len(model_data))])\n",
739 | "train_data.to_csv('train.csv', header=False, index=False)\n",
740 | "validation_data.to_csv('validation.csv', header=False, index=False)\n",
741 | "\n",
742 | "\n",
743 | "boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/train.csv')) \\\n",
744 | " .upload_file('train.csv')\n",
745 | "boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'validation/validation.csv')) \\\n",
746 | " .upload_file('validation.csv')\n",
747 | "s3_train_data = 's3://{}/{}/train/train.csv'.format(bucket, prefix)\n",
748 | "s3_validation_data = 's3://{}/{}/validation/validation.csv'.format(bucket, prefix)\n",
749 | "print('Uploaded training data location: {}'.format(s3_train_data))\n",
750 | "print('Uploaded validation data location: {}'.format(s3_validation_data))\n",
751 | "\n",
752 | "output_location = 's3://{}/{}/output'.format(bucket, prefix)\n",
753 | "print('Training artifacts will be uploaded to: {}'.format(output_location))"
754 | ]
755 | },
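The `np.split` call above carves the shuffled DataFrame into 70% train / 20% validation / 10% test at two cut indices. A minimal sketch of the same pattern on a toy array (the cut points mirror `int(0.7 * len)` and `int(0.9 * len)` in the cell above):

```python
import numpy as np

data = np.arange(100)  # stand-in for the shuffled rows of model_data

# Splitting at indices 70 and 90 yields three pieces: 70 / 20 / 10 rows
train, validation, test = np.split(data, [int(0.7 * len(data)), int(0.9 * len(data))])
print(len(train), len(validation), len(test))  # 70 20 10
```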
756 | {
757 | "cell_type": "markdown",
758 | "metadata": {},
759 | "source": [
760 | "---\n",
761 | "## Train\n",
762 | "\n",
763 | "Moving on to training, first we'll need to specify the location of the XGBoost algorithm container.\n",
764 | "To specify the XGBoost algorithm, we use a utility function to obtain its container URI. A complete list of built-in algorithms can be found here: https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html"
765 | ]
766 | },
767 | {
768 | "cell_type": "code",
769 | "execution_count": 33,
770 | "metadata": {},
771 | "outputs": [
772 | {
773 | "name": "stderr",
774 | "output_type": "stream",
775 | "text": [
776 | "WARNING:root:There is a more up to date SageMaker XGBoost image.To use the newer image, please set 'repo_version'='0.90-1. For example:\n",
777 | "\tget_image_uri(region, 'xgboost', '0.90-1').\n"
778 | ]
779 | }
780 | ],
781 | "source": [
782 | "from sagemaker.amazon.amazon_estimator import get_image_uri\n",
783 | "container = get_image_uri(boto3.Session().region_name, 'xgboost')"
784 | ]
785 | },
786 | {
787 | "cell_type": "markdown",
788 | "metadata": {},
789 | "source": [
790 | "Then, because we're training with the CSV file format, we'll create `s3_input` objects that point our training function at the files in S3."
791 | ]
792 | },
793 | {
794 | "cell_type": "code",
795 | "execution_count": 34,
796 | "metadata": {},
797 | "outputs": [],
798 | "source": [
799 | "s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train'.format(bucket, prefix), content_type='csv')\n",
800 | "s3_input_validation = sagemaker.s3_input(s3_data='s3://{}/{}/validation/'.format(bucket, prefix), content_type='csv')"
801 | ]
802 | },
803 | {
804 | "cell_type": "markdown",
805 | "metadata": {},
806 | "source": [
807 | "Now, we can specify a few parameters like what type of training instances we'd like to use and how many, as well as our XGBoost hyperparameters. A few key hyperparameters are:\n",
808 | "- `max_depth` controls how deep each tree within the algorithm can be built. Deeper trees can lead to better fit, but are more computationally expensive and can lead to overfitting. There is typically some trade-off in model performance that needs to be explored between a large number of shallow trees and a smaller number of deeper trees.\n",
809 | "- `subsample` controls sampling of the training data. This technique can help reduce overfitting, but setting it too low can also starve the model of data.\n",
810 | "- `num_round` controls the number of boosting rounds. Each additional round trains a new model on the residuals of the previous rounds, so more rounds should produce a better fit on the training data, but can be computationally expensive or lead to overfitting.\n",
811 | "- `eta` controls how aggressive each round of boosting is: it shrinks each round's contribution, so smaller values lead to more conservative boosting.\n",
812 | "- `gamma` controls how aggressively trees are grown. Larger values lead to more conservative models.\n",
813 | "\n",
814 | "More detail on XGBoost's hyperparameters can be found on their GitHub [page](https://github.com/dmlc/xgboost/blob/master/doc/parameter.md)."
815 | ]
816 | },
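To make `num_round` and `eta` concrete, here is a toy gradient-boosting loop in plain NumPy: each round fits a weak learner (here just the residual mean, a deliberately crude stand-in for a depth-limited tree) to the current residuals, and `eta` shrinks each round's contribution. All names here are illustrative, not part of XGBoost's API:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0])   # targets
eta, num_round = 0.2, 100            # same roles as the XGBoost hyperparameters
pred = np.zeros_like(y)

for _ in range(num_round):
    residual = y - pred              # each round fits the previous rounds' residuals
    stump = residual.mean()          # crude stand-in for fitting a tree
    pred += eta * stump              # eta shrinks each round's step

# With this trivial constant learner, predictions converge toward the mean of y;
# a real tree learner would instead fit per-region structure in the residuals.
print(pred)
```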
817 | {
818 | "cell_type": "markdown",
819 | "metadata": {},
820 | "source": [
821 | "SageMaker abstracts training with Estimators. We pass the container, the IAM role, and the instance configuration to the estimator, set the XGBoost hyperparameters, and fit the estimator to the data in S3.\n",
822 | "Note: for IP protection reasons, SageMaker built-in algorithms such as XGBoost can't be run locally, i.e. on the same instance where this Jupyter notebook is running. "
823 | ]
824 | },
825 | {
826 | "cell_type": "code",
827 | "execution_count": 35,
828 | "metadata": {},
829 | "outputs": [],
830 | "source": [
831 | "xgb = sagemaker.estimator.Estimator(container,\n",
832 | " role=sagemaker_iam_role, \n",
833 | " train_instance_count=1, \n",
834 | " train_instance_type='ml.m4.xlarge',\n",
835 | " output_path=output_location,\n",
836 | " sagemaker_session=session)\n",
837 | "xgb.set_hyperparameters(max_depth=5,\n",
838 | " eta=0.2,\n",
839 | " gamma=4,\n",
840 | " min_child_weight=6,\n",
841 | " subsample=0.8,\n",
842 | " silent=0,\n",
843 | " objective='binary:logistic',\n",
844 | " num_round=100)"
845 | ]
846 | },
847 | {
848 | "cell_type": "code",
849 | "execution_count": null,
850 | "metadata": {},
851 | "outputs": [],
852 | "source": [
853 | "xgb.fit({'train': s3_input_train, 'validation': s3_input_validation}) "
854 | ]
855 | },
856 | {
857 | "cell_type": "markdown",
858 | "metadata": {},
859 | "source": [
860 | "### Host XGBoost Model"
861 | ]
862 | },
863 | {
864 | "cell_type": "markdown",
865 | "metadata": {},
866 | "source": [
867 | "Now we deploy the estimator to an endpoint."
868 | ]
869 | },
870 | {
871 | "cell_type": "code",
872 | "execution_count": 38,
873 | "metadata": {},
874 | "outputs": [
875 | {
876 | "name": "stderr",
877 | "output_type": "stream",
878 | "text": [
879 | "WARNING:sagemaker:Using already existing model: xgboost-2019-12-03-21-58-34-726\n"
880 | ]
881 | },
882 | {
883 | "name": "stdout",
884 | "output_type": "stream",
885 | "text": [
886 | "---------------------------------------------------------------------------------------------------------------!"
887 | ]
888 | }
889 | ],
890 | "source": [
891 | "xgb.name = 'deployed-xgboost-fraud-prediction'\n",
892 | "xgb_predictor = xgb.deploy(initial_instance_count = 1, instance_type = 'ml.m4.xlarge',\n",
893 | " endpoint_name='deployed-xgboost-fraud-prediction')"
894 | ]
895 | },
896 | {
897 | "cell_type": "markdown",
898 | "metadata": {},
899 | "source": [
900 | "### Evaluate\n",
901 | "\n",
902 | "Now that we have a hosted endpoint running, we can make real-time predictions from our model very easily, \n",
903 | "simply by making an HTTP POST request. But first, we'll need to set up serializers and deserializers for passing our `test_data` NumPy arrays to the model behind the endpoint."
904 | ]
905 | },
906 | {
907 | "cell_type": "code",
908 | "execution_count": 39,
909 | "metadata": {},
910 | "outputs": [],
911 | "source": [
912 | "from sagemaker.predictor import csv_serializer \n",
913 | "\n",
914 | "xgb_predictor.content_type = 'text/csv'\n",
915 | "xgb_predictor.serializer = csv_serializer\n",
916 | "xgb_predictor.deserializer = None"
917 | ]
918 | },
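Under the hood, `csv_serializer` simply turns each NumPy row into a comma-separated line before the POST request. A rough sketch of that conversion (illustrative only, not the SDK's actual implementation):

```python
import numpy as np

def to_csv_payload(batch):
    # One comma-separated line per row, newline-delimited --
    # the text/csv request body the endpoint expects.
    return '\n'.join(','.join(str(v) for v in row) for row in batch)

batch = np.array([[1.5, 0.0, 2.25], [3.0, 4.5, 6.0]])
print(to_csv_payload(batch))  # 1.5,0.0,2.25 <newline> 3.0,4.5,6.0
```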
919 | {
920 | "cell_type": "markdown",
921 | "metadata": {},
922 | "source": [
923 | "Now, we'll use a simple function to:\n",
924 | "1. Loop over our test dataset\n",
925 | "1. Split it into mini-batches of rows \n",
926 | "1. Convert those mini-batches to CSV string payloads\n",
927 | "1. Retrieve mini-batch predictions by invoking the XGBoost endpoint\n",
928 | "1. Collect predictions and convert from the CSV output our model provides into a NumPy array"
929 | ]
930 | },
931 | {
932 | "cell_type": "code",
933 | "execution_count": 40,
934 | "metadata": {},
935 | "outputs": [],
944 | "source": [
945 | "def predict(data, rows=500):\n",
946 | "    # Split into mini-batches of at most `rows` rows each\n",
947 | "    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))\n",
948 | "    predictions = ''\n",
949 | "    for array in split_array:\n",
950 | "        predictions = ','.join([predictions, xgb_predictor.predict(array).decode('utf-8')])\n",
951 | "\n",
952 | "    return np.fromstring(predictions[1:], sep=',')\n",
953 | "\n",
954 | "predictions = predict(test_data.values[:, 1:])"
954 | ]
955 | },
956 | {
957 | "cell_type": "markdown",
958 | "metadata": {},
959 | "source": [
960 | "There are many ways to compare the performance of a machine learning model, but let's start simply by comparing actual to predicted values. In this case, we're predicting whether a transaction is fraudulent (1) or not (0), which produces a simple confusion matrix."
961 | ]
962 | },
963 | {
964 | "cell_type": "code",
965 | "execution_count": 41,
966 | "metadata": {},
967 | "outputs": [
968 | {
969 | "name": "stdout",
970 | "output_type": "stream",
971 | "text": [
972 | "Number of frauds: 54\n",
973 | "Number of non-frauds: 28427\n",
974 | "Percentage of fraudulent data: 0.1896000842667041\n"
975 | ]
976 | }
977 | ],
978 | "source": [
979 | "test_nonfrauds, test_frauds = test_data.groupby('Class').size()\n",
980 | "print('Number of frauds: ', test_frauds)\n",
981 | "print('Number of non-frauds: ', test_nonfrauds)\n",
982 | "print('Percentage of fraudulent data:', 100.*test_frauds/(test_frauds + test_nonfrauds))"
983 | ]
984 | },
985 | {
986 | "cell_type": "code",
987 | "execution_count": 42,
988 | "metadata": {},
989 | "outputs": [
990 | {
991 | "data": {
1035 | "text/plain": [
1036 | "predictions 0.0 1.0\n",
1037 | "actual \n",
1038 | "0 28426 1\n",
1039 | "1 14 40"
1040 | ]
1041 | },
1042 | "execution_count": 42,
1043 | "metadata": {},
1044 | "output_type": "execute_result"
1045 | }
1046 | ],
1047 | "source": [
1048 | "pd.crosstab(index=test_data.iloc[:, 0], columns=np.round(predictions), rownames=['actual'], colnames=['predictions'])"
1049 | ]
1050 | },
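The same `pd.crosstab` call on a toy pair of label vectors shows how the matrix is laid out: rows are actual classes, columns are predicted classes. The values below are made up purely for illustration:

```python
import numpy as np
import pandas as pd

actual = np.array([0, 0, 0, 1, 1, 1])
predicted = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

# Rows: actual class; columns: predicted class
cm = pd.crosstab(index=actual, columns=predicted,
                 rownames=['actual'], colnames=['predictions'])
print(cm)  # counts per (actual, predicted) pair: 2/1 on row 0, 1/2 on row 1
```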
1051 | {
1052 | "cell_type": "code",
1053 | "execution_count": 43,
1054 | "metadata": {},
1055 | "outputs": [
1056 | {
1057 | "name": "stdout",
1058 | "output_type": "stream",
1059 | "text": [
1060 | "precision: 0.98\n",
1061 | "recall: 0.74\n"
1062 | ]
1063 | }
1064 | ],
1065 | "source": [
1066 | "#precision: tp / (tp + fp)\n",
1067 | "#recall: tp / (tp + fn)\n",
1068 | "from sklearn.metrics import precision_recall_fscore_support\n",
1069 | "results = precision_recall_fscore_support(test_data.iloc[:, 0],\n",
1070 | " np.round(predictions))\n",
1071 | "print('precision: ', round(results[0][1], 2))\n",
1072 | "print('recall: ', round(results[1][1], 2))"
1073 | ]
1074 | },
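These numbers can be checked by hand against the confusion matrix above: with 40 true positives, 1 false positive, and 14 false negatives,

```python
tp, fp, fn = 40, 1, 14           # counts from the confusion matrix above

precision = tp / (tp + fp)       # 40 / 41
recall = tp / (tp + fn)          # 40 / 54

print(round(precision, 2), round(recall, 2))  # 0.98 0.74
```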
1075 | {
1076 | "cell_type": "markdown",
1077 | "metadata": {},
1078 | "source": [
1079 | "Note, due to randomized elements of the algorithm, your results may differ slightly.\n",
1080 | "\n",
1081 | "Of the 54 fraudulent transactions, we've correctly predicted 40 of them (true positives), and we've incorrectly flagged 1 legitimate transaction as fraud (false positive). There are also 14 cases of fraud that the model classified as benign transactions (false negatives) - which can get really expensive.\n",
1082 | "\n",
1083 | "An important point here is that, because of the np.round() function above, we are using a simple threshold (or cutoff) of 0.5. Our predictions from XGBoost come out as continuous values between 0 and 1, and we force them into the binary classes that we began with. So we should consider lowering this cutoff. That will almost certainly increase the number of false positives, but it can also be expected to increase the number of true positives and reduce the number of false negatives.\n",
1084 | "\n",
1085 | "To get a rough intuition here, let's look at the continuous values of our predictions."
1086 | ]
1087 | },
1088 | {
1089 | "cell_type": "code",
1090 | "execution_count": 44,
1091 | "metadata": {},
1092 | "outputs": [
1093 | {
1094 | "data": {
1095 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAD8CAYAAACcjGjIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAEOpJREFUeJzt3H+s3XV9x/HnSyrOTR3VVkKgW5nWZJVliA12cdlQFigssZgZAolSCbFGYdHNLKL7AwOSSBY1IUFcDQ1lUYH5YzSxrmsYC3FZkTth/BzjDlHaVegogguZDnzvj/OpHvq5l3u49/ae3vb5SE7u97y/n+/3+/60hdf9/jgnVYUkScNeNu4GJEmHHsNBktQxHCRJHcNBktQxHCRJHcNBktQxHCRJHcNBktQxHCRJnSXjbmC2li1bVitXrhx3G5K0aCxbtozt27dvr6p1M41dtOGwcuVKJiYmxt2GJC0qSZaNMs7LSpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkzqL9hPRcrLz0W2M57qOf+eOxHFeSXirPHCRJHcNBktQxHCRJHcNBktQxHCRJHcNBktQxHCRJHcNBktQxHCRJHcNBktQxHCRJHcNBktQxHCRJHcNBktQxHCRJHcNBktQxHCRJHcNBktQxHCRJHcNBktSZMRySrEhyW5IHktyf5COt/qkku5Pc3V5nD23ziSSTSR5KcuZQfV2rTSa5dKh+YpI7Wv2mJEfP90QlSaMb5czhOeBjVbUaWAtcnGR1W/f5qjq5vbYBtHXnAW8G1gFfSHJUkqOAa4CzgNXA+UP7uart643AU8BF8zQ/SdIszBgOVbWnqr7Xln8CPAgc/yKbrAdurKqfVtX3gUng1PaarKpHqupnwI3A+iQB3gl8rW2/BThnthOSJM3dS7rnkGQl8Bbgjla6JMk9STYnWdpqxwOPDW22q9Wmq78O+HFVPXdAXZI0JiOHQ5JXAV8HPlpVzwDXAm8ATgb2AJ89KB2+sIeNSSaSTOzdu/dgH06SjlgjhUOSlzMIhi9X1TcAqurxqnq+qn4OfInBZSOA3cCKoc1PaLXp6k8CxyRZckC9U1WbqmpNVa1Zvnz5KK1LkmZhlKeVAlwHPFhVnxuqHzc07N3AfW15K3BeklckORFYBXwXuBNY1Z5MOprBTeutVVXAbcB72vYbgFvmNi1J0lwsmXkIbwfeB9yb5O5W+ySDp41OBgp4FPggQFXdn+Rm4AEGTzpdXFXPAyS5BNgOHAVsrqr72/4+DtyY5NPAXQzCSJI0JjOGQ1V9B8gUq7a9yDZXAldOUd821XZV9Qi/vCwlSRozPyEtSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSerMGA5JViS5LckDSe5P8pFWf22SHUkebj+XtnqSXJ1kMsk9SU4Z2teGNv7hJBuG6m9Ncm/b5uokORiTlSSNZpQzh+eAj1XVamAtcHGS1cClwK1VtQq4tb0HOAtY1V4bgWthECbAZcDbgFOBy/YHShvzgaHt1s19apKk2ZoxHKpqT1V9ry3/BHgQOB5YD2xpw7YA57Tl9cANNbATOCbJccCZwI6q2ldVTwE7gHVt3WuqamdVFXDD0L4kSWPwku45JFkJvAW4Azi2qva0VT8Cjm3LxwOPDW22q9VerL5rirokaUxGDockrwK+Dny0qp4ZXtd+46957m2qHjYmmUgysXfv3oN9OEk6Yo0UDkleziAYvlxV32jlx9slIdrPJ1p9N7BiaPMTWu3F6idMUe9U1aaqWlNVa5YvXz5K65KkWRjlaaUA1wEPVtXnhlZtBfY/cbQBuGW
ofkF7amkt8HS7/LQdOCPJ0nYj+gxge1v3TJK17VgXDO1LkjQGS0YY83bgfcC9Se5utU8CnwFuTnIR8APg3LZuG3A2MAk8C1wIUFX7klwB3NnGXV5V+9ryh4HrgVcC324vSdKYzBgOVfUdYLrPHZw+xfgCLp5mX5uBzVPUJ4CTZupFkrQw/IS0JKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOjOGQ5LNSZ5Ict9Q7VNJdie5u73OHlr3iSSTSR5KcuZQfV2rTSa5dKh+YpI7Wv2mJEfP5wQlSS/dKGcO1wPrpqh/vqpObq9tAElWA+cBb27bfCHJUUmOAq4BzgJWA+e3sQBXtX29EXgKuGguE5Ikzd2M4VBVtwP7RtzfeuDGqvppVX0fmAROba/Jqnqkqn4G3AisTxLgncDX2vZbgHNe4hwkSfNsLvccLklyT7vstLTVjgceGxqzq9Wmq78O+HFVPXdAXZI0RrMNh2uBNwAnA3uAz85bRy8iycYkE0km9u7duxCHlKQj0qzCoaoer6rnq+rnwJcYXDYC2A2sGBp6QqtNV38SOCbJkgPq0x13U1Wtqao1y5cvn03rkqQRzCockhw39PbdwP4nmbYC5yV5RZITgVXAd4E7gVXtyaSjGdy03lpVBdwGvKdtvwG4ZTY9SZLmz5KZBiT5KnAasCzJLuAy4LQkJwMFPAp8EKCq7k9yM/AA8BxwcVU93/ZzCbAdOArYXFX3t0N8HLgxyaeBu4Dr5m12kqRZmTEcqur8KcrT/g+8qq4Erpyivg3YNkX9EX55WUqSdAjwE9KSpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpM6M4ZBkc5Inktw3VHttkh1JHm4/l7Z6klydZDLJPUlOGdpmQxv/cJINQ/W3Jrm3bXN1ksz3JCVJL80oZw7XA+sOqF0K3FpVq4Bb23uAs4BV7bURuBYGYQJcBrwNOBW4bH+gtDEfGNruwGNJkhbYjOFQVbcD+w4orwe2tOUtwDlD9RtqYCdwTJLjgDOBHVW1r6qeAnYA69q611TVzqoq4IahfUmSxmS29xyOrao9bflHwLFt+XjgsaFxu1rtxeq7pqhLksZozjek22/8NQ+9zCjJxiQTSSb27t27EIeUpCPSbMPh8XZJiPbziVbfDawYGndCq71Y/YQp6lOqqk1Vtaaq1ixfvnyWrUuSZjLbcNgK7H/iaANwy1D9gvbU0lrg6Xb5aTtwRpKl7Ub0GcD2tu6ZJGvbU0oXDO1LkjQmS2YakOSrwGnAsiS7GDx19Bng5iQXAT8Azm3DtwFnA5PAs8CFAFW1L8kVwJ1t3OVVtf8m94cZPBH1SuDb7SVJGqMZw6Gqzp9m1elTjC3g4mn2sxnYPEV9Ajhppj4kSQvHT0hLkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpM6dwSPJoknuT3J1kotVem2RHkofbz6WtniRXJ5lMck+SU4b2s6GNfzjJhrlNSZI0V/Nx5vCOqjq5qta095cCt1bVKuDW9h7gLGBVe20EroVBmACXAW8DTgUu2x8okqTxOBiXldYDW9ryFuCcofoNNbATOCbJccCZwI6q2ldVTwE7gHUHoS9J0ojmGg4
F/EOSf02ysdWOrao9bflHwLFt+XjgsaFtd7XadPVOko1JJpJM7N27d46tS5Kms2SO2/9+Ve1O8npgR5J/H15ZVZWk5niM4f1tAjYBrFmzZt72K0l6oTmdOVTV7vbzCeCbDO4ZPN4uF9F+PtGG7wZWDG1+QqtNV5ckjcmswyHJryV59f5l4AzgPmArsP+Jow3ALW15K3BBe2ppLfB0u/y0HTgjydJ2I/qMVpMkjclcLisdC3wzyf79fKWq/j7JncDNSS4CfgCc28ZvA84GJoFngQsBqmpfkiuAO9u4y6tq3xz6kiTN0azDoaoeAX53ivqTwOlT1Au4eJp9bQY2z7YXSdL88hPSkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqTOIRMOSdYleSjJZJJLx92PJB3JDolwSHIUcA1wFrAaOD/J6vF2JUlHrkMiHIBTgcmqeqSqfgbcCKwfc0+SdMQ6VMLheOCxofe7Wk2SNAZLxt3AS5FkI7Cxvf2fJA/NclfLgP+en65Gl6sW+ogvMJY5j5lzPvwdafOFuc155O0OlXDYDawYen9Cq71AVW0CNs31YEkmqmrNXPezmDjnI8ORNucjbb6wcHM+VC4r3QmsSnJikqOB84CtY+5Jko5Yh8SZQ1U9l+QSYDtwFLC5qu4fc1uSdMQ6JMIBoKq2AdsW6HBzvjS1CDnnI8ORNucjbb6wQHNOVS3EcSRJi8ihcs9BknQIOazDYaav5EjyiiQ3tfV3JFm58F3OnxHm++dJHkhyT5Jbk/zmOPqcT6N+7UqSP0lSSRb9ky2jzDnJue3v+v4kX1noHufbCP+2fyPJbUnuav++zx5Hn/MpyeYkTyS5b5r1SXJ1+zO5J8kp89pAVR2WLwY3tv8T+C3gaODfgNUHjPkw8MW2fB5w07j7PsjzfQfwq235Q4t5vqPOuY17NXA7sBNYM+6+F+DveRVwF7C0vX/9uPtegDlvAj7UllcDj46773mY9x8ApwD3TbP+bODbQIC1wB3zefzD+cxhlK/kWA9sactfA05PkgXscT7NON+quq2qnm1vdzL4PMliNurXrlwBXAX870I2d5CMMucPANdU1VMAVfXEAvc430aZcwGvacu/DvzXAvZ3UFTV7cC+FxmyHrihBnYCxyQ5br6OfziHwyhfyfGLMVX1HPA08LoF6W7+vdSvILmIwW8di9mMc26n2iuq6lsL2dhBNMrf85uANyX55yQ7k6xbsO4OjlHm/CngvUl2MXjq8U8XprWxOqhfO3TIPMqqhZPkvcAa4A/H3cvBlORlwOeA94+5lYW2hMGlpdMYnB3enuR3qurHY+3q4DofuL6qPpvk94C/SXJSVf183I0tVofzmcMoX8nxizFJljA4HX1yQbqbfyN9BUmSPwL+EnhXVf10gXo7WGaa86uBk4B/SvIog+uyWxf5TelR/p53AVur6v+q6vvAfzAIi8VqlDlfBNwMUFX/AvwKg+8gOpyN9N/8bB3O4TDKV3JsBTa05fcA/1jtTs8iNON8k7wF+GsGwbDYr0PDDHOuqqerallVrayqlQzus7yrqibG0+68GOXf9d8xOGsgyTIGl5keWcgm59koc/4hcDpAkt9mEA57F7TLhbcVuKA9tbQWeLqq9szXzg/by0o1zVdyJLkcmKiqrcB1DE4/Jxnc+DlvfB3PzYjz/SvgVcDftvvuP6yqd42t6Tkacc6HlRHnvB04I8kDwPPAX1TVYj0jHnXOHwO+lOTPGNycfv8i/kUPgCRfZRDyy9q9lMuAlwNU1RcZ3Fs5G5gEngUunNfjL/I/P0nSQXA4X1aSJM2S4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6vw/asn5cMei2roAAAA
ASUVORK5CYII=\n",
1096 | "text/plain": [
1097 | "