├── .DS_Store ├── .gitignore ├── Dockerfile ├── README.md ├── docker-compose.yml └── notebooks ├── 1. Introduction └── 1.2 Getting Rocket Off The Ground.ipynb ├── 2. Basics ├── 2.1 Boosting - wisdom of the crowd (theory).ipynb ├── 2.1.5 Boosting - wisdom of the crowd (practice).ipynb ├── 2.3 Using standard interface.ipynb └── 2.4 Using Scikit-learn Interface.ipynb ├── 3. Going deeper ├── 3.1 Spotting Most Important Features.ipynb ├── 3.2 Bias-variance tradeoff.ipynb ├── 3.3 Hyper-parameter tuning.ipynb ├── 3.4 Evaluate results.ipynb ├── 3.5 Deal with missing values.ipynb └── 3.6 Handle Imbalanced Datasets.ipynb ├── data ├── agaricus.txt.test ├── agaricus.txt.train └── featmap.txt └── images ├── ada-t1.dot ├── ada-t1.png ├── bias-variance.png ├── boosting.png ├── dt.dot ├── dt.png ├── gbc-t1.dot ├── gbc-t1.png ├── practical_xgboost_in_python_notebook_header.png └── underfitting_overfitting.png /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ParrotPrediction/docker-course-xgboost/49f8de97cbc1695dcbeb09391e2662dbedf30ee1/.DS_Store -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | notebooks/*/.ipynb_checkpoints/ 2 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | # Copyright (c) Parrot Prediction Ltd. 2 | # Distributed under the terms of the Modified BSD License. 3 | FROM jupyter/minimal-notebook 4 | 5 | MAINTAINER Parrot Prediction 6 | 7 | USER root 8 | 9 | # libav-tools for matplotlib anim 10 | RUN apt-get update && \ 11 | apt-get install -y --no-install-recommends libav-tools git && \ 12 | apt-get clean && \ 13 | rm -rf /var/lib/apt/lists/* 14 | 15 | USER jovyan 16 | 17 | # Install Python 3 packages 18 | RUN conda install --quiet --yes \ 19 | 'pandas=0.17*' \ 20 | 'matplotlib=1.5*' \ 21 | 'seaborn=0.7*' \ 22 | 'graphviz=2.38.*' \ 23 | 'scikit-learn=0.17*' 24 | 25 | # Add shortcuts to distinguish pip for python2 and python3 envs 26 | RUN ln -s $CONDA_DIR/envs/python2/bin/pip $CONDA_DIR/bin/pip2 && \ 27 | ln -s $CONDA_DIR/bin/pip $CONDA_DIR/bin/pip3 28 | 29 | # Install XGBoost library 30 | RUN git clone --recursive https://github.com/dmlc/xgboost && \ 31 | cd xgboost && \ 32 | make -j4 && \ 33 | cd python-package; python setup.py install 34 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Practical XGBoost in Python Docker Container 2 | 3 | The following container comes with pre-installed tools required to complete the free [Practical XGBoost in Python](http://education.parrotprediction.teachable.com/courses/practical-xgboost-in-python) online course. 4 | 5 | ## What's included 6 | 7 | - Python 3.5 8 | - Jupyter Notebook 9 | - Numpy 10 | - Pandas 11 | - Scikit-learn 12 | - XGBoost 13 | - Matplotlib 14 | - Seaborn 15 | 16 | [![](https://images.microbadger.com/badges/image/parrotprediction/course-xgboost.svg)](https://microbadger.com/images/parrotprediction/course-xgboost "Get your own image badge on microbadger.com") 17 | 18 | ### Requirements 19 | To successfully spin up the container, please make sure that `Docker Engine` and `Docker Compose` are properly installed.
If you have trouble, please refer to the [Docker homepage](https://www.docker.com/). 20 | 21 | Verify that everything looks good by typing: 22 | 23 | ```bash 24 | $ docker -v 25 | Docker version 1.11.1, build 5604cbe 26 | 27 | $ docker-compose -v 28 | docker-compose version 1.7.1, build 0a9ab35 29 | ``` 30 | 31 | ### Running container 32 | Use the provided Docker Compose [file](docker-compose.yml) to provision a new container. It will expose the Notebook on port 8888 and mount the `notebooks/` directory. 33 | 34 | Inside the directory: 35 | ```bash 36 | $ docker-compose up -d 37 | ``` 38 | 39 | Then proceed to `localhost:8888` in your browser. 40 | -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | version: '2' 2 | 3 | services: 4 | jupyter: 5 | container_name: course-xgboost 6 | image: parrotprediction/course-xgboost 7 | working_dir: /notebooks 8 | ports: 9 | - "8888:8888" 10 | volumes: 11 | - ./notebooks/:/notebooks 12 | -------------------------------------------------------------------------------- /notebooks/1. Introduction/1.2 Getting Rocket Off The Ground.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\n", 8 | "\n", 9 | "# Getting Rocket Off The Ground\n", 10 | "If you are reading this, it means that you have successfully run a Jupyter notebook. Well done.\n", 11 | "\n", 12 | "Check that everything works fine by verifying the library versions available on this container image.\n", 13 | "\n", 14 | "> **Tip**: To execute the cell press `CTRL+ENTER`." 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "### Python version" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": { 28 | "collapsed": false 29 | }, 30 | "outputs": [], 31 | "source": [ 32 | "import sys\n", 33 | "print(sys.version)" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "### Numpy version" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": null, 46 | "metadata": { 47 | "collapsed": false 48 | }, 49 | "outputs": [], 50 | "source": [ 51 | "import numpy\n", 52 | "print(numpy.version.version)" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "### Scikit-learn version" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "metadata": { 66 | "collapsed": false 67 | }, 68 | "outputs": [], 69 | "source": [ 70 | "import sklearn\n", 71 | "print(sklearn.__version__)" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "\n", 79 | "### XGBoost version" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": null, 85 | "metadata": { 86 | "collapsed": false 87 | }, 88 | "outputs": [], 89 | "source": [ 90 | "import xgboost\n", 91 | "print(xgboost.__version__)" 92 | ] 93 | } 94 | ], 95 | "metadata": { 96 | "kernelspec": { 97 | "display_name": "Python 3", 98 | "language": "python", 99 | "name": "python3" 100 | }, 101 | "language_info": { 102 | "codemirror_mode": { 103 | "name": "ipython", 104 | "version": 3 105 | }, 106 | "file_extension": ".py", 107 | "mimetype": "text/x-python", 108 | "name": "python", 109 | "nbconvert_exporter": "python", 110 | "pygments_lexer": "ipython3", 111 | "version": "3.5.2" 112 | } 113 | },
114 | "nbformat": 4, 115 | "nbformat_minor": 0 116 | } 117 | -------------------------------------------------------------------------------- /notebooks/2. Basics/2.1 Boosting - wisdom of the crowd (theory).ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Boosting - Wisdom of the Crowd (theory)\n", 15 | "**What you will learn**:\n", 16 | "- What is the idea of boosting\n", 17 | "- Why use tree as a weak classifier\n", 18 | "- What are some common boosting implementations\n", 19 | "- How XGBoost helps" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "### Idea of boosting \n", 27 | "Let's start with intuitive definition of the concept:\n", 28 | "> **Boosting** (*Freud and Shapire, 1996*) - algorithm allowing to fit **many** weak classifiers to **reweighted** versions of the training data. Classify final examples by majority voting.\n", 29 | "\n", 30 | "When using boosting techinque all instance in dataset are assigned a score that tells *how difficult to classify* they are. In each following iteration the algorithm pays more attention (assign bigger weights) to instances that were wrongly classified previously.\n", 31 | "\n", 32 | "boosting\n", 33 | "\n", 34 | "In the first iteration all instance weights are equal.\n", 35 | "\n", 36 | "Ensemble parameters are optimized in **stagewise way** which means that we are calculating optimal parameters for the next classifier holding fixed what was already calculated. This might sound like a limitation but turns out it's a very resonable way of regularizing the model." 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "### Weak classifier - why tree? 
\n", 44 | "First what is a weak classifier?\n", 45 | "> **Weak classifier** - an algorithm **slightly better** than random guessing.\n", 46 | "\n", 47 | "Every algorithm can be used as a base for boosting techinique, but trees have some nice properties that makes them more suitable candidates.\n", 48 | "\n", 49 | "#### Pro's\n", 50 | "- computational scalability,\n", 51 | "- handling missing values,\n", 52 | "- robust to outliers,\n", 53 | "- does not require feature scalling,\n", 54 | "- can deal with irrelevant inputs,\n", 55 | "- interpretable (if small),\n", 56 | "- can handle mixed predictors (quantitive and qualitative)\n", 57 | "\n", 58 | "#### Con's\n", 59 | "- can't extract linear combination of features\n", 60 | "- small predictive power (high variance)\n", 61 | "\n", 62 | "Boosting techinque can try to reduce the variance by **averaging** many **different** trees (where each one is solving the same problem)" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": { 68 | "collapsed": true 69 | }, 70 | "source": [ 71 | "### Common Algorithms (warning MATH INCLUDED) \n", 72 | "\n", 73 | "In every machine learning model the training objective is a sum of a loss function $L$ and regularization $\\Omega$:\n", 74 | "\n", 75 | "$$\n", 76 | "obj = L + \\Omega\n", 77 | "$$\n", 78 | "\n", 79 | "The loss function controls the predictive power of an algorithm and regularization term controls it's simplicity.\n", 80 | "\n", 81 | "#### AdaBoost (Adaptive Boosting)\n", 82 | "The implementation of boosting technique using decision tress (it's a *meta-estimator* which means you can fit any classifier in). The intuitive recipie is presented below:\n", 83 | "\n", 84 | "**Algorithm**:\n", 85 | "\n", 86 | "Assume that the number of training samples is denoted by $N$, and the number of iterations (created trees) is $M$. Notice that possible class outputs are $Y=\\{-1,1\\}$\n", 87 | "\n", 88 | "1. Initialize the observation weights $w_i=\\frac{1}{N}$ where $i = 1,2, \\dots, N$\n", 89 | "2. For $m=1$ to $M$:\n", 90 | " - fit a classifier $G_m(x)$ to the training data using weights $w_i$,\n", 91 | " - compute $err_m = \\frac{\\sum_{i=1}^{N} w_i I (y_i \\neq G_m(x))}{\\sum_{i=1}^{N}w_i}$,\n", 92 | " - compute $\\alpha_m = \\log ((1-err_m)/err_m)$,\n", 93 | " - set $w_i \\leftarrow w_i \\cdot \\exp [\\alpha_m \\cdot I (y_i \\neq G_m(x)]$, where $i = 1,2, \\dots, N$\n", 94 | "3. Output $G_m(x) = sign [\\sum_{m=1}^{M} \\alpha_m G_m(x)]$\n", 95 | "\n", 96 | "#### Generalized Boosted Models\n", 97 | "We can take advantage of the fact that the loss function can be represented with a form suitable for optimalization (due to the stage-wise additivity). This creates a class of general boosting algorithms named simply **generalized boosted model (GBM)**.\n", 98 | "\n", 99 | "An example of a GBM is **Gradient Boosted Tree** which uses decision tree as an estimator. It can work with different loss functions (regression, classification, risk modeling etc.), evaluate it's gradient and approximates it with a simple tree (stage-wisely, that minimizes the overall error).\n", 100 | "\n", 101 | "AdaBoost is a special case of Gradient Boosted Tree that uses exponential loss function. You can learn more about GBM in this [video](https://www.youtube.com/watch?v=wPqtzj5VZus&feature=youtu.be)." 
102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": {}, 107 | "source": [ 108 | "### How XGBoost helps \n", 109 | "The problem with most tree packages is that they don't take regularization issues very seriously - they allow growing many very similar trees that can also sometimes be quite bushy. \n", 110 | "\n", 111 | "GBT tries to approach this problem by adding some regularization parameters. We can:\n", 112 | "- control tree structure (maximum depth, minimum samples per leaf),\n", 113 | "- control the learning rate (shrinkage),\n", 114 | "- reduce variance by introducing randomness (stochastic gradient boosting - using random subsamples of instances and features)\n", 115 | "\n", 116 | "But it could be improved even further. Enter XGBoost.\n", 117 | "\n", 118 | "> **XGBoost** (*extreme gradient boosting*) is a **more regularized** version of Gradient Boosted Trees.\n", 119 | "\n", 120 | "It was developed by Tianqi Chen in C++, but also provides interfaces for Python, R, and Julia. Used for supervised learning problems, it has produced wins in [many Kaggle competitions](https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions).\n", 121 | "\n", 122 | "The main advantages:\n", 123 | "- good bias-variance (simple-predictive) trade-off \"out of the box\",\n", 124 | "- great computation speed,\n", 125 | "- the package is evolving (the author is willing to accept many PRs from the community)\n", 126 | "\n", 127 | "XGBoost's objective function is a sum of a specific loss function evaluated over all predictions and a sum of regularization terms for all predictors ($K$ trees). In the formula $f_k$ means a prediction coming from the k-th tree.\n", 128 | "\n", 129 | "$$\n", 130 | "obj(\theta) = \sum_{i}^{n} l(y_i, \hat{y_i}) + \sum_{k=1}^{K} \Omega (f_k)\n", 131 | "$$\n", 132 | "\n", 133 | "The loss function depends on the task being performed (classification, regression, etc.), and the regularization term is described by the following equation:\n", 134 | "\n", 135 | "$$\n", 136 | "\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T}w_j^2\n", 137 | "$$\n", 138 | "\n", 139 | "The first part ($\gamma T$) is responsible for controlling the overall number of created leaves, and the second term ($\frac{1}{2} \lambda \sum_{j=1}^{T}w_j^2$) watches over their scores.\n", 140 | "\n", 141 | "To optimize the objective, gradient descent is used; this leads to the problem of finding an optimal structure for each successive tree. The deeper mathematics of the algorithm is beyond the scope of this course, but pretty decent information can be found on the package [docs page](http://xgboost.readthedocs.io/) and in [this](http://www.slideshare.net/ShangxuanZhang/xgboost) presentation." 142 | ] 143 | } 144 | ], 145 | "metadata": { 146 | "kernelspec": { 147 | "display_name": "Python 3", 148 | "language": "python", 149 | "name": "python3" 150 | }, 151 | "language_info": { 152 | "codemirror_mode": { 153 | "name": "ipython", 154 | "version": 3 155 | }, 156 | "file_extension": ".py", 157 | "mimetype": "text/x-python", 158 | "name": "python", 159 | "nbconvert_exporter": "python", 160 | "pygments_lexer": "ipython3", 161 | "version": "3.5.2" 162 | } 163 | }, 164 | "nbformat": 4, 165 | "nbformat_minor": 0 166 | } 167 | -------------------------------------------------------------------------------- /notebooks/2.
Basics/2.3 Using standard interface.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Using standard interface\n", 15 | "The following notebook presents the basic usage of the native XGBoost Python interface.\n", 16 | "\n", 17 | "**Flight-plan**:\n", 18 | "- load libraries and prepare data,\n", 19 | "- specify parameters,\n", 20 | "- train classifier,\n", 21 | "- make predictions" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "### Loading libraries\n", 29 | "Begin with loading all required libraries in one place:" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 1, 35 | "metadata": { 36 | "collapsed": false 37 | }, 38 | "outputs": [], 39 | "source": [ 40 | "import numpy as np\n", 41 | "import xgboost as xgb" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "### Loading data\n", 49 | "We are going to use the bundled [Agaricus](https://archive.ics.uci.edu/ml/datasets/Mushroom) dataset, which can be downloaded [here](https://github.com/dmlc/xgboost/tree/master/demo/data).\n", 50 | "\n", 51 | "> This data set records biological attributes of different mushroom species, and the target is to predict whether it is poisonous\n", 52 | "\n", 53 | "> This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom;\n", 54 | "\n", 55 | "It consists of 8124 instances, characterized by 22 attributes (both numeric and categorical). The target class is either 0 or 1, which means a binary classification problem.\n", 56 | "\n", 57 | "> **Important**: XGBoost handles only numeric variables.\n", 58 | "\n", 59 | "Luckily, all the data has already been pre-processed for us. Categorical variables have been encoded, and all instances divided into train and test datasets. You will learn how to do this on your own in later lectures.\n", 60 | "\n", 61 | "Data needs to be stored in a `DMatrix` object, which is designed to handle sparse datasets.
It can be populated in a couple of ways:\n", 62 | "- using a libsvm format txt file,\n", 63 | "- using a Numpy 2D array (most popular),\n", 64 | "- using an XGBoost binary buffer file\n", 65 | "\n", 66 | "In this case we'll use the first option.\n", 67 | "\n", 68 | "> Libsvm files store only non-zero elements in the format \n", 69 | "> \n", 70 | "> `<label> <index1>:<value1> <index2>:<value2> ...`\n", 147 | "Let's make the following assumptions and adjust the algorithm parameters accordingly:\n", 148 | "- we are dealing with a binary classification problem (`'objective':'binary:logistic'`),\n", 149 | "- we want shallow single trees with no more than 2 levels (`'max_depth':2`),\n", 150 | "- we don't want any output (`'silent':1`),\n", 151 | "- we want the algorithm to learn fast and aggressively (`'eta':1`),\n", 152 | "- we want to iterate only 5 rounds" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": 5, 158 | "metadata": { 159 | "collapsed": true 160 | }, 161 | "outputs": [], 162 | "source": [ 163 | "params = {\n", 164 | " 'objective':'binary:logistic',\n", 165 | " 'max_depth':2,\n", 166 | " 'silent':1,\n", 167 | " 'eta':1\n", 168 | "}\n", 169 | "\n", 170 | "num_rounds = 5" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "### Training classifier\n", 178 | "To train the classifier we simply pass it the training dataset, the list of parameters, and the number of iterations." 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 6, 184 | "metadata": { 185 | "collapsed": false 186 | }, 187 | "outputs": [], 188 | "source": [ 189 | "bst = xgb.train(params, dtrain, num_rounds)" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": {}, 195 | "source": [ 196 | "We can also observe performance on the test dataset using a `watchlist`" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": 7, 202 | "metadata": { 203 | "collapsed": false 204 | }, 205 | "outputs": [ 206 | { 207 | "name": "stdout", 208 | "output_type": "stream", 209 | "text": [ 210 | "[0]\ttest-error:0.042831\ttrain-error:0.046522\n", 211 | "[1]\ttest-error:0.021726\ttrain-error:0.022263\n", 212 | "[2]\ttest-error:0.006207\ttrain-error:0.007063\n", 213 | "[3]\ttest-error:0.018001\ttrain-error:0.0152\n", 214 | "[4]\ttest-error:0.006207\ttrain-error:0.007063\n" 215 | ] 216 | } 217 | ], 218 | "source": [ 219 | "watchlist = [(dtest,'test'), (dtrain,'train')] # native interface only\n", 220 | "bst = xgb.train(params, dtrain, num_rounds, watchlist)" 221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": {}, 226 | "source": [ 227 | "### Make predictions" 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": 8, 233 | "metadata": { 234 | "collapsed": false 235 | }, 236 | "outputs": [ 237 | { 238 | "data": { 239 | "text/plain": [ 240 | "array([ 0.08073306, 0.92217326, 0.08073306, ..., 0.98059034,\n", 241 | " 0.01182149, 0.98059034], dtype=float32)" 242 | ] 243 | }, 244 | "execution_count": 8, 245 | "metadata": {}, 246 | "output_type": "execute_result" 247 | } 248 | ], 249 | "source": [ 250 | "preds_prob = bst.predict(dtest)\n", 251 | "preds_prob" 252 | ] 253 | }, 254 | { 255 | "cell_type": "markdown", 256 | "metadata": {}, 257 | "source": [ 258 | "Calculate a simple accuracy metric to verify the results. Of course, validation should be performed according to the dataset, but in this case accuracy is sufficient.
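For reference, populating the `DMatrix` objects used above takes a single call per file. A minimal sketch of this notebook's whole flow, assuming the bundled data paths (the same loading calls appear in notebook 3.1):

```python
import xgboost as xgb

# Option 1: populate DMatrix straight from the libsvm text files
dtrain = xgb.DMatrix('../data/agaricus.txt.train')
dtest = xgb.DMatrix('../data/agaricus.txt.test')

params = {'objective': 'binary:logistic', 'max_depth': 2, 'silent': 1, 'eta': 1}
bst = xgb.train(params, dtrain, num_boost_round=5)

preds_prob = bst.predict(dtest)  # probabilities, thresholded at 0.5 below
```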
259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": 9, 264 | "metadata": { 265 | "collapsed": false 266 | }, 267 | "outputs": [ 268 | { 269 | "name": "stdout", 270 | "output_type": "stream", 271 | "text": [ 272 | "Predicted correctly: 1601/1611\n", 273 | "Error: 0.0062\n" 274 | ] 275 | } 276 | ], 277 | "source": [ 278 | "labels = dtest.get_label()\n", 279 | "preds = preds_prob > 0.5 # threshold\n", 280 | "correct = 0\n", 281 | "\n", 282 | "for i in range(len(preds)):\n", 283 | " if (labels[i] == preds[i]):\n", 284 | " correct += 1\n", 285 | "\n", 286 | "print('Predicted correctly: {0}/{1}'.format(correct, len(preds)))\n", 287 | "print('Error: {0:.4f}'.format(1-correct/len(preds)))" 288 | ] 289 | } 290 | ], 291 | "metadata": { 292 | "kernelspec": { 293 | "display_name": "Python 3", 294 | "language": "python", 295 | "name": "python3" 296 | }, 297 | "language_info": { 298 | "codemirror_mode": { 299 | "name": "ipython", 300 | "version": 3 301 | }, 302 | "file_extension": ".py", 303 | "mimetype": "text/x-python", 304 | "name": "python", 305 | "nbconvert_exporter": "python", 306 | "pygments_lexer": "ipython3", 307 | "version": "3.5.2" 308 | } 309 | }, 310 | "nbformat": 4, 311 | "nbformat_minor": 0 312 | } 313 | -------------------------------------------------------------------------------- /notebooks/2. Basics/2.4 Using Scikit-learn Interface.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Using Scikit-learn Interface\n", 15 | "The following notebook presents an alternative approach for using the XGBoost algorithm.\n", 16 | "\n", 17 | "**What's included**:\n", 18 | "- load libraries and prepare data,\n", 19 | "- specify parameters,\n", 20 | "- train classifier,\n", 21 | "- make predictions" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "### Loading libraries\n", 29 | "Begin with loading all required libraries." 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 1, 35 | "metadata": { 36 | "collapsed": true 37 | }, 38 | "outputs": [], 39 | "source": [ 40 | "import numpy as np\n", 41 | "\n", 42 | "from sklearn.datasets import load_svmlight_files\n", 43 | "from sklearn.metrics import accuracy_score\n", 44 | "\n", 45 | "from xgboost.sklearn import XGBClassifier" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "### Loading data\n", 53 | "We are going to use the same dataset as in the previous lecture. The scikit-learn package provides a convenient function `load_svmlight_files` capable of reading many libsvm files at once and storing them as Scipy's sparse matrices.
" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 2, 59 | "metadata": { 60 | "collapsed": false 61 | }, 62 | "outputs": [], 63 | "source": [ 64 | "X_train, y_train, X_test, y_test = load_svmlight_files(('../data/agaricus.txt.train', '../data/agaricus.txt.test'))" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "Examine what was loaded" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 3, 77 | "metadata": { 78 | "collapsed": false 79 | }, 80 | "outputs": [ 81 | { 82 | "name": "stdout", 83 | "output_type": "stream", 84 | "text": [ 85 | "Train dataset contains 6513 rows and 126 columns\n", 86 | "Test dataset contains 1611 rows and 126 columns\n" 87 | ] 88 | } 89 | ], 90 | "source": [ 91 | "print(\"Train dataset contains {0} rows and {1} columns\".format(X_train.shape[0], X_train.shape[1]))\n", 92 | "print(\"Test dataset contains {0} rows and {1} columns\".format(X_test.shape[0], X_test.shape[1]))" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": 4, 98 | "metadata": { 99 | "collapsed": false 100 | }, 101 | "outputs": [ 102 | { 103 | "name": "stdout", 104 | "output_type": "stream", 105 | "text": [ 106 | "Train possible labels: \n", 107 | "[ 0. 1.]\n", 108 | "\n", 109 | "Test possible labels: \n", 110 | "[ 0. 1.]\n" 111 | ] 112 | } 113 | ], 114 | "source": [ 115 | "print(\"Train possible labels: \")\n", 116 | "print(np.unique(y_train))\n", 117 | "\n", 118 | "print(\"\\nTest possible labels: \")\n", 119 | "print(np.unique(y_test))" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "### Specify training parameters\n", 127 | "All the parameters are set like in the previous example\n", 128 | "- we are dealing with binary classification problem (`'objective':'binary:logistic'`),\n", 129 | "- we want shallow single trees with no more than 2 levels (`'max_depth':2`),\n", 130 | "- we don't any oupout (`'silent':1`),\n", 131 | "- we want algorithm to learn fast and aggressively (`'learning_rate':1`), (in naive named `eta`)\n", 132 | "- we want to iterate only 5 rounds (`n_estimators`)" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 5, 138 | "metadata": { 139 | "collapsed": true 140 | }, 141 | "outputs": [], 142 | "source": [ 143 | "params = {\n", 144 | " 'objective': 'binary:logistic',\n", 145 | " 'max_depth': 2,\n", 146 | " 'learning_rate': 1.0,\n", 147 | " 'silent': 1.0,\n", 148 | " 'n_estimators': 5\n", 149 | "}" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "### Training classifier" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": 6, 162 | "metadata": { 163 | "collapsed": false 164 | }, 165 | "outputs": [], 166 | "source": [ 167 | "bst = XGBClassifier(**params).fit(X_train, y_train)" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "### Make predictions" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": 7, 180 | "metadata": { 181 | "collapsed": false 182 | }, 183 | "outputs": [ 184 | { 185 | "data": { 186 | "text/plain": [ 187 | "array([ 0., 1., 0., ..., 1., 0., 1.])" 188 | ] 189 | }, 190 | "execution_count": 7, 191 | "metadata": {}, 192 | "output_type": "execute_result" 193 | } 194 | ], 195 | "source": [ 196 | "preds = bst.predict(X_test)\n", 197 | "preds" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | 
"source": [ 204 | "Calculate obtained error" 205 | ] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "execution_count": 8, 210 | "metadata": { 211 | "collapsed": false 212 | }, 213 | "outputs": [ 214 | { 215 | "name": "stdout", 216 | "output_type": "stream", 217 | "text": [ 218 | "Predicted correctly: 1601/1611\n", 219 | "Error: 0.0062\n" 220 | ] 221 | } 222 | ], 223 | "source": [ 224 | "correct = 0\n", 225 | "\n", 226 | "for i in range(len(preds)):\n", 227 | " if (y_test[i] == preds[i]):\n", 228 | " correct += 1\n", 229 | " \n", 230 | "acc = accuracy_score(y_test, preds)\n", 231 | "\n", 232 | "print('Predicted correctly: {0}/{1}'.format(correct, len(preds)))\n", 233 | "print('Error: {0:.4f}'.format(1-acc))" 234 | ] 235 | } 236 | ], 237 | "metadata": { 238 | "kernelspec": { 239 | "display_name": "Python 3", 240 | "language": "python", 241 | "name": "python3" 242 | }, 243 | "language_info": { 244 | "codemirror_mode": { 245 | "name": "ipython", 246 | "version": 3 247 | }, 248 | "file_extension": ".py", 249 | "mimetype": "text/x-python", 250 | "name": "python", 251 | "nbconvert_exporter": "python", 252 | "pygments_lexer": "ipython3", 253 | "version": "3.5.2" 254 | } 255 | }, 256 | "nbformat": 4, 257 | "nbformat_minor": 0 258 | } 259 | -------------------------------------------------------------------------------- /notebooks/3. Going deeper/3.1 Spotting Most Important Features.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Spotting Most Important Features\n", 15 | "The following notebook presents how to distinguish the relative importance of features in the dataset.\n", 16 | "Using this knowledge will help you to figure out what is driving the splits most for the trees and where we may be able to make some improvements in feature engineering if possible.\n", 17 | "\n", 18 | "**What we'll be doing**:\n", 19 | "- loading libraries and data,\n", 20 | "- training a model,\n", 21 | "- knowing how a tree is represented,\n", 22 | "- plotting feature importance" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": { 28 | "collapsed": true 29 | }, 30 | "source": [ 31 | "### Load libraries\n", 32 | "The purpose of this step is to train simple model.\n", 33 | "Let's begin with loading all libraries in one place." 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 1, 39 | "metadata": { 40 | "collapsed": false 41 | }, 42 | "outputs": [ 43 | { 44 | "name": "stderr", 45 | "output_type": "stream", 46 | "text": [ 47 | "/opt/conda/lib/python3.5/site-packages/IPython/html.py:14: ShimWarning: The `IPython.html` package has been deprecated. You should import from `notebook` instead. 
`IPython.html.widgets` has moved to `ipywidgets`.\n", 48 | " \"`IPython.html.widgets` has moved to `ipywidgets`.\", ShimWarning)\n" 49 | ] 50 | } 51 | ], 52 | "source": [ 53 | "%matplotlib inline\n", 54 | "\n", 55 | "import xgboost as xgb\n", 56 | "import seaborn as sns\n", 57 | "import pandas as pd\n", 58 | "\n", 59 | "sns.set(font_scale = 1.5)" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "### Load data" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "Load the agaricus dataset from file" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 2, 79 | "metadata": { 80 | "collapsed": false 81 | }, 82 | "outputs": [], 83 | "source": [ 84 | "dtrain = xgb.DMatrix('../data/agaricus.txt.train')\n", 85 | "dtest = xgb.DMatrix('../data/agaricus.txt.test')" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "### Train the model" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "Specify training parameters - we are going to use 5 decision tree stumps with a moderate learning rate." 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 3, 105 | "metadata": { 106 | "collapsed": true 107 | }, 108 | "outputs": [], 109 | "source": [ 110 | "# specify training parameters\n", 111 | "params = {\n", 112 | " 'objective':'binary:logistic',\n", 113 | " 'max_depth':1,\n", 114 | " 'silent':1,\n", 115 | " 'eta':0.5\n", 116 | "}\n", 117 | "\n", 118 | "num_rounds = 5" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "Train the model. At the same time specify a `watchlist` to observe its performance on the test set." 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": 4, 131 | "metadata": { 132 | "collapsed": false, 133 | "scrolled": false 134 | }, 135 | "outputs": [ 136 | { 137 | "name": "stdout", 138 | "output_type": "stream", 139 | "text": [ 140 | "[0]\ttest-error:0.11049\ttrain-error:0.113926\n", 141 | "[1]\ttest-error:0.11049\ttrain-error:0.113926\n", 142 | "[2]\ttest-error:0.03352\ttrain-error:0.030401\n", 143 | "[3]\ttest-error:0.027312\ttrain-error:0.021495\n", 144 | "[4]\ttest-error:0.031037\ttrain-error:0.025487\n" 145 | ] 146 | } 147 | ], 148 | "source": [ 149 | "# see how does it perform\n", 150 | "watchlist = [(dtest,'test'), (dtrain,'train')] # native interface only\n", 151 | "bst = xgb.train(params, dtrain, num_rounds, watchlist)" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "### Representation of a tree" 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "Before moving on, it's good to understand the intuition about how trees are grown.\n", 166 | "\n", 167 | "> *While building, a tree is divided recursively several times (in this example only once) - this operation is called a **split**. To perform a split, the algorithm must figure out which is the best (one) feature to use*.\n", 168 | "\n", 169 | "> *After that, at the bottom of the tree we get groups of observations packed in the **leaves**.*\n", 170 | "\n", 171 | "> *In the final model, these leaves are supposed to be **as pure as possible** for each tree, meaning in our case that each leaf should be made of one label class.*\n", 172 | "\n", 173 | "> *Not all splits are equally important.
Basically the first split of a tree will have more impact on the purity than, for instance, the deepest split. Intuitively, we understand that the first split does most of the work, and the following splits focus on smaller parts of the dataset which have been misclassified by the first tree.*\n", 174 | "\n", 175 | "> *In the same way, in Boosting we try to optimize the misclassification at each round (it is called the loss). So the first tree will do the big work and the following trees will focus on the remaining, on the parts not correctly learned by the previous trees.*\n", 176 | "\n", 177 | "> *The improvement brought by each split can be measured: it is the gain.*\n", 178 | "\n", 179 | "> ~ Quoted from Tianqi Chen's Kaggle [notebook](https://www.kaggle.com/tqchen/otto-group-product-classification-challenge/understanding-xgboost-model-on-otto-data).\n", 180 | "\n", 181 | "Let's investigate what the trees look like in our case:" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 5, 187 | "metadata": { 188 | "collapsed": false 189 | }, 190 | "outputs": [ 191 | { 192 | "name": "stdout", 193 | "output_type": "stream", 194 | "text": [ 195 | "0:[odor=pungent] yes=2,no=1,gain=4000.53,cover=1628.25\n", 196 | "\t1:leaf=0.647758,cover=924.5\n", 197 | "\t2:leaf=-0.93331,cover=703.75\n", 198 | "\n", 199 | "0:[odor=musty] yes=2,no=1,gain=1377.22,cover=1404.2\n", 200 | "\t1:leaf=-0.339609,cover=1008.21\n", 201 | "\t2:leaf=0.75969,cover=395.989\n", 202 | "\n", 203 | "0:[gill-size=narrow] yes=2,no=1,gain=1210.77,cover=1232.64\n", 204 | "\t1:leaf=0.673358,cover=430.293\n", 205 | "\t2:leaf=-0.365203,cover=802.35\n", 206 | "\n", 207 | "0:[stalk-surface-above-ring=smooth] yes=2,no=1,gain=791.959,cover=1111.84\n", 208 | "\t1:leaf=-0.277529,cover=765.906\n", 209 | "\t2:leaf=0.632881,cover=345.937\n", 210 | "\n", 211 | "0:[odor=pungent] yes=2,no=1,gain=493.704,cover=981.683\n", 212 | "\t1:leaf=0.275961,cover=638.373\n", 213 | "\t2:leaf=-0.46668,cover=343.31\n", 214 | "\n" 215 | ] 216 | } 217 | ], 218 | "source": [ 219 | "trees_dump = bst.get_dump(fmap='../data/featmap.txt', with_stats=True)\n", 220 | "\n", 221 | "for tree in trees_dump:\n", 222 | " print(tree)" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "For each split we get the following details:\n", 230 | "\n", 231 | "- which feature was used to make the split,\n", 232 | "- the possible choices to make (branches),\n", 233 | "- **gain**, which is the actual improvement in accuracy brought by that feature. The idea is that before adding a new split on feature X to the branch, there were some wrongly classified elements; after adding the split on this feature, there are two new branches, and each of these branches is more accurate (one branch saying that if your observation is on this branch then it should be classified as 1, and the other branch saying the exact opposite),\n", 234 | "- **cover**, measuring the relative quantity of observations concerned by that feature\n" 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "### Plotting" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "metadata": {}, 247 | "source": [ 248 | "Fortunately, there are better ways to figure out which features really matter. We can use the built-in function `plot_importance`, which will create a plot presenting the most important features according to some criteria.
We will analyze the impact of each feature for all splits and all trees and visualize results.\n", 249 | "\n", 250 | "See which feature provided the most gain:" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": 6, 256 | "metadata": { 257 | "collapsed": false 258 | }, 259 | "outputs": [ 260 | { 261 | "data": { 262 | "text/plain": [ 263 | "" 264 | ] 265 | }, 266 | "execution_count": 6, 267 | "metadata": {}, 268 | "output_type": "execute_result" 269 | }, 270 | { 271 | "data": { 272 | "image/png": "[base64 PNG data omitted - horizontal bar chart of per-feature importance (F score)]"
gIBQUFWK1WXr9+ze3bt/H5fFRWVobamUymCZ8uff/+fYaGhnC5XJhM\nJqqqqnC73Tx+/Jjo6OgpOy4RMSaFGBGJSH9/P58+fcLhcIyr8/v9BAKB0Pu4uDjMZjPHjx/HbDaH\nynfv3k1aWhqVlZV4vV4WLlz42336fD4aGxuJi4sDYNmyZRw8eJDnz5+TlZU1RUcmIkaln5NEJCID\nAwMAzJ8/f1xdfn4+mZmZoVddXR1AWIAZGhqit7eXDRs2EAwGefPmzR/3uWPHjlCAAdi0aRNjY2N4\nPJ5/ezgiMgNoJkZEIvIjTAwODo6rKy8vZ3BwEI/HQ0lJSai8p6eHS5cu0dzczLdv30LlJpMJv9//\nx30uWrQo7H1CQgJA2LZEZPZSiBGRiFgsFlJSUujo6BhXt379egCsVis/LngMBoMUFhbi9/s5cOAA\n6enpxMbG4vP5KC4uJpILI3917xhdVCkioBAjIpOQnZ1NbW0t7e3t2O3237bt6Ojg48ePnDt3jry8\nvFB5a2vr3+6miMwSWhMjIhHbt28f8+bN48SJE3z58mVcfTAYDP39Yxbl/8sAbt68OeGVSCIik6WZ\nGBGJ2NKlSzl//jxFRUVs27YNp9PJ6tWrCQaDdHV18fDhQ2JiYkhJSWH58uXYbDYqKirw+XxYLBYa\nGhoiWgsjIhIJhRgRmZTc3FwePHhAdXU1z549o7a2NvTspO3bt+NyuUhPTwfg6tWrnDlzhuvXrzN3\n7lwcDgd79uxh586d47b78+zMr+4d86tyEZl99NgBERERMSStiRERERFDUogRERERQ1KIEREREUNS\niBERERFDUogRERERQ1KIEREREUNSiBERERFDUogRERERQ1KIEREREUNSiBERERFD+gdYy+6Iu9V2\nAQAAAABJRU5ErkJggg==\n", 273 | "text/plain": [ 274 | "" 275 | ] 276 | }, 277 | "metadata": {}, 278 | "output_type": "display_data" 279 | } 280 | ], 281 | "source": [ 282 | "xgb.plot_importance(bst, importance_type='gain', xlabel='Gain')" 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "We can simplify it a little bit by introducing a *F-score* metric.\n", 290 | "\n", 291 | "> **F-score** - sums up how many times a split was performed on each feature. " 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": 7, 297 | "metadata": { 298 | "collapsed": false 299 | }, 300 | "outputs": [ 301 | { 302 | "data": { 303 | "text/plain": [ 304 | "" 305 | ] 306 | }, 307 | "execution_count": 7, 308 | "metadata": {}, 309 | "output_type": "execute_result" 310 | }, 311 | { 312 | "data": { 313 | "image/png": 
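The F-score definition above can be checked by hand against the model's tree dump. The following sketch is editorial commentary, not a cell from the original notebook; it assumes the `bst` booster trained earlier in this notebook and simply counts how often each feature appears as a split node:

```python
from collections import Counter
import re

# every split node in the text dump is rendered like "[f29<0.5]",
# so counting those tokens per feature reproduces the F-score
counts = Counter()
for tree in bst.get_dump():
    counts.update(re.findall(r'\[(f\d+)<', tree))

print(dict(counts))  # should agree with bst.get_fscore() shown below
```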
"iVBORw0KGgoAAAANSUhEUgAAArEAAAGBCAYAAABxSOkdAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xt8zvX/x/HnZTOGbYyxjc1hsSGzMBJy/k6+IeQ0pxQp\n50MlpPp2ovBdsm/4yqHIYSopUYkk5Zi+i29E+NqY82Y25rDt8/ujm+vnage71rbr+szjfru55Xp/\n3p/P9frsvc96eu99fT4WwzAMAQAAACZSwtEFAAAAAPYixAIAAMB0CLEAAAAwHUIsAAAATIcQCwAA\nANMhxAIAAMB0CLEA8i0jI0MhISGaNm2ao0sBANxlXB1dAOBsdu/erUGDBmW7zWKxaPXq1QoNDS3U\nGpYuXary5cvrkUceKdT3KQgWi0UWi8XRZRSYnTt3au/evRoyZIjKli3r6HIAADkgxAI5ePjhh9W6\ndess7YGBgYX+3kuWLFFQUJDTh1gXFxfFxsbK1bX4/CjZuXOnFixYoF69ehFiAcCJFZ//8wAFrH79\n+urSpYujyyhwN2/elGEYcnNzK5DjFdRxHO3KlSsqW7aseIghAJgDa2KBv2j9+vXq16+fGjVqpLCw\nMPXp00ebNm3K0u/zzz/XU089pbZt26pBgwZq3ry5Ro8erSNHjlj73Fpjeu7cOf34448KCQlRSEiI\n6tatq7Nnz+a6BnXNmjUKCQnRvn37rG1RUVEKCQnRsWPH9Prrr+vBBx9UWFiYDhw4YO3z/fff6/HH\nH1eTJk0UGhqqbt26KSYmJk/nnl09t7ft2LFDffr0UVhYmNq0aaNFixZJkpKTkzV58mQ1b95cYWFh\nGjFihC5cuGBz7Ntrf+WVV9SyZUs1bNhQffv21a5du7KtZ9WqVerevbsaNmyo8PBwPfHEE/r5559z\nrPnHH39Uv379dN9992n06NF69tlntWDBAklS69atrV//+fPnS5LOnj2r6dOnq1u3bgoPD1doaKge\nfvhhLVq0SJmZmdmOx969e7Vw4UJ16NBBDRo0UKdOnbRu3bps69+xY4eGDh2qZs2aKTQ0VB06dNC0\nadN0+fJlm355/Z4DgOKMmVggB2lpaUpKSrJpc3Nzs/kV86xZs/Tee++pTZs2GjdunEqUKKGvvvpK\no0eP1j/+8Q/16dPH2vfDDz+Uj4+P+vbtq4oVKyouLk6rV69Wv379tHbtWgUEBMjFxUUzZ87Ua6+9\npsqVK2v48OHWmcHy5cvfseY/r029tV51woQJKlOmjJ544glZLBZVqlRJkrRixQq9+uqratSokUaO\nHKnSpUtr+/btevHFF3Xq1CmNHz8+31+//fv3a9OmTerTp48eeeQRbdiwQbNmzVLp0qUVExOjGjVq\naMyYMfrf//6n5cuXa/LkyVq4cGGW2p955hm5ubnpySefVEpKilatWqUnnnhCixcvVtOmTa39Z8yY\noaVLl+q+++7TxIkTlZKSotWrV2vgwIH697//rQceeMCmvv/85z/auHGjevXqpR49eqhEiRIKCgrS\n1atXtWXLFk2bNk2enp6SpLp160qSDh48qC1btqhDhw4KDAzUzZs39d1332nmzJlKSEjI9h8XM2fO\n1I0bNxQZGSlXV1etWLFCzz//vGrWrGmztvrDDz/Ua6+9Jj8/P0VGRsrf318JCQnasmWLzp07Z63F\nnu85ACjWDAA2du3aZQQHBxshISFGcHCwzZ8JEyZY+8XGxhrBwcHG3Llzsxxj+PDhRnh4uJGWlmZt\nu/3vtxw5csSoX7++8dprr9m0P/jgg8aQIUOy9E9PTzeCg4ONF154Icu2mJgYIyQkxPjpp5+sbVFR\nUUZwcLAxZMgQIzMz06b/mTNnjHvvvdd4/vnnsxzrH//4h1G/fn0jISEhy7Y71XOrrV69esavv/5q\nbb9+/brRvHlzIyQkxJgxY4bNcV599VUjJCTEiIuLy1J7v379jPT0dGv7qVOnjLCwMKNLly7Wtt9/\n/90IDg42Bg4caNP3zJkzRqNGjYyOHTtmqS8kJMTYs2dPlnOKiooyQkJCjDNnzmTZdv369Wy/DhMm\nTDDq169vXLx40doWExNjBAcHGz179rSpKSEhwahfv77x3HPP2ZxT/fr1ja5duxpXrlzJ9j0Mw/7v\nOQAozlhOAOSgd+/eWrJkic2fp59+2rr9s88+U4kSJdStWzclJSXZ/Gnbtq1SUlIUGxtr7V+6dGnr\n31NTU5WUlCRvb29Vr15dv/zyS6Gdh8Vi0eDBg7PM0m7cuFHp6enq0aNHlvrbtGmj9PR07dixI9/v\n27hxY+sMpvTHLHaDBg0kSQMHDrTp26RJE0nSiRMnstQ+ZMgQubi4WNv8/f3VuXNnHTlyRHFxcZKk\nTZs2yWKxaNiwYTZ9q1Spou7duys+Pl6HDh2yOXb9+vWt75tXt6//vXnzppKTk5WUlKQWLVooIyND\n//3vf7PUP2DAAJua/Pz8FBgYaHOuGzZsUEZGhkaPHq0yZcrk+P72fs8BQHHGcgIgBzVq1FDz5s1z\n3H78+HFlZmaqY8eO2W63WCy6ePGi9fWBAwc0Z84c7d27V2lpaVneqzBld/xjx47JMIwsgfIWi8WS\nZZ2qPapVq5alzdPTUyVKlJC/v79Nu5eXlwzD0KVLl7LsU6tWrSxt99xzjyQpPj5egYGBOnXqlE17\ndn1PnjypkJAQa3t+vubp6emaP3++Pv/8c8XFxdl8CMxisWRZuypl/3UoX768zffGrTB+e33Zsfd7\nDgCKM0IskE+GYcjV1dVmHeef1alTR5J06tQpDRw4UOXLl9fo0aNVvXp164zbq6++qoyMjDy9Z273\nY83tGLfPAt9ev8Vi0ezZs1WhQoVs96tevXqe6srO7bOPt8vtHIwivDOAu7u73fu89tprWrVqlbp0\n6aIRI0bI29tbrq6u2r9/v6KiorJ8uEuSSpQouF942fM9BwDFHSEWyKfq1atrx44dqlq16h3vHfv1\n11/r2rVrmj17tho1amSzLSkpyfqhnVtyCnolSpSQh4eHkpOTs2y7NZtnT/2SVKFChVxnnB3t6NGj\nCgoKsmm7dUeHgIAAm/8eOXJEfn5+Nn1///13SdnPiGYnt5D9+eefq3nz5po5c2aWGv+KW7PChw4d\nyrVOe77nAKC4Y00skE/dunWTYRiaPXt2tjNwt/9a99Zs3J/7rVixIssdECSpTJky2f5qXfojyOzb\nt083btywtiUlJeV426acdO7cWa6urpozZ47NsW5JSUnRzZs37TpmQTMMQ0uWLFF6erq17dSpU9q4\ncaNq165tDXLt27eXYRhatGiRzYz02bNn9emnnyowMPCOv6q/5dYMeXb/UHBxcckyW5yamqoPPvjA\n7nO7XadOneTi4qK5c+fqypUrOfaz53sOAIo7ZmKBfLp1f9N58+ape/fuioiIkI+Pj86fP6/9+/dr\nx44d+s9//iNJatOmjaKiovTMM88oMjJSHh4e+umnn/TD
Dz9kO/PWsGFDrVu3TnPnzlXNmjVVokQJ\ndejQQW5ubhowYICef/55DRo0SF26dFFycrLWrFmjgIAAJSYm5rl+f39/vfjii3r55ZfVuXNnde3a\nVX5+fkpKStKhQ4f07bff6ssvv1SVKlUK7GuWHzdu3FD//v3VuXNn6y22bt68qalTp1r7BAUFaciQ\nIVq6dKkGDBighx56yHqLrevXr+ull17K8/uFhYXJMAy9+eab6tKli9zc3BQcHKygoCD97W9/08cf\nf6yJEyfq/vvv17lz5/TJJ5/I29tb8fHxWY6V1+UR/v7+ev755/X666+rS5cueuSRR+Tn56czZ85o\n8+bNmjVrlu655x67vucAoLgjxALZuHWP0jsZM2aMGjRooOXLl+v9999XWlqaKlWqpNq1a+uFF16w\n9qtevboWLlyoqKgoLViwQK6urmrUqJGWL1+uadOmZZlBmzhxoq5cuaLly5crJSVFhmFo69atqlKl\nih555BGdP39eK1eu1IwZM1S9enWNGzdON27csPuT6b169VJQUJAWL16sVatWKSUlRRUqVFCtWrU0\nfvx4eXt75+trldev35/3ya5t1qxZWr58uRYuXKiUlBTVrVtXs2fPVrNmzWz6Tpo0STVr1tTKlSs1\ne/ZslSxZUmFhYRo1apTCwsLyXF94eLgmTJigmJgYvfDCC8rIyNDYsWMVFBSkqVOnysPDQ1999ZW+\n+eYb+fn5qX///goODtbQoUPzdE45GTBggGrUqKHFixdr2bJlunnzpipXrqwHHnjA5h8Sef2eA4Di\nzmIU5ScpACCP3n77bS1YsMAa3gEAuB1rYgEAAGA6hFgAAACYDiEWAAAApsOaWAAAAJjOXXt3gvT0\nDCUlXXV0GbhNhQplGBMnw5g4J8bF+ZhhTHx8PBxdAlCg7trlBK6u2T8SE47DmDgfxsQ5MS7OhzEB\nit5dG2IBAABgXoRYAAAAmA4hFgAAAKZDiAUAAIDpEGIBAABgOoRYAAAAmA4hFgAAAKZDiAUAAIDp\nEGIBAABgOoRYAAAAmA4hFgAAAKZDiAUAAIDpEGIBAABgOoRYAAAAmA4hFgAAAKZDiAUAAIDpEGIB\nAABgOoRYAAAAmA4hFgAAAKZDiAUAAIDpEGIBAABgOoRYAAAAmA4hFgAAAKZDiAUAAIDpEGIBAABg\nOoRYAAAAmA4hFgAAAKZDiAUAAIDpEGIBAABgOoRYAAAAmA4hFgAAAKZDiAUAAIDpEGIBAABgOoRY\nAAAAmA4hFgAAAKZDiAUAAIDpEGIBAABgOoRYAAAAmA4hFgAAAKZDiAUAAIDpEGIBAABgOoRYAAAA\nmA4hFgAAAKZDiAUAAIDpEGIBAABgOoRYAAAAmA4hFgAAAKZDiAUAAIDpEGIBAABgOoRYAAAAmA4h\nFgAAAKZDiAUAAIDpuDq6AEmKjo5WTEyMLly4oIceekgeHh7asWOHzp49q0qVKqlt27YaM2aMPDw8\nbPaLiYnRokWLlJCQoICAAD311FPq2rVrnt7z8OHDSkxMLYzTQT4lJZVjTJwMY+KcGBfn4+3d0NEl\nAKbyv//9T+vWrdOPP/6ouLg4Xb9+XYGBgerUqZMGDx4sd3f3Ox7DYhiGUQS15ujAgQN69NFHNXHi\nRDVt2lS//PKLPvnkEz366KOqU6eO4uPjFRUVJT8/P8XExFj3W79+vZ599lk9+eSTatasmbZt26b3\n339f0dHRat++/R3ft1nPl1XGq3JhnhoA4C5wNfmclk2PVIUKfo4uJVc+Ph537gQUkdmzZ2vFihVq\n166dwsLC5Orqql27dmnDhg0KCQlRTEyM3Nzccj2Gw2dijx49KovFosjISJUtW1Y1atTQwIEDrdvD\nw8NVpUoVDR06VHv37lWTJk0k/TF7261bN40fP16S9MADDyghIUFvv/12nkJsGa/KKlehauGcFAAA\nAHLUqVMnDR8+XOXKlbO29enTR4GBgVqwYIHWrFmj/v3753oMh66JnTx5siZNmiRJaty4serWravD\nhw9n6Ve3bl0ZhqFz585Jkq5du6YTJ06oefPmNv1atGih33//XadPny784gEAAJAv9evXtwmwt3Tu\n3FmGYejIkSN3PIZDZ2JHjBghX19fzZ8/X8uWLVOpUqUUFBSUpd/PP/8si8WiGjVqSJJu3LghwzBU\nsmRJm363Xh89elR+fs79ax0AAADYujURWalSpTv2dehMbEBAgAIDAyVJ9957r0JDQ1W2bFmbPteu\nXdOsWbPUtGlT1atXT5Lk6ekpLy8vHThwwKZvbGysJCk5ObkIqgcAAEBByczM1Lx58+Tq6qqHH374\njv0dvib2TqZMmaKkpCS99957Nu19+/bVsmXLdN9991k/2PXZZ59JkiwWiyNKBQAAQD69/vrrio2N\n1YQJE6y/fc+NU4fYt956S5s3b9aSJUtUtarth7CefvppxcXFacyYMTIMQ+XLl9eYMWP01ltvycfH\nx0EVAwDuVnz6H8i/t99+Wx9++KH69u2rYcOG5Wkfpw2xS5cu1dKlSxUVFaVGjRpl2V66dGlFRUVp\n2rRpSkxMVGBgoL799luVLFnSuuwAAICicv58iqNLyBUhG85q7ty5mj9/vh599FG9/PLLed7PKUPs\nZ599pjfffFNTpkxRRERErn29vb3l7e2tzMxMrVq1Sp06dcqyrhYAAADOZ+7cufrXv/6lHj166LXX\nXrNrX6cLsbt379aUKVPUsmVLhYaGWj+sJUm+vr6qUqWKJGnr1q06deqUgoKCdPHiRa1Zs0bHjx/X\nm2++6ajSAQAAkEfR0dH617/+pe7du+uNN96we3+nDLEZGRnavn27tm/fbrNt5MiRGjVqlCTJxcVF\nq1evVnx8vNzc3NSqVSvNmDFDlSvzFC4AAABn9uGHHyo6Olr+/v5q1qyZ9cP5t1SqVEkPPPBArsdw\n+GNnHYXHzgIACgKPnQXsN3nyZH366ac5bg8PD9cHH3yQ6zHu2hB7+PBhJSamOroM3Mbbuxxj4mQY\nE+fEuDif8PCGSky86ugyckWIRXHjdMsJikqdOnWc/pOkdxsfHw/GxMkwJs6JcXE+Li4uji4BuOs4\n9IldAAAAQH4QYgEAAGA6hFgAAACYDiEWAAAApkOIBQAAgOkQYgEAAGA6hFgAAACYDiEWAAAApkOI\nBQAAgOkQYgEAAGA6hFgAAACYDiEWAAAApkOIBQAAgOkQYgEAAGA6hFgAAACYDiEWAAAApkOIBQAA\ngOkQYgEAAGA6hFgAAACYDiEWAAAApkOIBQAAgOkQYgEAAGA6hFgAAACYDiEWAAAApkOIBQAAgOkQ\nYgEAAGA6hFgAAACYDiEWAAAApkOIBQAAgOkQYgEAAGA6hFgAAACYDiEWAAAApkOIBQAAgOkQYgEA\nAGA6hFgAAACYDiEWAAAApkOIBQAAgOkQYgEAAGA6hFgAAACYDiEWAAAApkOIBQAAgOkQYgEAAGA6\nhFgAAACYDiE
WAAAApkOIBQAAgOkQYgEAAGA6ro4uQJKio6MVExOj8+fPq2XLlipZsqQOHjyoixcv\nytPTU40bN9aECRNUvXp1m/1iYmK0aNEiJSQkKCAgQE899ZS6du3qoLMAANztli1bosOHf9Nvvx3U\n6dMJ8vX115o16xxdFlAsOTzEHjhwQNHR0Zo4caKaNWumChUq6N1339XYsWNVtWpVnT9/XvPnz9dj\njz2mzz//XOXKlZMkrV+/Xi+99JKefPJJNWvWTNu2bdOkSZNUtmxZtW/f/o7ve/jwYSUmphb26cEO\nSUnlGBMn4+3d0NElAKby73+/Ky8vL9WpE6LUVH6eAYXJ4SH26NGjslgsioyMVNmyZSVJ06dPt+lT\nv359RUREaOfOnerQoYOkP2Zvu3XrpvHjx0uSHnjgASUkJOjtt9/OU4gdOHmFynhVLuCzAYqPq8nn\ntGx6OVWo4OfoUgDTiIlZJz8/f0nSoEF9lJZ2zcEVAcWXQ0Ps5MmTtXbtWlksFjVu3FgWi0UffPCB\nwsPDbfp5eXlJkm7cuCFJunbtmk6cOKGnn37apl+LFi20adMmnT59Wn5+uf+Pt4xXZZWrULUAzwYA\ncLe7FWABFD6HhtgRI0bI19dX8+fP17Jly1SqVCkFBQVJkgzDUEZGhs6ePau3335b1apVU5s2bST9\nEWYNw1DJkiVtjnfr9dGjR+8YYgEAAGBeDg2xAQEBCgwMlCTde++9cnd3t257+eWXtXr1aklSYGCg\nFi9erDJlykiSPD095eXlpQMHDqhz587WfWJjYyVJycnJRXUKAAAAcIACucVWWlqaPv74Y61YsUKn\nTp0qiEPq6aef1kcffaR33nlH3t7eGjJkiBITE63b+/btq1WrVmnTpk26fPmy1q9fr88++0ySZLFY\nCqQGAAAAOCe7Z2KnTJmi2NhYffHFF5Kkmzdvql+/fjp06JAkycPDQ++//77q1av3lwrz9fWVr6+v\n7r33Xj3wwANq166dPvzwQ40ePVrSHyE3Li5OY8aMkWEYKl++vMaMGaO33npLPj4+f+m9AQAA4Nzs\nDrG7du3Sww8/bH29YcMGHTp0SG+//baCg4M1atQoRUdH69133y2wIsuVK6eAgADFx8db20qXLq2o\nqChNmzZNiYmJCgwM1LfffquSJUv+5QAN4P/5+Hg4ugRkg3FxPn8eE1dXF7m4WBgroJDYHWIvXLig\natWqWV9v2bJFDRo0UKdOnSRJvXr10nvvvVdwFUpKTEzU8ePHrR/sup23t7e8vb2VmZmpVatWqVOn\nTtZbdQH4686fT3F0CfgTHx8PxsXJZDcm6ekZysgwnGasCNMobuwOse7u7kpLS5P0xx0Edu7cqcjI\nSJvtKSn5v2CXLFmikydPqkmTJqpYsaLi4+P1/vvvq3Tp0urdu7e139atW3Xq1CkFBQXp4sWLWrNm\njY4fP64333wz3+8NAAAAc7A7xNavX1/r1q1T165d9fXXX+vy5ctq27atdXtcXJwqVqyY74JCQkK0\nbds2bdy4UVeuXJGvr6+aNWumESNGqEqVKtZ+Li4uWr16teLj4+Xm5qZWrVppxowZqlyZBxgAABzj\nq6826MyZ0zIMQ5cuXVJ6erref3+RJMnX108REZ3vcAQAeWUxDMOwZ4f9+/dr6NChunz5sgzDUERE\nhObMmWPdHhERoQYNGmjWrFkFXmxBavv4uzzsAMhFatIpLXi+A0/sckIsJ3A+t8Zk9Ojhio39Ods+\nYWGN9M4784u4sv/HcgIUN3bPxDZo0EAbN27Uvn375OnpqaZNm1q3Xb58WZGRkTZtzupq8jlHlwA4\nNa4RwH5z5y5wdAnAXcPumdji4vDhw0pMTHV0GbiNt3c5xsTJhIc3VGLiVUeXgT9hJtb5mGFMmIlF\ncZOvJ3bduHFDa9eu1e7du5WYmKhnn31W9erVU3JysjZt2qQWLVo4/WNf69Sp4/Q/cO42ZvifwN3G\nxcXF0SUAAJAtu0PsxYsXNXjwYB09elQ+Pj46f/689TGvnp6emjdvno4ePapJkyYVeLEAAACAlI/H\nzs6cOVNnz57V6tWr9emnn+r21QgWi0V/+9vftH379gItEgAAALid3SF269atGjx4sEJDQ2WxWLJs\nDwwMVEJCQoEUBwAAAGTH7hCblpYmHx+fXLdnZmb+paIAAACA3NgdYoOCgrRv374ct3/77bcKCQn5\nS0UBAAAAubE7xPbv31+ff/65li5dqqtX///WOwkJCZo6dar27t2rwYMHF2iRAAAAwO3ydZ/Y6Oho\nzZs3T5KUkZEhV1dXZWRkyGKxaMyYMXrqqacKvNDCwO2cnAu32HI+jIlzYlycjxnGhPvEorjJ131i\nR40ape7du2vTpk06ceKEMjMzFRgYqI4dOyowMLCgawQAAABs2BVir1+/ro0bN6pWrVoKDQ3VY489\nVkhlAQAAADmza01sqVKlNG3aNB08eLCw6gEAAADuyO4PdtWuXVtnzpwpjFoAAACAPLE7xI4bN04r\nV67U7t27C6MeAAAA4I7s/mBXTEyMvLy8NHjwYFWvXl3VqlVT6dKlbfpYLBbNnTu3wIoEAAAAbmd3\niP31118lSX5+frpx44aOHTuWpU92j6MFAAAACordIXbLli2FUQcAAACQZ3aviQUAAAAcze6Z2ISE\nhDz18/f3t7sYAAAAIC/sDrHt2rXL05pX7iULAACAwmJ3iH3jjTeyhNiMjAydOnVK69atk7e3t/r3\n719gBQIAAAB/ZneI7dGjR47bhg0bpt69eyslJeUvFQUAAADkpkA/2FWmTBn16NFDS5cuLcjDAgAA\nADYK/O4EmZmZunDhQkEfFgAAALCyezlBTlJTU7Vnzx4tWrRI9erVK6jDAgAAAFnYHWJDQkJyvDuB\nYRjy9/fXSy+99JcLAwAAAHJid4gdOXJktiHWy8tLgYGBatGihVxdC2yCFwAAAMjC7rQ5evTowqgD\nAAAAyDO7P9g1aNAg7dixI8ftO3fu1KBBg/5SUQAAAEBu7A6xu3fvzvXuA4mJidqzZ89fKgoAAADI\nTb5usZXbY2dPnDihsmXL5rsgAAAA4E7ytCZ27dq1Wrt2rfX1vHnzFBMTk6VfSkqKfvvtN7Vp06bA\nCgQAAAD+LE8h9vr167p8+bL1dVpams1r6Y/ZWXd3d/Xv319PP/10wVYJAAAA3CZPIbZv377q27ev\nJKldu3aaOnWq2rdvX6iFAQAAADmx+xZbW7ZsKYw6AAAAgDz7S08lSE1NVWpqqjIzM7Ns8/f3/yuH\nBgAAAHKUrxC7YsUKLV26VPHx8Tn2OXjwYL6LAgAAAHJj9y22Vq5cqVdeeUWBgYEaN26cDMPQ4MGD\n9eSTT6pSpUoKCQnR66+/Xhi1AgAAAJLyEWKXL1+uli1b6r333lPv3r0lSa1bt9b48eO1YcMGXbly\nRZcuXSrwQgEAAIBb7A6xcXFxatu2rSSpZMmSkqSbN29Kkjw8PPToo49q
xYoVBVgiAAAAYMvuEOvh\n4aGMjAxJUrly5eTu7q4zZ85Yt5ctWzbXx9ICAAAAf5XdIbZ27do6dOiQ9XXDhg21cuVKnT17VqdP\nn9bq1atVo0aNgqwRAAAAsGF3iO3atauOHDmiGzduSJJGjx6to0ePqk2bNmrXrp2OHz+ucePGFXih\nAAAAwC0WwzCMv3qQ+Ph4bdmyRS4uLmrRooVq1qxZELUVuvPnUxxdAm7j4+PBmDgZxsQ5MS7Oxwxj\n4uPj4egSgAL1lx52cEtAQIAGDx5cEIcCAAAA7ijfIXb79u3avXu3EhMTNWTIEAUFBSk1NVX79+9X\n3bp1Vb58+YKsEwAAALCyO8RevXpVI0eO1M6dO1WiRAllZmbq73//u4KCguTm5qZnnnlGffr00Zgx\nYwqjXgAAAMD+D3b985//1E8//aR//vOf+vbbb3X7klo3Nzd16tRJW7duLcgaAQAwhWXLlmjatOfV\nu3c3tWoVrl69ujm6JKDYsnsm9ssvv9SAAQP00EMPKSkpKcv2mjVr6vPPP7frmNHR0YqJidH58+f1\nyCOPaPr06dqzZ4/eeecdHThwQK6urgoJCdHMmTPl6+ubZf9ff/1VPXv2VPny5bVjx448vefhw4eV\nmJhqV53Zdn3eAAAWGklEQVQoXElJ5RgTJ+Pt3dDRJQCm8u9/vysvLy/VqROi1FR+ngGFye4Qm5yc\nnOt9YDMzM62338qLAwcOKDo6WhMnTlSzZs1UoUIFbdu2TSNGjFBkZKRGjhyp69ev66efftL169ez\nPcarr76qihUrWh/CkBcDJ69QGa/Kee4P3G2uJp/TsunlVKGCn6NLAUwjJmad/Pz8JUmDBvVRWto1\nB1cEFF92h9iAgACbhx382Y4dO1SrVq08H+/o0aOyWCyKjIxU2bJllZ6eroEDB2rYsGEaO3astV/r\n1q2z3f/TTz9VYmKievbsqZiYmDy/bxmvyipXoWqe+wMAcCe3AiyAwmf3mtiePXvqo48+0jfffGNt\ns1gsunnzpqKjo7V161b16dMnT8eaPHmyJk2aJElq3Lix6tatqx9++EFnz55VZGTkHfe/cuWKZs+e\nreeee04lS5a091QAAABgUnbPxD7++OP6/fffNWrUKOtttJ577jldunRJN27cUK9evfIcYkeMGCFf\nX1/Nnz9fy5YtU6lSpfTFF1+ofPny+s9//qNZs2bp5MmTqlWrliZMmKC2bdva7B8dHa177rlH7du3\n18GDB+09FQAAAJiU3SHWYrFo+vTp6tmzp7788kvFxcUpMzNTgYGBioiIULNmzfJ8rICAAAUGBkqS\n7r33Xrm7u+ujjz7S1atX9eKLL2rChAmqVq2aYmJiNHr0aK1du1a1a9eWJB07dkwrV67URx99ZO8p\nAAAAwOTyFGIXLlyodu3aKSgoyNrWpEkTNWnSpFCKunHjhqZOnapevXpJkpo1a6aHHnpIixYt0owZ\nMyRJb7zxhnr27Kl77rmnUGoAAACA88pTiJ09e7Z8fX2tIfbSpUtq166dFixYoPDw8AItyNPTU5LU\ntGlTa1uJEiUUHh5uXTLw3Xffad++fXrppZeUkvLHs6qvXbsmwzCUkpKiUqVKyc3NrUDrAu5WPG/d\nOTEuzufPY+Lq6iIXFwtjBRSSfD121jAMXb16Venp6QVdjzUo3/4QhVuvLRaLJOl///uf0tLS1LFj\nxyz7N23aVGPHjtVTTz1V4LUBd6Pz51McXQL+xMfHg3FxMtmNSXp6hjIyDKcZK8I0ipt8hdjC1LJl\nS7m4uGjnzp2qWbOmpD/uPbtnzx7df//9kqROnTqpXr16Nvt98skn+uabbzRv3jxVrcqtswAAAIoz\npwuxPj4+ioyM1OzZs5WZmanq1atr9erVOnv2rIYNGyZJqlKliqpUqWKz365du+Tq6lpo63QBALiT\nr77aoDNnTsswDF26dEnp6el6//1FkiRfXz9FRHR2cIVA8ZHnEHv69GnrQw5urUM9efJkjg8+CAkJ\nyXdRkyZNUpkyZTR//nwlJyerXr16WrRokQICAvJ9TAAACtv69esUG/uzTduiRQskSWFhjQixQAGy\nGH9efJqNkJAQ63rUW25fo5pdu7Pft7VZz5d57CyQiz8eOxvJY2edEGtinY8ZxoQ1sShu8jQTO336\n9MKuo8gtmx6pxMRUR5eB23h7l2NMnExQUJASE686ugwAALLIU4jt3r17YddR5OrUqeP0/2q+25hh\nJuNu4+Li4ugSAADIVglHFwAAAADYixALAAAA0yHEAgAAwHQIsQAAADAdQiwAAABMhxALAAAA0yHE\nAgAAwHQIsQAAADAdQiwAAABMhxALAAAA0yHEAgAAwHQIsQAAADAdQiwAAABMhxALAAAA0yHEAgAA\nwHQIsQAAADAdQiwAAABMhxALAAAA0yHEAgAAwHQIsQAAADAdQiwAAABMhxALAAAA0yHEAgAAwHQI\nsQAAADAdQiwAAABMhxALAAAA0yHEAgAAwHQIsQAAADAdQiwAAABMhxALAAAA0yHEAgAAwHQIsQAA\nADAdQiwAAABMhxALAAAA0yHEAgAAwHQIsQAAADAdQiwAAABMhxALAAAA0yHEAgAAwHQIsQAAADAd\nQiwAAABMhxALAAAA0yHEAgAAwHQIsQAAADAdQiwAAABMhxALAAAA03F1dAGSFB0drZiYGF24cEEP\nPfSQPDw8tGPHDp09e1aVKlVS27ZtNWbMGHl4eFj3adeunRISErIcy2Kx6Pvvv1elSpVyfc/Dhw8r\nMTG1wM8F+ZeUVI4xcTLe3g0dXQJgKsuWLdHhw7/pt98O6vTpBPn6+mvNmnWOLgsolhweYg8cOKDo\n6GhNnDhRTZs21S+//KJPPvlEgwYNUp06dRQfH6+oqCjFxsYqJibGut+7776rGzdu2Bxr6tSpKlmy\n5B0DrCQNnLxCZbwqF/j5AMXF1eRzWja9nCpU8HN0KYBp/Pvf78rLy0t16oQoNZV/lAOFyeEh9ujR\no7JYLIqMjFTZsmVVo0YNDRw40Lo9PDxcVapU0dChQ7V37141adJEkhQSEmJznAsXLujYsWOaMGFC\nnt63jFdllatQteBOBABw14uJWSc/P39J0qBBfZSWds3BFQHFl0PXxE6ePFmTJk2SJDVu3Fh169bV\n4cOHs/SrW7euDMPQuXPncjzWhg0bZBiGOnfuXGj1AgCQm1sBFkDhc+hM7IgRI+Tr66v58+dr2bJl\nKlWqlIKCgrL0+/nnn2WxWFSjRo0cj7VhwwaFhYXJz49ffQIAABR3Dp2JDQgIUGBgoCTp3nvvVWho\nqMqWLWvT59q1a5o1a5aaNm2qevXqZXuchIQExcbG6u9//3uh1wwAAADHc/ia2DuZMmWKkpKS9N57\n7+XYZ/369XJxcdFDDz1UhJUBAADAUZw6xL711lvavHmzlixZoqpVc/4Q1saNG9WsWTN5e3sXYXXA\n3cHHx+POnVDkGBfn8+cxcXV
1kYuLhbECConThtilS5dq6dKlioqKUqNGjXLsd/z4cR08eFDTp08v\nwuqAu8f58ymOLgF/4uPjwbg4mezGJD09QxkZhtOMFWEaxY1TPrHrs88+05tvvqnJkycrIiIi177r\n16+Xm5ubOnbsWETVAQAAwNGcbiZ29+7dmjJlilq2bKnQ0FDFxsZat/n6+qpKlSo2/Tdu3KgHH3xQ\n5cqVK+pSAQAA4CBOGWIzMjK0fft2bd++3WbbyJEjNWrUKOvrQ4cO6fjx4xo9enRRlwkAQBZffbVB\nZ86clmEYunTpktLT0/X++4skSb6+foqI4F7mQEGxGIZhOLoIR2j7+Ls8sQvIRWrSKS14vgOPnXVC\nrIl1PrfGZPTo4YqN/TnbPmFhjfTOO/OLuLL/x5pYFDdONxNbVK4m5/z0LwBcI0B+zJ27wNElAHeN\nuzbELpseqcTEVEeXgdt4e5djTJxMUFCQEhOvOroMAACyuGtDbJ06dfh1nJPhV6TOx8XFxdElAACQ\nLae8xRYAAACQG0IsAAAATIcQCwAAANMhxAIAAMB0CLEAAAAwHUIsAAAATIcQCwAAANMhxAIAAMB0\nCLEAAAAwHUIsAAAATIcQCwAAANMhxAIAAMB0CLEAAAAwHUIsAAAATIcQCwAAANMhxAIAAMB0CLEA\nAAAwHUIsAAAATIcQCwAAANMhxAIAAMB0CLEAAAAwHUIsAAAATIcQCwAAANMhxAIAAMB0CLEAAAAw\nHUIsAAAATIcQCwAAANMhxAIAAMB0CLEAAAAwHUIsAAAATIcQCwAAANMhxAIAAMB0CLEAAAAwHUIs\nAAAATIcQCwAAANMhxAIAAMB0CLEAAAAwHUIsAAAATIcQCwAAANMhxAIAAMB0CLEAAAAwHUIsAAAA\nTIcQCwAAANMhxAIAAMB0CLEAAAAwHUIsAAAATIcQCwAAANMhxAIAAMB0CLEAAAAwHUIsAAAATIcQ\nCwAAANMhxAIAAMB0CLEAAAAwHYthGIajiwAAAADswUwsAAAATIcQCwAAANMhxAIAAMB0CLEAAAAw\nHUIsAAAATIcQCwAAANMpdiH26NGjGjx4sMLCwtSqVSu98847ystdxFJTUzV58mQ1bdpUTZo00TPP\nPKNLly4VQcXFX37G5NSpUwoJCcnyZ+LEiUVUdfEWFxenF198UV27dlW9evU0aNCgPO3HdVK48jMu\nXCuFa8OGDRo+fLhatmyp++67Tz169NAXX3xxx/24VoDC5+roAgrS5cuX9dhjj6lOnTqaN2+e4uLi\nNGPGDBmGobFjx+a679ixY3XixAm98cYbkqSZM2dq1KhRWr58eVGUXmz9lTGRpOeff16NGjWyvq5Q\noUJhlnvXOHLkiL7//ns1bNhQGRkZed6P66Rw5XdcJK6VwvLBBx+oWrVqeuGFF1ShQgV99913mjhx\noi5duqT+/fvnuB/XClAEjGJk/vz5RtOmTY0rV65Y2xYuXGiEhYUZqampOe63b98+Izg42Ni7d6+1\nLTY21ggODjZ+/PHHQq25uMvvmJw8edIIDg42tm7dWhRl3tVGjx5tDBw48I79uE6KVl7HhWulcCUl\nJWVpmzBhgtG+ffsc9+FaAYpGsVpO8P3336tly5YqU6aMte3vf/+70tLStGfPnlz3q1Spkho3bmxt\nCw0NVbVq1bRt27ZCrbm4y++YwPlwneBuVL58+Sxt9erV07lz53Lch2sFKBrFKsQeO3ZMNWvWtGnz\n8/OTu7u7jh07lut+tWrVytIeFBSk48ePF3idd5P8jsktkydPVr169dSyZUvNmDFD169fL6xScQdc\nJ86Na6Xo/Pzzz6pRo0aO27lWgKJR7NbEenp6Zmn39PRUcnJyvvY7efJkgdZ4t8nvmLi5uWnAgAFq\n0aKFypUrp127dmnhwoWKj4/Xv/71r8IsGTngOnFOXCtFa8eOHdq8ebOmT5+eYx+uFaBoFKsQi+LD\nx8dHL7zwgvV1eHi4KlasqFdeeUW//fabgoODHVgd4Dy4VorOyZMn9cwzz6hjx4565JFHHF0OcNcr\nVssJPD09lZKSkqX98uXL8vLyKvD9cGcF+bWNiIiQYRj69ddfC6o82IHrxDy4VgpecnKyhg0bpmrV\nqmnmzJm59uVaAYpGsQqxtWrVyrLO8syZM0pLS8t2fVJu+0nZr+eEffI7JtmxWCwFWRrsxHViHlwr\nBevatWsaPny4MjMzNX/+fJUqVSrX/lwrQNEoViH2wQcf1Pbt23X16lVr2xdffCF3d3eFh4fnut+F\nCxe0b98+a9v+/fsVHx+v1q1bF2rNxV1+xyQ7X375pSwWi+rXr1/QZSIPuE7Mg2ul4GRkZGjMmDGK\ni4vTe++9l6f773KtAEXD5eWXX37Z0UUUlNq1a2v16tXatWuXKleurB9//FH//Oc/NWTIELVq1cra\nr2PHjjp8+LDatWsnSfL19dXPP/+sjz/+WH5+fjp+/Lj+8Y9/qHbt2hozZoyjTqdYyO+YREdH67vv\nvlNaWprOnj2rTz75RHPnzlX79u01YMAAR51OsXHt2jVt3rxZv//+u3744QddvnxZFStW1NGjRxUQ\nECBXV1euEwfIz7hwrRSul156SRs3btTEiRPl6emps2fPWv9UrFhRLi4uXCuAgxSrD3Z5enpq6dKl\nevXVV/X000/Lw8NDjz/+uEaNGmXTLzMzU5mZmTZtc+bM0RtvvKGpU6cqMzNTbdu21dSpU4uy/GIp\nv2NSq1YtLV68WDExMbp27Zr8/f01bNgwDR8+vKhPoVi6ePGixo4da/Nr53HjxkmSNm/eLH9/f64T\nB8jPuHCtFK4ffvhBFotFr7/+epZtXCuAY1kM4w4PsQcAAACcTLFaEwsAAIC7AyEWAAAApkOIBQAA\ngOkQYgEAAGA6hFgAAACYDiEWAAAApkOIBQAAgOkQYgEAAGA6xeqJXQDss3btWk2ePDnbbRMnTtSw\nYcOKuCIAAPKGEAvc5SwWi8aPHy8/Pz+b9nr16jmoIgAA7owQC0APPvigQkJCHF2GXdLS0uTu7u7o\nMgAADsKaWAD5sn//fj3xxBO6//771bBhQ7Vv315Tpkyx6XP9+nXNmTNHERERatCggVq1aqUJEybo\n3Llz1j6JiYmaPHmymjdvrtDQUPXo0UNfffWVzXF2796tkJAQffnll5o9e7ZatWqlxo0bW7cnJyfr\n1VdfVevWrdWgQQN16tRJy5YtK9wvAADAoZiJBaCUlBQlJSVZX1ssFpUvXz7H/omJiRo6dKiqVaum\nESNGyN3dXSdPntQ333xj7ZOZmamhQ4dq79696tq1qx577DGlpKTou+++U1xcnCpXrqzr169rwIAB\nOnXqlAYOHChfX1+tX79eY8eO1cyZM9WlSxeb942Ojpa7u7uefPJJXblyRdIfM7IDBgxQYmKi+vXr\np8qVK2vXrl16/fXXdfnyZY0cObKAv1oAAGdAiAXucoZhaODAgTZtZcqU0b59+3LcZ9++
fbp8+bIW\nLVpkE3bHjx9v/fvHH3+sPXv2aNq0aerfv7+1/cknn7T+fdWqVTp+/LiioqLUqVMnSVLv3r3Vu3dv\nvfnmm+rcubNcXFys/dPT07VixQqVLFnS2rZ48WKdPn1an332mfz9/a3H8PDw0MKFCzVo0CB5eHjY\n+2UBADg5Qixwl7NYLHrllVdUrVo1a5ura+4/Gjw9PWUYhr7++mv16tVLFoslS59vvvlGlSpVUmRk\nZI7H2bZtm6pUqWINsJLk5uamfv366eWXX9Z///tfhYaGWrf16NHDJsBK0tdff63w8HC5u7vbzCa3\naNFCq1atUmxsrFq2bJnr+QAAzIcQC0ANGjSw64NdTZs2VUREhF566SXNnj1b999/v9q1a6fOnTtb\nQ2Z8fLxq1aqVbcC9JSEhQTVq1MjSXqtWLRmGoVOnTtmE2NuD9i0nTpzQ4cOH1bx58yzbLBaLEhMT\n83xeAADzIMQCyJc5c+bol19+0bfffqvt27dr0qRJWrx4sVatWlVodw0oVapUlrbMzEy1atVKQ4YM\nyXaf2rVrF0otAADHIsQCyLfQ0FCFhoZq7Nix+vLLLzVu3Dht2LBBPXv2VGBgoA4cOKDMzEyVKJH9\njVD8/f117NixLO3Hjh2TxWJR1apV71hDYGCgrl27lu1MLACg+OIWWwDsdvny5SxtwcHBkv64rZYk\ndejQQRcuXNDKlStzPE7r1q115swZm1tq3bx5UytXrlTFihVVv379O9YSERGhvXv3avfu3Vm23b5G\nFgBQvDATC9zlDMOwe5+1a9dqxYoV6tChgwIDA5WWlqY1a9bIw8NDrVu3liR1795dn376qV577TXF\nxsbqvvvuU2pqqrZt26axY8eqSZMm6tOnj1avXq3nnntOv/zyi/z8/LR+/Xr99ttvmjlzps2dCXIy\ndOhQbd68WU888YR69uypunXrKjU1VQcPHtQ333yjffv25TgTDAAwL0IscJfL7YNXOWnatKn279+v\njRs36uLFi/Lw8FBoaKhmzpxpXQLg4uKiRYsW6d1339UXX3yhjRs3ytvbW+Hh4QoMDJT0xxrXZcuW\nadasWfrkk0905coV3XPPPZozZ47+9re/5alOd3d3ffjhh5o/f76++uorffzxx/Ly8lLNmjX17LPP\nEmABoJiyGPmZhgEAAAAciCkKAAAAmA4hFgAAAKZDiAUAAIDpEGIBAABgOoRYAAAAmA4hFgAAAKZD\niAUAAIDpEGIBAABgOoRYAAAAmA4hFgAAAKbzf0ffluz8UlLaAAAAAElFTkSuQmCC\n", 314 | "text/plain": [ 315 | "" 316 | ] 317 | }, 318 | "metadata": {}, 319 | "output_type": "display_data" 320 | } 321 | ], 322 | "source": [ 323 | "xgb.plot_importance(bst)" 324 | ] 325 | }, 326 | { 327 | "cell_type": "markdown", 328 | "metadata": {}, 329 | "source": [ 330 | "In case you want to visualize it another way, a created model enables convinient way of accessing the F-score." 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": 8, 336 | "metadata": { 337 | "collapsed": false 338 | }, 339 | "outputs": [ 340 | { 341 | "data": { 342 | "text/plain": [ 343 | "{'f27': 1, 'f29': 2, 'f39': 1, 'f64': 1}" 344 | ] 345 | }, 346 | "execution_count": 8, 347 | "metadata": {}, 348 | "output_type": "execute_result" 349 | } 350 | ], 351 | "source": [ 352 | "importances = bst.get_fscore()\n", 353 | "importances" 354 | ] 355 | }, 356 | { 357 | "cell_type": "markdown", 358 | "metadata": {}, 359 | "source": [ 360 | "Now you can manipulate data in your own way" 361 | ] 362 | }, 363 | { 364 | "cell_type": "code", 365 | "execution_count": 9, 366 | "metadata": { 367 | "collapsed": false 368 | }, 369 | "outputs": [ 370 | { 371 | "data": { 372 | "text/plain": [ 373 | "" 374 | ] 375 | }, 376 | "execution_count": 9, 377 | "metadata": {}, 378 | "output_type": "execute_result" 379 | }, 380 | { 381 | "data": { 382 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAg8AAAFzCAYAAAC5LeqFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xt0VOWh9/HfmASEXCCBgQRIJARCEllRgQSBcC8FasUK\nSi3IzQPKNYjwNiegYnXJzSJeouARhBZbTfSgRQTPaVFuCgJCaXlBIiFCuIVAQiBCiCbz/sHrLGIA\neSaT7JnM97OWf8yeZ+/8Zj1r42/tefYem8PhcAgAAOAm3WJ1AAAA4F0oDwAAwAjlAQAAGKE8AAAA\nI5QHAABghPIAAACM+FsdwCo//FCuoqKLVseAC0JDGzJ3Xoz5817MnXez24PddiyfvfLg7+9ndQS4\niLnzbsyf92Lu8COfLQ8AAMA1lAcAAGCE8gAAAIxQHgAAgBHKAwAAMEJ5AAAARigPAADACOUBAAAY\noTwAAAAjlAcAAGCE8gAAAIxQHgAAgBHKAwAAMEJ5AAAARigPAADACOUBAAAYoTwAAAAjlAcAAGDE\n3+oAVsnOzlZhYYnVMeCCoqIg5s6LMX/ei7nzbnZ7R7cdy2fLQ+5b7RVttzoFXBVmdQBUC/PnvZg7\n75RbIOluh9uO57PlIdouxUZYnQIAAO/DmgcAAGCE8gAAAIxQHgAAgBHKAwAAMEJ5AAAARigPAADA\nCOUBAAAYoTwAAAAjlAcAAGCE8gAAAIxQHgAAgBHKAwAAMOIRP4yVkZGhrKwsnTlzRoMGDVJwcLC2\nbdum/Px8NW3aVH369FFqaqqCg4Mr7ZeVlaXly5frxIkTioyM1IQJEzR48GCLPgUAAL7B8vKwb98+\nZWRkaMaMGUpOTta//vUvrV69WqNGjVJsbKzy8vK0ePFi7d27V1lZWc791q5dqzlz5ujRRx9Vly5d\ntHnzZqWlpSkwMFD9+vWz8BMBAFC3WV4ecnJyZLPZNHz4cAUGBqp169YaOXKk8/2kpCQ1b95c48aN\n065du9S5c2dJV65W3HfffZo+fbokqVu3bjpx4oReeuklygMAADXI0jUP6enpSktLkyR16tRJ8fHx\nys7OrjIuPj5eDodDp0+fliSVlpbqyJEj6tq1a6Vx3bt316FDh3Ty5MmaDw8AgI+y9MrDpEmTFB4e\nrqVLl2rVqlWqX7++YmJiqozbs2ePbDabWrduLUkqKyuTw+FQQEBApXE/vs7JyVFERESN5wcAwBdZ\neuUhMjJSUVFRkqQOHTooMTFRgYGBlcaUlpbqj3/8o5KTk5WQkCBJCgkJUaNGjbRv375KY/fu3StJ\nKi4uroX0AAD4JsvXPPycWbNmqaioSMuWLau0/aGHHtKqVat01113ORdMrlmzRpJks9msiAoAgE/w\n6PKwcOFCbdiwQStWrFDLli0rvTdx4kQdPXpUqampcjgcaty4sVJTU7Vw4ULZ7XaLEgMAUPd57EOi\nVq5cqZUrV2rhwoXq2LFjlfdvvfVWLV68WJ9//rnWrl2rzZs3q0WLFgoICHB+vQEAANzPI688rFmz\nRgsWLNCsWbM0YMCAG44NCwtTWFiYKioq9O6772rgwIFV1k0AAAD38bjysGPHDs2aNUspKSlKTEx0\nLoKUpPDwcDVv3lyStHHjRh0/flwxMTE6e/as3nvvPeXm5mrBggVWRQcAwCd4ZHkoLy/X1q1btXXr\n1krvTZ48WVOmTJEk+fn5KTMzU3l5eapXr5569Oih+fPnq1mzZlbEBgDAZ9gcDofD6hBWyF5kUyyP\nggAA+IDsk1LsDPf9795jF0wCAADPRHkAAABGKA8AAMAI5QEAABihPAAAACOUBwAAYITyAAAAjFAe\nAACAEcoDAAAwQnkAAABGKA8AAMAI5QEAABihPAAAACOUBwAAYMTf6gBWyS2wOgEAALUjt0CKdePx\nfLY8RD9yUIWFJVbHgAvCwoKYOy/G/Hkv5s57NXLz8WwOh8Ph5mN6jYKCC1ZHgAvs9mDmzosxf96L\nufNudnuw247FmgcAAGCE8gAAAIxQHgAAgBHKAwAAMEJ5AAAARigPAADACOUBAAAYoTwAAAAjlAcA\nAGCE8gAAAIxQHgAAgBHKAwAAMEJ5AAAARigPAADACOUBAAAYoTwAAAAjlAcAAGCE8gAAAIxQHgAA\ngBHKAwAAMEJ5AAAARigPAADACOUBAAAYoTwAAAAjlAcAAGCE8gAAAIxQHgAAgBHKAwAAMEJ5AAAA\nRigPAADACOUBAAAYoTwAAAAjlAcAAGCE8gAAAIxQHgAAgBHKAwAAMEJ5AAAARigPAADACOUBAAAY\n8bc6gFWys7NVWFhidQy4oKgoiLnzYmFhd1gdAUA1+Wx5yH2rvaLtVqeAq8KsDgCX5BZIOWEHFRoa\nYXUUANXgs+Uh2i7F8u8XAADGWPMAAACMUB4AAIARygMAADBCeQAAAEYoDwAAwAjlAQAAGKE8AAAA\nI5QHAABghPIAAACMUB4AAIARygMAADDiEb9tkZGRoaysLBUUFCglJUUBAQE6cOCAzp49q5CQEHXq\n1ElPPPGEbrvttkr7ZWVlafny5Tpx4oQiIyM1YcIEDR482KJPAQCAb7C8POzbt08ZGRmaMWOGunTp\notDQUL3++uuaNm2aWrZsqYKCAi1dulRjxozRRx99pKCgIEnS2rVrNWfOHD366KPq0qWLNm/erLS0\nNAUGBqpfv34WfyoAAOouy8tDTk6ObDabhg8frsDAQEnSvHnzKo25/fbbNWDAAG3fvl2/+MUvJF25\nWnHfffdp+vTpkqRu3brpxIkTeumllygPAADUIEvXPKSnpystLU2S1KlTJ8XHx2vnzp1VxjVq1EiS\nVFZWJkkqLS3VkSNH1LVr10rjunfvrkOHDunkyZM1nBwAAN9l6ZWHSZMmKTw8XEuXLtWqVatUv359\nxcTESJIcDofKy8uVn5+vl156Sa1atVLv3r0lXSkRDodDAQEBlY734+ucnBxFRETU6mcBAMBXWFoe\nIiMjFRUVJUnq0KGDGjRo4HzvmWeeUWZmpiQpKipKb731lho2bChJCgkJUaNGjbRv3z796le/cu6z\nd+9eSVJxcXFtfQQAAHyOx96qOXHiRL3//vt65ZVXFBYWprFjx6qwsND5/kMPPaR3331Xf//733X+\n/HmtXbtWa9askSTZbDarYgMAUOdZvmDyesLDwxUeHq4OHTqoW7du6tu3r/7yl79o6tSpkq6Ui6NH\njyo1NVUOh0ONGzdWamqqFi5cKLvdbnF6AADqLo8tD1cLCgpSZGSk8vLynNtuvfVWLV68WE899ZQK\nCwsVFRWlzz77TAEBAUpISLAwLYCfY7cHWx0BLmLuIHlJeSgsLFRubq5zweTVwsLCFBYWpoqKCr37\n7rsaOHCg85ZPAJ6poOCC1RHgArs9mLnzYu4sfh5XHlasWKFjx46pc+fOatKkifLy8vSnP/1Jt956\nq4YNG+Yct3HjRh0/flwxMTE6e/as3nvvPeXm5mrBggUW
pgcAoO7zuPIQFxenzZs3a/369fruu+8U\nHh6uLl26aNKkSWrevLlznJ+fnzIzM5WXl6d69eqpR48emj9/vpo1a2ZhegAA6j6bw+FwWB3CCtmL\nbIrlURBArco+KenegwoN5eTzRnxt4d3c+bWFx96qCQAAPBPlAQAAGKE8AAAAI5QHAABghPIAAACM\nUB4AAIARygMAADBCeQAAAEZcKg9lZWXKzMzUjBkzNHbsWO3fv1+SVFxcrPfff18nT550a0gAAOA5\njB9PffbsWY0ePVo5OTmy2+0qKChQcXGxJCkkJERLlixRTk6O0tLS3B4WAABYz/jKwwsvvKD8/Hxl\nZmbqww8/1NVPt7bZbPrlL3+prVu3ujUkAADwHMblYePGjRo9erQSExNls9mqvB8VFaUTJ064JRwA\nAPA8xuXh0qVLstvtN3y/oqKiWqEAAIDnMi4PMTEx2r1793Xf/+yzzxQXF1etUAAAwHMZL5gcMWKE\nnnrqKcXHx6t///7O7SdOnNBrr72mXbt2afHixW4NWRNyC6xOAPie3AIp2uoQAKrN5rh6xeNNysjI\n0JIlSyRJ5eXl8vf3V3l5uWw2m1JTUzVhwgS3B3W37OxsFRaWWB0DLggLC2LuvFhS0h0qLLxodQy4\nwG4PVkHBBatjwEV2e7DbjuVSeZCk48eP6+9//7uOHDmiiooKRUVFqX///oqKinJbuJrGSeCd+AfM\nuzF/3ou5827uLA9GX1tcvnxZ69evV5s2bZSYmKgxY8a4LQgAAPAORgsm69evr6eeekoHDhyoqTwA\nAMDDGd9t0a5dO506daomsgAAAC9gXB4ef/xxvfPOO9qxY0dN5AEAAB7O+FbNrKwsNWrUSKNHj9Zt\nt92mVq1a6dZbb600xmaz6dVXX3VbSAAA4DmMy8OPv6AZERGhsrIyHT58uMqYaz22GgAA1A3G5eHT\nTz+tiRwAAMBLGK95AAAAvs34ysPN/mJmixYtjMMAAADPZ1we+vbte1NrGngWBAAAdZNxeZg7d26V\n8lBeXq7jx4/rb3/7m8LCwjRixAi3BQQAAJ7FuDwMGTLkuu+NHz9ew4YN04ULPPscAIC6yq0LJhs2\nbKghQ4Zo5cqV7jwsAADwIG6/26KiokJnzpxx92EBAICHMP7a4npKSkq0c+dOLV++XAkJCe46LAAA\n8DDG5SEuLu66d1s4HA61aNFCc+bMqXYwAADgmYzLw+TJk69ZHho1aqSoqCh1795d/v5uu6ABAAA8\njPH/5adOnVoTOQAAgJcwXjA5atQobdu27brvb9++XaNGjapWKAAA4LmMy8OOHTtueDdFYWGhdu7c\nWa1QAADAc7l0q+aNHk995MgRBQYGuhwIAAB4tpta8/DBBx/ogw8+cL5esmSJsrKyqoy7cOGCDh48\nqN69e7stIAAA8Cw3VR4uX76s8+fPO19funSp0mvpytWIBg0aaMSIEZo4caJ7UwIAAI9hczgcDpMd\n+vbtq9mzZ6tfv341lanWFBTwGxzeyG4PZu68GPPnvZg772a3B7vtWMa3an766adu++MAAMD7VOtp\nTiUlJSopKVFFRUWV91q0aFGdQwMAAA/lUnn461//qpUrVyovL++6Yw4cOOByKAAA4LmMb9V85513\n9OyzzyoqKkqPP/64HA6HRo8erUcffVRNmzZVXFycnn/++ZrICgAAPIBxeXj77beVkpKiZcuWadiw\nYZKkXr16afr06Vq3bp2+++47nTt3zu1BAQCAZzAuD0ePHlWfPn0kSQEBAZKk77//XpIUHBysBx54\nQH/961/dGBEAAHgS4/IQHBys8vJySVJQUJAaNGigU6dOOd8PDAy84eOrAQCAdzMuD+3atdPXX3/t\nfH3HHXfonXfeUX5+vk6ePKnMzEy1bt3anRkBAIAHMS4PgwcP1jfffKOysjJJV36iOycnR71791bf\nvn2Vm5urxx9/3O1BAQCAZzB+wuS15OXl6dNPP5Wfn5+6d++u6Ohod2SrcTwpzTvxlDvvxvx5L+bO\nu1n6hMlriYyM1OjRo91xKAAA4OFcLg9bt27Vjh07VFhYqLFjxyomJkYlJSX697//rfj4eDVu3Nid\nOQEAgIcwLg8XL17U5MmTtX37dt1yyy2qqKjQPffco5iYGNWrV08zZ87Ub3/7W6WmptZEXgAAYDHj\nBZMvvviivvrqK7344ov67LPPdPWSiXr16mngwIHauHGjOzMCAAAPYlwePvnkEz388MMaNGiQ8yFR\nV4uOjtaxY8fcEg4AAHge468tiouLb/gch4qKCudtnJ4sOztbhYUlVseAC4qKgpg7LxYWdofVEQBU\nk3F5iIyMrPSQqJ/atm2b2rRpU61QtSH3rfaKtludAq4KszoAXJJbIOWEHVRoaITVUQBUg3F5GDp0\nqF5++WV169ZNnTp1kiTZbDZ9//33euONN7Rx40Y988wz7s7pdtF2KZZ/vwAAMGZcHh555BEdOnRI\nU6ZMcd6O+fvf/17nzp1TWVmZHnzwQf32t791e1AAAOAZjMuDzWbTvHnzNHToUH3yySc6evSoKioq\nFBUVpQEDBqhLly41kRMAAHiImyoPb775pvr27auYmBjnts6dO6tz5841FgwAAHimm7pVc9GiRdq/\nf7/z9blz59SxY0ft3LmzxoIBAADPZPycB0lyOBy6ePGifvjhB3fnAQAAHs6l8gAAAHwX5QEAABi5\n6bstTp486Xw41IULV37P/dixY9d9YFRcXJwb4gEAAE9jc1z9y1bXERcXJ5vNVmmbw+Gosu3q7QcO\nHHBfyhqQvcjGQ6KAWpZ9UtK9PGHSW9ntwSoouGB1DLjIbg9227Fu6srDvHnz3PYHAQCAd7up8nD/\n/ffXdA4AAOAlWDAJAACMeER5yMjIUM+ePRUfH6/09HRJ0s6dOzVy5EjdddddSkpK0siRI3Xq1Klr\n7r9//37Fx8era9eutRkbAACfZPzbFu62b98+ZWRkaMaMGerSpYtCQ0O1efNmTZo0ScOHD9fkyZN1\n+fJlffXVV7p8+fI1j/Hcc8+pSZMmKi8vr+X0AAD4HsvLQ05Ojmw2m4YPH67AwED98MMPGjlypMaP\nH69p06Y5x/Xq1eua+3/44YcqLCzU0KFDlZWVVVuxAQDwWZZ+bZGenq60tDRJUqdOnRQfH6/PP/9c\n+fn5Gj58+M/u/91332nRokX6/e9/r4CAgJqOCwAAZHF5mDRpkiZOnChJWrVqlTIzM/XFF1+ocePG\n+uc//6kBAwbo9ttv17333qvPPvusyv4ZGRlq27at+vXrV9vRAQDwWZaWh8jISEVFRUmSOnTooMTE\nRF26dEkXL17U008/rXHjxmnZsmVq27atpk6dqm+++ca57+HDh/XOO+9o9uzZVsUHAMAnecTdFj9V\nVlam6dOn68EHH1TXrl21aNEitWzZUsuXL3eOmTt3roYOHaq2bdtamBQAAN9j+YLJnwoJCZEkJScn\nO7fdcsstSkp
Kcj7yetOmTdq9e7fmzJnj/J2N0tJSORwOXbhwQfXr11e9evVqPzyAm+LOx+SidjF3\nkDywPMTExEi68hsZV7v6tzS+/fZbXbp0Sf3796+yf3JysqZNm6YJEybUfFgALuH3EbwTv23h3Wr9\nty1qU0pKivz8/LR9+3ZFR0dLkioqKrRz507dfffdkqSBAwcqISGh0n6rV6/WP/7xDy1ZskQtW7as\n9dwAAPgKjysPdrtdw4cP16JFi1RRUaHbbrtNmZmZys/P1/jx4yVJzZs3V/PmzSvt9+WXX8rf31+d\nO3e2IjYAAD7D48qDJKWlpalhw4ZaunSpiouLlZCQoOXLlysyMtLqaAAA+Dyb46eLC3xE9iKbYiOs\nTgH4luyTku49qNBQTj5vxJoH7+bONQ8eeasmAADwXJQHAABghPIAAACMUB4AAIARygMAADBCeQAA\nAEYoDwAAwAjlAQAAGKE8AAAAI5QHAABghPIAAACMUB4AAIARygMAADBCeQAAAEb8rQ5gldwCqxMA\nvie3QIq2OgSAavPZ8hD9yEEVFpZYHQMuCAsLYu68VCNJMTExKiy8aHUUANXgs+UhNjZWBQUXrI4B\nF9jtwcydF/Pz87M6AoBqYs0DAAAwQnkAAABGKA8AAMAI5QEAABihPAAAACOUBwAAYITyAAAAjFAe\nAACAEcoDAAAwQnkAAABGKA8AAMAI5QEAABihPAAAACOUBwAAYITyAAAAjFAeAACAEcoDAAAwQnkA\nAABGKA8AAMAI5QEAABihPAAAACOUBwAAYITyAAAAjFAeAACAEcoDAAAwQnkAAABGKA8AAMAI5QEA\nABihPAAAACOUBwAAYITyAAAAjFAeAACAEcoDAAAwQnkAAABGKA8AAMAI5QEAABihPAAAACOUBwAA\nYITyAAAAjPhbHcAq2dnZKiwssToGXFBUFMTcebGwsDusjgCgmny2POS+1V7RdqtTwFVhVgeAS3IL\npJywgwoNjbA6CoBq8NnyEG2XYvn3CwAAY6x5AAAARigPAADACOUBAAAYoTwAAAAjlAcAAGCE8gAA\nAIxQHgAAgBHKAwAAMEJ5AAAARigPAADACOUBAAAYoTwAAAAjHvHDWBkZGcrKytKZM2c0aNAgBQcH\na9u2bcrPz1fTpk3Vp08fpaamKjg42LlP3759deLEiSrHstls2rJli5o2bVqbHwEAAJ9heXnYt2+f\nMjIyNGPGDCUnJ+tf//qXVq9erVGjRik2NlZ5eXlavHix9u7dq6ysLOd+r7/+usrKyioda/bs2QoI\nCKA4AABQgywvDzk5ObLZbBo+fLgCAwPVunVrjRw50vl+UlKSmjdvrnHjxmnXrl3q3LmzJCkuLq7S\ncc6cOaPDhw/riSeeqNX8AAD4GkvXPKSnpystLU2S1KlTJ8XHxys7O7vKuPj4eDkcDp0+ffq6x1q3\nbp0cDod+9atf1VheAABg8ZWHSZMmKTw8XEuXLtWqVatUv359xcTEVBm3Z88e2Ww2tW7d+rrHWrdu\nne68805FRETUYGIAAGDplYfIyEhFRUVJkjp06KDExEQFBgZWGlNaWqo//vGPSk5OVkJCwjWPc+LE\nCe3du1f33HNPjWcGAMDXWb7m4efMmjVLRUVFWrZs2XXHrF27Vn5+fho0aFAtJgMAwDd5dHlYuHCh\nNmzYoBUrVqhly5bXHbd+/Xp16dJFYWFhtZgOgKvs9uCfHwSPxNxB8uDysHLlSq1cuVKLFy9Wx44d\nrzsuNzdXBw4c0Lx582oxHYDqKCi4YHUEuMBuD2buvJg7i59HPmFyzZo1WrBggdLT0zVgwIAbjl27\ndq3q1aun/v3711I6AAB8m8ddedixY4dmzZqllJQUJSYmau/evc73wsPD1bx580rj169fr549eyoo\nKKi2owIA4JM8sjyUl5dr69at2rp1a6X3Jk+erClTpjhff/3118rNzdXUqVNrOyYAAD7L5nA4HFaH\nsEL2IptieSQEUKuyT0q696BCQzn5vBFrHrxbnV/zAAAAPBflAQAAGKE8AAAAI5QHAABghPIAAACM\nUB4AAIARygMAADBCeQAA4CfWrftI//EfI/XLX/bSoEF99cgjI/Tqq4uNj7N8+Rv69a9/4Xy9Z89X\n6tEjSbm5hyVJP/zwg95667906NA3bsteGzzuCZMAgLqlvLxc33572JK/3bp1G/n5+Rnts2rVCi1b\ntlQPPzxGEydOVVlZmQ4ePKD/+Z/1mjp1utGxbDabJJvzdfv28XrjjRVq2bKVJOn777/XihVvKiKi\nhdq2bWd0bCtRHgAANerbbw+r+MNOirbX7t/NLZC+/c1Xiokx+5/y6tXv6f77H9D48ROd27p1S9HY\nseOrnalhw4ZKSOjgfO2tD3mmPAAAaly0XZb8JEChC/uUlFxQaGjYDcecOnVSDz44WE8//Zy2b/9C\nW7ZsUv369TVkyIM3LBl79nyl1NQJ+vOfMxUd3UYDBvSSzWbT3Ll/0Ny5f5DNZlNW1hqFh4dr1aoV\n+vjjNTp9+rSCgoIUG9tes2c/87PZagPlAQCAq8TGxun99zPVrFlzde/eQyEhja47dsmSV9WtW4qe\nf36B/vnPPVqx4k01bhyq++9/4Lr7XPkq44qXX16iadMmasyYceratbskqWnTplq/fq3efnulJk5M\nVXR0GxUXF2v37p26dOmSQkPd91ldRXkAAOAqTzyRplmzZmrevGclSbfd1lq9e/fT7373sBo2DKw0\nNjo6RjNnpkuSkpLuVmFhoVatWnHD8nC1+PjbJUktWrSs9HXG11/vV1LS3frNb4Y6t/Xs2bs6H8ut\nuNsCAICrxMS01V/+8r7mz39RQ4Y8KElauXKZxo0bpdLS0kpje/ToVel1r159dOZMgU6fzq9Whnbt\nYrVt21YtX/6GDhz4v6qoqKjW8dzNZ6885BZYnQDwPbkFUrTVIYCb4O/vr27dUtStW4okae3av2nh\nwue1du2HeuCBh5zjfrr+IDQ0TA6HQ2fPnlGzZs1d/vv33HOfLl68pDVrPtCf/rRcISEhuu++oRo3\nbkKlrz2s4rPlIfqRgyosLLE6BlwQFhbE3HmpRpJiYmJUWHjR6iiAkV//+j4tWfKKjhw5Uml7UVFh\nldc2m01NmjSt1t+z2WwaNux3GjbsdyooOK3//d/1+q//el3NmjXXffcNqdax3cFny0NsbKwKCi5Y\nHQMusNuDmTsvZnrPPVDbioqKFPqTVYlFRUUqKSlRkyZNKm3fvHljpXUJmzZ9qiZNmt70VYeAgABJ\nUllZ2XXH2O3NNGLEaH388Rp9+23uzX6MGuWz5QEAgGsZPfohpaT0VHLy3QoNDdPJkyf07rt/UYMG\nDTRw4D2Vxn777WG98MJc9e7dV3v27Na6dR9p2rSZNzz+1c928Pf3V0REC3366T8UHd1G9erVV9u2\n7bR48UKFhDTS7bd3UGBgkHbv3qXjx4+pU6ekGvnMpigPAIAaZ8U6s9yC
K1+VmRo7dry2bNmkl19e\npPPnixUW1lSJiYl69tl5Cg+v/LCKiROn6osvturJJ9NUr159jRkzzrnI8np+umbh//yfWXrttZc1\nffpkff/998rKWqMOHRL10Ucfas2a1SorK1PLlpFKS3tSKSk9XfhE7mdzeOvjrdyAS9/eia8tvBvz\n571cnTtvezz1zfjxIVELFy5W164pbj9+TbDbg912LK48AABqlJ+fn/EjouHZeM4DAAAu8IRbJq3C\nlQcAAAyFh0do8+YdVsewDFceAACAEcoDAAAwQnkAAABGKA8AAMAI5QEAABihPAAAACOUBwAAYITy\nAAAAjFAeAACAEcoDAAAwQnkAAABGKA8AAMAI5QEAABixORwOh9UhAACA9+DKAwAAMEJ5AAAARigP\nAADACOUBAAAYoTwAAAAjlAcAAGCkzpWHnJwcjR49Wnfeead69OihV155RTdzN2pJSYnS09OVnJys\nzp07a+bMmTp37lwtJMaPXJm748ePKy4ursp/M2bMqKXU+NHRo0f19NNPa/DgwUpISNCoUaNuaj/O\nPeu5Mnece55h3bp1euyxx5SSkqK77rpLQ4YM0ccff/yz+1X3vPOvTmhPc/78eY0ZM0axsbFasmSJ\njh49qvnK+z6SAAAEyElEQVTz58vhcGjatGk33HfatGk6cuSI5s6dK0l64YUXNGXKFL399tu1Ed3n\nVWfuJOk///M/1bFjR+fr0NDQmoyLa/jmm2+0ZcsW3XHHHSovL7/p/Tj3rOfq3Emce1b785//rFat\nWunJJ59UaGioNm3apBkzZujcuXMaMWLEdfer9nnnqEOWLl3qSE5Odnz33XfObW+++abjzjvvdJSU\nlFx3v927dzvat2/v2LVrl3Pb3r17He3bt3d88cUXNZoZV7g6d8eOHXO0b9/esXHjxtqIiZs0depU\nx8iRI392HOee57nZuePc8wxFRUVVtj3xxBOOfv36XXcfd5x3depriy1btiglJUUNGzZ0brvnnnt0\n6dIl7dy584b7NW3aVJ06dXJuS0xMVKtWrbR58+YazYwrXJ07eDfOPaB6GjduXGVbQkKCTp8+fd19\n3HHe1anycPjwYUVHR1faFhERoQYNGujw4cM33K9NmzZVtsfExCg3N9ftOVGVq3P3o/T0dCUkJCgl\nJUXz58/X5cuXayoq3Ihzz/tx7nmePXv2qHXr1td93x3nXZ1b8xASElJle0hIiIqLi13a79ixY27N\niGtzde7q1aunhx9+WN27d1dQUJC+/PJLvfnmm8rLy9Nrr71Wk5HhBpx73otzzzNt27ZNGzZs0Lx5\n8647xh3nXZ0qD/A9drtdTz75pPN1UlKSmjRpomeffVYHDx5U+/btLUwH1F2ce57n2LFjmjlzpvr3\n76/f/OY3Nfq36tTXFiEhIbpw4UKV7efPn1ejRo3cvh/cx51zMGDAADkcDu3fv99d8VBDOPfqFs49\n6xQXF2v8+PFq1aqVXnjhhRuOdcd5V6fKQ5s2bap8P37q1CldunTpmt/v3Gg/6drfw6NmuDp312Kz\n2dwZDTWIc69u4dyzRmlpqR577DFVVFRo6dKlql+//g3Hu+O8q1PloWfPntq6dasuXrzo3Pbxxx+r\nQYMGSkpKuuF+Z86c0e7du53b/v3vfysvL0+9evWq0cy4wtW5u5ZPPvlENptNt99+u7tjws049+oW\nzr3aV15ertTUVB09elTLli27qedsuOO883vmmWeecTW0p2nXrp0yMzP15ZdfqlmzZvriiy/04osv\nauzYserRo4dzXP/+/ZWdna2+fftKksLDw7Vnzx7993//tyIiIpSbm6s//OEPateunVJTU636OD7F\n1bnLyMjQpk2bdOnSJeXn52v16tV69dVX1a9fPz388MNWfRyfVFpaqg0bNujQoUP6/PPPdf78eTVp\n0kQ5OTmKjIyUv78/556HcmXuOPc8w5w5c7R+/XrNmDFDISEhys/Pd/7XpEkT+fn51ch5V6cWTIaE\nhGjlypV67rnnNHHiRAUHB+uRRx7RlClTKo2rqKhQRUVFpW0vv/yy5s6dq9mzZ6uiokJ9+vTR7Nmz\nazO+T3N17tq0aaO33npLWVlZKi0tVYsWLTR+/Hg99thjtf0RfN7Zs2c1bdq0SpeuH3/8cUnShg0b\n1KJFC849D+XK3HHueYbPP/9cNptNzz//fJX3avK8szkcN/HDDwAAAP9fnVrzAAAAah7lAQAAGKE8\nAAAAI5QHAABghPIAAACMUB4AAIARygMAADBCeQAAAEYoDwAAwMj/A1qyBrpQzVd+AAAAAElFTkSu\nQmCC\n", 383 | "text/plain": [ 384 | "" 385 | ] 386 | }, 387 | "metadata": {}, 388 | "output_type": "display_data" 389 | } 390 | ], 391 | "source": [ 392 | "# create df\n", 393 | "importance_df = pd.DataFrame({\n", 394 | " 'Splits': list(importances.values()),\n", 395 | " 'Feature': list(importances.keys())\n", 396 | " })\n", 397 | "importance_df.sort_values(by='Splits', inplace=True)\n", 398 | "importance_df.plot(kind='barh', x='Feature', figsize=(8,6), color='orange')" 399 | ] 400 | } 401 | ], 402 | "metadata": { 403 | "kernelspec": { 404 | "display_name": "Python 3", 405 | "language": "python", 406 | "name": "python3" 407 | }, 408 | "language_info": { 409 | "codemirror_mode": { 410 | "name": "ipython", 411 | "version": 3 412 | }, 413 | "file_extension": ".py", 414 | "mimetype": "text/x-python", 415 | "name": "python", 416 | "nbconvert_exporter": "python", 417 | "pygments_lexer": "ipython3", 418 | "version": "3.5.2" 419 | } 420 | }, 421 | "nbformat": 4, 422 | "nbformat_minor": 0 423 | } 424 | -------------------------------------------------------------------------------- /notebooks/3. 
Going deeper/3.3 Hyper-parameter tuning.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Hyper-parameter tuning\n", 15 | "\n", 16 | "As you know, there are plenty of tunable parameters. Each one changes the model's output; the question is which combination gives the best result.\n", 17 | "\n", 18 | "The following notebook will show you how to use Scikit-learn modules for figuring out the best parameters for your models.\n", 19 | "\n", 20 | "**What's included:**\n", 21 | "- data preparation,\n", 22 | "- finding the best hyper-parameters using grid-search,\n", 23 | "- finding the best hyper-parameters using randomized grid-search" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "### Prepare data\n", 31 | "Let's begin by loading all required libraries in one place and setting the seed for reproducibility." 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 1, 37 | "metadata": { 38 | "collapsed": true 39 | }, 40 | "outputs": [], 41 | "source": [ 42 | "import numpy as np\n", 43 | "\n", 44 | "from xgboost.sklearn import XGBClassifier\n", 45 | "\n", 46 | "from sklearn.grid_search import GridSearchCV, RandomizedSearchCV\n", 47 | "from sklearn.datasets import make_classification\n", 48 | "from sklearn.cross_validation import StratifiedKFold\n", 49 | "\n", 50 | "from scipy.stats import randint, uniform\n", 51 | "\n", 52 | "# reproducibility\n", 53 | "seed = 342\n", 54 | "np.random.seed(seed)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "Generate an artificial dataset:" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 2, 67 | "metadata": { 68 | "collapsed": false 69 | }, 70 | "outputs": [], 71 | "source": [ 72 | "X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, n_redundant=3, n_repeated=2, random_state=seed)" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "Define a cross-validation strategy for testing. Let's use `StratifiedKFold`, which guarantees that the target label is equally distributed across each fold:" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 3, 85 | "metadata": { 86 | "collapsed": true 87 | }, 88 | "outputs": [], 89 | "source": [ 90 | "cv = StratifiedKFold(y, n_folds=10, shuffle=True, random_state=seed)" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "### Grid-Search\n", 98 | "In grid-search we start by defining a dictionary holding the possible parameter values we want to test. **All** combinations will be evaluated." 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": 4, 104 | "metadata": { 105 | "collapsed": false 106 | }, 107 | "outputs": [], 108 | "source": [ 109 | "params_grid = {\n", 110 | " 'max_depth': [1, 2, 3],\n", 111 | " 'n_estimators': [5, 10, 25, 50],\n", 112 | " 'learning_rate': np.linspace(1e-16, 1, 3)\n", 113 | "}" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "Add a dictionary for fixed parameters." 
121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 5, 126 | "metadata": { 127 | "collapsed": true 128 | }, 129 | "outputs": [], 130 | "source": [ 131 | "params_fixed = {\n", 132 | " 'objective': 'binary:logistic',\n", 133 | " 'silent': 1\n", 134 | "}" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "Create a `GridSearchCV` estimator. We will be looking for the combination giving the best accuracy." 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 6, 147 | "metadata": { 148 | "collapsed": false 149 | }, 150 | "outputs": [], 151 | "source": [ 152 | "bst_grid = GridSearchCV(\n", 153 | " estimator=XGBClassifier(**params_fixed, seed=seed),\n", 154 | " param_grid=params_grid,\n", 155 | " cv=cv,\n", 156 | " scoring='accuracy'\n", 157 | ")" 158 | ] 159 | }, 160 | { 161 | "cell_type": "markdown", 162 | "metadata": {}, 163 | "source": [ 164 | "Before running the calculations, notice that $3*4*3*10=360$ models will be created to test all combinations (three values of `max_depth`, four of `n_estimators` and three of `learning_rate`, each evaluated with 10-fold cross-validation). You should always have a rough estimate of what is going to happen." 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 7, 170 | "metadata": { 171 | "collapsed": false 172 | }, 173 | "outputs": [ 174 | { 175 | "data": { 176 | "text/plain": [ 177 | "GridSearchCV(cv=sklearn.cross_validation.StratifiedKFold(labels=[0 1 ..., 1 1], n_folds=10, shuffle=True, random_state=342),\n", 178 | " error_score='raise',\n", 179 | " estimator=XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,\n", 180 | " gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3,\n", 181 | " min_child_weight=1, missing=None, n_estimators=100, nthread=-1,\n", 182 | " objective='binary:logistic', reg_alpha=0, reg_lambda=1,\n", 183 | " scale_pos_weight=1, seed=342, silent=1, subsample=1),\n", 184 | " fit_params={}, iid=True, n_jobs=1,\n", 185 | " param_grid={'n_estimators': [5, 10, 25, 50], 'learning_rate': array([ 1.00000e-16, 5.00000e-01, 1.00000e+00]), 'max_depth': [1, 2, 3]},\n", 186 | " pre_dispatch='2*n_jobs', refit=True, scoring='accuracy', verbose=0)" 187 | ] 188 | }, 189 | "execution_count": 7, 190 | "metadata": {}, 191 | "output_type": "execute_result" 192 | } 193 | ], 194 | "source": [ 195 | "bst_grid.fit(X, y)" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "Now we can look at all the obtained scores and try to see manually what matters and what doesn't. A quick glance suggests that the larger `n_estimators` is, the higher the accuracy." 203 | ] 204 | },
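Rather than scanning the raw listing below by eye, the scores can also be ranked programmatically. A minimal editorial sketch, assuming the fitted `bst_grid` from above and the `grid_scores_` attribute of the scikit-learn 0.17 series used here (each entry carries `parameters`, `mean_validation_score` and `cv_validation_scores`):

```python
# rank all tested combinations by mean CV accuracy, best first
ranked = sorted(bst_grid.grid_scores_,
                key=lambda s: s.mean_validation_score,
                reverse=True)

for s in ranked[:5]:  # top five combinations
    print("{:.3f} (std {:.3f}) {}".format(
        s.mean_validation_score, s.cv_validation_scores.std(), s.parameters))
```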
205 | { 206 | "cell_type": "code", 207 | "execution_count": 8, 208 | "metadata": { 209 | "collapsed": false 210 | }, 211 | "outputs": [ 212 | { 213 | "data": { 214 | "text/plain": [ 215 | "[mean: 0.49800, std: 0.00245, params: {'learning_rate': 9.9999999999999998e-17, 'n_estimators': 5, 'max_depth': 1},\n", 216 | " mean: 0.49800, std: 0.00245, params: {'learning_rate': 9.9999999999999998e-17, 'n_estimators': 10, 'max_depth': 1},\n", 217 | " mean: 0.49800, std: 0.00245, params: {'learning_rate': 9.9999999999999998e-17, 'n_estimators': 25, 'max_depth': 1},\n", 218 | " mean: 0.49800, std: 0.00245, params: {'learning_rate': 9.9999999999999998e-17, 'n_estimators': 50, 'max_depth': 1},\n", 219 | " mean: 0.49800, std: 0.00245, params: {'learning_rate': 9.9999999999999998e-17, 'n_estimators': 5, 'max_depth': 2},\n", 220 | " mean: 0.49800, std: 0.00245, params: {'learning_rate': 9.9999999999999998e-17, 'n_estimators': 10, 'max_depth': 2},\n", 221 | " mean: 0.49800, std: 0.00245, params: {'learning_rate': 9.9999999999999998e-17, 'n_estimators': 25, 'max_depth': 2},\n", 222 | " mean: 0.49800, std: 0.00245, params: {'learning_rate': 9.9999999999999998e-17, 'n_estimators': 50, 'max_depth': 2},\n", 223 | " mean: 0.49800, std: 0.00245, params: {'learning_rate': 9.9999999999999998e-17, 'n_estimators': 5, 'max_depth': 3},\n", 224 | " mean: 0.49800, std: 0.00245, params: {'learning_rate': 9.9999999999999998e-17, 'n_estimators': 10, 'max_depth': 3},\n", 225 | " mean: 0.49800, std: 0.00245, params: {'learning_rate': 9.9999999999999998e-17, 'n_estimators': 25, 'max_depth': 3},\n", 226 | " mean: 0.49800, std: 0.00245, params: {'learning_rate': 9.9999999999999998e-17, 'n_estimators': 50, 'max_depth': 3},\n", 227 | " mean: 0.84100, std: 0.03515, params: {'learning_rate': 0.5, 'n_estimators': 5, 'max_depth': 1},\n", 228 | " mean: 0.87300, std: 0.03374, params: {'learning_rate': 0.5, 'n_estimators': 10, 'max_depth': 1},\n", 229 | " mean: 0.89200, std: 0.03375, params: {'learning_rate': 0.5, 'n_estimators': 25, 'max_depth': 1},\n", 230 | " mean: 0.90200, std: 0.03262, params: {'learning_rate': 0.5, 'n_estimators': 50, 'max_depth': 1},\n", 231 | " mean: 0.86400, std: 0.04665, params: {'learning_rate': 0.5, 'n_estimators': 5, 'max_depth': 2},\n", 232 | " mean: 0.89400, std: 0.04189, params: {'learning_rate': 0.5, 'n_estimators': 10, 'max_depth': 2},\n", 233 | " mean: 0.92200, std: 0.02584, params: {'learning_rate': 0.5, 'n_estimators': 25, 'max_depth': 2},\n", 234 | " mean: 0.92000, std: 0.02233, params: {'learning_rate': 0.5, 'n_estimators': 50, 'max_depth': 2},\n", 235 | " mean: 0.89700, std: 0.03904, params: {'learning_rate': 0.5, 'n_estimators': 5, 'max_depth': 3},\n", 236 | " mean: 0.92000, std: 0.02864, params: {'learning_rate': 0.5, 'n_estimators': 10, 'max_depth': 3},\n", 237 | " mean: 0.92300, std: 0.02193, params: {'learning_rate': 0.5, 'n_estimators': 25, 'max_depth': 3},\n", 238 | " mean: 0.92400, std: 0.02255, params: {'learning_rate': 0.5, 'n_estimators': 50, 'max_depth': 3},\n", 239 | " mean: 0.83500, std: 0.04939, params: {'learning_rate': 1.0, 'n_estimators': 5, 'max_depth': 1},\n", 240 | " mean: 0.86800, std: 0.03386, params: {'learning_rate': 1.0, 'n_estimators': 10, 'max_depth': 1},\n", 241 | " mean: 0.89500, std: 0.02720, params: {'learning_rate': 1.0, 'n_estimators': 25, 'max_depth': 1},\n", 242 | " mean: 0.90500, std: 0.02783, params: {'learning_rate': 1.0, 'n_estimators': 50, 'max_depth': 1},\n", 243 | " mean: 0.87800, std: 0.03342, params: {'learning_rate': 1.0, 
'n_estimators': 5, 'max_depth': 2},\n", 244 | " mean: 0.90800, std: 0.04261, params: {'learning_rate': 1.0, 'n_estimators': 10, 'max_depth': 2},\n", 245 | " mean: 0.91000, std: 0.03632, params: {'learning_rate': 1.0, 'n_estimators': 25, 'max_depth': 2},\n", 246 | " mean: 0.91300, std: 0.02449, params: {'learning_rate': 1.0, 'n_estimators': 50, 'max_depth': 2},\n", 247 | " mean: 0.90500, std: 0.03112, params: {'learning_rate': 1.0, 'n_estimators': 5, 'max_depth': 3},\n", 248 | " mean: 0.91700, std: 0.02729, params: {'learning_rate': 1.0, 'n_estimators': 10, 'max_depth': 3},\n", 249 | " mean: 0.92700, std: 0.03342, params: {'learning_rate': 1.0, 'n_estimators': 25, 'max_depth': 3},\n", 250 | " mean: 0.93300, std: 0.02581, params: {'learning_rate': 1.0, 'n_estimators': 50, 'max_depth': 3}]" 251 | ] 252 | }, 253 | "execution_count": 8, 254 | "metadata": {}, 255 | "output_type": "execute_result" 256 | } 257 | ], 258 | "source": [ 259 | "bst_grid.grid_scores_" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": {}, 265 | "source": [ 266 | "If there are many results, we can also query the best combination directly:" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 9, 272 | "metadata": { 273 | "collapsed": false 274 | }, 275 | "outputs": [ 276 | { 277 | "name": "stdout", 278 | "output_type": "stream", 279 | "text": [ 280 | "Best accuracy obtained: 0.933\n", 281 | "Parameters:\n", 282 | "\tlearning_rate: 1.0\n", 283 | "\tn_estimators: 50\n", 284 | "\tmax_depth: 3\n" 285 | ] 286 | } 287 | ], 288 | "source": [ 289 | "print(\"Best accuracy obtained: {0}\".format(bst_grid.best_score_))\n", 290 | "print(\"Parameters:\")\n", 291 | "for key, value in bst_grid.best_params_.items():\n", 292 | " print(\"\\t{}: {}\".format(key, value))" 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "metadata": {}, 298 | "source": [ 299 | "Looking for the best parameters is an iterative process. You should start with coarse granularity and then move to more fine-grained values." 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "### Randomized Grid-Search\n", 307 | "When the number of parameters and their possible values grows, the traditional grid-search approach quickly becomes impractical. A possible solution is to randomly pick parameter values from specified distributions. While this is not an exhaustive search, it's worth giving a shot.\n", 308 | "\n", 309 | "Create a parameter distribution dictionary:" 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": 10, 315 | "metadata": { 316 | "collapsed": false 317 | }, 318 | "outputs": [], 319 | "source": [ 320 | "params_dist_grid = {\n", 321 | " 'max_depth': [1, 2, 3, 4],\n", 322 | " 'gamma': [0, 0.5, 1],\n", 323 | " 'n_estimators': randint(1, 1001), # discrete uniform distribution over [1, 1000]\n", 324 | " 'learning_rate': uniform(), # continuous uniform distribution over [0, 1)\n", 325 | " 'subsample': uniform(), # continuous uniform distribution over [0, 1)\n", 326 | " 'colsample_bytree': uniform() # continuous uniform distribution over [0, 1)\n", 327 | "}" 328 | ] 329 | }, 330 | { 331 | "cell_type": "markdown", 332 | "metadata": {}, 333 | "source": [ 334 | "Initialize `RandomizedSearchCV` to **randomly pick 10 combinations of parameters**. With this approach you can easily control the number of tested models." 335 | ] 336 | },
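It can be instructive to preview which parameter sets such a search would draw before spending any training time. The sketch below is editorial; it assumes the `params_dist_grid` and `seed` defined above, and `ParameterSampler` from the same `sklearn.grid_search` module already imported in this notebook:

```python
from sklearn.grid_search import ParameterSampler

# draw three candidate parameter sets without fitting any model
for combo in ParameterSampler(params_dist_grid, n_iter=3, random_state=seed):
    print(combo)
```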
337 | { 338 | "cell_type": "code", 339 | "execution_count": 11, 340 | "metadata": { 341 | "collapsed": false 342 | }, 343 | "outputs": [], 344 | "source": [ 345 | "rs_grid = RandomizedSearchCV(\n", 346 | " estimator=XGBClassifier(**params_fixed, seed=seed),\n", 347 | " param_distributions=params_dist_grid,\n", 348 | " n_iter=10,\n", 349 | " cv=cv,\n", 350 | " scoring='accuracy',\n", 351 | " random_state=seed\n", 352 | ")" 353 | ] 354 | }, 355 | { 356 | "cell_type": "markdown", 357 | "metadata": {}, 358 | "source": [ 359 | "Fit the classifier:" 360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": 12, 365 | "metadata": { 366 | "collapsed": false 367 | }, 368 | "outputs": [ 369 | { 370 | "data": { 371 | "text/plain": [ 372 | "RandomizedSearchCV(cv=sklearn.cross_validation.StratifiedKFold(labels=[0 1 ..., 1 1], n_folds=10, shuffle=True, random_state=342),\n", 373 | " error_score='raise',\n", 374 | " estimator=XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,\n", 375 | " gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3,\n", 376 | " min_child_weight=1, missing=None, n_estimators=100, nthread=-1,\n", 377 | " objective='binary:logistic', reg_alpha=0, reg_lambda=1,\n", 378 | " scale_pos_weight=1, seed=342, silent=1, subsample=1),\n", 379 | " fit_params={}, iid=True, n_iter=10, n_jobs=1,\n", 380 | " param_distributions={'subsample': , 'n_estimators': , 'gamma': [0, 0.5, 1], 'colsample_bytree': , 'learning_rate': , 'max_depth': [1, 2, 3, 4]},\n", 381 | " pre_dispatch='2*n_jobs', random_state=342, refit=True,\n", 382 | " scoring='accuracy', verbose=0)" 383 | ] 384 | }, 385 | "execution_count": 12, 386 | "metadata": {}, 387 | "output_type": "execute_result" 388 | } 389 | ], 390 | "source": [ 391 | "rs_grid.fit(X, y)" 392 | ] 393 | }, 394 | { 395 | "cell_type": "markdown", 396 | "metadata": {}, 397 | "source": [ 398 | "One more time, take a look at the chosen parameters and their accuracy score:" 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "execution_count": 13, 404 | "metadata": { 405 | "collapsed": false 406 | }, 407 | "outputs": [ 408 | { 409 | "data": { 410 | "text/plain": [ 411 | "[mean: 0.80200, std: 0.02403, params: {'subsample': 0.11676744056370758, 'n_estimators': 492, 'gamma': 0, 'colsample_bytree': 0.065034396841929132, 'learning_rate': 0.82231421953113004, 'max_depth': 3},\n", 412 | " mean: 0.90800, std: 0.02534, params: {'subsample': 0.4325346125891868, 'n_estimators': 689, 'gamma': 1, 'colsample_bytree': 0.11848249237448605, 'learning_rate': 0.13214054942810016, 'max_depth': 1},\n", 413 | " mean: 0.86400, std: 0.03584, params: {'subsample': 0.15239319471904489, 'n_estimators': 392, 'gamma': 0, 'colsample_bytree': 0.37621772642449514, 'learning_rate': 0.61087022642994204, 'max_depth': 4},\n", 414 | " mean: 0.90100, std: 0.02794, params: {'subsample': 0.70993001900730734, 'n_estimators': 574, 'gamma': 1, 'colsample_bytree': 0.20992824607318106, 'learning_rate': 0.40898494335099522, 'max_depth': 1},\n", 415 | " mean: 0.91200, std: 0.02440, params: {'subsample': 0.93610608633544701, 'n_estimators': 116, 'gamma': 1, 'colsample_bytree': 0.22187963515640408, 'learning_rate': 0.82924717948414195, 'max_depth': 2},\n", 416 | " mean: 0.92900, std: 0.01577, params: {'subsample': 0.76526283302535481, 'n_estimators': 281, 'gamma': 0, 'colsample_bytree': 0.80580143163765727, 'learning_rate': 0.46363095388213049, 'max_depth': 4},\n", 417 | " mean: 0.89900, std: 0.03200, params: {'subsample': 0.1047221390561941, 
'n_estimators': 563, 'gamma': 1, 'colsample_bytree': 0.4649668429588838, 'learning_rate': 0.0056355243866283988, 'max_depth': 4},\n", 418 | " mean: 0.89300, std: 0.02510, params: {'subsample': 0.70326840897694187, 'n_estimators': 918, 'gamma': 0.5, 'colsample_bytree': 0.50136727776346701, 'learning_rate': 0.32309692902992948, 'max_depth': 1},\n", 419 | " mean: 0.90300, std: 0.03573, params: {'subsample': 0.40219949856580106, 'n_estimators': 665, 'gamma': 1, 'colsample_bytree': 0.32232842572609355, 'learning_rate': 0.87857246352479834, 'max_depth': 4},\n", 420 | " mean: 0.88900, std: 0.02604, params: {'subsample': 0.18284845802969663, 'n_estimators': 771, 'gamma': 1, 'colsample_bytree': 0.65705813574097693, 'learning_rate': 0.44206350002617856, 'max_depth': 3}]" 421 | ] 422 | }, 423 | "execution_count": 13, 424 | "metadata": {}, 425 | "output_type": "execute_result" 426 | } 427 | ], 428 | "source": [ 429 | "rs_grid.grid_scores_" 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": {}, 435 | "source": [ 436 | "There are also some handy properties that let you quickly inspect the best estimator, its parameters and the obtained score:" 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": 14, 442 | "metadata": { 443 | "collapsed": false 444 | }, 445 | "outputs": [ 446 | { 447 | "data": { 448 | "text/plain": [ 449 | "XGBClassifier(base_score=0.5, colsample_bylevel=1,\n", 450 | " colsample_bytree=0.80580143163765727, gamma=0,\n", 451 | " learning_rate=0.46363095388213049, max_delta_step=0, max_depth=4,\n", 452 | " min_child_weight=1, missing=None, n_estimators=281, nthread=-1,\n", 453 | " objective='binary:logistic', reg_alpha=0, reg_lambda=1,\n", 454 | " scale_pos_weight=1, seed=342, silent=1,\n", 455 | " subsample=0.76526283302535481)" 456 | ] 457 | }, 458 | "execution_count": 14, 459 | "metadata": {}, 460 | "output_type": "execute_result" 461 | } 462 | ], 463 | "source": [ 464 | "rs_grid.best_estimator_" 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "execution_count": 15, 470 | "metadata": { 471 | "collapsed": false 472 | }, 473 | "outputs": [ 474 | { 475 | "data": { 476 | "text/plain": [ 477 | "{'colsample_bytree': 0.80580143163765727,\n", 478 | " 'gamma': 0,\n", 479 | " 'learning_rate': 0.46363095388213049,\n", 480 | " 'max_depth': 4,\n", 481 | " 'n_estimators': 281,\n", 482 | " 'subsample': 0.76526283302535481}" 483 | ] 484 | }, 485 | "execution_count": 15, 486 | "metadata": {}, 487 | "output_type": "execute_result" 488 | } 489 | ], 490 | "source": [ 491 | "rs_grid.best_params_" 492 | ] 493 | }, 494 | { 495 | "cell_type": "code", 496 | "execution_count": 16, 497 | "metadata": { 498 | "collapsed": false 499 | }, 500 | "outputs": [ 501 | { 502 | "data": { 503 | "text/plain": [ 504 | "0.92900000000000005" 505 | ] 506 | }, 507 | "execution_count": 16, 508 | "metadata": {}, 509 | "output_type": "execute_result" 510 | } 511 | ], 512 | "source": [ 513 | "rs_grid.best_score_" 514 | ] 515 | } 516 | ], 517 | "metadata": { 518 | "kernelspec": { 519 | "display_name": "Python 3", 520 | "language": "python", 521 | "name": "python3" 522 | }, 523 | "language_info": { 524 | "codemirror_mode": { 525 | "name": "ipython", 526 | "version": 3 527 | }, 528 | "file_extension": ".py", 529 | "mimetype": "text/x-python", 530 | "name": "python", 531 | "nbconvert_exporter": "python", 532 | "pygments_lexer": "ipython3", 533 | "version": "3.5.2" 534 | } 535 | }, 536 | "nbformat": 4, 537 | "nbformat_minor": 0 538 | } 539 | 
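Because the repr above shows `refit=True`, the fitted search object already holds a model retrained with the best configuration on the full dataset, so it can be used for prediction directly. A short editorial sketch, assuming the fitted `rs_grid` from the notebook above:

```python
# the search object delegates predict() to its refitted best estimator
preds = rs_grid.predict(X)
print("accuracy of the best model on the training data: {:.3f}".format((preds == y).mean()))
```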
-------------------------------------------------------------------------------- /notebooks/3. Going deeper/3.4 Evaluate results.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Evaluate results\n", 15 | "In this notebook you will see how to measure the quality of the algorithm's performance.\n", 16 | "\n", 17 | "**You will learn how to:**\n", 18 | "- use predefined evaluation metrics,\n", 19 | "- write your own evaluation metrics,\n", 20 | "- use the early stopping feature,\n", 21 | "- cross-validate results" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "### Prepare data\n", 29 | "Begin with loading all required libraries:" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 1, 35 | "metadata": { 36 | "collapsed": true 37 | }, 38 | "outputs": [], 39 | "source": [ 40 | "import numpy as np\n", 41 | "import xgboost as xgb\n", 42 | "\n", 43 | "from pprint import pprint\n", 44 | "\n", 45 | "# reproducibility\n", 46 | "seed = 123\n", 47 | "np.random.seed(seed)" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "Load agaricus dataset from file:" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 2, 60 | "metadata": { 61 | "collapsed": true 62 | }, 63 | "outputs": [], 64 | "source": [ 65 | "# load Agaricus data\n", 66 | "dtrain = xgb.DMatrix('../data/agaricus.txt.train')\n", 67 | "dtest = xgb.DMatrix('../data/agaricus.txt.test')" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "Specify training parameters - we are going to use 5 decision tree stumps with a moderate learning rate." 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 3, 80 | "metadata": { 81 | "collapsed": true 82 | }, 83 | "outputs": [], 84 | "source": [ 85 | "# specify general training parameters\n", 86 | "params = {\n", 87 | " 'objective':'binary:logistic',\n", 88 | " 'max_depth':1,\n", 89 | " 'silent':1,\n", 90 | " 'eta':0.5\n", 91 | "}\n", 92 | "\n", 93 | "num_rounds = 5" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "Before training the model let's also specify a `watchlist` array to observe its performance on both datasets." 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 4, 106 | "metadata": { 107 | "collapsed": true 108 | }, 109 | "outputs": [], 110 | "source": [ 111 | "watchlist = [(dtest,'test'), (dtrain,'train')]" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "### Using predefined evaluation metrics\n", 119 | "\n", 120 | "#### What is already available?\n", 121 | "There are already [some](https://github.com/dmlc/xgboost/blob/master/doc/parameter.md) predefined metrics available. You can use them as input for the `eval_metric` parameter while training the model.\n", 122 | "\n", 123 | "- `rmse` - [root mean square error](https://www.wikiwand.com/en/Root-mean-square_deviation),\n", 124 | "- `mae` - [mean absolute error](https://en.wikipedia.org/wiki/Mean_absolute_error?oldformat=true),\n", 125 | "- `logloss` - [negative log-likelihood](https://en.wikipedia.org/wiki/Likelihood_function?oldformat=true)\n", 126 | "- `error` - binary classification error rate.
It is calculated as `#(wrong cases)/#(all cases)`. Treat predicted values with probability $p > 0.5$ as positive,\n", 127 | "- `merror` - multiclass classification error rate. It is calculated as `#(wrong cases)/#(all cases)`,\n", 128 | "- `auc` - [area under curve](https://en.wikipedia.org/wiki/Receiver_operating_characteristic?oldformat=true),\n", 129 | "- `ndcg` - [normalized discounted cumulative gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain?oldformat=true),\n", 130 | "- `map` - [mean average precision](https://en.wikipedia.org/wiki/Information_retrieval?oldformat=true)\n", 131 | "\n", 132 | "By default an `error` metric will be used." 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 5, 138 | "metadata": { 139 | "collapsed": false 140 | }, 141 | "outputs": [ 142 | { 143 | "name": "stdout", 144 | "output_type": "stream", 145 | "text": [ 146 | "[0]\ttest-error:0.11049\ttrain-error:0.113926\n", 147 | "[1]\ttest-error:0.11049\ttrain-error:0.113926\n", 148 | "[2]\ttest-error:0.03352\ttrain-error:0.030401\n", 149 | "[3]\ttest-error:0.027312\ttrain-error:0.021495\n", 150 | "[4]\ttest-error:0.031037\ttrain-error:0.025487\n" 151 | ] 152 | } 153 | ], 154 | "source": [ 155 | "bst = xgb.train(params, dtrain, num_rounds, watchlist)" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "To change it, simply specify the `eval_metric` argument in the `params` dictionary." 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 6, 168 | "metadata": { 169 | "collapsed": false 170 | }, 171 | "outputs": [ 172 | { 173 | "name": "stdout", 174 | "output_type": "stream", 175 | "text": [ 176 | "[0]\ttest-logloss:0.457887\ttrain-logloss:0.460108\n", 177 | "[1]\ttest-logloss:0.383911\ttrain-logloss:0.378728\n", 178 | "[2]\ttest-logloss:0.312678\ttrain-logloss:0.308061\n", 179 | "[3]\ttest-logloss:0.26912\ttrain-logloss:0.26139\n", 180 | "[4]\ttest-logloss:0.239746\ttrain-logloss:0.232174\n" 181 | ] 182 | } 183 | ], 184 | "source": [ 185 | "params['eval_metric'] = 'logloss'\n", 186 | "bst = xgb.train(params, dtrain, num_rounds, watchlist)" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "You can also use multiple evaluation metrics at the same time" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 7, 199 | "metadata": { 200 | "collapsed": false 201 | }, 202 | "outputs": [ 203 | { 204 | "name": "stdout", 205 | "output_type": "stream", 206 | "text": [ 207 | "[0]\ttest-logloss:0.457887\ttest-auc:0.892138\ttrain-logloss:0.460108\ttrain-auc:0.888997\n", 208 | "[1]\ttest-logloss:0.383911\ttest-auc:0.938901\ttrain-logloss:0.378728\ttrain-auc:0.942881\n", 209 | "[2]\ttest-logloss:0.312678\ttest-auc:0.976157\ttrain-logloss:0.308061\ttrain-auc:0.981415\n", 210 | "[3]\ttest-logloss:0.26912\ttest-auc:0.979685\ttrain-logloss:0.26139\ttrain-auc:0.985158\n", 211 | "[4]\ttest-logloss:0.239746\ttest-auc:0.9785\ttrain-logloss:0.232174\ttrain-auc:0.983744\n" 212 | ] 213 | } 214 | ], 215 | "source": [ 216 | "params['eval_metric'] = ['logloss', 'auc']\n", 217 | "bst = xgb.train(params, dtrain, num_rounds, watchlist)" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "### Creating custom evaluation metric\n", 225 | "\n", 226 | "In order to create our own evaluation metric, the only thing we need to do is to create a function taking two arguments - predicted probabilities and a `DMatrix` object holding
training data.\n", 227 | "\n", 228 | "In this example our classification metric will simply count the number of misclassified examples assuming that classes with $p> 0.5$ are positive. You can change this threshold if you want more certainty. \n", 229 | "\n", 230 | "The algorithm is getting better when the number of misclassified examples is getting lower. Remember to also set the argument `maximize=False` while training." 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": 8, 236 | "metadata": { 237 | "collapsed": true 238 | }, 239 | "outputs": [], 240 | "source": [ 241 | "# custom evaluation metric\n", 242 | "def misclassified(pred_probs, dtrain):\n", 243 | " labels = dtrain.get_label() # obtain true labels\n", 244 | " preds = pred_probs > 0.5 # obtain predicted values\n", 245 | " return 'misclassified', np.sum(labels != preds)" 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": 9, 251 | "metadata": { 252 | "collapsed": false 253 | }, 254 | "outputs": [ 255 | { 256 | "name": "stdout", 257 | "output_type": "stream", 258 | "text": [ 259 | "[0]\ttest-misclassified:178\ttrain-misclassified:742\n", 260 | "[1]\ttest-misclassified:178\ttrain-misclassified:742\n", 261 | "[2]\ttest-misclassified:54\ttrain-misclassified:198\n", 262 | "[3]\ttest-misclassified:44\ttrain-misclassified:140\n", 263 | "[4]\ttest-misclassified:50\ttrain-misclassified:166\n" 264 | ] 265 | } 266 | ], 267 | "source": [ 268 | "bst = xgb.train(params, dtrain, num_rounds, watchlist, feval=misclassified, maximize=False)" 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "You can see that even though the `params` dictionary is holding `eval_metric` key these values are being ignored and overwritten by `feval`." 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "### Extracting the evaluation results\n", 283 | "You can get evaluation scores by declaring a dictionary for holding values and passing it as a parameter for `evals_result` argument." 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 10, 289 | "metadata": { 290 | "collapsed": false 291 | }, 292 | "outputs": [ 293 | { 294 | "name": "stdout", 295 | "output_type": "stream", 296 | "text": [ 297 | "[0]\ttest-misclassified:178\ttrain-misclassified:742\n", 298 | "[1]\ttest-misclassified:178\ttrain-misclassified:742\n", 299 | "[2]\ttest-misclassified:54\ttrain-misclassified:198\n", 300 | "[3]\ttest-misclassified:44\ttrain-misclassified:140\n", 301 | "[4]\ttest-misclassified:50\ttrain-misclassified:166\n" 302 | ] 303 | } 304 | ], 305 | "source": [ 306 | "evals_result = {}\n", 307 | "bst = xgb.train(params, dtrain, num_rounds, watchlist, feval=misclassified, maximize=False, evals_result=evals_result)" 308 | ] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": {}, 313 | "source": [ 314 | "Now you can reuse these scores (i.e. 
for plotting)" 315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": 11, 320 | "metadata": { 321 | "collapsed": false 322 | }, 323 | "outputs": [ 324 | { 325 | "name": "stdout", 326 | "output_type": "stream", 327 | "text": [ 328 | "{'test': {'misclassified': [178.0, 178.0, 54.0, 44.0, 50.0]},\n", 329 | " 'train': {'misclassified': [742.0, 742.0, 198.0, 140.0, 166.0]}}\n" 330 | ] 331 | } 332 | ], 333 | "source": [ 334 | "pprint(evals_result)" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": {}, 340 | "source": [ 341 | "### Early stopping\n", 342 | "There is a nice optimization trick when fitting multiple trees. \n", 343 | "\n", 344 | "You can train the model until the validation score **stops** improving. Validation error needs to decrease at least every `early_stopping_rounds` to continue training. This approach results in simpler model, because the lowest number of trees will be found (simplicity).\n", 345 | "\n", 346 | "In the following example a total number of 1500 trees is to be created, but we are telling it to stop if the validation score does not improve for last ten iterations." 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": 12, 352 | "metadata": { 353 | "collapsed": false 354 | }, 355 | "outputs": [ 356 | { 357 | "name": "stdout", 358 | "output_type": "stream", 359 | "text": [ 360 | "[0]\ttest-error:0.11049\ttrain-error:0.113926\n", 361 | "Multiple eval metrics have been passed: 'train-error' will be used for early stopping.\n", 362 | "\n", 363 | "Will train until train-error hasn't improved in 10 rounds.\n", 364 | "[1]\ttest-error:0.11049\ttrain-error:0.113926\n", 365 | "[2]\ttest-error:0.03352\ttrain-error:0.030401\n", 366 | "[3]\ttest-error:0.027312\ttrain-error:0.021495\n", 367 | "[4]\ttest-error:0.031037\ttrain-error:0.025487\n", 368 | "[5]\ttest-error:0.019243\ttrain-error:0.01735\n", 369 | "[6]\ttest-error:0.019243\ttrain-error:0.01735\n", 370 | "[7]\ttest-error:0.015518\ttrain-error:0.013358\n", 371 | "[8]\ttest-error:0.015518\ttrain-error:0.013358\n", 372 | "[9]\ttest-error:0.009311\ttrain-error:0.007523\n", 373 | "[10]\ttest-error:0.015518\ttrain-error:0.013358\n", 374 | "[11]\ttest-error:0.019243\ttrain-error:0.01735\n", 375 | "[12]\ttest-error:0.009311\ttrain-error:0.007523\n", 376 | "[13]\ttest-error:0.001862\ttrain-error:0.001996\n", 377 | "[14]\ttest-error:0.005587\ttrain-error:0.005988\n", 378 | "[15]\ttest-error:0.005587\ttrain-error:0.005988\n", 379 | "[16]\ttest-error:0.005587\ttrain-error:0.005988\n", 380 | "[17]\ttest-error:0.005587\ttrain-error:0.005988\n", 381 | "[18]\ttest-error:0.005587\ttrain-error:0.005988\n", 382 | "[19]\ttest-error:0.005587\ttrain-error:0.005988\n", 383 | "[20]\ttest-error:0.005587\ttrain-error:0.005988\n", 384 | "[21]\ttest-error:0.005587\ttrain-error:0.005988\n", 385 | "[22]\ttest-error:0.001862\ttrain-error:0.001996\n", 386 | "[23]\ttest-error:0.001862\ttrain-error:0.001996\n", 387 | "Stopping. Best iteration:\n", 388 | "[13]\ttest-error:0.001862\ttrain-error:0.001996\n", 389 | "\n" 390 | ] 391 | } 392 | ], 393 | "source": [ 394 | "params['eval_metric'] = 'error'\n", 395 | "num_rounds = 1500\n", 396 | "\n", 397 | "bst = xgb.train(params, dtrain, num_rounds, watchlist, early_stopping_rounds=10)" 398 | ] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "metadata": {}, 403 | "source": [ 404 | "When using `early_stopping_rounds` parameter resulting model will have 3 additional fields - `bst.best_score`, `bst.best_iteration` and `bst.best_ntree_limit`." 
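A quick sketch of how these fields are typically used at prediction time (assuming the classic native API, where `Booster.predict` accepts an `ntree_limit` argument):

```python
# Sketch: early stopping does not rewind the booster - as the next cell
# notes, train() returns the model from the LAST iteration - so cap the
# number of trees used for prediction with the best limit found during training.
preds_best = bst.predict(dtest, ntree_limit=bst.best_ntree_limit)
preds_all = bst.predict(dtest)  # uses every tree, including post-peak ones
```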
405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": 13, 410 | "metadata": { 411 | "collapsed": false 412 | }, 413 | "outputs": [ 414 | { 415 | "name": "stdout", 416 | "output_type": "stream", 417 | "text": [ 418 | "Booster best train score: 0.001996\n", 419 | "Booster best iteration: 13\n", 420 | "Booster best number of trees limit: 14\n" 421 | ] 422 | } 423 | ], 424 | "source": [ 425 | "print(\"Booster best train score: {}\".format(bst.best_score))\n", 426 | "print(\"Booster best iteration: {}\".format(bst.best_iteration))\n", 427 | "print(\"Booster best number of trees limit: {}\".format(bst.best_ntree_limit))" 428 | ] 429 | }, 430 | { 431 | "cell_type": "markdown", 432 | "metadata": {}, 433 | "source": [ 434 | "Also keep in mind that `train()` will return a model from the last iteration, not the best one." 435 | ] 436 | }, 437 | { 438 | "cell_type": "markdown", 439 | "metadata": {}, 440 | "source": [ 441 | "### Cross validating results\n", 442 | "The native package provides an option for cross-validating results (though it is not as sophisticated as the Sklearn package). The next input shows a basic execution. Notice that we are passing only a single `DMatrix`, so it would be good to merge train and test into one object to have more training samples." 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "execution_count": 14, 448 | "metadata": { 449 | "collapsed": false 450 | }, 451 | "outputs": [ 452 | { 453 | "data": { 454 | "text/html": [ 455 | "
\n", 456 | "\n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 511 | " \n", 512 | " \n", 513 | " \n", 514 | " \n", 515 | " \n", 516 | " \n", 517 | " \n", 518 | " \n", 519 | " \n", 520 | " \n", 521 | " \n", 522 | " \n", 523 | " \n", 524 | " \n", 525 | " \n", 526 | " \n", 527 | " \n", 528 | " \n", 529 | " \n", 530 | " \n", 531 | " \n", 532 | " \n", 533 | " \n", 534 | " \n", 535 | " \n", 536 | " \n", 537 | " \n", 538 | "
test-error-meantest-error-stdtrain-error-meantrain-error-std
00.1138250.0131860.1138250.001465
10.1138250.0131860.1138250.001465
20.0304150.0056980.0304150.000633
30.0215050.0052770.0215050.000586
40.0254990.0054610.0254990.000607
50.0207370.0076270.0196960.003491
60.0173580.0033690.0173580.000374
70.0153610.0036990.0144740.001923
80.0133640.0037660.0133640.000419
90.0125960.0047000.0117420.003820
\n", 539 | "
" 540 | ], 541 | "text/plain": [ 542 | " test-error-mean test-error-std train-error-mean train-error-std\n", 543 | "0 0.113825 0.013186 0.113825 0.001465\n", 544 | "1 0.113825 0.013186 0.113825 0.001465\n", 545 | "2 0.030415 0.005698 0.030415 0.000633\n", 546 | "3 0.021505 0.005277 0.021505 0.000586\n", 547 | "4 0.025499 0.005461 0.025499 0.000607\n", 548 | "5 0.020737 0.007627 0.019696 0.003491\n", 549 | "6 0.017358 0.003369 0.017358 0.000374\n", 550 | "7 0.015361 0.003699 0.014474 0.001923\n", 551 | "8 0.013364 0.003766 0.013364 0.000419\n", 552 | "9 0.012596 0.004700 0.011742 0.003820" 553 | ] 554 | }, 555 | "execution_count": 14, 556 | "metadata": {}, 557 | "output_type": "execute_result" 558 | } 559 | ], 560 | "source": [ 561 | "num_rounds = 10 # how many estimators\n", 562 | "hist = xgb.cv(params, dtrain, num_rounds, nfold=10, metrics={'error'}, seed=seed)\n", 563 | "hist" 564 | ] 565 | }, 566 | { 567 | "cell_type": "markdown", 568 | "metadata": {}, 569 | "source": [ 570 | "Notice that:\n", 571 | "\n", 572 | "- by default we get a pandas data frame object (can be changed with `as_pandas` param),\n", 573 | "- metrics are passed as an argument (muliple values are allowed),\n", 574 | "- we can use own evaluation metrics (param `feval` and `maximize`),\n", 575 | "- we can use early stopping feature (param `early_stopping_rounds`)" 576 | ] 577 | } 578 | ], 579 | "metadata": { 580 | "kernelspec": { 581 | "display_name": "Python 3", 582 | "language": "python", 583 | "name": "python3" 584 | }, 585 | "language_info": { 586 | "codemirror_mode": { 587 | "name": "ipython", 588 | "version": 3 589 | }, 590 | "file_extension": ".py", 591 | "mimetype": "text/x-python", 592 | "name": "python", 593 | "nbconvert_exporter": "python", 594 | "pygments_lexer": "ipython3", 595 | "version": "3.5.2" 596 | } 597 | }, 598 | "nbformat": 4, 599 | "nbformat_minor": 0 600 | } 601 | -------------------------------------------------------------------------------- /notebooks/3. Going deeper/3.5 Deal with missing values.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Deal with missing values\n", 15 | "The following notebook demonstrate XGBoost resilience to missing values. Two approaches - native interface, and Sklearn wrapper were tested against missing datasets.\n", 16 | "\n", 17 | "Missing value is commonly seen in real-world data sets. Handling missing values has no rule to apply to all cases, since there could be various reasons for the values to be missing.\n", 18 | "\n", 19 | "**What you will learn**:\n", 20 | "-
how to prepare data with missing elements,\n", 21 | "- handling missing values in native interface,\n", 22 | "- handling missing values in Sklearn interface" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "### Prepare data\n", 30 | "First begin with loading all libraries and assuring reproducibility" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 1, 36 | "metadata": { 37 | "collapsed": true 38 | }, 39 | "outputs": [], 40 | "source": [ 41 | "import numpy as np\n", 42 | "import xgboost as xgb\n", 43 | "\n", 44 | "from xgboost.sklearn import XGBClassifier\n", 45 | "\n", 46 | "from sklearn.cross_validation import cross_val_score\n", 47 | "\n", 48 | "# reproducibility\n", 49 | "seed = 123" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "Let's prepare a valid dataset with no missing values. There are 10 samples, each one will contain 5 randomly generated features and will be assigned to one of two classes." 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 2, 62 | "metadata": { 63 | "collapsed": false 64 | }, 65 | "outputs": [ 66 | { 67 | "data": { 68 | "text/plain": [ 69 | "array([[ 0.69646919, 0.28613933, 0.22685145, 0.55131477, 0.71946897],\n", 70 | " [ 0.42310646, 0.9807642 , 0.68482974, 0.4809319 , 0.39211752],\n", 71 | " [ 0.34317802, 0.72904971, 0.43857224, 0.0596779 , 0.39804426],\n", 72 | " [ 0.73799541, 0.18249173, 0.17545176, 0.53155137, 0.53182759],\n", 73 | " [ 0.63440096, 0.84943179, 0.72445532, 0.61102351, 0.72244338],\n", 74 | " [ 0.32295891, 0.36178866, 0.22826323, 0.29371405, 0.63097612],\n", 75 | " [ 0.09210494, 0.43370117, 0.43086276, 0.4936851 , 0.42583029],\n", 76 | " [ 0.31226122, 0.42635131, 0.89338916, 0.94416002, 0.50183668],\n", 77 | " [ 0.62395295, 0.1156184 , 0.31728548, 0.41482621, 0.86630916],\n", 78 | " [ 0.25045537, 0.48303426, 0.98555979, 0.51948512, 0.61289453]])" 79 | ] 80 | }, 81 | "execution_count": 2, 82 | "metadata": {}, 83 | "output_type": "execute_result" 84 | } 85 | ], 86 | "source": [ 87 | "# create valid dataset\n", 88 | "np.random.seed(seed)\n", 89 | "\n", 90 | "data_v = np.random.rand(10,5) # 10 entities, each contains 5 features\n", 91 | "data_v" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "In the second example we are going to add some missing values" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": 3, 104 | "metadata": { 105 | "collapsed": false 106 | }, 107 | "outputs": [ 108 | { 109 | "data": { 110 | "text/plain": [ 111 | "array([[ 0.69646919, nan, nan, 0.55131477, 0.71946897],\n", 112 | " [ nan, 0.9807642 , 0.68482974, 0.4809319 , 0.39211752],\n", 113 | " [ 0.34317802, 0.72904971, 0.43857224, nan, 0.39804426],\n", 114 | " [ 0.73799541, 0.18249173, 0.17545176, 0.53155137, 0.53182759],\n", 115 | " [ 0.63440096, 0.84943179, 0.72445532, 0.61102351, nan],\n", 116 | " [ 0.32295891, 0.36178866, 0.22826323, 0.29371405, 0.63097612],\n", 117 | " [ 0.09210494, 0.43370117, 0.43086276, 0.4936851 , 0.42583029],\n", 118 | " [ 0.31226122, 0.42635131, nan, 0.94416002, 0.50183668],\n", 119 | " [ 0.62395295, 0.1156184 , 0.31728548, 0.41482621, 0.86630916],\n", 120 | " [ 0.25045537, nan, 0.98555979, 0.51948512, 0.61289453]])" 121 | ] 122 | }, 123 | "execution_count": 3, 124 | "metadata": {}, 125 | "output_type": "execute_result" 126 | } 127 | ], 128 | "source": [ 129 | "# add some missing values\n", 130 | "data_m = np.copy(data_v)\n", 131 | "\n", 
132 | "data_m[2, 3] = np.nan\n", 133 | "data_m[0, 1] = np.nan\n", 134 | "data_m[0, 2] = np.nan\n", 135 | "data_m[1, 0] = np.nan\n", 136 | "data_m[4, 4] = np.nan\n", 137 | "data_m[7, 2] = np.nan\n", 138 | "data_m[9, 1] = np.nan\n", 139 | "\n", 140 | "data_m" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "Also generate target variables. Each sample will be assigned to one of two classes - so we are dealing with binary classification problem" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 4, 153 | "metadata": { 154 | "collapsed": false 155 | }, 156 | "outputs": [ 157 | { 158 | "data": { 159 | "text/plain": [ 160 | "array([0, 1, 0, 0, 0, 0, 0, 1, 1, 0])" 161 | ] 162 | }, 163 | "execution_count": 4, 164 | "metadata": {}, 165 | "output_type": "execute_result" 166 | } 167 | ], 168 | "source": [ 169 | "np.random.seed(seed)\n", 170 | "\n", 171 | "label = np.random.randint(2, size=10) # binary target\n", 172 | "label" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "### Native interface\n", 180 | "In this case we will check how does the native interface handles missing data. Begin with specifing default parameters." 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 5, 186 | "metadata": { 187 | "collapsed": true 188 | }, 189 | "outputs": [], 190 | "source": [ 191 | "# specify general training parameters\n", 192 | "params = {\n", 193 | " 'objective':'binary:logistic',\n", 194 | " 'max_depth':1,\n", 195 | " 'silent':1,\n", 196 | " 'eta':0.5\n", 197 | "}\n", 198 | "\n", 199 | "num_rounds = 5" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "In the experiment first we will create a valid `DMatrix` (with all values), see if it works ok, and then repeat the process with lacking one." 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": 6, 212 | "metadata": { 213 | "collapsed": true 214 | }, 215 | "outputs": [], 216 | "source": [ 217 | "dtrain_v = xgb.DMatrix(data_v, label=label)" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "Cross-validate results" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 7, 230 | "metadata": { 231 | "collapsed": false 232 | }, 233 | "outputs": [ 234 | { 235 | "data": { 236 | "text/html": [ 237 | "
\n", 238 | "\n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | "
test-error-meantest-error-stdtrain-error-meantrain-error-std
00.33333300.3333330
10.33333300.3333330
20.33333300.3333330
30.33333300.3333330
40.33333300.3333330
\n", 286 | "
" 287 | ], 288 | "text/plain": [ 289 | " test-error-mean test-error-std train-error-mean train-error-std\n", 290 | "0 0.333333 0 0.333333 0\n", 291 | "1 0.333333 0 0.333333 0\n", 292 | "2 0.333333 0 0.333333 0\n", 293 | "3 0.333333 0 0.333333 0\n", 294 | "4 0.333333 0 0.333333 0" 295 | ] 296 | }, 297 | "execution_count": 7, 298 | "metadata": {}, 299 | "output_type": "execute_result" 300 | } 301 | ], 302 | "source": [ 303 | "xgb.cv(params, dtrain_v, num_rounds, seed=seed)" 304 | ] 305 | }, 306 | { 307 | "cell_type": "markdown", 308 | "metadata": {}, 309 | "source": [ 310 | "The output obviously doesn't make sense, because the data is completely random.\n", 311 | "\n", 312 | "When creating `DMatrix` holding missing values we have to explicitly tell what denotes that it's missing. Sometimes it might be `0`, `999` or others. In our case it's Numpy's `NAN`. Add `missing` argument to `DMatrix` constructor to handle it." 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": 8, 318 | "metadata": { 319 | "collapsed": false 320 | }, 321 | "outputs": [], 322 | "source": [ 323 | "dtrain_m = xgb.DMatrix(data_m, label=label, missing=np.nan)" 324 | ] 325 | }, 326 | { 327 | "cell_type": "markdown", 328 | "metadata": {}, 329 | "source": [ 330 | "Cross-validate results:" 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": 9, 336 | "metadata": { 337 | "collapsed": false 338 | }, 339 | "outputs": [ 340 | { 341 | "data": { 342 | "text/html": [ 343 | "
\n", 344 | "\n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | "
test-error-meantest-error-stdtrain-error-meantrain-error-std
00.33333300.3333330
10.33333300.3333330
20.33333300.3333330
30.33333300.3333330
40.33333300.3333330
\n", 392 | "
" 393 | ], 394 | "text/plain": [ 395 | " test-error-mean test-error-std train-error-mean train-error-std\n", 396 | "0 0.333333 0 0.333333 0\n", 397 | "1 0.333333 0 0.333333 0\n", 398 | "2 0.333333 0 0.333333 0\n", 399 | "3 0.333333 0 0.333333 0\n", 400 | "4 0.333333 0 0.333333 0" 401 | ] 402 | }, 403 | "execution_count": 9, 404 | "metadata": {}, 405 | "output_type": "execute_result" 406 | } 407 | ], 408 | "source": [ 409 | "xgb.cv(params, dtrain_m, num_rounds, seed=seed)" 410 | ] 411 | }, 412 | { 413 | "cell_type": "markdown", 414 | "metadata": {}, 415 | "source": [ 416 | "It looks like the algorithm works also with missing values.\n", 417 | "\n", 418 | "In XGBoost chooses a soft way to handle missing values. \n", 419 | "\n", 420 | "When using a feature with missing values to do splitting, XGBoost will assign a direction to the missing values instead of a numerical value. \n", 421 | "\n", 422 | "Specifically, XGBoost guides all the data points with missing values to the left and right respectively, then choose the direction with a higher gain with regard to the objective." 423 | ] 424 | }, 425 | { 426 | "cell_type": "markdown", 427 | "metadata": {}, 428 | "source": [ 429 | "### Sklearn wrapper
\n", 430 | "The following section shows how to validate the same behaviour using Sklearn interface.\n", 431 | "\n", 432 | "Begin with defining parameters and creating an estimator object." 433 | ] 434 | }, 435 | { 436 | "cell_type": "code", 437 | "execution_count": 10, 438 | "metadata": { 439 | "collapsed": true 440 | }, 441 | "outputs": [], 442 | "source": [ 443 | "params = {\n", 444 | " 'objective': 'binary:logistic',\n", 445 | " 'max_depth': 1,\n", 446 | " 'learning_rate': 0.5,\n", 447 | " 'silent': 1.0,\n", 448 | " 'n_estimators': 5\n", 449 | "}" 450 | ] 451 | }, 452 | { 453 | "cell_type": "code", 454 | "execution_count": 11, 455 | "metadata": { 456 | "collapsed": false 457 | }, 458 | "outputs": [ 459 | { 460 | "data": { 461 | "text/plain": [ 462 | "XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,\n", 463 | " gamma=0, learning_rate=0.5, max_delta_step=0, max_depth=1,\n", 464 | " min_child_weight=1, missing=None, n_estimators=5, nthread=-1,\n", 465 | " objective='binary:logistic', reg_alpha=0, reg_lambda=1,\n", 466 | " scale_pos_weight=1, seed=0, silent=1.0, subsample=1)" 467 | ] 468 | }, 469 | "execution_count": 11, 470 | "metadata": {}, 471 | "output_type": "execute_result" 472 | } 473 | ], 474 | "source": [ 475 | "clf = XGBClassifier(**params)\n", 476 | "clf" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "metadata": {}, 482 | "source": [ 483 | "Cross-validate results with full dataset. Because we have only 10 samples, we will perform 2-fold CV." 484 | ] 485 | }, 486 | { 487 | "cell_type": "code", 488 | "execution_count": 12, 489 | "metadata": { 490 | "collapsed": false 491 | }, 492 | "outputs": [ 493 | { 494 | "data": { 495 | "text/plain": [ 496 | "array([ 0.66666667, 0.75 ])" 497 | ] 498 | }, 499 | "execution_count": 12, 500 | "metadata": {}, 501 | "output_type": "execute_result" 502 | } 503 | ], 504 | "source": [ 505 | "cross_val_score(clf, data_v, label, cv=2, scoring='accuracy')" 506 | ] 507 | }, 508 | { 509 | "cell_type": "markdown", 510 | "metadata": {}, 511 | "source": [ 512 | "Some score was obtained, we won't dig into it's interpretation.\n", 513 | "\n", 514 | "See if things work also with missing values" 515 | ] 516 | }, 517 | { 518 | "cell_type": "code", 519 | "execution_count": 13, 520 | "metadata": { 521 | "collapsed": false 522 | }, 523 | "outputs": [ 524 | { 525 | "data": { 526 | "text/plain": [ 527 | "array([ 0.66666667, 0.75 ])" 528 | ] 529 | }, 530 | "execution_count": 13, 531 | "metadata": {}, 532 | "output_type": "execute_result" 533 | } 534 | ], 535 | "source": [ 536 | "cross_val_score(clf, data_m, label, cv=2, scoring='accuracy')" 537 | ] 538 | }, 539 | { 540 | "cell_type": "markdown", 541 | "metadata": {}, 542 | "source": [ 543 | "Both methods works with missing datasets. The Sklearn package by default handles data with `np.nan` as missing (so you will need additional pre-precessing if using different convention)." 
544 | ] 545 | } 546 | ], 547 | "metadata": { 548 | "kernelspec": { 549 | "display_name": "Python 3", 550 | "language": "python", 551 | "name": "python3" 552 | }, 553 | "language_info": { 554 | "codemirror_mode": { 555 | "name": "ipython", 556 | "version": 3 557 | }, 558 | "file_extension": ".py", 559 | "mimetype": "text/x-python", 560 | "name": "python", 561 | "nbconvert_exporter": "python", 562 | "pygments_lexer": "ipython3", 563 | "version": "3.5.2" 564 | } 565 | }, 566 | "nbformat": 4, 567 | "nbformat_minor": 0 568 | } 569 | -------------------------------------------------------------------------------- /notebooks/3. Going deeper/3.6 Handle Imbalanced Datasets.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Handle Imbalanced Datasets\n", 15 | "\n", 16 | "There are plenty of real-world problems that deal with imbalanced target classes. Imagine medical data where there are only a few positive instances out of thousands of negative (normal) ones. Another example might be analyzing fraud transactions, in which the actual frauds represent only a fraction of all available data.\n", 17 | "\n", 18 | "> Imbalanced data refers to a classification problem where the classes are not equally distributed.\n", 19 | "\n", 20 | "You can read a good introduction to tackling imbalanced datasets [here](http://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/).\n", 21 | "\n", 22 | "**What you will learn**:\n", 23 | "- what the common approaches are when dealing with class imbalance,\n", 24 | "- what the *accuracy paradox* is,\n", 25 | "- how to manually denote which samples are more important than others,\n", 26 | "- how to use the `scale_pos_weight` parameter to do it automatically" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "### General advice\n", 34 | "These are some common tactics when approaching imbalanced datasets:\n", 35 | "\n", 36 | "- collect more data,\n", 37 | "- use a better evaluation metric (one that notices mistakes - e.g. AUC, F1, Kappa, ...),\n", 38 | "- try oversampling the minority class or undersampling the majority class,\n", 39 | "- generate artificial samples of the minority class (e.g. the SMOTE algorithm; see the naive oversampling sketch below)\n", 40 | "\n", 41 | "In XGBoost you can try to:\n", 42 | "- make sure that the parameter `min_child_weight` is small (because leaf nodes can have smaller size groups), it is set to `min_child_weight=1` by default,\n", 43 | "- assign more weight to specific samples while initializing a `DMatrix`,\n", 44 | "- control the balance of positive and negative weights using the `scale_pos_weight` parameter,\n", 45 | "- use AUC for evaluation" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "### Prepare data\n", 53 | "Let's test it by generating an artificial dataset to perform some experiments. But first load essential libraries that will be used throughout the lecture."
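Before loading the libraries, here is what the simplest of the tactics above - naive random oversampling - can look like. This is a sketch only; it assumes the `X`, `y` arrays produced by `make_classification` a few cells below:

```python
# Sketch: naive random oversampling - duplicate minority-class rows
# (sampled with replacement) until both classes are roughly the same size.
import numpy as np

pos_idx = np.flatnonzero(y == 1)              # minority-class indices
deficit = np.sum(y == 0) - len(pos_idx)       # how many extra copies we need
extra = np.random.choice(pos_idx, size=deficit, replace=True)

X_balanced = np.vstack([X, X[extra]])
y_balanced = np.concatenate([y, y[extra]])
```

Duplicating rows adds no new information, which is why the weighting approaches demonstrated later in this notebook are often the more convenient option for XGBoost.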
54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 1, 59 | "metadata": { 60 | "collapsed": true 61 | }, 62 | "outputs": [], 63 | "source": [ 64 | "import numpy as np\n", 65 | "import pandas as pd\n", 66 | "\n", 67 | "import xgboost as xgb\n", 68 | "\n", 69 | "from sklearn.datasets import make_classification\n", 70 | "from sklearn.metrics import accuracy_score, precision_score, recall_score\n", 71 | "from sklearn.cross_validation import train_test_split\n", 72 | "\n", 73 | "# reproducibility\n", 74 | "seed = 123" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "We'll use a function to generate a dataset for binary classification. To ensure that it's imbalanced, use the `weights` parameter. In this case there will be 200 samples each described by 5 features, but only 10% of them (about 20 samples) will be positive. That makes the problem harder." 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 2, 87 | "metadata": { 88 | "collapsed": false 89 | }, 90 | "outputs": [ 91 | { 92 | "name": "stdout", 93 | "output_type": "stream", 94 | "text": [ 95 | "There are 21 positive instances.\n" 96 | ] 97 | } 98 | ], 99 | "source": [ 100 | "X, y = make_classification(\n", 101 | " n_samples=200,\n", 102 | " n_features=5,\n", 103 | " n_informative=3,\n", 104 | " n_classes=2,\n", 105 | " weights=[.9, .1],\n", 106 | " shuffle=True,\n", 107 | " random_state=seed\n", 108 | ")\n", 109 | "\n", 110 | "print('There are {} positive instances.'.format(y.sum()))" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "Divide the created data into train and test sets. Remember that both datasets should be similar in terms of class distribution, so they need stratification." 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": 3, 123 | "metadata": { 124 | "collapsed": false 125 | }, 126 | "outputs": [ 127 | { 128 | "name": "stdout", 129 | "output_type": "stream", 130 | "text": [ 131 | "Total number of positive train instances: 14\n", 132 | "Total number of positive test instances: 7\n" 133 | ] 134 | } 135 | ], 136 | "source": [ 137 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, stratify=y, random_state=seed)\n", 138 | "\n", 139 | "print('Total number of positive train instances: {}'.format(y_train.sum()))\n", 140 | "print('Total number of positive test instances: {}'.format(y_test.sum()))" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "### Baseline model\n", 148 | "In this approach we try to completely ignore the fact that the classes are imbalanced and see how the model performs. Create a `DMatrix` for train and test data." 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": 4, 154 | "metadata": { 155 | "collapsed": true 156 | }, 157 | "outputs": [], 158 | "source": [ 159 | "dtrain = xgb.DMatrix(X_train, label=y_train)\n", 160 | "dtest = xgb.DMatrix(X_test)" 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": {}, 166 | "source": [ 167 | "Assume that we will create 15 decision tree stumps solving a binary classification problem, where each successive one will be trained very aggressively.\n", 168 | "\n", 169 | "These parameters will also be used in consecutive examples."
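To make the upcoming *accuracy paradox* concrete before training anything, it helps to compare against a dummy model that always predicts the majority class (a sketch using the `y_test` split created above):

```python
# Sketch: a "classifier" that never finds a single positive instance still
# scores roughly 0.89 accuracy here (59 of the 66 test samples are negative),
# so accuracy alone says very little on imbalanced data.
import numpy as np
from sklearn.metrics import accuracy_score

always_negative = np.zeros_like(y_test)
print('Majority-class baseline accuracy: {0:.2f}'.format(
    accuracy_score(y_test, always_negative)))
```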
170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": 5, 175 | "metadata": { 176 | "collapsed": false 177 | }, 178 | "outputs": [], 179 | "source": [ 180 | "params = {\n", 181 | " 'objective':'binary:logistic',\n", 182 | " 'max_depth':1,\n", 183 | " 'silent':1,\n", 184 | " 'eta':1\n", 185 | "}\n", 186 | "\n", 187 | "num_rounds = 15" 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "metadata": {}, 193 | "source": [ 194 | "Train the booster and make predictions." 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": 6, 200 | "metadata": { 201 | "collapsed": false 202 | }, 203 | "outputs": [], 204 | "source": [ 205 | "bst = xgb.train(params, dtrain, num_rounds)\n", 206 | "y_test_preds = (bst.predict(dtest) > 0.5).astype('int')" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "Let's see how the confusion matrix looks like." 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 7, 219 | "metadata": { 220 | "collapsed": false, 221 | "scrolled": false 222 | }, 223 | "outputs": [ 224 | { 225 | "data": { 226 | "text/html": [ 227 | "
\n", 228 | "\n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | "
Predicted01All
Actual
059059
1437
All63366
\n", 264 | "
" 265 | ], 266 | "text/plain": [ 267 | "Predicted 0 1 All\n", 268 | "Actual \n", 269 | "0 59 0 59\n", 270 | "1 4 3 7\n", 271 | "All 63 3 66" 272 | ] 273 | }, 274 | "execution_count": 7, 275 | "metadata": {}, 276 | "output_type": "execute_result" 277 | } 278 | ], 279 | "source": [ 280 | "pd.crosstab(\n", 281 | " pd.Series(y_test, name='Actual'),\n", 282 | " pd.Series(y_test_preds, name='Predicted'),\n", 283 | " margins=True\n", 284 | ")" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": {}, 290 | "source": [ 291 | "We can also present the performance using 3 different evaluation metrics:\n", 292 | "- [accuracy](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html), \n", 293 | "- [precision](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html) (the ability of the classifier not to label as positive a sample that is negative),\n", 294 | "- [recall](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html) (the ability of the classifier to find all the positive samples). " 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": 8, 300 | "metadata": { 301 | "collapsed": false, 302 | "scrolled": true 303 | }, 304 | "outputs": [ 305 | { 306 | "name": "stdout", 307 | "output_type": "stream", 308 | "text": [ 309 | "Accuracy: 0.94\n", 310 | "Precision: 1.00\n", 311 | "Recall: 0.43\n" 312 | ] 313 | } 314 | ], 315 | "source": [ 316 | "print('Accuracy: {0:.2f}'.format(accuracy_score(y_test, y_test_preds)))\n", 317 | "print('Precision: {0:.2f}'.format(precision_score(y_test, y_test_preds)))\n", 318 | "print('Recall: {0:.2f}'.format(recall_score(y_test, y_test_preds)))" 319 | ] 320 | }, 321 | { 322 | "cell_type": "markdown", 323 | "metadata": {}, 324 | "source": [ 325 | "Intuitively we know that the foucs should be on finding positive samples. First results are very promising (94% accuracy - wow), but deeper analysis show that the results are biased towards majority class - we are very poor at predicting the actual label of positive instances. That is called an [accuracy paradox](https://en.wikipedia.org/wiki/Accuracy_paradox?oldformat=true)." 326 | ] 327 | }, 328 | { 329 | "cell_type": "markdown", 330 | "metadata": {}, 331 | "source": [ 332 | "### Custom weights
\n", 333 | "Try to explicitly tell the algorithm what important using relative instance weights. Let's specify that positive instances have 5x more weight and add this information while creating `DMatrix`." 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": 9, 339 | "metadata": { 340 | "collapsed": true 341 | }, 342 | "outputs": [], 343 | "source": [ 344 | "weights = np.zeros(len(y_train))\n", 345 | "weights[y_train == 0] = 1\n", 346 | "weights[y_train == 1] = 5\n", 347 | "\n", 348 | "dtrain = xgb.DMatrix(X_train, label=y_train, weight=weights) # weights added\n", 349 | "dtest = xgb.DMatrix(X_test)" 350 | ] 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "metadata": {}, 355 | "source": [ 356 | "Train the classifier and get predictions (same as in baseline):" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": 10, 362 | "metadata": { 363 | "collapsed": true 364 | }, 365 | "outputs": [], 366 | "source": [ 367 | "bst = xgb.train(params, dtrain, num_rounds)\n", 368 | "y_test_preds = (bst.predict(dtest) > 0.5).astype('int')" 369 | ] 370 | }, 371 | { 372 | "cell_type": "markdown", 373 | "metadata": {}, 374 | "source": [ 375 | "Inspect the confusion matrix, and obtained evaluation metrics:" 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": 11, 381 | "metadata": { 382 | "collapsed": false 383 | }, 384 | "outputs": [ 385 | { 386 | "data": { 387 | "text/html": [ 388 | "
\n", 389 | "\n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | "
Predicted01All
Actual
052759
1257
All541266
\n", 425 | "
" 426 | ], 427 | "text/plain": [ 428 | "Predicted 0 1 All\n", 429 | "Actual \n", 430 | "0 52 7 59\n", 431 | "1 2 5 7\n", 432 | "All 54 12 66" 433 | ] 434 | }, 435 | "execution_count": 11, 436 | "metadata": {}, 437 | "output_type": "execute_result" 438 | } 439 | ], 440 | "source": [ 441 | "pd.crosstab(\n", 442 | " pd.Series(y_test, name='Actual'),\n", 443 | " pd.Series(y_test_preds, name='Predicted'),\n", 444 | " margins=True\n", 445 | ")" 446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": 12, 451 | "metadata": { 452 | "collapsed": false 453 | }, 454 | "outputs": [ 455 | { 456 | "name": "stdout", 457 | "output_type": "stream", 458 | "text": [ 459 | "Accuracy: 0.86\n", 460 | "Precision: 0.42\n", 461 | "Recall: 0.71\n" 462 | ] 463 | } 464 | ], 465 | "source": [ 466 | "print('Accuracy: {0:.2f}'.format(accuracy_score(y_test, y_test_preds)))\n", 467 | "print('Precision: {0:.2f}'.format(precision_score(y_test, y_test_preds)))\n", 468 | "print('Recall: {0:.2f}'.format(recall_score(y_test, y_test_preds)))" 469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": {}, 474 | "source": [ 475 | "You see that we made a trade-off here. We are now able to better classify the minority class, but the overall accuracy and precision decreased. Test multiple weights combinations and see which one works best." 476 | ] 477 | }, 478 | { 479 | "cell_type": "markdown", 480 | "metadata": {}, 481 | "source": [ 482 | "### Use `scale_pos_weight` parameter
\n", 483 | "You can automate the process of assigning weights manually by calculating the proportion between negative and positive instances and setting it to `scale_pos_weight` parameter.\n", 484 | "\n", 485 | "Let's reinitialize datasets." 486 | ] 487 | }, 488 | { 489 | "cell_type": "code", 490 | "execution_count": 13, 491 | "metadata": { 492 | "collapsed": true 493 | }, 494 | "outputs": [], 495 | "source": [ 496 | "dtrain = xgb.DMatrix(X_train, label=y_train)\n", 497 | "dtest = xgb.DMatrix(X_test)" 498 | ] 499 | }, 500 | { 501 | "cell_type": "markdown", 502 | "metadata": {}, 503 | "source": [ 504 | "Calculate the ratio between both classes and assign it to a parameter." 505 | ] 506 | }, 507 | { 508 | "cell_type": "code", 509 | "execution_count": 14, 510 | "metadata": { 511 | "collapsed": false 512 | }, 513 | "outputs": [], 514 | "source": [ 515 | "train_labels = dtrain.get_label()\n", 516 | "\n", 517 | "ratio = float(np.sum(train_labels == 0)) / np.sum(train_labels == 1)\n", 518 | "params['scale_pos_weight'] = ratio" 519 | ] 520 | }, 521 | { 522 | "cell_type": "markdown", 523 | "metadata": {}, 524 | "source": [ 525 | "And like before, analyze the confusion matrix and obtained metrics" 526 | ] 527 | }, 528 | { 529 | "cell_type": "code", 530 | "execution_count": 15, 531 | "metadata": { 532 | "collapsed": false 533 | }, 534 | "outputs": [ 535 | { 536 | "data": { 537 | "text/html": [ 538 | "
\n", 539 | "\n", 540 | " \n", 541 | " \n", 542 | " \n", 543 | " \n", 544 | " \n", 545 | " \n", 546 | " \n", 547 | " \n", 548 | " \n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | "
Predicted01All
Actual
051859
1077
All511566
\n", 575 | "
" 576 | ], 577 | "text/plain": [ 578 | "Predicted 0 1 All\n", 579 | "Actual \n", 580 | "0 51 8 59\n", 581 | "1 0 7 7\n", 582 | "All 51 15 66" 583 | ] 584 | }, 585 | "execution_count": 15, 586 | "metadata": {}, 587 | "output_type": "execute_result" 588 | } 589 | ], 590 | "source": [ 591 | "bst = xgb.train(params, dtrain, num_rounds)\n", 592 | "y_test_preds = (bst.predict(dtest) > 0.5).astype('int')\n", 593 | "\n", 594 | "pd.crosstab(\n", 595 | " pd.Series(y_test, name='Actual'),\n", 596 | " pd.Series(y_test_preds, name='Predicted'),\n", 597 | " margins=True\n", 598 | ")" 599 | ] 600 | }, 601 | { 602 | "cell_type": "code", 603 | "execution_count": 16, 604 | "metadata": { 605 | "collapsed": false 606 | }, 607 | "outputs": [ 608 | { 609 | "name": "stdout", 610 | "output_type": "stream", 611 | "text": [ 612 | "Accuracy: 0.88\n", 613 | "Precision: 0.47\n", 614 | "Recall: 1.00\n" 615 | ] 616 | } 617 | ], 618 | "source": [ 619 | "print('Accuracy: {0:.2f}'.format(accuracy_score(y_test, y_test_preds)))\n", 620 | "print('Precision: {0:.2f}'.format(precision_score(y_test, y_test_preds)))\n", 621 | "print('Recall: {0:.2f}'.format(recall_score(y_test, y_test_preds)))" 622 | ] 623 | }, 624 | { 625 | "cell_type": "markdown", 626 | "metadata": {}, 627 | "source": [ 628 | "You can see that scalling weight by using `scale_pos_weights` in this case gives better results that doing it manually. We are now able to perfectly classify all posivie classes (focusing on the real problem). On the other hand the classifier sometimes makes a mistake by wrongly classifing the negative case into positive (producing so called *false positives*)." 629 | ] 630 | } 631 | ], 632 | "metadata": { 633 | "kernelspec": { 634 | "display_name": "Python 3", 635 | "language": "python", 636 | "name": "python3" 637 | }, 638 | "language_info": { 639 | "codemirror_mode": { 640 | "name": "ipython", 641 | "version": 3 642 | }, 643 | "file_extension": ".py", 644 | "mimetype": "text/x-python", 645 | "name": "python", 646 | "nbconvert_exporter": "python", 647 | "pygments_lexer": "ipython3", 648 | "version": "3.5.2" 649 | } 650 | }, 651 | "nbformat": 4, 652 | "nbformat_minor": 0 653 | } 654 | -------------------------------------------------------------------------------- /notebooks/data/featmap.txt: -------------------------------------------------------------------------------- 1 | 0 cap-shape=bell i 2 | 1 cap-shape=conical i 3 | 2 cap-shape=convex i 4 | 3 cap-shape=flat i 5 | 4 cap-shape=knobbed i 6 | 5 cap-shape=sunken i 7 | 6 cap-surface=fibrous i 8 | 7 cap-surface=grooves i 9 | 8 cap-surface=scaly i 10 | 9 cap-surface=smooth i 11 | 10 cap-color=brown i 12 | 11 cap-color=buff i 13 | 12 cap-color=cinnamon i 14 | 13 cap-color=gray i 15 | 14 cap-color=green i 16 | 15 cap-color=pink i 17 | 16 cap-color=purple i 18 | 17 cap-color=red i 19 | 18 cap-color=white i 20 | 19 cap-color=yellow i 21 | 20 bruises?=bruises i 22 | 21 bruises?=no i 23 | 22 odor=almond i 24 | 23 odor=anise i 25 | 24 odor=creosote i 26 | 25 odor=fishy i 27 | 26 odor=foul i 28 | 27 odor=musty i 29 | 28 odor=none i 30 | 29 odor=pungent i 31 | 30 odor=spicy i 32 | 31 gill-attachment=attached i 33 | 32 gill-attachment=descending i 34 | 33 gill-attachment=free i 35 | 34 gill-attachment=notched i 36 | 35 gill-spacing=close i 37 | 36 gill-spacing=crowded i 38 | 37 gill-spacing=distant i 39 | 38 gill-size=broad i 40 | 39 gill-size=narrow i 41 | 40 gill-color=black i 42 | 41 gill-color=brown i 43 | 42 gill-color=buff i 44 | 43 gill-color=chocolate i 45 | 44 gill-color=gray i 46 | 45 
gill-color=green i 47 | 46 gill-color=orange i 48 | 47 gill-color=pink i 49 | 48 gill-color=purple i 50 | 49 gill-color=red i 51 | 50 gill-color=white i 52 | 51 gill-color=yellow i 53 | 52 stalk-shape=enlarging i 54 | 53 stalk-shape=tapering i 55 | 54 stalk-root=bulbous i 56 | 55 stalk-root=club i 57 | 56 stalk-root=cup i 58 | 57 stalk-root=equal i 59 | 58 stalk-root=rhizomorphs i 60 | 59 stalk-root=rooted i 61 | 60 stalk-root=missing i 62 | 61 stalk-surface-above-ring=fibrous i 63 | 62 stalk-surface-above-ring=scaly i 64 | 63 stalk-surface-above-ring=silky i 65 | 64 stalk-surface-above-ring=smooth i 66 | 65 stalk-surface-below-ring=fibrous i 67 | 66 stalk-surface-below-ring=scaly i 68 | 67 stalk-surface-below-ring=silky i 69 | 68 stalk-surface-below-ring=smooth i 70 | 69 stalk-color-above-ring=brown i 71 | 70 stalk-color-above-ring=buff i 72 | 71 stalk-color-above-ring=cinnamon i 73 | 72 stalk-color-above-ring=gray i 74 | 73 stalk-color-above-ring=orange i 75 | 74 stalk-color-above-ring=pink i 76 | 75 stalk-color-above-ring=red i 77 | 76 stalk-color-above-ring=white i 78 | 77 stalk-color-above-ring=yellow i 79 | 78 stalk-color-below-ring=brown i 80 | 79 stalk-color-below-ring=buff i 81 | 80 stalk-color-below-ring=cinnamon i 82 | 81 stalk-color-below-ring=gray i 83 | 82 stalk-color-below-ring=orange i 84 | 83 stalk-color-below-ring=pink i 85 | 84 stalk-color-below-ring=red i 86 | 85 stalk-color-below-ring=white i 87 | 86 stalk-color-below-ring=yellow i 88 | 87 veil-type=partial i 89 | 88 veil-type=universal i 90 | 89 veil-color=brown i 91 | 90 veil-color=orange i 92 | 91 veil-color=white i 93 | 92 veil-color=yellow i 94 | 93 ring-number=none i 95 | 94 ring-number=one i 96 | 95 ring-number=two i 97 | 96 ring-type=cobwebby i 98 | 97 ring-type=evanescent i 99 | 98 ring-type=flaring i 100 | 99 ring-type=large i 101 | 100 ring-type=none i 102 | 101 ring-type=pendant i 103 | 102 ring-type=sheathing i 104 | 103 ring-type=zone i 105 | 104 spore-print-color=black i 106 | 105 spore-print-color=brown i 107 | 106 spore-print-color=buff i 108 | 107 spore-print-color=chocolate i 109 | 108 spore-print-color=green i 110 | 109 spore-print-color=orange i 111 | 110 spore-print-color=purple i 112 | 111 spore-print-color=white i 113 | 112 spore-print-color=yellow i 114 | 113 population=abundant i 115 | 114 population=clustered i 116 | 115 population=numerous i 117 | 116 population=scattered i 118 | 117 population=several i 119 | 118 population=solitary i 120 | 119 habitat=grasses i 121 | 120 habitat=leaves i 122 | 121 habitat=meadows i 123 | 122 habitat=paths i 124 | 123 habitat=urban i 125 | 124 habitat=waste i 126 | 125 habitat=woods i 127 | -------------------------------------------------------------------------------- /notebooks/images/ada-t1.dot: -------------------------------------------------------------------------------- 1 | digraph Tree { 2 | node [shape=box] ; 3 | 0 [label="X[1] <= 1.2335\ngini = 0.4999\nsamples = 800\nvalue = [0.4937, 0.5062]"] ; 4 | 1 [label="gini = 0.4655\nsamples = 537\nvalue = [0.2475, 0.4237]"] ; 5 | 0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ; 6 | 2 [label="gini = 0.3759\nsamples = 263\nvalue = [0.2463, 0.0825]"] ; 7 | 0 -> 2 [labeldistance=2.5, labelangle=-45, headlabel="False"] ; 8 | } -------------------------------------------------------------------------------- /notebooks/images/ada-t1.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ParrotPrediction/docker-course-xgboost/49f8de97cbc1695dcbeb09391e2662dbedf30ee1/notebooks/images/ada-t1.png -------------------------------------------------------------------------------- /notebooks/images/bias-variance.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ParrotPrediction/docker-course-xgboost/49f8de97cbc1695dcbeb09391e2662dbedf30ee1/notebooks/images/bias-variance.png -------------------------------------------------------------------------------- /notebooks/images/boosting.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ParrotPrediction/docker-course-xgboost/49f8de97cbc1695dcbeb09391e2662dbedf30ee1/notebooks/images/boosting.png -------------------------------------------------------------------------------- /notebooks/images/dt.dot: -------------------------------------------------------------------------------- 1 | digraph Tree { 2 | node [shape=box] ; 3 | 0 [label="X[1] <= 1.2335\ngini = 0.4999\nsamples = 800\nvalue = [395, 405]"] ; 4 | 1 [label="X[4] <= -0.6412\ngini = 0.4655\nsamples = 537\nvalue = [198, 339]"] ; 5 | 0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ; 6 | 2 [label="X[8] <= -1.7983\ngini = 0.355\nsamples = 325\nvalue = [75, 250]"] ; 7 | 1 -> 2 ; 8 | 3 [label="X[3] <= 0.7503\ngini = 0.493\nsamples = 59\nvalue = [33, 26]"] ; 9 | 2 -> 3 ; 10 | 4 [label="X[17] <= 2.1915\ngini = 0.2706\nsamples = 31\nvalue = [5, 26]"] ; 11 | 3 -> 4 ; 12 | 5 [label="X[0] <= 2.3056\ngini = 0.1372\nsamples = 27\nvalue = [2, 25]"] ; 13 | 4 -> 5 ; 14 | 6 [label="gini = 0.0\nsamples = 24\nvalue = [0, 24]"] ; 15 | 5 -> 6 ; 16 | 7 [label="X[12] <= 1.663\ngini = 0.4444\nsamples = 3\nvalue = [2, 1]"] ; 17 | 5 -> 7 ; 18 | 8 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ; 19 | 7 -> 8 ; 20 | 9 [label="gini = 0.0\nsamples = 2\nvalue = [2, 0]"] ; 21 | 7 -> 9 ; 22 | 10 [label="X[14] <= -1.872\ngini = 0.375\nsamples = 4\nvalue = [3, 1]"] ; 23 | 4 -> 10 ; 24 | 11 [label="gini = 0.0\nsamples = 3\nvalue = [3, 0]"] ; 25 | 10 -> 11 ; 26 | 12 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ; 27 | 10 -> 12 ; 28 | 13 [label="gini = 0.0\nsamples = 28\nvalue = [28, 0]"] ; 29 | 3 -> 13 ; 30 | 14 [label="X[10] <= 0.6085\ngini = 0.2659\nsamples = 266\nvalue = [42, 224]"] ; 31 | 2 -> 14 ; 32 | 15 [label="X[14] <= -3.986\ngini = 0.055\nsamples = 106\nvalue = [3, 103]"] ; 33 | 14 -> 15 ; 34 | 16 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ; 35 | 15 -> 16 ; 36 | 17 [label="X[1] <= 0.8647\ngini = 0.0374\nsamples = 105\nvalue = [2, 103]"] ; 37 | 15 -> 17 ; 38 | 18 [label="X[19] <= -1.7662\ngini = 0.019\nsamples = 104\nvalue = [1, 103]"] ; 39 | 17 -> 18 ; 40 | 19 [label="X[15] <= 2.2647\ngini = 0.4444\nsamples = 3\nvalue = [1, 2]"] ; 41 | 18 -> 19 ; 42 | 20 [label="gini = 0.0\nsamples = 2\nvalue = [0, 2]"] ; 43 | 19 -> 20 ; 44 | 21 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ; 45 | 19 -> 21 ; 46 | 22 [label="gini = 0.0\nsamples = 101\nvalue = [0, 101]"] ; 47 | 18 -> 22 ; 48 | 23 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ; 49 | 17 -> 23 ; 50 | 24 [label="X[3] <= 0.1002\ngini = 0.3687\nsamples = 160\nvalue = [39, 121]"] ; 51 | 14 -> 24 ; 52 | 25 [label="X[4] <= -6.8057\ngini = 0.1454\nsamples = 76\nvalue = [6, 70]"] ; 53 | 24 -> 25 ; 54 | 26 [label="gini = 0.0\nsamples = 2\nvalue = [2, 0]"] ; 55 | 25 -> 26 ; 56 | 27 [label="X[14] <= 1.8035\ngini = 0.1023\nsamples = 74\nvalue = [4, 70]"] ; 57 | 25 -> 27 ; 
--------------------------------------------------------------------------------
/notebooks/images/ada-t1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ParrotPrediction/docker-course-xgboost/49f8de97cbc1695dcbeb09391e2662dbedf30ee1/notebooks/images/ada-t1.png
--------------------------------------------------------------------------------
/notebooks/images/bias-variance.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ParrotPrediction/docker-course-xgboost/49f8de97cbc1695dcbeb09391e2662dbedf30ee1/notebooks/images/bias-variance.png
--------------------------------------------------------------------------------
/notebooks/images/boosting.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ParrotPrediction/docker-course-xgboost/49f8de97cbc1695dcbeb09391e2662dbedf30ee1/notebooks/images/boosting.png
--------------------------------------------------------------------------------
/notebooks/images/dt.dot:
--------------------------------------------------------------------------------
1 | digraph Tree {
2 | node [shape=box] ;
3 | 0 [label="X[1] <= 1.2335\ngini = 0.4999\nsamples = 800\nvalue = [395, 405]"] ;
4 | 1 [label="X[4] <= -0.6412\ngini = 0.4655\nsamples = 537\nvalue = [198, 339]"] ;
5 | 0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
6 | 2 [label="X[8] <= -1.7983\ngini = 0.355\nsamples = 325\nvalue = [75, 250]"] ;
7 | 1 -> 2 ;
8 | 3 [label="X[3] <= 0.7503\ngini = 0.493\nsamples = 59\nvalue = [33, 26]"] ;
9 | 2 -> 3 ;
10 | 4 [label="X[17] <= 2.1915\ngini = 0.2706\nsamples = 31\nvalue = [5, 26]"] ;
11 | 3 -> 4 ;
12 | 5 [label="X[0] <= 2.3056\ngini = 0.1372\nsamples = 27\nvalue = [2, 25]"] ;
13 | 4 -> 5 ;
14 | 6 [label="gini = 0.0\nsamples = 24\nvalue = [0, 24]"] ;
15 | 5 -> 6 ;
16 | 7 [label="X[12] <= 1.663\ngini = 0.4444\nsamples = 3\nvalue = [2, 1]"] ;
17 | 5 -> 7 ;
18 | 8 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
19 | 7 -> 8 ;
20 | 9 [label="gini = 0.0\nsamples = 2\nvalue = [2, 0]"] ;
21 | 7 -> 9 ;
22 | 10 [label="X[14] <= -1.872\ngini = 0.375\nsamples = 4\nvalue = [3, 1]"] ;
23 | 4 -> 10 ;
24 | 11 [label="gini = 0.0\nsamples = 3\nvalue = [3, 0]"] ;
25 | 10 -> 11 ;
26 | 12 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
27 | 10 -> 12 ;
28 | 13 [label="gini = 0.0\nsamples = 28\nvalue = [28, 0]"] ;
29 | 3 -> 13 ;
30 | 14 [label="X[10] <= 0.6085\ngini = 0.2659\nsamples = 266\nvalue = [42, 224]"] ;
31 | 2 -> 14 ;
32 | 15 [label="X[14] <= -3.986\ngini = 0.055\nsamples = 106\nvalue = [3, 103]"] ;
33 | 14 -> 15 ;
34 | 16 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
35 | 15 -> 16 ;
36 | 17 [label="X[1] <= 0.8647\ngini = 0.0374\nsamples = 105\nvalue = [2, 103]"] ;
37 | 15 -> 17 ;
38 | 18 [label="X[19] <= -1.7662\ngini = 0.019\nsamples = 104\nvalue = [1, 103]"] ;
39 | 17 -> 18 ;
40 | 19 [label="X[15] <= 2.2647\ngini = 0.4444\nsamples = 3\nvalue = [1, 2]"] ;
41 | 18 -> 19 ;
42 | 20 [label="gini = 0.0\nsamples = 2\nvalue = [0, 2]"] ;
43 | 19 -> 20 ;
44 | 21 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
45 | 19 -> 21 ;
46 | 22 [label="gini = 0.0\nsamples = 101\nvalue = [0, 101]"] ;
47 | 18 -> 22 ;
48 | 23 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
49 | 17 -> 23 ;
50 | 24 [label="X[3] <= 0.1002\ngini = 0.3687\nsamples = 160\nvalue = [39, 121]"] ;
51 | 14 -> 24 ;
52 | 25 [label="X[4] <= -6.8057\ngini = 0.1454\nsamples = 76\nvalue = [6, 70]"] ;
53 | 24 -> 25 ;
54 | 26 [label="gini = 0.0\nsamples = 2\nvalue = [2, 0]"] ;
55 | 25 -> 26 ;
56 | 27 [label="X[14] <= 1.8035\ngini = 0.1023\nsamples = 74\nvalue = [4, 70]"] ;
57 | 25 -> 27 ;
58 | 28 [label="X[3] <= -3.4155\ngini = 0.0788\nsamples = 73\nvalue = [3, 70]"] ;
59 | 27 -> 28 ;
60 | 29 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
61 | 28 -> 29 ;
62 | 30 [label="X[7] <= -2.4732\ngini = 0.054\nsamples = 72\nvalue = [2, 70]"] ;
63 | 28 -> 30 ;
64 | 31 [label="X[14] <= -1.3282\ngini = 0.48\nsamples = 5\nvalue = [2, 3]"] ;
65 | 30 -> 31 ;
66 | 32 [label="gini = 0.0\nsamples = 3\nvalue = [0, 3]"] ;
67 | 31 -> 32 ;
68 | 33 [label="gini = 0.0\nsamples = 2\nvalue = [2, 0]"] ;
69 | 31 -> 33 ;
70 | 34 [label="gini = 0.0\nsamples = 67\nvalue = [0, 67]"] ;
71 | 30 -> 34 ;
72 | 35 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
73 | 27 -> 35 ;
74 | 36 [label="X[4] <= -2.5429\ngini = 0.477\nsamples = 84\nvalue = [33, 51]"] ;
75 | 24 -> 36 ;
76 | 37 [label="X[12] <= 1.9586\ngini = 0.3432\nsamples = 50\nvalue = [11, 39]"] ;
77 | 36 -> 37 ;
78 | 38 [label="X[9] <= 1.8146\ngini = 0.1284\nsamples = 29\nvalue = [2, 27]"] ;
79 | 37 -> 38 ;
80 | 39 [label="X[17] <= 2.3583\ngini = 0.0689\nsamples = 28\nvalue = [1, 27]"] ;
81 | 38 -> 39 ;
82 | 40 [label="gini = 0.0\nsamples = 26\nvalue = [0, 26]"] ;
83 | 39 -> 40 ;
84 | 41 [label="X[13] <= 0.9607\ngini = 0.5\nsamples = 2\nvalue = [1, 1]"] ;
85 | 39 -> 41 ;
86 | 42 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
87 | 41 -> 42 ;
88 | 43 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
89 | 41 -> 43 ;
90 | 44 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
91 | 38 -> 44 ;
92 | 45 [label="X[6] <= 1.0354\ngini = 0.4898\nsamples = 21\nvalue = [9, 12]"] ;
93 | 37 -> 45 ;
94 | 46 [label="X[7] <= -2.4848\ngini = 0.1975\nsamples = 9\nvalue = [8, 1]"] ;
95 | 45 -> 46 ;
96 | 47 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
97 | 46 -> 47 ;
98 | 48 [label="gini = 0.0\nsamples = 8\nvalue = [8, 0]"] ;
99 | 46 -> 48 ;
100 | 49 [label="X[11] <= 1.1381\ngini = 0.1528\nsamples = 12\nvalue = [1, 11]"] ;
101 | 45 -> 49 ;
102 | 50 [label="gini = 0.0\nsamples = 11\nvalue = [0, 11]"] ;
103 | 49 -> 50 ;
104 | 51 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
105 | 49 -> 51 ;
106 | 52 [label="X[9] <= 0.8134\ngini = 0.4567\nsamples = 34\nvalue = [22, 12]"] ;
107 | 36 -> 52 ;
108 | 53 [label="X[17] <= 1.5933\ngini = 0.5\nsamples = 24\nvalue = [12, 12]"] ;
109 | 52 -> 53 ;
110 | 54 [label="X[0] <= 1.3919\ngini = 0.4654\nsamples = 19\nvalue = [7, 12]"] ;
111 | 53 -> 54 ;
112 | 55 [label="X[3] <= 2.6756\ngini = 0.1528\nsamples = 12\nvalue = [1, 11]"] ;
113 | 54 -> 55 ;
114 | 56 [label="gini = 0.0\nsamples = 11\nvalue = [0, 11]"] ;
115 | 55 -> 56 ;
116 | 57 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
117 | 55 -> 57 ;
118 | 58 [label="X[2] <= -0.6306\ngini = 0.2449\nsamples = 7\nvalue = [6, 1]"] ;
119 | 54 -> 58 ;
120 | 59 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
121 | 58 -> 59 ;
122 | 60 [label="gini = 0.0\nsamples = 6\nvalue = [6, 0]"] ;
123 | 58 -> 60 ;
124 | 61 [label="gini = 0.0\nsamples = 5\nvalue = [5, 0]"] ;
125 | 53 -> 61 ;
126 | 62 [label="gini = 0.0\nsamples = 10\nvalue = [10, 0]"] ;
127 | 52 -> 62 ;
128 | 63 [label="X[3] <= -0.0931\ngini = 0.4871\nsamples = 212\nvalue = [123, 89]"] ;
129 | 1 -> 63 ;
130 | 64 [label="X[17] <= -0.3005\ngini = 0.4678\nsamples = 126\nvalue = [47, 79]"] ;
131 | 63 -> 64 ;
132 | 65 [label="X[14] <= 1.9403\ngini = 0.1653\nsamples = 55\nvalue = [5, 50]"] ;
133 | 64 -> 65 ;
134 | 66 [label="X[15] <= 0.4412\ngini = 0.074\nsamples = 52\nvalue = [2, 50]"] ;
135 | 65 -> 66 ;
136 | 67 [label="gini = 0.0\nsamples = 2\nvalue = [2, 0]"] ;
137 | 66 -> 67 ;
138 | 68 [label="gini = 0.0\nsamples = 50\nvalue = [0, 50]"] ;
139 | 66 -> 68 ;
140 | 69 [label="gini = 0.0\nsamples = 3\nvalue = [3, 0]"] ;
141 | 65 -> 69 ;
142 | 70 [label="X[14] <= -0.6587\ngini = 0.4832\nsamples = 71\nvalue = [42, 29]"] ;
143 | 64 -> 70 ;
144 | 71 [label="X[3] <= -1.1441\ngini = 0.4224\nsamples = 33\nvalue = [10, 23]"] ;
145 | 70 -> 71 ;
146 | 72 [label="gini = 0.0\nsamples = 19\nvalue = [0, 19]"] ;
147 | 71 -> 72 ;
148 | 73 [label="X[11] <= 0.348\ngini = 0.4082\nsamples = 14\nvalue = [10, 4]"] ;
149 | 71 -> 73 ;
150 | 74 [label="X[16] <= 1.2091\ngini = 0.32\nsamples = 5\nvalue = [1, 4]"] ;
151 | 73 -> 74 ;
152 | 75 [label="gini = 0.0\nsamples = 4\nvalue = [0, 4]"] ;
153 | 74 -> 75 ;
154 | 76 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
155 | 74 -> 76 ;
156 | 77 [label="gini = 0.0\nsamples = 9\nvalue = [9, 0]"] ;
157 | 73 -> 77 ;
158 | 78 [label="X[7] <= -2.3547\ngini = 0.2659\nsamples = 38\nvalue = [32, 6]"] ;
159 | 70 -> 78 ;
160 | 79 [label="X[17] <= 0.2786\ngini = 0.4444\nsamples = 6\nvalue = [2, 4]"] ;
161 | 78 -> 79 ;
162 | 80 [label="gini = 0.0\nsamples = 2\nvalue = [2, 0]"] ;
163 | 79 -> 80 ;
164 | 81 [label="gini = 0.0\nsamples = 4\nvalue = [0, 4]"] ;
165 | 79 -> 81 ;
166 | 82 [label="X[3] <= -0.1846\ngini = 0.1172\nsamples = 32\nvalue = [30, 2]"] ;
167 | 78 -> 82 ;
168 | 83 [label="X[13] <= 1.4711\ngini = 0.0624\nsamples = 31\nvalue = [30, 1]"] ;
169 | 82 -> 83 ;
170 | 84 [label="gini = 0.0\nsamples = 30\nvalue = [30, 0]"] ;
171 | 83 -> 84 ;
172 | 85 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
173 | 83 -> 85 ;
174 | 86 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
175 | 82 -> 86 ;
176 | 87 [label="X[14] <= -3.4134\ngini = 0.2055\nsamples = 86\nvalue = [76, 10]"] ;
177 | 63 -> 87 ;
178 | 88 [label="gini = 0.0\nsamples = 3\nvalue = [0, 3]"] ;
179 | 87 -> 88 ;
180 | 89 [label="X[17] <= -2.5889\ngini = 0.1544\nsamples = 83\nvalue = [76, 7]"] ;
181 | 87 -> 89 ;
182 | 90 [label="gini = 0.0\nsamples = 2\nvalue = [0, 2]"] ;
183 | 89 -> 90 ;
184 | 91 [label="X[11] <= -1.6572\ngini = 0.1158\nsamples = 81\nvalue = [76, 5]"] ;
185 | 89 -> 91 ;
186 | 92 [label="X[17] <= -0.254\ngini = 0.5\nsamples = 8\nvalue = [4, 4]"] ;
187 | 91 -> 92 ;
188 | 93 [label="gini = 0.0\nsamples = 4\nvalue = [0, 4]"] ;
189 | 92 -> 93 ;
190 | 94 [label="gini = 0.0\nsamples = 4\nvalue = [4, 0]"] ;
191 | 92 -> 94 ;
192 | 95 [label="X[11] <= -0.5606\ngini = 0.027\nsamples = 73\nvalue = [72, 1]"] ;
193 | 91 -> 95 ;
194 | 96 [label="X[3] <= 0.3508\ngini = 0.2778\nsamples = 6\nvalue = [5, 1]"] ;
195 | 95 -> 96 ;
196 | 97 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
197 | 96 -> 97 ;
198 | 98 [label="gini = 0.0\nsamples = 5\nvalue = [5, 0]"] ;
199 | 96 -> 98 ;
200 | 99 [label="gini = 0.0\nsamples = 67\nvalue = [67, 0]"] ;
201 | 95 -> 99 ;
202 | 100 [label="X[4] <= -2.27\ngini = 0.3759\nsamples = 263\nvalue = [197, 66]"] ;
203 | 0 -> 100 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
204 | 101 [label="X[15] <= -1.5703\ngini = 0.4957\nsamples = 108\nvalue = [59, 49]"] ;
205 | 100 -> 101 ;
206 | 102 [label="X[14] <= 0.4991\ngini = 0.4297\nsamples = 64\nvalue = [44, 20]"] ;
207 | 101 -> 102 ;
208 | 103 [label="X[3] <= 3.2039\ngini = 0.1884\nsamples = 38\nvalue = [34, 4]"] ;
209 | 102 -> 103 ;
210 | 104 [label="X[11] <= 0.4285\ngini = 0.1049\nsamples = 36\nvalue = [34, 2]"] ;
211 | 103 -> 104 ;
212 | 105 [label="X[0] <= 3.5075\ngini = 0.0571\nsamples = 34\nvalue = [33, 1]"] ;
213 | 104 -> 105 ;
214 | 106 [label="gini = 0.0\nsamples = 32\nvalue = [32, 0]"] ;
215 | 105 -> 106 ;
216 | 107 [label="X[4] <= -8.3996\ngini = 0.5\nsamples = 2\nvalue = [1, 1]"] ;
217 | 105 -> 107 ;
218 | 108 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
219 | 107 -> 108 ;
220 | 109 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
221 | 107 -> 109 ;
222 | 110 [label="X[2] <= -0.6655\ngini = 0.5\nsamples = 2\nvalue = [1, 1]"] ;
223 | 104 -> 110 ;
224 | 111 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
225 | 110 -> 111 ;
226 | 112 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
227 | 110 -> 112 ;
228 | 113 [label="gini = 0.0\nsamples = 2\nvalue = [0, 2]"] ;
229 | 103 -> 113 ;
230 | 114 [label="X[17] <= -1.7722\ngini = 0.4734\nsamples = 26\nvalue = [10, 16]"] ;
231 | 102 -> 114 ;
232 | 115 [label="X[14] <= 1.7181\ngini = 0.2778\nsamples = 6\nvalue = [5, 1]"] ;
233 | 114 -> 115 ;
234 | 116 [label="gini = 0.0\nsamples = 5\nvalue = [5, 0]"] ;
235 | 115 -> 116 ;
236 | 117 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
237 | 115 -> 117 ;
238 | 118 [label="X[16] <= -0.9179\ngini = 0.375\nsamples = 20\nvalue = [5, 15]"] ;
239 | 114 -> 118 ;
240 | 119 [label="gini = 0.0\nsamples = 3\nvalue = [3, 0]"] ;
241 | 118 -> 119 ;
242 | 120 [label="X[9] <= 0.9569\ngini = 0.2076\nsamples = 17\nvalue = [2, 15]"] ;
243 | 118 -> 120 ;
244 | 121 [label="gini = 0.0\nsamples = 14\nvalue = [0, 14]"] ;
245 | 120 -> 121 ;
246 | 122 [label="X[17] <= 1.381\ngini = 0.4444\nsamples = 3\nvalue = [2, 1]"] ;
247 | 120 -> 122 ;
248 | 123 [label="gini = 0.0\nsamples = 2\nvalue = [2, 0]"] ;
249 | 122 -> 123 ;
250 | 124 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
251 | 122 -> 124 ;
252 | 125 [label="X[14] <= 1.5526\ngini = 0.4494\nsamples = 44\nvalue = [15, 29]"] ;
253 | 101 -> 125 ;
254 | 126 [label="X[8] <= -2.1011\ngini = 0.327\nsamples = 34\nvalue = [7, 27]"] ;
255 | 125 -> 126 ;
256 | 127 [label="X[19] <= 1.9316\ngini = 0.4444\nsamples = 6\nvalue = [4, 2]"] ;
257 | 126 -> 127 ;
258 | 128 [label="gini = 0.0\nsamples = 4\nvalue = [4, 0]"] ;
259 | 127 -> 128 ;
260 | 129 [label="gini = 0.0\nsamples = 2\nvalue = [0, 2]"] ;
261 | 127 -> 129 ;
262 | 130 [label="X[3] <= -2.1439\ngini = 0.1913\nsamples = 28\nvalue = [3, 25]"] ;
263 | 126 -> 130 ;
264 | 131 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
265 | 130 -> 131 ;
266 | 132 [label="X[16] <= -1.8957\ngini = 0.1372\nsamples = 27\nvalue = [2, 25]"] ;
267 | 130 -> 132 ;
268 | 133 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
269 | 132 -> 133 ;
270 | 134 [label="X[19] <= 1.7477\ngini = 0.074\nsamples = 26\nvalue = [1, 25]"] ;
271 | 132 -> 134 ;
272 | 135 [label="gini = 0.0\nsamples = 24\nvalue = [0, 24]"] ;
273 | 134 -> 135 ;
274 | 136 [label="X[19] <= 1.9454\ngini = 0.5\nsamples = 2\nvalue = [1, 1]"] ;
275 | 134 -> 136 ;
276 | 137 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
277 | 136 -> 137 ;
278 | 138 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
279 | 136 -> 138 ;
280 | 139 [label="X[0] <= 4.6086\ngini = 0.32\nsamples = 10\nvalue = [8, 2]"] ;
281 | 125 -> 139 ;
282 | 140 [label="gini = 0.0\nsamples = 8\nvalue = [8, 0]"] ;
283 | 139 -> 140 ;
284 | 141 [label="gini = 0.0\nsamples = 2\nvalue = [0, 2]"] ;
285 | 139 -> 141 ;
286 | 142 [label="X[17] <= -1.468\ngini = 0.1953\nsamples = 155\nvalue = [138, 17]"] ;
287 | 100 -> 142 ;
288 | 143 [label="X[11] <= -2.7307\ngini = 0.4102\nsamples = 59\nvalue = [42, 17]"] ;
289 | 142 -> 143 ;
290 | 144 [label="X[2] <= 0.4413\ngini = 0.2449\nsamples = 7\nvalue = [1, 6]"] ;
291 | 143 -> 144 ;
292 | 145 [label="gini = 0.0\nsamples = 6\nvalue = [0, 6]"] ;
293 | 144 -> 145 ;
294 | 146 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
295 | 144 -> 146 ;
296 | 147 [label="X[15] <= 1.3555\ngini = 0.3336\nsamples = 52\nvalue = [41, 11]"] ;
297 | 143 -> 147 ;
298 | 148 [label="X[13] <= -0.8878\ngini = 0.2311\nsamples = 45\nvalue = [39, 6]"] ;
299 | 147 -> 148 ;
300 | 149 [label="X[1] <= 2.7349\ngini = 0.375\nsamples = 4\nvalue = [1, 3]"] ;
301 | 148 -> 149 ;
302 | 150 [label="gini = 0.0\nsamples = 3\nvalue = [0, 3]"] ;
303 | 149 -> 150 ;
304 | 151 [label="gini = 0.0\nsamples = 1\nvalue = [1, 0]"] ;
305 | 149 -> 151 ;
306 | 152 [label="X[10] <= 2.2552\ngini = 0.1356\nsamples = 41\nvalue = [38, 3]"] ;
307 | 148 -> 152 ;
308 | 153 [label="X[8] <= 1.4963\ngini = 0.095\nsamples = 40\nvalue = [38, 2]"] ;
309 | 152 -> 153 ;
310 | 154 [label="X[3] <= 0.7676\ngini = 0.0526\nsamples = 37\nvalue = [36, 1]"] ;
311 | 153 -> 154 ;
312 | 155 [label="gini = 0.0\nsamples = 34\nvalue = [34, 0]"] ;
313 | 154 -> 155 ;
314 | 156 [label="X[0] <= 0.141\ngini = 0.4444\nsamples = 3\nvalue = [2, 1]"] ;
315 | 154 -> 156 ;
316 | 157 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
317 | 156 -> 157 ;
318 | 158 [label="gini = 0.0\nsamples = 2\nvalue = [2, 0]"] ;
319 | 156 -> 158 ;
320 | 159 [label="X[9] <= 0.3202\ngini = 0.4444\nsamples = 3\nvalue = [2, 1]"] ;
321 | 153 -> 159 ;
322 | 160 [label="gini = 0.0\nsamples = 2\nvalue = [2, 0]"] ;
323 | 159 -> 160 ;
324 | 161 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
325 | 159 -> 161 ;
326 | 162 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ;
327 | 152 -> 162 ;
328 | 163 [label="X[0] <= 1.1083\ngini = 0.4082\nsamples = 7\nvalue = [2, 5]"] ;
329 | 147 -> 163 ;
330 | 164 [label="gini = 0.0\nsamples = 2\nvalue = [2, 0]"] ;
331 | 163 -> 164 ;
332 | 165 [label="gini = 0.0\nsamples = 5\nvalue = [0, 5]"] ;
333 | 163 -> 165 ;
334 | 166 [label="gini = 0.0\nsamples = 96\nvalue = [96, 0]"] ;
335 | 142 -> 166 ;
336 | }
--------------------------------------------------------------------------------
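The `dt.dot` file above describes a fully grown decision tree over 800 samples and 20 features (`X[0]` through `X[19]`). A sketch of how a file in this format can be produced with scikit-learn's `export_graphviz`; the synthetic dataset is a stand-in assumption, not the course's actual data:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Stand-in data shaped like the tree above: 800 samples, 20 features.
X, y = make_classification(n_samples=800, n_features=20, random_state=1)

clf = DecisionTreeClassifier()
clf.fit(X, y)

# Writes a Graphviz description in the same format as dt.dot;
# render it afterwards with the `dot` tool as shown earlier.
export_graphviz(clf, out_file='dt.dot')
```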
--------------------------------------------------------------------------------
/notebooks/images/dt.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ParrotPrediction/docker-course-xgboost/49f8de97cbc1695dcbeb09391e2662dbedf30ee1/notebooks/images/dt.png
--------------------------------------------------------------------------------
/notebooks/images/gbc-t1.dot:
--------------------------------------------------------------------------------
1 | digraph Tree {
2 | node [shape=box] ;
3 | 0 [label="X[1] <= 1.2335\nfriedman_mse = 0.2385\nsamples = 800\nvalue = 0.0"] ;
4 | 1 [label="friedman_mse = 0.2274\nsamples = 537\nvalue = 0.4502"] ;
5 | 0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
6 | 2 [label="friedman_mse = 0.183\nsamples = 263\nvalue = -0.9192"] ;
7 | 0 -> 2 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
8 | }
--------------------------------------------------------------------------------
/notebooks/images/gbc-t1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ParrotPrediction/docker-course-xgboost/49f8de97cbc1695dcbeb09391e2662dbedf30ee1/notebooks/images/gbc-t1.png
--------------------------------------------------------------------------------
/notebooks/images/practical_xgboost_in_python_notebook_header.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ParrotPrediction/docker-course-xgboost/49f8de97cbc1695dcbeb09391e2662dbedf30ee1/notebooks/images/practical_xgboost_in_python_notebook_header.png
--------------------------------------------------------------------------------
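Note the contrast between `ada-t1.dot` and `gbc-t1.dot` above: both are single-split stumps on the same root condition (`X[1] <= 1.2335`), but the AdaBoost tree stores weighted class fractions in `value`, while the gradient boosting tree is a regression tree scored with `friedman_mse`. A sketch of exporting such first-stage trees; the dataset, ensemble sizes, and depths are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.tree import export_graphviz

X, y = make_classification(n_samples=800, n_features=20, random_state=1)

# AdaBoost's default base estimator is already a depth-1 stump.
ada = AdaBoostClassifier(n_estimators=10).fit(X, y)
gbc = GradientBoostingClassifier(n_estimators=10, max_depth=1).fit(X, y)

# First tree of each ensemble; gbc.estimators_ is a 2-D array with one
# regression tree per boosting stage and per class column.
export_graphviz(ada.estimators_[0], out_file='ada-t1.dot')
export_graphviz(gbc.estimators_[0][0], out_file='gbc-t1.dot')
```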
--------------------------------------------------------------------------------
/notebooks/images/underfitting_overfitting.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ParrotPrediction/docker-course-xgboost/49f8de97cbc1695dcbeb09391e2662dbedf30ee1/notebooks/images/underfitting_overfitting.png
--------------------------------------------------------------------------------