├── .github
│   └── FUNDING.yml
├── .gitignore
├── 01_machine_learning_intro.ipynb
├── 02_machine_learning_setup.ipynb
├── 03_getting_started_with_iris.ipynb
├── 04_model_training.ipynb
├── 05_model_evaluation.ipynb
├── 06_linear_regression.ipynb
├── 07_cross_validation.ipynb
├── 08_grid_search.ipynb
├── 09_classification_metrics.ipynb
├── 10_categorical_features.ipynb
├── README.md
├── data
│   ├── Advertising.csv
│   └── pima-indians-diabetes.data
├── images
│   ├── 01_clustering.png
│   ├── 01_robot.png
│   ├── 01_spam_filter.png
│   ├── 01_supervised_learning.png
│   ├── 02_ipython_header.png
│   ├── 02_jupyter_logo.svg
│   ├── 02_sklearn_algorithms.png
│   ├── 02_sklearn_logo.png
│   ├── 03_iris.png
│   ├── 04_1nn_map.png
│   ├── 04_5nn_map.png
│   ├── 04_knn_dataset.png
│   ├── 05_overfitting.png
│   ├── 05_train_test_split.png
│   ├── 07_cross_validation_diagram.png
│   ├── 09_confusion_matrix_1.png
│   ├── 09_confusion_matrix_2.png
│   └── youtube.png
└── styles
    └── custom.css

/.github/FUNDING.yml:
--------------------------------------------------------------------------------
1 | patreon: dataschool
2 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints/
2 | *.pyc
3 | v3/
4 | .DS_Store
5 | extras/
6 |
--------------------------------------------------------------------------------
/01_machine_learning_intro.ipynb:
--------------------------------------------------------------------------------
1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# What is Machine Learning, and how does it work? ([video #1](https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=1))\n", 8 | "\n", 9 | "Created by [Data School](https://www.dataschool.io). Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). 
Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos)." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "![Machine Learning](images/01_robot.png)" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "## Agenda\n", 24 | "\n", 25 | "- What is Machine Learning?\n", 26 | "- What are the two main categories of Machine Learning?\n", 27 | "- What are some examples of Machine Learning?\n", 28 | "- How does Machine Learning \"work\"?" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "## What is Machine Learning?\n", 36 | "\n", 37 | "One definition: \"Machine Learning is the semi-automated extraction of knowledge from data\"\n", 38 | "\n", 39 | "- **Knowledge from data**: Starts with a question that might be answerable using data\n", 40 | "- **Automated extraction**: A computer provides the insight\n", 41 | "- **Semi-automated**: Requires many smart decisions by a human" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "## What are the two main categories of Machine Learning?\n", 49 | "\n", 50 | "**Supervised learning**: Making predictions using data\n", 51 | " \n", 52 | "- Example: Is a given email \"spam\" or \"ham\"?\n", 53 | "- There is an outcome we are trying to predict" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "![Spam filter](images/01_spam_filter.png)" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "**Unsupervised learning**: Extracting structure from data\n", 68 | "\n", 69 | "- Example: Segment grocery store shoppers into clusters that exhibit similar behaviors\n", 70 | "- There is no \"right answer\"" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "![Clustering](images/01_clustering.png)" 78 | ] 79 | }, 
80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "## How does Machine Learning \"work\"?\n", 85 | "\n", 86 | "High-level steps of supervised learning:\n", 87 | "\n", 88 | "1. First, train a **Machine Learning model** using **labeled data**\n", 89 | "\n", 90 | " - \"Labeled data\" has been labeled with the outcome\n", 91 | " - \"Machine Learning model\" learns the relationship between the attributes of the data and its outcome\n", 92 | "\n", 93 | "2. Then, make **predictions** on **new data** for which the label is unknown" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "![Supervised learning](images/01_supervised_learning.png)" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "The primary goal of supervised learning is to build a model that \"generalizes\": It accurately predicts the **future** rather than the **past**!" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "## Questions about Machine Learning\n", 115 | "\n", 116 | "- How do I choose **which attributes** of my data to include in the model?\n", 117 | "- How do I choose **which model** to use?\n", 118 | "- How do I **optimize** this model for best performance?\n", 119 | "- How do I ensure that I'm building a model that will **generalize** to unseen data?\n", 120 | "- Can I **estimate** how well my model is likely to perform on unseen data?" 
121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": {}, 126 | "source": [ 127 | "## Resources\n", 128 | "\n", 129 | "- Book: [An Introduction to Statistical Learning](https://www.statlearning.com/) (section 2.1, 14 pages)\n", 130 | "- Video: [Learning Paradigms](https://www.youtube.com/watch?v=mbyG85GZ0PI&t=2162s) (13 minutes, starting at 36:02)" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "## Comments or Questions?\n", 138 | "\n", 139 | "- Email: \n", 140 | "- Website: https://www.dataschool.io\n", 141 | "- Twitter: [@justmarkham](https://twitter.com/justmarkham)\n", 142 | "\n", 143 | "© 2021 [Data School](https://www.dataschool.io). All rights reserved." 144 | ] 145 | } 146 | ], 147 | "metadata": { 148 | "kernelspec": { 149 | "display_name": "Python 3", 150 | "language": "python", 151 | "name": "python3" 152 | }, 153 | "language_info": { 154 | "codemirror_mode": { 155 | "name": "ipython", 156 | "version": 3 157 | }, 158 | "file_extension": ".py", 159 | "mimetype": "text/x-python", 160 | "name": "python", 161 | "nbconvert_exporter": "python", 162 | "pygments_lexer": "ipython3", 163 | "version": "3.9.4" 164 | } 165 | }, 166 | "nbformat": 4, 167 | "nbformat_minor": 1 168 | } 169 | -------------------------------------------------------------------------------- /02_machine_learning_setup.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Setting up Python for Machine Learning: scikit-learn and Jupyter Notebook ([video #2](https://www.youtube.com/watch?v=IsXXlYVBt1M&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=2))\n", 8 | "\n", 9 | "Created by [Data School](https://www.dataschool.io). Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). 
Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).\n", 10 | "\n", 11 | "**Note:** Since the video recording, the official name of the \"IPython Notebook\" was changed to \"Jupyter Notebook\". However, the functionality is the same." 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Agenda\n", 19 | "\n", 20 | "- What are the benefits and drawbacks of scikit-learn?\n", 21 | "- How do I install scikit-learn?\n", 22 | "- How do I use the Jupyter Notebook?\n", 23 | "- What are some good resources for learning Python?" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "![scikit-learn algorithm map](images/02_sklearn_algorithms.png)" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "## Benefits and drawbacks of scikit-learn\n", 38 | "\n", 39 | "### Benefits:\n", 40 | "\n", 41 | "- **Consistent interface** to Machine Learning models\n", 42 | "- Provides many **tuning parameters** but with **sensible defaults**\n", 43 | "- Exceptional **documentation**\n", 44 | "- Rich set of functionality for **companion tasks**\n", 45 | "- **Active community** for development and support\n", 46 | "\n", 47 | "### Potential drawbacks:\n", 48 | "\n", 49 | "- Harder (than R) to **get started with Machine Learning**\n", 50 | "- Less emphasis (than R) on **model interpretability**\n", 51 | "\n", 52 | "### Further reading:\n", 53 | "\n", 54 | "- Ben Lorica: [Six reasons why I recommend scikit-learn](https://www.oreilly.com/content/six-reasons-why-i-recommend-scikit-learn/)\n", 55 | "- scikit-learn authors: [API design for machine learning software](https://arxiv.org/pdf/1309.0238v1.pdf)\n", 56 | "- Data School: [Should you teach Python or R for data science?](https://www.dataschool.io/python-or-r-for-data-science/)" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | 
"![scikit-learn logo](images/02_sklearn_logo.png)" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "## Installing scikit-learn\n", 71 | "\n", 72 | "**Option 1:** [Install scikit-learn library](https://scikit-learn.org/stable/install.html) and dependencies (NumPy and SciPy)\n", 73 | "\n", 74 | "**Option 2:** [Install Anaconda distribution](https://www.anaconda.com/products/individual) of Python, which includes:\n", 75 | "\n", 76 | "- Hundreds of useful packages (including scikit-learn)\n", 77 | "- IPython and Jupyter Notebook\n", 78 | "- conda package manager\n", 79 | "- Spyder IDE" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "![Jupyter logo](images/02_jupyter_logo.svg)" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "## Using the Jupyter Notebook\n", 94 | "\n", 95 | "### Components:\n", 96 | "\n", 97 | "- **IPython interpreter:** enhanced version of the standard Python interpreter\n", 98 | "- **Browser-based notebook interface:** weave together code, formatted text, and plots\n", 99 | "\n", 100 | "### Installation:\n", 101 | "\n", 102 | "- **Option 1:** [Install the Jupyter notebook](https://jupyter.readthedocs.io/en/latest/install.html) (includes IPython)\n", 103 | "- **Option 2:** Included with the Anaconda distribution\n", 104 | "\n", 105 | "### Launching the Notebook:\n", 106 | "\n", 107 | "- Type **jupyter notebook** at the command line to open the dashboard\n", 108 | "- Don't close the command line window while the Notebook is running\n", 109 | "\n", 110 | "### Keyboard shortcuts:\n", 111 | "\n", 112 | "**Command mode** (gray border)\n", 113 | "\n", 114 | "- Create new cells above (**a**) or below (**b**) the current cell\n", 115 | "- Navigate using the **up arrow** and **down arrow**\n", 116 | "- Convert the cell type to Markdown (**m**) or code (**y**)\n", 117 | "- See keyboard shortcuts using **h**\n", 118 | "- 
Switch to Edit mode using **Enter**\n", 119 | "\n", 120 | "**Edit mode** (green border)\n", 121 | "\n", 122 | "- **Ctrl+Enter** to run a cell\n", 123 | "- Switch to Command mode using **Esc**\n", 124 | "\n", 125 | "### IPython, Jupyter, and Markdown resources:\n", 126 | "\n", 127 | "- [nbviewer](https://nbviewer.jupyter.org/): view notebooks online as static documents\n", 128 | "- [IPython documentation](https://ipython.readthedocs.io/en/stable/)\n", 129 | "- [Jupyter Notebook quickstart](https://jupyter.readthedocs.io/en/latest/content-quickstart.html)\n", 130 | "- [GitHub's Mastering Markdown](https://guides.github.com/features/mastering-markdown/): short guide with lots of examples" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "## Resources for learning Python\n", 138 | "\n", 139 | "- [Codecademy's Python course](https://www.codecademy.com/learn/learn-python): browser-based, tons of exercises\n", 140 | "- [DataQuest](https://www.dataquest.io/): browser-based, teaches Python in the context of data science\n", 141 | "- [Google's Python class](https://developers.google.com/edu/python/): slightly more advanced, includes videos and downloadable exercises (with solutions)\n", 142 | "- [Python for Everybody](https://www.py4e.com/): beginner-oriented book, includes slides and videos" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "## Comments or Questions?\n", 150 | "\n", 151 | "- Email: \n", 152 | "- Website: https://www.dataschool.io\n", 153 | "- Twitter: [@justmarkham](https://twitter.com/justmarkham)\n", 154 | "\n", 155 | "© 2021 [Data School](https://www.dataschool.io). All rights reserved." 
156 | ] 157 | } 158 | ], 159 | "metadata": { 160 | "kernelspec": { 161 | "display_name": "Python 3", 162 | "language": "python", 163 | "name": "python3" 164 | }, 165 | "language_info": { 166 | "codemirror_mode": { 167 | "name": "ipython", 168 | "version": 3 169 | }, 170 | "file_extension": ".py", 171 | "mimetype": "text/x-python", 172 | "name": "python", 173 | "nbconvert_exporter": "python", 174 | "pygments_lexer": "ipython3", 175 | "version": "3.9.4" 176 | } 177 | }, 178 | "nbformat": 4, 179 | "nbformat_minor": 1 180 | } 181 | -------------------------------------------------------------------------------- /03_getting_started_with_iris.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Getting started in scikit-learn with the famous iris dataset ([video #3](https://www.youtube.com/watch?v=hd1W4CyPX58&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=3))\n", 8 | "\n", 9 | "Created by [Data School](https://www.dataschool.io). Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).\n", 10 | "\n", 11 | "**Note:** This notebook uses Python 3.9.1 and scikit-learn 0.23.2. The original notebook (shown in the video) used Python 2.7 and scikit-learn 0.16." 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Agenda\n", 19 | "\n", 20 | "- What is the famous iris dataset, and how does it relate to Machine Learning?\n", 21 | "- How do we load the iris dataset into scikit-learn?\n", 22 | "- How do we describe a dataset using Machine Learning terminology?\n", 23 | "- What are scikit-learn's four key requirements for working with data?" 
24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "## Introducing the iris dataset" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "![Iris](images/03_iris.png)" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "- 50 samples of 3 different species of iris (150 samples total)\n", 45 | "- Measurements: sepal length, sepal width, petal length, petal width" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 1, 51 | "metadata": {}, 52 | "outputs": [], 53 | "source": [ 54 | "# added empty cell so that the cell numbering matches the video" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 2, 60 | "metadata": { 61 | "scrolled": false 62 | }, 63 | "outputs": [ 64 | { 65 | "data": { 66 | "text/html": [ 67 | "\n", 68 | " \n", 75 | " " 76 | ], 77 | "text/plain": [ 78 | "" 79 | ] 80 | }, 81 | "execution_count": 2, 82 | "metadata": {}, 83 | "output_type": "execute_result" 84 | } 85 | ], 86 | "source": [ 87 | "from IPython.display import IFrame\n", 88 | "IFrame('https://www.dataschool.io/files/iris.txt', width=300, height=200)" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "## Machine Learning on the iris dataset\n", 96 | "\n", 97 | "- Framed as a **supervised learning** problem: Predict the species of an iris using the measurements\n", 98 | "- Famous dataset for Machine Learning because prediction is **easy**\n", 99 | "- Learn more about the iris dataset: [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/datasets/Iris)" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "## Loading the iris dataset into scikit-learn" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 3, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [ 115 | "# import load_iris function 
from datasets module\n", 116 | "from sklearn.datasets import load_iris" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": 4, 122 | "metadata": {}, 123 | "outputs": [ 124 | { 125 | "data": { 126 | "text/plain": [ 127 | "sklearn.utils.Bunch" 128 | ] 129 | }, 130 | "execution_count": 4, 131 | "metadata": {}, 132 | "output_type": "execute_result" 133 | } 134 | ], 135 | "source": [ 136 | "# save \"bunch\" object containing iris dataset and its attributes\n", 137 | "iris = load_iris()\n", 138 | "type(iris)" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 5, 144 | "metadata": { 145 | "scrolled": true 146 | }, 147 | "outputs": [ 148 | { 149 | "name": "stdout", 150 | "output_type": "stream", 151 | "text": [ 152 | "[[5.1 3.5 1.4 0.2]\n", 153 | " [4.9 3. 1.4 0.2]\n", 154 | " [4.7 3.2 1.3 0.2]\n", 155 | " [4.6 3.1 1.5 0.2]\n", 156 | " [5. 3.6 1.4 0.2]\n", 157 | " [5.4 3.9 1.7 0.4]\n", 158 | " [4.6 3.4 1.4 0.3]\n", 159 | " [5. 3.4 1.5 0.2]\n", 160 | " [4.4 2.9 1.4 0.2]\n", 161 | " [4.9 3.1 1.5 0.1]\n", 162 | " [5.4 3.7 1.5 0.2]\n", 163 | " [4.8 3.4 1.6 0.2]\n", 164 | " [4.8 3. 1.4 0.1]\n", 165 | " [4.3 3. 1.1 0.1]\n", 166 | " [5.8 4. 1.2 0.2]\n", 167 | " [5.7 4.4 1.5 0.4]\n", 168 | " [5.4 3.9 1.3 0.4]\n", 169 | " [5.1 3.5 1.4 0.3]\n", 170 | " [5.7 3.8 1.7 0.3]\n", 171 | " [5.1 3.8 1.5 0.3]\n", 172 | " [5.4 3.4 1.7 0.2]\n", 173 | " [5.1 3.7 1.5 0.4]\n", 174 | " [4.6 3.6 1. 0.2]\n", 175 | " [5.1 3.3 1.7 0.5]\n", 176 | " [4.8 3.4 1.9 0.2]\n", 177 | " [5. 3. 1.6 0.2]\n", 178 | " [5. 3.4 1.6 0.4]\n", 179 | " [5.2 3.5 1.5 0.2]\n", 180 | " [5.2 3.4 1.4 0.2]\n", 181 | " [4.7 3.2 1.6 0.2]\n", 182 | " [4.8 3.1 1.6 0.2]\n", 183 | " [5.4 3.4 1.5 0.4]\n", 184 | " [5.2 4.1 1.5 0.1]\n", 185 | " [5.5 4.2 1.4 0.2]\n", 186 | " [4.9 3.1 1.5 0.2]\n", 187 | " [5. 3.2 1.2 0.2]\n", 188 | " [5.5 3.5 1.3 0.2]\n", 189 | " [4.9 3.6 1.4 0.1]\n", 190 | " [4.4 3. 1.3 0.2]\n", 191 | " [5.1 3.4 1.5 0.2]\n", 192 | " [5. 
3.5 1.3 0.3]\n", 193 | " [4.5 2.3 1.3 0.3]\n", 194 | " [4.4 3.2 1.3 0.2]\n", 195 | " [5. 3.5 1.6 0.6]\n", 196 | " [5.1 3.8 1.9 0.4]\n", 197 | " [4.8 3. 1.4 0.3]\n", 198 | " [5.1 3.8 1.6 0.2]\n", 199 | " [4.6 3.2 1.4 0.2]\n", 200 | " [5.3 3.7 1.5 0.2]\n", 201 | " [5. 3.3 1.4 0.2]\n", 202 | " [7. 3.2 4.7 1.4]\n", 203 | " [6.4 3.2 4.5 1.5]\n", 204 | " [6.9 3.1 4.9 1.5]\n", 205 | " [5.5 2.3 4. 1.3]\n", 206 | " [6.5 2.8 4.6 1.5]\n", 207 | " [5.7 2.8 4.5 1.3]\n", 208 | " [6.3 3.3 4.7 1.6]\n", 209 | " [4.9 2.4 3.3 1. ]\n", 210 | " [6.6 2.9 4.6 1.3]\n", 211 | " [5.2 2.7 3.9 1.4]\n", 212 | " [5. 2. 3.5 1. ]\n", 213 | " [5.9 3. 4.2 1.5]\n", 214 | " [6. 2.2 4. 1. ]\n", 215 | " [6.1 2.9 4.7 1.4]\n", 216 | " [5.6 2.9 3.6 1.3]\n", 217 | " [6.7 3.1 4.4 1.4]\n", 218 | " [5.6 3. 4.5 1.5]\n", 219 | " [5.8 2.7 4.1 1. ]\n", 220 | " [6.2 2.2 4.5 1.5]\n", 221 | " [5.6 2.5 3.9 1.1]\n", 222 | " [5.9 3.2 4.8 1.8]\n", 223 | " [6.1 2.8 4. 1.3]\n", 224 | " [6.3 2.5 4.9 1.5]\n", 225 | " [6.1 2.8 4.7 1.2]\n", 226 | " [6.4 2.9 4.3 1.3]\n", 227 | " [6.6 3. 4.4 1.4]\n", 228 | " [6.8 2.8 4.8 1.4]\n", 229 | " [6.7 3. 5. 1.7]\n", 230 | " [6. 2.9 4.5 1.5]\n", 231 | " [5.7 2.6 3.5 1. ]\n", 232 | " [5.5 2.4 3.8 1.1]\n", 233 | " [5.5 2.4 3.7 1. ]\n", 234 | " [5.8 2.7 3.9 1.2]\n", 235 | " [6. 2.7 5.1 1.6]\n", 236 | " [5.4 3. 4.5 1.5]\n", 237 | " [6. 3.4 4.5 1.6]\n", 238 | " [6.7 3.1 4.7 1.5]\n", 239 | " [6.3 2.3 4.4 1.3]\n", 240 | " [5.6 3. 4.1 1.3]\n", 241 | " [5.5 2.5 4. 1.3]\n", 242 | " [5.5 2.6 4.4 1.2]\n", 243 | " [6.1 3. 4.6 1.4]\n", 244 | " [5.8 2.6 4. 1.2]\n", 245 | " [5. 2.3 3.3 1. ]\n", 246 | " [5.6 2.7 4.2 1.3]\n", 247 | " [5.7 3. 4.2 1.2]\n", 248 | " [5.7 2.9 4.2 1.3]\n", 249 | " [6.2 2.9 4.3 1.3]\n", 250 | " [5.1 2.5 3. 1.1]\n", 251 | " [5.7 2.8 4.1 1.3]\n", 252 | " [6.3 3.3 6. 2.5]\n", 253 | " [5.8 2.7 5.1 1.9]\n", 254 | " [7.1 3. 5.9 2.1]\n", 255 | " [6.3 2.9 5.6 1.8]\n", 256 | " [6.5 3. 5.8 2.2]\n", 257 | " [7.6 3. 
6.6 2.1]\n", 258 | " [4.9 2.5 4.5 1.7]\n", 259 | " [7.3 2.9 6.3 1.8]\n", 260 | " [6.7 2.5 5.8 1.8]\n", 261 | " [7.2 3.6 6.1 2.5]\n", 262 | " [6.5 3.2 5.1 2. ]\n", 263 | " [6.4 2.7 5.3 1.9]\n", 264 | " [6.8 3. 5.5 2.1]\n", 265 | " [5.7 2.5 5. 2. ]\n", 266 | " [5.8 2.8 5.1 2.4]\n", 267 | " [6.4 3.2 5.3 2.3]\n", 268 | " [6.5 3. 5.5 1.8]\n", 269 | " [7.7 3.8 6.7 2.2]\n", 270 | " [7.7 2.6 6.9 2.3]\n", 271 | " [6. 2.2 5. 1.5]\n", 272 | " [6.9 3.2 5.7 2.3]\n", 273 | " [5.6 2.8 4.9 2. ]\n", 274 | " [7.7 2.8 6.7 2. ]\n", 275 | " [6.3 2.7 4.9 1.8]\n", 276 | " [6.7 3.3 5.7 2.1]\n", 277 | " [7.2 3.2 6. 1.8]\n", 278 | " [6.2 2.8 4.8 1.8]\n", 279 | " [6.1 3. 4.9 1.8]\n", 280 | " [6.4 2.8 5.6 2.1]\n", 281 | " [7.2 3. 5.8 1.6]\n", 282 | " [7.4 2.8 6.1 1.9]\n", 283 | " [7.9 3.8 6.4 2. ]\n", 284 | " [6.4 2.8 5.6 2.2]\n", 285 | " [6.3 2.8 5.1 1.5]\n", 286 | " [6.1 2.6 5.6 1.4]\n", 287 | " [7.7 3. 6.1 2.3]\n", 288 | " [6.3 3.4 5.6 2.4]\n", 289 | " [6.4 3.1 5.5 1.8]\n", 290 | " [6. 3. 4.8 1.8]\n", 291 | " [6.9 3.1 5.4 2.1]\n", 292 | " [6.7 3.1 5.6 2.4]\n", 293 | " [6.9 3.1 5.1 2.3]\n", 294 | " [5.8 2.7 5.1 1.9]\n", 295 | " [6.8 3.2 5.9 2.3]\n", 296 | " [6.7 3.3 5.7 2.5]\n", 297 | " [6.7 3. 5.2 2.3]\n", 298 | " [6.3 2.5 5. 1.9]\n", 299 | " [6.5 3. 5.2 2. ]\n", 300 | " [6.2 3.4 5.4 2.3]\n", 301 | " [5.9 3. 
5.1 1.8]]\n" 302 | ] 303 | } 304 | ], 305 | "source": [ 306 | "# print the iris data\n", 307 | "print(iris.data)" 308 | ] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": {}, 313 | "source": [ 314 | "## Machine Learning terminology\n", 315 | "\n", 316 | "- Each row is an **observation** (also known as: sample, example, instance, record)\n", 317 | "- Each column is a **feature** (also known as: predictor, attribute, independent variable, input, regressor, covariate)" 318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": 6, 323 | "metadata": {}, 324 | "outputs": [ 325 | { 326 | "name": "stdout", 327 | "output_type": "stream", 328 | "text": [ 329 | "['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']\n" 330 | ] 331 | } 332 | ], 333 | "source": [ 334 | "# print the names of the four features\n", 335 | "print(iris.feature_names)" 336 | ] 337 | }, 338 | { 339 | "cell_type": "code", 340 | "execution_count": 7, 341 | "metadata": {}, 342 | "outputs": [ 343 | { 344 | "name": "stdout", 345 | "output_type": "stream", 346 | "text": [ 347 | "[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", 348 | " 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n", 349 | " 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2\n", 350 | " 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2\n", 351 | " 2 2]\n" 352 | ] 353 | } 354 | ], 355 | "source": [ 356 | "# print integers representing the species of each observation\n", 357 | "print(iris.target)" 358 | ] 359 | }, 360 | { 361 | "cell_type": "code", 362 | "execution_count": 8, 363 | "metadata": {}, 364 | "outputs": [ 365 | { 366 | "name": "stdout", 367 | "output_type": "stream", 368 | "text": [ 369 | "['setosa' 'versicolor' 'virginica']\n" 370 | ] 371 | } 372 | ], 373 | "source": [ 374 | "# print the encoding scheme for species: 0 = setosa, 1 = versicolor, 2 = virginica\n", 375 | 
"print(iris.target_names)" 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "metadata": {}, 381 | "source": [ 382 | "- Each value we are predicting is the **response** (also known as: target, outcome, label, dependent variable)\n", 383 | "- **Classification** is supervised learning in which the response is categorical\n", 384 | "- **Regression** is supervised learning in which the response is ordered and continuous" 385 | ] 386 | }, 387 | { 388 | "cell_type": "markdown", 389 | "metadata": {}, 390 | "source": [ 391 | "## Requirements for working with data in scikit-learn\n", 392 | "\n", 393 | "1. Features and response are **separate objects**\n", 394 | "2. Features should always be **numeric**, and response should be **numeric** for regression problems\n", 395 | "3. Features and response should be **NumPy arrays**\n", 396 | "4. Features and response should have **specific shapes**" 397 | ] 398 | }, 399 | { 400 | "cell_type": "code", 401 | "execution_count": 9, 402 | "metadata": {}, 403 | "outputs": [ 404 | { 405 | "name": "stdout", 406 | "output_type": "stream", 407 | "text": [ 408 | "<class 'numpy.ndarray'>\n", 409 | "<class 'numpy.ndarray'>\n" 410 | ] 411 | } 412 | ], 413 | "source": [ 414 | "# check the types of the features and response\n", 415 | "print(type(iris.data))\n", 416 | "print(type(iris.target))" 417 | ] 418 | }, 419 | { 420 | "cell_type": "code", 421 | "execution_count": 10, 422 | "metadata": {}, 423 | "outputs": [ 424 | { 425 | "name": "stdout", 426 | "output_type": "stream", 427 | "text": [ 428 | "(150, 4)\n" 429 | ] 430 | } 431 | ], 432 | "source": [ 433 | "# check the shape of the features (first dimension = number of observations, second dimension = number of features)\n", 434 | "print(iris.data.shape)" 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": 11, 440 | "metadata": {}, 441 | "outputs": [ 442 | { 443 | "name": "stdout", 444 | "output_type": "stream", 445 | "text": [ 446 | "(150,)\n" 447 | ] 448 | } 449 | ], 450 | 
"source": [ 451 | "# check the 
shape of the response (single dimension matching the number of observations)\n", 452 | "print(iris.target.shape)" 453 | ] 454 | }, 455 | { 456 | "cell_type": "code", 457 | "execution_count": 12, 458 | "metadata": {}, 459 | "outputs": [], 460 | "source": [ 461 | "# store feature matrix in \"X\"\n", 462 | "X = iris.data\n", 463 | "\n", 464 | "# store response vector in \"y\"\n", 465 | "y = iris.target" 466 | ] 467 | }, 468 | { 469 | "cell_type": "markdown", 470 | "metadata": {}, 471 | "source": [ 472 | "## Resources\n", 473 | "\n", 474 | "- scikit-learn documentation: [Dataset loading utilities](https://scikit-learn.org/stable/datasets.html)\n", 475 | "- Jake VanderPlas: Fast Numerical Computing with NumPy ([slides](https://speakerdeck.com/jakevdp/losing-your-loops-fast-numerical-computing-with-numpy-pycon-2015), [video](https://www.youtube.com/watch?v=EEUXKG97YRw))\n", 476 | "- Scott Shell: [An Introduction to NumPy](https://sites.engineering.ucsb.edu/~shell/che210d/numpy.pdf) (PDF)" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "metadata": {}, 482 | "source": [ 483 | "## Comments or Questions?\n", 484 | "\n", 485 | "- Email: \n", 486 | "- Website: https://www.dataschool.io\n", 487 | "- Twitter: [@justmarkham](https://twitter.com/justmarkham)\n", 488 | "\n", 489 | "© 2021 [Data School](https://www.dataschool.io). All rights reserved." 
490 | ] 491 | } 492 | ], 493 | "metadata": { 494 | "kernelspec": { 495 | "display_name": "Python 3", 496 | "language": "python", 497 | "name": "python3" 498 | }, 499 | "language_info": { 500 | "codemirror_mode": { 501 | "name": "ipython", 502 | "version": 3 503 | }, 504 | "file_extension": ".py", 505 | "mimetype": "text/x-python", 506 | "name": "python", 507 | "nbconvert_exporter": "python", 508 | "pygments_lexer": "ipython3", 509 | "version": "3.9.4" 510 | } 511 | }, 512 | "nbformat": 4, 513 | "nbformat_minor": 1 514 | } 515 | -------------------------------------------------------------------------------- /04_model_training.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Training a Machine Learning model with scikit-learn ([video #4](https://www.youtube.com/watch?v=RlQuVL6-qe8&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=4))\n", 8 | "\n", 9 | "Created by [Data School](https://www.dataschool.io). Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).\n", 10 | "\n", 11 | "**Note:** This notebook uses Python 3.9.1 and scikit-learn 0.23.2. The original notebook (shown in the video) used Python 2.7 and scikit-learn 0.16." 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Agenda\n", 19 | "\n", 20 | "- What is the **K-nearest neighbors** classification model?\n", 21 | "- What are the four steps for **model training and prediction** in scikit-learn?\n", 22 | "- How can I apply this pattern to **other Machine Learning models**?" 
23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "## Reviewing the iris dataset" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 1, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# added empty cell so that the cell numbering matches the video" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 2, 44 | "metadata": {}, 45 | "outputs": [ 46 | { 47 | "data": { 48 | "text/html": [ 49 | "\n", 50 | " \n", 57 | " " 58 | ], 59 | "text/plain": [ 60 | "" 61 | ] 62 | }, 63 | "execution_count": 2, 64 | "metadata": {}, 65 | "output_type": "execute_result" 66 | } 67 | ], 68 | "source": [ 69 | "from IPython.display import IFrame\n", 70 | "IFrame('https://www.dataschool.io/files/iris.txt', width=300, height=200)" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "- 150 **observations**\n", 78 | "- 4 **features** (sepal length, sepal width, petal length, petal width)\n", 79 | "- **Response** variable is the iris species\n", 80 | "- **Classification** problem since response is categorical\n", 81 | "- More information in the [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/datasets/Iris)" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "## K-nearest neighbors (KNN) classification" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "1. Pick a value for K.\n", 96 | "2. Search for the K observations in the training data that are \"nearest\" to the measurements of the unknown iris.\n", 97 | "3. Use the most popular response value from the K nearest neighbors as the predicted response value for the unknown iris." 
98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "### Example training data\n", 105 | "\n", 106 | "![Training data](images/04_knn_dataset.png)" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "### KNN classification map (K=1)\n", 114 | "\n", 115 | "![1NN classification map](images/04_1nn_map.png)" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "### KNN classification map (K=5)\n", 123 | "\n", 124 | "![5NN classification map](images/04_5nn_map.png)" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "*Image Credits: [Data3classes](https://commons.wikimedia.org/wiki/File:Data3classes.png#/media/File:Data3classes.png), [Map1NN](https://commons.wikimedia.org/wiki/File:Map1NN.png#/media/File:Map1NN.png), [Map5NN](https://commons.wikimedia.org/wiki/File:Map5NN.png#/media/File:Map5NN.png) by Agor153. 
Licensed under CC BY-SA 3.0*" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": {}, 137 | "source": [ 138 | "## Loading the data" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 3, 144 | "metadata": {}, 145 | "outputs": [], 146 | "source": [ 147 | "# import load_iris function from datasets module\n", 148 | "from sklearn.datasets import load_iris\n", 149 | "\n", 150 | "# save \"bunch\" object containing iris dataset and its attributes\n", 151 | "iris = load_iris()\n", 152 | "\n", 153 | "# store feature matrix in \"X\"\n", 154 | "X = iris.data\n", 155 | "\n", 156 | "# store response vector in \"y\"\n", 157 | "y = iris.target" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": 4, 163 | "metadata": {}, 164 | "outputs": [ 165 | { 166 | "name": "stdout", 167 | "output_type": "stream", 168 | "text": [ 169 | "(150, 4)\n", 170 | "(150,)\n" 171 | ] 172 | } 173 | ], 174 | "source": [ 175 | "# print the shapes of X and y\n", 176 | "print(X.shape)\n", 177 | "print(y.shape)" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "## scikit-learn 4-step modeling pattern" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "**Step 1:** Import the class you plan to use" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": 5, 197 | "metadata": {}, 198 | "outputs": [], 199 | "source": [ 200 | "from sklearn.neighbors import KNeighborsClassifier" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "**Step 2:** \"Instantiate\" the \"estimator\"\n", 208 | "\n", 209 | "- \"Estimator\" is scikit-learn's term for model\n", 210 | "- \"Instantiate\" means \"make an instance of\"" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": 6, 216 | "metadata": {}, 217 | "outputs": [], 218 | "source": [ 219 | "knn = 
KNeighborsClassifier(n_neighbors=1)" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "- Name of the object does not matter\n", 227 | "- Can specify tuning parameters (aka \"hyperparameters\") during this step\n", 228 | "- All parameters not specified are set to their defaults" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": 7, 234 | "metadata": {}, 235 | "outputs": [ 236 | { 237 | "name": "stdout", 238 | "output_type": "stream", 239 | "text": [ 240 | "KNeighborsClassifier(n_neighbors=1)\n" 241 | ] 242 | } 243 | ], 244 | "source": [ 245 | "print(knn)" 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": {}, 251 | "source": [ 252 | "**Step 3:** Fit the model with data (aka \"model training\")\n", 253 | "\n", 254 | "- Model is learning the relationship between X and y\n", 255 | "- Occurs in-place" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": 8, 261 | "metadata": {}, 262 | "outputs": [ 263 | { 264 | "data": { 265 | "text/plain": [ 266 | "KNeighborsClassifier(n_neighbors=1)" 267 | ] 268 | }, 269 | "execution_count": 8, 270 | "metadata": {}, 271 | "output_type": "execute_result" 272 | } 273 | ], 274 | "source": [ 275 | "knn.fit(X, y)" 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "**Step 4:** Predict the response for a new observation\n", 283 | "\n", 284 | "- New observations are called \"out-of-sample\" data\n", 285 | "- Uses the information it learned during the model training process" 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": 9, 291 | "metadata": {}, 292 | "outputs": [ 293 | { 294 | "data": { 295 | "text/plain": [ 296 | "array([2])" 297 | ] 298 | }, 299 | "execution_count": 9, 300 | "metadata": {}, 301 | "output_type": "execute_result" 302 | } 303 | ], 304 | "source": [ 305 | "knn.predict([[3, 5, 4, 2]])" 306 | ] 307 | }, 308 | { 309 | 
"cell_type": "markdown", 310 | "metadata": {}, 311 | "source": [ 312 | "- Returns a NumPy array\n", 313 | "- Can predict for multiple observations at once" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 10, 319 | "metadata": {}, 320 | "outputs": [ 321 | { 322 | "data": { 323 | "text/plain": [ 324 | "array([2, 1])" 325 | ] 326 | }, 327 | "execution_count": 10, 328 | "metadata": {}, 329 | "output_type": "execute_result" 330 | } 331 | ], 332 | "source": [ 333 | "X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]\n", 334 | "knn.predict(X_new)" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": {}, 340 | "source": [ 341 | "## Using a different value for K" 342 | ] 343 | }, 344 | { 345 | "cell_type": "code", 346 | "execution_count": 11, 347 | "metadata": {}, 348 | "outputs": [ 349 | { 350 | "data": { 351 | "text/plain": [ 352 | "array([1, 1])" 353 | ] 354 | }, 355 | "execution_count": 11, 356 | "metadata": {}, 357 | "output_type": "execute_result" 358 | } 359 | ], 360 | "source": [ 361 | "# instantiate the model (using the value K=5)\n", 362 | "knn = KNeighborsClassifier(n_neighbors=5)\n", 363 | "\n", 364 | "# fit the model with data\n", 365 | "knn.fit(X, y)\n", 366 | "\n", 367 | "# predict the response for new observations\n", 368 | "knn.predict(X_new)" 369 | ] 370 | }, 371 | { 372 | "cell_type": "markdown", 373 | "metadata": {}, 374 | "source": [ 375 | "## Using a different classification model" 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": 12, 381 | "metadata": {}, 382 | "outputs": [ 383 | { 384 | "data": { 385 | "text/plain": [ 386 | "array([2, 0])" 387 | ] 388 | }, 389 | "execution_count": 12, 390 | "metadata": {}, 391 | "output_type": "execute_result" 392 | } 393 | ], 394 | "source": [ 395 | "# import the class\n", 396 | "from sklearn.linear_model import LogisticRegression\n", 397 | "\n", 398 | "# instantiate the model\n", 399 | "logreg = LogisticRegression(solver='liblinear')\n", 400 | "\n", 401 
| "# fit the model with data\n", 402 | "logreg.fit(X, y)\n", 403 | "\n", 404 | "# predict the response for new observations\n", 405 | "logreg.predict(X_new)" 406 | ] 407 | }, 408 | { 409 | "cell_type": "markdown", 410 | "metadata": {}, 411 | "source": [ 412 | "## Resources\n", 413 | "\n", 414 | "- [Nearest Neighbors](https://scikit-learn.org/stable/modules/neighbors.html) (user guide), [KNeighborsClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html) (class documentation)\n", 415 | "- [Logistic Regression](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression) (user guide), [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) (class documentation)\n", 416 | "- [Videos from An Introduction to Statistical Learning](https://www.dataschool.io/15-hours-of-expert-machine-learning-videos/)\n", 417 | " - Classification Problems and K-Nearest Neighbors (Chapter 2)\n", 418 | " - Introduction to Classification (Chapter 4)\n", 419 | " - Logistic Regression and Maximum Likelihood (Chapter 4)" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": {}, 425 | "source": [ 426 | "## Comments or Questions?\n", 427 | "\n", 428 | "- Email: \n", 429 | "- Website: https://www.dataschool.io\n", 430 | "- Twitter: [@justmarkham](https://twitter.com/justmarkham)\n", 431 | "\n", 432 | "© 2021 [Data School](https://www.dataschool.io). All rights reserved." 
433 | ] 434 | } 435 | ], 436 | "metadata": { 437 | "kernelspec": { 438 | "display_name": "Python 3", 439 | "language": "python", 440 | "name": "python3" 441 | }, 442 | "language_info": { 443 | "codemirror_mode": { 444 | "name": "ipython", 445 | "version": 3 446 | }, 447 | "file_extension": ".py", 448 | "mimetype": "text/x-python", 449 | "name": "python", 450 | "nbconvert_exporter": "python", 451 | "pygments_lexer": "ipython3", 452 | "version": "3.9.4" 453 | } 454 | }, 455 | "nbformat": 4, 456 | "nbformat_minor": 1 457 | } 458 | -------------------------------------------------------------------------------- /05_model_evaluation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Comparing Machine Learning models in scikit-learn ([video #5](https://www.youtube.com/watch?v=0pP4EwWJgIU&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=5))\n", 8 | "\n", 9 | "Created by [Data School](https://www.dataschool.io). Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).\n", 10 | "\n", 11 | "**Note:** This notebook uses Python 3.9.1 and scikit-learn 0.23.2. The original notebook (shown in the video) used Python 2.7 and scikit-learn 0.16." 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Agenda\n", 19 | "\n", 20 | "- How do I choose **which model to use** for my supervised learning task?\n", 21 | "- How do I choose the **best tuning parameters** for that model?\n", 22 | "- How do I estimate the **likely performance of my model** on out-of-sample data?" 
23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "## Review\n", 30 | "\n", 31 | "- Classification task: Predicting the species of an unknown iris\n", 32 | "- Used three classification models: KNN (K=1), KNN (K=5), logistic regression\n", 33 | "- Need a way to choose between the models\n", 34 | "\n", 35 | "**Solution:** Model evaluation procedures" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "## Evaluation procedure #1: Train and test on the entire dataset" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "1. Train the model on the **entire dataset**.\n", 50 | "2. Test the model on the **same dataset**, and evaluate how well we did by comparing the **predicted** response values with the **true** response values." 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 1, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "# added empty cell so that the cell numbering matches the video" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 2, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "# read in the iris data\n", 69 | "from sklearn.datasets import load_iris\n", 70 | "iris = load_iris()\n", 71 | "\n", 72 | "# create X (features) and y (response)\n", 73 | "X = iris.data\n", 74 | "y = iris.target" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "### Logistic regression" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 3, 87 | "metadata": {}, 88 | "outputs": [ 89 | { 90 | "data": { 91 | "text/plain": [ 92 | "array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", 93 | " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", 94 | " 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", 95 | " 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 
2, 1, 1,\n", 96 | " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n", 97 | " 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,\n", 98 | " 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])" 99 | ] 100 | }, 101 | "execution_count": 3, 102 | "metadata": {}, 103 | "output_type": "execute_result" 104 | } 105 | ], 106 | "source": [ 107 | "# import the class\n", 108 | "from sklearn.linear_model import LogisticRegression\n", 109 | "\n", 110 | "# instantiate the model\n", 111 | "logreg = LogisticRegression(solver='liblinear')\n", 112 | "\n", 113 | "# fit the model with data\n", 114 | "logreg.fit(X, y)\n", 115 | "\n", 116 | "# predict the response values for the observations in X\n", 117 | "logreg.predict(X)" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": 4, 123 | "metadata": {}, 124 | "outputs": [ 125 | { 126 | "data": { 127 | "text/plain": [ 128 | "150" 129 | ] 130 | }, 131 | "execution_count": 4, 132 | "metadata": {}, 133 | "output_type": "execute_result" 134 | } 135 | ], 136 | "source": [ 137 | "# store the predicted response values\n", 138 | "y_pred = logreg.predict(X)\n", 139 | "\n", 140 | "# check how many predictions were generated\n", 141 | "len(y_pred)" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "Classification accuracy:\n", 149 | "\n", 150 | "- **Proportion** of correct predictions\n", 151 | "- Common **evaluation metric** for classification problems" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 5, 157 | "metadata": {}, 158 | "outputs": [ 159 | { 160 | "name": "stdout", 161 | "output_type": "stream", 162 | "text": [ 163 | "0.96\n" 164 | ] 165 | } 166 | ], 167 | "source": [ 168 | "# compute classification accuracy for the logistic regression model\n", 169 | "from sklearn import metrics\n", 170 | "print(metrics.accuracy_score(y, y_pred))" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | 
"metadata": {}, 176 | "source": [ 177 | "- Known as **training accuracy** when you train and test the model on the same data" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "### KNN (K=5)" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 6, 190 | "metadata": {}, 191 | "outputs": [ 192 | { 193 | "name": "stdout", 194 | "output_type": "stream", 195 | "text": [ 196 | "0.9666666666666667\n" 197 | ] 198 | } 199 | ], 200 | "source": [ 201 | "from sklearn.neighbors import KNeighborsClassifier\n", 202 | "knn = KNeighborsClassifier(n_neighbors=5)\n", 203 | "knn.fit(X, y)\n", 204 | "y_pred = knn.predict(X)\n", 205 | "print(metrics.accuracy_score(y, y_pred))" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "### KNN (K=1)" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 7, 218 | "metadata": {}, 219 | "outputs": [ 220 | { 221 | "name": "stdout", 222 | "output_type": "stream", 223 | "text": [ 224 | "1.0\n" 225 | ] 226 | } 227 | ], 228 | "source": [ 229 | "knn = KNeighborsClassifier(n_neighbors=1)\n", 230 | "knn.fit(X, y)\n", 231 | "y_pred = knn.predict(X)\n", 232 | "print(metrics.accuracy_score(y, y_pred))" 233 | ] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": {}, 238 | "source": [ 239 | "### Problems with training and testing on the same data\n", 240 | "\n", 241 | "- Goal is to estimate likely performance of a model on **out-of-sample data**\n", 242 | "- But, maximizing training accuracy rewards **overly complex models** that won't necessarily generalize\n", 243 | "- Unnecessarily complex models **overfit** the training data" 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": {}, 249 | "source": [ 250 | "![Overfitting](images/05_overfitting.png)" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": {}, 256 | "source": [ 257 | "*Image Credit: 
[Overfitting](https://commons.wikimedia.org/wiki/File:Overfitting.svg#/media/File:Overfitting.svg) by Chabacano. Licensed under GFDL via Wikimedia Commons.*" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "## Evaluation procedure #2: Train/test split" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": {}, 270 | "source": [ 271 | "1. Split the dataset into two pieces: a **training set** and a **testing set**.\n", 272 | "2. Train the model on the **training set**.\n", 273 | "3. Test the model on the **testing set**, and evaluate how well we did." 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": 8, 279 | "metadata": {}, 280 | "outputs": [ 281 | { 282 | "name": "stdout", 283 | "output_type": "stream", 284 | "text": [ 285 | "(150, 4)\n", 286 | "(150,)\n" 287 | ] 288 | } 289 | ], 290 | "source": [ 291 | "# print the shapes of X and y\n", 292 | "print(X.shape)\n", 293 | "print(y.shape)" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": 9, 299 | "metadata": {}, 300 | "outputs": [], 301 | "source": [ 302 | "# STEP 1: split X and y into training and testing sets\n", 303 | "from sklearn.model_selection import train_test_split\n", 304 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=4)" 305 | ] 306 | }, 307 | { 308 | "cell_type": "markdown", 309 | "metadata": {}, 310 | "source": [ 311 | "![Train/test split](images/05_train_test_split.png)" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "What did this accomplish?\n", 319 | "\n", 320 | "- Model can be trained and tested on **different data**\n", 321 | "- Response values are known for the testing set, and thus **predictions can be evaluated**\n", 322 | "- **Testing accuracy** is a better estimate than training accuracy of out-of-sample performance" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | 
"execution_count": 10, 328 | "metadata": {}, 329 | "outputs": [], 330 | "source": [ 331 | "# added empty cell so that the cell numbering matches the video" 332 | ] 333 | }, 334 | { 335 | "cell_type": "code", 336 | "execution_count": 11, 337 | "metadata": {}, 338 | "outputs": [ 339 | { 340 | "name": "stdout", 341 | "output_type": "stream", 342 | "text": [ 343 | "(90, 4)\n", 344 | "(60, 4)\n" 345 | ] 346 | } 347 | ], 348 | "source": [ 349 | "# print the shapes of the new X objects\n", 350 | "print(X_train.shape)\n", 351 | "print(X_test.shape)" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 12, 357 | "metadata": {}, 358 | "outputs": [ 359 | { 360 | "name": "stdout", 361 | "output_type": "stream", 362 | "text": [ 363 | "(90,)\n", 364 | "(60,)\n" 365 | ] 366 | } 367 | ], 368 | "source": [ 369 | "# print the shapes of the new y objects\n", 370 | "print(y_train.shape)\n", 371 | "print(y_test.shape)" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": 13, 377 | "metadata": {}, 378 | "outputs": [ 379 | { 380 | "data": { 381 | "text/plain": [ 382 | "LogisticRegression(solver='liblinear')" 383 | ] 384 | }, 385 | "execution_count": 13, 386 | "metadata": {}, 387 | "output_type": "execute_result" 388 | } 389 | ], 390 | "source": [ 391 | "# STEP 2: train the model on the training set\n", 392 | "logreg = LogisticRegression(solver='liblinear')\n", 393 | "logreg.fit(X_train, y_train)" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": 14, 399 | "metadata": {}, 400 | "outputs": [ 401 | { 402 | "name": "stdout", 403 | "output_type": "stream", 404 | "text": [ 405 | "0.9333333333333333\n" 406 | ] 407 | } 408 | ], 409 | "source": [ 410 | "# STEP 3: make predictions on the testing set\n", 411 | "y_pred = logreg.predict(X_test)\n", 412 | "\n", 413 | "# compare actual response values (y_test) with predicted response values (y_pred)\n", 414 | "print(metrics.accuracy_score(y_test, y_pred))" 415 | ] 416 | }, 417 | 
{ 418 | "cell_type": "markdown", 419 | "metadata": {}, 420 | "source": [ 421 | "Repeat for KNN with K=5:" 422 | ] 423 | }, 424 | { 425 | "cell_type": "code", 426 | "execution_count": 15, 427 | "metadata": {}, 428 | "outputs": [ 429 | { 430 | "name": "stdout", 431 | "output_type": "stream", 432 | "text": [ 433 | "0.9666666666666667\n" 434 | ] 435 | } 436 | ], 437 | "source": [ 438 | "knn = KNeighborsClassifier(n_neighbors=5)\n", 439 | "knn.fit(X_train, y_train)\n", 440 | "y_pred = knn.predict(X_test)\n", 441 | "print(metrics.accuracy_score(y_test, y_pred))" 442 | ] 443 | }, 444 | { 445 | "cell_type": "markdown", 446 | "metadata": {}, 447 | "source": [ 448 | "Repeat for KNN with K=1:" 449 | ] 450 | }, 451 | { 452 | "cell_type": "code", 453 | "execution_count": 16, 454 | "metadata": {}, 455 | "outputs": [ 456 | { 457 | "name": "stdout", 458 | "output_type": "stream", 459 | "text": [ 460 | "0.95\n" 461 | ] 462 | } 463 | ], 464 | "source": [ 465 | "knn = KNeighborsClassifier(n_neighbors=1)\n", 466 | "knn.fit(X_train, y_train)\n", 467 | "y_pred = knn.predict(X_test)\n", 468 | "print(metrics.accuracy_score(y_test, y_pred))" 469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": {}, 474 | "source": [ 475 | "Can we locate an even better value for K?" 
476 | ] 477 | }, 478 | { 479 | "cell_type": "code", 480 | "execution_count": 17, 481 | "metadata": {}, 482 | "outputs": [], 483 | "source": [ 484 | "# try K=1 through K=25 and record testing accuracy\n", 485 | "k_range = list(range(1, 26))\n", 486 | "scores = []\n", 487 | "for k in k_range:\n", 488 | " knn = KNeighborsClassifier(n_neighbors=k)\n", 489 | " knn.fit(X_train, y_train)\n", 490 | " y_pred = knn.predict(X_test)\n", 491 | " scores.append(metrics.accuracy_score(y_test, y_pred))" 492 | ] 493 | }, 494 | { 495 | "cell_type": "code", 496 | "execution_count": 18, 497 | "metadata": {}, 498 | "outputs": [ 499 | { 500 | "data": { 501 | "text/plain": [ 502 | "Text(0, 0.5, 'Testing Accuracy')" 503 | ] 504 | }, 505 | "execution_count": 18, 506 | "metadata": {}, 507 | "output_type": "execute_result" 508 | }, 509 | { 510 | "data": { 511 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEKCAYAAAAFJbKyAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAw1ElEQVR4nO3de5xc9Xnf8c93Z6+j285gSUia4WKiGBSKBV7LsZ3QOIQUHDfYNE6hSYxdXEwTbOPEbSm92EnahNpglyQEIsck0NgmDjFBTYnBJRfq1AWEEXcplrnNSGtJMKPrzN6f/nHOzA6j2d2Z2Tk7uzvP+/Xa1+65zfkdrXae+V2e309mhnPOOVevrnYXwDnn3NLigcM551xDPHA455xriAcO55xzDfHA4ZxzriEeOJxzzjUk0sAh6RJJeyTtlXRDjeMJSfdJelrSY5LOrTj2KUnPSXpW0tck9Yf7Pytpn6Rd4dd7o3wG55xzbxRZ4JAUA24DLgW2AFdK2lJ12o3ALjM7D/gQcGt47SbgE8CQmZ0LxIArKq77opltDb8eiOoZnHPOnSzKGsc2YK+ZvWhmY8A9wGVV52wBHgYws93AGZLWh8e6gQFJ3UAc2B9hWZ1zztWpO8LX3gRkKrazwDuqznkKuBz4tqRtwOlAysyekHQz8CpQBB4ys4cqrrtO0oeAncCvmVm++uaSrgGuAVixYsXbzj777BY9lnPOdYYnnnjiNTNbW70/ysChGvuq5ze5CbhV0i7gGeBJYEJSgqB2ciZwGPgzSb9oZn8C3A78ZvhavwncAvzLk25kth3YDjA0NGQ7d+5swSM551znkPRKrf1RBo4skK7YTlHV3GRmR4GPAEgS8FL49U+Al8zsUHjsG8C7gD8xswOl6yV9CfjLCJ/BOedclSj7OB4HNks6U1IvQef2jsoTJA2GxwA+CjwSBpNXgR+VFA8DykXAC+E1Gype4gPAsxE+g3POuSqR1TjMbELSdcCDBKOi7jSz5yRdGx6/AzgHuFvSJPA8cHV47FFJ9wLfBSYImrC2hy/9OUlbCZqqXgY+FtUzOOecO5k6YVp17+NwzrnGSXrCzIaq93vmuHPOuYZ44HDOOdcQDxzOOecaEuVwXLeETU0Zd/79Sxwtjre7
KG6JGDojyYU/fFKuWMuMjE9y/659fPBtabq6aqWJuYXigcPV9PS+I/yX//UCAPK/UTcHMzgtGeeRf/ueyO7x0PMH+Hd//gw/tG4Vbzs9Edl93Nw8cLiaXs0VAHjw+gt5y6mr2lwat9h9/sHd/MHfvcjklBGLqDbw6usnAMjkCh442sz7OFxNmTBwpBIDbS6JWwrSiTgTU8bwkWJk98jkiuH3QmT3cPXxwOFqyuYLJFf0sqLPK6VubqlEHJh+c49CJl94w3fXPh44XE2ZXJG01zZcndLJ4P9KlG/q5cARYXBy9fHA4WrK5AukkvF2F8MtERsHB+gSZCNqRpqYnGL/4RHAaxyLgQcOd5LJKWP/4SLphAcOV5+eWBcb1gyQyUdTGxg+MsLklPGmlX0MHxlhYnIqkvu4+njgcCc5cHSE8UkrNz84V49UYoBsRLWBbBiQ3nXWKUxOGcNHRiK5j6uPBw53ktKoFa9xuEakk/HI+h9KzVPvOuuUN2y79vDA4U5Sam5Iex+Ha0A6EefAsRFGJyZb/trZXIEuwdvPTIbb3kHeTh443EkyuQISbBzsb3dR3BKSSgxgBvsi6OfI5Iucurqf05JxuuQ1jnbzwOFOkskXWL+qn77uWLuL4paQUg01ig7yTC4Y5VfuhPckwLbywOFOks0VvWPcNaycyxHBm3omXyj3uaWT0Y3ecvXxwOFOkq34I3WuXutX9dMb6yqPgGqV0YlJDhwdLQemdCIe2egtVx8PHO4NxiamGD464sl/rmFdXWJTYqDl/Q+lPpPpGkecA0dHGRlvfSe8q48HDvcG+w8XMcOnG3FNSSUGWp49Xj3Kr1Tz2HfYm6vaxQOHe4PSp8WUN1W5JqQS8Zb3P1TP1Dw9oaI3V7VLpIFD0iWS9kjaK+mGGscTku6T9LSkxySdW3HsU5Kek/SspK9J6g/3JyV9S9L3wu8+MX8LlRK4vHPcNSOdHCB3YowToxMte81MvkBPTKxfHQwPLzVZeQd5+0QWOCTFgNuAS4EtwJWStlSddiOwy8zOAz4E3Bpeuwn4BDBkZucCMeCK8JobgIfNbDPwcLjtWiSTL9DdJTas8cDhGjf9pt662kA2V2TT4EB5gah1q/ro7e6KbEJFN7coaxzbgL1m9qKZjQH3AJdVnbOF4M0fM9sNnCFpfXisGxiQ1A3Egf3h/suAu8Kf7wLeH9kTdKBMrsDGij9S5xpRzuVoYWZ3Jl94wywGXV0iNdj6TnhXvygDxyYgU7GdDfdVegq4HEDSNuB0IGVm+4CbgVeBYeCImT0UXrPezIYBwu/rat1c0jWSdkraeejQoRY90vKXzXsOh2teaVBFK4fLZvPFk/rcUsl4y4f9uvpFGThqfWS1qu2bgISkXcDHgSeBibDf4jLgTGAjsELSLzZyczPbbmZDZja0du3ahgvfqTyHw81HckUv8d5Yy2ocJ0YnyJ0YO+nDTDrh2ePtFGXgyALpiu0U081NAJjZUTP7iJltJejjWAu8BPwU8JKZHTKzceAbwLvCyw5I2gAQfj8Y4TN0lMLYBK8dH/PJDV3TJJFOxFvWjFR6neoPM+lknHxhnOMt7IR39YsycDwObJZ0pqRegs7tHZUnSBoMjwF8FHjEzI4SNFH9qKS4JAEXAS+E5+0Argp/vgq4P8Jn6Cilqn/KczjcPKRaWBso1Vyq/0+Wtr3W0R6RBQ4zmwCuAx4keNP/upk9J+laSdeGp50DPCdpN8Hoq0+G1z4K3At8F3gmLOf28JqbgIslfQ+4ONx2LTA9Xt5rHK556bD/way6Zbpx5bVhqmrBac/laKvuKF/czB4AHqjad0fFz98BNs9w7WeAz9TY/zpBDcS12PQfqdc4XPNSiQGOj05wuDBOYkXv3BfMIpMvMNAT45Sq14lyJl43N88cd2WZfJH+ni7Wruxrd1HcElZ6U2/FqKfSKL+gxXpaIt7Dit6Y1zjaxAOHK8vmC6QS8ZP+SJ1rRCuTADO52qP8JJWbxNzC
88DhyjK5ok9u6OYt1aJ1OcwsrHHU7nNL+fTqbeOBw5VlwhqHc/Oxur+HNQM9865xHA6H2840yq80eqsVnfCuMR44HABHCuMcG5nwjnHXEunkwLyTAOeaqTmdjHNibJJ8YXxe93GN88DhgJkTrZxrRiuSAOeaqTntuRxt44HDATOPl3euGaWO66mp5puRyh9mZvg/OT0k1wPHQvPA4YDpoZNe43CtkE4MMDYxxWvHR5t+jWy+wJqBHlb399S+RwuH/brGeOBwQPCpbVV/N2vitf9InWtEqgW1gUxu9pmaV/Z1k4j3eFNVG3jgcMDM4+Wda8b0lCDN1wYydczUnE62fqlaNzcPHA4IssZ9ckPXKvOdhHBqysJ1OGb/P5lKDPhKgG3ggcOFiVYF7xh3LdPfE2Ptqr6mm6oOHR9lbGJqzv+T6cT8O+Fd4zxwOA4dH2VkfMqzxl1LBYstNdeMVB7lN0dTVSoZZ2xyioPHmu+Ed43zwOEqxst7jcO1TjoZJ3u4uRpHeZTfHAmp5VwOH5K7oDxwuPJ8Px44XCulE3H2Hx5hYnKq4WvrXRtmekiuB46F5IHD+cp/LhLp5ACTU8bwkZGGr83kC6xd1Ud/T2zW8zYNljrhfWTVQvLA4cjkCpyyopd4b6TrerkOk5rH9OqZXH2j/Pp7Yqxb1ee5HAvMA4cLZsX1ZirXYqWO7WwTtYF6cjjK90nOf14s1xgPHM7X4XCR2DDYT5car3FMTE4xfGSk7pma5zN6yzXHA0eHm5wy9h+eebEc55rVE+tiw5qBhpuRho+MMDllDdU4ho8UGW+iE941xwNHh/vB0REmGvgjda4R6eRAw5MQzjUr7kn3SMSZMvhBE53wrjmRBg5Jl0jaI2mvpBtqHE9Iuk/S05Iek3RuuP8tknZVfB2VdH147LOS9lUce2+Uz7DcTU+n7k1VrvWaWZej1CdS74eZVi1V6+oX2TAaSTHgNuBiIAs8LmmHmT1fcdqNwC4z+4Cks8PzLzKzPcDWitfZB9xXcd0XzezmqMreSerN0HWuGelknANHRxkZn5xzaG1JJl+gS0EfSV33mMfoLdecKGsc24C9ZvaimY0B9wCXVZ2zBXgYwMx2A2dIWl91zkXA983slQjL2rEy+SJq4I/UuUaUhtTuO1x/c1UmV2DDmgF6YvW9PW1Y00+sS95BvoCiDBybgEzFdjbcV+kp4HIASduA04FU1TlXAF+r2ndd2Lx1p6RErZtLukbSTkk7Dx061OwzLHvZXIFTV/fT113fp0HnGlFepa+BZqRGZ2rujnWxYU2/1zgWUJSBQzX2VU9heROQkLQL+DjwJDBRfgGpF/hZ4M8qrrkdOIugKWsYuKXWzc1su5kNmdnQ2rVrm3yE5a+R8fLONWq6GamxGkejo/zSibj3cSygKANHFkhXbKeA/ZUnmNlRM/uImW0FPgSsBV6qOOVS4LtmdqDimgNmNmlmU8CXCJrEXJMyuWK5c9G5Vlu3qo/e7q6655IaGZ/k4LHRhj/MpJMDvqDTAooycDwObJZ0ZlhzuALYUXmCpMHwGMBHgUfM7GjFKVdS1UwlaUPF5geAZ1te8g4xOjHJgWMjXuNwkenqEqnBgbqzx0t9IY2O8ksn4hw6FnTCu+hFNqrKzCYkXQc8CMSAO83sOUnXhsfvAM4B7pY0CTwPXF26XlKcYETWx6pe+nOSthI0e71c47ir0/7DI5j5rLguWqkGpgSZHh7eaI2jNEtukR9at7KxArqGRTqrnZk9ADxQte+Oip+/A2ye4doCcEqN/b/U4mJ2rOmpq72pykUnlRjgmezhus7NNDlTc6piXQ4PHNHzzPEO1miGrnPNSCfi5AvjHB+dmPPcbK5Ab6yL9asaGx5ernF4B/mC8MDRwTK5Ij0xcepqz+Fw0Uk3kNmdyRfYlBigq6vWoMyZrV0ZdMJ7B/nC8MDRwTL5AhsHB4g1+EfqXCPKQ3LrCRx1rsNRratLpBKNT6jomjNn4JB0s6Qf
WYjCuIWVzXkOh4teZcf1XLL5xnM4yvdpYl4s15x6ahy7ge2SHpV0raQ1URfKLYxsvuiTG7rIJeI9rOiNzfmmfnx0gnxhvOkPM83MxOuaM2fgMLM/NLN3EyTonQE8Lemrkt4TdeFcdE6MTvD6ibHy8p7ORUVSsErfHLkc852pOZ2Ic7gwzrGR8aaud/Wrq48jnKH27PDrNYI5pn5V0j0Rls1FKNvksEfnmpFKDMyZPT49PLy5DzPlNc59ssPI1dPH8QVgD/Be4LfM7G1m9t/M7J8C50ddQBeNZhOtnGtGKpxLyqx6urpppRFRzS5jXB695f0ckasnAfBZ4D+GCXnVfJ6oJaqcw+FNVW4BpJNxToxNki+Mk1zRW/OcTK5AvDc24/E579HA6C03P/U0VeWBntJGOL/U+wHM7EhE5XIRy+SKDPTEeNPK5v5InWtEqRYxW3NVNl8knYgjNTc8fDDew8q+bu8gXwD1BI7PVAYIMzsMfCayErkFkc0XSCUGmv4jda4R0+tyzPymHgzFbb7PTVJdfSlu/uoJHLXOiXSOKxe9TL7o/RtuwVTOJVWLmZHJFeY9yq+e0Vtu/uoJHDslfUHSWZLeLOmLwBNRF8xFx8zI5go+osotmFX9PQzGe2bsf8gXxjkxNjnv/5OpxACZ/Oyd8G7+6gkcHwfGgD8lWIlvBPiVKAvlonWkOM6x0QnvGHcLKsjsrl0baNUov3QiTmFsktyJsXm9jpvdnE1OZnYCuGEByuIWSKkq71njbiGlkwPsHj5W81irRvmV+1LyRU5Z2Tev13IzqyePY62kz0t6QNJfl74WonAuGqU/Us8adwspnYiTzReZmjq5GalVH2YamYnXNa+epqqvEMxXdSbw6wSr7j0eYZlcxDz5z7VDKhlnbHKKQ8dHTzqWzRcYjPewqr+nxpX1K+dy+MiqSNUTOE4xsy8D42b2d2b2L4EfjbhcLkLZfJHV/d2sGZjfH6lzjSjlctSqDWTCHI75WtHXTXJFr+dyRKyewFGaMWxY0s9IOh9IRVgmF7HMPKaudq5Z0/0PJweObG5+ORxvuI+vyxG5egLHfwmnUv814NPAHwKfirRULlIZH4rr2mDTYKnG8cbawNSUkc0XW9bnlgr7Ulx0Zg0c4ay4m83siJk9a2bvCSc53FHPi0u6RNIeSXslnTQyS1JC0n2Snpb0mKRzw/1vkbSr4uuopOvDY0lJ35L0vfB7ovHH7lxmVp7awbmF1N8TY92qvpNqAwePjTI2OdX05IbVUskB9s3QCe9aY9bAYWaTwM8288Jh0LkNuBTYAlwpaUvVaTcCu8zsPIL1Pm4N77vHzLaa2VbgbUABuC+85gbgYTPbDDyMDxVuyKFjo4xOTHlTlWuLdPLkVfrKo/xa9H8ynQg64Q8cG2nJ67mT1dNU9X8l/Z6kH5d0Qemrjuu2AXvN7EUzGwPuAS6rOmcLwZs/ZrYbOEPS+qpzLgK+b2avhNuXAXeFP98FvL+OsrhQeby853C4NkgnTl6lL9vimZrrmRfLzU89geNdwI8AvwHcEn7dXMd1m4BMxXY23FfpKeByAEnbgNM5ueP9CuBrFdvrzWwYIPy+rtbNJV0jaaeknYcOHaqjuJ0hW17zwGscbuGlk3GGj4wwMTlV3ld6g29Vv1s9M/G6+aknc7zZJWJrTbta3eh4E3CrpF3AM8CTwET5BaRegqayf9/ozc1sO7AdYGhoyBs7Q/NdZc25+Ugn4kxOGcNHRipqBgXWreqjvyfWkntsStTuhHetM2fgkPSfa+03s9+Y49IskK7YTgH7q17jKPCR8D4CXgq/Si4FvmtmByr2HZC0wcyGJW0ADs71DG5aJlfkTSt7GehtzR+pc41IVeRyVA7PbeUov77uGOtX93kSYITqaao6UfE1SfBmfkYd1z0ObJZ0ZlhzuAJ4w2iscFGo0kpCHwUeCYNJyZW8sZmK8DWuCn++Cri/jrK4UPBH6rUN1x61cjkyudZP8Z8Ol6p1
0ainqeqWym1JN1MVAGa4bkLSdcCDQAy408yek3RtePwO4BzgbkmTwPPA1RX3iQMXAx+reumbgK9Luhp4FfjgXGVx0zL5AlvTPoLZtceGNf3EulRuRhqfnGL4SJF0orr7c37SyTiPvZRr6Wu6ac0syBQH3lzPiWb2APBA1b47Kn7+DrB5hmsLwCk19r9OMNLKNWhicor9h0f4p+f5iCrXHt2xLjas6S/XOIYPjzBlrR/ll04McP+uIuOTU/TE6mlYcY2op4/jGaY7tWPAWoIRVm6JGT4ywuSUeQ6Ha6t0RWZ3q4filqSScaYM9h8ucvopK1r62q6+Gsf7Kn6eAA6Y2cRMJ7vFy4fiusUgnRzgb/cEQ+Sn84pa38cBwf95DxytV08dbgOQM7NXzGwf0C/pHRGXy0XAk//cYpBOxDl4bJSR8UkyuSKxLrFhTX9r7+HrckSqnsBxO3C8YrsQ7nNLTDZXQIINazxwuPZJJUsJekUy+QKnru6nu8X9EKeuDjvhfUhuJOr5bckqVn43syma61R3bZbJF9mwup/ebu8sdO1TudhSpoXTqVfqjnWxcbDfkwAjUs87yIuSPiGpJ/z6JPBi1AVzrZfJFVo2kZxzzSr1Z2RzhZYt4FTzPomTJ1R0rVFP4LiWYL6qfQTZ4O8AromyUC4amXzBO8Zd261d2Udvdxd7Dx7n0LHRyEb5BUmAXuOIQj0JgAcJsr7dEjY6McmBo6PeMe7arqtLpBIDPBom6EX1fzKdHOC140EnfKvmwXKBOWscku6SNFixnZB0Z6Slci23z4fiukUknYiz+wfHyj9Hco9Sk5g3V7VcPU1V55nZ4dKGmeWB8yMrkYtEJt/aqaudm4/K/4dRzZ2W8llyI1NP4OiqXJ5VUhIfVbXklMaze9a4WwxK/w97u7tYt6ovmnskTp5Q0bVGPQHgFoJVAO8Ntz8I/FZ0RXJRyOQL9MTE+tWtTbRyrhmlN/XU4ABdXbWW7pm/tav66Ovu8iTACNTTOX63pJ3ATxIsznS5mT0feclcS2VzRTYNDhCL6I/UuUaUOsSjHB4uBZ3w3lTVenU1OYWB4nlJZwFXSvq6mZ0bbdE6w4nRCT6z4zlOjEY7/dejL+X4kY2rI72Hc/Uq1TjSEfe5pZNxHn3pdf71nzwR6X2acV5qkH/9E2e1uxhNqWd23A3APwf+BXAe8NsECyy5Fnj85Rz3PpHl9FPi9EWY0f2mlb2877wNkb2+c40YjPfwzy5Icem50f6ffN95G9l/uMj3Dx2f++QF9PrxMR5+4SAfu/DNkTXVRWnGwCHpXxEEiBTwdYIV+u43s19foLJ1hNJopz+95p2c2uKJ3pxbrCRxy8+/NfL7/NzbUvzc21KR36dR/+P/vcJ/+otnOXhsdEn+3c9W47gN+A7wL8xsJ4Akm+V814RsrhDpyBLn3OJTaqLL5AtLMnDM1jayEbgH+IKkPZJ+E+hZmGJ1jmy+GOnIEufc4rPUkxNnDBxm9pqZ3W5mFxIs1XoEOCjpBUk+HLdFMnmfeNC5TrNpcGknJ9bVG2tmWTO72czeBrwfGI20VB0kkytEPrLEObe49PfEWL+6b8nmmDScAW5mewDvIG+B46MT5AvjkU254JxbvFJLeNr3SFf0kXRJ2D+yV9INNY4nJN0n6WlJj0k6t+LYoKR7Je0Om8feGe7/rKR9knaFX++N8hmiND0NiNc4nOs06SWcnBhZ4JAUIxiZdSmwhSBxcEvVaTcCu8zsPOBDwK0Vx24FvmlmZwNvBV6oOPZFM9safj0Q1TNErRw4vMbhXMdJJ+MMHykyPjnV7qI0rJ4EwAtq7D4CvGJms6U7bwP2mtmL4evcA1wGVE5XsoUgoRAz2y3pDEnrgSJwIfDh8NgYMDbn0ywx2dJU59457lzHSSfiTBkMHx7htFOW1ntAPTWO3wf+H7Ad+BJBbsc9wD9I+ulZrtsEZCq2s+G+Sk8BlwNI2gacTpBw+GbgEPBHkp6U9IeSVlRc
d13YvHVn5cy9lSRdI2mnpJ2HDh2q4zEXXiZfYEVvjETcRzk712lSYRP1UhySW0/geBk438yGwlFV5wPPAj8FfG6W62olJlQnEN4EJCTtAj4OPAlMENSELgBuN7PzgRNAqY/kduAsYCswTDB778k3Mtselnlo7dq1czxie2RyRdLJOJLncDjXaZbytO/1jKo628yeK22Y2fOSzjezF+d4w8sC6YrtFLC/8gQzOwp8BEDBi70UfsWBrJk9Gp56L2HgMLMDpeslfQn4yzqeYVHK5gu+sJJzHWrDmn5iXVqSHeT11Dj2SLpd0j8Ov36foJmqDxif5brHgc2SzpTUS7Bu+Y7KE8KRU73h5keBR8zsqJn9AMhIekt47CLCvpFw0sWSDxDUfpYcMyOTK/hQXOc6VHesiw1r+pdtjePDwC8D1xM0P30b+DRB0HjPTBeZ2YSk64AHgRhwp5k9J+na8PgdwDnA3ZImCQLD1RUv8XHgK2FgeZGwZgJ8TtJWgmavl4GP1fEMi06+MM6JsUnvGHeug6UT8SWZBFjPQk5Fgn6EWn0Js85VHA6VfaBq3x0VP38H2DzDtbuAoRr7f2muMi8F00NxvanKuU6VTg7wN3sW5+Cd2dQzHPfdwGcJRjyVzzezN0dXrOXPh+I659KJOIeOjTIyPkl/T6zdxalbPU1VXwY+BTwBTEZbnM5Ratf0wOFc55qeJbfID61b2ebS1K+ewHHEzP4q8pJ0mEyuQCLew8q+hqcLc84tE6XphjL5wrILHH8j6fPAN6iYFdfMvhtZqTpAJl/0EVXOdbjSe0B2iXWQ1xM43hF+r+yoNuAnW1+czpHNFTh7w6p2F8M510ZrV/bR291VXkJ6qahnVNWMQ25dc6amjGy+yMVb1re7KM65NurqEqnEwJIbkjtj4JD0i2b2J5J+tdZxM/tCdMVa3g4dH2VscspX/nPOBbkcSywJcLYaR2lSwVrtKdVzTrkGeA6Hc64knRxgV+Zwu4vRkBkDh5n9Qfjj/zazv688FuZ2uCb5UFznXEk6EedIcZyjI+Os7l8aM2XXM1fV79a5z9WpNKlZacF651znmh5ZtXQ6yGfr43gn8C5gbVU/x2qCuadckzK5AutW9S2pTFHnXDQqczm2bFzd5tLUZ7Y+jl5gZXhOZT/HUeDnoizUcpfJF7yZyjkHVKzLsYRGVs3Wx/F3wN9J+mMzewVAUhewMlxHwzUpkyvy9jNqLlzonOswg+EMEtkllMtRTx/Hb0taHS7d+jzB+hz/JuJyLVsTk1P84OiI1ziccwBISy+Xo57AsSWsYbyfYIr004BlMbV5OwwfGWFyysrVU+ecSyeXVi5HPYGjR1IPQeC438zG8TyOppU+VZQWqnfOuXQiTjZfxGxpvLXWEzj+gGClvRXAI5JOJ+ggd00o53B4jcM5F0olBiiMTZI7MdbuotRlzsBhZr9jZpvM7L0WeIVZlox1s8vkisS6xIY1/e0uinNukSj1eS6VyQ7nDByS1kv6sqS/Cre3AFdFXrJlKpMvsGFNP92xeip7zrlOUM7lWCId5PW8e/0x8CCwMdz+B+D6iMqz7GXzRW+mcs69QTmXY4l0kM8YOCSVcjzeZGZfB6YAzGyCOpeQlXSJpD2S9kq6ocbxhKT7JD0t6TFJ51YcG5R0r6Tdkl4IM9mRlJT0LUnfC78vqYSITK5Q/nThnHMAK/q6Sa7oLU9HtNjNVuN4LPx+QtIphCOpJP0ocGSuF5YUA24DLgW2AFeGzVyVbgR2mdl5wIeAWyuO3Qp808zOBt4KvBDuvwF42Mw2Aw+H20vCyPgkB4+Neo3DOXeSdGKA7FKvcQAKv/8qsAM4S9LfA3cDH6/jtbcBe83sRTMbA+4BLqs6ZwvBmz9mths4I+xTWQ1cCHw5PDZmZofDay4D7gp/votgmPCSUMoM9aG4zrlqqXBI7lIw21xVlZMb3keQ/CeCdcd/Cnh6jtfeBGQqtrNML0Nb8hRwOfBtSduA04EU
QVPYIeCPJL0VeAL4pJmdANab2TCAmQ1LWjdHORYNH4rrnJtJKjnAt54/wNSU0dWluS9oo9lqHDGCSQ5XEeRwdIf74tRe3KlarSevzm65CUhI2kVQi3kSmAjvdQFwu5mdD5ygwSYpSddI2ilp56FDhxq5NDKlBel9uhHnXLV0Is7Y5BQHjo20uyhzmq3GMWxmvzGP184C6YrtFLC/8oRwKpOPAEgS8FL4FQeyZvZoeOq9TAeOA5I2hLWNDcDBWjc3s+3AdoChoaFFkY6ZyRfp7e5i7cq+dhfFObfIlHM5ckU2rFnczdn19HE063Fgs6QzJfUCVxD0lUzfIBg51RtufhR4xMyOmtkPgIykt4THLiKYYJHwNUp5JFcB98+znAsmmy+QSgws+mqoc27hlZaSXgq5HLPVOC6azwub2YSk6whyQGLAnWb2nKRrw+N3AOcAd0uaJAgMV1e8xMeBr4SB5UXCmglB89bXJV0NvAp8cD7lXEiZnOdwOOdq25QYQFoauRyzrceRm++Lm9kDBJ3qlfvuqPj5O8DmGa7dBQzV2P868wxq7ZLJF3hrek27i+GcW4T6umOsX9W/JHI5fN6LBXJsZJzDhfHy+sLOOVcttURyOTxwLJDSpwhvqnLOzSSdXBq5HB44Fkg5h8OT/5xzM0gnBhg+UmR8cqrdRZmVB44FUvoU4TUO59xMUsk4Uwb7Dy/uWocHjgWSyRVY2dfNYLyn3UVxzi1S5VlyF3kHuQeOBVLK4QjyHJ1z7mTldTkWeQe5B44FkskVfUSVc25Wp67uJ9alRZ8E6IFjAZgZmbyvw+Gcm113rIuNg/2LfmSVB44FkDsxRmFs0jvGnXNzSifi3lTlpheg91lxnXNzSSfi3jnuKGeCelOVc24u6eQArx0fpThW1wrdbeGBYwF41rhzrl6llonFPPWIB44FkMkXSK7oZUXfbJMRO+cc5dGXi7mfwwPHAsjkghwO55yby/S6HIu3n8MDxwLI5n0dDudcfdau6qOvu8ubqjrZ1JSxL18k5R3jzrk6SCKVGPAaRyc7cGyEsckpr3E45+qWTi7uXA4PHBHLeg6Hc65BQS6HB46OVfrlp71z3DlXp3RygKMjExwpjre7KDV54IhYqZ1y46AHDudcfcpDchdprcMDR8Qy+QLrV/fR3xNrd1Gcc0tEqU90sY6sijRwSLpE0h5JeyXdUON4QtJ9kp6W9JikcyuOvSzpGUm7JO2s2P9ZSfvC/bskvTfKZ5ivTK7gHePOuYaUpidarLPkRpbKLCkG3AZcDGSBxyXtMLPnK067EdhlZh+QdHZ4/kUVx99jZq/VePkvmtnNUZW9lbL5ItvOTLa7GM65JWTNQA+r+ro7sqlqG7DXzF40szHgHuCyqnO2AA8DmNlu4AxJ6yMs04Ian5xi+EjRO8adcw2RRCoZL8+svdhEGTg2AZmK7Wy4r9JTwOUAkrYBpwOp8JgBD0l6QtI1VdddFzZv3SkpUevmkq6RtFPSzkOHDs33WZoyfHiEKQsWoHfOuUakEwMdWeOotbi2VW3fBCQk7QI+DjwJTITH3m1mFwCXAr8i6cJw/+3AWcBWYBi4pdbNzWy7mQ2Z2dDatWvn8xxNKyXweB+Hc65R6WScbL6IWfXbZvtFOV1rFkhXbKeA/ZUnmNlR4CMAkgS8FH5hZvvD7wcl3UfQ9PWImR0oXS/pS8BfRvgM81L6tOATHDrnGpVKDFAcn+S142OsXdXX7uK8QZQ1jseBzZLOlNQLXAHsqDxB0mB4DOCjBIHhqKQVklaF56wAfhp4NtzeUPESHyjtX4wy+QKxLrFhTX+7i+KcW2LSi3h69chqHGY2Iek64EEgBtxpZs9JujY8fgdwDnC3pEngeeDq8PL1wH1BJYRu4Ktm9s3w2OckbSVo9noZ+FhUzzBfmVyRjYP9dMc8XcY515jpBZ2KXHBaza7ctol0ZSEzewB4oGrfHRU/fwfYXOO6F4G3zvCav9TiYkYmk/ccDudc
c1LldTkWX43DPwpHyNfhcM41a0VfN6es6F2U2eMeOCIyMj7JoWOj5QxQ55xrVCoZX5TrcnjgiEjpU0LKaxzOuSalEgOLsnPcA0dESp8SvMbhnGtWOhFn/+Eik1OLK5fDA0dEPPnPOTdf6eQA45PGD46OtLsob+CBIyKZXIG+7q5Fl7jjnFs6ytOrL7KRVR44IpLNF0klBghzUZxzrmGlXI7FNtmhB46IZPIFX2fcOTcvGwf7kRZfLocHjohkcp7D4Zybn77uGKeu7l90I6s8cETg6Mg4R4rjPrmhc27eUokBsossl8MDRwRK1UpvqnLOzVc6EfcaRyco53B4U5Vzbp5SyTg/ODrC6MRku4tS5oEjAqWscU/+c87NVzoxgFmwouhi4YEjAtl8kVV93awZ6Gl3UZxzS9z0kNzF01zlgSMCmVyBVDLuORzOuXkrB45F1EHugSMCmXzBR1Q551ri1NX9dHfJaxzLmZl5DodzrmViXWLj4MCiSgL0wNFir58Yozg+6R3jzrmWSScHFtW0Ix44Wqycw+E1Dudci6QT8UU10aEHjhbL5kvrcHjgcM61RjoZ5/UTYxTGJtpdFMADR8tlyiv/eVOVc641Su8n2UXSXBVp4JB0iaQ9kvZKuqHG8YSk+yQ9LekxSedWHHtZ0jOSdknaWbE/Kelbkr4Xfk9E+QyNyuSKnLKilxV93e0uinNumZgekrs4mqsiCxySYsBtwKXAFuBKSVuqTrsR2GVm5wEfAm6tOv4eM9tqZkMV+24AHjazzcDD4faikfWhuM65Fiu9pyyWwBHlx+JtwF4zexFA0j3AZcDzFedsAX4bwMx2SzpD0nozOzDL614G/ET4813A3wL/rrVFD/zuw99jx1P7G7rmlVyBi7esj6I4zrkOtXZlH/09XfzOX+/lK4++2tC1v3X5P+LtZyRbWp4oA8cmIFOxnQXeUXXOU8DlwLclbQNOB1LAAcCAhyQZ8Admtj28Zr2ZDQOY2bCkdbVuLuka4BqA0047rakHWLuqj83rVzZ0zQ+vX8UvvuP0pu7nnHO1SOLXLn4LT2byDV870BNreXmiDBy15tuwqu2bgFsl7QKeAZ4ESsMG3m1m+8PA8C1Ju83skXpvHgaa7QBDQ0PV963LFdtO44ptzQUd55xrpX914ZvbXYSyKANHFkhXbKeAN7T7mNlR4CMACiZ2ein8wsz2h98PSrqPoOnrEeCApA1hbWMDcDDCZ3DOOVclylFVjwObJZ0pqRe4AthReYKkwfAYwEeBR8zsqKQVklaF56wAfhp4NjxvB3BV+PNVwP0RPoNzzrkqkdU4zGxC0nXAg0AMuNPMnpN0bXj8DuAc4G5JkwSd5leHl68H7gtnl+0Gvmpm3wyP3QR8XdLVwKvAB6N6BueccyeTWVPN/0vK0NCQ7dy5c+4TnXPOlUl6oiodAvDMceeccw3ywOGcc64hHjicc841xAOHc865hnRE57ikQ8ArwJuA19pcnHbq5Ofv5GeHzn7+Tn52mN/zn25ma6t3dkTgKJG0s9YIgU7Ryc/fyc8Onf38nfzsEM3ze1OVc865hnjgcM4515BOCxzb5z5lWevk5+/kZ4fOfv5OfnaI4Pk7qo/DOefc/HVajcM559w8eeBwzjnXkI4JHJIukbRH0l5Ji2qd8oUg6WVJz0jaJWlZz/go6U5JByU9W7EvKelbkr4Xfk+0s4xRmeHZPytpX/i73yXpve0sY1QkpSX9jaQXJD0n6ZPh/k753c/0/C3//XdEH4ekGPAPwMUEC0w9DlxpZs/PeuEyIullYMjMln0ilKQLgePA3WZ2brjvc0DOzG4KPzgkzCySterbaYZn/yxw3MxubmfZohYu7LbBzL4brufzBPB+4MN0xu9+puf/eVr8+++UGsc2YK+ZvWhmY8A9wGVtLpOLSLjEcK5q92XAXeHPdxH8QS07Mzx7RzCzYTP7bvjzMeAFYBOd87uf6flbrlMCxyYgU7GdJaJ/0EXMgIck
PSHpmnYXpg3Wm9kwBH9gwLo2l2ehXSfp6bApa1k21VSSdAZwPvAoHfi7r3p+aPHvv1MCh2rsW/5tdG/0bjO7ALgU+JWwScN1htuBs4CtwDBwS1tLEzFJK4E/B643s6PtLs9Cq/H8Lf/9d0rgyALpiu0UsL9NZWkLM9sffj8I3EfQfNdJDoRtwKW24INtLs+CMbMDZjZpZlPAl1jGv3tJPQRvml8xs2+Euzvmd1/r+aP4/XdK4Hgc2CzpTEm9wBXAjjaXacFIWhF2liFpBfDTwLOzX7Xs7ACuCn++Cri/jWVZUKU3zdAHWKa/e0kCvgy8YGZfqDjUEb/7mZ4/it9/R4yqAgiHoP13IAbcaWb/tb0lWjiS3kxQywDoBr66nJ9f0teAnyCYTvoA8BngL4CvA6cBrwIfNLNl14k8w7P/BEEzhQEvAx8rtfkvJ5J+DPg/wDPAVLj7RoJ2/k743c/0/FfS4t9/xwQO55xzrdEpTVXOOedaxAOHc865hnjgcM451xAPHM455xrigcM551xDPHC4ZUPS30r6J1X7rpf0+3NcMxRxub4WTvfwqar9n5X06fDn/nDm1s/UuP6D4YynfzOPMhyv+Pm94Uyxp4VlKEhaN8O5JumWiu1Ph5Mmug7mgcMtJ18jSO6sdEW4vy0knQq8y8zOM7MvznBOL0G27xNm9us1Trka+GUze0+d9+ye5dhFwO8Cl5jZq+Hu14Bfm+GSUeBySW+q596uM3jgcMvJvcD7JPVBeaK3jcC3Jd0uaWe4TkGtN+fqT9o/J+mPw5/XSvpzSY+HX++ucW2/pD9SsObJk5JKb/IPAevCdRB+vMZtuwlma/6emZ20Toyk/wz8GHCHpM/PdB9JH5b0Z5L+Z3jPWs/34wRTTvyMmX2/4tCdwD+XlKxx2QTBmtWfqnHMdSgPHG7ZMLPXgceAS8JdVwB/akGW638wsyHgPOAfSzqvgZe+Ffiimb0d+GfAH9Y451fCMvwjgkzduyT1Az8LfN/MtprZ/6lx3b8FJszs+hme6TeAncAvmNm/meU+AO8ErjKzn6zxUn0EU22838x2Vx07ThA8PlmrDMBtwC9IWjPDcddhPHC45aayuaqymernJX0XeBL4EWBLA6/5U8DvSdpFMO/R6tLcXxV+DPgfAOEb8yvAD9fx2t8G3impnnPnus+3ZplKYxz4vwTNXrX8DnCVpNXVB8IZVu8GPlFnGd0y54HDLTd/AVwk6QJgIFwN7Uzg08BFZnYe8L+A/hrXVs6/U3m8C3hnWGvYamabwoVyKtWaur8ejwDXA38laWMd5892nxOzHJsiWAnu7ZJurD5oZoeBrwK/PMP1/50g6Kyoo4xumfPA4ZYVMzsO/C1B00uptrGa4E31iKT1BGuS1HJA0jmSughmES15CLiutCFpa41rHwF+ITz+wwQT6u2ps8x/Dnwe+KakwTlOn899CsD7CJqdatU8vgB8jKDfpfraHMFEgTPVWFwH8cDhlqOvAW8l6HTGzJ4iaKJ6jiCg/P0M190A/CXw1wQL3pR8AhgKh9Q+D1xb49rfB2KSngH+FPiwmY3WW2AzuwP4BrCjos+ilvneJ0fQB/QfJV1Wdew1glmU+2a4/BaCWXddh/PZcZ1zzjXEaxzOOeca4oHDOedcQzxwOOeca4gHDueccw3xwOGcc64hHjicc841xAOHc865hvx/9DC05L98ys4AAAAASUVORK5CYII=\n", 512 | "text/plain": [ 513 | "
" 514 | ] 515 | }, 516 | "metadata": { 517 | "needs_background": "light" 518 | }, 519 | "output_type": "display_data" 520 | } 521 | ], 522 | "source": [ 523 | "# import Matplotlib (scientific plotting library)\n", 524 | "import matplotlib.pyplot as plt\n", 525 | "\n", 526 | "# allow plots to appear within the notebook\n", 527 | "%matplotlib inline\n", 528 | "\n", 529 | "# plot the relationship between K and testing accuracy\n", 530 | "plt.plot(k_range, scores)\n", 531 | "plt.xlabel('Value of K for KNN')\n", 532 | "plt.ylabel('Testing Accuracy')" 533 | ] 534 | }, 535 | { 536 | "cell_type": "markdown", 537 | "metadata": {}, 538 | "source": [ 539 | "- **Training accuracy** rises as model complexity increases\n", 540 | "- **Testing accuracy** penalizes models that are too complex or not complex enough\n", 541 | "- For KNN models, complexity is determined by the **value of K** (lower value = more complex)" 542 | ] 543 | }, 544 | { 545 | "cell_type": "markdown", 546 | "metadata": {}, 547 | "source": [ 548 | "## Making predictions on out-of-sample data" 549 | ] 550 | }, 551 | { 552 | "cell_type": "code", 553 | "execution_count": 19, 554 | "metadata": {}, 555 | "outputs": [ 556 | { 557 | "data": { 558 | "text/plain": [ 559 | "array([1])" 560 | ] 561 | }, 562 | "execution_count": 19, 563 | "metadata": {}, 564 | "output_type": "execute_result" 565 | } 566 | ], 567 | "source": [ 568 | "# instantiate the model with the best known parameters\n", 569 | "knn = KNeighborsClassifier(n_neighbors=11)\n", 570 | "\n", 571 | "# train the model with X and y (not X_train and y_train)\n", 572 | "knn.fit(X, y)\n", 573 | "\n", 574 | "# make a prediction for an out-of-sample observation\n", 575 | "knn.predict([[3, 5, 4, 2]])" 576 | ] 577 | }, 578 | { 579 | "cell_type": "markdown", 580 | "metadata": {}, 581 | "source": [ 582 | "## Downsides of train/test split?" 
583 | ] 584 | }, 585 | { 586 | "cell_type": "markdown", 587 | "metadata": {}, 588 | "source": [ 589 | "- Provides a **high-variance estimate** of out-of-sample accuracy\n", 590 | "- **K-fold cross-validation** overcomes this limitation\n", 591 | "- But, train/test split is still useful because of its **flexibility and speed**" 592 | ] 593 | }, 594 | { 595 | "cell_type": "markdown", 596 | "metadata": {}, 597 | "source": [ 598 | "## Resources\n", 599 | "\n", 600 | "- Quora: [What is an intuitive explanation of overfitting?](https://www.quora.com/What-is-an-intuitive-explanation-of-over-fitting-particularly-with-a-small-sample-set-What-are-you-essentially-doing-by-over-fitting-How-does-the-over-promise-of-a-high-R%C2%B2-low-standard-error-occur/answer/Jessica-Su)\n", 601 | "- Video: [Estimating prediction error](https://www.youtube.com/watch?v=ngrOYWgJjb4&list=PL5-da3qGB5IA6E6ZNXu7dp89_uv8yocmf&index=1&t=154s) (12 minutes, starting at 2:34) by Hastie and Tibshirani\n", 602 | "- [Understanding the Bias-Variance Tradeoff](http://scott.fortmann-roe.com/docs/BiasVariance.html)\n", 603 | " - [Guiding questions](https://github.com/justmarkham/DAT8/blob/master/homework/09_bias_variance.md) when reading this article\n", 604 | "- Video: [Visualizing bias and variance](https://www.youtube.com/watch?v=zrEyxfl2-a8&t=1857s) (15 minutes, starting at 30:57) by Abu-Mostafa" 605 | ] 606 | }, 607 | { 608 | "cell_type": "markdown", 609 | "metadata": {}, 610 | "source": [ 611 | "## Comments or Questions?\n", 612 | "\n", 613 | "- Email: \n", 614 | "- Website: https://www.dataschool.io\n", 615 | "- Twitter: [@justmarkham](https://twitter.com/justmarkham)\n", 616 | "\n", 617 | "© 2021 [Data School](https://www.dataschool.io). All rights reserved." 
618 | ] 619 | } 620 | ], 621 | "metadata": { 622 | "kernelspec": { 623 | "display_name": "Python 3", 624 | "language": "python", 625 | "name": "python3" 626 | }, 627 | "language_info": { 628 | "codemirror_mode": { 629 | "name": "ipython", 630 | "version": 3 631 | }, 632 | "file_extension": ".py", 633 | "mimetype": "text/x-python", 634 | "name": "python", 635 | "nbconvert_exporter": "python", 636 | "pygments_lexer": "ipython3", 637 | "version": "3.9.4" 638 | } 639 | }, 640 | "nbformat": 4, 641 | "nbformat_minor": 1 642 | } 643 | -------------------------------------------------------------------------------- /07_cross_validation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Cross-validation for parameter tuning, model selection, and feature selection ([video #7](https://www.youtube.com/watch?v=6dbrR-WymjI&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=7))\n", 8 | "\n", 9 | "Created by [Data School](https://www.dataschool.io). Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).\n", 10 | "\n", 11 | "**Note:** This notebook uses Python 3.9.1 and scikit-learn 0.23.2. The original notebook (shown in the video) used Python 2.7 and scikit-learn 0.16." 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Agenda\n", 19 | "\n", 20 | "- What is the drawback of using the **train/test split** procedure for model evaluation?\n", 21 | "- How does **K-fold cross-validation** overcome this limitation?\n", 22 | "- How can cross-validation be used for selecting **tuning parameters**, choosing between **models**, and selecting **features**?\n", 23 | "- What are some possible **improvements** to cross-validation?" 
24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "## Review of model evaluation procedures" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "**Motivation:** Need a way to choose between Machine Learning models\n", 38 | "\n", 39 | "- Goal is to estimate likely performance of a model on **out-of-sample data**\n", 40 | "\n", 41 | "**Initial idea:** Train and test on the same data\n", 42 | "\n", 43 | "- But, maximizing **training accuracy** rewards overly complex models which **overfit** the training data\n", 44 | "\n", 45 | "**Alternative idea:** Train/test split\n", 46 | "\n", 47 | "- Split the dataset into two pieces, so that the model can be trained and tested on **different data**\n", 48 | "- **Testing accuracy** is a better estimate than training accuracy of out-of-sample performance\n", 49 | "- But, it provides a **high variance** estimate since changing which observations happen to be in the testing set can significantly change testing accuracy" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 1, 55 | "metadata": {}, 56 | "outputs": [], 57 | "source": [ 58 | "# added empty cell so that the cell numbering matches the video" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 2, 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "from sklearn.datasets import load_iris\n", 68 | "from sklearn.model_selection import train_test_split\n", 69 | "from sklearn.neighbors import KNeighborsClassifier\n", 70 | "from sklearn import metrics" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 3, 76 | "metadata": {}, 77 | "outputs": [], 78 | "source": [ 79 | "# read in the iris data\n", 80 | "iris = load_iris()\n", 81 | "\n", 82 | "# create X (features) and y (response)\n", 83 | "X = iris.data\n", 84 | "y = iris.target" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 4, 90 | "metadata": 
{}, 91 | "outputs": [ 92 | { 93 | "name": "stdout", 94 | "output_type": "stream", 95 | "text": [ 96 | "0.9736842105263158\n" 97 | ] 98 | } 99 | ], 100 | "source": [ 101 | "# use train/test split with different random_state values\n", 102 | "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)\n", 103 | "\n", 104 | "# check classification accuracy of KNN with K=5\n", 105 | "knn = KNeighborsClassifier(n_neighbors=5)\n", 106 | "knn.fit(X_train, y_train)\n", 107 | "y_pred = knn.predict(X_test)\n", 108 | "print(metrics.accuracy_score(y_test, y_pred))" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "**Question:** What if we created a bunch of train/test splits, calculated the testing accuracy for each, and averaged the results together?\n", 116 | "\n", 117 | "**Answer:** That's the essence of cross-validation!" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "## Steps for K-fold cross-validation" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "1. Split the dataset into K **equal** partitions (or \"folds\").\n", 132 | "2. Use fold 1 as the **testing set** and the union of the other folds as the **training set**.\n", 133 | "3. Calculate **testing accuracy**.\n", 134 | "4. Repeat steps 2 and 3 K times, using a **different fold** as the testing set each time.\n", 135 | "5. Use the **average testing accuracy** as the estimate of out-of-sample accuracy." 
136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "Diagram of **5-fold cross-validation:**\n", 143 | "\n", 144 | "![5-fold cross-validation](images/07_cross_validation_diagram.png)" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 5, 150 | "metadata": {}, 151 | "outputs": [], 152 | "source": [ 153 | "# added empty cell so that the cell numbering matches the video" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": 6, 159 | "metadata": {}, 160 | "outputs": [], 161 | "source": [ 162 | "# added empty cell so that the cell numbering matches the video" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 7, 168 | "metadata": {}, 169 | "outputs": [ 170 | { 171 | "name": "stdout", 172 | "output_type": "stream", 173 | "text": [ 174 | "Iteration Training set observations Testing set observations\n", 175 | " 1 [ 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] [0 1 2 3 4] \n", 176 | " 2 [ 0 1 2 3 4 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] [5 6 7 8 9] \n", 177 | " 3 [ 0 1 2 3 4 5 6 7 8 9 15 16 17 18 19 20 21 22 23 24] [10 11 12 13 14] \n", 178 | " 4 [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 20 21 22 23 24] [15 16 17 18 19] \n", 179 | " 5 [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19] [20 21 22 23 24] \n" 180 | ] 181 | } 182 | ], 183 | "source": [ 184 | "# simulate splitting a dataset of 25 observations into 5 folds\n", 185 | "from sklearn.model_selection import KFold\n", 186 | "kf = KFold(n_splits=5, shuffle=False).split(range(25))\n", 187 | "\n", 188 | "# print the contents of each training and testing set\n", 189 | "print('{} {:^61} {}'.format('Iteration', 'Training set observations', 'Testing set observations'))\n", 190 | "for iteration, data in enumerate(kf, start=1):\n", 191 | " print('{:^9} {} {:^25}'.format(iteration, data[0], str(data[1])))" 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": {}, 197 | 
"source": [ 198 | "- Dataset contains **25 observations** (numbered 0 through 24)\n", 199 | "- 5-fold cross-validation, thus it runs for **5 iterations**\n", 200 | "- For each iteration, every observation is either in the training set or the testing set, **but not both**\n", 201 | "- Every observation is in the testing set **exactly once**" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "## Comparing cross-validation to train/test split" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "Advantages of **cross-validation:**\n", 216 | "\n", 217 | "- More accurate estimate of out-of-sample accuracy\n", 218 | "- More \"efficient\" use of data (every observation is used for both training and testing)\n", 219 | "\n", 220 | "Advantages of **train/test split:**\n", 221 | "\n", 222 | "- Runs K times faster than K-fold cross-validation\n", 223 | "- Simpler to examine the detailed results of the testing process" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "## Cross-validation recommendations" 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": {}, 236 | "source": [ 237 | "1. K can be any number, but **K=10** is generally recommended\n", 238 | "2. 
For classification problems, **stratified sampling** is recommended for creating the folds\n", 239 | " - Each response class should be represented with equal proportions in each of the K folds\n", 240 | " - scikit-learn's `cross_val_score` function does this by default" 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "## Cross-validation example: parameter tuning" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "**Goal:** Select the best tuning parameters (aka \"hyperparameters\") for KNN on the iris dataset" 255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": 8, 260 | "metadata": {}, 261 | "outputs": [], 262 | "source": [ 263 | "from sklearn.model_selection import cross_val_score" 264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": 9, 269 | "metadata": {}, 270 | "outputs": [ 271 | { 272 | "name": "stdout", 273 | "output_type": "stream", 274 | "text": [ 275 | "[1. 0.93333333 1. 1. 0.86666667 0.93333333\n", 276 | " 0.93333333 1. 1. 1. 
]\n" 277 | ] 278 | } 279 | ], 280 | "source": [ 281 | "# 10-fold cross-validation with K=5 for KNN (the n_neighbors parameter)\n", 282 | "knn = KNeighborsClassifier(n_neighbors=5)\n", 283 | "scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')\n", 284 | "print(scores)" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 10, 290 | "metadata": {}, 291 | "outputs": [ 292 | { 293 | "name": "stdout", 294 | "output_type": "stream", 295 | "text": [ 296 | "0.9666666666666668\n" 297 | ] 298 | } 299 | ], 300 | "source": [ 301 | "# use average accuracy as an estimate of out-of-sample accuracy\n", 302 | "print(scores.mean())" 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": 11, 308 | "metadata": {}, 309 | "outputs": [ 310 | { 311 | "name": "stdout", 312 | "output_type": "stream", 313 | "text": [ 314 | "[0.96, 0.9533333333333334, 0.9666666666666666, 0.9666666666666666, 0.9666666666666668, 0.9666666666666668, 0.9666666666666668, 0.9666666666666668, 0.9733333333333334, 0.9666666666666668, 0.9666666666666668, 0.9733333333333334, 0.9800000000000001, 0.9733333333333334, 0.9733333333333334, 0.9733333333333334, 0.9733333333333334, 0.9800000000000001, 0.9733333333333334, 0.9800000000000001, 0.9666666666666666, 0.9666666666666666, 0.9733333333333334, 0.96, 0.9666666666666666, 0.96, 0.9666666666666666, 0.9533333333333334, 0.9533333333333334, 0.9533333333333334]\n" 315 | ] 316 | } 317 | ], 318 | "source": [ 319 | "# search for an optimal value of K for KNN\n", 320 | "k_range = list(range(1, 31))\n", 321 | "k_scores = []\n", 322 | "for k in k_range:\n", 323 | " knn = KNeighborsClassifier(n_neighbors=k)\n", 324 | " scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')\n", 325 | " k_scores.append(scores.mean())\n", 326 | "print(k_scores)" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": 12, 332 | "metadata": {}, 333 | "outputs": [ 334 | { 335 | "data": { 336 | "text/plain": [ 337 | "Text(0, 
0.5, 'Cross-Validated Accuracy')" 338 | ] 339 | }, 340 | "execution_count": 12, 341 | "metadata": {}, 342 | "output_type": "execute_result" 343 | }, 344 | { 345 | "data": { 346 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEGCAYAAABy53LJAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAA/Z0lEQVR4nO3deXhb93Xg/e8huIAiCUALRVAibMm2vMiWSCeqm6Rp1mZP68Sdtnbaxs00TTwTZ5tpO560nSTt276ZNkszTRq/Tuu8dieJmybxxO14stRt4i5pbDkGLNmWbVm2BUoktQLgvuHMH/deCoIA8AIESAI4n+fhQ/Li3osfRBGHv+0cUVWMMcYYv1rWugHGGGPqiwUOY4wxZbHAYYwxpiwWOIwxxpTFAocxxpiytK51A1bDli1bdMeOHWvdDGOMqSuPPPLIKVXtzT/eFIFjx44d7N+/f62bYYwxdUVEXih03IaqjDHGlMUChzHGmLJY4DDGGFMWCxzGGGPKYoHDGGNMWWoaOETkjSLylIgcFpHbCjy+UUTuFZHHROQhEbkm57EPi8jjInJQRL4qIkH3+CYR+Z6IPON+3ljL12CMMeZ8NQscIhIAPg+8CdgN3CQiu/NO+wgQV9W9wDuBz7rXbgc+AOxT1WuAAHCje81twAOqugt4wP3eGGPMKqllj+M64LCqHlHVOeAe4Pq8c3bjvPmjqoeAHSLS5z7WCnSKSCuwATjuHr8euMv9+i7gbTV7BaYpPT02zr8cPrXWzaiabFb564ePMjW3UNV7fu3hJNNzi1W7p6kftQwc24FkzvfD7rFcCeAGABG5DrgYGFDVY8AngaPACJBW1e+61/Sp6giA+3lroScXkfeIyH4R2X/y5MkqvSTTDP7o/id531d+TKPUqnnk6Fn+yzcO8M0fH6vaPR96/gy//Y3H+O4To1W7p6kftQwcUuBY/m/iJ4CNIhIH3g88Ciy48xbXAzuBbUCXiPxKOU+uqneo6j5V3dfbe8GOeWMKUlUSyRSpqXmePz211s2pivjRlPM5marePd17HU/NVO2epn7UMnAMA7Gc7wc4N9wEgKpmVPVdqjqEM8fRCzwH/AzwnKqeVNV54JvAy9zLxkSkH8D9fKKGr8E0maNnpjg7NQ9AoopvtGspPpwCqvt6vHuNZSxwNKNaBo6HgV0islNE2nEmt+/LPUFEIu5jAO8GHlTVDM4Q1UtEZIOICPBa4En3vPuAm92vbwa+VcPXYJpM7l/l1fwLfS15b/KHT04wPjNf1XuOpKercj9TX2oWOFR1AbgV+A7Om/7XVPVxEblFRG5xT7sKeFxEDuGsvvqge+2PgK8DPwYOuO28w73mE8DrROQZ4HXu98ZURTyZItjWwosv3tgQgePUxCzDZ6f56V1bUIUDw+kV3/NEZobjaaenMZqZXfH9TP2paXZcVb0fuD/v2O05X/8Q2FXk2o8CHy1w/DROD8SYqkskU+zZHuZFF23kS//yPHMLWdpb63efrNcz+NWXXMw/PXOK+HCKl122ZUX39ALqJVu6GEvbUFUzqt/fCGOqbH4xy8HjGQYHIgzGIswtZjk0mlnrZq1IIpmiReDlu7awc0tXVeY5EsMpAi3Ca67cysmJWRazjbH6zPhngcMY16GRceYWsgzGnMAB9T/P8WgyxeV9PWxob2VwIFyV1xNPprgy2sOOLV0sZpVTEzZc1WwscBjj8lYfDcUibAsH6e3pqOvA4S0tvvaiCOC8rrHMLKMrGF7KZpXHkmmGYhGioSAAIzZc1XQscBjjih9NsbmrnYGNnYgIgwORug4
X1b3AO4HP5jz2WeDbqnolMMj5VQc/o6pD7sf9y7VlvRjLzFhvw5ga6iszjbuqkhhOnZckcygW4emxcSYrrF+ezSqJZGopdY6TQ6u8uuvrXakNgH8LFO3vqerPLXPv64DDqnrEvd89wPU4FQQ9u4H/173fIRHZISJ9wDTwCuDX3MfmgLnlXsx6N5KeYZutqDKmZqI5ezmuiC6//PWF01OkpubPS6w4FIuQVThwLM1LLtlcdhuePz1JZmZhqcxAXyjI6clZ5hayDTPaUOpVfBL4FPAczhv5F92PCeCgj3tvB5I53w+7x3IlgBsAROQ64GJgALgEOAl8SUQedWue5xYgvtUd3rpTRDYWenIReY+I7BeR/SdPnvTR3Noby8wsVUQzxlTfUuEodwXjcrwhqdx6LnsHnDf8Suc5lipSxpy3pmg4iCqcGG+c4aqigUNVf6CqPwCuVdVfUtW/dT/eAbzcx70LJWbJ78F8AtgoInGcLLyP4iz7bQVeBHxBVa8FJgFvjuQLwKXAEDCCE9wKtf8OVd2nqvt6e3t9NLe2ZhcWOTM5Z0txjamhrW5WBr9DQ48eTdHZFuDyvnO1ZDZ3d3DRpg0Vr6yKH03R1R7gsq3OPb3f+UaaIPeT5LBXRC7JGXLaCfh5Jx4GYjnfD+CkMVmiqhngXe59Bad38xzOZPywqv7IPfXruIFDVce860Xki8Df+WjLmjuRcf4j2+Y/Y2qnozXA5q72MgpHOZv08tOwD8YiPPL8mYraEB9Os2cgTMCtT7M079JA8xx+Btw+DHxfRL4vIt8H/hFnpdVyHgZ2ichOEWkHbsTJc7XEXTnlFYp+N/CgqmZUdRRIisgV7mOvxZ0bEZHc/N1vx9+w2Zpb2jVuPQ5jaspvJcC5hSyPH88wWKDk8eBAmOPpGU6U2UuYXVjkyeOZ8+ZMVpJDa71atsehqt8WkV3Ale6hQ6q6bOhU1QURuRWn8FMAuFNVHxeRW9zHbweuAu4WkUWcwPDrObd4P/BlN7Acwe2ZAH8sIkM4w17PA+9d9lWuA97OUetxGFNbUZ+VAA+NZphbyC7NReS69qII4Cyrff3VUd/P/eTIOHOLWYZy5kwiG9pob21pjqEqEXmNqv6DiNyQ99ClIoKqfnO5m7tLZe/PO3Z7ztc/BHYVuTYO7Ctw/FeXe971yPuPbMtxjamtvlDQ1/xEYqns8IU9jqu3OUNNieHyAod3zyE38IBbPreCVCjrWakexyuBfwB+tsBjCiwbOMw5o5kZOtsChIJ+ppWMMZWKhoKcmZxjdmGRjtbilSHjyTRbutvZHrmw6FOwLcCV0Z6yJ8jjyRRbezouGJKOhoINlVq9VFr1j7qf31XsHOPfaGaG/nCwoaqAGbMeRcPOyqoTmVlim4oXZIonzzIUixT9nRyKRbgvfpxsVmlp8fd76238y7+nk6495e8F1IFSQ1X/qdSFqvrp6jencY2lbde4Mash6paNHc3MFA0cmZl5nj05ydtK1AIfjEX48o+OcuTU5NLS2lLSU/McOTXJz7944ILH+sNBvvP4TMOUkC21qqpnmQ9ThtHMjE2MG7MKzm0CLD40dMDNiJs7F5HvWndllN/hqnMb/y68Z18oyNxCltRUY9TEKzVU9fHVbEgjy2bV8lQZs0r8BA4vGOzdHil6ziW93XR3tJJIpvh3BXoR+RLJFCKwZ+DCyfZoTg6tjV3tFzxeb5adqRWRIM4y2auBpXc+Vf33NWxXQzkzNcf8olqtcWNWQaizlWBb6RKy8WSKS7Z0Ed7QVvScQIuwZ3vYd6bcxHCKS3u7CQUvvKc37zKanuGq/pCv+61nfjYA/hUQBd4A/ABnB/h4LRvVaGwPhzGrx1v+WixwqCrxZKrgkFK+oYsiPDmSYWZ+seR53j1zc17lKjdr73rnJ3Bcpqq/B0yq6l3AW4A9tW1WYxlbqjV+4bI/Y0z1RcPFl7+OpGc4OT573u7uYgYHIswvKk+MZEqedyw1zamJuaWMuPm
29iw/fFZP/AQObzYn5RZaCgM7ataiBmTpRoxZXaV6HOc2/kWWvY/XK4kfTZU8z5szKbQLHaC9tYUt3R0Ns3vcz260O9zU5b+Hk2uq2/3a+DSWnqFFYEt3/U+KGVMPvNrjhfZgxJMp2gMtXNW//OLQaDhINBRcdp4jkUzR3tpSsgZINNzRMENVpfZxPAF8GbhHVc/izG9csloNayQj6Rl6ezouyMBpjKmNaCjI/KJyZmqOLd3nL0qJJ1NctS1Ucld5rsFYeNnNe4lkmmu2hUoWaoqGggyf9VcnZL0r9U52E07v4rsi8iMR+VBeZlrj02hmxoapjFlFxZbkLmaVA8fSS3s0/BiKbeT501OcnSxchHRhMcuBY+llh74aqfZ4qUJOCVX9r6p6KfBBnOp8PxKRfxCR31i1FjaAMdv8Z8yq8ipt5s8pPHNinKm5xYKJDYvxzi02XPX02ATT84vLrtKKhoKkpuaXXaFVD3yNnajqv6nqh4F3AhuBz9W0VQ1mNG09DmNWU7EaGEsT40WWzRayZ3sYEWc4qpBSO8ZzRYsEs3q0bOAQkZ8QkU+LyAvAx4E7uLB2uCliam6BzMyC1Ro3ZhX1dnfQIhcOVcWTaULBVnZu6fJ9r55gG7u2dhNPni34ePxoisiGNi4qkVARzgWORliSW2py/I+AXwLOAvcAP6Wqw6vVsEaxtPnPehzGrJrWgLP89cLAUTh77XIGByI8cOhEwSSFiWFn499y94w20CbAUj2OWeBNqrpPVT+pqsMi8tbValijsD0cxqyNaPj8yeipuQWeHhv3tWM832AswpnJOZJnzl8VNTnr/559DdTjKDU5/nFVfTrv8O/XuD0NxxvPtKEqY1ZXfu3xx49nWMxqRYFjaSNg3gT5gWNpsrr8/AZAT0crG9oDDd/jKKT+E8mvstG0U57dehzGrK7+vNrj3u5vPzvG810R7aGjteWC/Rze93sLZMTNJyJOKpQmDBzvrUkrGthYZoaeYCtdHVYy1pjV1BcKkplZYGpuAXB6CwMbOy/YEOhHW6CFPdvDF9TmiCdTXLRpA5t93jMaCjb2UJVHRH5BRLx99G8QkW+KyItq3K6GYUtxjVkb+ZsAvbKulRqMRTh4LM38YnbpWLn3jIaCjGVmK27DeuGnx/F7qjouIi8HXgfcBXyhts1qHCO2+c+YNRHN2ctxamKW4bPTDJWxfyPfYCzC7EKWp0adqhInMjMcT88w6GOYypObQ6ue+Qkc3jbHtwC3q+q3AMvW55PVGjdmbXi/d2OZmaW5iFKlYpeTX0rW+3xtGfeMhoIsZJVTk/Xd6/ATOI6JyP8H/CJwv4h0+Lyu6S1mlZMTszZUZcwaOLfhbpZ4MkWgRbh6W+XV9wY2drKpq30pCCWGU7S2CFdv89/jWNo9nm78wPGLwHeAN6pqCtgE/FYtG9UoTk3MsphVG6oyZg10d7TS09HKWGaGeDLF5X09bGivfJGKiDA4ED6vx3Flfw/BNn9ZdqFxNgH6CRz9wP9W1WdE5FXALwAP1bJRjcJ2jRuztvrCQY6npkn4LBW7nKHYRg6fnCA9Pc9jyXRZOa/g/HmXeuYncHwDWBSRy4C/BHYCX6lpqxrEiNUaN2ZNRUNB9r9wlszMQtGyruUYjIVRhfsSxxmfXSh7ldaW7g4CLVK0rG298BM4sqq6ANwA/KmbJdfqcviwtGvcehzGrIm+UJAzbh2NYmVdy+H1Wu7+1+cByqrrARBoEXq7O5b+qKxXvmqOi8hNOCnV/8491ubn5iLyRhF5SkQOi8htBR7fKCL3ishjIvKQW9PceywiIl8XkUMi8qSIvNQ9vklEviciz7ifV/6/oUZGMzO0BYTNXbYIzZi1EA07G/M2tAe4bGv3iu8X2dDOjs0beObEBN0drVzSW/49G2H3uJ/A8S7gpcAfqupzIrIT+J/LXSQiAeDzwJuA3cBNIrI777SPAHFV3YsTmD6b89hngW+r6pXAIPCke/w24AFV3QU84H6
/Lo2lZ9jaE7yg5rExZnVEw52AU1MjUKXfQ294qtJ7RhugEuCygUNVnwB+Ezjg9giGVfUTPu59HXBYVY+o6hxOavbr887ZjfPmj6oeAnaISJ+IhIBX4MypoKpz7oou3Hvc5X59F/A2H22pyPHUND989nTF14/a5j9j1pS3MGUl+zfyecNVld4zGg42/hyHu5LqGZzew58DT4vIK3zcezuQzPl+mAsLQCVw5k4QketwytMOAJcAJ4EvicijIvIXIuJVXulT1REA9/PWIu1+j4jsF5H9J0+e9NHcC/3ZPzzDe/9qP6qV7fK0WuPGrC2vYNNLdm6u2j2v27npvM/l6gsFGZ9dYHJ2oWptWm1+hqo+BbxeVV+pqq8A3gB8xsd1hfpw+e/AnwA2ikgceD/wKLCAU2DqRcAXVPVaYJIyh6RU9Q63lsi+3t7eci5dMjgQITOzwHOnJsu+VlUZtV3jxqypy7Z284PfehWvuqKy94BCrt4Wdu55eWX39OZd6nm4yk/gaFPVp7xv3BodfibHh4FYzvcDwPHcE1Q1o6rvUtUhnDmOXuA599phVf2Re+rXcQIJwJiI9AO4n0/4aEtFvK5osSL1pYzPLjA1t7j0n8QYszYu3txVdsW/Wt4zGnLmXeo5S66fwPGIiPyliLzK/fgi8IiP6x4GdonIThFpB24E7ss9wV055S05ejfwoBtMRoGkiFzhPvZa4An36/uAm92vbwa+5aMtFdm1tYcN7YGiRepL8cYwrcdhjMnVCLXH/ey/vwV4H/ABnOGnB3HmOkpS1QURuRUnXUkAuFNVHxeRW9zHbweuAu4WkUWcwPDrObd4P/BlN7AcwVndBc7w1tdE5NeBozg72Wsi0CJcsz3Mo3k5+P3wuqH97qoOY4yBxkg7UjJwiEgL8IiqXgN8utybq+r9wP15x27P+fqHwK4i18aBfQWOn8bpgayKa2MRvvQvzzO7sEhHq/+cNJZuxBhTSGd7gFCwta73cpQcqlLVLJAQkYtWqT3rzmAswtxilkMj42Vd5wWOrSGb4zDGnC8aru9KgH6GqvqBx0XkIZzVTQCo6s/VrFXryGBODv5y8tKMZmbYuKGtrMyZxpjm0Beq793jfgLHx2veinVsWzhIb0/HBUXqlzOWsaW4xpjC+sPBpUqC9aho4HCz4fap6g/yjr8COFbrhq0XTg7+yAVF6pdju8aNMcVEQ0FOTcyysJilNVB/dfFKtfhPgUIhccp9rGlce1GEI6cmSU/N+75mND1LvwUOY0wBfeEgWYWTE/VZCbBU4Nihqo/lH1TV/cCOmrVoHfKKtTx2LOXr/LmFLKcnZ22oyhhT0NKS3DqdIC8VOEq96zXV5oQ9A04BmPjRlK/zT4zPoGpLcY0xhXl/VNbrBHmpwPGwiPxG/kF3452fneMNI9zZxqW9Xb5TjywVcLKhKmNMAd4wdr0WdCq1qupDwL0i8sucCxT7gHbg7TVu17ozGIvw4NOnUNVlc9SMpp1xS+txGGMK2dTVTnugpW53jxftcajqmKq+DGc57vPux8dV9aVuLqmmMhSLcGpilmOp6WXPPZduxAKHMeZCIsLWUEfd1uVYdh+Hqv4j8I+r0JZ1zSvekkimGdi4oeS5Y5kZOlpbCHf6qrBrjGlC9VwJsP4WEK+RK6Mh2ltbiCfPLnvuSNrZw1HtVM7GmMbRFw4ylmm85bgmR3trC1dvC/lKsT5mBZyMMcvoDzn5qiqtMLqWLHCUYXAgwoFjaRYWsyXPs5KxxpjlRMNBpucXyUzXXwlZCxxlGIpFmJ5f5OmxiaLnqKqlGzHGLKuvjutyWOAow9IEeYn9HKmpeeYWstbjMMaUtFQJ0AJHY7t48wbCnW0lM+V6/wmsx2GMKcX747Iel+Ra4CiDiDAYK50pd9RqjRtjfPCKvFmPowkMxSI8PTbO5GzhCS3rcRhj/OhoDbC5q90CRzMYioXJKhw4VnhZ7mh6BhHY2mMlY40xpfWF6rOErAWOMnkp1ov
Nc4xlZtjS3UFbHRZnMcasrnqtPW7vbmXa3N1BbFNn0ZVVtofDGONXvdYet8BRgcGBSNHaHKO2a9wY41N/OMjpyTlmFxbXuillscBRgaFYhOPpGU4U+EvB2fxn8xvGmOV5oxMn6ixnlQWOCngbAfOX5c7ML5KamrehKmOML311ugnQAkcFrt4WJtAiF8xzLFX+s8BhjPGhXmuPW+CoQGd7gCujPRdkyvV++P3hpirJboypULROa49b4KjQYCxCIpkimz2XEvnc5j+b4zDGLC/U2UpnW8B6HLlE5I0i8pSIHBaR2wo8vlFE7hWRx0TkIRG5Juex50XkgIjERWR/zvGPicgx93hcRN5cy9dQzFAswvjsAkdOTS4ds3QjxphyiIizl8N6HA4RCQCfB94E7AZuEpHdead9BIir6l7gncBn8x5/taoOqeq+vOOfcY8Pqer9tWj/cs6Vkk0tHRvNzNDVHqAnaCVjjTH+9IU6bKgqx3XAYVU9oqpzwD3A9Xnn7AYeAFDVQ8AOEemrYZuq5tLebrraA+etrBrLzCytkjDGGD+ioSAjNlS1ZDuQzPl+2D2WKwHcACAi1wEXAwPuYwp8V0QeEZH35F13qzu8daeIbKx+05cXaBH2DkTOW1k1mrZd48aY8vSFg5zIzNZVCdlaBg4pcCz/X+YTwEYRiQPvBx4FvLSzP6WqL8IZ6nqfiLzCPf4F4FJgCBgBPlXwyUXeIyL7RWT/yZMnV/I6ihqMRXhyJMPMvLPrcywza1lxjTFliYaCzC1mOTM5t9ZN8a2WgWMYiOV8PwAczz1BVTOq+i5VHcKZ4+gFnnMfO+5+PgHcizP0haqOqeqiqmaBL3rH86nqHaq6T1X39fb2VvWFeYZiEeYXlSdGMmSzypjlqTLGlKm/DjcB1jJwPAzsEpGdItIO3Ajcl3uCiETcxwDeDTyoqhkR6RKRHvecLuD1wEH3+/6cW7zdO74WcifIT03OspBV63EYY8rSV4d7OVprdWNVXRCRW4HvAAHgTlV9XERucR+/HbgKuFtEFoEngF93L+8D7hURr41fUdVvu4/9sYgM4Qx7PQ+8t1avYTnRcJC+UAfxZIp9F28CbCmuMaY83h+b9TRBXrPAAeAulb0/79jtOV//ENhV4LojwGCRe/5qlZu5IkPuRsClzX8WOIwxZejt7qBF6qv2uO0cX6HBWITnT0/x1GgGODdeaYwxfrQGWtjS3WFzHM1kyK0I+J3Hxwi0CJu7Ld2IMaY8/eEgo3WUWt0CxwrtGQgj4tQg39rTQaCl0CpkY4wpri8UtKGqZtITbOOy3m7AJsaNMZWpt3xVFjiqwFuWaxPjxphK9IWCpKfnmZ6rjxKyFjiqYNALHDYxboypwFJBpzrpddR0OW6z8HocNlRljKmEtxrznXf+iGBroKr3/qMb9vATOzZV9Z4WOKrgqv4Q73/NZbx1b//yJxtjTJ6hiyL84r4BJmYXlj+5TJ1t1Q1EAFJPGRkrtW/fPt2/f//yJxpjjFkiIo8UqIdkcxzGGGPKY4HDGGNMWSxwGGOMKYsFDmOMMWWxwGGMMaYsFjiMMcaUxQKHMcaYsljgMMYYU5am2AAoIieBF/IObwFOrUFzaqXRXg803mtqtNcDjfeaGu31wMpe08Wq2pt/sCkCRyEisr/Qjsh61WivBxrvNTXa64HGe02N9nqgNq/JhqqMMcaUxQKHMcaYsjRz4LhjrRtQZY32eqDxXlOjvR5ovNfUaK8HavCamnaOwxhjTGWaucdhjDGmAhY4jDHGlKXpAoeIvFFEnhKRwyJy21q3pxpE5HkROSAicRGpu4pVInKniJwQkYM5xzaJyPdE5Bn388a1bGO5irymj4nIMffnFBeRN69lG8shIjER+UcReVJEHheRD7rH6/LnVOL11PPPKCgiD4lIwn1NH3ePV/1n1FRzHCISAJ4GXgcMAw8DN6nqE2vasBUSkeeBfapalxuXROQVwARwt6pe4x77Y+CMqn7CDfA
bVfW/rGU7y1HkNX0MmFDVT65l2yohIv1Av6r+WER6gEeAtwG/Rh3+nEq8nl+kfn9GAnSp6oSItAH/DHwQuIEq/4yarcdxHXBYVY+o6hxwD3D9Grep6anqg8CZvMPXA3e5X9+F80tdN4q8prqlqiOq+mP363HgSWA7dfpzKvF66pY6Jtxv29wPpQY/o2YLHNuBZM73w9T5fxaXAt8VkUdE5D1r3Zgq6VPVEXB+yYGta9yearlVRB5zh7LqYlgnn4jsAK4FfkQD/JzyXg/U8c9IRAIiEgdOAN9T1Zr8jJotcEiBY40wVvdTqvoi4E3A+9xhErP+fAG4FBgCRoBPrWlrKiAi3cA3gA+pamat27NSBV5PXf+MVHVRVYeAAeA6EbmmFs/TbIFjGIjlfD8AHF+jtlSNqh53P58A7sUZkqt3Y+44tDcefWKN27Niqjrm/mJngS9SZz8nd9z8G8CXVfWb7uG6/TkVej31/jPyqGoK+D7wRmrwM2q2wPEwsEtEdopIO3AjcN8at2lFRKTLndxDRLqA1wMHS19VF+4Dbna/vhn41hq2pSq8X17X26mjn5M78fqXwJOq+umch+ry51Ts9dT5z6hXRCLu153AzwCHqMHPqKlWVQG4y+v+FAgAd6rqH65ti1ZGRC7B6WUAtAJfqbfXJCJfBV6Fk/55DPgo8L+ArwEXAUeBX1DVuplsLvKaXoUzBKLA88B7vbHn9U5EXg78E3AAyLqHP4IzL1B3P6cSr+cm6vdntBdn8juA0yn4mqr+vohspso/o6YLHMYYY1am2YaqjDHGrJAFDmOMMWWxwGGMMaYsFjiMMcaUxQKHMcaYsljgMA1DRL4vIm/IO/YhEfnzZa7ZV+N2fdVNYfHhvOMfE5HfdL8OuplLP1rg+l9ws7j+4wraMJHz9ZvdTKkXuW2YEpGtRc5VEflUzve/6SZrNE3MAodpJF/F2dSZ60b3+JoQkSjwMlXdq6qfKXJOO84O5kdU9eMFTvl14D+q6qt9PmdricdeC/wZ8EZVPeoePgX85yKXzAI3iMgWP89tmoMFDtNIvg68VUQ6YCl53Tbgn0XkCyKyP7dOQb68v7T/nYj8/+7XvSLyDRF52P34qQLXBkXkS+LURXlURLw3+e8CW8Wp7fDTBZ62FSdL8zOqekF9GBH5b8DLgdtF5E+KPY+I/JqI/I2I/K37nIVe30/jpNF4i6o+m/PQncAvicimApct4NSs/nCBx0yTssBhGoaqngYewsnPA05v46/V2eX6O6q6D9gLvNLdZevXZ4HPqOpPAD8P/EWBc97ntmEPzu7ju0QkCPwc8KyqDqnqPxW47reBBVX9UJHX9PvAfuCXVfW3SjwPwEuBm1X1NQVu1YGTauJtqnoo77EJnODxwUJtAD4P/LKIhIs8bpqMBQ7TaHKHq3KHqX5RRH4MPApcDewu454/A3zOTVd9HxDy8oPleDnwVwDuG/MLwOU+7v3PwEtFxM+5yz3P90qkkpgH/hVn2KuQ/wHcLCKh/AfcrLF3Ax/w2UbT4CxwmEbzv4DXisiLgE63wttO4DeB16rqXuB/A8EC1+bm38l9vAV4qdtrGFLV7W7xn1yFUvb78SDwIeD/iMg2H+eXep7JEo9lcarb/YSIfCT/QTeb6leA/1jk+j/FCTpdPtpoGpwFDtNQ3Apo38cZevF6GyGcN9W0iPTh1C0pZExErhKRFpzMqJ7vArd634jIUIFrHwR+2X38cpyEck/5bPM3gD8Bvu1lNy1hJc8zBbwVZ9ipUM/j08B7ceZd8q89g5Mor1iPxTQRCxymEX0VGMSZdEZVEzhDVI/jBJR/KXLdbcDfAf+AU8TH8wFgn7uk9gnglgLX/jkQEJEDwF8Dv6aqs34brKq3A98E7suZsyhkpc9zBmcO6HdF5Pq8x07hZFruKHL5p3Cy/ZomZ9lxjTHGlMV6HMYYY8pigcMYY0xZLHAYY4wpiwUOY4wxZbHAYYwxpiwWOIwxxpTFAocxxpi
y/F9vUSqEq14JhwAAAABJRU5ErkJggg==\n", 347 | "text/plain": [ 348 | "
" 349 | ] 350 | }, 351 | "metadata": { 352 | "needs_background": "light" 353 | }, 354 | "output_type": "display_data" 355 | } 356 | ], 357 | "source": [ 358 | "import matplotlib.pyplot as plt\n", 359 | "%matplotlib inline\n", 360 | "\n", 361 | "# plot the value of K for KNN (x-axis) versus the cross-validated accuracy (y-axis)\n", 362 | "plt.plot(k_range, k_scores)\n", 363 | "plt.xlabel('Value of K for KNN')\n", 364 | "plt.ylabel('Cross-Validated Accuracy')" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": {}, 370 | "source": [ 371 | "## Cross-validation example: model selection" 372 | ] 373 | }, 374 | { 375 | "cell_type": "markdown", 376 | "metadata": {}, 377 | "source": [ 378 | "**Goal:** Compare the best KNN model with logistic regression on the iris dataset" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": 13, 384 | "metadata": {}, 385 | "outputs": [ 386 | { 387 | "name": "stdout", 388 | "output_type": "stream", 389 | "text": [ 390 | "0.9800000000000001\n" 391 | ] 392 | } 393 | ], 394 | "source": [ 395 | "# 10-fold cross-validation with the best KNN model\n", 396 | "knn = KNeighborsClassifier(n_neighbors=20)\n", 397 | "print(cross_val_score(knn, X, y, cv=10, scoring='accuracy').mean())" 398 | ] 399 | }, 400 | { 401 | "cell_type": "code", 402 | "execution_count": 14, 403 | "metadata": {}, 404 | "outputs": [ 405 | { 406 | "name": "stdout", 407 | "output_type": "stream", 408 | "text": [ 409 | "0.9533333333333334\n" 410 | ] 411 | } 412 | ], 413 | "source": [ 414 | "# 10-fold cross-validation with logistic regression\n", 415 | "from sklearn.linear_model import LogisticRegression\n", 416 | "logreg = LogisticRegression(solver='liblinear')\n", 417 | "print(cross_val_score(logreg, X, y, cv=10, scoring='accuracy').mean())" 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "## Cross-validation example: feature selection" 425 | ] 426 | }, 427 | { 428 | "cell_type": 
"markdown", 429 | "metadata": {}, 430 | "source": [ 431 | "**Goal**: Select whether the Newspaper feature should be included in the linear regression model on the advertising dataset" 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": 15, 437 | "metadata": {}, 438 | "outputs": [], 439 | "source": [ 440 | "import pandas as pd\n", 441 | "import numpy as np\n", 442 | "from sklearn.linear_model import LinearRegression" 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "execution_count": 16, 448 | "metadata": {}, 449 | "outputs": [], 450 | "source": [ 451 | "# read in the advertising dataset\n", 452 | "data = pd.read_csv('data/Advertising.csv', index_col=0)" 453 | ] 454 | }, 455 | { 456 | "cell_type": "code", 457 | "execution_count": 17, 458 | "metadata": {}, 459 | "outputs": [], 460 | "source": [ 461 | "# create a Python list of three feature names\n", 462 | "feature_cols = ['TV', 'Radio', 'Newspaper']\n", 463 | "\n", 464 | "# use the list to select a subset of the DataFrame (X)\n", 465 | "X = data[feature_cols]\n", 466 | "\n", 467 | "# select the Sales column as the response (y)\n", 468 | "y = data.Sales" 469 | ] 470 | }, 471 | { 472 | "cell_type": "code", 473 | "execution_count": 18, 474 | "metadata": {}, 475 | "outputs": [ 476 | { 477 | "name": "stdout", 478 | "output_type": "stream", 479 | "text": [ 480 | "[-3.56038438 -3.29767522 -2.08943356 -2.82474283 -1.3027754 -1.74163618\n", 481 | " -8.17338214 -2.11409746 -3.04273109 -2.45281793]\n" 482 | ] 483 | } 484 | ], 485 | "source": [ 486 | "# 10-fold cross-validation with all three features\n", 487 | "lm = LinearRegression()\n", 488 | "scores = cross_val_score(lm, X, y, cv=10, scoring='neg_mean_squared_error')\n", 489 | "print(scores)" 490 | ] 491 | }, 492 | { 493 | "cell_type": "code", 494 | "execution_count": 19, 495 | "metadata": {}, 496 | "outputs": [ 497 | { 498 | "name": "stdout", 499 | "output_type": "stream", 500 | "text": [ 501 | "[3.56038438 3.29767522 2.08943356 2.82474283 
1.3027754 1.74163618\n", 502 | " 8.17338214 2.11409746 3.04273109 2.45281793]\n" 503 | ] 504 | } 505 | ], 506 | "source": [ 507 | "# fix the sign of MSE scores\n", 508 | "mse_scores = -scores\n", 509 | "print(mse_scores)" 510 | ] 511 | }, 512 | { 513 | "cell_type": "code", 514 | "execution_count": 20, 515 | "metadata": {}, 516 | "outputs": [ 517 | { 518 | "name": "stdout", 519 | "output_type": "stream", 520 | "text": [ 521 | "[1.88689808 1.81595022 1.44548731 1.68069713 1.14139187 1.31971064\n", 522 | " 2.85891276 1.45399362 1.7443426 1.56614748]\n" 523 | ] 524 | } 525 | ], 526 | "source": [ 527 | "# convert from MSE to RMSE\n", 528 | "rmse_scores = np.sqrt(mse_scores)\n", 529 | "print(rmse_scores)" 530 | ] 531 | }, 532 | { 533 | "cell_type": "code", 534 | "execution_count": 21, 535 | "metadata": {}, 536 | "outputs": [ 537 | { 538 | "name": "stdout", 539 | "output_type": "stream", 540 | "text": [ 541 | "1.6913531708051797\n" 542 | ] 543 | } 544 | ], 545 | "source": [ 546 | "# calculate the average RMSE\n", 547 | "print(rmse_scores.mean())" 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": 22, 553 | "metadata": {}, 554 | "outputs": [ 555 | { 556 | "name": "stdout", 557 | "output_type": "stream", 558 | "text": [ 559 | "1.6796748419090768\n" 560 | ] 561 | } 562 | ], 563 | "source": [ 564 | "# 10-fold cross-validation with two features (excluding Newspaper)\n", 565 | "feature_cols = ['TV', 'Radio']\n", 566 | "X = data[feature_cols]\n", 567 | "print(np.sqrt(-cross_val_score(lm, X, y, cv=10, scoring='neg_mean_squared_error')).mean())" 568 | ] 569 | }, 570 | { 571 | "cell_type": "markdown", 572 | "metadata": {}, 573 | "source": [ 574 | "## Improvements to cross-validation" 575 | ] 576 | }, 577 | { 578 | "cell_type": "markdown", 579 | "metadata": {}, 580 | "source": [ 581 | "**Repeated cross-validation**\n", 582 | "\n", 583 | "- Repeat cross-validation multiple times (with **different random splits** of the data) and average the results\n", 584 | 
"- More reliable estimate of out-of-sample performance by **reducing the variance** associated with a single trial of cross-validation\n", 585 | "\n", 586 | "**Creating a hold-out set**\n", 587 | "\n", 588 | "- \"Hold out\" a portion of the data **before** beginning the model building process\n", 589 | "- Locate the best model using cross-validation on the remaining data, and test it **using the hold-out set**\n", 590 | "- More reliable estimate of out-of-sample performance since hold-out set is **truly out-of-sample**\n", 591 | "\n", 592 | "**Feature engineering and selection within cross-validation iterations**\n", 593 | "\n", 594 | "- Normally, feature engineering and selection occurs **before** cross-validation\n", 595 | "- Instead, perform all feature engineering and selection **within each cross-validation iteration**\n", 596 | "- More reliable estimate of out-of-sample performance since it **better mimics** the application of the model to out-of-sample data" 597 | ] 598 | }, 599 | { 600 | "cell_type": "markdown", 601 | "metadata": {}, 602 | "source": [ 603 | "## Resources\n", 604 | "\n", 605 | "- scikit-learn documentation: [Cross-validation](https://scikit-learn.org/stable/modules/cross_validation.html), [Model evaluation](https://scikit-learn.org/stable/modules/model_evaluation.html)\n", 606 | "- scikit-learn issue on GitHub: [MSE is negative when returned by cross_val_score](https://github.com/scikit-learn/scikit-learn/issues/2439)\n", 607 | "- Section 5.1 of [An Introduction to Statistical Learning](https://www.statlearning.com/) (11 pages) and related videos: [K-fold and leave-one-out cross-validation](https://www.youtube.com/watch?v=rSGzUy13F_0&list=PL5-da3qGB5IA6E6ZNXu7dp89_uv8yocmf&index=2) (14 minutes), [Cross-validation the right and wrong ways](https://www.youtube.com/watch?v=r64tRyHFAJ8&list=PL5-da3qGB5IA6E6ZNXu7dp89_uv8yocmf&index=3) (10 minutes)\n", 608 | "- Scott Fortmann-Roe: [Accurately Measuring Model Prediction 
Error](http://scott.fortmann-roe.com/docs/MeasuringError.html)\n", 609 | "- Machine Learning Mastery: [An Introduction to Feature Selection](https://machinelearningmastery.com/an-introduction-to-feature-selection/)\n", 610 | "- Harvard CS109: [Cross-Validation: The Right and Wrong Way](https://github.com/cs109/content/blob/master/lec_10_cross_val.ipynb)\n", 611 | "- Journal of Cheminformatics: [Cross-validation pitfalls when selecting and assessing regression and classification models](https://jcheminf.biomedcentral.com/track/pdf/10.1186/1758-2946-6-10.pdf)" 612 | ] 613 | }, 614 | { 615 | "cell_type": "markdown", 616 | "metadata": {}, 617 | "source": [ 618 | "## Comments or Questions?\n", 619 | "\n", 620 | "- Email: \n", 621 | "- Website: https://www.dataschool.io\n", 622 | "- Twitter: [@justmarkham](https://twitter.com/justmarkham)\n", 623 | "\n", 624 | "© 2021 [Data School](https://www.dataschool.io). All rights reserved." 625 | ] 626 | } 627 | ], 628 | "metadata": { 629 | "kernelspec": { 630 | "display_name": "Python 3", 631 | "language": "python", 632 | "name": "python3" 633 | }, 634 | "language_info": { 635 | "codemirror_mode": { 636 | "name": "ipython", 637 | "version": 3 638 | }, 639 | "file_extension": ".py", 640 | "mimetype": "text/x-python", 641 | "name": "python", 642 | "nbconvert_exporter": "python", 643 | "pygments_lexer": "ipython3", 644 | "version": "3.9.4" 645 | } 646 | }, 647 | "nbformat": 4, 648 | "nbformat_minor": 1 649 | } 650 | -------------------------------------------------------------------------------- /10_categorical_features.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Building a Machine Learning workflow ([video #10](https://www.youtube.com/watch?v=irHhDMbw3xo&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=10))\n", 8 | "\n", 9 | "Created by [Data School](https://www.dataschool.io). 
Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).\n", 10 | "\n", 11 | "**Note:** This notebook uses Python 3.9.1 and scikit-learn 0.23.2. The original notebook (shown in the video) used Python 3.7 and scikit-learn 0.20.2." 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Agenda\n", 19 | "\n", 20 | "- Why should you use a Pipeline?\n", 21 | "- How do you encode categorical features with OneHotEncoder?\n", 22 | "- How do you apply OneHotEncoder to selected columns with ColumnTransformer?\n", 23 | "- How do you build and cross-validate a Pipeline?\n", 24 | "- How do you make predictions on new data using a Pipeline?\n", 25 | "- Why should you use scikit-learn (rather than pandas) for preprocessing?" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "## Step 1: Load the dataset" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 1, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "import pandas as pd" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 2, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "df = pd.read_csv('http://bit.ly/kaggletrain')" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 3, 56 | "metadata": {}, 57 | "outputs": [ 58 | { 59 | "data": { 60 | "text/plain": [ 61 | "(891, 12)" 62 | ] 63 | }, 64 | "execution_count": 3, 65 | "metadata": {}, 66 | "output_type": "execute_result" 67 | } 68 | ], 69 | "source": [ 70 | "df.shape" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "## Step 2: Select features" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 4, 83 | "metadata": {}, 84 | "outputs": [ 85 | { 86 | "data": { 87 | "text/plain": [ 88 | 
"Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',\n", 89 | " 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],\n", 90 | " dtype='object')" 91 | ] 92 | }, 93 | "execution_count": 4, 94 | "metadata": {}, 95 | "output_type": "execute_result" 96 | } 97 | ], 98 | "source": [ 99 | "df.columns" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 5, 105 | "metadata": {}, 106 | "outputs": [ 107 | { 108 | "data": { 109 | "text/plain": [ 110 | "PassengerId 0\n", 111 | "Survived 0\n", 112 | "Pclass 0\n", 113 | "Name 0\n", 114 | "Sex 0\n", 115 | "Age 177\n", 116 | "SibSp 0\n", 117 | "Parch 0\n", 118 | "Ticket 0\n", 119 | "Fare 0\n", 120 | "Cabin 687\n", 121 | "Embarked 2\n", 122 | "dtype: int64" 123 | ] 124 | }, 125 | "execution_count": 5, 126 | "metadata": {}, 127 | "output_type": "execute_result" 128 | } 129 | ], 130 | "source": [ 131 | "df.isna().sum()" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 6, 137 | "metadata": {}, 138 | "outputs": [], 139 | "source": [ 140 | "df = df.loc[df.Embarked.notna(), ['Survived', 'Pclass', 'Sex', 'Embarked']]" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": 7, 146 | "metadata": {}, 147 | "outputs": [ 148 | { 149 | "data": { 150 | "text/plain": [ 151 | "(889, 4)" 152 | ] 153 | }, 154 | "execution_count": 7, 155 | "metadata": {}, 156 | "output_type": "execute_result" 157 | } 158 | ], 159 | "source": [ 160 | "df.shape" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": 8, 166 | "metadata": {}, 167 | "outputs": [ 168 | { 169 | "data": { 170 | "text/plain": [ 171 | "Survived 0\n", 172 | "Pclass 0\n", 173 | "Sex 0\n", 174 | "Embarked 0\n", 175 | "dtype: int64" 176 | ] 177 | }, 178 | "execution_count": 8, 179 | "metadata": {}, 180 | "output_type": "execute_result" 181 | } 182 | ], 183 | "source": [ 184 | "df.isna().sum()" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 9, 190 | 
"metadata": {}, 191 | "outputs": [ 192 | { 193 | "data": { 194 | "text/html": [ 195 | "
\n", 196 | "\n", 209 | "\n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | "
   Survived  Pclass     Sex Embarked
0         0       3    male        S
1         1       1  female        C
2         1       3  female        S
3         1       1  female        S
4         0       3    male        S
\n", 257 | "
" 258 | ], 259 | "text/plain": [ 260 | " Survived Pclass Sex Embarked\n", 261 | "0 0 3 male S\n", 262 | "1 1 1 female C\n", 263 | "2 1 3 female S\n", 264 | "3 1 1 female S\n", 265 | "4 0 3 male S" 266 | ] 267 | }, 268 | "execution_count": 9, 269 | "metadata": {}, 270 | "output_type": "execute_result" 271 | } 272 | ], 273 | "source": [ 274 | "df.head()" 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": {}, 280 | "source": [ 281 | "## Step 3: Cross-validate a model with one feature" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": 10, 287 | "metadata": {}, 288 | "outputs": [], 289 | "source": [ 290 | "X = df.loc[:, ['Pclass']]\n", 291 | "y = df.Survived" 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": 11, 297 | "metadata": {}, 298 | "outputs": [ 299 | { 300 | "data": { 301 | "text/plain": [ 302 | "(889, 1)" 303 | ] 304 | }, 305 | "execution_count": 11, 306 | "metadata": {}, 307 | "output_type": "execute_result" 308 | } 309 | ], 310 | "source": [ 311 | "X.shape" 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": 12, 317 | "metadata": {}, 318 | "outputs": [ 319 | { 320 | "data": { 321 | "text/plain": [ 322 | "(889,)" 323 | ] 324 | }, 325 | "execution_count": 12, 326 | "metadata": {}, 327 | "output_type": "execute_result" 328 | } 329 | ], 330 | "source": [ 331 | "y.shape" 332 | ] 333 | }, 334 | { 335 | "cell_type": "code", 336 | "execution_count": 13, 337 | "metadata": {}, 338 | "outputs": [], 339 | "source": [ 340 | "from sklearn.linear_model import LogisticRegression" 341 | ] 342 | }, 343 | { 344 | "cell_type": "code", 345 | "execution_count": 14, 346 | "metadata": {}, 347 | "outputs": [], 348 | "source": [ 349 | "logreg = LogisticRegression()" 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": 15, 355 | "metadata": {}, 356 | "outputs": [], 357 | "source": [ 358 | "from sklearn.model_selection import cross_val_score" 359 | ] 360 | }, 361 
| { 362 | "cell_type": "code", 363 | "execution_count": 16, 364 | "metadata": {}, 365 | "outputs": [ 366 | { 367 | "data": { 368 | "text/plain": [ 369 | "0.6783406335301212" 370 | ] 371 | }, 372 | "execution_count": 16, 373 | "metadata": {}, 374 | "output_type": "execute_result" 375 | } 376 | ], 377 | "source": [ 378 | "cross_val_score(logreg, X, y, cv=5, scoring='accuracy').mean()" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": 17, 384 | "metadata": {}, 385 | "outputs": [ 386 | { 387 | "data": { 388 | "text/plain": [ 389 | "0 0.617548\n", 390 | "1 0.382452\n", 391 | "Name: Survived, dtype: float64" 392 | ] 393 | }, 394 | "execution_count": 17, 395 | "metadata": {}, 396 | "output_type": "execute_result" 397 | } 398 | ], 399 | "source": [ 400 | "y.value_counts(normalize=True)" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": {}, 406 | "source": [ 407 | "## Step 4: Encode categorical features" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 18, 413 | "metadata": {}, 414 | "outputs": [ 415 | { 416 | "data": { 417 | "text/html": [ 418 | "
\n", 419 | "\n", 432 | "\n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | "
   Survived  Pclass     Sex Embarked
0         0       3    male        S
1         1       1  female        C
2         1       3  female        S
3         1       1  female        S
4         0       3    male        S
\n", 480 | "
" 481 | ], 482 | "text/plain": [ 483 | " Survived Pclass Sex Embarked\n", 484 | "0 0 3 male S\n", 485 | "1 1 1 female C\n", 486 | "2 1 3 female S\n", 487 | "3 1 1 female S\n", 488 | "4 0 3 male S" 489 | ] 490 | }, 491 | "execution_count": 18, 492 | "metadata": {}, 493 | "output_type": "execute_result" 494 | } 495 | ], 496 | "source": [ 497 | "df.head()" 498 | ] 499 | }, 500 | { 501 | "cell_type": "code", 502 | "execution_count": 19, 503 | "metadata": {}, 504 | "outputs": [], 505 | "source": [ 506 | "# dummy encoding of categorical features\n", 507 | "from sklearn.preprocessing import OneHotEncoder\n", 508 | "ohe = OneHotEncoder(sparse=False)" 509 | ] 510 | }, 511 | { 512 | "cell_type": "code", 513 | "execution_count": 20, 514 | "metadata": {}, 515 | "outputs": [ 516 | { 517 | "data": { 518 | "text/plain": [ 519 | "array([[0., 1.],\n", 520 | " [1., 0.],\n", 521 | " [1., 0.],\n", 522 | " ...,\n", 523 | " [1., 0.],\n", 524 | " [0., 1.],\n", 525 | " [0., 1.]])" 526 | ] 527 | }, 528 | "execution_count": 20, 529 | "metadata": {}, 530 | "output_type": "execute_result" 531 | } 532 | ], 533 | "source": [ 534 | "ohe.fit_transform(df[['Sex']])" 535 | ] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "execution_count": 21, 540 | "metadata": {}, 541 | "outputs": [ 542 | { 543 | "data": { 544 | "text/plain": [ 545 | "[array(['female', 'male'], dtype=object)]" 546 | ] 547 | }, 548 | "execution_count": 21, 549 | "metadata": {}, 550 | "output_type": "execute_result" 551 | } 552 | ], 553 | "source": [ 554 | "ohe.categories_" 555 | ] 556 | }, 557 | { 558 | "cell_type": "code", 559 | "execution_count": 22, 560 | "metadata": {}, 561 | "outputs": [ 562 | { 563 | "data": { 564 | "text/plain": [ 565 | "array([[0., 0., 1.],\n", 566 | " [1., 0., 0.],\n", 567 | " [0., 0., 1.],\n", 568 | " ...,\n", 569 | " [0., 0., 1.],\n", 570 | " [1., 0., 0.],\n", 571 | " [0., 1., 0.]])" 572 | ] 573 | }, 574 | "execution_count": 22, 575 | "metadata": {}, 576 | "output_type": "execute_result" 577 | } 578 | 
], 579 | "source": [ 580 | "ohe.fit_transform(df[['Embarked']])" 581 | ] 582 | }, 583 | { 584 | "cell_type": "code", 585 | "execution_count": 23, 586 | "metadata": {}, 587 | "outputs": [ 588 | { 589 | "data": { 590 | "text/plain": [ 591 | "[array(['C', 'Q', 'S'], dtype=object)]" 592 | ] 593 | }, 594 | "execution_count": 23, 595 | "metadata": {}, 596 | "output_type": "execute_result" 597 | } 598 | ], 599 | "source": [ 600 | "ohe.categories_" 601 | ] 602 | }, 603 | { 604 | "cell_type": "markdown", 605 | "metadata": {}, 606 | "source": [ 607 | "## Step 5: Cross-validate a Pipeline with all features" 608 | ] 609 | }, 610 | { 611 | "cell_type": "code", 612 | "execution_count": 24, 613 | "metadata": {}, 614 | "outputs": [], 615 | "source": [ 616 | "X = df.drop('Survived', axis='columns')" 617 | ] 618 | }, 619 | { 620 | "cell_type": "code", 621 | "execution_count": 25, 622 | "metadata": {}, 623 | "outputs": [ 624 | { 625 | "data": { 626 | "text/html": [ 627 | "
<table>
<thead>
<tr><th></th><th>Pclass</th><th>Sex</th><th>Embarked</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>3</td><td>male</td><td>S</td></tr>
<tr><th>1</th><td>1</td><td>female</td><td>C</td></tr>
<tr><th>2</th><td>3</td><td>female</td><td>S</td></tr>
<tr><th>3</th><td>1</td><td>female</td><td>S</td></tr>
<tr><th>4</th><td>3</td><td>male</td><td>S</td></tr>
</tbody>
</table>
" 684 | ], 685 | "text/plain": [ 686 | " Pclass Sex Embarked\n", 687 | "0 3 male S\n", 688 | "1 1 female C\n", 689 | "2 3 female S\n", 690 | "3 1 female S\n", 691 | "4 3 male S" 692 | ] 693 | }, 694 | "execution_count": 25, 695 | "metadata": {}, 696 | "output_type": "execute_result" 697 | } 698 | ], 699 | "source": [ 700 | "X.head()" 701 | ] 702 | }, 703 | { 704 | "cell_type": "code", 705 | "execution_count": 26, 706 | "metadata": {}, 707 | "outputs": [], 708 | "source": [ 709 | "# use when different features need different preprocessing\n", 710 | "from sklearn.compose import make_column_transformer" 711 | ] 712 | }, 713 | { 714 | "cell_type": "code", 715 | "execution_count": 27, 716 | "metadata": {}, 717 | "outputs": [], 718 | "source": [ 719 | "column_trans = make_column_transformer(\n", 720 | " (OneHotEncoder(), ['Sex', 'Embarked']),\n", 721 | " remainder='passthrough')" 722 | ] 723 | }, 724 | { 725 | "cell_type": "code", 726 | "execution_count": 28, 727 | "metadata": {}, 728 | "outputs": [ 729 | { 730 | "data": { 731 | "text/plain": [ 732 | "array([[0., 1., 0., 0., 1., 3.],\n", 733 | " [1., 0., 1., 0., 0., 1.],\n", 734 | " [1., 0., 0., 0., 1., 3.],\n", 735 | " ...,\n", 736 | " [1., 0., 0., 0., 1., 3.],\n", 737 | " [0., 1., 1., 0., 0., 1.],\n", 738 | " [0., 1., 0., 1., 0., 3.]])" 739 | ] 740 | }, 741 | "execution_count": 28, 742 | "metadata": {}, 743 | "output_type": "execute_result" 744 | } 745 | ], 746 | "source": [ 747 | "column_trans.fit_transform(X)" 748 | ] 749 | }, 750 | { 751 | "cell_type": "code", 752 | "execution_count": 29, 753 | "metadata": {}, 754 | "outputs": [], 755 | "source": [ 756 | "# chain sequential steps together\n", 757 | "from sklearn.pipeline import make_pipeline" 758 | ] 759 | }, 760 | { 761 | "cell_type": "code", 762 | "execution_count": 30, 763 | "metadata": {}, 764 | "outputs": [], 765 | "source": [ 766 | "pipe = make_pipeline(column_trans, logreg)" 767 | ] 768 | }, 769 | { 770 | "cell_type": "code", 771 | "execution_count": 31, 772 
| "metadata": {}, 773 | "outputs": [ 774 | { 775 | "data": { 776 | "text/plain": [ 777 | "0.7727924839713071" 778 | ] 779 | }, 780 | "execution_count": 31, 781 | "metadata": {}, 782 | "output_type": "execute_result" 783 | } 784 | ], 785 | "source": [ 786 | "# cross-validate the entire process\n", 787 | "# thus, preprocessing occurs within each fold of cross-validation\n", 788 | "cross_val_score(pipe, X, y, cv=5, scoring='accuracy').mean()" 789 | ] 790 | }, 791 | { 792 | "cell_type": "markdown", 793 | "metadata": {}, 794 | "source": [ 795 | "## Step 6: Make predictions on \"new\" data" 796 | ] 797 | }, 798 | { 799 | "cell_type": "code", 800 | "execution_count": 32, 801 | "metadata": {}, 802 | "outputs": [], 803 | "source": [ 804 | "# added empty cell so that the cell numbering matches the video" 805 | ] 806 | }, 807 | { 808 | "cell_type": "code", 809 | "execution_count": 33, 810 | "metadata": { 811 | "scrolled": true 812 | }, 813 | "outputs": [ 814 | { 815 | "data": { 816 | "text/html": [ 817 | "
<table>
<thead>
<tr><th></th><th>Pclass</th><th>Sex</th><th>Embarked</th></tr>
</thead>
<tbody>
<tr><th>599</th><td>1</td><td>male</td><td>C</td></tr>
<tr><th>512</th><td>1</td><td>male</td><td>S</td></tr>
<tr><th>273</th><td>1</td><td>male</td><td>C</td></tr>
<tr><th>215</th><td>1</td><td>female</td><td>C</td></tr>
<tr><th>790</th><td>3</td><td>male</td><td>Q</td></tr>
</tbody>
</table>
" 874 | ], 875 | "text/plain": [ 876 | " Pclass Sex Embarked\n", 877 | "599 1 male C\n", 878 | "512 1 male S\n", 879 | "273 1 male C\n", 880 | "215 1 female C\n", 881 | "790 3 male Q" 882 | ] 883 | }, 884 | "execution_count": 33, 885 | "metadata": {}, 886 | "output_type": "execute_result" 887 | } 888 | ], 889 | "source": [ 890 | "X_new = X.sample(5, random_state=99)\n", 891 | "X_new" 892 | ] 893 | }, 894 | { 895 | "cell_type": "code", 896 | "execution_count": 34, 897 | "metadata": { 898 | "scrolled": true 899 | }, 900 | "outputs": [ 901 | { 902 | "data": { 903 | "text/plain": [ 904 | "Pipeline(steps=[('columntransformer',\n", 905 | " ColumnTransformer(remainder='passthrough',\n", 906 | " transformers=[('onehotencoder',\n", 907 | " OneHotEncoder(),\n", 908 | " ['Sex', 'Embarked'])])),\n", 909 | " ('logisticregression', LogisticRegression())])" 910 | ] 911 | }, 912 | "execution_count": 34, 913 | "metadata": {}, 914 | "output_type": "execute_result" 915 | } 916 | ], 917 | "source": [ 918 | "pipe.fit(X, y)" 919 | ] 920 | }, 921 | { 922 | "cell_type": "code", 923 | "execution_count": 35, 924 | "metadata": {}, 925 | "outputs": [ 926 | { 927 | "data": { 928 | "text/plain": [ 929 | "array([1, 0, 1, 1, 0])" 930 | ] 931 | }, 932 | "execution_count": 35, 933 | "metadata": {}, 934 | "output_type": "execute_result" 935 | } 936 | ], 937 | "source": [ 938 | "pipe.predict(X_new)" 939 | ] 940 | }, 941 | { 942 | "cell_type": "markdown", 943 | "metadata": {}, 944 | "source": [ 945 | "## Recap" 946 | ] 947 | }, 948 | { 949 | "cell_type": "code", 950 | "execution_count": 36, 951 | "metadata": {}, 952 | "outputs": [], 953 | "source": [ 954 | "import pandas as pd\n", 955 | "from sklearn.compose import make_column_transformer\n", 956 | "from sklearn.preprocessing import OneHotEncoder\n", 957 | "from sklearn.linear_model import LogisticRegression\n", 958 | "from sklearn.pipeline import make_pipeline\n", 959 | "from sklearn.model_selection import cross_val_score" 960 | ] 961 | }, 962 | { 
963 | "cell_type": "code", 964 | "execution_count": 37, 965 | "metadata": {}, 966 | "outputs": [], 967 | "source": [ 968 | "df = pd.read_csv('http://bit.ly/kaggletrain')\n", 969 | "df = df.loc[df.Embarked.notna(), ['Survived', 'Pclass', 'Sex', 'Embarked']]\n", 970 | "X = df.drop('Survived', axis='columns')\n", 971 | "y = df.Survived" 972 | ] 973 | }, 974 | { 975 | "cell_type": "code", 976 | "execution_count": 38, 977 | "metadata": {}, 978 | "outputs": [], 979 | "source": [ 980 | "column_trans = make_column_transformer(\n", 981 | " (OneHotEncoder(), ['Sex', 'Embarked']),\n", 982 | " remainder='passthrough')\n", 983 | "logreg = LogisticRegression(solver='lbfgs')" 984 | ] 985 | }, 986 | { 987 | "cell_type": "code", 988 | "execution_count": 39, 989 | "metadata": {}, 990 | "outputs": [], 991 | "source": [ 992 | "pipe = make_pipeline(column_trans, logreg)" 993 | ] 994 | }, 995 | { 996 | "cell_type": "code", 997 | "execution_count": 40, 998 | "metadata": {}, 999 | "outputs": [ 1000 | { 1001 | "data": { 1002 | "text/plain": [ 1003 | "0.7727924839713071" 1004 | ] 1005 | }, 1006 | "execution_count": 40, 1007 | "metadata": {}, 1008 | "output_type": "execute_result" 1009 | } 1010 | ], 1011 | "source": [ 1012 | "cross_val_score(pipe, X, y, cv=5, scoring='accuracy').mean()" 1013 | ] 1014 | }, 1015 | { 1016 | "cell_type": "code", 1017 | "execution_count": 41, 1018 | "metadata": {}, 1019 | "outputs": [], 1020 | "source": [ 1021 | "X_new = X.sample(5, random_state=99)" 1022 | ] 1023 | }, 1024 | { 1025 | "cell_type": "code", 1026 | "execution_count": 42, 1027 | "metadata": {}, 1028 | "outputs": [ 1029 | { 1030 | "data": { 1031 | "text/plain": [ 1032 | "array([1, 0, 1, 1, 0])" 1033 | ] 1034 | }, 1035 | "execution_count": 42, 1036 | "metadata": {}, 1037 | "output_type": "execute_result" 1038 | } 1039 | ], 1040 | "source": [ 1041 | "pipe.fit(X, y)\n", 1042 | "pipe.predict(X_new)" 1043 | ] 1044 | }, 1045 | { 1046 | "cell_type": "markdown", 1047 | "metadata": {}, 1048 | "source": [ 1049 
| "## Comments or Questions?\n", 1050 | "\n", 1051 | "- Email: \n", 1052 | "- Website: https://www.dataschool.io\n", 1053 | "- Twitter: [@justmarkham](https://twitter.com/justmarkham)\n", 1054 | "\n", 1055 | "© 2021 [Data School](https://www.dataschool.io). All rights reserved." 1056 | ] 1057 | } 1058 | ], 1059 | "metadata": { 1060 | "kernelspec": { 1061 | "display_name": "Python 3", 1062 | "language": "python", 1063 | "name": "python3" 1064 | }, 1065 | "language_info": { 1066 | "codemirror_mode": { 1067 | "name": "ipython", 1068 | "version": 3 1069 | }, 1070 | "file_extension": ".py", 1071 | "mimetype": "text/x-python", 1072 | "name": "python", 1073 | "nbconvert_exporter": "python", 1074 | "pygments_lexer": "ipython3", 1075 | "version": "3.9.4" 1076 | } 1077 | }, 1078 | "nbformat": 4, 1079 | "nbformat_minor": 2 1080 | } 1081 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Introduction to Machine Learning with scikit-learn 2 | 3 | This video series will teach you how to solve Machine Learning problems using Python's popular scikit-learn library. There are **10 video tutorials** totaling 4.5 hours, each with a corresponding **Jupyter notebook**. 4 | 5 | You can [watch the entire series](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A) on YouTube and [view all of the notebooks](http://nbviewer.jupyter.org/github/justmarkham/scikit-learn-videos/tree/master/) using nbviewer. 6 | 7 | The series is also available as a [free online course](https://courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn) that includes updated content, quizzes, and a certificate of completion. 
8 | 9 | [![Watch the first tutorial video](images/youtube.png)](https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=1 "Watch the first tutorial video") 10 | 11 | **Note:** The notebooks in this repository have been updated to use Python 3.9.1 and scikit-learn 0.23.2. The original notebooks (shown in the video) used Python 2.7 and scikit-learn 0.16, and can be downloaded from the [archive branch](https://github.com/justmarkham/scikit-learn-videos/tree/archive). You can read about how I updated the code in this [blog post](https://www.dataschool.io/how-to-update-your-scikit-learn-code-for-2018/). 12 | 13 | ## Table of Contents 14 | 15 | 1. What is Machine Learning, and how does it work? ([video](https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=1), [notebook](01_machine_learning_intro.ipynb)) 16 | - What is Machine Learning? 17 | - What are the two main categories of Machine Learning? 18 | - What are some examples of Machine Learning? 19 | - How does Machine Learning "work"? 20 | 21 | 2. Setting up Python for Machine Learning: scikit-learn and Jupyter Notebook ([video](https://www.youtube.com/watch?v=IsXXlYVBt1M&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=2), [notebook](02_machine_learning_setup.ipynb)) 22 | - What are the benefits and drawbacks of scikit-learn? 23 | - How do I install scikit-learn? 24 | - How do I use the Jupyter Notebook? 25 | - What are some good resources for learning Python? 26 | 27 | 3. Getting started in scikit-learn with the famous iris dataset ([video](https://www.youtube.com/watch?v=hd1W4CyPX58&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=3), [notebook](03_getting_started_with_iris.ipynb)) 28 | - What is the famous iris dataset, and how does it relate to Machine Learning? 29 | - How do we load the iris dataset into scikit-learn? 30 | - How do we describe a dataset using Machine Learning terminology? 
31 | - What are scikit-learn's four key requirements for working with data? 32 | 33 | 4. Training a Machine Learning model with scikit-learn ([video](https://www.youtube.com/watch?v=RlQuVL6-qe8&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=4), [notebook](04_model_training.ipynb)) 34 | - What is the K-nearest neighbors classification model? 35 | - What are the four steps for model training and prediction in scikit-learn? 36 | - How can I apply this pattern to other Machine Learning models? 37 | 38 | 5. Comparing Machine Learning models in scikit-learn ([video](https://www.youtube.com/watch?v=0pP4EwWJgIU&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=5), [notebook](05_model_evaluation.ipynb)) 39 | - How do I choose which model to use for my supervised learning task? 40 | - How do I choose the best tuning parameters for that model? 41 | - How do I estimate the likely performance of my model on out-of-sample data? 42 | 43 | 6. Data science pipeline: pandas, seaborn, scikit-learn ([video](https://www.youtube.com/watch?v=3ZWuPVWq7p4&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=6), [notebook](06_linear_regression.ipynb)) 44 | - How do I use the pandas library to read data into Python? 45 | - How do I use the seaborn library to visualize data? 46 | - What is linear regression, and how does it work? 47 | - How do I train and interpret a linear regression model in scikit-learn? 48 | - What are some evaluation metrics for regression problems? 49 | - How do I choose which features to include in my model? 50 | 51 | 7. Cross-validation for parameter tuning, model selection, and feature selection ([video](https://www.youtube.com/watch?v=6dbrR-WymjI&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=7), [notebook](07_cross_validation.ipynb)) 52 | - What is the drawback of using the train/test split procedure for model evaluation? 53 | - How does K-fold cross-validation overcome this limitation? 
54 | - How can cross-validation be used for selecting tuning parameters, choosing between models, and selecting features? 55 | - What are some possible improvements to cross-validation? 56 | 57 | 8. Efficiently searching for optimal tuning parameters ([video](https://www.youtube.com/watch?v=Gol_qOgRqfA&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=8), [notebook](08_grid_search.ipynb)) 58 | - How can K-fold cross-validation be used to search for an optimal tuning parameter? 59 | - How can this process be made more efficient? 60 | - How do you search for multiple tuning parameters at once? 61 | - What do you do with those tuning parameters before making real predictions? 62 | - How can the computational expense of this process be reduced? 63 | 64 | 9. Evaluating a classification model ([video](https://www.youtube.com/watch?v=85dtiMz9tSo&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=9), [notebook](09_classification_metrics.ipynb)) 65 | - What is the purpose of model evaluation, and what are some common evaluation procedures? 66 | - What is the usage of classification accuracy, and what are its limitations? 67 | - How does a confusion matrix describe the performance of a classifier? 68 | - What metrics can be computed from a confusion matrix? 69 | - How can you adjust classifier performance by changing the classification threshold? 70 | - What is the purpose of an ROC curve? 71 | - How does Area Under the Curve (AUC) differ from classification accuracy? 72 | 73 | 10. Building a Machine Learning workflow ([video](https://www.youtube.com/watch?v=irHhDMbw3xo&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=10), [notebook](10_categorical_features.ipynb)) 74 | - Why should you use a Pipeline? 75 | - How do you encode categorical features with OneHotEncoder? 76 | - How do you apply OneHotEncoder to selected columns with ColumnTransformer? 77 | - How do you build and cross-validate a Pipeline? 78 | - How do you make predictions on new data using a Pipeline? 
79 | - Why should you use scikit-learn (rather than pandas) for preprocessing? 80 | 81 | ## Bonus Video 82 | 83 | At the PyCon 2016 conference, I taught a **3-hour tutorial** that builds upon this video series and focuses on **text-based data**. You can watch the [tutorial video](https://www.youtube.com/watch?v=ZiKMIuYidY0&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=11) on YouTube. 84 | 85 | Here are the topics I covered: 86 | 87 | 1. Model building in scikit-learn (refresher) 88 | 2. Representing text as numerical data 89 | 3. Reading a text-based dataset into pandas 90 | 4. Vectorizing our dataset 91 | 5. Building and evaluating a model 92 | 6. Comparing models 93 | 7. Examining a model for further insight 94 | 8. Practicing this workflow on another dataset 95 | 9. Tuning the vectorizer (discussion) 96 | 97 | Visit this [GitHub repository](https://github.com/justmarkham/pycon-2016-tutorial) to access the tutorial notebooks and many other recommended resources. 98 | -------------------------------------------------------------------------------- /data/Advertising.csv: -------------------------------------------------------------------------------- 1 | ,TV,Radio,Newspaper,Sales 2 | 1,230.1,37.8,69.2,22.1 3 | 2,44.5,39.3,45.1,10.4 4 | 3,17.2,45.9,69.3,9.3 5 | 4,151.5,41.3,58.5,18.5 6 | 5,180.8,10.8,58.4,12.9 7 | 6,8.7,48.9,75,7.2 8 | 7,57.5,32.8,23.5,11.8 9 | 8,120.2,19.6,11.6,13.2 10 | 9,8.6,2.1,1,4.8 11 | 10,199.8,2.6,21.2,10.6 12 | 11,66.1,5.8,24.2,8.6 13 | 12,214.7,24,4,17.4 14 | 13,23.8,35.1,65.9,9.2 15 | 14,97.5,7.6,7.2,9.7 16 | 15,204.1,32.9,46,19 17 | 16,195.4,47.7,52.9,22.4 18 | 17,67.8,36.6,114,12.5 19 | 18,281.4,39.6,55.8,24.4 20 | 19,69.2,20.5,18.3,11.3 21 | 20,147.3,23.9,19.1,14.6 22 | 21,218.4,27.7,53.4,18 23 | 22,237.4,5.1,23.5,12.5 24 | 23,13.2,15.9,49.6,5.6 25 | 24,228.3,16.9,26.2,15.5 26 | 25,62.3,12.6,18.3,9.7 27 | 26,262.9,3.5,19.5,12 28 | 27,142.9,29.3,12.6,15 29 | 28,240.1,16.7,22.9,15.9 30 | 29,248.8,27.1,22.9,18.9 31 | 30,70.6,16,40.8,10.5 
32 | 31,292.9,28.3,43.2,21.4 33 | 32,112.9,17.4,38.6,11.9 34 | 33,97.2,1.5,30,9.6 35 | 34,265.6,20,0.3,17.4 36 | 35,95.7,1.4,7.4,9.5 37 | 36,290.7,4.1,8.5,12.8 38 | 37,266.9,43.8,5,25.4 39 | 38,74.7,49.4,45.7,14.7 40 | 39,43.1,26.7,35.1,10.1 41 | 40,228,37.7,32,21.5 42 | 41,202.5,22.3,31.6,16.6 43 | 42,177,33.4,38.7,17.1 44 | 43,293.6,27.7,1.8,20.7 45 | 44,206.9,8.4,26.4,12.9 46 | 45,25.1,25.7,43.3,8.5 47 | 46,175.1,22.5,31.5,14.9 48 | 47,89.7,9.9,35.7,10.6 49 | 48,239.9,41.5,18.5,23.2 50 | 49,227.2,15.8,49.9,14.8 51 | 50,66.9,11.7,36.8,9.7 52 | 51,199.8,3.1,34.6,11.4 53 | 52,100.4,9.6,3.6,10.7 54 | 53,216.4,41.7,39.6,22.6 55 | 54,182.6,46.2,58.7,21.2 56 | 55,262.7,28.8,15.9,20.2 57 | 56,198.9,49.4,60,23.7 58 | 57,7.3,28.1,41.4,5.5 59 | 58,136.2,19.2,16.6,13.2 60 | 59,210.8,49.6,37.7,23.8 61 | 60,210.7,29.5,9.3,18.4 62 | 61,53.5,2,21.4,8.1 63 | 62,261.3,42.7,54.7,24.2 64 | 63,239.3,15.5,27.3,15.7 65 | 64,102.7,29.6,8.4,14 66 | 65,131.1,42.8,28.9,18 67 | 66,69,9.3,0.9,9.3 68 | 67,31.5,24.6,2.2,9.5 69 | 68,139.3,14.5,10.2,13.4 70 | 69,237.4,27.5,11,18.9 71 | 70,216.8,43.9,27.2,22.3 72 | 71,199.1,30.6,38.7,18.3 73 | 72,109.8,14.3,31.7,12.4 74 | 73,26.8,33,19.3,8.8 75 | 74,129.4,5.7,31.3,11 76 | 75,213.4,24.6,13.1,17 77 | 76,16.9,43.7,89.4,8.7 78 | 77,27.5,1.6,20.7,6.9 79 | 78,120.5,28.5,14.2,14.2 80 | 79,5.4,29.9,9.4,5.3 81 | 80,116,7.7,23.1,11 82 | 81,76.4,26.7,22.3,11.8 83 | 82,239.8,4.1,36.9,12.3 84 | 83,75.3,20.3,32.5,11.3 85 | 84,68.4,44.5,35.6,13.6 86 | 85,213.5,43,33.8,21.7 87 | 86,193.2,18.4,65.7,15.2 88 | 87,76.3,27.5,16,12 89 | 88,110.7,40.6,63.2,16 90 | 89,88.3,25.5,73.4,12.9 91 | 90,109.8,47.8,51.4,16.7 92 | 91,134.3,4.9,9.3,11.2 93 | 92,28.6,1.5,33,7.3 94 | 93,217.7,33.5,59,19.4 95 | 94,250.9,36.5,72.3,22.2 96 | 95,107.4,14,10.9,11.5 97 | 96,163.3,31.6,52.9,16.9 98 | 97,197.6,3.5,5.9,11.7 99 | 98,184.9,21,22,15.5 100 | 99,289.7,42.3,51.2,25.4 101 | 100,135.2,41.7,45.9,17.2 102 | 101,222.4,4.3,49.8,11.7 103 | 102,296.4,36.3,100.9,23.8 104 | 
103,280.2,10.1,21.4,14.8 105 | 104,187.9,17.2,17.9,14.7 106 | 105,238.2,34.3,5.3,20.7 107 | 106,137.9,46.4,59,19.2 108 | 107,25,11,29.7,7.2 109 | 108,90.4,0.3,23.2,8.7 110 | 109,13.1,0.4,25.6,5.3 111 | 110,255.4,26.9,5.5,19.8 112 | 111,225.8,8.2,56.5,13.4 113 | 112,241.7,38,23.2,21.8 114 | 113,175.7,15.4,2.4,14.1 115 | 114,209.6,20.6,10.7,15.9 116 | 115,78.2,46.8,34.5,14.6 117 | 116,75.1,35,52.7,12.6 118 | 117,139.2,14.3,25.6,12.2 119 | 118,76.4,0.8,14.8,9.4 120 | 119,125.7,36.9,79.2,15.9 121 | 120,19.4,16,22.3,6.6 122 | 121,141.3,26.8,46.2,15.5 123 | 122,18.8,21.7,50.4,7 124 | 123,224,2.4,15.6,11.6 125 | 124,123.1,34.6,12.4,15.2 126 | 125,229.5,32.3,74.2,19.7 127 | 126,87.2,11.8,25.9,10.6 128 | 127,7.8,38.9,50.6,6.6 129 | 128,80.2,0,9.2,8.8 130 | 129,220.3,49,3.2,24.7 131 | 130,59.6,12,43.1,9.7 132 | 131,0.7,39.6,8.7,1.6 133 | 132,265.2,2.9,43,12.7 134 | 133,8.4,27.2,2.1,5.7 135 | 134,219.8,33.5,45.1,19.6 136 | 135,36.9,38.6,65.6,10.8 137 | 136,48.3,47,8.5,11.6 138 | 137,25.6,39,9.3,9.5 139 | 138,273.7,28.9,59.7,20.8 140 | 139,43,25.9,20.5,9.6 141 | 140,184.9,43.9,1.7,20.7 142 | 141,73.4,17,12.9,10.9 143 | 142,193.7,35.4,75.6,19.2 144 | 143,220.5,33.2,37.9,20.1 145 | 144,104.6,5.7,34.4,10.4 146 | 145,96.2,14.8,38.9,11.4 147 | 146,140.3,1.9,9,10.3 148 | 147,240.1,7.3,8.7,13.2 149 | 148,243.2,49,44.3,25.4 150 | 149,38,40.3,11.9,10.9 151 | 150,44.7,25.8,20.6,10.1 152 | 151,280.7,13.9,37,16.1 153 | 152,121,8.4,48.7,11.6 154 | 153,197.6,23.3,14.2,16.6 155 | 154,171.3,39.7,37.7,19 156 | 155,187.8,21.1,9.5,15.6 157 | 156,4.1,11.6,5.7,3.2 158 | 157,93.9,43.5,50.5,15.3 159 | 158,149.8,1.3,24.3,10.1 160 | 159,11.7,36.9,45.2,7.3 161 | 160,131.7,18.4,34.6,12.9 162 | 161,172.5,18.1,30.7,14.4 163 | 162,85.7,35.8,49.3,13.3 164 | 163,188.4,18.1,25.6,14.9 165 | 164,163.5,36.8,7.4,18 166 | 165,117.2,14.7,5.4,11.9 167 | 166,234.5,3.4,84.8,11.9 168 | 167,17.9,37.6,21.6,8 169 | 168,206.8,5.2,19.4,12.2 170 | 169,215.4,23.6,57.6,17.1 171 | 170,284.3,10.6,6.4,15 172 | 
171,50,11.6,18.4,8.4 173 | 172,164.5,20.9,47.4,14.5 174 | 173,19.6,20.1,17,7.6 175 | 174,168.4,7.1,12.8,11.7 176 | 175,222.4,3.4,13.1,11.5 177 | 176,276.9,48.9,41.8,27 178 | 177,248.4,30.2,20.3,20.2 179 | 178,170.2,7.8,35.2,11.7 180 | 179,276.7,2.3,23.7,11.8 181 | 180,165.6,10,17.6,12.6 182 | 181,156.6,2.6,8.3,10.5 183 | 182,218.5,5.4,27.4,12.2 184 | 183,56.2,5.7,29.7,8.7 185 | 184,287.6,43,71.8,26.2 186 | 185,253.8,21.3,30,17.6 187 | 186,205,45.1,19.6,22.6 188 | 187,139.5,2.1,26.6,10.3 189 | 188,191.1,28.7,18.2,17.3 190 | 189,286,13.9,3.7,15.9 191 | 190,18.7,12.1,23.4,6.7 192 | 191,39.5,41.1,5.8,10.8 193 | 192,75.5,10.8,6,9.9 194 | 193,17.2,4.1,31.6,5.9 195 | 194,166.8,42,3.6,19.6 196 | 195,149.7,35.6,6,17.3 197 | 196,38.2,3.7,13.8,7.6 198 | 197,94.2,4.9,8.1,9.7 199 | 198,177,9.3,6.4,12.8 200 | 199,283.6,42,66.2,25.5 201 | 200,232.1,8.6,8.7,13.4 202 | -------------------------------------------------------------------------------- /data/pima-indians-diabetes.data: -------------------------------------------------------------------------------- 1 | 6,148,72,35,0,33.6,0.627,50,1 2 | 1,85,66,29,0,26.6,0.351,31,0 3 | 8,183,64,0,0,23.3,0.672,32,1 4 | 1,89,66,23,94,28.1,0.167,21,0 5 | 0,137,40,35,168,43.1,2.288,33,1 6 | 5,116,74,0,0,25.6,0.201,30,0 7 | 3,78,50,32,88,31,0.248,26,1 8 | 10,115,0,0,0,35.3,0.134,29,0 9 | 2,197,70,45,543,30.5,0.158,53,1 10 | 8,125,96,0,0,0,0.232,54,1 11 | 4,110,92,0,0,37.6,0.191,30,0 12 | 10,168,74,0,0,38,0.537,34,1 13 | 10,139,80,0,0,27.1,1.441,57,0 14 | 1,189,60,23,846,30.1,0.398,59,1 15 | 5,166,72,19,175,25.8,0.587,51,1 16 | 7,100,0,0,0,30,0.484,32,1 17 | 0,118,84,47,230,45.8,0.551,31,1 18 | 7,107,74,0,0,29.6,0.254,31,1 19 | 1,103,30,38,83,43.3,0.183,33,0 20 | 1,115,70,30,96,34.6,0.529,32,1 21 | 3,126,88,41,235,39.3,0.704,27,0 22 | 8,99,84,0,0,35.4,0.388,50,0 23 | 7,196,90,0,0,39.8,0.451,41,1 24 | 9,119,80,35,0,29,0.263,29,1 25 | 11,143,94,33,146,36.6,0.254,51,1 26 | 10,125,70,26,115,31.1,0.205,41,1 27 | 7,147,76,0,0,39.4,0.257,43,1 28 | 
1,97,66,15,140,23.2,0.487,22,0 29 | 13,145,82,19,110,22.2,0.245,57,0 30 | 5,117,92,0,0,34.1,0.337,38,0 31 | 5,109,75,26,0,36,0.546,60,0 32 | 3,158,76,36,245,31.6,0.851,28,1 33 | 3,88,58,11,54,24.8,0.267,22,0 34 | 6,92,92,0,0,19.9,0.188,28,0 35 | 10,122,78,31,0,27.6,0.512,45,0 36 | 4,103,60,33,192,24,0.966,33,0 37 | 11,138,76,0,0,33.2,0.42,35,0 38 | 9,102,76,37,0,32.9,0.665,46,1 39 | 2,90,68,42,0,38.2,0.503,27,1 40 | 4,111,72,47,207,37.1,1.39,56,1 41 | 3,180,64,25,70,34,0.271,26,0 42 | 7,133,84,0,0,40.2,0.696,37,0 43 | 7,106,92,18,0,22.7,0.235,48,0 44 | 9,171,110,24,240,45.4,0.721,54,1 45 | 7,159,64,0,0,27.4,0.294,40,0 46 | 0,180,66,39,0,42,1.893,25,1 47 | 1,146,56,0,0,29.7,0.564,29,0 48 | 2,71,70,27,0,28,0.586,22,0 49 | 7,103,66,32,0,39.1,0.344,31,1 50 | 7,105,0,0,0,0,0.305,24,0 51 | 1,103,80,11,82,19.4,0.491,22,0 52 | 1,101,50,15,36,24.2,0.526,26,0 53 | 5,88,66,21,23,24.4,0.342,30,0 54 | 8,176,90,34,300,33.7,0.467,58,1 55 | 7,150,66,42,342,34.7,0.718,42,0 56 | 1,73,50,10,0,23,0.248,21,0 57 | 7,187,68,39,304,37.7,0.254,41,1 58 | 0,100,88,60,110,46.8,0.962,31,0 59 | 0,146,82,0,0,40.5,1.781,44,0 60 | 0,105,64,41,142,41.5,0.173,22,0 61 | 2,84,0,0,0,0,0.304,21,0 62 | 8,133,72,0,0,32.9,0.27,39,1 63 | 5,44,62,0,0,25,0.587,36,0 64 | 2,141,58,34,128,25.4,0.699,24,0 65 | 7,114,66,0,0,32.8,0.258,42,1 66 | 5,99,74,27,0,29,0.203,32,0 67 | 0,109,88,30,0,32.5,0.855,38,1 68 | 2,109,92,0,0,42.7,0.845,54,0 69 | 1,95,66,13,38,19.6,0.334,25,0 70 | 4,146,85,27,100,28.9,0.189,27,0 71 | 2,100,66,20,90,32.9,0.867,28,1 72 | 5,139,64,35,140,28.6,0.411,26,0 73 | 13,126,90,0,0,43.4,0.583,42,1 74 | 4,129,86,20,270,35.1,0.231,23,0 75 | 1,79,75,30,0,32,0.396,22,0 76 | 1,0,48,20,0,24.7,0.14,22,0 77 | 7,62,78,0,0,32.6,0.391,41,0 78 | 5,95,72,33,0,37.7,0.37,27,0 79 | 0,131,0,0,0,43.2,0.27,26,1 80 | 2,112,66,22,0,25,0.307,24,0 81 | 3,113,44,13,0,22.4,0.14,22,0 82 | 2,74,0,0,0,0,0.102,22,0 83 | 7,83,78,26,71,29.3,0.767,36,0 84 | 0,101,65,28,0,24.6,0.237,22,0 85 | 5,137,108,0,0,48.8,0.227,37,1 86 | 
2,110,74,29,125,32.4,0.698,27,0 87 | 13,106,72,54,0,36.6,0.178,45,0 88 | 2,100,68,25,71,38.5,0.324,26,0 89 | 15,136,70,32,110,37.1,0.153,43,1 90 | 1,107,68,19,0,26.5,0.165,24,0 91 | 1,80,55,0,0,19.1,0.258,21,0 92 | 4,123,80,15,176,32,0.443,34,0 93 | 7,81,78,40,48,46.7,0.261,42,0 94 | 4,134,72,0,0,23.8,0.277,60,1 95 | 2,142,82,18,64,24.7,0.761,21,0 96 | 6,144,72,27,228,33.9,0.255,40,0 97 | 2,92,62,28,0,31.6,0.13,24,0 98 | 1,71,48,18,76,20.4,0.323,22,0 99 | 6,93,50,30,64,28.7,0.356,23,0 100 | 1,122,90,51,220,49.7,0.325,31,1 101 | 1,163,72,0,0,39,1.222,33,1 102 | 1,151,60,0,0,26.1,0.179,22,0 103 | 0,125,96,0,0,22.5,0.262,21,0 104 | 1,81,72,18,40,26.6,0.283,24,0 105 | 2,85,65,0,0,39.6,0.93,27,0 106 | 1,126,56,29,152,28.7,0.801,21,0 107 | 1,96,122,0,0,22.4,0.207,27,0 108 | 4,144,58,28,140,29.5,0.287,37,0 109 | 3,83,58,31,18,34.3,0.336,25,0 110 | 0,95,85,25,36,37.4,0.247,24,1 111 | 3,171,72,33,135,33.3,0.199,24,1 112 | 8,155,62,26,495,34,0.543,46,1 113 | 1,89,76,34,37,31.2,0.192,23,0 114 | 4,76,62,0,0,34,0.391,25,0 115 | 7,160,54,32,175,30.5,0.588,39,1 116 | 4,146,92,0,0,31.2,0.539,61,1 117 | 5,124,74,0,0,34,0.22,38,1 118 | 5,78,48,0,0,33.7,0.654,25,0 119 | 4,97,60,23,0,28.2,0.443,22,0 120 | 4,99,76,15,51,23.2,0.223,21,0 121 | 0,162,76,56,100,53.2,0.759,25,1 122 | 6,111,64,39,0,34.2,0.26,24,0 123 | 2,107,74,30,100,33.6,0.404,23,0 124 | 5,132,80,0,0,26.8,0.186,69,0 125 | 0,113,76,0,0,33.3,0.278,23,1 126 | 1,88,30,42,99,55,0.496,26,1 127 | 3,120,70,30,135,42.9,0.452,30,0 128 | 1,118,58,36,94,33.3,0.261,23,0 129 | 1,117,88,24,145,34.5,0.403,40,1 130 | 0,105,84,0,0,27.9,0.741,62,1 131 | 4,173,70,14,168,29.7,0.361,33,1 132 | 9,122,56,0,0,33.3,1.114,33,1 133 | 3,170,64,37,225,34.5,0.356,30,1 134 | 8,84,74,31,0,38.3,0.457,39,0 135 | 2,96,68,13,49,21.1,0.647,26,0 136 | 2,125,60,20,140,33.8,0.088,31,0 137 | 0,100,70,26,50,30.8,0.597,21,0 138 | 0,93,60,25,92,28.7,0.532,22,0 139 | 0,129,80,0,0,31.2,0.703,29,0 140 | 5,105,72,29,325,36.9,0.159,28,0 141 | 3,128,78,0,0,21.1,0.268,55,0 
142 | 5,106,82,30,0,39.5,0.286,38,0 143 | 2,108,52,26,63,32.5,0.318,22,0 144 | 10,108,66,0,0,32.4,0.272,42,1 145 | 4,154,62,31,284,32.8,0.237,23,0 146 | 0,102,75,23,0,0,0.572,21,0 147 | 9,57,80,37,0,32.8,0.096,41,0 148 | 2,106,64,35,119,30.5,1.4,34,0 149 | 5,147,78,0,0,33.7,0.218,65,0 150 | 2,90,70,17,0,27.3,0.085,22,0 151 | 1,136,74,50,204,37.4,0.399,24,0 152 | 4,114,65,0,0,21.9,0.432,37,0 153 | 9,156,86,28,155,34.3,1.189,42,1 154 | 1,153,82,42,485,40.6,0.687,23,0 155 | 8,188,78,0,0,47.9,0.137,43,1 156 | 7,152,88,44,0,50,0.337,36,1 157 | 2,99,52,15,94,24.6,0.637,21,0 158 | 1,109,56,21,135,25.2,0.833,23,0 159 | 2,88,74,19,53,29,0.229,22,0 160 | 17,163,72,41,114,40.9,0.817,47,1 161 | 4,151,90,38,0,29.7,0.294,36,0 162 | 7,102,74,40,105,37.2,0.204,45,0 163 | 0,114,80,34,285,44.2,0.167,27,0 164 | 2,100,64,23,0,29.7,0.368,21,0 165 | 0,131,88,0,0,31.6,0.743,32,1 166 | 6,104,74,18,156,29.9,0.722,41,1 167 | 3,148,66,25,0,32.5,0.256,22,0 168 | 4,120,68,0,0,29.6,0.709,34,0 169 | 4,110,66,0,0,31.9,0.471,29,0 170 | 3,111,90,12,78,28.4,0.495,29,0 171 | 6,102,82,0,0,30.8,0.18,36,1 172 | 6,134,70,23,130,35.4,0.542,29,1 173 | 2,87,0,23,0,28.9,0.773,25,0 174 | 1,79,60,42,48,43.5,0.678,23,0 175 | 2,75,64,24,55,29.7,0.37,33,0 176 | 8,179,72,42,130,32.7,0.719,36,1 177 | 6,85,78,0,0,31.2,0.382,42,0 178 | 0,129,110,46,130,67.1,0.319,26,1 179 | 5,143,78,0,0,45,0.19,47,0 180 | 5,130,82,0,0,39.1,0.956,37,1 181 | 6,87,80,0,0,23.2,0.084,32,0 182 | 0,119,64,18,92,34.9,0.725,23,0 183 | 1,0,74,20,23,27.7,0.299,21,0 184 | 5,73,60,0,0,26.8,0.268,27,0 185 | 4,141,74,0,0,27.6,0.244,40,0 186 | 7,194,68,28,0,35.9,0.745,41,1 187 | 8,181,68,36,495,30.1,0.615,60,1 188 | 1,128,98,41,58,32,1.321,33,1 189 | 8,109,76,39,114,27.9,0.64,31,1 190 | 5,139,80,35,160,31.6,0.361,25,1 191 | 3,111,62,0,0,22.6,0.142,21,0 192 | 9,123,70,44,94,33.1,0.374,40,0 193 | 7,159,66,0,0,30.4,0.383,36,1 194 | 11,135,0,0,0,52.3,0.578,40,1 195 | 8,85,55,20,0,24.4,0.136,42,0 196 | 5,158,84,41,210,39.4,0.395,29,1 197 | 
1,105,58,0,0,24.3,0.187,21,0 198 | 3,107,62,13,48,22.9,0.678,23,1 199 | 4,109,64,44,99,34.8,0.905,26,1 200 | 4,148,60,27,318,30.9,0.15,29,1 201 | 0,113,80,16,0,31,0.874,21,0 202 | 1,138,82,0,0,40.1,0.236,28,0 203 | 0,108,68,20,0,27.3,0.787,32,0 204 | 2,99,70,16,44,20.4,0.235,27,0 205 | 6,103,72,32,190,37.7,0.324,55,0 206 | 5,111,72,28,0,23.9,0.407,27,0 207 | 8,196,76,29,280,37.5,0.605,57,1 208 | 5,162,104,0,0,37.7,0.151,52,1 209 | 1,96,64,27,87,33.2,0.289,21,0 210 | 7,184,84,33,0,35.5,0.355,41,1 211 | 2,81,60,22,0,27.7,0.29,25,0 212 | 0,147,85,54,0,42.8,0.375,24,0 213 | 7,179,95,31,0,34.2,0.164,60,0 214 | 0,140,65,26,130,42.6,0.431,24,1 215 | 9,112,82,32,175,34.2,0.26,36,1 216 | 12,151,70,40,271,41.8,0.742,38,1 217 | 5,109,62,41,129,35.8,0.514,25,1 218 | 6,125,68,30,120,30,0.464,32,0 219 | 5,85,74,22,0,29,1.224,32,1 220 | 5,112,66,0,0,37.8,0.261,41,1 221 | 0,177,60,29,478,34.6,1.072,21,1 222 | 2,158,90,0,0,31.6,0.805,66,1 223 | 7,119,0,0,0,25.2,0.209,37,0 224 | 7,142,60,33,190,28.8,0.687,61,0 225 | 1,100,66,15,56,23.6,0.666,26,0 226 | 1,87,78,27,32,34.6,0.101,22,0 227 | 0,101,76,0,0,35.7,0.198,26,0 228 | 3,162,52,38,0,37.2,0.652,24,1 229 | 4,197,70,39,744,36.7,2.329,31,0 230 | 0,117,80,31,53,45.2,0.089,24,0 231 | 4,142,86,0,0,44,0.645,22,1 232 | 6,134,80,37,370,46.2,0.238,46,1 233 | 1,79,80,25,37,25.4,0.583,22,0 234 | 4,122,68,0,0,35,0.394,29,0 235 | 3,74,68,28,45,29.7,0.293,23,0 236 | 4,171,72,0,0,43.6,0.479,26,1 237 | 7,181,84,21,192,35.9,0.586,51,1 238 | 0,179,90,27,0,44.1,0.686,23,1 239 | 9,164,84,21,0,30.8,0.831,32,1 240 | 0,104,76,0,0,18.4,0.582,27,0 241 | 1,91,64,24,0,29.2,0.192,21,0 242 | 4,91,70,32,88,33.1,0.446,22,0 243 | 3,139,54,0,0,25.6,0.402,22,1 244 | 6,119,50,22,176,27.1,1.318,33,1 245 | 2,146,76,35,194,38.2,0.329,29,0 246 | 9,184,85,15,0,30,1.213,49,1 247 | 10,122,68,0,0,31.2,0.258,41,0 248 | 0,165,90,33,680,52.3,0.427,23,0 249 | 9,124,70,33,402,35.4,0.282,34,0 250 | 1,111,86,19,0,30.1,0.143,23,0 251 | 9,106,52,0,0,31.2,0.38,42,0 252 | 
2,129,84,0,0,28,0.284,27,0 253 | 2,90,80,14,55,24.4,0.249,24,0 254 | 0,86,68,32,0,35.8,0.238,25,0 255 | 12,92,62,7,258,27.6,0.926,44,1 256 | 1,113,64,35,0,33.6,0.543,21,1 257 | 3,111,56,39,0,30.1,0.557,30,0 258 | 2,114,68,22,0,28.7,0.092,25,0 259 | 1,193,50,16,375,25.9,0.655,24,0 260 | 11,155,76,28,150,33.3,1.353,51,1 261 | 3,191,68,15,130,30.9,0.299,34,0 262 | 3,141,0,0,0,30,0.761,27,1 263 | 4,95,70,32,0,32.1,0.612,24,0 264 | 3,142,80,15,0,32.4,0.2,63,0 265 | 4,123,62,0,0,32,0.226,35,1 266 | 5,96,74,18,67,33.6,0.997,43,0 267 | 0,138,0,0,0,36.3,0.933,25,1 268 | 2,128,64,42,0,40,1.101,24,0 269 | 0,102,52,0,0,25.1,0.078,21,0 270 | 2,146,0,0,0,27.5,0.24,28,1 271 | 10,101,86,37,0,45.6,1.136,38,1 272 | 2,108,62,32,56,25.2,0.128,21,0 273 | 3,122,78,0,0,23,0.254,40,0 274 | 1,71,78,50,45,33.2,0.422,21,0 275 | 13,106,70,0,0,34.2,0.251,52,0 276 | 2,100,70,52,57,40.5,0.677,25,0 277 | 7,106,60,24,0,26.5,0.296,29,1 278 | 0,104,64,23,116,27.8,0.454,23,0 279 | 5,114,74,0,0,24.9,0.744,57,0 280 | 2,108,62,10,278,25.3,0.881,22,0 281 | 0,146,70,0,0,37.9,0.334,28,1 282 | 10,129,76,28,122,35.9,0.28,39,0 283 | 7,133,88,15,155,32.4,0.262,37,0 284 | 7,161,86,0,0,30.4,0.165,47,1 285 | 2,108,80,0,0,27,0.259,52,1 286 | 7,136,74,26,135,26,0.647,51,0 287 | 5,155,84,44,545,38.7,0.619,34,0 288 | 1,119,86,39,220,45.6,0.808,29,1 289 | 4,96,56,17,49,20.8,0.34,26,0 290 | 5,108,72,43,75,36.1,0.263,33,0 291 | 0,78,88,29,40,36.9,0.434,21,0 292 | 0,107,62,30,74,36.6,0.757,25,1 293 | 2,128,78,37,182,43.3,1.224,31,1 294 | 1,128,48,45,194,40.5,0.613,24,1 295 | 0,161,50,0,0,21.9,0.254,65,0 296 | 6,151,62,31,120,35.5,0.692,28,0 297 | 2,146,70,38,360,28,0.337,29,1 298 | 0,126,84,29,215,30.7,0.52,24,0 299 | 14,100,78,25,184,36.6,0.412,46,1 300 | 8,112,72,0,0,23.6,0.84,58,0 301 | 0,167,0,0,0,32.3,0.839,30,1 302 | 2,144,58,33,135,31.6,0.422,25,1 303 | 5,77,82,41,42,35.8,0.156,35,0 304 | 5,115,98,0,0,52.9,0.209,28,1 305 | 3,150,76,0,0,21,0.207,37,0 306 | 2,120,76,37,105,39.7,0.215,29,0 307 | 
10,161,68,23,132,25.5,0.326,47,1 308 | 0,137,68,14,148,24.8,0.143,21,0 309 | 0,128,68,19,180,30.5,1.391,25,1 310 | 2,124,68,28,205,32.9,0.875,30,1 311 | 6,80,66,30,0,26.2,0.313,41,0 312 | 0,106,70,37,148,39.4,0.605,22,0 313 | 2,155,74,17,96,26.6,0.433,27,1 314 | 3,113,50,10,85,29.5,0.626,25,0 315 | 7,109,80,31,0,35.9,1.127,43,1 316 | 2,112,68,22,94,34.1,0.315,26,0 317 | 3,99,80,11,64,19.3,0.284,30,0 318 | 3,182,74,0,0,30.5,0.345,29,1 319 | 3,115,66,39,140,38.1,0.15,28,0 320 | 6,194,78,0,0,23.5,0.129,59,1 321 | 4,129,60,12,231,27.5,0.527,31,0 322 | 3,112,74,30,0,31.6,0.197,25,1 323 | 0,124,70,20,0,27.4,0.254,36,1 324 | 13,152,90,33,29,26.8,0.731,43,1 325 | 2,112,75,32,0,35.7,0.148,21,0 326 | 1,157,72,21,168,25.6,0.123,24,0 327 | 1,122,64,32,156,35.1,0.692,30,1 328 | 10,179,70,0,0,35.1,0.2,37,0 329 | 2,102,86,36,120,45.5,0.127,23,1 330 | 6,105,70,32,68,30.8,0.122,37,0 331 | 8,118,72,19,0,23.1,1.476,46,0 332 | 2,87,58,16,52,32.7,0.166,25,0 333 | 1,180,0,0,0,43.3,0.282,41,1 334 | 12,106,80,0,0,23.6,0.137,44,0 335 | 1,95,60,18,58,23.9,0.26,22,0 336 | 0,165,76,43,255,47.9,0.259,26,0 337 | 0,117,0,0,0,33.8,0.932,44,0 338 | 5,115,76,0,0,31.2,0.343,44,1 339 | 9,152,78,34,171,34.2,0.893,33,1 340 | 7,178,84,0,0,39.9,0.331,41,1 341 | 1,130,70,13,105,25.9,0.472,22,0 342 | 1,95,74,21,73,25.9,0.673,36,0 343 | 1,0,68,35,0,32,0.389,22,0 344 | 5,122,86,0,0,34.7,0.29,33,0 345 | 8,95,72,0,0,36.8,0.485,57,0 346 | 8,126,88,36,108,38.5,0.349,49,0 347 | 1,139,46,19,83,28.7,0.654,22,0 348 | 3,116,0,0,0,23.5,0.187,23,0 349 | 3,99,62,19,74,21.8,0.279,26,0 350 | 5,0,80,32,0,41,0.346,37,1 351 | 4,92,80,0,0,42.2,0.237,29,0 352 | 4,137,84,0,0,31.2,0.252,30,0 353 | 3,61,82,28,0,34.4,0.243,46,0 354 | 1,90,62,12,43,27.2,0.58,24,0 355 | 3,90,78,0,0,42.7,0.559,21,0 356 | 9,165,88,0,0,30.4,0.302,49,1 357 | 1,125,50,40,167,33.3,0.962,28,1 358 | 13,129,0,30,0,39.9,0.569,44,1 359 | 12,88,74,40,54,35.3,0.378,48,0 360 | 1,196,76,36,249,36.5,0.875,29,1 361 | 5,189,64,33,325,31.2,0.583,29,1 362 | 
5,158,70,0,0,29.8,0.207,63,0 363 | 5,103,108,37,0,39.2,0.305,65,0 364 | 4,146,78,0,0,38.5,0.52,67,1 365 | 4,147,74,25,293,34.9,0.385,30,0 366 | 5,99,54,28,83,34,0.499,30,0 367 | 6,124,72,0,0,27.6,0.368,29,1 368 | 0,101,64,17,0,21,0.252,21,0 369 | 3,81,86,16,66,27.5,0.306,22,0 370 | 1,133,102,28,140,32.8,0.234,45,1 371 | 3,173,82,48,465,38.4,2.137,25,1 372 | 0,118,64,23,89,0,1.731,21,0 373 | 0,84,64,22,66,35.8,0.545,21,0 374 | 2,105,58,40,94,34.9,0.225,25,0 375 | 2,122,52,43,158,36.2,0.816,28,0 376 | 12,140,82,43,325,39.2,0.528,58,1 377 | 0,98,82,15,84,25.2,0.299,22,0 378 | 1,87,60,37,75,37.2,0.509,22,0 379 | 4,156,75,0,0,48.3,0.238,32,1 380 | 0,93,100,39,72,43.4,1.021,35,0 381 | 1,107,72,30,82,30.8,0.821,24,0 382 | 0,105,68,22,0,20,0.236,22,0 383 | 1,109,60,8,182,25.4,0.947,21,0 384 | 1,90,62,18,59,25.1,1.268,25,0 385 | 1,125,70,24,110,24.3,0.221,25,0 386 | 1,119,54,13,50,22.3,0.205,24,0 387 | 5,116,74,29,0,32.3,0.66,35,1 388 | 8,105,100,36,0,43.3,0.239,45,1 389 | 5,144,82,26,285,32,0.452,58,1 390 | 3,100,68,23,81,31.6,0.949,28,0 391 | 1,100,66,29,196,32,0.444,42,0 392 | 5,166,76,0,0,45.7,0.34,27,1 393 | 1,131,64,14,415,23.7,0.389,21,0 394 | 4,116,72,12,87,22.1,0.463,37,0 395 | 4,158,78,0,0,32.9,0.803,31,1 396 | 2,127,58,24,275,27.7,1.6,25,0 397 | 3,96,56,34,115,24.7,0.944,39,0 398 | 0,131,66,40,0,34.3,0.196,22,1 399 | 3,82,70,0,0,21.1,0.389,25,0 400 | 3,193,70,31,0,34.9,0.241,25,1 401 | 4,95,64,0,0,32,0.161,31,1 402 | 6,137,61,0,0,24.2,0.151,55,0 403 | 5,136,84,41,88,35,0.286,35,1 404 | 9,72,78,25,0,31.6,0.28,38,0 405 | 5,168,64,0,0,32.9,0.135,41,1 406 | 2,123,48,32,165,42.1,0.52,26,0 407 | 4,115,72,0,0,28.9,0.376,46,1 408 | 0,101,62,0,0,21.9,0.336,25,0 409 | 8,197,74,0,0,25.9,1.191,39,1 410 | 1,172,68,49,579,42.4,0.702,28,1 411 | 6,102,90,39,0,35.7,0.674,28,0 412 | 1,112,72,30,176,34.4,0.528,25,0 413 | 1,143,84,23,310,42.4,1.076,22,0 414 | 1,143,74,22,61,26.2,0.256,21,0 415 | 0,138,60,35,167,34.6,0.534,21,1 416 | 3,173,84,33,474,35.7,0.258,22,1 417 | 
1,97,68,21,0,27.2,1.095,22,0 418 | 4,144,82,32,0,38.5,0.554,37,1 419 | 1,83,68,0,0,18.2,0.624,27,0 420 | 3,129,64,29,115,26.4,0.219,28,1 421 | 1,119,88,41,170,45.3,0.507,26,0 422 | 2,94,68,18,76,26,0.561,21,0 423 | 0,102,64,46,78,40.6,0.496,21,0 424 | 2,115,64,22,0,30.8,0.421,21,0 425 | 8,151,78,32,210,42.9,0.516,36,1 426 | 4,184,78,39,277,37,0.264,31,1 427 | 0,94,0,0,0,0,0.256,25,0 428 | 1,181,64,30,180,34.1,0.328,38,1 429 | 0,135,94,46,145,40.6,0.284,26,0 430 | 1,95,82,25,180,35,0.233,43,1 431 | 2,99,0,0,0,22.2,0.108,23,0 432 | 3,89,74,16,85,30.4,0.551,38,0 433 | 1,80,74,11,60,30,0.527,22,0 434 | 2,139,75,0,0,25.6,0.167,29,0 435 | 1,90,68,8,0,24.5,1.138,36,0 436 | 0,141,0,0,0,42.4,0.205,29,1 437 | 12,140,85,33,0,37.4,0.244,41,0 438 | 5,147,75,0,0,29.9,0.434,28,0 439 | 1,97,70,15,0,18.2,0.147,21,0 440 | 6,107,88,0,0,36.8,0.727,31,0 441 | 0,189,104,25,0,34.3,0.435,41,1 442 | 2,83,66,23,50,32.2,0.497,22,0 443 | 4,117,64,27,120,33.2,0.23,24,0 444 | 8,108,70,0,0,30.5,0.955,33,1 445 | 4,117,62,12,0,29.7,0.38,30,1 446 | 0,180,78,63,14,59.4,2.42,25,1 447 | 1,100,72,12,70,25.3,0.658,28,0 448 | 0,95,80,45,92,36.5,0.33,26,0 449 | 0,104,64,37,64,33.6,0.51,22,1 450 | 0,120,74,18,63,30.5,0.285,26,0 451 | 1,82,64,13,95,21.2,0.415,23,0 452 | 2,134,70,0,0,28.9,0.542,23,1 453 | 0,91,68,32,210,39.9,0.381,25,0 454 | 2,119,0,0,0,19.6,0.832,72,0 455 | 2,100,54,28,105,37.8,0.498,24,0 456 | 14,175,62,30,0,33.6,0.212,38,1 457 | 1,135,54,0,0,26.7,0.687,62,0 458 | 5,86,68,28,71,30.2,0.364,24,0 459 | 10,148,84,48,237,37.6,1.001,51,1 460 | 9,134,74,33,60,25.9,0.46,81,0 461 | 9,120,72,22,56,20.8,0.733,48,0 462 | 1,71,62,0,0,21.8,0.416,26,0 463 | 8,74,70,40,49,35.3,0.705,39,0 464 | 5,88,78,30,0,27.6,0.258,37,0 465 | 10,115,98,0,0,24,1.022,34,0 466 | 0,124,56,13,105,21.8,0.452,21,0 467 | 0,74,52,10,36,27.8,0.269,22,0 468 | 0,97,64,36,100,36.8,0.6,25,0 469 | 8,120,0,0,0,30,0.183,38,1 470 | 6,154,78,41,140,46.1,0.571,27,0 471 | 1,144,82,40,0,41.3,0.607,28,0 472 | 0,137,70,38,0,33.2,0.17,22,0 473 
| 0,119,66,27,0,38.8,0.259,22,0 474 | 7,136,90,0,0,29.9,0.21,50,0 475 | 4,114,64,0,0,28.9,0.126,24,0 476 | 0,137,84,27,0,27.3,0.231,59,0 477 | 2,105,80,45,191,33.7,0.711,29,1 478 | 7,114,76,17,110,23.8,0.466,31,0 479 | 8,126,74,38,75,25.9,0.162,39,0 480 | 4,132,86,31,0,28,0.419,63,0 481 | 3,158,70,30,328,35.5,0.344,35,1 482 | 0,123,88,37,0,35.2,0.197,29,0 483 | 4,85,58,22,49,27.8,0.306,28,0 484 | 0,84,82,31,125,38.2,0.233,23,0 485 | 0,145,0,0,0,44.2,0.63,31,1 486 | 0,135,68,42,250,42.3,0.365,24,1 487 | 1,139,62,41,480,40.7,0.536,21,0 488 | 0,173,78,32,265,46.5,1.159,58,0 489 | 4,99,72,17,0,25.6,0.294,28,0 490 | 8,194,80,0,0,26.1,0.551,67,0 491 | 2,83,65,28,66,36.8,0.629,24,0 492 | 2,89,90,30,0,33.5,0.292,42,0 493 | 4,99,68,38,0,32.8,0.145,33,0 494 | 4,125,70,18,122,28.9,1.144,45,1 495 | 3,80,0,0,0,0,0.174,22,0 496 | 6,166,74,0,0,26.6,0.304,66,0 497 | 5,110,68,0,0,26,0.292,30,0 498 | 2,81,72,15,76,30.1,0.547,25,0 499 | 7,195,70,33,145,25.1,0.163,55,1 500 | 6,154,74,32,193,29.3,0.839,39,0 501 | 2,117,90,19,71,25.2,0.313,21,0 502 | 3,84,72,32,0,37.2,0.267,28,0 503 | 6,0,68,41,0,39,0.727,41,1 504 | 7,94,64,25,79,33.3,0.738,41,0 505 | 3,96,78,39,0,37.3,0.238,40,0 506 | 10,75,82,0,0,33.3,0.263,38,0 507 | 0,180,90,26,90,36.5,0.314,35,1 508 | 1,130,60,23,170,28.6,0.692,21,0 509 | 2,84,50,23,76,30.4,0.968,21,0 510 | 8,120,78,0,0,25,0.409,64,0 511 | 12,84,72,31,0,29.7,0.297,46,1 512 | 0,139,62,17,210,22.1,0.207,21,0 513 | 9,91,68,0,0,24.2,0.2,58,0 514 | 2,91,62,0,0,27.3,0.525,22,0 515 | 3,99,54,19,86,25.6,0.154,24,0 516 | 3,163,70,18,105,31.6,0.268,28,1 517 | 9,145,88,34,165,30.3,0.771,53,1 518 | 7,125,86,0,0,37.6,0.304,51,0 519 | 13,76,60,0,0,32.8,0.18,41,0 520 | 6,129,90,7,326,19.6,0.582,60,0 521 | 2,68,70,32,66,25,0.187,25,0 522 | 3,124,80,33,130,33.2,0.305,26,0 523 | 6,114,0,0,0,0,0.189,26,0 524 | 9,130,70,0,0,34.2,0.652,45,1 525 | 3,125,58,0,0,31.6,0.151,24,0 526 | 3,87,60,18,0,21.8,0.444,21,0 527 | 1,97,64,19,82,18.2,0.299,21,0 528 | 3,116,74,15,105,26.3,0.107,24,0 529 
| 0,117,66,31,188,30.8,0.493,22,0 530 | 0,111,65,0,0,24.6,0.66,31,0 531 | 2,122,60,18,106,29.8,0.717,22,0 532 | 0,107,76,0,0,45.3,0.686,24,0 533 | 1,86,66,52,65,41.3,0.917,29,0 534 | 6,91,0,0,0,29.8,0.501,31,0 535 | 1,77,56,30,56,33.3,1.251,24,0 536 | 4,132,0,0,0,32.9,0.302,23,1 537 | 0,105,90,0,0,29.6,0.197,46,0 538 | 0,57,60,0,0,21.7,0.735,67,0 539 | 0,127,80,37,210,36.3,0.804,23,0 540 | 3,129,92,49,155,36.4,0.968,32,1 541 | 8,100,74,40,215,39.4,0.661,43,1 542 | 3,128,72,25,190,32.4,0.549,27,1 543 | 10,90,85,32,0,34.9,0.825,56,1 544 | 4,84,90,23,56,39.5,0.159,25,0 545 | 1,88,78,29,76,32,0.365,29,0 546 | 8,186,90,35,225,34.5,0.423,37,1 547 | 5,187,76,27,207,43.6,1.034,53,1 548 | 4,131,68,21,166,33.1,0.16,28,0 549 | 1,164,82,43,67,32.8,0.341,50,0 550 | 4,189,110,31,0,28.5,0.68,37,0 551 | 1,116,70,28,0,27.4,0.204,21,0 552 | 3,84,68,30,106,31.9,0.591,25,0 553 | 6,114,88,0,0,27.8,0.247,66,0 554 | 1,88,62,24,44,29.9,0.422,23,0 555 | 1,84,64,23,115,36.9,0.471,28,0 556 | 7,124,70,33,215,25.5,0.161,37,0 557 | 1,97,70,40,0,38.1,0.218,30,0 558 | 8,110,76,0,0,27.8,0.237,58,0 559 | 11,103,68,40,0,46.2,0.126,42,0 560 | 11,85,74,0,0,30.1,0.3,35,0 561 | 6,125,76,0,0,33.8,0.121,54,1 562 | 0,198,66,32,274,41.3,0.502,28,1 563 | 1,87,68,34,77,37.6,0.401,24,0 564 | 6,99,60,19,54,26.9,0.497,32,0 565 | 0,91,80,0,0,32.4,0.601,27,0 566 | 2,95,54,14,88,26.1,0.748,22,0 567 | 1,99,72,30,18,38.6,0.412,21,0 568 | 6,92,62,32,126,32,0.085,46,0 569 | 4,154,72,29,126,31.3,0.338,37,0 570 | 0,121,66,30,165,34.3,0.203,33,1 571 | 3,78,70,0,0,32.5,0.27,39,0 572 | 2,130,96,0,0,22.6,0.268,21,0 573 | 3,111,58,31,44,29.5,0.43,22,0 574 | 2,98,60,17,120,34.7,0.198,22,0 575 | 1,143,86,30,330,30.1,0.892,23,0 576 | 1,119,44,47,63,35.5,0.28,25,0 577 | 6,108,44,20,130,24,0.813,35,0 578 | 2,118,80,0,0,42.9,0.693,21,1 579 | 10,133,68,0,0,27,0.245,36,0 580 | 2,197,70,99,0,34.7,0.575,62,1 581 | 0,151,90,46,0,42.1,0.371,21,1 582 | 6,109,60,27,0,25,0.206,27,0 583 | 12,121,78,17,0,26.5,0.259,62,0 584 | 
8,100,76,0,0,38.7,0.19,42,0 585 | 8,124,76,24,600,28.7,0.687,52,1 586 | 1,93,56,11,0,22.5,0.417,22,0 587 | 8,143,66,0,0,34.9,0.129,41,1 588 | 6,103,66,0,0,24.3,0.249,29,0 589 | 3,176,86,27,156,33.3,1.154,52,1 590 | 0,73,0,0,0,21.1,0.342,25,0 591 | 11,111,84,40,0,46.8,0.925,45,1 592 | 2,112,78,50,140,39.4,0.175,24,0 593 | 3,132,80,0,0,34.4,0.402,44,1 594 | 2,82,52,22,115,28.5,1.699,25,0 595 | 6,123,72,45,230,33.6,0.733,34,0 596 | 0,188,82,14,185,32,0.682,22,1 597 | 0,67,76,0,0,45.3,0.194,46,0 598 | 1,89,24,19,25,27.8,0.559,21,0 599 | 1,173,74,0,0,36.8,0.088,38,1 600 | 1,109,38,18,120,23.1,0.407,26,0 601 | 1,108,88,19,0,27.1,0.4,24,0 602 | 6,96,0,0,0,23.7,0.19,28,0 603 | 1,124,74,36,0,27.8,0.1,30,0 604 | 7,150,78,29,126,35.2,0.692,54,1 605 | 4,183,0,0,0,28.4,0.212,36,1 606 | 1,124,60,32,0,35.8,0.514,21,0 607 | 1,181,78,42,293,40,1.258,22,1 608 | 1,92,62,25,41,19.5,0.482,25,0 609 | 0,152,82,39,272,41.5,0.27,27,0 610 | 1,111,62,13,182,24,0.138,23,0 611 | 3,106,54,21,158,30.9,0.292,24,0 612 | 3,174,58,22,194,32.9,0.593,36,1 613 | 7,168,88,42,321,38.2,0.787,40,1 614 | 6,105,80,28,0,32.5,0.878,26,0 615 | 11,138,74,26,144,36.1,0.557,50,1 616 | 3,106,72,0,0,25.8,0.207,27,0 617 | 6,117,96,0,0,28.7,0.157,30,0 618 | 2,68,62,13,15,20.1,0.257,23,0 619 | 9,112,82,24,0,28.2,1.282,50,1 620 | 0,119,0,0,0,32.4,0.141,24,1 621 | 2,112,86,42,160,38.4,0.246,28,0 622 | 2,92,76,20,0,24.2,1.698,28,0 623 | 6,183,94,0,0,40.8,1.461,45,0 624 | 0,94,70,27,115,43.5,0.347,21,0 625 | 2,108,64,0,0,30.8,0.158,21,0 626 | 4,90,88,47,54,37.7,0.362,29,0 627 | 0,125,68,0,0,24.7,0.206,21,0 628 | 0,132,78,0,0,32.4,0.393,21,0 629 | 5,128,80,0,0,34.6,0.144,45,0 630 | 4,94,65,22,0,24.7,0.148,21,0 631 | 7,114,64,0,0,27.4,0.732,34,1 632 | 0,102,78,40,90,34.5,0.238,24,0 633 | 2,111,60,0,0,26.2,0.343,23,0 634 | 1,128,82,17,183,27.5,0.115,22,0 635 | 10,92,62,0,0,25.9,0.167,31,0 636 | 13,104,72,0,0,31.2,0.465,38,1 637 | 5,104,74,0,0,28.8,0.153,48,0 638 | 2,94,76,18,66,31.6,0.649,23,0 639 | 
7,97,76,32,91,40.9,0.871,32,1 640 | 1,100,74,12,46,19.5,0.149,28,0 641 | 0,102,86,17,105,29.3,0.695,27,0 642 | 4,128,70,0,0,34.3,0.303,24,0 643 | 6,147,80,0,0,29.5,0.178,50,1 644 | 4,90,0,0,0,28,0.61,31,0 645 | 3,103,72,30,152,27.6,0.73,27,0 646 | 2,157,74,35,440,39.4,0.134,30,0 647 | 1,167,74,17,144,23.4,0.447,33,1 648 | 0,179,50,36,159,37.8,0.455,22,1 649 | 11,136,84,35,130,28.3,0.26,42,1 650 | 0,107,60,25,0,26.4,0.133,23,0 651 | 1,91,54,25,100,25.2,0.234,23,0 652 | 1,117,60,23,106,33.8,0.466,27,0 653 | 5,123,74,40,77,34.1,0.269,28,0 654 | 2,120,54,0,0,26.8,0.455,27,0 655 | 1,106,70,28,135,34.2,0.142,22,0 656 | 2,155,52,27,540,38.7,0.24,25,1 657 | 2,101,58,35,90,21.8,0.155,22,0 658 | 1,120,80,48,200,38.9,1.162,41,0 659 | 11,127,106,0,0,39,0.19,51,0 660 | 3,80,82,31,70,34.2,1.292,27,1 661 | 10,162,84,0,0,27.7,0.182,54,0 662 | 1,199,76,43,0,42.9,1.394,22,1 663 | 8,167,106,46,231,37.6,0.165,43,1 664 | 9,145,80,46,130,37.9,0.637,40,1 665 | 6,115,60,39,0,33.7,0.245,40,1 666 | 1,112,80,45,132,34.8,0.217,24,0 667 | 4,145,82,18,0,32.5,0.235,70,1 668 | 10,111,70,27,0,27.5,0.141,40,1 669 | 6,98,58,33,190,34,0.43,43,0 670 | 9,154,78,30,100,30.9,0.164,45,0 671 | 6,165,68,26,168,33.6,0.631,49,0 672 | 1,99,58,10,0,25.4,0.551,21,0 673 | 10,68,106,23,49,35.5,0.285,47,0 674 | 3,123,100,35,240,57.3,0.88,22,0 675 | 8,91,82,0,0,35.6,0.587,68,0 676 | 6,195,70,0,0,30.9,0.328,31,1 677 | 9,156,86,0,0,24.8,0.23,53,1 678 | 0,93,60,0,0,35.3,0.263,25,0 679 | 3,121,52,0,0,36,0.127,25,1 680 | 2,101,58,17,265,24.2,0.614,23,0 681 | 2,56,56,28,45,24.2,0.332,22,0 682 | 0,162,76,36,0,49.6,0.364,26,1 683 | 0,95,64,39,105,44.6,0.366,22,0 684 | 4,125,80,0,0,32.3,0.536,27,1 685 | 5,136,82,0,0,0,0.64,69,0 686 | 2,129,74,26,205,33.2,0.591,25,0 687 | 3,130,64,0,0,23.1,0.314,22,0 688 | 1,107,50,19,0,28.3,0.181,29,0 689 | 1,140,74,26,180,24.1,0.828,23,0 690 | 1,144,82,46,180,46.1,0.335,46,1 691 | 8,107,80,0,0,24.6,0.856,34,0 692 | 13,158,114,0,0,42.3,0.257,44,1 693 | 2,121,70,32,95,39.1,0.886,23,0 694 | 
7,129,68,49,125,38.5,0.439,43,1 695 | 2,90,60,0,0,23.5,0.191,25,0 696 | 7,142,90,24,480,30.4,0.128,43,1 697 | 3,169,74,19,125,29.9,0.268,31,1 698 | 0,99,0,0,0,25,0.253,22,0 699 | 4,127,88,11,155,34.5,0.598,28,0 700 | 4,118,70,0,0,44.5,0.904,26,0 701 | 2,122,76,27,200,35.9,0.483,26,0 702 | 6,125,78,31,0,27.6,0.565,49,1 703 | 1,168,88,29,0,35,0.905,52,1 704 | 2,129,0,0,0,38.5,0.304,41,0 705 | 4,110,76,20,100,28.4,0.118,27,0 706 | 6,80,80,36,0,39.8,0.177,28,0 707 | 10,115,0,0,0,0,0.261,30,1 708 | 2,127,46,21,335,34.4,0.176,22,0 709 | 9,164,78,0,0,32.8,0.148,45,1 710 | 2,93,64,32,160,38,0.674,23,1 711 | 3,158,64,13,387,31.2,0.295,24,0 712 | 5,126,78,27,22,29.6,0.439,40,0 713 | 10,129,62,36,0,41.2,0.441,38,1 714 | 0,134,58,20,291,26.4,0.352,21,0 715 | 3,102,74,0,0,29.5,0.121,32,0 716 | 7,187,50,33,392,33.9,0.826,34,1 717 | 3,173,78,39,185,33.8,0.97,31,1 718 | 10,94,72,18,0,23.1,0.595,56,0 719 | 1,108,60,46,178,35.5,0.415,24,0 720 | 5,97,76,27,0,35.6,0.378,52,1 721 | 4,83,86,19,0,29.3,0.317,34,0 722 | 1,114,66,36,200,38.1,0.289,21,0 723 | 1,149,68,29,127,29.3,0.349,42,1 724 | 5,117,86,30,105,39.1,0.251,42,0 725 | 1,111,94,0,0,32.8,0.265,45,0 726 | 4,112,78,40,0,39.4,0.236,38,0 727 | 1,116,78,29,180,36.1,0.496,25,0 728 | 0,141,84,26,0,32.4,0.433,22,0 729 | 2,175,88,0,0,22.9,0.326,22,0 730 | 2,92,52,0,0,30.1,0.141,22,0 731 | 3,130,78,23,79,28.4,0.323,34,1 732 | 8,120,86,0,0,28.4,0.259,22,1 733 | 2,174,88,37,120,44.5,0.646,24,1 734 | 2,106,56,27,165,29,0.426,22,0 735 | 2,105,75,0,0,23.3,0.56,53,0 736 | 4,95,60,32,0,35.4,0.284,28,0 737 | 0,126,86,27,120,27.4,0.515,21,0 738 | 8,65,72,23,0,32,0.6,42,0 739 | 2,99,60,17,160,36.6,0.453,21,0 740 | 1,102,74,0,0,39.5,0.293,42,1 741 | 11,120,80,37,150,42.3,0.785,48,1 742 | 3,102,44,20,94,30.8,0.4,26,0 743 | 1,109,58,18,116,28.5,0.219,22,0 744 | 9,140,94,0,0,32.7,0.734,45,1 745 | 13,153,88,37,140,40.6,1.174,39,0 746 | 12,100,84,33,105,30,0.488,46,0 747 | 1,147,94,41,0,49.3,0.358,27,1 748 | 1,81,74,41,57,46.3,1.096,32,0 749 | 
3,187,70,22,200,36.4,0.408,36,1 750 | 6,162,62,0,0,24.3,0.178,50,1 751 | 4,136,70,0,0,31.2,1.182,22,1 752 | 1,121,78,39,74,39,0.261,28,0 753 | 3,108,62,24,0,26,0.223,25,0 754 | 0,181,88,44,510,43.3,0.222,26,1 755 | 8,154,78,32,0,32.4,0.443,45,1 756 | 1,128,88,39,110,36.5,1.057,37,1 757 | 7,137,90,41,0,32,0.391,39,0 758 | 0,123,72,0,0,36.3,0.258,52,1 759 | 1,106,76,0,0,37.5,0.197,26,0 760 | 6,190,92,0,0,35.5,0.278,66,1 761 | 2,88,58,26,16,28.4,0.766,22,0 762 | 9,170,74,31,0,44,0.403,43,1 763 | 9,89,62,0,0,22.5,0.142,33,0 764 | 10,101,76,48,180,32.9,0.171,63,0 765 | 2,122,70,27,0,36.8,0.34,27,0 766 | 5,121,72,23,112,26.2,0.245,30,0 767 | 1,126,60,0,0,30.1,0.349,47,1 768 | 1,93,70,31,0,30.4,0.315,23,0 769 | -------------------------------------------------------------------------------- /images/01_clustering.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/01_clustering.png -------------------------------------------------------------------------------- /images/01_robot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/01_robot.png -------------------------------------------------------------------------------- /images/01_spam_filter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/01_spam_filter.png -------------------------------------------------------------------------------- /images/01_supervised_learning.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/01_supervised_learning.png -------------------------------------------------------------------------------- /images/02_ipython_header.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/02_ipython_header.png -------------------------------------------------------------------------------- /images/02_jupyter_logo.svg: -------------------------------------------------------------------------------- logo.svg (Created using Figma 0.90) -------------------------------------------------------------------------------- /images/02_sklearn_algorithms.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/02_sklearn_algorithms.png -------------------------------------------------------------------------------- /images/02_sklearn_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/02_sklearn_logo.png -------------------------------------------------------------------------------- /images/03_iris.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/03_iris.png -------------------------------------------------------------------------------- /images/04_1nn_map.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/04_1nn_map.png -------------------------------------------------------------------------------- /images/04_5nn_map.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/04_5nn_map.png -------------------------------------------------------------------------------- /images/04_knn_dataset.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/04_knn_dataset.png -------------------------------------------------------------------------------- /images/05_overfitting.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/05_overfitting.png -------------------------------------------------------------------------------- /images/05_train_test_split.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/05_train_test_split.png -------------------------------------------------------------------------------- /images/07_cross_validation_diagram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/07_cross_validation_diagram.png -------------------------------------------------------------------------------- /images/09_confusion_matrix_1.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/09_confusion_matrix_1.png -------------------------------------------------------------------------------- /images/09_confusion_matrix_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/09_confusion_matrix_2.png -------------------------------------------------------------------------------- /images/youtube.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/8545c74961398def7724501648fd504dbf061b41/images/youtube.png -------------------------------------------------------------------------------- /styles/custom.css: -------------------------------------------------------------------------------- --------------------------------------------------------------------------------