├── Building_Complex_Deep_Learning_Models_Using_Keras_Functional_API.ipynb ├── Deploying-Yolo3-Model-on-FastAPI ├── README.md ├── client.ipynb ├── images │ ├── apple.jpg │ ├── apples.jpg │ ├── car.jpg │ ├── car1.jpg │ ├── car2.jpg │ ├── car3.jpg │ ├── clock.jpg │ ├── clock2.jpg │ ├── clock3.jpg │ ├── fruits.jpg │ ├── oranges.jpg │ └── readme.md ├── server.ipynb └── token.png ├── How To Split The Data Effectively for Your Data Science Project.ipynb ├── How to Find the Optimal Number of Clusters Effectively.ipynb ├── Practical Guide to Support Vector Machines in Python .ipynb ├── Readme.md ├── ml.jpg └── practical-guide-to-dimesnioality-reduction-in-pyth.ipynb /Deploying-Yolo3-Model-on-FastAPI/README.md: -------------------------------------------------------------------------------- 1 | # Ungraded Lab - Deploying a Deep Learning model 2 | 3 | ## Introduction 4 | During this code you will go through the process of deploying an already trained Deep Learning model. To do so, we will take advantage of the user-friendly library fastAPI that provides a nice REST API framework. 5 | 6 | This tutorial is specifically designed to run locally on your machine. This can be done via 2 methods: using `Python Virtual Environments` or using `Docker`. 7 | 8 | Both approaches should yield the same result. If you already have a conda installation available on your computer, we recommend that you use the virtual environment method. If this is not the case, choose the Docker method as it is easier to set up. 9 | 10 | As a general note, the commands in this tutorial are meant to be run within a terminal. To begin you need to **clone this repo in your local filesystem and `cd` to the deployment_tutorial directory**. 11 | 12 | To clone the repo use this command: 13 | ```bash 14 | git clone https://github.com/youssefHosni/Practical-Machine-Learning.git 15 | ``` 16 | 17 | or for cloning via SSH use: 18 | ```bash 19 | git clone git@github.com:youssefHosni/Practical-Machine-Learning.git 20 | ``` 21 | 22 | If you are unsure which method to use for cloning, use the first one. 23 | 24 | The `cd` command allows you to change directories. Assuming you are at the directory where you issued the cloning command, type the following on your terminal. 25 | ```bash 26 | cd working_directory/deployment_tutorial 27 | ``` 28 | 29 | ## Method 1: Python Virtual Environment with Conda 30 | 31 | ### Prerequisites: Have [conda](https://docs.conda.io/en/latest/) installed on your local machine. 32 | 33 | You will use Conda as an environment management system so that all the dependencies you need for this tutorial are stored in an isolated environment. 34 | 35 | Conda includes a lot of libraries so if you are only installing it to complete this lab , we suggest using [miniconda](https://docs.conda.io/en/latest/miniconda.html), which is a minimal version of conda. 36 | 37 | ### 1. Creating a virtual Environment 38 | 39 | Now we assume that you either successfully installed conda or that it was previously available in your system. The first step is creating a new developing environment. Let's set a new environment with python 3.8 with this command: 40 | 41 | ```bash 42 | conda create --name deployment-tutorial python=3.8 43 | ``` 44 | 45 | After successfully creating the environment, you need to activate it by issuing this command: 46 | 47 | ```bash 48 | conda activate deployment-tutorial 49 | ``` 50 | 51 | At this point, you will do all your libraries installation and work in this environment. 52 | 53 | ### 2. 
Installing dependencies using PIP 54 | 55 | Now use the following command to install the required dependencies: 56 | 57 | ```bash 58 | pip install -r requirements.txt 59 | ``` 60 | 61 | This command can take a while to run depending on the speed of your internet connection. Once this step completes you should be ready to spin up jupyter lab and begin working on the ungraded lab. 62 | 63 | ### 3. Launching Jupyter Lab 64 | 65 | Jupyter lab was installed during the previous step so you can launch it with this command: 66 | ```bash 67 | jupyter lab 68 | ``` 69 | After execution, you will see some information printed on the terminal. Usually you will need to authenticate to use Jupyter lab. For this, copy the token that appears on your terminal, head over to [http://localhost:8888/](http://localhost:8888/) and paste it there. Your terminal's output should look very similar to the next image, in which the token has been highlighted for reference: 70 | 71 | 72 | ![token](https://user-images.githubusercontent.com/72076328/219001245-ad9cc6d9-aaf7-4495-a08a-4eb92711c35d.png) 73 | 74 | 75 | ### 4. Running the notebook 76 | 77 | Within Jupyter lab you should be in the same directory where you used the `jupyter lab` command. 78 | 79 | Look for the `server.ipynb` file and open it to begin to run it. 80 | 81 | To stop jupyter lab once you are done with the lab just press `Ctrl + C` twice. 82 | 83 | ### And... that's it! Have fun deploying a Deep Learning model! :) 84 | -------------------------------------------------------------------------------- /Deploying-Yolo3-Model-on-FastAPI/client.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "cfaa72cf", 6 | "metadata": {}, 7 | "source": [ 8 | "# Ungraded Lab Part 2 - Consuming a Machine Learning Model\n", 9 | "\n", 10 | "Welcome to the second part of this ungraded lab! \n", 11 | "**Before going forward check that the server from part 1 is still running.**\n", 12 | "\n", 13 | "In this notebook you will code a minimal client that uses Python's `requests` library to interact with your running server." 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 1, 19 | "id": "dfda4466", 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import os\n", 24 | "import io\n", 25 | "import cv2\n", 26 | "import requests\n", 27 | "import numpy as np\n", 28 | "from IPython.display import Image, display" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "id": "6dfdaeba", 34 | "metadata": {}, 35 | "source": [ 36 | "## Understanding the URL\n", 37 | "\n", 38 | "\n", 39 | "### Breaking down the URL\n", 40 | "\n", 41 | "After experimenting with the fastAPI's client you may have noticed that we made all requests by pointing to a specific URL and appending some parameters to it.\n", 42 | "\n", 43 | "More concretely:\n", 44 | "\n", 45 | "1. The server is hosted in the URL [http://localhost:8000/](http://localhost:8000/).\n", 46 | "2. The endpoint that serves your model is the `/predict` endpoint.\n", 47 | "\n", 48 | "Also you can specify the model to use: `yolov3` or`yolov3-tiny`. Let's stick to the tiny version for computational efficiency.\n", 49 | "\n", 50 | "Let's get started by putting in place all this information." 
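For orientation, here is a minimal sketch of what the finished request will boil down to once the pieces below are in place. It assumes the part 1 server is listening on `http://localhost:8000` and, as described above, takes the model name as a query parameter and the image as an uploaded file under the form field `file`; the rest of this notebook builds the same call up step by step and wraps it in small helper functions.

```python
import requests

base_url = "http://localhost:8000"
endpoint = "/predict"

# `params` builds the ?model=yolov3-tiny query string for us, and `files`
# sends the image as multipart form data under the 'file' field.
with open("images/clock2.jpg", "rb") as image_file:
    response = requests.post(
        base_url + endpoint,
        params={"model": "yolov3-tiny"},
        files={"file": image_file},
    )

print(response.status_code)  # 200 means the request was handled successfully
```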
51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 2, 56 | "id": "1d3d6b7f", 57 | "metadata": {}, 58 | "outputs": [], 59 | "source": [ 60 | "base_url = 'http://localhost:8000'\n", 61 | "endpoint = '/predict'\n", 62 | "model = 'yolov3-tiny'" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "id": "b23e512c", 68 | "metadata": {}, 69 | "source": [ 70 | "To consume your model, you append the endpoint to the base URL to get the full URL. Notice that the parameters are absent for now." 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 3, 76 | "id": "fcfa95e6", 77 | "metadata": {}, 78 | "outputs": [ 79 | { 80 | "data": { 81 | "text/plain": [ 82 | "'http://localhost:8000/predict'" 83 | ] 84 | }, 85 | "execution_count": 3, 86 | "metadata": {}, 87 | "output_type": "execute_result" 88 | } 89 | ], 90 | "source": [ 91 | "url_with_endpoint_no_params = base_url + endpoint\n", 92 | "url_with_endpoint_no_params" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "id": "d83a5e1b", 98 | "metadata": {}, 99 | "source": [ 100 | "To set any of the expected parameters, the syntax is to add a \"?\" character followed by the name of the parameter and its value.\n", 101 | "\n", 102 | "Let's do it and check how the final URL looks like:" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 4, 108 | "id": "c6b979b7", 109 | "metadata": {}, 110 | "outputs": [ 111 | { 112 | "data": { 113 | "text/plain": [ 114 | "'http://localhost:8000/predict?model=yolov3-tiny'" 115 | ] 116 | }, 117 | "execution_count": 4, 118 | "metadata": {}, 119 | "output_type": "execute_result" 120 | } 121 | ], 122 | "source": [ 123 | "full_url = url_with_endpoint_no_params + \"?model=\" + model\n", 124 | "full_url" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "id": "654db3fc", 130 | "metadata": {}, 131 | "source": [ 132 | "This endpoint expects both a model's name and an image. But since the image is more complex it is not passed within the URL. Instead we leverage the `requests` library to handle this process.\n", 133 | "\n", 134 | "# Sending a request to your server\n", 135 | "\n", 136 | "### Coding the response_from_server function\n", 137 | "\n", 138 | "As a reminder, this endpoint expects a POST HTTP request. The `post` function is part of the requests library. \n", 139 | "\n", 140 | "To pass the file along with the request, you need to create a dictionary indicating the name of the file ('file' in this case) and the actual file.\n", 141 | "\n", 142 | " `status code` is a handy command to check the status of the response the request triggered. **A status code of 200 means that everything went well.**" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 5, 148 | "id": "8ab203e1", 149 | "metadata": {}, 150 | "outputs": [], 151 | "source": [ 152 | "def response_from_server(url, image_file, verbose=True):\n", 153 | " \"\"\"Makes a POST request to the server and returns the response.\n", 154 | "\n", 155 | " Args:\n", 156 | " url (str): URL that the request is sent to.\n", 157 | " image_file (_io.BufferedReader): File to upload, should be an image.\n", 158 | " verbose (bool): True if the status of the response should be printed. 
False otherwise.\n", 159 | "\n", 160 | " Returns:\n", 161 | " requests.models.Response: Response from the server.\n", 162 | " \"\"\"\n", 163 | " \n", 164 | " files = {'file': image_file}\n", 165 | " response = requests.post(url, files=files)\n", 166 | " status_code = response.status_code\n", 167 | " if verbose:\n", 168 | " msg = \"Everything went well!\" if status_code == 200 else \"There was an error when handling the request.\"\n", 169 | " print(msg)\n", 170 | " return response" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "id": "8a2e03f5", 176 | "metadata": {}, 177 | "source": [ 178 | "To test this function, open a file in your filesystem and pass it as a parameter alongside the URL:" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 6, 184 | "id": "67651bc3", 185 | "metadata": {}, 186 | "outputs": [ 187 | { 188 | "name": "stdout", 189 | "output_type": "stream", 190 | "text": [ 191 | "There was an error when handling the request.\n" 192 | ] 193 | } 194 | ], 195 | "source": [ 196 | "with open(\"images/clock2.jpg\", \"rb\") as image_file:\n", 197 | " prediction = response_from_server(full_url, image_file)" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "id": "eff0c51a", 203 | "metadata": {}, 204 | "source": [ 205 | "Great news! The request was successful. However, you are not getting any information about the objects in the image.\n", 206 | "\n", 207 | "To get the image with the bounding boxes and labels, you need to parse the content of the response into an appropriate format. This process looks very similar to how you read raw images into a cv2 image on the server.\n", 208 | "\n", 209 | "To handle this step, let's create a directory called `images_predicted` to save the image to:" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 7, 215 | "id": "7a172496", 216 | "metadata": {}, 217 | "outputs": [], 218 | "source": [ 219 | "dir_name = \"images_predicted\"\n", 220 | "if not os.path.exists(dir_name):\n", 221 | " os.mkdir(dir_name)" 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "id": "86a858c8", 227 | "metadata": {}, 228 | "source": [ 229 | "\n", 230 | "### Creating the display_image_from_response function" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": 8, 236 | "id": "0877514b", 237 | "metadata": {}, 238 | "outputs": [], 239 | "source": [ 240 | "def display_image_from_response(response):\n", 241 | " \"\"\"Display image within server's response.\n", 242 | "\n", 243 | " Args:\n", 244 | " response (requests.models.Response): The response from the server after object detection.\n", 245 | " \"\"\"\n", 246 | " \n", 247 | " image_stream = io.BytesIO(response.content)\n", 248 | " image_stream.seek(0)\n", 249 | " file_bytes = np.asarray(bytearray(image_stream.read()), dtype=np.uint8)\n", 250 | " image = cv2.imdecode(file_bytes, cv2.IMREAD_COLOR)\n", 251 | " filename = \"image_with_objects.jpeg\"\n", 252 | " cv2.imwrite(f'images_predicted/{filename}', image)\n", 253 | " display(Image(f'images_predicted/{filename}'))" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": 9, 259 | "id": "7b385400", 260 | "metadata": {}, 261 | "outputs": [ 262 | { 263 | "ename": "error", 264 | "evalue": "OpenCV(4.5.3) C:\\Users\\runneradmin\\AppData\\Local\\Temp\\pip-req-build-z4706ql7\\opencv\\modules\\imgcodecs\\src\\loadsave.cpp:803: error: (-215:Assertion failed) !_img.empty() in function 'cv::imwrite'\n", 265 | "output_type": "error", 266 | "traceback": [ 267 | 
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 268 | "\u001b[1;31merror\u001b[0m Traceback (most recent call last)", 269 | "Cell \u001b[1;32mIn[9], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m \u001b[43mdisplay_image_from_response\u001b[49m\u001b[43m(\u001b[49m\u001b[43mprediction\u001b[49m\u001b[43m)\u001b[49m\n", 270 | "Cell \u001b[1;32mIn[8], line 13\u001b[0m, in \u001b[0;36mdisplay_image_from_response\u001b[1;34m(response)\u001b[0m\n\u001b[0;32m 11\u001b[0m image \u001b[38;5;241m=\u001b[39m cv2\u001b[38;5;241m.\u001b[39mimdecode(file_bytes, cv2\u001b[38;5;241m.\u001b[39mIMREAD_COLOR)\n\u001b[0;32m 12\u001b[0m filename \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mimage_with_objects.jpeg\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m---> 13\u001b[0m \u001b[43mcv2\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mimwrite\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43mf\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mimages_predicted/\u001b[39;49m\u001b[38;5;132;43;01m{\u001b[39;49;00m\u001b[43mfilename\u001b[49m\u001b[38;5;132;43;01m}\u001b[39;49;00m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mimage\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 14\u001b[0m display(Image(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mimages_predicted/\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mfilename\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m'\u001b[39m))\n", 271 | "\u001b[1;31merror\u001b[0m: OpenCV(4.5.3) C:\\Users\\runneradmin\\AppData\\Local\\Temp\\pip-req-build-z4706ql7\\opencv\\modules\\imgcodecs\\src\\loadsave.cpp:803: error: (-215:Assertion failed) !_img.empty() in function 'cv::imwrite'\n" 272 | ] 273 | } 274 | ], 275 | "source": [ 276 | "display_image_from_response(prediction)" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "id": "b7504a89", 282 | "metadata": {}, 283 | "source": [ 284 | "Now you are ready to consume your object detection model through your own client!\n", 285 | "\n", 286 | "Let's test it out on some other images:" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": null, 292 | "id": "d6d7e14e", 293 | "metadata": {}, 294 | "outputs": [], 295 | "source": [ 296 | "image_files = [\n", 297 | " 'car2.jpg',\n", 298 | " 'clock3.jpg',\n", 299 | " 'apples.jpg'\n", 300 | "]\n", 301 | "\n", 302 | "for image_file in image_files:\n", 303 | " with open(f\"images/{image_file}\", \"rb\") as image_file:\n", 304 | " prediction = response_from_server(full_url, image_file, verbose=False)\n", 305 | " \n", 306 | " display_image_from_response(prediction)" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "id": "3d97869c", 312 | "metadata": {}, 313 | "source": [ 314 | "**Congratulations on finishing this ungraded lab!** Real life clients and servers have a lot more going on in terms of security and performance. However, the code you just experienced is close to what you see in real production environments. \n", 315 | "Hopefully, this lab served the purpose of increasing your familiarity with the process of deploying a Deep Learning model, and consuming from it.\n", 316 | "\n", 317 | "**Keep it up!**" 318 | ] 319 | }, 320 | { 321 | "cell_type": "markdown", 322 | "id": "ccfc4b46", 323 | "metadata": {}, 324 | "source": [ 325 | "# \n", 326 | "## Optional Challenge - Adding the confidence level to the request\n", 327 | "\n", 328 | "Let's expand on what you have learned so far. 
The next logical step is to extend the server and the client so that they can accommodate an additional parameter: the level of confidence of the prediction. \n", 329 | "\n", 330 | "**To test your extended implementation you must perform the following steps:**\n", 331 | "\n", 332 | "- Stop the server by interrupting the Kernel.\n", 333 | "- Extend the `prediction` function in the server.\n", 334 | "- Re run the cell containing your server code.\n", 335 | "- Re launch the server.\n", 336 | "- Extend your client.\n", 337 | "- Test it with some images (either with your client or fastAPI's one).\n", 338 | "\n", 339 | "Here are some hints that can help you out throughout the process:\n", 340 | "\n", 341 | "#### Server side:\n", 342 | "- The `prediction` function that handles the `/predict` endpoint needs an additional parameter to accept the confidence level. Add this new parameter before the `File` parameter. This is necessary because `File` has a default value and must be specified last.\n", 343 | "\n", 344 | "\n", 345 | "- `cv.detect_common_objects` accepts the `confidence` parameter, which is a floating point number (type `float`in Python).\n", 346 | "\n", 347 | "\n", 348 | "#### Client side:\n", 349 | "- You can add a new parameter to the URL by extending it with an `&` followed by the name of the parameter and its value. The name of this new parameter must be equal to the name used within the `prediction` function in the server. An example would look like this: `myawesomemodel.com/predict?model=yolov3-tiny&newParam=value` \n", 350 | "\n", 351 | "##### Sample Solution:\n", 352 | "- Once you're done with this optional task or if you got stuck while doing it, you can see a sample solution by one of your course mentors [here](https://community.deeplearning.ai/t/c1-w1-optional-challenge-confidence-level/67619). Just make sure you've already joined our Discourse community as shown in an earlier reading item. This is posted in the [MLEP Learner Projects](https://community.deeplearning.ai/c/machine-learning-engineering-for-production/mlep-learner-projects/224) category and feel free to post your own solution (and other content-related projects) there as well. Just remember **not to post any graded material** so as not to violate the Honor Code. You can instead take one of the tools/concepts taught in the lectures or labs then apply it to a mini-project. [Here](https://community.deeplearning.ai/t/fastapi-for-text-classification-problem-in-arabic/56857) is an example. 
We encourage you to explore your fellow learners' projects and comment on the ones you find interesting.\n", 353 | "\n", 354 | "\n", 355 | "**You can do it!**" 356 | ] 357 | } 358 | ], 359 | "metadata": { 360 | "kernelspec": { 361 | "display_name": "Python 3 (ipykernel)", 362 | "language": "python", 363 | "name": "python3" 364 | }, 365 | "language_info": { 366 | "codemirror_mode": { 367 | "name": "ipython", 368 | "version": 3 369 | }, 370 | "file_extension": ".py", 371 | "mimetype": "text/x-python", 372 | "name": "python", 373 | "nbconvert_exporter": "python", 374 | "pygments_lexer": "ipython3", 375 | "version": "3.8.16" 376 | } 377 | }, 378 | "nbformat": 4, 379 | "nbformat_minor": 5 380 | } 381 | -------------------------------------------------------------------------------- /Deploying-Yolo3-Model-on-FastAPI/images/apple.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/youssefHosni/Practical-Machine-Learning/2d82175f1f3eee7c653555f10955331b2436be63/Deploying-Yolo3-Model-on-FastAPI/images/apple.jpg -------------------------------------------------------------------------------- /Deploying-Yolo3-Model-on-FastAPI/images/apples.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/youssefHosni/Practical-Machine-Learning/2d82175f1f3eee7c653555f10955331b2436be63/Deploying-Yolo3-Model-on-FastAPI/images/apples.jpg -------------------------------------------------------------------------------- /Deploying-Yolo3-Model-on-FastAPI/images/car.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/youssefHosni/Practical-Machine-Learning/2d82175f1f3eee7c653555f10955331b2436be63/Deploying-Yolo3-Model-on-FastAPI/images/car.jpg -------------------------------------------------------------------------------- /Deploying-Yolo3-Model-on-FastAPI/images/car1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/youssefHosni/Practical-Machine-Learning/2d82175f1f3eee7c653555f10955331b2436be63/Deploying-Yolo3-Model-on-FastAPI/images/car1.jpg -------------------------------------------------------------------------------- /Deploying-Yolo3-Model-on-FastAPI/images/car2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/youssefHosni/Practical-Machine-Learning/2d82175f1f3eee7c653555f10955331b2436be63/Deploying-Yolo3-Model-on-FastAPI/images/car2.jpg -------------------------------------------------------------------------------- /Deploying-Yolo3-Model-on-FastAPI/images/car3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/youssefHosni/Practical-Machine-Learning/2d82175f1f3eee7c653555f10955331b2436be63/Deploying-Yolo3-Model-on-FastAPI/images/car3.jpg -------------------------------------------------------------------------------- /Deploying-Yolo3-Model-on-FastAPI/images/clock.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/youssefHosni/Practical-Machine-Learning/2d82175f1f3eee7c653555f10955331b2436be63/Deploying-Yolo3-Model-on-FastAPI/images/clock.jpg -------------------------------------------------------------------------------- /Deploying-Yolo3-Model-on-FastAPI/images/clock2.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/youssefHosni/Practical-Machine-Learning/2d82175f1f3eee7c653555f10955331b2436be63/Deploying-Yolo3-Model-on-FastAPI/images/clock2.jpg -------------------------------------------------------------------------------- /Deploying-Yolo3-Model-on-FastAPI/images/clock3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/youssefHosni/Practical-Machine-Learning/2d82175f1f3eee7c653555f10955331b2436be63/Deploying-Yolo3-Model-on-FastAPI/images/clock3.jpg -------------------------------------------------------------------------------- /Deploying-Yolo3-Model-on-FastAPI/images/fruits.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/youssefHosni/Practical-Machine-Learning/2d82175f1f3eee7c653555f10955331b2436be63/Deploying-Yolo3-Model-on-FastAPI/images/fruits.jpg -------------------------------------------------------------------------------- /Deploying-Yolo3-Model-on-FastAPI/images/oranges.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/youssefHosni/Practical-Machine-Learning/2d82175f1f3eee7c653555f10955331b2436be63/Deploying-Yolo3-Model-on-FastAPI/images/oranges.jpg -------------------------------------------------------------------------------- /Deploying-Yolo3-Model-on-FastAPI/images/readme.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Deploying-Yolo3-Model-on-FastAPI/token.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/youssefHosni/Practical-Machine-Learning/2d82175f1f3eee7c653555f10955331b2436be63/Deploying-Yolo3-Model-on-FastAPI/token.png -------------------------------------------------------------------------------- /How To Split The Data Effectively for Your Data Science Project.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# How To Split The Data Effectively for Your Data Science Project" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Data is one of the most important resources for any data science project. But what good is abundant data if you can't use it effectively? After all, your success as a data scientist hinges on how adeptly you can manipulate and analyze data to produce actionable insights from it. \n", 15 | "\n", 16 | "One of the common steps in your data science project after collecting the data is to split the data into train and test. Although this step might seem simple, if it is done in an effective way, it may affect your results and lead to unrealistic models. With this in mind, this article will walk you through the steps of splitting your data effectively as a data scientist. These tips will help you understand how to split your data for different types of analysis. We'll also look at some potential issues with splitting your data and give you some general rules to follow when undertaking this process. Read on to discover more!" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "### Table of content:\n", 24 | "\n", 25 | "1. 
Why should you split your data?\n", 26 | "2. What Should Be The Splitting Percentage?\n", 27 | "3. Split at the early beginning\n", 28 | "4. Consistent Split \n", 29 | "5. Avoid Sampling Bias" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 32, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "import os\n", 39 | "import tarfile\n", 40 | "import urllib.request\n", 41 | "import pandas as pd" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "## 1. Why should you split your data?\n", 49 | "\n", 50 | "Splitting data into train and test or splitting it into train, validation, and test is a common step in supervised machine learning projects. The train set is used to fit and train the model while the test set is used to evaluate the trained model so as to get a better idea of how good the model is on new data and how it will act in the production environment. Therefore the test data need to be similar to what is expected to be seen in the production. \n", 51 | "\n", 52 | "Another common splitting technique is to split the data into three datasets: train, validation, and test dataset. The validation dataset will be used to choose the best hyperparameters for your project." 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "## 2. What Should Be The Splitting Percentage?\n", 60 | "\n", 61 | "The percentage of the test data depends on many factors that's why there is no optimal splitting percentage. You must choose a split percentage that meets your project's objectives with considerations that include:\n", 62 | "\n", 63 | "* **Computational cost in training the model:** If the cost of training the model is high this might affect our ability to evaluate multiple models especially if we will use a validation dataset. \n", 64 | "\n", 65 | "* **Size of the data:** This is a very important factor if we have a small dataset so probably we will not be able to split the data another good option will be using k-fold cross-validation for evaluation. If the size of the data is so big then it will be more than enough to split a small portion of the data and use it as a test dataset. \n", 66 | "\n", 67 | "However, there are some common splitting percentages:\n", 68 | "* Train: 80%, Test: 20%\n", 69 | "* Train: 67%, Test: 33%\n", 70 | "* Train: 50%, Test: 50%\n", 71 | "* Train 90%, Test: 10 %\n", 72 | "* Train 95% , Test 5%" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "## 3. Split At The Early Beginning" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "One of the important practical tips is to split your data in an early stage of your project directly after collecting the data. Although it may sound strange to voluntarily set aside part of the data at the early stage of your data science project before even data exploration. The reason for this is that your brain is an amazing pattern detection system, which means that it is highly prone to overfitting: if you look at the test set, you may stumble upon some seemingly interesting pattern in the test data that leads you to select a particular kind of Machine Learning model. Therefore when you estimate the generalization error using the test set after training your model, your estimate will be too optimistic, and you will launch a system into production that will not perform as well as expected. 
This is called snooping data bias or data leakage." 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "## 4. Consistent Splitting" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "Although creating a test set is theoretically quite simple: you can just pick some instances randomly, typically 20% of the dataset, and set them aside as shown in the code below, but before that let's load the data we will be using. We will be using the California housing prices dataset throughout this article:" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 31, 106 | "metadata": {}, 107 | "outputs": [ 108 | { 109 | "data": { 110 | "text/html": [ 111 | "
|   | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity |
|---|-----------|----------|--------------------|-------------|----------------|------------|------------|---------------|--------------------|-----------------|
| 0 | -122.23   | 37.88    | 41.0               | 880.0       | 129.0          | 322.0      | 126.0      | 8.3252        | 452600.0           | NEAR BAY        |
| 1 | -122.22   | 37.86    | 21.0               | 7099.0      | 1106.0         | 2401.0     | 1138.0     | 8.3014        | 358500.0           | NEAR BAY        |
| 2 | -122.24   | 37.85    | 52.0               | 1467.0      | 190.0          | 496.0      | 177.0      | 7.2574        | 352100.0           | NEAR BAY        |
| 3 | -122.25   | 37.85    | 52.0               | 1274.0      | 235.0          | 558.0      | 219.0      | 5.6431        | 341300.0           | NEAR BAY        |
| 4 | -122.25   | 37.85    | 52.0               | 1627.0      | 280.0          | 565.0      | 259.0      | 3.8462        | 342200.0           | NEAR BAY        |
\n", 209 | "
" 210 | ], 211 | "text/plain": [ 212 | " longitude latitude housing_median_age total_rooms total_bedrooms \\\n", 213 | "0 -122.23 37.88 41.0 880.0 129.0 \n", 214 | "1 -122.22 37.86 21.0 7099.0 1106.0 \n", 215 | "2 -122.24 37.85 52.0 1467.0 190.0 \n", 216 | "3 -122.25 37.85 52.0 1274.0 235.0 \n", 217 | "4 -122.25 37.85 52.0 1627.0 280.0 \n", 218 | "\n", 219 | " population households median_income median_house_value ocean_proximity \n", 220 | "0 322.0 126.0 8.3252 452600.0 NEAR BAY \n", 221 | "1 2401.0 1138.0 8.3014 358500.0 NEAR BAY \n", 222 | "2 496.0 177.0 7.2574 352100.0 NEAR BAY \n", 223 | "3 558.0 219.0 5.6431 341300.0 NEAR BAY \n", 224 | "4 565.0 259.0 3.8462 342200.0 NEAR BAY " 225 | ] 226 | }, 227 | "execution_count": 31, 228 | "metadata": {}, 229 | "output_type": "execute_result" 230 | } 231 | ], 232 | "source": [ 233 | "DOWNLOAD_ROOT = \"https://raw.githubusercontent.com/ageron/handson-ml2/master/\"\n", 234 | "HOUSING_PATH = os.path.join(\"datasets\", \"housing\")\n", 235 | "HOUSING_URL = DOWNLOAD_ROOT + \"datasets/housing/housing.tgz\"\n", 236 | "\n", 237 | "def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):\n", 238 | " if not os.path.isdir(housing_path):\n", 239 | " os.makedirs(housing_path)\n", 240 | " tgz_path = os.path.join(housing_path, \"housing.tgz\")\n", 241 | " urllib.request.urlretrieve(housing_url, tgz_path)\n", 242 | " housing_tgz = tarfile.open(tgz_path)\n", 243 | " housing_tgz.extractall(path=housing_path)\n", 244 | " housing_tgz.close()\n", 245 | " \n", 246 | "fetch_housing_data()\n", 247 | "\n", 248 | "def load_housing_data(housing_path=HOUSING_PATH):\n", 249 | " csv_path = os.path.join(housing_path, \"housing.csv\")\n", 250 | " return pd.read_csv(csv_path)\n", 251 | "\n", 252 | "housing = load_housing_data()\n", 253 | "housing.head()" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "Now let's define the splitting function to split the data into training and test set:" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 10, 266 | "metadata": {}, 267 | "outputs": [ 268 | { 269 | "name": "stdout", 270 | "output_type": "stream", 271 | "text": [ 272 | "The size of the training data 16512\n", 273 | "The size of the testing data 4128\n" 274 | ] 275 | } 276 | ], 277 | "source": [ 278 | "#define a splitting fucntion \n", 279 | "def split_train_test(data, test_ratio):\n", 280 | " shuffled_indices = np.random.permutation(len(data))\n", 281 | " test_set_size = int(len(data) * test_ratio)\n", 282 | " test_indices = shuffled_indices[:test_set_size]\n", 283 | " train_indices = shuffled_indices[test_set_size:]\n", 284 | " return data.iloc[train_indices], data.iloc[test_indices]\n", 285 | "\n", 286 | "# split the data into training and testing data \n", 287 | "train_set, test_set = split_train_test(housing, 0.2)\n", 288 | "print('The size of the training data', len(train_set))\n", 289 | "print('The size of the testing data', len(test_set))" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "This method works, but it has a major artifact. If you run the code again, it will generate a different test set! Over time, you (or your Machine Learning algorithms) will get to see the whole dataset, which is what you want to avoid. 
There are some potential solutions that can be done to avoid this:\n", 297 | "* One solution is to save the test set on the first run in a separate file and then load it in the subsequent runs.\n", 298 | "* Another option is to set the random number generator's seed (e.g., np.random.seed(42)) before calling np.random.permutation(), so that it always generates the same shuffled indices." 299 | ] 300 | }, 301 | { 302 | "cell_type": "markdown", 303 | "metadata": {}, 304 | "source": [ 305 | "**However, both of these solutions will break the next time you fetch an updated dataset.** A common better, and more reliable solution is to use each instance's identifier to decide whether or not it should go in the test set (assuming instances have a unique and immutable identifier). For example, you could compute a hash of each instance's identifier and put that instance in the test set if the hash is lower or equal to 20% of the maximum hash value. This ensures that the test set will remain consistent across multiple runs, even if you refresh the dataset. The new test set will contain 20% of the new instances, but it will not contain any instance that was previously in the training set." 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "metadata": {}, 311 | "source": [ 312 | "Here is a possible implementation:" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": 11, 318 | "metadata": {}, 319 | "outputs": [], 320 | "source": [ 321 | "from zlib import crc32\n", 322 | "def test_set_check(identifier, test_ratio):\n", 323 | " return crc32(np.int64(identifier)) & 0xffffffff < test_ratio * 2**32\n", 324 | "\n", 325 | "def split_train_test_by_id(data, test_ratio, id_column):\n", 326 | " ids = data[id_column]\n", 327 | " in_test_set = ids.apply(lambda id_: test_set_check(id_, test_ratio))\n", 328 | " return data.loc[~in_test_set], data.loc[in_test_set]" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "Unfortunately, our housing dataset does not have an identifier column. The simplest solution is to use the row index as the ID:" 336 | ] 337 | }, 338 | { 339 | "cell_type": "code", 340 | "execution_count": 12, 341 | "metadata": {}, 342 | "outputs": [], 343 | "source": [ 344 | "housing_with_id = housing.reset_index() # adds an `index` column\n", 345 | "train_set, test_set = split_train_test_by_id(housing_with_id, 0.2, \"index\")" 346 | ] 347 | }, 348 | { 349 | "cell_type": "markdown", 350 | "metadata": {}, 351 | "source": [ 352 | "However, it is important to note that if you use the row index as a unique identifier, you need to make sure that new data gets appended to the end of the dataset and that no row ever gets deleted. If this is not possible, then you can try to use the most stable features to build a unique identifier. 
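A useful property of this hash-based split is easy to verify: because the hash of an identifier never changes, rerunning the split keeps exactly the same instances in the test set. Here is a minimal sketch of such a check, reusing `split_train_test_by_id` and the `housing_with_id` frame created above.

```python
import numpy as np  # used inside test_set_check via np.int64

# The crc32 hash of each id is deterministic, so the split is stable across runs.
_, test_run_1 = split_train_test_by_id(housing_with_id, 0.2, "index")
_, test_run_2 = split_train_test_by_id(housing_with_id, 0.2, "index")

print(test_run_1.index.equals(test_run_2.index))  # True: the same rows every time
print(len(test_run_1) / len(housing_with_id))     # close to the requested 0.2
```

The guarantee only holds as long as the identifier itself is stable, which is why the choice of identifier matters.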
For example, in the housing data set, the district's latitude and longitude are very good choices since they are guaranteed to be stable for a few million years so you could combine them into an ID like the following:" 353 | ] 354 | }, 355 | { 356 | "cell_type": "code", 357 | "execution_count": 13, 358 | "metadata": {}, 359 | "outputs": [], 360 | "source": [ 361 | "housing_with_id[\"id\"] = housing[\"longitude\"] * 1000 + housing[\"latitude\"]\n", 362 | "train_set, test_set = split_train_test_by_id(housing_with_id, 0.2, \"id\")" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": {}, 368 | "source": [ 369 | "**Scikit-Learn** provides a few functions to split datasets into multiple subsets in various ways. The simplest function is **train_test_split**, which does pretty much the same thing as the function split_train_test defined earlier, with a couple of additional features. First, there is a random_state parameter that allows you to set the random generator seed as explained previously, and second, you can pass it through multiple datasets with an identical number of rows, and it will split them on the same indices (this is very useful, for example, if you have a separate DataFrame for labels):" 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": 14, 375 | "metadata": {}, 376 | "outputs": [], 377 | "source": [ 378 | "from sklearn.model_selection import train_test_split\n", 379 | "train_set, test_set = train_test_split(housing, test_size=0.2, random_state=42)" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "## 5. Avoid Sampling Bias" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "So far, we have considered purely random sampling methods for getting and splitting test data. This is generally fine if your dataset is large enough (especially relative to the number of attributes), but if it is not, you run the risk of introducing a significant sampling bias." 394 | ] 395 | }, 396 | { 397 | "cell_type": "markdown", 398 | "metadata": {}, 399 | "source": [ 400 | "Let's consider a survey company that decided to call 1,000 people to ask them a few questions, they don't just pick 1,000 people randomly in a phone book. They try to ensure that these 1,000 people are representative of the whole population. For example, the US population is composed of 51.3% female and 48.7% male, so a well-conducted survey in the US would try to maintain this ratio in the sample: 513 females and 487 males. We should try to do the same when splitting our dataset." 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": {}, 406 | "source": [ 407 | "This is called stratified sampling, in which the population is divided into homogeneous subgroups called strata, and the right number of instances is sampled from each stratum to guarantee that the test set is representative of the overall population. So in the previous example, if they used purely random sampling, there would be about a 12% chance of sampling a skewed test set with either less than 49% female or more than 54% female. Either way, the survey results would be significantly biased." 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": {}, 413 | "source": [ 414 | "Let's go back to the housing prices dataset used in this blog. We can assume that median income is a very important attribute in predicting median housing prices. 
Therefore we should ensure that the test set is representative of the various categories of incomes in the whole dataset.\n", 415 | "\n", 416 | "Since the median income is a continuous numerical attribute, you first need to create an income category attribute. Let's plot a histogram for the median income feature and have a close look at it:" 417 | ] 418 | }, 419 | { 420 | "cell_type": "code", 421 | "execution_count": 21, 422 | "metadata": { 423 | "scrolled": false 424 | }, 425 | "outputs": [ 426 | { 427 | "data": { 428 | "text/plain": [ 429 | "Text(0.5, 1.0, 'Histogram for the median income')" 430 | ] 431 | }, 432 | "execution_count": 21, 433 | "metadata": {}, 434 | "output_type": "execute_result" 435 | }, 436 | { 437 | "data": { 438 | "image/png": "iVBORw0KGgoAAAANSUhEUgAABJgAAANsCAYAAAAX8BIxAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAA9RElEQVR4nO3de5xlZ13n++/PtECgJQlGekIS7agRhcRbWsTr6QwoGYOEOQc1HNBEGfMaZBCdoHTGmcHRiSczI14YBCdHkDBwaCKiRFqQDNowFwISRZoAGTKkJTcTuSTSwIAJv/PHXtFKpbvTu5+q2lXd7/fr1a/ae+21nnq66klX59NrrV3dHQAAAAA4XF+06AkAAAAAsLEJTAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQC4n6q6rqq2L3oeq62q/m1Vfayq/mqVxt9eVTevxtgrraq6qr56evybVfWvVuFzfFdVXb/S4wIAiycwAcBRpqr2VtUTl227sKr+273Pu/ux3b37AcbZOkWJTas01VVVVacmuTjJY7r7H6zQmH8XaTay7v6n3f2LqzDuf+3uR6/0uADA4glMAMC6tAbh6iuSfLy775j3wI0a1QAAVovABADcz9KznKrqcVX1nqr6m6q6vap+ZdrtHdPHO6tqX1V9W1V9UVX9y6r6y6q6o6peVVXHLRn3R6bXPl5V/2rZ5/n5qnp9Vb26qv4myYXT535nVd1ZVbdV1Uuq6kFLxuuq+omq+nBVfaqqfrGqvmo65m+q6sql+y857olJrk7yqGnur5y2P2W6PPDOqtpdVV+37Gvygqp6X5JPL49MVXXv1+MvpjF/aMlrF09fj9uq6keXbH9wVf1yVX10+tr+ZlUde4DvyYVV9d+r6len+X2kqr592n7TNP4Fhzp2Vf3MNJ9bq+rHln2uV1bVv50en1BVb6qqv66qT06PT1my7+7p6/7fp+/BW6vqxAP8Hu5zyeD0NX1+Vb2vqu6qqtdV1UOWvH5eVb13+l7+r6o6Z9r+qKq6qqo+UVU3VNWPLznm56vqd6Z19Kmq2lNVX1NVl0xfo5uq6nuX7H9cVb18+lrcUrPLJo/Z3/wBgAMTmACAB/LrSX69ux+e5KuSXDlt/+7p4/Hdvbm735nkwunX2Um+MsnmJC9Jkqp6TJKXJnlGkpOSHJfk5GWf67wkr09yfJLXJLknyU8nOTHJtyV5QpKfWHbMOUnOSvL4JD+b5PLpc5ya5IwkT1/+G+ru/5LkHyW5dZr7hVX1NUlem+SnknxZkj9M8gfLAtXTk5w7/Z7vXjbmvV+Pb5jGfN30/B8s+b0+K8lvVNUJ02v/LsnXJPnGJF897fOvl893iW9N8r4kX5rk/0uyM8m3TMc+M8lLqmrzA409hZrnJ/meJKcnuc8lk8t8UZLfzuyMry9P8tlM39Ml/u8kP5rkkUkeNI19qH4ws+/haUm+PrP1k6p6XJJXJfmZzNbDdyfZOx3z2iQ3J3lUkqcl+aWqesKSMb8/yX9OckKSP0/yR9Pv4+Qkv5DkPy3Z94okd2f2NfqmJN+b5J/MMX8AIAITABytfn86C+bOqrozs/BzIH+b5Kur6sTu3tfd1xxk32ck+ZXu/kh370tySZLzp7N9npbkD7r7v3X35zOLHb3s+Hd29+939xe6+7PdfW13X9Pdd3f33szCwP+x7Jh/191/093XJXl/krdOn/+uJG/OLBocih9Ksqu7r+7uv03yy0mOTfLtS/Z5cXff1N2fPcQxk9nX7xe6+2+7+w+T7Evy6KqqJD+e5Ke7+xPd/akkv5Tk/IOMdWN3/3Z335PkdZlFtF/o7s9191uTfD6z79UDjf2DSX67u9/f3Z9O8vMH+oTd/fHu/t3u/sw0zqW5//fgt7v7f05flyszi1qH6sXdfWt3fyLJHyw59llJXjF9P77Q3bd094dqdu+s70zygu7+39393iS/leSHl4z5X7v7j6YI+DuZBcPLpu/rziRbq+r4qtqSWWj8qe7+9HS55K/m4N8DAGA/3D8AAI5OT53O4kkyu/wqBz5r41mZnfXxoaq6Mcm/6e43HWDfRyX5yyXP/zKzv29smV676d4XuvszVfXxZcfftPTJdFbRryTZluSh01jXLjvm9iWPP7uf54d6A+/7zL27v1BVN+W+Z1nddL+jHtjHl53t9JnMzuz6ssx+T9fOelCSpJIc7PKs5b+3dPfybYcy9qNy36/j0u/ZfVTVQzOLLudkdkZQknxJVR0zha4kWfoufPf+/g7V8mMfNT0+NbOzyJZ7VJJ7o9m9/jKzNXKv5V+Tjy2Z671xcPM01hcnuW3J1+mLcnjfZwA4qjmDCQA4qO7+cHc/PbPLn/5dktdX1cNy/7OPkuTWzC6luteXZ3b50e1Jbkuy9N49x2Z2qdd9Pt2y5y9L8qEkp0+X6P2LzELJarjP3KezgE5NcstB5jfiY5nFjsd29/HTr+O6e544c7hj35bZ7+1eX36QsS5O8ugk3zp9D+69FHC1vg/3uimzSzKXuzXJI6rqS5Zs+/Lc9/s0z+f4XJITl3ydHt7djz2MsQDgqCYwAQAHVVXPrKov6+4vJLlz2nxPkr9O8oXM7rV0r9cm+emqOm26F9AvJXnddAbP65N8/3Rj6gcl+Td54EjxJUn+Jsm+qvraJM9eqd/XflyZ5NyqekJVfXFmYeVzSf7HHGPcnvt+PQ5o+nr+v0l+taoemSRVdXJVPWm+aR/W2FdmdhP1x0xnKL3wIMN9SWax6s6qesQD7LuSXp7kR6fvxxdN8//
a7r4ps+/J/1NVD6mqr8/sLLvXzPsJuvu2JG9N8qKqevj0eb6qqpZfAggAPACBCQB4IOckua6q9mV2w+/zp3vffCaz+/H89+leTo9P8orMbq78jiQ3JvnfSZ6bJNM9kp6b2T1wbkvyqSR3ZBZxDuT5md1A+lOZBZPXHWTfId19fWY3yv6PmZ0B9P1Jvn+6X9Sh+vkkV0xfjx88hP1fkOSGJNfU7J3z/ktmZwuthAOO3d1vTvJrSf542uePDzLOr2V2L6qPJbkmyVtWaH4H1d3vzuzG4b+a5K4kb8/fn2H29CRbMzub6feSvLC7rz7MT/Ujmd2Y/ANJPplZCD3psCcOAEep6l7JM70BAA7NdIbTnZld/nbjgqcDAMAAZzABAGumqr6/qh463cPpl5Psyd+/9TwAABuUwAQArKXzMrus6dYkp2d2uZ3TqQEANjiXyAEAAAAwxBlMAAAAAAzZtOgJrJYTTzyxt27deljHfvrTn87DHvawlZ0QRyzrhXlYL8zDemEe1gvzsF6Yh/XCPKyXI9+11177se7+suXbj9jAtHXr1rznPe85rGN3796d7du3r+yEOGJZL8zDemEe1gvzsF6Yh/XCPKwX5mG9HPmq6i/3t90lcgAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwJBNi54AcHTaumPXio6397JzV3Q8AAAADp0zmAAAAAAYIjABAAAAMERgAgAAAGCIwAQAAADAkFULTFX1iqq6o6rev2z7c6vq+qq6rqr+/ZLtl1TVDdNrT1qy/ayq2jO99uKqqtWaMwAAAADzW80zmF6Z5JylG6rq7CTnJfn67n5skl+etj8myflJHjsd89KqOmY67GVJLkpy+vTrPmMCAAAAsFirFpi6+x1JPrFs87OTXNbdn5v2uWPafl6Snd39ue6+MckNSR5XVScleXh3v7O7O8mrkjx1teYMAAAAwPxq1m1WafCqrUne1N1nTM/fm+SNmZ2F9L+TPL+7/7SqXpLkmu5+9bTfy5O8OcnezILUE6ft35XkBd395AN8vosyO9spW7ZsOWvnzp2HNe99+/Zl8+bNh3UsRx/r5fDsueWuFR3vzJOPW9HxVov1wjysF+ZhvTAP64V5WC/Mw3o58p199tnXdve25ds3rfE8NiU5Icnjk3xLkiur6iuT7O++Sn2Q7fvV3ZcnuTxJtm3b1tu3bz+sSe7evTuHeyxHH+vl8Fy4Y9eKjrf3GdtXdLzVYr0wD+uFeVgvzMN6YR7WC/OwXo5ea/0ucjcneUPPvDvJF5KcOG0/dcl+pyS5ddp+yn62AwAAALBOrHVg+v0k/zBJquprkjwoyceSXJXk/Kp6cFWdltnNvN/d3bcl+VRVPX5697gfyewSOwAAAADWiVW7RK6qXptke5ITq+rmJC9M8ookr6iq9yf5fJILppt3X1dVVyb5QJK7kzynu++Zhnp2Zu9Id2xm92V682rNGY4UW1f48rMk2XvZuSs+JgAAAEeGVQtM3f30A7z0zAPsf2mSS/ez/T1JzljBqQEAAACwgtb6EjkAAAAAjjACEwAAAABDBCYAAAAAhghMAAAAAAwRmAAAAAAYIjABAAAAMERgAgAAAGCIwAQAAADAEIEJAAAAgCECEwAAAABDBCYAAAAAhghMAAAAAAwRmAAAAAAYIjABAAAAMERgAgAAAGCIwAQAAADAEIEJAAAAgCECEwAAAABDBCYAAAAAhghMAAAAAAwRmAAAAAAYIjABAAAAMERgAgAAAGCIwAQAAADAEIEJAAAAgCECEwAAAABDBCYAAAAAhghMAAAAAAwRmAAAAAAYIjABAAAAMGTToicAbAxbd+xa9BQAAABYp5zBBAAAAMAQgQkAAACAIQITAAAAAEMEJgAAAACGCEwAAAAADPEucsARYTXe5W7vZeeu+JgAAABHImcwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgyKoFpqp6RVXdUVXv389rz6+qrqoTl2y7pKpuqKrrq+pJS7afVVV7ptdeXFW1WnMGAAAAYH6reQbTK5Ocs3xjVZ2a5HuSfHTJtsckOT/JY6djXlpVx0wvvyzJRUlOn37db0wAAAAAFmfVAlN3vyPJJ/bz0q8m+dkkvWTbeUl2dvfnuvvGJDckeVxVnZTk4d39zu7uJK9K8tTVmjMAAAAA86tZt1mlwau2JnlTd58xPX9Kkid09/Oqam+Sbd39sap6SZJruvvV034vT/LmJHuTXNbdT5y2f1eSF3T3kw/w+S7K7GynbNmy5aydO3ce1rz37duXzZs3H9axHH3W43rZc8tdi57CEeHMk49b0fH23HJXthyb3P7ZlRlvpefH+rMe/3xh/bJemIf1wjysF+ZhvRz5zj777Gu7e9vy7ZvWagJV9dAkP5fke/f38n629UG271d3X57k8iTZtm1bb9++ff6JJtm9e3cO91iOPutxvVy4Y9eip3BE2PuM7Ss63oU7duXiM+/Oi/aszB+9Kz0/1p/1+OcL65f1wjysF+ZhvTAP6+XotWaBKclXJTktyV9M9+k+JcmfVdXjktyc5NQl+56S5NZp+yn72Q4AAADAOrGaN/m+j+7e092P7O6t3b01s3j0zd39V0muSnJ+VT24qk7L7Gbe7+7u25J8qqoeP7173I8keeNazRkAAACAB7ZqgamqXpvknUkeXVU3V9WzDrRvd1+X5MokH0jyliTP6e57ppefneS3Mrvx9//K7N5MAAAAAKwTq3aJXHc//QFe37rs+aVJLt3Pfu9JcsaKTg4AAACAFbNml8gBAAAAcGQSmAAAAAAYIj
ABAAAAMERgAgAAAGCIwAQAAADAEIEJAAAAgCECEwAAAABDBCYAAAAAhghMAAAAAAwRmAAAAAAYIjABAAAAMERgAgAAAGDIpkVPAEi27ti16CkAAADAYXMGEwAAAABDBCYAAAAAhghMAAAAAAwRmAAAAAAYIjABAAAAMERgAgAAAGCIwAQAAADAEIEJAAAAgCECEwAAAABDNi16ArARbd2x6+8eX3zm3blwyXMAAAA42jiDCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgyKZFTwBgvdq6Y9eipwAAALAhOIMJAAAAgCECEwAAAABDBCYAAAAAhghMAAAAAAwRmAAAAAAYIjABAAAAMERgAgAAAGCIwAQAAADAEIEJAAAAgCECEwAAAABDBCYAAAAAhghMAAAAAAwRmAAAAAAYIjABAAAAMERgAgAAAGCIwAQAAADAEIEJAAAAgCECEwAAAABDBCYAAAAAhghMAAAAAAwRmAAAAAAYIjABAAAAMERgAgAAAGCIwAQAAADAkFULTFX1iqq6o6rev2Tbf6iqD1XV+6rq96rq+CWvXVJVN1TV9VX1pCXbz6qqPdNrL66qWq05AwAAADC/1TyD6ZVJzlm27eokZ3T31yf5n0kuSZKqekyS85M8djrmpVV1zHTMy5JclOT06dfyMQEAAABYoFULTN39jiSfWLbtrd199/T0miSnTI/PS7Kzuz/X3TcmuSHJ46rqpCQP7+53dncneVWSp67WnAEAAACYX826zSoNXrU1yZu6+4z9vPYHSV7X3a+uqpckuaa7Xz299vIkb06yN8ll3f3Eaft3JXlBdz/5AJ/voszOdsqWLVvO2rlz52HNe9++fdm8efNhHcvRYc8td/3d4y3HJrd/doGTYUNZyfVy5snHrcxArFt+HjEP64V5WC/Mw3phHtbLke/ss8++tru3Ld++aRGTqaqfS3J3ktfcu2k/u/VBtu9Xd1+e5PIk2bZtW2/fvv2w5rd79+4c7rEcHS7csevvHl985t150Z6F/KfEBrSS62XvM7avyDisX34eMQ/rhXlYL8zDemEe1svRa83/r7iqLkjy5CRP6L8/fermJKcu2e2UJLdO20/Zz3YAAAAA1onVvMn3/VTVOUlekOQp3f2ZJS9dleT8qnpwVZ2W2c28393dtyX5VFU9fnr3uB9J8sa1nDMAAAAAB7dqZzBV1WuTbE9yYlXdnOSFmb1r3IOTXD3rRbmmu/9pd19XVVcm+UBml849p7vvmYZ6dmbvSHdsZvdlevNqzRkAAACA+a1aYOrup+9n88sPsv+lSS7dz/b3JLnfTcIBAAAAWB/W9BI5AAAAAI48AhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAM2bToCcBa2Lpj16KnAAAAAEcsZzABAAAAMERgAgAAAGDIqgWmqnpFVd1RVe9fsu0RVXV1VX14+njCktcuqaobqur6qnrSku1nVdWe6bUXV1Wt1pwBAAAAmN9qnsH0yiTnLNu2I8nbuvv0JG+bnqeqHpPk/CSPnY55aVUdMx3zsiQXJTl9+rV8TAAAAAAWaNUCU3e/I8knlm0+L8kV0+Mrkjx1yfad3f257r4xyQ1JHldVJyV5eHe/s7s7yauWHAMAAADAOlCzbrNKg1dtTfKm7j5jen5ndx+/5PVPdvcJVfWSJNd096un7S9P8uYke5Nc1t1PnLZ/V5IXdPeTD/D5LsrsbKds2bLlrJ07dx7WvPft25fNmzcf1rGsT3tuuWvVxt5ybHL7Z1dteI4wK7lezjz5uJUZiHXLzyPmYb0wD+uFeVgvzMN6OfKdffbZ13b3tuXbNy1iMvuxv/sq9UG271d3X57k8iTZtm1bb9++/bAms3v37hzusaxPF+7YtWpjX3zm3XnRnvXynxLr3Uqul73P2L4i47B++XnEPKwX5mG9MA/rhXlYL0evtX4Xuduny94yfbxj2n5zklOX7HdKklun7afsZzsAAAAA68RaB6arklwwPb4gyRuXbD+/qh5cVadldjPvd3f3bUk+VVWPn9497keWHAMAAADAOrBq1/VU1WuTbE9yYlXdnOSFSS5LcmVVPSvJR5P8QJJ093VVdWWSDyS5O8lzuvueaahnZ/aOdMdmdl+mN6/WnAEAAACY36oFpu5++gFeesIB9r80yaX72f6eJGes4NQAjghbV+HeYnsvO3fFxwQAAI58a32JHAAAAABHGIEJAAAAgCECEwAAAABDBCYAAAAAhghMAAAAAAwRmAAAAAAYIjABAAAAMERgAgAAAGCIwAQAAADAEIEJAAAAgCECEwAAAABDBCYAAAAAhghMAAAAAAwRmAAAAAAYIjABAAAAMERgAgAAAGCIwAQAAADAEIEJAAAAgCECEwAAAABDBCYAAAAAhghMAAAAAAwRmAAAAAAYIjABAAAAMERgAgAAAGCIwAQAAADAkE2LngAA68fWHbtWdLy9l527o
uMBAADrkzOYAAAAABgiMAEAAAAwRGACAAAAYIjABAAAAMAQgQkAAACAIQITAAAAAEMEJgAAAACGCEwAAAAADBGYAAAAABgiMAEAAAAwRGACAAAAYIjABAAAAMAQgQkAAACAIQITAAAAAEMEJgAAAACGCEwAAAAADBGYAAAAABgiMAEAAAAwRGACAAAAYIjABAAAAMAQgQkAAACAIQITAAAAAEMEJgAAAACGCEwAAAAADBGYAAAAABgiMAEAAAAwRGACAAAAYIjABAAAAMAQgQkAAACAIYcUmKrqOw5lGwAAAABHn0M9g+k/HuI2AAAAAI4ymw72YlV9W5JvT/JlVfXPl7z08CTHrObEAAAAANgYDhqYkjwoyeZpvy9Zsv1vkjxttSYFAAAAwMZx0MDU3W9P8vaqemV3/+UazQkAAACADeSBzmC614Or6vIkW5ce093/cDUmBQAAAMDGcaiB6XeS/GaS30pyz+pNBwAAAICN5lAD093d/bJVnQkAAAAAG9IXHeJ+f1BVP1FVJ1XVI+79taozAwAAAGBDONQzmC6YPv7Mkm2d5CtXdjoAAAAAbDSHFJi6+7TVnggAAAAAG9MhBaaq+pH9be/uV63sdAAAAADYaA71ErlvWfL4IUmekOTPkghMAAAAAEe5Q71E7rlLn1fVcUn+86rMCAAAAIAN5VDfRW65zyQ5fSUnAgAAAMDGdKj3YPqDzN41LkmOSfJ1Sa5crUkBAAAAsHEc6j2YfnnJ47uT/GV337wK8wEAAABggzmkS+S6++1JPpTkS5KckOTzqzkpAAAAADaOQwpMVfWDSd6d5AeS/GCSd1XV01ZzYgAAAABsDId6idzPJfmW7r4jSarqy5L8lySvX62JAQAAALAxHOq7yH3RvXFp8vE5jgUAAADgCHaoZzC9par+KMlrp+c/lOQPV2dKAAAAAGwkBw1MVfXVSbZ0989U1f+Z5DuTVJJ3JnnNGswPAAAAgHXugS5z+7Ukn0qS7n5Dd//z7v7pzM5e+rXVnRoAAAAAG8EDBaat3f2+5Ru7+z1Jtq7KjAAAAADYUB4oMD3kIK8du5ITAQAAAGBjeqDA9KdV9ePLN1bVs5JcuzpTAgAAAGAjeaB3kfupJL9XVc/I3welbUkelOQfr+K8AAAAANggDhqYuvv2JN9eVWcnOWPavKu7/3jVZwYAAADAhvBAZzAlSbr7T5L8ySrPBQAAAIAN6JACEwAcjq07dq3oeHsvO3dFxwMAAFbGA93kGwAAAAAOaiGBqap+uqquq6r3V9Vrq+ohVfWIqrq6qj48fTxhyf6XVNUNVXV9VT1pEXMGAAAAYP/WPDBV1clJfjLJtu4+I8kxSc5PsiPJ27r79CRvm56nqh4zvf7YJOckeWlVHbPW8wYAAABg/xZ1idymJMdW1aYkD01ya5LzklwxvX5FkqdOj89LsrO7P9fdNya5Icnj1na6AAAAABxIdffaf9Kq5yW5NMlnk7y1u59RVXd29/FL9vlkd59QVS9Jck13v3ra/vIkb+7u1+9n3IuSXJQkW7ZsOWvnzp2HNb99+/Zl8+bNh3Us69OeW+5atbG3HJvc/tlVG54jjPUy5syTj1v0FNaUn0fMw3phHtYL87BemIf1cuQ7++yzr+3ubcu3r/m7yE33VjovyWlJ7kzyO1X1zIMdsp9t+61i3X15ksuTZNu2bb19+/bDmuPu3btzuMeyPl24wu9ktdTFZ96dF+3xhowcGutlzN5nbF/0FNaUn0fMw3phHtYL87BemIf1cvRaxCVyT0xyY3f/dXf/bZI3JPn2JLdX1UlJMn28Y9r/5iSnLjn+lMwuqQMAAABgHVhEYPpoksdX1UOrqpI8IckHk1yV5IJpnwuSvHF6fFWS86vqwVV1WpLTk7x7jecMAAAAwAGs+XUa3f2uqnp9kj9LcneSP8/ssrbNSa6sqmdlFqF+YNr/uqq6MskHpv2f0933rPW8AQAAANi/hdwIpLtfmOSFyzZ/LrOzmfa3/6WZ3RQcAAAAgHVmEZfIAQAAAHAEEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBk06InAACHauuOXSs63t7Lzl3R8QAA4GjlDCYAAAAAhghMAAAAAAwRmAAAAAAYIjABAAAAMERgAgAAAGCIwAQAAADAEIEJAAAAgCECEwAAAABDBCYAAAAAhghMAAAAAAwRmAAAAAAYIjABAAAAMERgAgAAAGCIwAQAAADAEIEJAAAAgCECEwAAAABDBCYAAAAAhghMAAAAAAzZtOgJwHJbd+xa9BQAAACAOTiDCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwBCBCQAAAIAhAhMAAAAAQwQmAAAAAIYITAAAAAAMEZgAAAAAGCIwAQAAADBEYAIAAABgiMAEAAAAwJCFBKaqOr6qXl9VH6qqD1bVt1XVI6rq6qr68PTxhCX7X1JVN1TV9VX1pEXMGQAAAID9W9QZTL+e5C3d/bVJviHJB5PsSPK27j49ydum56mqxyQ5P8ljk5yT5KVVdcxCZg0AAADA/ax5YKqqhyf57iQvT5Lu/nx335nkvCRXTLtdkeSp0+Pzkuzs7s91941JbkjyuLWcMwAAAAAHVt29tp+w6huTXJ7kA5mdvXRtkucluaW7j1+y3ye7+4SqekmSa7r71dP2lyd5c3e/fj9jX5TkoiTZsmXLWTt37jysOe7bty+bN28+rGMZt+eWuxY9hblsOTa5/bOLngUbhfWyvpx58nGLnsJB+XnEPKwX5mG9MA/rhXlYL0e+s88++9ru3rZ8+6YFzGVTkm9O8tzufldV/Xqmy+EOoPazbb9VrLsvzyxeZdu2bb19+/bDmuDu3btzuMcy7sIduxY9hblcfObdedGeRfynxEZkvawve5+xfdFTOCg/j5iH9cI8rBfmYb0wD+vl6LWIezDdnOTm7n7X9Pz1mQWn26vqpCSZPt6xZP9Tlxx/SpJb12iuAAAAADyANQ9M3f1XSW6qqkdPm56Q2eVyVyW5YNp2QZI3To+vSnJ+VT24qk5LcnqSd6/hlAEAAAA4iEVdp/HcJK+pqgcl+UiSH80s
[... base64 image/png data omitted: matplotlib histogram of median income (x: Median Income, y: Count, title: Histogram for the median income) ...]\n", 439 | "text/plain": [ 440 | "
" 441 | ] 442 | }, 443 | "metadata": { 444 | "needs_background": "light" 445 | }, 446 | "output_type": "display_data" 447 | } 448 | ], 449 | "source": [ 450 | "%matplotlib inline\n", 451 | "import matplotlib as mpl\n", 452 | "import matplotlib.pyplot as plt\n", 453 | "\n", 454 | "housing['median_income'].hist(bins=50, figsize=(20,15))\n", 455 | "plt.xlabel('Median Income')\n", 456 | "plt.ylabel('Count')\n", 457 | "plt.title('Histogram for the median income')" 458 | ] 459 | }, 460 | { 461 | "cell_type": "markdown", 462 | "metadata": {}, 463 | "source": [ 464 | "We can see that most of the median income values are clustered around 2 to 5 (i.e.,$20,000–$50,000), but some median incomes go far beyond 6 (i.e., $60,000). It is important to have a sufficient number of instances in your dataset for each stratum, or else the estimate of the stratum's importance may be biased. This means that you should not have too many strata, and each stratum should be large enough." 465 | ] 466 | }, 467 | { 468 | "cell_type": "markdown", 469 | "metadata": {}, 470 | "source": [ 471 | "The following code creates an income category attribute by dividing the median income by 1.5 (to limit the number of income categories) and rounding up using ceil (to have discrete categories), and then keeping only the categories lower than 5 and merging the other categories into category 5:" 472 | ] 473 | }, 474 | { 475 | "cell_type": "code", 476 | "execution_count": 23, 477 | "metadata": {}, 478 | "outputs": [ 479 | { 480 | "data": { 481 | "text/plain": [ 482 | "Text(0.5, 1.0, 'Histogram for the median income')" 483 | ] 484 | }, 485 | "execution_count": 23, 486 | "metadata": {}, 487 | "output_type": "execute_result" 488 | }, 489 | { 490 | "data": { 491 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEWCAYAAACXGLsWAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAjY0lEQVR4nO3df5xcdX3v8debhB+RJT8Q3IYkGq7GH4EomjWNRbybkkqqaOittOGiJEpNi9Rqi5XQX2o119xeUQQETQs3QZA1pWJSNNQYWbhtwZhYZA2YS2oChKSJkh9khcYmfPrH+W49mczumZ3dM7PJvp+PxzzmnO853+/3M9+Znc+eH3OOIgIzM7O+HNfsAMzMbOhzsjAzs0JOFmZmVsjJwszMCjlZmJlZIScLMzMr5GRhSNooqb3ZcZRN0qck/VTSv5XUfrukbWW0PdgkhaRXpOkvSvrzEvo4T9KmwW7XmsPJ4hgnaauk2RVlCyT9Y898RJwVEZ0F7UxOXzAjSwq1VJImAVcBUyPilwapzf/6wj2aRcTvRcQnS2j3/0XEqwa7XWsOJwsbEhqQhF4GPBMRu/pb8WhNkGaDycnCDtv6kDRD0npJz0raKemzabUH0vNeSd2S3iTpOEl/JukJSbsk3SZpTK7dy9KyZyT9eUU/H5d0l6TbJT0LLEh9Pyhpr6Qdkm6UdEKuvZD0AUmPS9ov6ZOSXp7qPCtpRX79XL3ZwBrgjBT7slT+zrQLbq+kTkmvqRiTqyU9AvysMmFI6hmPH6Q2fzu37Ko0HjskvTdXfqKkz0h6Mo3tFyWN6uU9WSDpnyR9LsX3Y0m/ksqfSu3Pr7VtSX+c4tku6X0VfS2T9Kk0PU7SPZJ+ImlPmp6YW7czjfs/pffgW5JO6+U1HLZbLo3pRyQ9ImmfpK9KOim3fK6kh9N7+a+S5qTyMyStkrRb0mZJ78/V+bikv02fo/2SuiS9UtI1aYyekvTW3PpjJN2SxuJpZbsmR1SL3ypEhB/H8APYCsyuKFsA/GO1dYAHgfek6RZgZpqeDAQwMlfvfcBm4L+ldb8GfDktmwp0A28GTgA+A/xHrp+Pp/mLyP5pGQVMB2YCI1N/jwEfzvUXwCpgNHAWcABYm/ofAzwKzO9lHNqBbbn5VwI/A34NOB74aHotJ+TG5GFgEjCqlzYDeEVFHweBv0xtvg14DhiXll+X4j8VOAX4e+DTvbS9ILX1XmAE8CngSeALwInAW4H9QEtR28AcYCdwNnAy8JV87MAy4FNp+sXAbwIvSu38LfD1XFydwL+m8RuV5pfUOOZbgXXAGSnOx4DfS8tmAPvS+3EcMAF4dVp2P3ATcBJwDvAT4Pzc5+jfgQvIPje3AVuAP03vwfuBLbkYvg58KY3DS1I8v9vsv9Oj4dH0APwo+Q3O/kC7gb25x3P0niweAD4BnFbRzmSOTBZrgQ/k5l9FlgBGAn8B3Jlb9iLg5xyeLB4oiP3DwN25+QDOzc1vAK7OzV8LXNdLW5VfXH8OrMjNHwc8DbTnxuR9BfFVSxbPV4zRLrIEKLLk9PLcsjflv8gq2l4APJ6bn5b6a82VPZO+PPtsG7iV3Bc62Rd91WRRJY5zgD25+U7gz3LzHwDurXHMtwLvzs3/FfDFNP0l4HNV2pgEHAJOyZV9GliW+xytyS17B9nnfUSaPyW91rFAK9k/GKNy618C3NfIv8mj9eF9scPDRRHx7Z4ZSQuA3+ll3cvJ/jP+kaQtwCci4p5e1j0DeCI3/wRZomhNy57qWRARz0l6pqL+U/kZSa8EPgu0kSWXkWQJI
[... base64 image/png data omitted: matplotlib histogram of the income_cat attribute (x: Median Income, y: Count, title: Histogram for the median income) ...]\n", 492 | "text/plain": [ 493 | "
" 494 | ] 495 | }, 496 | "metadata": { 497 | "needs_background": "light" 498 | }, 499 | "output_type": "display_data" 500 | } 501 | ], 502 | "source": [ 503 | "housing[\"income_cat\"] = np.ceil(housing[\"median_income\"] / 1.5)\n", 504 | "housing[\"income_cat\"].where(housing[\"income_cat\"] < 5, 5.0, inplace=True)\n", 505 | "housing[\"income_cat\"].hist()\n", 506 | "plt.xlabel('Median Income')\n", 507 | "plt.ylabel('Count')\n", 508 | "plt.title('Histogram for the median income')" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "metadata": {}, 514 | "source": [ 515 | "Now you are ready to do stratified sampling based on the income category. For this, we will use Scikit-Learn's **StratifiedShuffleSplit** class:" 516 | ] 517 | }, 518 | { 519 | "cell_type": "code", 520 | "execution_count": 24, 521 | "metadata": {}, 522 | "outputs": [], 523 | "source": [ 524 | "from sklearn.model_selection import StratifiedShuffleSplit\n", 525 | "split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)\n", 526 | "\n", 527 | "for train_index, test_index in split.split(housing, housing[\"income_cat\"]):\n", 528 | " strat_train_set = housing.loc[train_index]\n", 529 | " strat_test_set = housing.loc[test_index]" 530 | ] 531 | }, 532 | { 533 | "cell_type": "markdown", 534 | "metadata": {}, 535 | "source": [ 536 | "Let's see if this worked as expected. You can start by looking at the income category proportions in the stratified test set and compare it with the proportions in the overall dataset:" 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": 25, 542 | "metadata": { 543 | "scrolled": true 544 | }, 545 | "outputs": [ 546 | { 547 | "data": { 548 | "text/plain": [ 549 | "3.0 0.350533\n", 550 | "2.0 0.318798\n", 551 | "4.0 0.176357\n", 552 | "5.0 0.114583\n", 553 | "1.0 0.039729\n", 554 | "Name: income_cat, dtype: float64" 555 | ] 556 | }, 557 | "execution_count": 25, 558 | "metadata": {}, 559 | "output_type": "execute_result" 560 | } 561 | ], 562 | "source": [ 563 | "strat_test_set[\"income_cat\"].value_counts() / len(strat_test_set)" 564 | ] 565 | }, 566 | { 567 | "cell_type": "markdown", 568 | "metadata": {}, 569 | "source": [ 570 | "You can also measure the income category proportions in the full dataset as the following:" 571 | ] 572 | }, 573 | { 574 | "cell_type": "code", 575 | "execution_count": 27, 576 | "metadata": {}, 577 | "outputs": [ 578 | { 579 | "data": { 580 | "text/plain": [ 581 | "3.0 0.350581\n", 582 | "2.0 0.318847\n", 583 | "4.0 0.176308\n", 584 | "5.0 0.114438\n", 585 | "1.0 0.039826\n", 586 | "Name: income_cat, dtype: float64" 587 | ] 588 | }, 589 | "execution_count": 27, 590 | "metadata": {}, 591 | "output_type": "execute_result" 592 | } 593 | ], 594 | "source": [ 595 | "housing[\"income_cat\"].value_counts() / len(housing)" 596 | ] 597 | } 598 | ], 599 | "metadata": { 600 | "kernelspec": { 601 | "display_name": "Python 3", 602 | "language": "python", 603 | "name": "python3" 604 | }, 605 | "language_info": { 606 | "codemirror_mode": { 607 | "name": "ipython", 608 | "version": 3 609 | }, 610 | "file_extension": ".py", 611 | "mimetype": "text/x-python", 612 | "name": "python", 613 | "nbconvert_exporter": "python", 614 | "pygments_lexer": "ipython3", 615 | "version": "3.7.9" 616 | } 617 | }, 618 | "nbformat": 4, 619 | "nbformat_minor": 4 620 | } 621 | -------------------------------------------------------------------------------- /Readme.md: -------------------------------------------------------------------------------- 1 | # Practical 
Machine Learning 2 | Practical machine learning notebook & articles cover the machine learning life cycle. 3 | 4 | [![GitHub license](https://img.shields.io/github/license/youssefHosni/Practical-Machine-Learning.svg)](https://github.com/youssefHosni/Practical-Machine-Learning/blob/master/LICENSE) 5 | [![GitHub contributors](https://img.shields.io/github/contributors/youssefHosni/Practical-Machine-Learning.svg)](https://GitHub.com/youssefHosni/Practical-Machine-Learning/graphs/contributors/) 6 | [![GitHub issues](https://img.shields.io/github/issues/youssefHosni/Practical-Machine-Learning.svg)](https://GitHub.com/youssefHosni/Practical-Machine-Learning/issues/) 7 | [![GitHub pull-requests](https://img.shields.io/github/issues-pr/youssefHosni/Practical-Machine-Learning.svg)](https://GitHub.com/youssefHosni/Practical-Machine-Learning/pulls/) 8 | [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com) 9 | 10 | [![GitHub watchers](https://img.shields.io/github/watchers/youssefHosni/Practical-Machine-Learning.svg?style=social&label=Watch)](https://GitHub.com/youssefHosni/Practical-Machine-Learning/watchers/) 11 | [![GitHub forks](https://img.shields.io/github/forks/youssefHosni/Practical-Machine-Learning.svg?style=social&label=Fork)](https://GitHub.com/youssefHosni/Practical-Machine-Learning/network/) 12 | [![GitHub stars](https://img.shields.io/github/stars/youssefHosni/Practical-Machine-Learning.svg?style=social&label=Star)](https://GitHub.com/youssefHosni/Practical-Machine-Learning/stargazers/) 13 | 14 | 15 | ![alt_text](https://github.com/youssefHosni/Practical-Machine-Learning/blob/main/ml.jpg) 16 | 17 | ## Overview of Machine Learning Project Life Cycle ## 18 | * [End-to-End Machine Learning Workflow [Part 1]](https://medium.com/mlearning-ai/end-to-end-machine-learning-workflow-part-1-b5aa2e3d30e2?sk=2c0fa63e0cd3e09bc9329c1f20c63f1f) 19 | * [End-to-End Machine Learning Workflow [Part 2]](https://medium.com/mlearning-ai/end-to-end-machine-learning-workflow-part-2-e7b6d3fb1d53?sk=06cde2cb868ac46a1dd1e71064b76b05) 20 | 21 | ## Setting Performance Baseline & Success Metrics 22 | * [How Data Science is Helping Businesses Stay Ahead of the Game: 9 Inspiring Use Cases](https://medium.com/geekculture/how-data-science-is-helping-businesses-stay-ahead-of-the-game-9-inspiring-use-cases-9eba7b14c262?sk=f918caf0f682f797b9f6b2f734cf7d73) 23 | * [How to Set Performance Baseline for Your Machine Learning Project Effectively?](https://pub.towardsai.net/how-to-set-performance-baseline-for-your-machine-learning-project-effectively-5be7cffdd68d?sk=51fe014a4f7c7831b2163e4b972477f1) 24 | 25 | ## Data Collection 26 | 27 | ## Data Preprocessing & Feature Engineering 28 | * [How To Split Data Effectively for Your Data Science Project](https://pub.towardsai.net/how-to-split-the-data-effectively-for-your-data-science-project-a9cb6a387b70?sk=7036bbef95e24baeaa2f1a98afa33491) [[Code](https://github.com/youssefHosni/Machine-Learning-Practical-Guide/blob/main/How%20To%20Split%20The%20Data%20Effectively%20for%20Your%20Data%20Science%20Project.ipynb) | [Article](https://pub.towardsai.net/how-to-split-the-data-effectively-for-your-data-science-project-a9cb6a387b70?sk=7036bbef95e24baeaa2f1a98afa33491) | [Kaggle Notebook](https://www.kaggle.com/code/youssef19/how-to-split-the-data-effectively)] 29 | * [Six Reasons Why Your Model Gives Bad 
Results](https://medium.com/mlearning-ai/six-reasons-why-your-model-give-bad-results-db2804f0da0e?sk=144ae1fe14011ae3a7eb5e8bc0d1f599) 30 | 31 | ## Modeling 32 | * [Brief Guide for Machine Learning Model Selection](https://medium.com/mlearning-ai/brief-guide-for-machine-learning-model-selection-a19a82f8bdcd?sk=f3fe7b646cfbc1b8818e6cd4a61814e5) 33 | 34 | ### Supervised Machine Learning Modeling 35 | 36 | * [Practical Guide to Support Vector Machine in Python](https://pub.towardsai.net/practical-guide-to-support-vector-machines-in-python-dc0e628d50bc?sk=3736c436ed9ec33011b453d852f53746) [[Code](https://github.com/youssefHosni/Machine-Learning-Practical-Guide/blob/main/Practical%20Guide%20to%20Support%20Vector%20Machines%20in%20Python%20.ipynb) | [Article](https://pub.towardsai.net/practical-guide-to-support-vector-machines-in-python-dc0e628d50bc?sk=3736c436ed9ec33011b453d852f53746)] 37 | * [Practical Guide to Boosting Algorithms In Machine Learning](https://pub.towardsai.net/practical-guide-to-boosting-algorithms-in-machine-learning-61c023107e12?sk=4924d002b480475afec71c900ab3b469) [[Code]() | [Article](https://pub.towardsai.net/practical-guide-to-boosting-algorithms-in-machine-learning-61c023107e12?sk=4924d002b480475afec71c900ab3b469)] 38 | 39 | ### Unsupervised Machine Learning Modeling 40 | * [Overview of Unsupervised Machine Learning Tasks & Applications](https://pub.towardsai.net/overview-of-unsupervised-machine-learning-tasks-applications-139db2239e2c?sk=26aa82893548ddc3c2916d4ee3c91d65) 41 | * [Practical Guide to Dimesnioality Reduction in Python]() [[Code](https://github.com/youssefHosni/Practical-Guide-to-ML-DL-Concepts/blob/main/practical-guide-to-dimesnioality-reduction-in-pyth.ipynb) | [Article](https://medium.com/mlearning-ai/practical-guide-to-dimesnioality-reduction-in-python-9da6c84ad8ee?sk=ba37d536c5b52d79d7df19064639d4a4)] 42 | * [How to Find the Optimal Number of Clusters Effectively]() [ [Code](https://github.com/youssefHosni/Machine-Learning-Practical-Guide/blob/main/How%20to%20Find%20the%20Optimal%20Number%20of%20Clusters%20Effectively.ipynb) | [Article](https://pub.towardsai.net/stop-using-elbow-diagram-to-find-best-k-value-and-use-this-instead-568b13d77561?sk=d9456c70a04d6d5b020da45dcad5024f) | [Kaggle Notebook](https://www.kaggle.com/code/youssef19/finding-the-optimal-number-of-clusters-effectively) ] 43 | 44 | 45 | ### Deep Learning Modeling 46 | * [Maximizing the Impact of Data Augmentation: Effective Techniques and Best Practices](https://pub.towardsai.net/maximizing-the-impact-of-data-augmentation-effective-techniques-and-best-practices-c4cad9cd16e4?sk=c91290c8d4d69ad8df051818262ad015) 47 | * [Building Complex Models Using Keras Functional API](https://pub.towardsai.net/building-complex-deep-learning-models-using-keras-functional-api-38090f4769a4?sk=85e11759a720c074c7bab9cc1b5d1d06) [[Code](https://github.com/youssefHosni/Machine-Learning-Practical-Guide/blob/main/Building_Complex_Deep_Learning_Models_Using_Keras_Functional_API.ipynb) | [Article](https://pub.towardsai.net/building-complex-deep-learning-models-using-keras-functional-api-38090f4769a4?sk=85e11759a720c074c7bab9cc1b5d1d06) | [Kaggle Notebook](https://www.kaggle.com/code/youssef19/building-complex-network-with-keras-functional-api)] 48 | * [A Quick Setup for Neural Networks Hyperparameters for Best Results](https://pub.towardsai.net/a-quick-setup-for-neural-networks-hyperparameters-for-best-results-3a5a446abb3a?sk=9c9f6bf03b6895dcd0112a34158a2785) 49 | * [Building A Recurrent Neural Network From 
Scratch In Python](https://pub.towardsai.net/building-a-recurrent-neural-network-from-scratch-in-python-3ad244b1054f?sk=3fcfd18bbb18fd280826c64b547f130e) 50 | 51 | ## Model Evaluation 52 | [Why Should You Not Completely Trust In Test Accuracy?](https://pub.towardsai.net/why-should-you-not-completely-trust-in-test-accuracy-b4a80398c599?sk=6e369f1328757e79052f8b389cb2adb5) 53 | 54 | ## Machine Learning Explainability 55 | * [Machine Learning Models Are No Longer A Black Box](https://medium.com/mlearning-ai/4-methods-to-unbox-the-machine-learning-models-black-box-8358a8bce3a6?sk=7c3f175a08a3f521b1cc77e9e9e429a3) 56 | 57 | ## MLOps & Model Deployment 58 | * Step-by-Step Guide on Deploying Yolo3 Model on Fast API [Article](https://pub.towardsai.net/step-by-step-guide-on-deploying-yolo-model-on-fast-api-fcc6b60f5c26?sk=3ec77d08f4ff915cadcda7f0f474fc0b) | [Code]() 59 | * [Common Machine Learning Deployment Patterns & Their Applications](https://pub.towardsai.net/common-machine-learning-deployment-patterns-their-applications-84ae9afc5b37?sk=5364822167bd9012ab360498572caf9a) 60 | * [Key Challenges of Machine Learning Model Deployment](https://pub.towardsai.net/key-challenges-of-machine-learning-model-deployment-c48768d0e7c8?sk=5823a710321aa7122af5454c4eb4073a) 61 | * [From Detection to Correction: How to Keep Your Production Data Clean and Reliable](https://pub.towardsai.net/from-detection-to-correction-how-to-keep-your-production-data-clean-and-reliable-6dddb72c3ab5?sk=4aee6335d3a5478b08af0b4f49d4fc99) 62 | * [A Comprehensive Introduction to Machine Learning Experiment Tracking](https://pub.towardsai.net/a-comprehensive-introduction-to-machine-learning-experiment-tracking-3ef2cfb2c783?sk=bd961a1c0984266d195bdcb49e356545) 63 | 64 | -------------------------------------------------------------------------------- /ml.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/youssefHosni/Practical-Machine-Learning/2d82175f1f3eee7c653555f10955331b2436be63/ml.jpg --------------------------------------------------------------------------------