├── Classification Week 1 - Logistic Regression
│   └── Yun-week-1-logistic-regression-assignment-1.ipynb
├── Classification Week 2 - Logistic Regression
│   ├── Assignment 1
│   │   └── Yun-week-2-logistic-regression-assignment-1.ipynb
│   └── Assignment 2
│       └── Yun-week-2-logistic-regression-assignment-2.ipynb
├── Classification Week 3 - Decision Tree
│   ├── Assignment 1
│   │   └── Yun-module-5-decision-tree-assignment-1.ipynb
│   └── Assignment 2
│       └── Yun-module-5-decision-tree-assignment-2.ipynb
├── Classification Week 4 - Decision Trees in Practice - overfitting
│   └── Yun-module-6-decision-tree-practical-assignment-1.ipynb
├── Classification Week 6 - Precision and Recall
│   ├── Quize.ipynb
│   ├── Yun-module-9-Precision and Recall-assignment-1.ipynb
│   ├── Yun-module-9-Precision and Recall-assignment-2.ipynb
│   ├── module-9-assignment-test-idx.json
│   ├── module-9-assignment-train-idx.json
│   └── module-9-precision-recall-assignment-blank.ipynb
├── Classification Week 7 - Large Dataset
│   ├── Yun-module-10-online-learning-assignment.ipynb
│   ├── _1ccb9ec834e6f4b9afb46f4f5ab56402_module-10-assignment-train-idx.json.zip
│   ├── _1ccb9ec834e6f4b9afb46f4f5ab56402_module-10-assignment-validation-idx.json.zip
│   ├── _35bdebdff61378878ea2247780005e52_important_words.json.zip
│   ├── _35bdebdff61378878ea2247780005e52_module-10-online-learning-assignment-blank.ipynb.zip
│   ├── _559847710f6045b9f5668d6635969ff4_amazon_baby_subset.csv.zip
│   ├── amazon_baby_subset.csv
│   ├── important_words.json
│   ├── module-10-assignment-train-idx.json
│   ├── module-10-assignment-validation-idx.json
│   └── module-10-online-learning-assignment-blank.ipynb
├── Classsification Week 5 - Boosting
│   ├── Assignment 1
│   │   └── Yun-module-8-boosting-assignment-1.ipynb
│   └── Assignment 2
│       └── Yun-module-8-boosting-assignment-2.ipynb
├── Clustering Week 2 - KNN Neighbors
│   ├── Assignment 1
│   │   └── Yun_nearest-neighbors-features-and-metrics_assignment-1.ipynb
│   ├── Assignment 2
│   │   └── Yun_1_nearest-neighbors-lsh-implementation_Assignment2.ipynb
│   └── Quiz 1.ipynb
├── Clustering Week 3- Kmeans
│   └── Yun-Clustering-Week3_kmeans-with-text-data_Assignment-1.ipynb
├── Clustering Week 4- EM for Gaussian mixtures
│   ├── Assignment 1
│   │   └── Yun-Clustering_Week 4_EM-for-GMM_Assignment-1.ipynb
│   └── Assignment 2
│       └── Yun-Week4-em-with-text-data_Assignment-2.ipynb
├── Clustering Week 5 - Mixed Membership Modeling via Latent Dirichlet Allocation
│   └── Yun_5_lda_Assignment.ipynb
├── Clustering Week 6 - Hierarchical Clustering & Closing Remarks
│   └── Yun_6_hierarchical_clustering.ipynb
├── Machine Learning Case Study Week 2 - Regression
│   └── Yun_Predicting house prices.ipynb
├── Machine Learning Case Study Week 3 - Classification
│   └── Yun_Analyzing product sentiment.ipynb
├── Machine Learning Case Study Week 4 - Clustering and Retrieving
│   └── Yun_Document retrieval.ipynb
├── Machine Learning Case Study Week 5 - Recommender System
│   ├── Quiz.ipynb
│   └── Yun_Song recommender.ipynb
├── Machine Learning Case Study Week 6 - Deep Learning, Searching for Images
│   └── Yun_Deep Features for Image Retrieval.ipynb
├── README.md
├── Regression Week 1 - Simple Linear
│   └── Yun-Regression_Week1_SimpleRegression-Assignment.ipynb
├── Regression Week 2 - Multiple Regression
│   └── Yun-ScikitLearn-week-2-multiple-regression-assignment-2.ipynb
├── Regression Week 3 - Accessing Performance
│   └── Yun-Regression-Week3-Polynomial-Assignment.ipynb
├── Regression Week 4 - Ridge
│   ├── Yun-week-4-ridge-regression-assignment-1.ipynb
│   └── Yun-week-4-ridge-regression-assignment-2.ipynb
├── Regression Week 5 - Lasso
│   ├── W5 - Assigment 1
│   │   └── Yun-week-5-ridge-regression-assignment-1.ipynb
│   └── W5 - Assignment 2
│       └── Yun-week-5-lasso-regression-assignment-2.ipynb
└── Regression Week 6 - KNN and Kernal Regression
    └── Yun-week-6-knn-kernal-regression-assignment-1.ipynb
/Classification Week 6 - Precision and Recall/Quize.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import math "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2.6390573296152584\n"
     ]
    }
   ],
   "source": [
    "# test\n",
    "print(math.log(14)) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2.1972245773362196\n"
     ]
    }
   ],
   "source": [
    "print(math.log(9)) "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}

--------------------------------------------------------------------------------
/Classification Week 6 - Precision and Recall/Yun-module-9-Precision and Recall-assignment-1.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 2
}

--------------------------------------------------------------------------------
/Classification Week 6 - Precision and Recall/module-9-precision-recall-assignment-blank.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exploring precision and recall\n",
    "\n",
    "The goal of this second notebook is to understand precision-recall in the context of classifiers.\n",
    "\n",
    " * Use Amazon review data in its entirety.\n",
" * Train a logistic regression model.\n", 13 | " * Explore various evaluation metrics: accuracy, confusion matrix, precision, recall.\n", 14 | " * Explore how various metrics can be combined to produce a cost of making an error.\n", 15 | " * Explore precision and recall curves.\n", 16 | " \n", 17 | "Because we are using the full Amazon review dataset (not a subset of words or reviews), in this assignment we return to using GraphLab Create for its efficiency. As usual, let's start by **firing up GraphLab Create**.\n", 18 | "\n", 19 | "Make sure you have the latest version of GraphLab Create (1.8.3 or later). If you don't find the decision tree module, then you would need to upgrade graphlab-create using\n", 20 | "\n", 21 | "```\n", 22 | " pip install graphlab-create --upgrade\n", 23 | "```\n", 24 | "See [this page](https://dato.com/download/) for detailed instructions on upgrading." 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": null, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "import graphlab\n", 34 | "from __future__ import division\n", 35 | "import numpy as np\n", 36 | "graphlab.canvas.set_target('ipynb')" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "# Load amazon review dataset" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": null, 49 | "metadata": {}, 50 | "outputs": [], 51 | "source": [ 52 | "products = graphlab.SFrame('amazon_baby.gl/')" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "# Extract word counts and sentiments" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "As in the first assignment of this course, we compute the word counts for individual words and extract positive and negative sentiments from ratings. To summarize, we perform the following:\n", 67 | "\n", 68 | "1. Remove punctuation.\n", 69 | "2. Remove reviews with \"neutral\" sentiment (rating 3).\n", 70 | "3. Set reviews with rating 4 or more to be positive and those with 2 or less to be negative." 
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def remove_punctuation(text):\n",
    "    import string\n",
    "    return text.translate(None, string.punctuation) \n",
    "\n",
    "# Remove punctuation.\n",
    "review_clean = products['review'].apply(remove_punctuation)\n",
    "\n",
    "# Count words\n",
    "products['word_count'] = graphlab.text_analytics.count_words(review_clean)\n",
    "\n",
    "# Drop neutral sentiment reviews.\n",
    "products = products[products['rating'] != 3]\n",
    "\n",
    "# Positive sentiment to +1 and negative sentiment to -1\n",
    "products['sentiment'] = products['rating'].apply(lambda rating : +1 if rating > 3 else -1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's remember what the dataset looks like by taking a quick peek:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "products"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Split data into training and test sets\n",
    "\n",
    "We split the data into an 80-20 split where 80% is in the training set and 20% is in the test set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "train_data, test_data = products.random_split(.8, seed=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train a logistic regression classifier\n",
    "\n",
    "We will now train a logistic regression classifier with **sentiment** as the target and **word_count** as the features. We will set `validation_set=None` to make sure everyone gets exactly the same results. \n",
    "\n",
    "Remember, even though we now know how to implement logistic regression, we will use GraphLab Create for its efficiency at processing this Amazon dataset in its entirety. The focus of this assignment is instead on the topic of precision and recall."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "model = graphlab.logistic_classifier.create(train_data, target='sentiment',\n",
    "                                            features=['word_count'],\n",
    "                                            validation_set=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Model Evaluation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will explore the advanced model evaluation concepts that were discussed in the lectures.\n",
    "\n",
    "## Accuracy\n",
    "\n",
    "One performance metric we will use for our more advanced exploration is accuracy, which we have seen many times in past assignments. Recall that the accuracy is given by\n",
    "\n",
    "$$\n",
    "\\mbox{accuracy} = \\frac{\\mbox{# correctly classified data points}}{\\mbox{# total data points}}\n",
    "$$\n",
    "\n",
    "To obtain the accuracy of our trained models using GraphLab Create, simply pass the option `metric='accuracy'` to the `evaluate` function. We compute the **accuracy** of our logistic regression model on the **test_data** as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "accuracy = model.evaluate(test_data, metric='accuracy')['accuracy']\n",
    "print \"Test Accuracy: %s\" % accuracy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Baseline: Majority class prediction\n",
    "\n",
    "Recall from an earlier assignment that we used the **majority class classifier** as a baseline (i.e., reference) model for a point of comparison with a more sophisticated classifier. The majority classifier model predicts the majority class for all data points. \n",
    "\n",
    "Typically, a good model should beat the majority class classifier. Since the majority class in this dataset is the positive class (i.e., there are more positive than negative reviews), the accuracy of the majority class classifier can be computed as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "baseline = len(test_data[test_data['sentiment'] == 1])/len(test_data)\n",
    "print \"Baseline accuracy (majority class classifier): %s\" % baseline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Quiz Question:** Using accuracy as the evaluation metric, was our **logistic regression model** better than the baseline (majority class classifier)?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Confusion Matrix\n",
    "\n",
    "The accuracy, while convenient, does not tell the whole story. For a fuller picture, we turn to the **confusion matrix**. In the case of binary classification, the confusion matrix is a 2-by-2 matrix laying out correct and incorrect predictions made in each label as follows:\n",
    "```\n",
    "              +---------------------------------------------+\n",
    "              |                Predicted label              |\n",
    "              +----------------------+----------------------+\n",
    "              |        (+1)          |        (-1)          |\n",
    "+-------+-----+----------------------+----------------------+\n",
    "| True  |(+1) | # of true positives  | # of false negatives |\n",
    "| label +-----+----------------------+----------------------+\n",
    "|       |(-1) | # of false positives | # of true negatives  |\n",
    "+-------+-----+----------------------+----------------------+\n",
    "```\n",
    "To print out the confusion matrix for a classifier, use `metric='confusion_matrix'`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "confusion_matrix = model.evaluate(test_data, metric='confusion_matrix')['confusion_matrix']\n",
    "confusion_matrix"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Quiz Question**: How many predicted values in the **test set** are **false positives**?"
   ]
  },
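  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The confusion matrix comes back as an SFrame with one row per (true label, predicted label) pair. Below is a minimal sketch of one way to pull the mistake counts out of it, assuming it uses GraphLab Create's usual `target_label`, `predicted_label`, and `count` columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: read the mistake counts off the confusion matrix computed above.\n",
    "# False positives are truly negative (-1) reviews predicted positive (+1);\n",
    "# false negatives are truly positive (+1) reviews predicted negative (-1).\n",
    "false_positives = confusion_matrix[(confusion_matrix['target_label'] == -1) &\n",
    "                                   (confusion_matrix['predicted_label'] == +1)]['count'][0]\n",
    "false_negatives = confusion_matrix[(confusion_matrix['target_label'] == +1) &\n",
    "                                   (confusion_matrix['predicted_label'] == -1)]['count'][0]\n",
    "print 'False positives: %s' % false_positives\n",
    "print 'False negatives: %s' % false_negatives"
   ]
  },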
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Computing the cost of mistakes\n",
    "\n",
    "\n",
    "Put yourself in the shoes of a manufacturer that sells a baby product on Amazon.com and wants to monitor its product's reviews in order to respond to complaints. Even a few negative reviews may generate a lot of bad publicity about the product. So you don't want to miss any reviews with negative sentiments --- you'd rather put up with false alarms about potentially negative reviews instead of missing negative reviews entirely. In other words, **false positives cost more than false negatives**. (It may be the other way around for other scenarios, but let's stick with the manufacturer's scenario for now.)\n",
    "\n",
    "Suppose you know the costs involved in each kind of mistake: \n",
    "1. \$100 for each false positive.\n",
    "2. \$1 for each false negative.\n",
    "3. Correctly classified reviews incur no cost.\n",
    "\n",
    "**Quiz Question**: Given the stipulation, what is the cost associated with the logistic regression classifier's performance on the **test set**?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
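  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Under that stipulation, the total cost is just a weighted sum of the two mistake counts. A minimal sketch, reusing the `false_positives` and `false_negatives` counts extracted from the confusion matrix above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# $100 per false positive, $1 per false negative; correct predictions are free.\n",
    "cost = 100 * false_positives + 1 * false_negatives\n",
    "print 'Total cost of mistakes: $%s' % cost"
   ]
  },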
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Precision and Recall"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You may not have exact dollar amounts for each kind of mistake. Instead, you may simply prefer to reduce the percentage of false positives to be less than, say, 3.5% of all positive predictions. This is where **precision** comes in:\n",
    "\n",
    "$$\n",
    "[\\text{precision}] = \\frac{[\\text{# positive data points with positive predictions}]}{[\\text{# all data points with positive predictions}]} = \\frac{[\\text{# true positives}]}{[\\text{# true positives}] + [\\text{# false positives}]}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So to keep the percentage of false positives below 3.5% of positive predictions, we must raise the precision to 96.5% or higher. \n",
    "\n",
    "**First**, let us compute the precision of the logistic regression classifier on the **test_data**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "precision = model.evaluate(test_data, metric='precision')['precision']\n",
    "print \"Precision on test data: %s\" % precision"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Quiz Question**: Out of all reviews in the **test set** that are predicted to be positive, what fraction of them are **false positives**? (Round to the second decimal place e.g. 0.25)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Quiz Question:** Based on what we learned in lecture, if we wanted to reduce this fraction of false positives to be below 3.5%, we would: (see the quiz)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A complementary metric is **recall**, which measures the ratio between the number of true positives and that of (ground-truth) positive reviews:\n",
    "\n",
    "$$\n",
    "[\\text{recall}] = \\frac{[\\text{# positive data points with positive predictions}]}{[\\text{# all positive data points}]} = \\frac{[\\text{# true positives}]}{[\\text{# true positives}] + [\\text{# false negatives}]}\n",
    "$$\n",
    "\n",
    "Let us compute the recall on the **test_data**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "recall = model.evaluate(test_data, metric='recall')['recall']\n",
    "print \"Recall on test data: %s\" % recall"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Quiz Question**: What fraction of the positive reviews in the **test_set** were correctly predicted as positive by the classifier?\n",
    "\n",
    "**Quiz Question**: What is the recall value for a classifier that predicts **+1** for all data points in the **test_data**?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Precision-recall tradeoff\n",
    "\n",
    "In this part, we will explore the trade-off between precision and recall discussed in the lecture. We first examine what happens when we use a different threshold value for making class predictions. We then explore a range of threshold values and plot the associated precision-recall curve. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Varying the threshold\n",
    "\n",
    "False positives are costly in our example, so we may want to be more conservative about making positive predictions. To achieve this, instead of thresholding class probabilities at 0.5, we can choose a higher threshold. \n",
    "\n",
    "Write a function called `apply_threshold` that accepts two arguments:\n",
    "* `probabilities` (an SArray of probability values)\n",
    "* `threshold` (a float between 0 and 1).\n",
    "\n",
    "The function should return an SArray, where each element is set to +1 or -1 depending on whether the corresponding probability exceeds `threshold`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def apply_threshold(probabilities, threshold):\n",
    "    ### YOUR CODE GOES HERE\n",
    "    # +1 if >= threshold and -1 otherwise.\n",
    "    ... "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run prediction with `output_type='probability'` to get the list of probability values. Then use thresholds set at 0.5 (default) and 0.9 to make predictions from these probability values."
   ]
  },
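  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible completion of `apply_threshold`, assuming `probabilities` is an SArray (so `SArray.apply` returns a new SArray):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def apply_threshold(probabilities, threshold):\n",
    "    # +1 if the probability is at least `threshold`, and -1 otherwise.\n",
    "    return probabilities.apply(lambda p: +1 if p >= threshold else -1)"
   ]
  },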
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "probabilities = model.predict(test_data, output_type='probability')\n",
    "predictions_with_default_threshold = apply_threshold(probabilities, 0.5)\n",
    "predictions_with_high_threshold = apply_threshold(probabilities, 0.9)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print \"Number of positive predicted reviews (threshold = 0.5): %s\" % (predictions_with_default_threshold == 1).sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print \"Number of positive predicted reviews (threshold = 0.9): %s\" % (predictions_with_high_threshold == 1).sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Quiz Question**: What happens to the number of positive predicted reviews as the threshold is increased from 0.5 to 0.9?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exploring the associated precision and recall as the threshold varies"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By changing the probability threshold, it is possible to influence precision and recall. We can explore this as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Threshold = 0.5\n",
    "precision_with_default_threshold = graphlab.evaluation.precision(test_data['sentiment'],\n",
    "                                                                 predictions_with_default_threshold)\n",
    "\n",
    "recall_with_default_threshold = graphlab.evaluation.recall(test_data['sentiment'],\n",
    "                                                           predictions_with_default_threshold)\n",
    "\n",
    "# Threshold = 0.9\n",
    "precision_with_high_threshold = graphlab.evaluation.precision(test_data['sentiment'],\n",
    "                                                              predictions_with_high_threshold)\n",
    "recall_with_high_threshold = graphlab.evaluation.recall(test_data['sentiment'],\n",
    "                                                        predictions_with_high_threshold)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print \"Precision (threshold = 0.5): %s\" % precision_with_default_threshold\n",
    "print \"Recall (threshold = 0.5)   : %s\" % recall_with_default_threshold"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print \"Precision (threshold = 0.9): %s\" % precision_with_high_threshold\n",
    "print \"Recall (threshold = 0.9)   : %s\" % recall_with_high_threshold"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Quiz Question (variant 1)**: Does the **precision** increase with a higher threshold?\n",
    "\n",
    "**Quiz Question (variant 2)**: Does the **recall** increase with a higher threshold?"
   ]
  },
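  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before answering, it can help to work through a tiny example by hand. The sketch below uses five made-up probabilities and labels (not the Amazon data) to show the typical pattern: raising the threshold makes positive predictions more conservative, so precision tends to rise while recall falls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Toy example: five made-up reviews with predicted P(positive) and true labels.\n",
    "toy_probs = [0.95, 0.80, 0.65, 0.55, 0.30]\n",
    "toy_labels = [+1, +1, -1, +1, -1]\n",
    "\n",
    "for threshold in [0.5, 0.9]:\n",
    "    preds = [+1 if p >= threshold else -1 for p in toy_probs]\n",
    "    tp = sum(1 for p, y in zip(preds, toy_labels) if p == +1 and y == +1)\n",
    "    fp = sum(1 for p, y in zip(preds, toy_labels) if p == +1 and y == -1)\n",
    "    fn = sum(1 for p, y in zip(preds, toy_labels) if p == -1 and y == +1)\n",
    "    print 'threshold=%.1f  precision=%.2f  recall=%.2f' % (threshold, float(tp) / (tp + fp), float(tp) / (tp + fn))"
   ]
  },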
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Precision-recall curve\n",
    "\n",
    "Now, we will explore a range of threshold values, compute the precision and recall scores, and then plot the precision-recall curve."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "threshold_values = np.linspace(0.5, 1, num=100)\n",
    "print threshold_values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For each of the values of threshold, we compute the precision and recall scores."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "precision_all = []\n",
    "recall_all = []\n",
    "\n",
    "probabilities = model.predict(test_data, output_type='probability')\n",
    "for threshold in threshold_values:\n",
    "    predictions = apply_threshold(probabilities, threshold)\n",
    "    \n",
    "    precision = graphlab.evaluation.precision(test_data['sentiment'], predictions)\n",
    "    recall = graphlab.evaluation.recall(test_data['sentiment'], predictions)\n",
    "    \n",
    "    precision_all.append(precision)\n",
    "    recall_all.append(recall)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's plot the precision-recall curve to visualize the precision-recall tradeoff as we vary the threshold."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "def plot_pr_curve(precision, recall, title):\n",
    "    plt.rcParams['figure.figsize'] = 7, 5\n",
    "    plt.locator_params(axis = 'x', nbins = 5)\n",
    "    plt.plot(precision, recall, 'b-', linewidth=4.0, color = '#B0017F')\n",
    "    plt.title(title)\n",
    "    plt.xlabel('Precision')\n",
    "    plt.ylabel('Recall')\n",
    "    plt.rcParams.update({'font.size': 16})\n",
    "    \n",
    "plot_pr_curve(precision_all, recall_all, 'Precision recall curve (all)')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Quiz Question**: Among all the threshold values tried, what is the **smallest** threshold value that achieves a precision of 96.5% or better? Round your answer to 3 decimal places."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Quiz Question**: Using `threshold` = 0.98, how many **false negatives** do we get on the **test_data**? (**Hint**: You may use the `graphlab.evaluation.confusion_matrix` function implemented in GraphLab Create.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is the number of false negatives (i.e., the number of reviews to look at when not needed) that we have to deal with using this classifier."
   ]
  },
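  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of one way to answer the two questions above mechanically, reusing `threshold_values`, `precision_all`, `probabilities`, and `apply_threshold` from the cells above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Smallest threshold (among those tried) whose precision is at least 96.5%.\n",
    "candidates = [t for t, p in zip(threshold_values, precision_all) if p >= 0.965]\n",
    "if candidates:\n",
    "    print 'Smallest threshold with precision >= 0.965: %.3f' % min(candidates)\n",
    "\n",
    "# False negatives at threshold = 0.98, read off the confusion matrix.\n",
    "predictions_98 = apply_threshold(probabilities, 0.98)\n",
    "print graphlab.evaluation.confusion_matrix(test_data['sentiment'], predictions_98)"
   ]
  },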
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Evaluating specific search terms"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So far, we have looked at the number of false positives for the **entire test set**. In this section, let's select reviews using a specific search term and optimize the precision on these reviews only. After all, a manufacturer would be interested in tuning the false positive rate just for their products (the reviews they want to read) rather than that of the entire set of products on Amazon.\n",
    "\n",
    "## Precision-Recall on all baby-related items\n",
    "\n",
    "From the **test set**, select all the reviews for all products with the word 'baby' in them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "baby_reviews = test_data[test_data['name'].apply(lambda x: 'baby' in x.lower())]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's predict the probability of classifying these reviews as positive:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "probabilities = model.predict(baby_reviews, output_type='probability')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's plot the precision-recall curve for the **baby_reviews** dataset.\n",
    "\n",
    "**First**, let's consider the following `threshold_values` ranging from 0.5 to 1:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "threshold_values = np.linspace(0.5, 1, num=100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Second**, as we did above, let's compute precision and recall for each value in `threshold_values` on the **baby_reviews** dataset. Complete the code block below (one possible completion is sketched after the quiz question below)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "precision_all = []\n",
    "recall_all = []\n",
    "\n",
    "for threshold in threshold_values:\n",
    "    \n",
    "    # Make predictions. Use the `apply_threshold` function \n",
    "    ## YOUR CODE HERE \n",
    "    predictions = ...\n",
    "\n",
    "    # Calculate the precision.\n",
    "    # YOUR CODE HERE\n",
    "    precision = ...\n",
    "    \n",
    "    # YOUR CODE HERE\n",
    "    recall = ...\n",
    "    \n",
    "    # Append the precision and recall scores.\n",
    "    precision_all.append(precision)\n",
    "    recall_all.append(recall)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Quiz Question**: Among all the threshold values tried, what is the **smallest** threshold value that achieves a precision of 96.5% or better for the reviews in **baby_reviews**? Round your answer to 3 decimal places."
   ]
  },
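  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, one possible completion of the loop above, mirroring the earlier precision-recall computation on **test_data** (here the targets come from `baby_reviews['sentiment']`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Possible completion of the blanks in the loop above.\n",
    "precision_all = []\n",
    "recall_all = []\n",
    "\n",
    "for threshold in threshold_values:\n",
    "    predictions = apply_threshold(probabilities, threshold)\n",
    "    precision_all.append(graphlab.evaluation.precision(baby_reviews['sentiment'], predictions))\n",
    "    recall_all.append(graphlab.evaluation.recall(baby_reviews['sentiment'], predictions))"
   ]
  },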
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Quiz Question:** Is this threshold value smaller or larger than the threshold used for the entire dataset to achieve the same specified precision of 96.5%?\n",
    "\n",
    "**Finally**, let's plot the precision-recall curve."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_pr_curve(precision_all, recall_all, \"Precision-Recall (Baby)\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}

--------------------------------------------------------------------------------
/Classification Week 7 - Large Dataset/_1ccb9ec834e6f4b9afb46f4f5ab56402_module-10-assignment-train-idx.json.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Classification Week 7 - Large Dataset/_1ccb9ec834e6f4b9afb46f4f5ab56402_module-10-assignment-train-idx.json.zip
--------------------------------------------------------------------------------
/Classification Week 7 - Large Dataset/_1ccb9ec834e6f4b9afb46f4f5ab56402_module-10-assignment-validation-idx.json.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Classification Week 7 - Large Dataset/_1ccb9ec834e6f4b9afb46f4f5ab56402_module-10-assignment-validation-idx.json.zip
--------------------------------------------------------------------------------
/Classification Week 7 - Large Dataset/_35bdebdff61378878ea2247780005e52_important_words.json.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Classification Week 7 - Large Dataset/_35bdebdff61378878ea2247780005e52_important_words.json.zip
--------------------------------------------------------------------------------
/Classification Week 7 - Large Dataset/_35bdebdff61378878ea2247780005e52_module-10-online-learning-assignment-blank.ipynb.zip:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Classification Week 7 - Large Dataset/_35bdebdff61378878ea2247780005e52_module-10-online-learning-assignment-blank.ipynb.zip -------------------------------------------------------------------------------- /Classification Week 7 - Large Dataset/_559847710f6045b9f5668d6635969ff4_amazon_baby_subset.csv.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Classification Week 7 - Large Dataset/_559847710f6045b9f5668d6635969ff4_amazon_baby_subset.csv.zip -------------------------------------------------------------------------------- /Classification Week 7 - Large Dataset/important_words.json: -------------------------------------------------------------------------------- 1 | ["baby", "one", "great", "love", "use", "would", "like", "easy", "little", "seat", "old", "well", "get", "also", "really", "son", "time", "bought", "product", "good", "daughter", "much", "loves", "stroller", "put", "months", "car", "still", "back", "used", "recommend", "first", "even", "perfect", "nice", "bag", "two", "using", "got", "fit", "around", "diaper", "enough", "month", "price", "go", "could", "soft", "since", "buy", "room", "works", "made", "child", "keep", "size", "small", "need", "year", "big", "make", "take", "easily", "think", "crib", "clean", "way", "quality", "thing", "better", "without", "set", "new", "every", "cute", "best", "bottles", "work", "purchased", "right", "lot", "side", "happy", "comfortable", "toy", "able", "kids", "bit", "night", "long", "fits", "see", "us", "another", "play", "day", "money", "monitor", "tried", "thought", "never", "item", "hard", "plastic", "however", "disappointed", "reviews", "something", "going", "pump", "bottle", "cup", "waste", "return", "amazon", "different", "top", "want", "problem", "know", "water", "try", "received", "sure", "times", "chair", "find", "hold", "gate", "open", "bottom", "away", "actually", "cheap", "worked", "getting", "ordered", "came", "milk", "bad", "part", "worth", "found", "cover", "many", "design", "looking", "weeks", "say", "wanted", "look", "place", "purchase", "looks", "second", "piece", "box", "pretty", "trying", "difficult", "together", "though", "give", "started", "anything", "last", "company", "come", "returned", "maybe", "took", "broke", "makes", "stay", "instead", "idea", "head", "said", "less", "went", "working", "high", "unit", "seems", "picture", "completely", "wish", "buying", "babies", "won", "tub", "almost", "either"] -------------------------------------------------------------------------------- /Classification Week 7 - Large Dataset/module-10-assignment-validation-idx.json: -------------------------------------------------------------------------------- 1 | [9, 14, 18, 32, 38, 53, 64, 102, 105, 117, 123, 134, 142, 170, 176, 184, 205, 217, 221, 223, 229, 233, 239, 263, 264, 284, 285, 326, 328, 350, 352, 356, 390, 403, 410, 411, 425, 430, 431, 451, 456, 466, 469, 492, 505, 509, 519, 528, 542, 543, 546, 551, 555, 562, 574, 577, 585, 596, 600, 626, 635, 637, 647, 657, 661, 671, 714, 715, 716, 722, 733, 737, 740, 742, 781, 790, 793, 802, 848, 851, 853, 861, 864, 865, 866, 869, 871, 893, 899, 907, 909, 923, 924, 930, 951, 965, 988, 995, 997, 999, 1010, 1043, 1055, 
1056, 1061, 1074, 1078, 1116, 1118, 1122, 1124, 1129, 1136, 1143, 1156, 1158, 1177, 1185, 1195, 1200, 1203, 1217, 1244, 1252, 1254, 1262, 1263, 1287, 1290, 1297, 1311, 1325, 1335, 1336, 1354, 1358, 1359, 1367, 1428, 1432, 1444, 1452, 1482, 1492, 1499, 1530, 1538, 1562, 1580, 1589, 1594, 1611, 1631, 1636, 1644, 1648, 1651, 1667, 1684, 1700, 1703, 1706, 1724, 1735, 1741, 1748, 1755, 1767, 1770, 1771, 1773, 1807, 1809, 1818, 1824, 1830, 1857, 1870, 1881, 1884, 1911, 1912, 1918, 1920, 1936, 1946, 1951, 1959, 1963, 1966, 1981, 2003, 2012, 2024, 2047, 2056, 2085, 2088, 2093, 2105, 2109, 2111, 2119, 2127, 2129, 2136, 2145, 2146, 2155, 2159, 2187, 2243, 2247, 2248, 2252, 2257, 2268, 2283, 2284, 2288, 2294, 2318, 2321, 2337, 2341, 2345, 2367, 2377, 2389, 2396, 2401, 2417, 2419, 2436, 2437, 2439, 2451, 2459, 2464, 2471, 2494, 2496, 2499, 2500, 2519, 2540, 2549, 2559, 2563, 2571, 2574, 2585, 2609, 2617, 2636, 2640, 2653, 2662, 2669, 2675, 2677, 2701, 2710, 2714, 2715, 2718, 2722, 2724, 2726, 2728, 2730, 2741, 2745, 2747, 2766, 2767, 2772, 2776, 2781, 2811, 2815, 2817, 2824, 2827, 2830, 2831, 2836, 2837, 2839, 2840, 2841, 2866, 2874, 2893, 2894, 2900, 2926, 2935, 2940, 2947, 2948, 2965, 2966, 3005, 3008, 3021, 3045, 3059, 3060, 3069, 3088, 3089, 3095, 3102, 3112, 3114, 3115, 3117, 3143, 3147, 3159, 3196, 3200, 3221, 3247, 3254, 3255, 3263, 3308, 3313, 3321, 3324, 3337, 3345, 3347, 3351, 3364, 3366, 3367, 3376, 3387, 3393, 3398, 3413, 3414, 3416, 3423, 3426, 3429, 3443, 3457, 3470, 3471, 3494, 3549, 3583, 3587, 3609, 3612, 3656, 3664, 3682, 3694, 3698, 3707, 3741, 3748, 3784, 3789, 3807, 3810, 3815, 3830, 3832, 3836, 3837, 3850, 3872, 3885, 3907, 3909, 3911, 3915, 3930, 3934, 3940, 3944, 3945, 3947, 3953, 3955, 3958, 3959, 3966, 3967, 3973, 3975, 3979, 3981, 3982, 3984, 3999, 4003, 4020, 4027, 4043, 4064, 4065, 4066, 4068, 4075, 4082, 4091, 4093, 4118, 4121, 4122, 4137, 4152, 4161, 4179, 4200, 4210, 4213, 4231, 4233, 4254, 4258, 4275, 4278, 4294, 4298, 4316, 4341, 4342, 4346, 4354, 4359, 4361, 4363, 4376, 4382, 4384, 4387, 4389, 4403, 4417, 4420, 4425, 4449, 4450, 4472, 4503, 4507, 4508, 4509, 4525, 4526, 4529, 4535, 4543, 4544, 4554, 4589, 4592, 4594, 4595, 4602, 4611, 4614, 4626, 4641, 4650, 4685, 4703, 4718, 4727, 4729, 4740, 4742, 4747, 4760, 4763, 4764, 4770, 4792, 4796, 4797, 4801, 4807, 4813, 4853, 4865, 4867, 4869, 4875, 4878, 4882, 4900, 4917, 4918, 4924, 4925, 4944, 4947, 4955, 4964, 4965, 4971, 4994, 4997, 5019, 5021, 5025, 5057, 5061, 5067, 5086, 5091, 5098, 5100, 5114, 5124, 5166, 5167, 5171, 5173, 5178, 5184, 5186, 5194, 5202, 5209, 5213, 5232, 5238, 5260, 5261, 5266, 5268, 5271, 5280, 5282, 5290, 5293, 5296, 5310, 5319, 5335, 5341, 5345, 5359, 5361, 5372, 5382, 5388, 5393, 5395, 5441, 5454, 5469, 5481, 5482, 5486, 5490, 5501, 5506, 5527, 5534, 5536, 5561, 5597, 5614, 5616, 5626, 5632, 5635, 5640, 5649, 5654, 5684, 5686, 5688, 5697, 5711, 5719, 5722, 5725, 5729, 5741, 5756, 5784, 5786, 5787, 5792, 5805, 5821, 5825, 5829, 5858, 5866, 5875, 5880, 5881, 5883, 5905, 5925, 5927, 5943, 5982, 5987, 5996, 5997, 6003, 6009, 6013, 6015, 6023, 6029, 6040, 6042, 6061, 6064, 6074, 6075, 6078, 6079, 6082, 6085, 6102, 6108, 6111, 6119, 6139, 6160, 6161, 6166, 6184, 6193, 6226, 6234, 6243, 6251, 6255, 6263, 6265, 6279, 6282, 6285, 6290, 6297, 6305, 6309, 6319, 6358, 6369, 6381, 6389, 6391, 6392, 6394, 6429, 6454, 6473, 6482, 6499, 6502, 6504, 6540, 6541, 6584, 6587, 6602, 6605, 6606, 6627, 6633, 6634, 6635, 6685, 6688, 6689, 6698, 6705, 6722, 6730, 6737, 6738, 6744, 6745, 6777, 6780, 6784, 6803, 6814, 
6828, 6832, 6851, 6855, 6861, 6868, 6875, 6903, 6904, 6905, 6907, 6930, 6933, 6943, 6944, 6949, 6959, 6975, 6976, 6979, 6985, 7002, 7005, 7006, 7010, 7020, 7027, 7036, 7044, 7049, 7059, 7073, 7082, 7091, 7108, 7109, 7120, 7126, 7148, 7164, 7204, 7217, 7219, 7222, 7226, 7241, 7244, 7260, 7281, 7284, 7287, 7300, 7305, 7314, 7317, 7323, 7328, 7330, 7374, 7379, 7396, 7397, 7399, 7401, 7407, 7412, 7417, 7426, 7433, 7440, 7441, 7452, 7457, 7465, 7487, 7518, 7543, 7544, 7545, 7546, 7556, 7567, 7577, 7585, 7592, 7593, 7613, 7617, 7620, 7623, 7644, 7666, 7668, 7671, 7735, 7737, 7742, 7758, 7760, 7762, 7763, 7771, 7774, 7776, 7795, 7797, 7800, 7810, 7822, 7849, 7871, 7937, 7948, 7958, 7971, 7972, 7981, 8004, 8032, 8045, 8063, 8067, 8077, 8093, 8096, 8111, 8130, 8137, 8156, 8162, 8176, 8191, 8199, 8202, 8207, 8234, 8248, 8273, 8277, 8291, 8306, 8308, 8323, 8333, 8338, 8346, 8348, 8349, 8355, 8357, 8364, 8365, 8370, 8376, 8390, 8404, 8423, 8442, 8449, 8451, 8452, 8461, 8482, 8491, 8499, 8503, 8527, 8540, 8553, 8559, 8563, 8566, 8580, 8585, 8622, 8666, 8670, 8684, 8749, 8753, 8800, 8805, 8807, 8812, 8826, 8832, 8845, 8852, 8872, 8873, 8880, 8881, 8895, 8898, 8909, 8917, 8944, 8947, 8952, 8960, 8965, 8986, 8991, 8999, 9002, 9003, 9017, 9020, 9037, 9049, 9079, 9083, 9092, 9093, 9144, 9147, 9150, 9151, 9164, 9188, 9191, 9208, 9233, 9234, 9246, 9274, 9291, 9296, 9302, 9332, 9339, 9356, 9360, 9385, 9392, 9411, 9418, 9421, 9422, 9434, 9437, 9453, 9469, 9479, 9489, 9491, 9499, 9501, 9510, 9522, 9523, 9527, 9531, 9536, 9541, 9545, 9547, 9552, 9565, 9574, 9576, 9581, 9583, 9595, 9596, 9599, 9601, 9602, 9621, 9633, 9655, 9689, 9693, 9735, 9745, 9751, 9768, 9786, 9788, 9795, 9796, 9798, 9808, 9862, 9873, 9876, 9888, 9898, 9901, 9908, 9918, 9920, 9922, 9933, 9942, 9943, 9946, 9949, 9951, 9963, 9969, 9974, 9981, 10000, 10007, 10008, 10009, 10012, 10032, 10036, 10045, 10064, 10071, 10073, 10080, 10091, 10093, 10095, 10099, 10108, 10109, 10119, 10123, 10126, 10167, 10182, 10186, 10188, 10192, 10211, 10229, 10235, 10248, 10254, 10267, 10268, 10274, 10278, 10281, 10282, 10314, 10322, 10323, 10336, 10339, 10341, 10356, 10364, 10370, 10376, 10381, 10384, 10386, 10394, 10420, 10427, 10444, 10452, 10453, 10454, 10457, 10460, 10462, 10465, 10490, 10507, 10512, 10517, 10519, 10522, 10524, 10527, 10563, 10571, 10572, 10580, 10610, 10631, 10641, 10642, 10658, 10665, 10669, 10670, 10672, 10692, 10709, 10719, 10733, 10762, 10765, 10770, 10771, 10806, 10809, 10813, 10824, 10828, 10852, 10867, 10868, 10878, 10880, 10883, 10909, 10910, 10925, 10930, 10951, 10959, 10960, 10962, 10982, 11006, 11031, 11045, 11067, 11070, 11083, 11085, 11089, 11092, 11117, 11134, 11139, 11147, 11159, 11174, 11178, 11180, 11191, 11203, 11206, 11221, 11222, 11224, 11231, 11252, 11273, 11277, 11294, 11299, 11305, 11308, 11309, 11313, 11331, 11333, 11338, 11341, 11345, 11349, 11352, 11374, 11381, 11409, 11419, 11422, 11425, 11437, 11443, 11449, 11451, 11463, 11481, 11486, 11490, 11506, 11536, 11543, 11544, 11545, 11558, 11563, 11577, 11581, 11587, 11588, 11589, 11593, 11612, 11621, 11635, 11656, 11657, 11678, 11690, 11699, 11706, 11713, 11717, 11730, 11733, 11744, 11748, 11758, 11769, 11772, 11795, 11803, 11827, 11840, 11846, 11853, 11854, 11874, 11878, 11888, 11891, 11903, 11904, 11912, 11925, 11929, 11937, 11954, 11967, 11986, 11996, 11997, 11998, 12003, 12005, 12010, 12015, 12023, 12029, 12032, 12038, 12040, 12041, 12049, 12055, 12095, 12107, 12122, 12125, 12130, 12150, 12192, 12198, 12211, 12213, 12223, 12225, 12230, 12231, 12232, 12245, 12263, 12293, 
12333, 12346, 12368, 12386, 12396, 12398, 12401, 12416, 12418, 12420, 12429, 12450, 12457, 12464, 12473, 12489, 12508, 12511, 12529, 12539, 12552, 12559, 12574, 12602, 12604, 12607, 12626, 12680, 12682, 12692, 12700, 12726, 12749, 12763, 12796, 12808, 12816, 12818, 12825, 12849, 12865, 12869, 12873, 12875, 12892, 12899, 12901, 12902, 12904, 12906, 12923, 12927, 12929, 12933, 12936, 12963, 12965, 12967, 12968, 12975, 12978, 12991, 12999, 13026, 13028, 13042, 13043, 13054, 13055, 13085, 13108, 13110, 13134, 13142, 13150, 13180, 13190, 13204, 13221, 13222, 13235, 13247, 13252, 13265, 13266, 13329, 13332, 13340, 13350, 13353, 13356, 13362, 13366, 13373, 13379, 13400, 13403, 13419, 13433, 13441, 13449, 13472, 13480, 13483, 13492, 13499, 13511, 13520, 13524, 13528, 13532, 13550, 13561, 13572, 13591, 13596, 13607, 13612, 13614, 13616, 13626, 13636, 13638, 13642, 13646, 13652, 13654, 13669, 13672, 13684, 13705, 13706, 13712, 13714, 13717, 13733, 13747, 13758, 13775, 13788, 13796, 13803, 13811, 13827, 13835, 13843, 13852, 13853, 13862, 13863, 13869, 13874, 13887, 13904, 13919, 13921, 13923, 13926, 13930, 13931, 13948, 13951, 13952, 13969, 13978, 13983, 13985, 13986, 13988, 13992, 13999, 14025, 14029, 14030, 14057, 14071, 14072, 14082, 14084, 14099, 14109, 14119, 14141, 14145, 14154, 14157, 14174, 14185, 14191, 14202, 14206, 14208, 14219, 14265, 14268, 14271, 14289, 14292, 14293, 14298, 14303, 14305, 14318, 14320, 14339, 14340, 14358, 14359, 14365, 14393, 14404, 14425, 14426, 14431, 14437, 14459, 14466, 14479, 14484, 14487, 14502, 14506, 14519, 14522, 14535, 14560, 14567, 14580, 14584, 14586, 14605, 14630, 14637, 14653, 14665, 14671, 14675, 14678, 14700, 14711, 14731, 14733, 14735, 14744, 14782, 14809, 14817, 14827, 14866, 14867, 14880, 14902, 14918, 14924, 14933, 14954, 14955, 14963, 14966, 14975, 14983, 14987, 14990, 14995, 15010, 15017, 15044, 15064, 15076, 15077, 15080, 15101, 15107, 15117, 15123, 15130, 15136, 15141, 15143, 15152, 15162, 15172, 15185, 15212, 15216, 15225, 15259, 15272, 15285, 15301, 15309, 15311, 15316, 15322, 15324, 15348, 15357, 15359, 15362, 15403, 15404, 15421, 15422, 15424, 15433, 15434, 15438, 15439, 15442, 15458, 15463, 15466, 15473, 15489, 15491, 15499, 15529, 15534, 15541, 15543, 15545, 15550, 15556, 15563, 15576, 15581, 15588, 15597, 15615, 15655, 15666, 15673, 15675, 15687, 15698, 15700, 15711, 15714, 15718, 15724, 15725, 15753, 15783, 15792, 15833, 15838, 15840, 15842, 15847, 15860, 15897, 15900, 15903, 15913, 15916, 15929, 15931, 15945, 15959, 15984, 15991, 15996, 16025, 16040, 16058, 16067, 16068, 16079, 16085, 16107, 16111, 16129, 16134, 16141, 16148, 16159, 16161, 16164, 16166, 16168, 16201, 16211, 16212, 16246, 16267, 16270, 16275, 16279, 16287, 16295, 16297, 16306, 16311, 16312, 16323, 16333, 16339, 16354, 16358, 16367, 16370, 16379, 16384, 16390, 16407, 16414, 16416, 16438, 16448, 16467, 16486, 16491, 16507, 16511, 16533, 16536, 16541, 16544, 16553, 16558, 16574, 16575, 16610, 16622, 16634, 16669, 16674, 16687, 16697, 16700, 16704, 16715, 16719, 16725, 16735, 16738, 16739, 16752, 16753, 16755, 16765, 16787, 16800, 16802, 16805, 16810, 16811, 16812, 16815, 16819, 16821, 16825, 16853, 16873, 16878, 16930, 16950, 16961, 16963, 16982, 16996, 17015, 17018, 17024, 17027, 17032, 17049, 17054, 17060, 17062, 17075, 17077, 17101, 17103, 17110, 17111, 17116, 17128, 17144, 17146, 17148, 17161, 17185, 17186, 17187, 17199, 17204, 17216, 17221, 17222, 17225, 17232, 17250, 17252, 17257, 17282, 17285, 17297, 17306, 17307, 17310, 17334, 17336, 17364, 17395, 17411, 17413, 
17415, 17450, 17479, 17489, 17490, 17520, 17526, 17527, 17528, 17537, 17542, 17547, 17554, 17560, 17564, 17569, 17579, 17594, 17596, 17610, 17611, 17616, 17618, 17622, 17626, 17647, 17688, 17696, 17703, 17714, 17720, 17735, 17744, 17756, 17759, 17762, 17771, 17774, 17787, 17788, 17792, 17793, 17798, 17801, 17808, 17827, 17829, 17835, 17845, 17855, 17871, 17879, 17886, 17893, 17895, 17897, 17898, 17904, 17906, 17918, 17931, 17940, 17941, 17950, 17959, 17967, 17971, 17973, 17978, 17987, 17993, 17994, 18007, 18019, 18023, 18036, 18048, 18062, 18078, 18083, 18128, 18133, 18146, 18147, 18154, 18159, 18167, 18189, 18194, 18201, 18205, 18209, 18213, 18214, 18217, 18224, 18234, 18239, 18249, 18254, 18283, 18289, 18290, 18312, 18323, 18330, 18332, 18337, 18375, 18382, 18386, 18387, 18392, 18397, 18410, 18412, 18432, 18433, 18437, 18445, 18459, 18465, 18468, 18471, 18481, 18522, 18526, 18546, 18552, 18554, 18561, 18564, 18568, 18569, 18572, 18577, 18586, 18589, 18598, 18611, 18612, 18627, 18628, 18654, 18659, 18664, 18674, 18680, 18681, 18700, 18721, 18723, 18730, 18765, 18780, 18785, 18788, 18793, 18795, 18843, 18857, 18858, 18862, 18867, 18880, 18881, 18895, 18901, 18931, 18934, 18938, 18958, 18987, 18988, 19019, 19025, 19034, 19035, 19038, 19041, 19050, 19054, 19064, 19075, 19083, 19087, 19099, 19109, 19116, 19130, 19138, 19140, 19146, 19152, 19153, 19159, 19168, 19173, 19182, 19184, 19195, 19202, 19206, 19216, 19221, 19236, 19257, 19260, 19269, 19273, 19276, 19281, 19284, 19292, 19337, 19339, 19343, 19347, 19353, 19358, 19362, 19390, 19396, 19398, 19422, 19430, 19447, 19449, 19451, 19453, 19487, 19489, 19498, 19502, 19506, 19517, 19522, 19524, 19525, 19536, 19547, 19548, 19556, 19596, 19602, 19604, 19608, 19618, 19622, 19629, 19641, 19647, 19658, 19672, 19681, 19690, 19706, 19709, 19710, 19712, 19719, 19720, 19738, 19757, 19764, 19772, 19776, 19780, 19783, 19806, 19816, 19818, 19832, 19848, 19858, 19859, 19867, 19880, 19889, 19907, 19909, 19933, 19935, 19936, 19944, 19948, 19967, 19973, 19982, 19993, 20006, 20018, 20033, 20090, 20095, 20107, 20117, 20119, 20144, 20145, 20153, 20154, 20157, 20163, 20176, 20183, 20189, 20215, 20218, 20224, 20272, 20274, 20281, 20290, 20295, 20302, 20306, 20308, 20325, 20334, 20346, 20348, 20353, 20369, 20370, 20379, 20380, 20383, 20385, 20387, 20394, 20400, 20415, 20444, 20448, 20450, 20454, 20465, 20468, 20476, 20488, 20491, 20505, 20512, 20523, 20544, 20553, 20570, 20585, 20599, 20622, 20624, 20633, 20638, 20647, 20651, 20657, 20664, 20677, 20700, 20702, 20710, 20713, 20715, 20717, 20722, 20724, 20725, 20746, 20766, 20782, 20800, 20827, 20834, 20840, 20848, 20855, 20856, 20864, 20868, 20872, 20883, 20891, 20893, 20897, 20904, 20927, 20935, 20941, 20944, 20948, 20949, 20953, 20963, 20993, 21004, 21005, 21008, 21012, 21013, 21023, 21029, 21045, 21047, 21050, 21059, 21064, 21074, 21087, 21096, 21100, 21115, 21117, 21123, 21132, 21140, 21157, 21173, 21174, 21177, 21179, 21188, 21190, 21195, 21207, 21221, 21230, 21236, 21238, 21245, 21253, 21254, 21256, 21261, 21269, 21291, 21313, 21325, 21342, 21346, 21347, 21375, 21383, 21384, 21385, 21391, 21404, 21408, 21426, 21433, 21445, 21496, 21497, 21510, 21515, 21521, 21523, 21524, 21527, 21545, 21554, 21571, 21572, 21588, 21619, 21640, 21649, 21655, 21657, 21658, 21662, 21663, 21670, 21673, 21682, 21688, 21699, 21704, 21710, 21719, 21726, 21728, 21735, 21739, 21758, 21759, 21794, 21803, 21805, 21811, 21823, 21831, 21838, 21852, 21854, 21861, 21879, 21882, 21883, 21892, 21910, 21920, 21925, 21932, 21937, 21949, 21951, 
21963, 21968, 21972, 21973, 21981, 21983, 21995, 22012, 22018, 22019, 22030, 22039, 22043, 22051, 22052, 22069, 22082, 22085, 22086, 22087, 22102, 22109, 22110, 22130, 22150, 22157, 22172, 22176, 22184, 22186, 22187, 22199, 22205, 22210, 22233, 22236, 22238, 22239, 22241, 22276, 22284, 22286, 22288, 22319, 22327, 22337, 22342, 22345, 22346, 22362, 22375, 22379, 22384, 22395, 22396, 22400, 22408, 22422, 22434, 22450, 22467, 22471, 22473, 22474, 22477, 22478, 22482, 22484, 22490, 22512, 22518, 22520, 22532, 22533, 22538, 22550, 22563, 22603, 22604, 22612, 22617, 22621, 22625, 22628, 22629, 22660, 22664, 22678, 22680, 22683, 22692, 22706, 22730, 22732, 22738, 22742, 22757, 22802, 22826, 22845, 22846, 22850, 22863, 22892, 22914, 22921, 22923, 22947, 22955, 22958, 22964, 22967, 22995, 23013, 23016, 23026, 23028, 23034, 23046, 23051, 23075, 23077, 23100, 23111, 23138, 23141, 23149, 23150, 23163, 23184, 23185, 23197, 23201, 23204, 23207, 23226, 23228, 23243, 23247, 23248, 23253, 23262, 23263, 23267, 23274, 23276, 23278, 23286, 23307, 23309, 23310, 23350, 23359, 23367, 23375, 23381, 23382, 23383, 23389, 23409, 23420, 23422, 23423, 23430, 23431, 23442, 23446, 23470, 23479, 23485, 23491, 23504, 23508, 23527, 23528, 23529, 23535, 23545, 23548, 23563, 23570, 23582, 23607, 23614, 23622, 23631, 23640, 23646, 23647, 23675, 23683, 23689, 23700, 23706, 23714, 23715, 23716, 23724, 23726, 23730, 23731, 23748, 23752, 23769, 23800, 23802, 23852, 23853, 23856, 23864, 23889, 23904, 23910, 23915, 23916, 23920, 23929, 23936, 23947, 23959, 23968, 23969, 23976, 23991, 24030, 24031, 24038, 24055, 24074, 24075, 24077, 24082, 24083, 24084, 24094, 24097, 24102, 24117, 24157, 24168, 24180, 24181, 24192, 24221, 24239, 24244, 24258, 24279, 24322, 24347, 24350, 24351, 24357, 24358, 24368, 24370, 24389, 24396, 24405, 24415, 24420, 24438, 24442, 24457, 24473, 24497, 24505, 24529, 24541, 24565, 24567, 24596, 24608, 24624, 24632, 24636, 24642, 24647, 24659, 24666, 24673, 24690, 24693, 24705, 24711, 24752, 24761, 24771, 24784, 24785, 24786, 24808, 24833, 24836, 24851, 24860, 24871, 24877, 24883, 24925, 24935, 24952, 24959, 24965, 24978, 24980, 24982, 24988, 24990, 24991, 25003, 25004, 25018, 25019, 25027, 25036, 25061, 25062, 25078, 25084, 25087, 25092, 25099, 25105, 25107, 25147, 25152, 25170, 25183, 25184, 25185, 25186, 25190, 25194, 25199, 25207, 25210, 25231, 25241, 25251, 25254, 25258, 25263, 25280, 25287, 25289, 25291, 25292, 25339, 25350, 25377, 25383, 25391, 25392, 25394, 25410, 25414, 25429, 25430, 25453, 25457, 25466, 25467, 25470, 25476, 25479, 25490, 25491, 25493, 25514, 25523, 25524, 25527, 25562, 25564, 25573, 25575, 25598, 25604, 25611, 25661, 25677, 25688, 25695, 25696, 25709, 25712, 25714, 25766, 25768, 25769, 25772, 25791, 25794, 25799, 25806, 25810, 25833, 25840, 25847, 25850, 25868, 25871, 25873, 25877, 25885, 25908, 25922, 25927, 25938, 25939, 25999, 26008, 26014, 26028, 26042, 26053, 26095, 26102, 26122, 26128, 26143, 26162, 26175, 26183, 26199, 26203, 26219, 26221, 26229, 26234, 26244, 26248, 26258, 26260, 26281, 26282, 26287, 26295, 26312, 26317, 26328, 26329, 26330, 26344, 26364, 26379, 26382, 26383, 26390, 26393, 26397, 26398, 26402, 26405, 26409, 26415, 26450, 26453, 26460, 26463, 26465, 26479, 26488, 26505, 26513, 26516, 26517, 26550, 26554, 26565, 26585, 26589, 26596, 26601, 26607, 26614, 26622, 26632, 26639, 26653, 26655, 26667, 26679, 26680, 26687, 26706, 26720, 26722, 26741, 26742, 26750, 26757, 26761, 26770, 26782, 26796, 26821, 26824, 26828, 26835, 26847, 26855, 26863, 26868, 26883, 26896, 
26911, 26913, 26922, 26923, 26933, 26935, 26942, 26947, 26952, 26963, 26967, 26975, 27003, 27014, 27015, 27025, 27034, 27036, 27059, 27062, 27075, 27083, 27092, 27093, 27100, 27122, 27131, 27134, 27136, 27150, 27166, 27181, 27192, 27207, 27219, 27222, 27223, 27229, 27237, 27257, 27265, 27298, 27300, 27301, 27302, 27305, 27311, 27318, 27329, 27343, 27344, 27373, 27379, 27399, 27401, 27444, 27456, 27458, 27478, 27493, 27509, 27529, 27530, 27550, 27557, 27567, 27570, 27576, 27582, 27599, 27616, 27617, 27640, 27642, 27670, 27673, 27680, 27690, 27695, 27699, 27700, 27702, 27706, 27708, 27716, 27745, 27760, 27763, 27780, 27781, 27797, 27832, 27838, 27845, 27846, 27852, 27860, 27864, 27880, 27893, 27900, 27901, 27905, 27909, 27911, 27918, 27923, 27927, 27965, 27966, 27975, 27984, 27993, 28004, 28041, 28052, 28064, 28066, 28069, 28083, 28095, 28105, 28115, 28120, 28122, 28125, 28135, 28152, 28161, 28185, 28196, 28207, 28220, 28225, 28232, 28258, 28266, 28281, 28334, 28363, 28379, 28389, 28395, 28399, 28409, 28425, 28450, 28461, 28464, 28465, 28481, 28483, 28485, 28495, 28496, 28502, 28510, 28533, 28544, 28552, 28555, 28556, 28557, 28561, 28563, 28572, 28574, 28577, 28589, 28592, 28605, 28607, 28618, 28626, 28634, 28638, 28639, 28645, 28647, 28651, 28662, 28764, 28775, 28816, 28824, 28837, 28842, 28853, 28869, 28871, 28874, 28891, 28908, 28910, 28915, 28920, 28951, 28956, 28965, 28979, 28997, 29001, 29005, 29007, 29027, 29037, 29052, 29058, 29062, 29066, 29082, 29097, 29105, 29126, 29150, 29158, 29176, 29178, 29183, 29186, 29189, 29206, 29209, 29217, 29242, 29248, 29250, 29260, 29266, 29313, 29314, 29327, 29328, 29349, 29365, 29366, 29376, 29381, 29388, 29392, 29401, 29424, 29431, 29434, 29443, 29449, 29456, 29468, 29471, 29474, 29496, 29516, 29519, 29524, 29529, 29556, 29557, 29559, 29567, 29578, 29583, 29587, 29611, 29614, 29661, 29662, 29668, 29675, 29694, 29696, 29701, 29705, 29714, 29734, 29752, 29757, 29759, 29765, 29768, 29771, 29780, 29784, 29792, 29794, 29841, 29881, 29886, 29904, 29943, 29958, 29980, 29983, 29986, 29989, 30002, 30008, 30014, 30023, 30031, 30037, 30042, 30049, 30053, 30059, 30072, 30094, 30101, 30104, 30115, 30143, 30158, 30178, 30182, 30188, 30193, 30198, 30212, 30242, 30246, 30261, 30278, 30279, 30282, 30284, 30299, 30300, 30303, 30322, 30333, 30334, 30352, 30359, 30374, 30394, 30411, 30419, 30430, 30439, 30450, 30457, 30459, 30468, 30473, 30477, 30499, 30500, 30506, 30516, 30525, 30526, 30539, 30540, 30541, 30550, 30557, 30561, 30579, 30583, 30604, 30614, 30642, 30651, 30663, 30672, 30674, 30696, 30712, 30739, 30744, 30750, 30765, 30784, 30787, 30798, 30805, 30808, 30810, 30811, 30818, 30828, 30831, 30833, 30860, 30863, 30865, 30886, 30891, 30893, 30910, 30925, 30940, 30967, 30969, 30984, 30986, 31019, 31026, 31030, 31034, 31053, 31055, 31068, 31093, 31099, 31101, 31117, 31121, 31126, 31136, 31138, 31145, 31150, 31160, 31172, 31176, 31183, 31193, 31200, 31201, 31208, 31236, 31263, 31277, 31284, 31303, 31311, 31315, 31319, 31330, 31369, 31372, 31381, 31388, 31392, 31393, 31398, 31442, 31447, 31449, 31454, 31458, 31475, 31478, 31482, 31525, 31553, 31574, 31588, 31598, 31616, 31622, 31625, 31638, 31646, 31668, 31675, 31684, 31695, 31697, 31708, 31709, 31711, 31720, 31727, 31739, 31757, 31769, 31776, 31781, 31782, 31786, 31795, 31796, 31806, 31824, 31826, 31837, 31840, 31851, 31861, 31863, 31875, 31886, 31889, 31899, 31931, 31936, 31937, 31944, 31946, 31978, 31980, 31995, 31999, 32000, 32019, 32027, 32045, 32056, 32057, 32061, 32068, 32081, 32085, 32094, 32103, 32110, 
32123, 32142, 32149, 32154, 32161, 32163, 32167, 32175, 32220, 32222, 32226, 32240, 32245, 32246, 32261, 32266, 32274, 32315, 32318, 32335, 32344, 32351, 32358, 32362, 32370, 32381, 32422, 32445, 32451, 32458, 32475, 32487, 32506, 32518, 32521, 32528, 32533, 32538, 32545, 32547, 32569, 32577, 32581, 32583, 32593, 32601, 32608, 32610, 32613, 32652, 32658, 32663, 32690, 32755, 32756, 32761, 32789, 32795, 32802, 32820, 32845, 32852, 32859, 32862, 32873, 32875, 32879, 32880, 32883, 32892, 32903, 32918, 32920, 32922, 32938, 32939, 32946, 32954, 32968, 32971, 32974, 32978, 32989, 32992, 32993, 33014, 33017, 33019, 33052, 33058, 33064, 33074, 33076, 33077, 33082, 33086, 33118, 33130, 33142, 33146, 33153, 33175, 33185, 33193, 33234, 33236, 33256, 33282, 33287, 33322, 33323, 33326, 33338, 33360, 33363, 33426, 33428, 33429, 33444, 33445, 33449, 33458, 33465, 33475, 33483, 33503, 33512, 33518, 33532, 33537, 33554, 33572, 33574, 33576, 33579, 33590, 33604, 33609, 33614, 33617, 33620, 33624, 33625, 33650, 33673, 33681, 33685, 33691, 33700, 33714, 33723, 33728, 33732, 33737, 33739, 33749, 33759, 33775, 33806, 33833, 33837, 33840, 33842, 33843, 33847, 33885, 33886, 33907, 33917, 33929, 33931, 33932, 33936, 33941, 33954, 33963, 33964, 33983, 33999, 34020, 34024, 34041, 34042, 34048, 34049, 34054, 34056, 34070, 34085, 34098, 34103, 34125, 34133, 34141, 34151, 34154, 34164, 34165, 34183, 34209, 34212, 34218, 34245, 34246, 34256, 34264, 34286, 34290, 34316, 34340, 34345, 34364, 34374, 34378, 34380, 34382, 34383, 34390, 34393, 34404, 34407, 34411, 34421, 34427, 34448, 34468, 34469, 34475, 34485, 34488, 34489, 34501, 34502, 34509, 34510, 34527, 34538, 34539, 34552, 34573, 34574, 34613, 34619, 34626, 34635, 34647, 34652, 34656, 34662, 34667, 34686, 34687, 34688, 34693, 34695, 34707, 34717, 34724, 34727, 34743, 34749, 34755, 34761, 34766, 34780, 34791, 34796, 34821, 34857, 34864, 34878, 34882, 34902, 34905, 34909, 34913, 34941, 34952, 34958, 34961, 34976, 34996, 35023, 35031, 35033, 35036, 35043, 35065, 35098, 35102, 35118, 35123, 35137, 35149, 35158, 35160, 35173, 35177, 35178, 35182, 35191, 35199, 35215, 35243, 35265, 35268, 35269, 35277, 35282, 35287, 35292, 35308, 35313, 35314, 35316, 35325, 35328, 35332, 35341, 35345, 35350, 35363, 35378, 35388, 35397, 35405, 35407, 35408, 35423, 35427, 35443, 35456, 35462, 35469, 35471, 35474, 35480, 35483, 35518, 35521, 35534, 35536, 35543, 35547, 35552, 35554, 35575, 35583, 35603, 35608, 35615, 35632, 35682, 35683, 35745, 35755, 35767, 35779, 35781, 35793, 35795, 35811, 35820, 35826, 35827, 35833, 35836, 35866, 35879, 35901, 35904, 35907, 35909, 35913, 35931, 35936, 35937, 35941, 35951, 35957, 35966, 35976, 35983, 35989, 36002, 36008, 36012, 36014, 36023, 36030, 36047, 36056, 36071, 36081, 36120, 36124, 36132, 36149, 36155, 36167, 36168, 36197, 36202, 36224, 36230, 36241, 36245, 36276, 36281, 36283, 36305, 36312, 36324, 36336, 36349, 36357, 36360, 36367, 36392, 36419, 36425, 36427, 36428, 36435, 36444, 36463, 36464, 36477, 36479, 36489, 36499, 36507, 36508, 36526, 36540, 36544, 36582, 36583, 36592, 36596, 36610, 36660, 36695, 36703, 36708, 36709, 36715, 36723, 36734, 36736, 36772, 36776, 36796, 36801, 36803, 36812, 36820, 36845, 36861, 36882, 36906, 36926, 36954, 36961, 36969, 37014, 37048, 37056, 37074, 37094, 37105, 37113, 37131, 37142, 37154, 37160, 37183, 37198, 37200, 37202, 37245, 37252, 37267, 37276, 37299, 37311, 37342, 37345, 37346, 37380, 37389, 37403, 37406, 37407, 37410, 37422, 37429, 37434, 37435, 37451, 37487, 37496, 37504, 37506, 37513, 37516, 37519, 
37537, 37544, 37549, 37568, 37573, 37574, 37580, 37586, 37589, 37591, 37599, 37600, 37624, 37657, 37663, 37665, 37684, 37697, 37705, 37707, 37730, 37766, 37791, 37797, 37816, 37830, 37834, 37837, 37855, 37863, 37868, 37871, 37873, 37880, 37885, 37894, 37906, 37910, 37917, 37919, 37920, 37928, 37932, 37934, 37979, 37984, 37985, 37993, 37995, 38000, 38005, 38015, 38021, 38071, 38098, 38109, 38130, 38139, 38140, 38148, 38165, 38197, 38208, 38226, 38234, 38250, 38267, 38307, 38316, 38331, 38336, 38343, 38348, 38350, 38358, 38370, 38371, 38378, 38381, 38392, 38399, 38402, 38410, 38415, 38420, 38424, 38426, 38427, 38431, 38450, 38455, 38466, 38490, 38496, 38498, 38515, 38521, 38536, 38540, 38544, 38556, 38561, 38564, 38571, 38584, 38586, 38601, 38603, 38628, 38645, 38653, 38666, 38674, 38676, 38680, 38683, 38697, 38708, 38719, 38732, 38769, 38771, 38793, 38795, 38826, 38832, 38837, 38842, 38851, 38861, 38880, 38883, 38899, 38908, 38922, 38928, 38930, 38938, 38941, 38947, 38955, 38956, 38957, 38970, 38976, 38987, 38998, 39007, 39019, 39041, 39043, 39053, 39063, 39066, 39075, 39095, 39118, 39128, 39154, 39158, 39163, 39178, 39220, 39247, 39254, 39256, 39274, 39286, 39287, 39296, 39297, 39299, 39315, 39337, 39355, 39366, 39369, 39394, 39395, 39430, 39451, 39459, 39488, 39493, 39508, 39542, 39544, 39549, 39553, 39559, 39562, 39566, 39571, 39578, 39587, 39621, 39646, 39658, 39662, 39664, 39668, 39683, 39684, 39686, 39692, 39704, 39705, 39717, 39731, 39733, 39737, 39741, 39746, 39755, 39786, 39804, 39805, 39812, 39819, 39837, 39838, 39851, 39852, 39869, 39882, 39888, 39890, 39906, 39914, 39917, 39921, 39943, 39945, 39961, 39963, 39965, 39972, 39977, 39979, 39984, 39990, 39997, 40004, 40014, 40017, 40019, 40028, 40064, 40067, 40081, 40085, 40091, 40103, 40106, 40117, 40135, 40149, 40152, 40167, 40169, 40173, 40178, 40184, 40186, 40203, 40205, 40213, 40219, 40243, 40247, 40258, 40272, 40321, 40327, 40330, 40382, 40390, 40405, 40418, 40424, 40432, 40448, 40461, 40470, 40487, 40488, 40490, 40498, 40503, 40512, 40524, 40560, 40563, 40593, 40616, 40659, 40663, 40664, 40666, 40672, 40677, 40679, 40700, 40728, 40746, 40747, 40760, 40775, 40800, 40810, 40830, 40847, 40849, 40866, 40901, 40904, 40920, 40922, 40931, 40943, 40962, 40967, 40969, 40978, 40985, 41006, 41009, 41014, 41018, 41041, 41043, 41058, 41086, 41117, 41133, 41138, 41143, 41144, 41157, 41173, 41175, 41179, 41198, 41205, 41216, 41232, 41272, 41288, 41289, 41314, 41322, 41326, 41364, 41367, 41383, 41394, 41430, 41434, 41457, 41464, 41483, 41518, 41524, 41526, 41529, 41537, 41539, 41556, 41562, 41574, 41578, 41583, 41588, 41593, 41635, 41644, 41650, 41670, 41682, 41693, 41705, 41720, 41730, 41732, 41734, 41737, 41741, 41757, 41766, 41777, 41839, 41855, 41866, 41868, 41871, 41875, 41879, 41890, 41909, 41914, 41921, 41924, 41927, 41930, 41938, 41939, 41945, 41954, 41969, 41974, 41981, 41984, 42003, 42009, 42016, 42021, 42023, 42027, 42040, 42049, 42052, 42061, 42065, 42069, 42072, 42092, 42098, 42101, 42106, 42109, 42112, 42120, 42123, 42128, 42145, 42146, 42156, 42172, 42173, 42181, 42188, 42194, 42202, 42207, 42215, 42227, 42234, 42238, 42258, 42259, 42266, 42268, 42274, 42280, 42284, 42287, 42310, 42335, 42344, 42348, 42378, 42380, 42391, 42413, 42429, 42434, 42436, 42441, 42446, 42447, 42455, 42464, 42465, 42482, 42491, 42496, 42497, 42509, 42557, 42561, 42566, 42577, 42592, 42593, 42594, 42606, 42628, 42634, 42637, 42647, 42664, 42676, 42682, 42695, 42710, 42717, 42718, 42722, 42726, 42739, 42762, 42768, 42779, 42786, 42796, 42803, 42808, 
42817, 42821, 42830, 42836, 42843, 42845, 42847, 42861, 42866, 42876, 42883, 42893, 42896, 42907, 42915, 42920, 42936, 42938, 42942, 42953, 42971, 42982, 42986, 42987, 42988, 42991, 42998, 43002, 43009, 43016, 43018, 43019, 43042, 43044, 43058, 43072, 43087, 43090, 43103, 43107, 43109, 43113, 43120, 43121, 43125, 43141, 43150, 43162, 43167, 43170, 43179, 43187, 43206, 43216, 43223, 43231, 43234, 43235, 43247, 43263, 43270, 43271, 43272, 43279, 43299, 43310, 43322, 43350, 43364, 43381, 43390, 43394, 43410, 43412, 43425, 43446, 43454, 43463, 43473, 43482, 43500, 43509, 43521, 43523, 43530, 43551, 43568, 43585, 43591, 43605, 43615, 43629, 43641, 43664, 43666, 43680, 43690, 43710, 43735, 43741, 43742, 43766, 43771, 43781, 43784, 43786, 43820, 43829, 43834, 43839, 43845, 43864, 43888, 43906, 43910, 43912, 43921, 43939, 43970, 43974, 43976, 44012, 44030, 44042, 44044, 44054, 44068, 44078, 44081, 44083, 44100, 44104, 44123, 44128, 44134, 44143, 44145, 44160, 44183, 44199, 44200, 44210, 44213, 44240, 44272, 44308, 44314, 44339, 44377, 44396, 44398, 44412, 44467, 44492, 44502, 44514, 44520, 44525, 44527, 44536, 44543, 44547, 44576, 44577, 44580, 44585, 44586, 44613, 44614, 44615, 44627, 44629, 44654, 44658, 44659, 44661, 44669, 44677, 44683, 44701, 44707, 44718, 44728, 44737, 44738, 44751, 44776, 44781, 44792, 44817, 44832, 44845, 44854, 44859, 44881, 44891, 44900, 44920, 44922, 44962, 44973, 44981, 45000, 45014, 45016, 45021, 45025, 45050, 45054, 45095, 45104, 45112, 45120, 45127, 45128, 45140, 45154, 45158, 45176, 45192, 45199, 45202, 45213, 45219, 45237, 45241, 45255, 45262, 45263, 45266, 45271, 45280, 45282, 45285, 45295, 45296, 45301, 45318, 45334, 45341, 45350, 45358, 45365, 45382, 45390, 45402, 45403, 45414, 45417, 45423, 45438, 45449, 45503, 45506, 45507, 45520, 45525, 45534, 45554, 45565, 45579, 45583, 45584, 45585, 45597, 45602, 45619, 45624, 45626, 45634, 45635, 45652, 45656, 45657, 45661, 45664, 45665, 45673, 45685, 45695, 45696, 45705, 45712, 45720, 45726, 45735, 45741, 45744, 45753, 45756, 45758, 45785, 45805, 45813, 45815, 45817, 45825, 45850, 45852, 45897, 45904, 45916, 45942, 45944, 45955, 45956, 45957, 45975, 45978, 45982, 45991, 45994, 45998, 46020, 46021, 46026, 46040, 46049, 46055, 46060, 46065, 46068, 46074, 46078, 46094, 46100, 46118, 46125, 46143, 46144, 46145, 46149, 46181, 46186, 46195, 46205, 46208, 46231, 46236, 46242, 46269, 46276, 46279, 46286, 46303, 46306, 46330, 46333, 46346, 46368, 46371, 46375, 46383, 46386, 46388, 46393, 46414, 46415, 46431, 46432, 46437, 46443, 46453, 46468, 46473, 46477, 46480, 46483, 46513, 46515, 46529, 46536, 46537, 46540, 46560, 46561, 46563, 46575, 46580, 46593, 46607, 46618, 46634, 46648, 46653, 46656, 46662, 46667, 46701, 46704, 46729, 46738, 46743, 46770, 46795, 46798, 46808, 46811, 46812, 46819, 46839, 46842, 46854, 46868, 46885, 46889, 46893, 46912, 46917, 46941, 46944, 46951, 46954, 46975, 46977, 46997, 47000, 47009, 47032, 47034, 47042, 47047, 47051, 47060, 47077, 47111, 47124, 47126, 47140, 47153, 47158, 47159, 47164, 47173, 47198, 47200, 47208, 47211, 47219, 47224, 47241, 47245, 47249, 47260, 47262, 47264, 47265, 47266, 47272, 47280, 47308, 47311, 47316, 47328, 47332, 47346, 47357, 47365, 47405, 47419, 47447, 47499, 47517, 47521, 47544, 47567, 47572, 47576, 47579, 47598, 47623, 47627, 47628, 47632, 47635, 47639, 47642, 47645, 47648, 47657, 47665, 47666, 47682, 47707, 47713, 47718, 47726, 47739, 47744, 47748, 47761, 47768, 47779, 47789, 47796, 47839, 47840, 47853, 47884, 47922, 47939, 47947, 47949, 47954, 47974, 47995, 48000, 
48023, 48033, 48035, 48036, 48039, 48041, 48068, 48069, 48073, 48090, 48103, 48106, 48108, 48133, 48134, 48146, 48152, 48161, 48196, 48202, 48203, 48204, 48230, 48243, 48248, 48257, 48277, 48287, 48300, 48303, 48333, 48348, 48371, 48379, 48383, 48389, 48392, 48394, 48396, 48399, 48401, 48411, 48440, 48457, 48462, 48488, 48490, 48491, 48504, 48510, 48526, 48537, 48538, 48561, 48567, 48570, 48594, 48599, 48603, 48648, 48649, 48663, 48668, 48676, 48684, 48689, 48710, 48719, 48737, 48747, 48758, 48763, 48796, 48798, 48809, 48829, 48840, 48860, 48873, 48904, 48907, 48912, 48917, 48918, 48935, 48939, 48951, 48973, 48985, 49000, 49022, 49039, 49059, 49077, 49079, 49080, 49086, 49099, 49110, 49136, 49147, 49161, 49167, 49168, 49169, 49171, 49173, 49180, 49192, 49194, 49203, 49214, 49223, 49258, 49265, 49280, 49291, 49294, 49319, 49328, 49334, 49335, 49338, 49339, 49352, 49364, 49370, 49384, 49386, 49388, 49404, 49423, 49449, 49450, 49468, 49476, 49479, 49498, 49499, 49501, 49511, 49548, 49559, 49569, 49585, 49591, 49597, 49608, 49616, 49623, 49691, 49694, 49709, 49712, 49731, 49734, 49752, 49804, 49811, 49815, 49825, 49827, 49830, 49833, 49837, 49850, 49862, 49869, 49878, 49881, 49916, 49923, 49943, 49958, 49963, 49964, 49970, 49975, 49976, 49977, 49986, 49996, 50007, 50022, 50024, 50027, 50035, 50048, 50058, 50064, 50075, 50080, 50088, 50101, 50108, 50109, 50116, 50131, 50134, 50144, 50146, 50161, 50169, 50172, 50177, 50191, 50230, 50231, 50237, 50244, 50255, 50258, 50271, 50276, 50278, 50284, 50307, 50308, 50317, 50323, 50326, 50352, 50367, 50387, 50394, 50395, 50396, 50402, 50409, 50420, 50424, 50433, 50436, 50448, 50462, 50497, 50519, 50534, 50535, 50538, 50561, 50576, 50587, 50597, 50598, 50618, 50656, 50658, 50660, 50670, 50679, 50686, 50692, 50697, 50703, 50711, 50715, 50734, 50735, 50774, 50784, 50792, 50793, 50796, 50798, 50800, 50803, 50819, 50828, 50832, 50835, 50846, 50853, 50863, 50874, 50876, 50877, 50881, 50885, 50890, 50896, 50900, 50907, 50912, 50913, 50914, 50919, 50939, 50946, 50956, 50970, 50995, 50997, 51029, 51032, 51035, 51045, 51051, 51052, 51061, 51096, 51102, 51104, 51126, 51152, 51172, 51187, 51196, 51199, 51208, 51210, 51231, 51237, 51246, 51252, 51267, 51270, 51277, 51290, 51292, 51293, 51299, 51305, 51311, 51318, 51329, 51337, 51348, 51375, 51387, 51390, 51404, 51411, 51413, 51416, 51422, 51454, 51461, 51478, 51479, 51485, 51486, 51491, 51496, 51509, 51519, 51523, 51535, 51541, 51551, 51562, 51568, 51572, 51584, 51608, 51616, 51628, 51634, 51648, 51657, 51662, 51679, 51711, 51718, 51726, 51727, 51735, 51746, 51752, 51754, 51764, 51770, 51794, 51808, 51810, 51822, 51824, 51843, 51845, 51854, 51869, 51871, 51874, 51878, 51885, 51891, 51903, 51931, 51932, 51937, 51940, 51943, 51949, 51951, 51959, 51960, 51965, 51969, 51973, 51988, 51999, 52040, 52045, 52079, 52085, 52094, 52106, 52108, 52115, 52124, 52145, 52154, 52180, 52191, 52192, 52204, 52214, 52236, 52238, 52249, 52250, 52268, 52273, 52277, 52300, 52306, 52320, 52325, 52326, 52351, 52356, 52406, 52424, 52426, 52447, 52450, 52456, 52463, 52483, 52503, 52507, 52508, 52522, 52532, 52543, 52551, 52569, 52573, 52578, 52579, 52598, 52604, 52619, 52627, 52640, 52643, 52657, 52663, 52669, 52686, 52687, 52693, 52695, 52699, 52718, 52721, 52727, 52734, 52741, 52762, 52774, 52780, 52783, 52788, 52798, 52802, 52812, 52813, 52818, 52835, 52839, 52845, 52848, 52849, 52872, 52903, 52952, 52972, 52986, 52995, 53006, 53019, 53029, 53031, 53045, 53055, 53063] 
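The integer array ending above is the complete contents of one of the assignment index files: a flat JSON list of row positions that reproduces the course's train/validation split of the matching CSV. A minimal sketch of how such a file is typically consumed, assuming pandas and the standard-library json module (the file names below are illustrative examples based on this repo's week-7 layout, not code taken from it):

import json
import pandas as pd

# Illustrative file names: each assignment CSV ships with matching *-idx.json files.
products = pd.read_csv('amazon_baby_subset.csv')
with open('module-10-assignment-train-idx.json') as f:
    train_idx = json.load(f)  # a plain Python list of integer row positions

train_data = products.iloc[train_idx]  # positional selection of the split

Because the list stores row positions rather than index labels, .iloc (not .loc) is the right accessor whenever the CSV is loaded with pandas' default integer index.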
-------------------------------------------------------------------------------- /Classsification Week 5 - Boosting/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Classsification Week 5 - Boosting/.DS_Store -------------------------------------------------------------------------------- /Classsification Week 5 - Boosting/Assignment 1/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Classsification Week 5 - Boosting/Assignment 1/.DS_Store -------------------------------------------------------------------------------- /Classsification Week 5 - Boosting/Assignment 2/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Classsification Week 5 - Boosting/Assignment 2/.DS_Store -------------------------------------------------------------------------------- /Clustering Week 2 - KNN Neighbors/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Clustering Week 2 - KNN Neighbors/.DS_Store -------------------------------------------------------------------------------- /Clustering Week 2 - KNN Neighbors/Assignment 1/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Clustering Week 2 - KNN Neighbors/Assignment 1/.DS_Store -------------------------------------------------------------------------------- /Clustering Week 2 - KNN Neighbors/Assignment 2/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Clustering Week 2 - KNN Neighbors/Assignment 2/.DS_Store -------------------------------------------------------------------------------- /Clustering Week 2 - KNN Neighbors/Quiz 1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 27, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "from sklearn.metrics.pairwise import cosine_distances\n", 10 | "from sklearn.metrics.pairwise import cosine_similarity\n", 11 | "import numpy as np" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 28, 17 | "metadata": { 18 | "collapsed": true 19 | }, 20 | "outputs": [], 21 | "source": [ 22 | "a1=np.array([[2,0,1,1,1,1,1,1,1,0]])\n", 23 | "a2=np.array([[0,2,2,1,1,0,0,0,1,1]])" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 29, 29 | "metadata": {}, 30 | "outputs": [ 31 | { 32 | "data": { 33 | "text/plain": [ 34 | "array([[ 0.43519414]])" 35 | ] 36 | }, 37 | "execution_count": 29, 38 | "metadata": {}, 39 | "output_type": "execute_result" 40 | } 41 | ], 42 | "source": [ 43 | "cosine_similarity(a1,a2)" 44 | ] 45 | }, 46 | { 47 | "cell_type": 
"code", 48 | "execution_count": 30, 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "data": { 53 | "text/plain": [ 54 | "array([[ 0.56480586]])" 55 | ] 56 | }, 57 | "execution_count": 30, 58 | "metadata": {}, 59 | "output_type": "execute_result" 60 | } 61 | ], 62 | "source": [ 63 | "cosine_distances(a1, a2)" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 32, 69 | "metadata": {}, 70 | "outputs": [ 71 | { 72 | "name": "stdout", 73 | "output_type": "stream", 74 | "text": [ 75 | "[[ 0.8660254]]\n" 76 | ] 77 | } 78 | ], 79 | "source": [ 80 | "vec1 = np.array([[1,1,0,1,1]])\n", 81 | "vec2 = np.array([[0,1,0,1,1]])\n", 82 | "#print(cosine_similarity([vec1, vec2]))\n", 83 | "print(cosine_similarity(vec1, vec2))" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 33, 89 | "metadata": {}, 90 | "outputs": [ 91 | { 92 | "data": { 93 | "text/plain": [ 94 | "array([[ 3.60555128]])" 95 | ] 96 | }, 97 | "execution_count": 33, 98 | "metadata": {}, 99 | "output_type": "execute_result" 100 | } 101 | ], 102 | "source": [ 103 | "from sklearn.metrics.pairwise import euclidean_distances\n", 104 | "euclidean_distances(a1,a2)" 105 | ] 106 | } 107 | ], 108 | "metadata": { 109 | "kernelspec": { 110 | "display_name": "Python 3", 111 | "language": "python", 112 | "name": "python3" 113 | }, 114 | "language_info": { 115 | "codemirror_mode": { 116 | "name": "ipython", 117 | "version": 3 118 | }, 119 | "file_extension": ".py", 120 | "mimetype": "text/x-python", 121 | "name": "python", 122 | "nbconvert_exporter": "python", 123 | "pygments_lexer": "ipython3", 124 | "version": "3.6.3" 125 | } 126 | }, 127 | "nbformat": 4, 128 | "nbformat_minor": 2 129 | } 130 | -------------------------------------------------------------------------------- /Clustering Week 3- Kmeans/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Clustering Week 3- Kmeans/.DS_Store -------------------------------------------------------------------------------- /Clustering Week 4- EM for Gaussian mixtures/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Clustering Week 4- EM for Gaussian mixtures/.DS_Store -------------------------------------------------------------------------------- /Clustering Week 4- EM for Gaussian mixtures/Assignment 1/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Clustering Week 4- EM for Gaussian mixtures/Assignment 1/.DS_Store -------------------------------------------------------------------------------- /Clustering Week 4- EM for Gaussian mixtures/Assignment 2/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Clustering Week 4- EM for Gaussian mixtures/Assignment 2/.DS_Store -------------------------------------------------------------------------------- /Clustering Week 5 - Mixed Membership Modeling via Latent Dirichlet Allocation/.DS_Store: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Clustering Week 5 - Mixed Membership Modeling via Latent Dirichlet Allocation/.DS_Store -------------------------------------------------------------------------------- /Clustering Week 6 - Hierarchical Clustering & Closing Remarks/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Clustering Week 6 - Hierarchical Clustering & Closing Remarks/.DS_Store -------------------------------------------------------------------------------- /Machine Learning Case Study Week 2 - Regression/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Machine Learning Case Study Week 2 - Regression/.DS_Store -------------------------------------------------------------------------------- /Machine Learning Case Study Week 3 - Classification/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Machine Learning Case Study Week 3 - Classification/.DS_Store -------------------------------------------------------------------------------- /Machine Learning Case Study Week 4 - Clustering and Retrieving/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Machine Learning Case Study Week 4 - Clustering and Retrieving/.DS_Store -------------------------------------------------------------------------------- /Machine Learning Case Study Week 5 - Recommender System/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Machine Learning Case Study Week 5 - Recommender System/.DS_Store -------------------------------------------------------------------------------- /Machine Learning Case Study Week 5 - Recommender System/Quiz.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "data": { 10 | "text/plain": [ 11 | "24.883499999999998" 12 | ] 13 | }, 14 | "execution_count": 1, 15 | "metadata": {}, 16 | "output_type": "execute_result" 17 | } 18 | ], 19 | "source": [ 20 | "1.73*3.29+0.01*3.44+5.22*3.67" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 2, 26 | "metadata": {}, 27 | "outputs": [ 28 | { 29 | "data": { 30 | "text/plain": [ 31 | "50.79970000000001" 32 | ] 33 | }, 34 | "execution_count": 2, 35 | "metadata": {}, 36 | "output_type": "execute_result" 37 | } 38 | ], 39 | "source": [ 40 | "0.03*0.82+4.41*9.71+2.05*3.88" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 6, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "import numpy as np\n", 50 | 
"a1=np.array([0.03, 4.41, 2.05])" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 7, 56 | "metadata": { 57 | "collapsed": true 58 | }, 59 | "outputs": [], 60 | "source": [ 61 | "b1=np.array([3.29, 3.44, 3.67])\n", 62 | "b2=np.array([0.82, 9.71, 3.88])\n", 63 | "b3=np.array([8.34, 1.72, 0.02])" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 9, 69 | "metadata": {}, 70 | "outputs": [ 71 | { 72 | "data": { 73 | "text/plain": [ 74 | "22.7926" 75 | ] 76 | }, 77 | "execution_count": 9, 78 | "metadata": {}, 79 | "output_type": "execute_result" 80 | } 81 | ], 82 | "source": [ 83 | "(a1*b1).sum()" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 10, 89 | "metadata": {}, 90 | "outputs": [ 91 | { 92 | "data": { 93 | "text/plain": [ 94 | "50.799700000000009" 95 | ] 96 | }, 97 | "execution_count": 10, 98 | "metadata": {}, 99 | "output_type": "execute_result" 100 | } 101 | ], 102 | "source": [ 103 | "(a1*b2).sum()" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 11, 109 | "metadata": {}, 110 | "outputs": [ 111 | { 112 | "data": { 113 | "text/plain": [ 114 | "7.8764000000000003" 115 | ] 116 | }, 117 | "execution_count": 11, 118 | "metadata": {}, 119 | "output_type": "execute_result" 120 | } 121 | ], 122 | "source": [ 123 | "(a1*b3).sum()" 124 | ] 125 | } 126 | ], 127 | "metadata": { 128 | "kernelspec": { 129 | "display_name": "Python 3", 130 | "language": "python", 131 | "name": "python3" 132 | }, 133 | "language_info": { 134 | "codemirror_mode": { 135 | "name": "ipython", 136 | "version": 3 137 | }, 138 | "file_extension": ".py", 139 | "mimetype": "text/x-python", 140 | "name": "python", 141 | "nbconvert_exporter": "python", 142 | "pygments_lexer": "ipython3", 143 | "version": "3.6.3" 144 | } 145 | }, 146 | "nbformat": 4, 147 | "nbformat_minor": 2 148 | } 149 | -------------------------------------------------------------------------------- /Machine Learning Case Study Week 6 - Deep Learning, Searching for Images/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Machine Learning Case Study Week 6 - Deep Learning, Searching for Images/.DS_Store -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Machine-Learning-University-of-Washington 2 | 3 | ## My machine learning projects about regression, classification and clustering All projects are completed using Pandas, Numpy, Scikit-learn and matplotlib. 4 | 5 | There are 4 courses in the Machine Learning Specialization provided by University of Washington via Coursera. 6 | 7 | Course 1: Machine Learning Foundations: A Case Study Approach 8 | 9 | Course 2: Machine Learning: Regression 10 | 11 | Course 3: Machine Learning: Classification 12 | 13 | Course 4: Machine Learning: Clustering & Retrieval 14 | 15 | While the guidelines of the assignments were provided n in Graphlab, I choose to use Pandas, Numpy and Scikit-learn to enhance my coding skills. 
16 | 17 | 18 | -------------------------------------------------------------------------------- /Regression Week 1 - Simple Linear/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Regression Week 1 - Simple Linear/.DS_Store -------------------------------------------------------------------------------- /Regression Week 1 - Simple Linear/Yun-Regression_Week1_SimpleRegression-Assignment.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import pandas as pd\n", 12 | "import numpy as np\n", 13 | "import matplotlib.pyplot as plt\n", 14 | "from sklearn import datasets, linear_model\n", 15 | "from sklearn.metrics import mean_squared_error, r2_score" 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 2, 21 | "metadata": { 22 | "collapsed": true 23 | }, 24 | "outputs": [], 25 | "source": [ 26 | "# Load the training dataset\n", 27 | "kcsales_train = pd.read_csv('kc_house_train_data.csv')\n", 28 | "# Load the test dataset\n", 29 | "kcsales_test = pd.read_csv('kc_house_test_data.csv')" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": { 36 | "collapsed": true 37 | }, 38 | "outputs": [], 39 | "source": [ 40 | "# Use only one feature\n", 41 | "kcsales_X_train = kcsales_train['sqft_living']\n", 42 | "kcsales_X_test = kcsales_test['sqft_living']" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 22, 48 | "metadata": { 49 | "collapsed": true 50 | }, 51 | "outputs": [], 52 | "source": [ 53 | "kcsales_y_train = kcsales_train['price']\n", 54 | "kcsales_y_test = kcsales_test['price']" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 6, 60 | "metadata": { 61 | "collapsed": true 62 | }, 63 | "outputs": [], 64 | "source": [ 65 | "# Create linear regression object\n", 66 | "regr = linear_model.LinearRegression()" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 7, 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/plain": [ 77 | "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)" 78 | ] 79 | }, 80 | "execution_count": 7, 81 | "metadata": {}, 82 | "output_type": "execute_result" 83 | } 84 | ], 85 | "source": [ 86 | "# Train the model using the training set for sqft_living\n", 87 | "regr.fit(kcsales_X_train.values.reshape((-1,1)), kcsales_y_train)" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": 8, 93 | "metadata": { 94 | "collapsed": true 95 | }, 96 | "outputs": [], 97 | "source": [ 98 | "# Make predictions using the test set of sqft_living\n", 99 | "kcsales_y_pred = regr.predict(kcsales_X_test.values.reshape((-1,1)))" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 20, 105 | "metadata": {}, 106 | "outputs": [ 107 | { 108 | "data": { 109 | "text/plain": [ 110 | "array([ 356085.0615985 , 784662.49783662, 435033.53669499, ...,\n", 111 | " 663420.19679557, 604208.8404732 , 240481.93735006])" 112 | ] 113 | }, 114 | "execution_count": 20, 115 | "metadata": {}, 116 | "output_type": "execute_result" 117 | } 118 | ], 119 | "source": [ 120 | "kcsales_y_pred" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 9, 
126 | "metadata": {}, 127 | "outputs": [ 128 | { 129 | "name": "stdout", 130 | "output_type": "stream", 131 | "text": [ 132 | "Coefficients: \n", 133 | " [ 281.95883963]\n" 134 | ] 135 | } 136 | ], 137 | "source": [ 138 | "# The coefficients of sqt model\n", 139 | "print('Coefficients: \\n', regr.coef_)" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 10, 145 | "metadata": {}, 146 | "outputs": [ 147 | { 148 | "name": "stdout", 149 | "output_type": "stream", 150 | "text": [ 151 | "intercept: \n", 152 | " -47116.0790729\n" 153 | ] 154 | } 155 | ], 156 | "source": [ 157 | "# The intercept of sqt model\n", 158 | "print('intercept: \\n', regr.intercept_)" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 17, 164 | "metadata": { 165 | "collapsed": true 166 | }, 167 | "outputs": [], 168 | "source": [ 169 | "cal_sqt = (800000-regr.intercept_)/regr.coef_" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": 18, 175 | "metadata": {}, 176 | "outputs": [ 177 | { 178 | "name": "stdout", 179 | "output_type": "stream", 180 | "text": [ 181 | "when sqt is 800000, sqt should be [ 3004.39624515]\n" 182 | ] 183 | } 184 | ], 185 | "source": [ 186 | "# Quize answer\n", 187 | "print ('when sqt is 800000, sqt should be', cal_sqt)" 188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "execution_count": 19, 193 | "metadata": {}, 194 | "outputs": [ 195 | { 196 | "data": { 197 | "text/plain": [ 198 | "array([ 700074.84594751])" 199 | ] 200 | }, 201 | "execution_count": 19, 202 | "metadata": {}, 203 | "output_type": "execute_result" 204 | } 205 | ], 206 | "source": [ 207 | "regr.predict(2650)" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": 23, 213 | "metadata": {}, 214 | "outputs": [ 215 | { 216 | "name": "stdout", 217 | "output_type": "stream", 218 | "text": [ 219 | "Variance score: 0.49\n" 220 | ] 221 | } 222 | ], 223 | "source": [ 224 | "# Explained variance score: 1 is perfect prediction\n", 225 | "print('Variance score: %.2f' % r2_score(kcsales_y_test, kcsales_y_pred))" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": 31, 231 | "metadata": {}, 232 | "outputs": [ 233 | { 234 | "data": { 235 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZQAAAD8CAYAAABQFVIjAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAIABJREFUeJzt3X2UXHWd5/H3tztpoDsiSRuYLCHV\n4ZjjLs6OCL1MQGfOSpCHjMdw5ugOTB+TEcbe6R1HXNedCdPOuqO2O+qOCgd56FHHQDcgw6pkmTDZ\ngOw565gBOivyIGTSQLqJoCSER9sFQ777x/1VUl1dD/dW3epb1fV5nfM7fetXv/tUldxv/R7u75q7\nIyIiUq+OrA9AREQWBgUUERFJhQKKiIikQgFFRERSoYAiIiKpUEAREZFUKKCIiEgqFFBERCQVCigi\nIpKKRVkfwHx5y1ve4n19fVkfhohIS9m1a9cBd18ep2zbBJS+vj4mJiayPgwRkZZiZlNxy6rJS0RE\nUlE1oJjZ28zswYL0spl93MyWmdkOM9sT/i4N5c3MrjazSTN7yMzOKNjWplB+j5ltKsg/08weDutc\nbWYW8hPvQ0REslE1oLj7bnc/3d1PB84EZoDvApuBe9x9DXBPeA1wEbAmpEHgOoiCA/Bp4DeBs4BP\n5wNEKDNYsN6FIT/RPkREJDtJm7zWAU+4+xSwAdgS8rcAF4flDcCNHvkn4AQzWwFcAOxw94Pu/gKw\nA7gwvHe8u+/0aC79G4u2lWQfIiKSkaQB5RLglrB8krs/CxD+nhjyTwaeLlhnX8irlL+vRH4t+xAR\nkYzEDihm1gW8H/i7akVL5HkN+bXsY3Yhs0EzmzCzif3791fZpIjIwjI+Pk5fXx8dHR309fUxPj7e\n0P0lqaFcBPxfd/95eP3zfDNT+PtcyN8HnFKw3krgmSr5K0vk17KPWdx91N373b1/+fJYw6hFRBaE\n8fFxBgcHmZqawt2ZmppicHCwoUElSUC5lKPNXQBbgfxIrU3AHQX5G8NIrLXAS6G5ajtwvpktDZ3x\n5wPbw3uvmNnaMLprY9G2kuxDRESA4eFhZmZmZuXNzMwwPDzcsH3GurHRzLqB9wL/viD7r4DbzOxy\nYBr4YMjfBqwHJolGhH0YwN0PmtlngQdCuc+4+8GwPAR8CzgOuCukxPsQEZHI9PR0ovw0WDSwauHr\n7+933SkvIu2ir6+Pqam5N7nncjn27t0beztmtsvd++OU1Z3yIiIL0MjICN3d3bPyuru7GRkZadg+\nFVBERBaggYEBRkdHyeVymBm5XI7R0VEGBgYatk81eYmISFlq8hIRkXmngCIiIqlQQBERkVQooIiI\nSCoUUEREJBUKKCIikgoFFBERSYUCioiIpEIBRUREUqGAIiIiqVBAERGRVCigiIhIKhRQREQkFQoo\nIiKSCgUUERFJhQKKiIikQgFFRERSESugmNkJZna7mT1uZo+Z2dlmtszMdpjZnvB3aShrZna1mU2a\n2UNmdkbBdjaF8nvMbFNB/plm9nBY52ozs5CfeB8iIpKNuDWUq4B/cPd/CbwDeAzYDNzj7muAe8Jr\ngIuANSENAtdBFByATwO/CZwFfDofIEKZwYL1Lgz5ifYhIiLZqRpQzOx44LeBbwC4++vu/iKwAdgS\nim0BLg7LG4AbPfJPwAlmtgK4ANjh7gfd/QVgB3BheO94d9/p0QPubyzaVpJ9iIhIRuLUUE4F9gN/\na2Y/MrOvm1kPcJK7PwsQ/p4Yyp8MPF2w/r6QVyl/X4l8atiHiIhkJE5AWQScAVzn7u8EfsHRpqdS\nrESe15BfSax1zGzQzCbMbGL//v1VNikiIvWIE1D2Afvc/b7w+naiAPPzfDNT+PtcQflTCtZfCTxT\nJX9liXxq2Mcs7j7q7v3u3r98+fIYpyoiIrWqGlDc/WfA02b2tpC1DvgJsBXIj9TaBNwRlrcCG8NI\nrLXAS6G5ajtwvpktDZ3x5wPbw3uvmNnaMLprY9G2kuxDREQysihmuT8Bxs2sC3gS+DBRMLrNzC4H\npoEPhrLbgPXAJDATyuLuB83ss8ADodxn3P1gWB4CvgUcB9wVEsBfJdmHiIhkx6KBVQtff3+/T0xM\nZH0YIiItxcx2uXt/nLK6U15ERFKhgCIiIqlQQBERkVQooIiISCoUUEREJBUKKCIikgoFFBERSYUC\nioiIpEIBRUREUqGAIiIiqVBAERGRVCigiIhIKhRQREQkFQooIiKSCgUUERFJhQKKiIikQgFFRERS\noYAiIiKpUEAREZFUKKCIiEgqYgUUM9trZg+b2YNmNhHylpnZDjPbE/4uDflmZleb2aSZPWRmZxRs\nZ1Mov8fMNhXknxm2PxnWtVr3ISIi2UhSQ3mPu5/u7v3h9WbgHndfA9wTXgNcBKwJaRC4DqLgAHwa\n+E3gLODT+QARygwWrHdhLfsQEZHs1NPktQHYEpa3ABcX5N/okX8CTjCzFcAFwA53P+juLwA7gAvD\ne8e7+053d+DGom0l2YeIiGQkbkBx4H+Z2S4zGwx5J7n7swDh74kh/2Tg6YJ194W8Svn7SuTXsg8R\nEcnIopjl3uXuz5jZicAOM3u8Qlkrkec15FcSa50Q/AYBVq1aVWWTIiJSj1g1FHd/Jvx9DvguUR/I\nz/PNTOHvc6H4PuCUgtVXAs9UyV9ZIp8a9lF83KPu3u/u/cuXL49zqiIiUqOqAcXMeszsTfll4Hzg\nEWArkB+ptQm4IyxvBTaGkVhrgZdCc9V24HwzWxo6488Htof3XjGztWF018aibSXZh4iIZCROk9dJ\nwHfDSN5FwM3u/g9m9gBwm5ldDkwDHwzltwHrgUlgBvgwgLsfNLPPAg+Ecp9x94NheQj4FnAccFdI\nAH+VZB8iIpIdiwZWLXz9/f0+MTGR9WGIiLQUM9tVcLtIRbpTXkREUqGAIiIiqVBAERGRVCigiIhI\nKhRQREQkFQooIiKSCgUUERFJhQKKiIikQgFFRERSoYAiIiKpUEAREZFUKKCIiEgqFFBERCQVCigi\nIpIKBRQREUmFAoqIiKRCAUVERFKhgCIiIqlQQBERkVQooIiISCpiBxQz6zSzH5nZneH1ajO7z8z2\nmNm3zawr5B8TXk+G9/sKtnFlyN9tZhcU5F8Y8ibNbHNBfuJ9iIhINpLUUK4AHit4/QXgK+6+BngB\nuDzkXw684O5vBb4SymFmpwGXAG8HLgSuDUGqE/gacBFwGnBpKJt4HyIikp1YAcXMVgK/A3w9vDbg\nXOD2UGQLcHFY3hBeE95fF8pvAG5199fc/SlgEjgrpEl3f9LdXwduBTbUuA8REclI3BrKV4E/BQ6H\n173Ai+5+KLzeB5wclk8GngYI778Uyh/JL1qnXH4t+xARkYxUDShm9j7gOXffVZhdoqhXeS+t/Gr7\nP8LMBs1swswm9u/fX2IVERFJS5wayruA95vZXqLmqHOJ
aiwnmNmiUGYl8ExY3gecAhDefzNwsDC/\naJ1y+Qdq2Mcs7j7q7v3u3r98+fIYpyoiIrWqGlDc/Up3X+nufUSd6t939wHgXuADodgm4I6wvDW8\nJrz/fXf3kH9JGKG1GlgD3A88AKwJI7q6wj62hnWS7kNERDKyqHqRsv4MuNXMPgf8CPhGyP8GcJOZ\nTRLVGi4BcPdHzew24CfAIeCP3f0NADP7KLAd6AS+6e6P1rIPERHJjrXLD/v+/n6fmJjI+jBERFqK\nme1y9/44ZXWnvIiIpEIBRUREUqGAItIixsfH6evro6Ojg76+PsbHx7M+JJFZ6umUF5F5Mj4+zuDg\nIDMzMwBMTU0xODgIwMDAQJaHJnKEaigiLWB4ePhIMMmbmZlheHg4oyMSmUsBRaQFTE9PJ8oXyYIC\nikgLWLVqVaJ8kSwooIi0gJGREbq7u2fldXd3MzIyktERicylgCLSAgYGBhgdHSWXy2Fm5HI5RkdH\n1SEvTUUBRaRAPUNz6x3WW2n98fFxhoeHmZ6eZtWqVYyMjCiYSPNx97ZIZ555potUMjY25t3d3U70\nKAQHvLu728fGxhq6brX16922SD2ACY95ndVcXiJBX18fU1NTc/JzuRx79+5t2LrV1gfq2rZIPZLM\n5aWAIhJ0dHRQ6v+DmXH48OESa6SzbrX1gbq2LVIPTQ4pUoN6hubWO6y30voaMiytQgFFJKhnaG69\nw3orra8hw9Iy4na2tHpSp7zEMTY25rlczs3Mc7lcoo7vetattn692xapFeqUn0t9KCIiyakPRURE\n5p0CiojMCz3PZeHT81BEpOH0PJf2ULWGYmbHmtn9ZvZjM3vUzP4y5K82s/vMbI+ZfdvMukL+MeH1\nZHi/r2BbV4b83WZ2QUH+hSFv0sw2F+Qn3oeINB89z6U9xGnyeg04193fAZwOXGhma4EvAF9x9zXA\nC8DlofzlwAvu/lbgK6EcZnYacAnwduBC4Foz6zSzTuBrwEXAacCloSxJ9yEizUnPc2kPVQNKGDn2\nani5OCQHzgVuD/lbgIvD8obwmvD+Ootu990A3Orur7n7U8AkcFZIk+7+pLu/DtwKbAjrJN2HSEOo\n/b8+ujmzPcTqlA81iQeB54AdwBPAi+5+KBTZB5wclk8GngYI778E9BbmF61TLr+3hn2IpC7f/j81\nNYW7H2n/V1CJTzdntodYAcXd33D304GVRDWKf1WqWPhbqqbgKeZX2scsZjZoZhNmNrF///4Sq4hU\nV679f9OmTaqxxKTnubSHRKO83P1FM/vfwFrgBDNbFGoIK4FnQrF9wCnAPjNbBLwZOFiQn1e4Tqn8\nAzXso/h4R4FRiG5sTHKuInnl2vnfeOMNQCOW4hoYGNDns8DFGeW13MxOCMvHAecBjwH3Ah8IxTYB\nd4TlreE14f3vh9v3twKXhBFaq4E1wP3AA8CaMKKri6jjfmtYJ+k+RFIXp51fI5ZE4jV5rQDuNbOH\niC7+O9z9TuDPgE+Y2SRR/8U3QvlvAL0h/xPAZgB3fxS4DfgJ8A/AH4emtEPAR4HtRIHqtlCWpPsQ\naYRS7f+laMSStDvN5SUSQ+EjeDs6Oo40dxXSA69kIdJcXiIl1Dr0t/h57oODgxqxJC1h71647jr4\n6U/naYdxpyVu9aTp69tbrc9lL7fe0NCQppOXpvPTn7p/9KPuMDe9/HJt2yTB9PWqoUhbqHXqj3Lr\nbdu2jb1793LTTTcB8KEPfWjehg/rJkvJe/55uPJKMIvSySfDNddkeEBxI0+rJ9VQWlNaD5Yys1m1\njHwys5rXq7XWU48s9inN46WX3D/zmdI1kErpsstq3ycJaiiZX+jnKymgtJ40L565XK5kYMjlcjWv\nV+s265HFPiU7MzPuf/3X7t3dyYPI4KD71FT9x6CAooCyIKR58awUnKo9erfcerXWeuqRxT5l/rz2\nmvv117ufeGLyADIw4L57d/rHpICigLIgpH3xLBU44tSCygUc1VCkXocOud90k3tfX/IAcvHF7g8+\n2PhjVEBRQFkQ5uPiWc8+1IciSR0+7P6d77i//e3JA8h73+u+c+f8H7MCigLKgjAfF896a0FpDRpI\nIot9Sm0OH3bfvt39rLOSB5BzznG/++5oG1lSQFFAWTCqXTzrvbj29vaqCUlS9YMfuL/nPckDyDve\n4b51a/YBpJgCigJKW6i3BjM2NuaLFy+eE0y6urr0q19i27XL/X3vSx5A3vpW91tucX/jjazPoDIF\nFAWUtlBvH0u59Xt7e91dTUtS2mOPuf/e7yUPICtWuH/96+6vv571GSSTJKAkeh6KSDOp9znl5cod\nPHjwyFMa83fJ65kn7WvvXvjsZ+Gb30y23vHHR+sNDsKxxzbk0JqOpl6RplduqpF6n1Neaf1ap2rR\ntCit79ln4WMfOzqdyerV8YJJZyd8/vPw8stRneSll6LttEswAdTkJc2t2g2J9fahpHnToob0tqYD\nB9w3b07ehAXuw8Puzz+f9Rk0FupDUUBZKKr1k5S7WTFu30dh2Z6eHu/o6Ci5v+L91nKs80n9P+XV\nOh8WuH/84+7PPpv1GcwvBRQFlKYX94JXrqYQVa5Lb7eWWsLQ0FDFQFJuO4XnUW69+Z4WRTWl2WZm\n3L/8ZfeenuQB5CMfcd+7N+szyJYCigJKU0tywSv3qz8/22/c8p2dnRUvqJ2dnRUDQrl7YIrPI+0a\nSi01jWaqKWWhnvmwfv/33R9/POszaC4KKAooTS3JBa9Sf0ap8pVqCpV+pVcKCEnPI07gi6PWmka7\nTSDZCvNhtTIFFAWUppb0gpekKanaRb7cr/RaaiiVglecgFRNrTWNhV5Dyc+H9eu/njyAnHee+w9/\nmPUZtJZUAwpwCnAv8BjwKHBFyF8G7AD2hL9LQ74BVwOTwEPAGQXb2hTK7wE2FeSfCTwc1rkasFr3\nUS4poDSPpBe8pDWarq6uREHIvbY+lDg1lHou4rXWNBZaH8pCmA+rlaUdUFbkL9jAm4B/Bk4Dvghs\nDvmbgS+E5fXAXeGivxa4z48GhyfD36VhOR8g7gfODuvcBVwU8hPto1JSQMlGrVPGF28jSfly83NV\nu8APDQ1VrKkUr1+tD6Xei3i9MyGnPcprPkeOLbT5sFpZqgFlzgpwB/BeYDewwo8Gnd1h+Qbg0oLy\nu8P7lwI3FOTfEPJWAI8X5B8pl3QflY5bAWX+1fpQq3Lbilu+UlNU3ItgpW0MDQ2VPK7e3l7v7e1N\n7YLbTDWNRh/L5z73937ccTsSB5C3vtX95pujfhRpjIYFFKAPmAaOB14seu+F8PdO4N0F+fcA/cAn\ngU8V5P9FyOsH7i7I/y3gzrCcaB+Vjl0BZf41ui0/6YOv8nN01XPspYJKIzXL/SRpf5ftNh9WK2tI\nQAGWALuA3/XKF/u/L3GxPxP4zyUCyn8C/k2JgPI/a9lHiWMeBCaAiVWrVjXq85Yyqt2bUW4obpwL\naCPvoC+3/cL
U2dlZ12fTLIEirnpHjj31lPtllyUPIMcf737VVe6//GVjz0/KSz2gAIuB7cAnCvLU\n5CUl5S+WlX7hl7rQp3F/Sv4Xc2F/SGdnZ8kaRbWLerWO+no+n2ZpyooraQ3lmWfc/+RPkgcQ+JXD\nZoclC3aYc6tJNaAQdXzfCHy1KP9LzO4w/2JY/h1md5jfH/KXAU8RdcgvDcvLwnsPhLL5Tvn1teyj\nUlJAmR9xb/YrdVGqdlNi3LvSSx1Dfp0kAwMqBcV6aiitOKy32udVz3xYxx9/tcPSlvo82knaAeXd\n4Qt+CHgwpPVAL1FT057wNx8cDPga8ATRUOD+gm1dRjTUdxL4cEF+P/BIWOcajg4bTryPckkBZX7E\nrZkUX+zdvWKZxYsXVxwOXHgRinMM5ebsyl/ExsbGKq5fTx9Kq954WBjQTznlNP/ABx6sKYAUz4fV\nijW2dpJqQFkoSQFlfsS92a8w9fb2VrwjPm6qNlNw3OBWrZZV74W/FWsojZ4Pq9X6lNqJAooCSmZq\nqaEAvmTJkrqCQJxRXnFS3BpOPVrhF7nmw5I8BRQFlMzU0odSTyr1qz7OXe/11HDSqEnU+4s87V/0\nhw65j425r16dPIBs2JDNfFiq1cwPBRQFlEwlGeVVbyrVl5Fk352dnYlqOMWj0grvys833c3H51tv\nDafe+bD+8R8beIIxtEItb6FQQFFAaQq11hSSpuILeZI+lFIXoHK1LDM7EsDGxsZ88eLFc8p0dXU1\n5KJW+Gu83PQwlWpO9cyHdfbZR+fDapZaQSv2Q7UqBRQFlLrFuXCUK1P8y32+Uk9PT6LaUeGd88Xn\nMjQ0VPIc8r+Ca50vrJbvIB/M4pxToR/8wP3cc5MHkN/4Dfc77pg7H1Yz1QpadaRcK1JAUUCpS6kL\nx+LFi2fNUzU0NFTy4lIqv5ZU6zYWL14c6xji3FBZLmhUC5b1XNRqCSJH0zs9l3socQCBPQ6X+KpV\nqyseWzPVCprpWBY6BRQFlFmSNlPE+YVf7mJXbbbeuKl40sUk282fY7XtJznfJCnpRa32IPI2h1tq\nCCDPOFzusGjOd1pJM9UKmqm2tNApoCigHFHLf7x67wepN5WaxDFpM9rY2FjZmxcLm8bSPtekfSjJ\nRsXlHL5eQwB5yeFjDsdW3H61QNhstYJm6c9Z6BRQFFCOKHcRLnURSGN0VpwLdLUylS4McQJLR0dH\nxYv0okWLSnao15s6OjoqTmZZakRY5c/71xyuqiGAHHK40iH+vT1xAqFqBe1JAUUBxd2rTx9S3JFe\nb9+Hmfmxx1b+FdzR0eHr1q0r24S1bt26WOfW09OTekBIeq7l3ssHi2oBY25QW+rw+RoCiDt8zmFZ\nTeeSZLizagXtRwFFAcXd4/UN5H9h1lIzqbW5aNGiRWWbo7q6uuaMsCq84GU1gqxUqvSZdXZ2xqgF\nLXH4VE0B5Ior3K+55vY5I9OS/ChQ7ULiUEBRQHH37PtCFnLKNxkmC8THOnzc4ZUagsiow6qqgaDw\nx0G+Fljq38F83YQprS9JQOlAFqxVq1ZlfQgL1vr16wGYnp6uUGoR8BHgZ0TX8V8CXyF6Vl0148Db\niCbWNqJnxUX7mpmZYXh4uORaAwMD7N27F3fn0KFDuDs33XQTuVwOMyOXyzE2NsaBAwcYGBiIc6pz\nj2x8nL6+Pjo6Oujr62N8fLym7cgCFDfytHpqxxpKGjP4KpVPvb29RX05HQ6/7/BEDTWQ7zm8I/a+\ns7qBTx3z7QfVUASiX6vRvwdphOeff55f/OI8okcFOfAGUc3i1Bhr7wDO4WgN5GLgx7H3Xar2OR81\nh+HhYWZmZmblVaoxSXtRQFngcrlc1ofQUo455pgqJc4DdnL0B/r3gH8dY8s7w7r5AHI+sBMzS3yM\n3d3djIyMzMobHx9ncHCQqakp3J2pqSkGBwdTDyrlmvgqN/1Ju1BAWWCKf6WuX7+e7u7urA+rZbz2\n2mtFOecAd3M0gOwgeup0NT8GNnA0gJxD9NDR2dy9ZFDp7e1laGjoyA+Czs5OIPqBMDo6Oqf/Y75q\nDuX65dRfJ4D6UBaScu3bhfd9xJnCJK3pU1ozne5wRw19INF8WFE/yuxtVpoXrJaZg0uZr2lR1IfS\nftCw4fYMKOWGsKpjvlJ6m8PNNQSQZxz+0IvnwyqV8jc5lroQl1snaSCYz2lRdHNje1FAadOAosAR\nJ+W81vmwuruv9I985GOJJ6rMK3UhLhcIent7E120VXOQRkk1oADfBJ4DHinIW0bUmLwn/F0a8g24\nGpgkGvpyRsE6m0L5PcCmgvwzgYfDOlcDVus+KqVWDyhxfhWmPWvuwkj1zIf15148H1ZXV1fsfce5\noJcKBF1dXXPuso+7LdUcJG1pB5TfBs4oCihfBDaH5c3AF8LyeuCucNFfC9znR4PDk+Hv0rCcDxD3\nA2eHde4CLqplH9VSKweUcvNsFd/tPN/Pc2/OVO98WEur7qNcDSVpraLweytcL8mEniKNlmpAibZH\nX1FA2Q2sCMsrgN1h+Qbg0uJywKXADQX5N4S8FcDjBflHyiXdR7VzaOWAUqnmkf/lWmrKjawnUJyf\ntMRhuMYA8hWHk2rabyObl5J0sKtWIo02HwHlxaL3Xwh/7wTeXZB/D9APfBL4VEH+X4S8fuDugvzf\nAu6sZR/VzqHZA0qlC0O1vpHe3t6SzSbZX+wbkeqdD6t8cI6bCvs/GnEhj9vBrn4TmQ9JAkra96GU\nukvLa8ivZR9zC5oNmtmEmU3s37+/ymazU+qmtA996EOYGX19fSxbtqzi+s8///ycexBef/31Rh7y\nPKpnPqybmTsf1lTdRzQ1NcXw8DAjIyMcPnyYvXv3AqR2l/rIyMice4fM7Mj8YXm6a12aTpyog5q8\nGqpaZ3pXV1fZ6d4XXqp3PqzT5+1YC5sb064pDA0NzamZFm+zmR7JKwsX89Dk9SVmd5h/MSz/DrM7\nzO8P+cuAp4g65JeG5WXhvQdC2Xyn/Ppa9lEtNVtAadQjaFszbXB4KEHgyKcdDmdneuy5XK4h94DE\n2WazPZJXFiZSHuV1C/As8CtgH3A50EvUd7En/M0HBwO+BjxBNBS4v2A7lxEN9Z0EPlyQ3w88Eta5\nhqPDhhPvo1JqloDSTA+Iyi6d57CzSrAolX4Y1s36+I8mM0u1plDtYWeF21QfiswHdGNjcwaU9h3W\ne7bD3TUEkB87vL8Jjr986u3tTW36lDj/Pkp1zBc/tVGjviRNCihNGlDap2byDq9tPqxJh0u91HxY\nzZoqPeZ3aGgo0b+Pan1p1WofqrFIIyigNGFAGRoayvzi17hU63xYz3o0H1a1Z683Z6p2n09aEzzm\nt1UtMKhPRRohSUBZhDTc+Pg4119/fdaHkaIc8CngDxOu9wrRLUg3AP8v7YOad8VDdoslfUbIqlWr\nmJqaO6w5l8sdGZpcy/70rBKZL3oeSoONj4+zcePGqDrYsn4NuIqjP3r3
Ei+YHAaGgTcRjaU4Pmyn\n9YNJLper+gyQpM8IKXX/SamHaSXdn55VIvMmblWm1VMWTV5jY2Mtesf6UoeRhM1X+TTicebDavU0\nNDRUsRO91r6Leu7AVx+KNALqQ8k+oIyNjbXQg6ry82G9UUMA+apHM/pmfQ7ppCVLlnhvb2/siRpL\nzaGW5egqze0laVNAmeeAUvyfeN26dU1+w2J+PqyXawggf+NpzIfVzKmrq+vIhbjS91jt34Eu5rIQ\nKKDMY0BpjXtLFjl8xKNRVUkDyM0ejeLK+hzST5WCRWdnZ8WbDM2s6qMD1NwkC4ECSoMDSuEv0eac\nY6ue+bDu8PmcDyurFKc5sru7u+ScWvmkaVCkHSQJKBrllVDxzMCHDx/O+pCCDcCPia5jbwDjwKkx\n1rsHeBdHZ+TdADzYoGNsHoODg+RyuYplZmZm2LZtW/TLq4TC4bgasiuiYcOJlZoyPBvnATs5+mP4\ne8BvxFhvJ/BejgaQ84AfNugYm9e2bdtiDcednp4uG3gKh+NqyK6IAkos4+PjR551UerGs/lxNnA3\nRwPIDqLJlqt5iKjWkQ8g54TttLfp6WkGBgaqllu1alWs+0PqvYdEZEGI2zbW6qnWPpSxsbGK8zU1\nLp3kcInDDQn7QCY96j9pxr6d5kn5vo1K86sVdqrHGcGlUV6yEKFO+XQCytjY2Dxe5HodftfhGodH\nEwSQ1p4PK4tUHChK3Xza29tt0c5bAAAH6klEQVSrgCDiCiipBJTG10xWOfw7hy87PJgggLzs0T0k\nx2Z+YW6l1NnZWbbmoJqFSHlJAkr+YVYLXn9/v09MTMQu39fXl3J/yUnAlcAVMcu/RtSBfm9Iu4Bm\nGAyQLTNj2bJlPP/883Pe6+zsZHBwkC1btswaONHd3c3o6GisPhMRmc3Mdrl7f5yy6pQvo/7hnkuB\nEY7+SP4ZlYPJr4hGW40A68L67wE+A/wfFEwif/RHf8RVV11VsgN8y5YtXHvttYyOjpLL5TAzcrmc\ngonIfIlblWn1lLTJq9rDjuamJQ5/7nAoQfOVO3zB4cKwfvZNQ41O+WlNKs0wsGTJkjnNjWY264FV\naqYSmR+oD6X+gFK9D+UYh485vJQwgLjDqEd9KNlf4OczFXd0VwoKChgizSFJQFEfSgXj4+NcccUV\nob1+EfAHRE1QKxLu/eaw3u6E67WOnp4ebrjhBjUtiSwwbdGHYmYXmtluM5s0s82N2MfAwAC3336A\n6Af2r4C/IV4wuQN4J0dvJhygVYNJLpdjbGyMsbGxWf0SY2Njs36ZvPrqqwomIm2uJWsoZtYJ/DPR\nHCL7gAeAS939J+XWqaWG8sorsHQpvPFGtZL3AP+FZp3CpKenh40bN7Jt2zamp6fp6enh1VdfnVWm\no6ODw4cPk8vlGBkZUXAQESBZDaVVnyl/FjDp7k8CmNmtRPOLlA0otTh8GHp74bnnit/ZSRRAsp/C\n5Nhjj6Wnp2fOMFo1QYnIfGvVJq+TgacLXu8Leal685th50740pfgu9+NAszY2Djd3efRyGDS29s7\nq0lpaGiIzs5OILrXYmho6Mh7v/zlLzlw4MCczjE1QYnIfGvVgGIl8ua03ZnZoJlNmNnE/v37a9rR\nqafCJz8JF18MZlG/SvF9DoUX/3xfQzWl+iHy6cCBA7OCwbXXXsuhQ4dwdw4dOsS1115b07mIiDRS\nq/ahnA38V3e/ILy+EsDd/1u5dWrpQxERaXftMMrrAWCNma02sy7gEmBrxsckItLWWrJT3t0PmdlH\nge1AJ/BNd38048MSEWlrLRlQANx9G7At6+MQEZFIqzZ5iYhIk1FAERGRVCigiIhIKlpy2HAtzGw/\nkH9i1luAAxkeTtZ0/jp/nX/7Snr+OXdfHqdg2wSUQmY2EXdc9UKk89f56/x1/o3Ytpq8REQkFQoo\nIiKSinYNKKNZH0DGdP7tTeff3hp2/m3ZhyIiIulr1xqKiIikrO0Cynw8OjgLZnaKmd1rZo+Z2aNm\ndkXIX2ZmO8xsT/i7NOSbmV0dPoeHzOyMgm1tCuX3mNmmrM4pKTPrNLMfmdmd4fVqM7svnMe3w0Si\nmNkx4fVkeL+vYBtXhvzdZnZBNmeSnJmdYGa3m9nj4d/A2W323f/H8O/+ETO7xcyOXejfv5l908ye\nM7NHCvJS+87N7Ewzezisc7WZlXpsyGylnsexUBPRRJJPAKcCXcCPgdOyPq6Uzm0FcEZYfhPRI5JP\nA74IbA75m4EvhOX1wF1Ez5ZZC9wX8pcBT4a/S8Py0qzPL+Zn8AngZuDO8Po24JKwfD0wFJb/A3B9\nWL4E+HZYPi38mzgGWB3+rXRmfV4xz30L8IdhuQs4oV2+e6KH6z0FHFfwvf/BQv/+gd8GzgAeKchL\n7TsH7gfODuvcBVxU9Ziy/lDm+Qs4G9he8PpK4Mqsj6tB53oH8F5gN7Ai5K0AdoflG4BLC8rvDu9f\nCtxQkD+rXLMmYCVwD3AucGf4T3AAWFT83RPNUn12WF4Uylnxv4fCcs2cgOPDBdWK8tvlu88/wXVZ\n+D7vBC5oh+8f6CsKKKl85+G9xwvyZ5Url9qtyWteHh2ctVCFfydwH3CSuz8LEP6eGIqV+yxa9TP6\nKvCnwOHwuhd40d0PhdeF53HkHMP7L4XyrXrupwL7gb8NTX5fN7Me2uS7d/efAv8dmAaeJfo+d9E+\n33+htL7zk8NycX5F7RZQYj06uJWZ2RLgfwAfd/eXKxUtkecV8puWmb0PeM7ddxVmlyjqVd5ruXMP\nFhE1fVzn7u8EfkHU3FHOgjr/0E+wgaiZ6l8APcBFJYou1O8/jqTnXNNn0W4BZR9wSsHrlcAzGR1L\n6sxsMVEwGXf374Tsn5vZivD+CuC5kF/us2jFz+hdwPvNbC9wK1Gz11eBE8ws/8yfwvM4co7h/TcD\nB2nNc4fouPe5+33h9e1EAaYdvnuA84Cn3H2/u/8K+A5wDu3z/RdK6zvfF5aL8ytqt4CyYB8dHEZg\nfAN4zN2/XPDWViA/cmMTUd9KPn9jGP2xFngpVJG3A+eb2dLwy+/8kNe03P1Kd1/p7n1E3+n33X0A\nuBf4QChWfO75z+QDobyH/EvCKKDVwBqijsmm5u4/A542s7eFrHXAT2iD7z6YBtaaWXf4f5A//7b4\n/ouk8p2H914xs7XhM91YsK3ysu5UyqATaz3RCKgngOGsjyfF83o3UZX0IeDBkNYTtQ3fA+wJf5eF\n8gZ8LXwODwP9Bdu6DJgM6cNZn1vCz+HfcnSU16lEF4RJ4O+AY0L+seH1ZHj/1IL1h8NnspsYo1qa\nJQGnAxPh+/8e0Yidtvnugb8EHgceAW4iGqm1oL9/4BaiPqNfEdUoLk/zOwf6w+f5BHANRYM+SiXd\nKS8iIqlotyYvERFpEAUUERFJhQKKiIikQgFFRERSoYAiIiKpUEAREZFUKKCIiEgqFFBERCQV/x9X\n3s6grQJywAAAAABJRU5ErkJ
ggg==\n", 236 | "text/plain": [ 237 | "" 238 | ] 239 | }, 240 | "metadata": {}, 241 | "output_type": "display_data" 242 | } 243 | ], 244 | "source": [ 245 | "# Plot outputs\n", 246 | "\n", 247 | "plt.scatter(kcsales_X_test, kcsales_y_test, color='black')\n", 248 | "plt.plot(kcsales_X_test, kcsales_y_pred, color='blue', linewidth=3)" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": null, 254 | "metadata": { 255 | "collapsed": true 256 | }, 257 | "outputs": [], 258 | "source": [ 259 | "plt.show()\n", 260 | "plt.close()" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 27, 266 | "metadata": { 267 | "collapsed": true 268 | }, 269 | "outputs": [], 270 | "source": [ 271 | "#plt.xticks(())\n", 272 | "#plt.yticks(())" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": 30, 278 | "metadata": { 279 | "collapsed": true 280 | }, 281 | "outputs": [], 282 | "source": [] 283 | } 284 | ], 285 | "metadata": { 286 | "kernelspec": { 287 | "display_name": "Python 3", 288 | "language": "python", 289 | "name": "python3" 290 | }, 291 | "language_info": { 292 | "codemirror_mode": { 293 | "name": "ipython", 294 | "version": 3 295 | }, 296 | "file_extension": ".py", 297 | "mimetype": "text/x-python", 298 | "name": "python", 299 | "nbconvert_exporter": "python", 300 | "pygments_lexer": "ipython3", 301 | "version": "3.6.3" 302 | } 303 | }, 304 | "nbformat": 4, 305 | "nbformat_minor": 2 306 | } 307 | -------------------------------------------------------------------------------- /Regression Week 2 - Multiple Regression/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Regression Week 2 - Multiple Regression/.DS_Store -------------------------------------------------------------------------------- /Regression Week 3 - Accessing Performance/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Regression Week 3 - Accessing Performance/.DS_Store -------------------------------------------------------------------------------- /Regression Week 4 - Ridge/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Regression Week 4 - Ridge/.DS_Store -------------------------------------------------------------------------------- /Regression Week 5 - Lasso/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Regression Week 5 - Lasso/.DS_Store -------------------------------------------------------------------------------- /Regression Week 5 - Lasso/W5 - Assigment 1/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Regression Week 5 - Lasso/W5 - Assigment 1/.DS_Store -------------------------------------------------------------------------------- /Regression Week 5 - Lasso/W5 - Assigment 
1/Yun-week-5-ridge-regression-assignment-1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Regression Week 5: Feature Selection and LASSO (Interpretation)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "In this notebook, you will use LASSO to select features, building on a pre-implemented solver for LASSO (using scikit-learn). You will:\n", 15 | "* Run LASSO with different L1 penalties.\n", 16 | "* Choose the best L1 penalty using a validation set.\n", 17 | "* Choose the best L1 penalty using a validation set, with an additional constraint on the size of the feature subset.\n", 18 | "\n", 19 | "In the second notebook, you will implement your own LASSO solver, using coordinate descent. " 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "# Fire up pandas, numpy and sklearn" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 4, 32 | "metadata": { 33 | "collapsed": true 34 | }, 35 | "outputs": [], 36 | "source": [ 37 | "import pandas as pd\n", 38 | "import numpy as np\n", 39 | "from sklearn import datasets, linear_model\n", 40 | "from math import log" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "# Load in house sales data" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": 6, 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [ 56 | "# Load the full sales dataset\n", 57 | "sales = pd.read_csv('kc_house_data.csv')" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": { 63 | "collapsed": true 64 | }, 65 | "source": [ 66 | "# Create new features" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "Create new features by performing the following transformations on the inputs:" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 8, 79 | "metadata": { 80 | "collapsed": true 81 | }, 82 | "outputs": [], 83 | "source": [ 84 | "from math import log, sqrt\n", 85 | "sales['sqft_living_sqrt'] = sales['sqft_living'].apply(sqrt)\n", 86 | "sales['sqft_lot_sqrt'] = sales['sqft_lot'].apply(sqrt)\n", 87 | "sales['bedrooms_square'] = sales['bedrooms']*sales['bedrooms']\n", 88 | "sales['floors_square'] = sales['floors']*sales['floors']" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 10, 94 | "metadata": {}, 95 | "outputs": [ 96 | { 97 | "data": { 98 | "text/plain": [ 99 | "dtype('float64')" 100 | ] 101 | }, 102 | "execution_count": 10, 103 | "metadata": {}, 104 | "output_type": "execute_result" 105 | } 106 | ], 107 | "source": [ 108 | "sales['floors_square'].dtype" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "*Squaring bedrooms will increase the separation between not many bedrooms (e.g. 1) and lots of bedrooms (e.g. 4) since 1^2 = 1 but 4^2 = 16. Consequently this variable will mostly affect houses with many bedrooms.\n", 116 | "\n", 117 | "*On the other hand, taking the square root of sqft_living will decrease the separation between a big house and a small house. The owner may not be exactly twice as happy about getting a house that is twice as big." 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "2.. Using the entire house dataset, learn regression weights using an L1 penalty of 5e2. 
Make sure to add \"normalize=True\" when creating the Lasso object. Refer to the following code snippet for the list of features.\n", 125 | "\n", 126 | "Note. From here on, the list 'all_features' refers to the list defined in this snippet." 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": 11, 132 | "metadata": {}, 133 | "outputs": [ 134 | { 135 | "data": { 136 | "text/plain": [ 137 | "Lasso(alpha=500.0, copy_X=True, fit_intercept=True, max_iter=1000,\n", 138 | " normalize=True, positive=False, precompute=False, random_state=None,\n", 139 | " selection='cyclic', tol=0.0001, warm_start=False)" 140 | ] 141 | }, 142 | "execution_count": 11, 143 | "metadata": {}, 144 | "output_type": "execute_result" 145 | } 146 | ], 147 | "source": [ 148 | "all_features = ['bedrooms', 'bedrooms_square',\n", 149 | " 'bathrooms',\n", 150 | " 'sqft_living', 'sqft_living_sqrt',\n", 151 | " 'sqft_lot', 'sqft_lot_sqrt',\n", 152 | " 'floors', 'floors_square',\n", 153 | " 'waterfront', 'view', 'condition', 'grade',\n", 154 | " 'sqft_above',\n", 155 | " 'sqft_basement',\n", 156 | " 'yr_built', 'yr_renovated']\n", 157 | "\n", 158 | "model_all = linear_model.Lasso(alpha=5e2, normalize=True) # set parameters\n", 159 | "model_all.fit(sales[all_features], sales['price']) # learn weights" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "3.. Quiz Question: Which features have been chosen by LASSO, i.e. which features were assigned nonzero weights?" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": 12, 172 | "metadata": {}, 173 | "outputs": [ 174 | { 175 | "data": { 176 | "text/plain": [ 177 | "array([ 0. , 0. , 0. , 134.43931396,\n", 178 | " 0. , 0. , 0. , 0. ,\n", 179 | " 0. , 0. , 24750.00458561, 0. ,\n", 180 | " 61749.10309071, 0. , 0. , -0. ,\n", 181 | " 0. 
])" 182 | ] 183 | }, 184 | "execution_count": 12, 185 | "metadata": {}, 186 | "output_type": "execute_result" 187 | } 188 | ], 189 | "source": [ 190 | "model_all.coef_" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": {}, 196 | "source": [ 197 | "Quiz Answer: 'sqft_living', 'view', 'grade'\n" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": 13, 203 | "metadata": { 204 | "collapsed": true 205 | }, 206 | "outputs": [], 207 | "source": [ 208 | "testing = pd.read_csv('wk3_kc_house_test_data.csv')\n", 209 | "training = pd.read_csv('wk3_kc_house_train_data.csv')\n", 210 | "validation = pd.read_csv('wk3_kc_house_valid_data.csv')" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "Make sure to create the 4 features as we did in #1:" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": 14, 223 | "metadata": { 224 | "collapsed": true 225 | }, 226 | "outputs": [], 227 | "source": [ 228 | "testing['sqft_living_sqrt'] = testing['sqft_living'].apply(sqrt)\n", 229 | "testing['sqft_lot_sqrt'] = testing['sqft_lot'].apply(sqrt)\n", 230 | "testing['bedrooms_square'] = testing['bedrooms']*testing['bedrooms']\n", 231 | "testing['floors_square'] = testing['floors']*testing['floors']\n", 232 | "\n", 233 | "training['sqft_living_sqrt'] = training['sqft_living'].apply(sqrt)\n", 234 | "training['sqft_lot_sqrt'] = training['sqft_lot'].apply(sqrt)\n", 235 | "training['bedrooms_square'] = training['bedrooms']*training['bedrooms']\n", 236 | "training['floors_square'] = training['floors']*training['floors']\n", 237 | "\n", 238 | "validation['sqft_living_sqrt'] = validation['sqft_living'].apply(sqrt)\n", 239 | "validation['sqft_lot_sqrt'] = validation['sqft_lot'].apply(sqrt)\n", 240 | "validation['bedrooms_square'] = validation['bedrooms']*validation['bedrooms']\n", 241 | "validation['floors_square'] = validation['floors']*validation['floors']" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "metadata": {}, 247 | "source": [ 248 | "5.. Now for each l1_penalty in [10^1, 10^1.5, 10^2, 10^2.5, ..., 10^7] (to get this in Python, type np.logspace(1, 7, num=13).)\n", 249 | "\n", 250 | "~Learn a model on TRAINING data using the specified l1_penalty. Make sure to specify normalize=True in the constructor:\n", 251 | "\n", 252 | "~Compute the RSS on VALIDATION for the current model (print or save the RSS)\n", 253 | "\n", 254 | "~Report which L1 penalty produced the lowest RSS on VALIDATION."
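A minimal sketch of step 5 factored into a reusable helper; the function name `rss_for_penalty` is hypothetical, and the snippet assumes the same pre-1.0 scikit-learn `normalize=True` Lasso API used throughout this notebook. The inline loop in the next cell computes the same quantities:

```python
def rss_for_penalty(l1_penalty, train_df, valid_df, features, target='price'):
    # Fit LASSO on TRAINING data with the given L1 penalty.
    model = linear_model.Lasso(alpha=l1_penalty, normalize=True)
    model.fit(train_df[features], train_df[target])
    # Residual sum of squares (RSS) on VALIDATION data.
    errors = model.predict(valid_df[features]) - valid_df[target]
    return (errors ** 2).sum()

# e.g. the best penalty would then be:
# min(np.logspace(1, 7, num=13),
#     key=lambda p: rss_for_penalty(p, training, validation, all_features))
```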
255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": 26, 260 | "metadata": {}, 261 | "outputs": [ 262 | { 263 | "name": "stdout", 264 | "output_type": "stream", 265 | "text": [ 266 | "3.982133273e+14\n", 267 | "10.0\n", 268 | "3.99041900253e+14\n", 269 | "31.6227766017\n", 270 | "4.29791604073e+14\n", 271 | "100.0\n", 272 | "4.63739831045e+14\n", 273 | "316.227766017\n", 274 | "6.45898733634e+14\n", 275 | "1000.0\n", 276 | "1.22250685943e+15\n", 277 | "3162.27766017\n", 278 | "1.22250685943e+15\n", 279 | "10000.0\n", 280 | "1.22250685943e+15\n", 281 | "31622.7766017\n", 282 | "1.22250685943e+15\n", 283 | "100000.0\n", 284 | "1.22250685943e+15\n", 285 | "316227.766017\n", 286 | "1.22250685943e+15\n", 287 | "1000000.0\n", 288 | "1.22250685943e+15\n", 289 | "3162277.66017\n", 290 | "1.22250685943e+15\n", 291 | "10000000.0\n" 292 | ] 293 | }, 294 | { 295 | "data": { 296 | "text/plain": [ 297 | "398213327300134.94" 298 | ] 299 | }, 300 | "execution_count": 26, 301 | "metadata": {}, 302 | "output_type": "execute_result" 303 | } 304 | ], 305 | "source": [ 306 | "RSS = []\n", 307 | "for l1_penalty in np.logspace(1, 7, num=13):\n", 308 | "    # Learn a model on TRAINING data using the specified l1_penalty.\n", 309 | "    model = linear_model.Lasso(alpha=l1_penalty, normalize=True)\n", 310 | "    model_1 = model.fit(training[all_features], training['price'])\n", 311 | "    # Predict prices on VALIDATION data using model_1\n", 312 | "    valid_prediction = model_1.predict(validation[all_features])\n", 313 | "    # Calculate errors\n", 314 | "    valid_errors = valid_prediction - validation['price']\n", 315 | "    # Calculate RSS\n", 316 | "    valid_RSS = (valid_errors * valid_errors).sum()\n", 317 | "    RSS.append(valid_RSS)\n", 318 | "    print(valid_RSS)\n", 319 | "    print(l1_penalty)\n", 320 | "min(RSS)" 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "6.. Quiz Question: Which was the best value for the l1_penalty, i.e. which value of l1_penalty produced the lowest RSS on VALIDATION data?\n", 328 | "\n", 329 | "Answer: 10" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": {}, 335 | "source": [ 336 | "7.. Now that you have selected an L1 penalty, compute the RSS on TEST data for the model with the best L1 penalty." 337 | ] 338 | }, 339 | { 340 | "cell_type": "code", 341 | "execution_count": 28, 342 | "metadata": {}, 343 | "outputs": [ 344 | { 345 | "data": { 346 | "text/plain": [ 347 | "98467402552698.797" 348 | ] 349 | }, 350 | "execution_count": 28, 351 | "metadata": {}, 352 | "output_type": "execute_result" 353 | } 354 | ], 355 | "source": [ 356 | "l1_penalty = 10\n", 357 | "model = linear_model.Lasso(alpha=l1_penalty, normalize=True)\n", 358 | "model_2 = model.fit(training[all_features], training['price'])\n", 359 | "# Predict prices on TEST data using model_2\n", 360 | "test_prediction = model_2.predict(testing[all_features])\n", 361 | "# Calculate errors\n", 362 | "test_errors = test_prediction - testing['price']\n", 363 | "# Calculate RSS\n", 364 | "test_RSS = (test_errors * test_errors).sum()\n", 365 | "test_RSS" 366 | ] 367 | }, 368 | { 369 | "cell_type": "markdown", 370 | "metadata": {}, 371 | "source": [ 372 | "8.. Quiz Question: Using the best L1 penalty, how many nonzero weights do you have? Count the number of nonzero coefficients first, and add 1 if the intercept is also nonzero. 
A succinct way to do this is" 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": 29, 378 | "metadata": {}, 379 | "outputs": [ 380 | { 381 | "data": { 382 | "text/plain": [ 383 | "15" 384 | ] 385 | }, 386 | "execution_count": 29, 387 | "metadata": {}, 388 | "output_type": "execute_result" 389 | } 390 | ], 391 | "source": [ 392 | "np.count_nonzero(model_2.coef_) + np.count_nonzero(model_2.intercept_)" 393 | ] 394 | }, 395 | { 396 | "cell_type": "markdown", 397 | "metadata": {}, 398 | "source": [ 399 | "# Limit the number of nonzero weights" 400 | ] 401 | }, 402 | { 403 | "cell_type": "markdown", 404 | "metadata": {}, 405 | "source": [ 406 | "9.. What if we absolutely wanted to limit ourselves to, say, 7 features? This may be important if we want to derive \"a rule of thumb\" --- an interpretable model that has only a few features in it.\n", 407 | "\n", 408 | "You are going to implement a simple, two-phase procedure to achieve this goal:\n", 409 | " \n", 410 | "~Explore a large range of ‘l1_penalty’ values to find a narrow region of ‘l1_penalty’ values where models are likely to have the desired number of non-zero weights.\n", 411 | "\n", 412 | "~Further explore the narrow region you found to find a good value for ‘l1_penalty’ that achieves the desired sparsity. Here, we will again use a validation set to choose the best value for ‘l1_penalty’." 413 | ] 414 | }, 415 | { 416 | "cell_type": "markdown", 417 | "metadata": {}, 418 | "source": [ 419 | "10.. Assign 7 to the variable ‘max_nonzeros’." 420 | ] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": 30, 425 | "metadata": { 426 | "collapsed": true 427 | }, 428 | "outputs": [], 429 | "source": [ 430 | "max_nonzeros = 7" 431 | ] 432 | }, 433 | { 434 | "cell_type": "markdown", 435 | "metadata": {}, 436 | "source": [ 437 | "## Exploring the larger range of values to find a narrow range with the desired sparsity" 438 | ] 439 | }, 440 | { 441 | "cell_type": "markdown", 442 | "metadata": {}, 443 | "source": [ 444 | "11.. Exploring large range of l1_penalty\n", 445 | "\n", 446 | "For l1_penalty in np.logspace(1, 4, num=20):\n", 447 | "\n", 448 | "Fit a regression model with a given l1_penalty on TRAIN data. Add \"alpha=l1_penalty\" and \"normalize=True\" to the parameter list.\n", 449 | "\n", 450 | "Extract the weights of the model and count the number of nonzeros. Take account of the intercept as we did in #8, adding 1 whenever the intercept is nonzero. Save the number of nonzeros to a list." 451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": 32, 456 | "metadata": {}, 457 | "outputs": [ 458 | { 459 | "data": { 460 | "text/plain": [ 461 | "[15, 15, 15, 15, 13, 12, 11, 10, 7, 6, 6, 6, 5, 3, 3, 2, 1, 1, 1, 1]" 462 | ] 463 | }, 464 | "execution_count": 32, 465 | "metadata": {}, 466 | "output_type": "execute_result" 467 | } 468 | ], 469 | "source": [ 470 | "num = []\n", 471 | "for l1_penalty in np.logspace(1, 4, num=20):\n", 472 | "    # Learn a model on TRAINING data using the specified l1_penalty.\n", 473 | "    model = linear_model.Lasso(alpha=l1_penalty, normalize=True)\n", 474 | "    model_3 = model.fit(training[all_features], training['price'])\n", 475 | "    nonzeros = np.count_nonzero(model_3.coef_) + np.count_nonzero(model_3.intercept_)\n", 476 | "    num.append(nonzeros)\n", 477 | "num" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": 33, 483 | "metadata": {}, 484 | "outputs": [ 485 | { 486 | "data": { 487 | "text/plain": [ 488 | "array([ 10. 
, 14.38449888, 20.69138081, 29.76351442,\n", 489 | " 42.81332399, 61.58482111, 88.58667904, 127.42749857,\n", 490 | " 183.29807108, 263.66508987, 379.26901907, 545.55947812,\n", 491 | " 784.75997035, 1128.83789168, 1623.77673919, 2335.72146909,\n", 492 | " 3359.81828628, 4832.93023857, 6951.92796178, 10000. ])" 493 | ] 494 | }, 495 | "execution_count": 33, 496 | "metadata": {}, 497 | "output_type": "execute_result" 498 | } 499 | ], 500 | "source": [ 501 | "np.logspace(1, 4, num=20)" 502 | ] 503 | }, 504 | { 505 | "cell_type": "markdown", 506 | "metadata": {}, 507 | "source": [ 508 | "12.. Out of this large range, we want to find the two ends of our desired narrow range of l1_penalty. At one end, we will have l1_penalty values that have too few non-zeros, and at the other end, we will have an l1_penalty that has too many non-zeros.\n", 509 | "\n", 510 | "More formally, find:" 511 | ] 512 | }, 513 | { 514 | "cell_type": "markdown", 515 | "metadata": {}, 516 | "source": [ 517 | "The largest l1_penalty that has more non-zeros than ‘max_nonzeros’ (if we pick a penalty smaller than this value, we will definitely have too many non-zero weights). Store this value in the variable ‘l1_penalty_min’ (we will use it later).\n", 518 | "\n", 519 | "The smallest l1_penalty that has fewer non-zeros than ‘max_nonzeros’ (if we pick a penalty larger than this value, we will definitely have too few non-zero weights). Store this value in the variable ‘l1_penalty_max’ (we will use it later)." 520 | ] 521 | }, 522 | { 523 | "cell_type": "markdown", 524 | "metadata": {}, 525 | "source": [ 526 | "Hint: there are many ways to do this, e.g.:\n", 527 | "\n", 528 | "Programmatically within the loop above\n", 529 | "Creating a list with the number of non-zeros for each value of l1_penalty and inspecting it to find the appropriate boundaries." 530 | ] 531 | }, 532 | { 533 | "cell_type": "markdown", 534 | "metadata": {}, 535 | "source": [ 536 | "Answer:\n", 537 | "#the largest l1_penalty: l1_penalty_min = 127.42749857\n", 538 | "#the smallest l1_penalty: l1_penalty_max = 263.66508987\n", 539 | "(Reading the two lists above together: 127.42749857 is the largest penalty with more than 7 nonzeros, namely 10; 263.66508987 is the smallest with fewer than 7, namely 6.)" 540 | ] 541 | }, 542 | { 543 | "cell_type": "markdown", 544 | "metadata": {}, 545 | "source": [ 546 | "14.. Exploring narrower range of l1_penalty\n", 547 | "\n", 548 | "We now explore the region of l1_penalty we found: between ‘l1_penalty_min’ and ‘l1_penalty_max’. We look for the L1 penalty in this range that produces exactly the right number of nonzeros and also minimizes RSS on the VALIDATION set.\n", 549 | "\n", 550 | "For l1_penalty in np.linspace(l1_penalty_min,l1_penalty_max,20):\n", 551 | "\n", 552 | "Fit a regression model with a given l1_penalty on TRAIN data. As before, use \"alpha=l1_penalty\" and \"normalize=True\".\n", 553 | "Measure the RSS of the learned model on the VALIDATION set.\n", 554 | "Find the model that has the lowest RSS on the VALIDATION set and has sparsity equal to ‘max_nonzeros’. 
(Again, take account of the intercept when counting the number of nonzeros.)" 554 | ] 555 | }, 556 | { 557 | "cell_type": "code", 558 | "execution_count": 44, 559 | "metadata": {}, 560 | "outputs": [ 561 | { 562 | "name": "stdout", 563 | "output_type": "stream", 564 | "text": [ 565 | "[127.42749857, 134.59789811210527, 141.76829765421053, 148.93869719631579, 156.10909673842104, 163.2794962805263, 170.44989582263156, 177.62029536473682, 184.79069490684208, 191.96109444894734, 199.13149399105259, 206.30189353315785, 213.47229307526311, 220.64269261736837, 227.81309215947365, 234.98349170157891, 242.15389124368417, 249.32429078578943, 256.49469032789472, 263.66508986999997]\n", 566 | "[10, 10, 8, 8, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6]\n", 567 | "[435374677102611.62, 437009229124363.88, 438236128386837.94, 439158937799561.0, 440037365263228.5, 440777489641495.5, 441566698090006.62, 442406413188507.38, 443296716874128.62, 444239780525925.75, 445230739842366.81, 446268896864489.31, 447112919434395.5, 447998187851290.0, 448924706672948.69, 449892475899371.5, 450901498777749.0, 451952426654576.5, 453043924367150.56, 454176669662146.88]\n" 568 | ] 569 | } 570 | ], 571 | "source": [ 572 | "l1_penalty_min = 127.42749857\n", 573 | "l1_penalty_max = 263.66508987\n", 574 | "sparsity = []\n", 575 | "Valid_RSS = []\n", 576 | "l1_penalty_list = []\n", 577 | "for l1_penalty in np.linspace(l1_penalty_min,l1_penalty_max,20):\n", 578 | "    #print(l1_penalty)\n", 579 | "    l1_penalty_list.append(l1_penalty)\n", 580 | "    # Learn a model on TRAINING data using the specified l1_penalty.\n", 581 | "    model = linear_model.Lasso(alpha=l1_penalty, normalize=True)\n", 582 | "    model_4 = model.fit(training[all_features], training['price'])\n", 583 | "    nonzeros = np.count_nonzero(model_4.coef_) + np.count_nonzero(model_4.intercept_)\n", 584 | "    sparsity.append(nonzeros)\n", 585 | "    # Predict prices on VALIDATION data using model_4\n", 586 | "    valid_prediction = model_4.predict(validation[all_features])\n", 587 | "    # Calculate errors\n", 588 | "    valid_errors = valid_prediction - validation['price']\n", 589 | "    # Calculate RSS\n", 590 | "    RSS_v = (valid_errors * valid_errors).sum()\n", 591 | "    Valid_RSS.append(RSS_v)\n", 592 | "print(l1_penalty_list)\n", 593 | "print(sparsity)\n", 594 | "print(Valid_RSS)" 595 | ] 596 | }, 597 | { 598 | "cell_type": "code", 599 | "execution_count": 48, 600 | "metadata": {}, 601 | "outputs": [ 602 | { 603 | "data": { 604 | "text/html": [ 605 | "
[tag-stripped HTML rendering of the 20-row DataFrame omitted; the same values appear in the text/plain output below]
" 752 | ], 753 | "text/plain": [ 754 | " l1_penalty sparsity Valid_RSS\n", 755 | "0 127.427499 10 4.353747e+14\n", 756 | "1 134.597898 10 4.370092e+14\n", 757 | "2 141.768298 8 4.382361e+14\n", 758 | "3 148.938697 8 4.391589e+14\n", 759 | "4 156.109097 7 4.400374e+14\n", 760 | "5 163.279496 7 4.407775e+14\n", 761 | "6 170.449896 7 4.415667e+14\n", 762 | "7 177.620295 7 4.424064e+14\n", 763 | "8 184.790695 7 4.432967e+14\n", 764 | "9 191.961094 7 4.442398e+14\n", 765 | "10 199.131494 7 4.452307e+14\n", 766 | "11 206.301894 6 4.462689e+14\n", 767 | "12 213.472293 6 4.471129e+14\n", 768 | "13 220.642693 6 4.479982e+14\n", 769 | "14 227.813092 6 4.489247e+14\n", 770 | "15 234.983492 6 4.498925e+14\n", 771 | "16 242.153891 6 4.509015e+14\n", 772 | "17 249.324291 6 4.519524e+14\n", 773 | "18 256.494690 6 4.530439e+14\n", 774 | "19 263.665090 6 4.541767e+14" 775 | ] 776 | }, 777 | "execution_count": 48, 778 | "metadata": {}, 779 | "output_type": "execute_result" 780 | } 781 | ], 782 | "source": [ 783 | "df = pd.DataFrame({'l1_penalty':l1_penalty_list,'sparsity':sparsity,'Valid_RSS':Valid_RSS})\n", 784 | "df" 785 | ] 786 | }, 787 | { 788 | "cell_type": "code", 789 | "execution_count": 51, 790 | "metadata": {}, 791 | "outputs": [ 792 | { 793 | "data": { 794 | "text/html": [ 795 | "
[tag-stripped HTML rendering of the sorted DataFrame omitted; the same values appear in the text/plain output below]
" 942 | ], 943 | "text/plain": [ 944 | " l1_penalty sparsity Valid_RSS\n", 945 | "0 127.427499 10 4.353747e+14\n", 946 | "1 134.597898 10 4.370092e+14\n", 947 | "2 141.768298 8 4.382361e+14\n", 948 | "3 148.938697 8 4.391589e+14\n", 949 | "4 156.109097 7 4.400374e+14\n", 950 | "5 163.279496 7 4.407775e+14\n", 951 | "6 170.449896 7 4.415667e+14\n", 952 | "7 177.620295 7 4.424064e+14\n", 953 | "8 184.790695 7 4.432967e+14\n", 954 | "9 191.961094 7 4.442398e+14\n", 955 | "10 199.131494 7 4.452307e+14\n", 956 | "11 206.301894 6 4.462689e+14\n", 957 | "12 213.472293 6 4.471129e+14\n", 958 | "13 220.642693 6 4.479982e+14\n", 959 | "14 227.813092 6 4.489247e+14\n", 960 | "15 234.983492 6 4.498925e+14\n", 961 | "16 242.153891 6 4.509015e+14\n", 962 | "17 249.324291 6 4.519524e+14\n", 963 | "18 256.494690 6 4.530439e+14\n", 964 | "19 263.665090 6 4.541767e+14" 965 | ] 966 | }, 967 | "execution_count": 51, 968 | "metadata": {}, 969 | "output_type": "execute_result" 970 | } 971 | ], 972 | "source": [ 973 | "df.sort_values(by=['Valid_RSS'])" 974 | ] 975 | }, 976 | { 977 | "cell_type": "markdown", 978 | "metadata": {}, 979 | "source": [ 980 | "15.. Quiz Question: What value of l1_penalty in our narrow range has the lowest RSS on the VALIDATION set and has sparsity equal to ‘max_nonzeros’?" 981 | ] 982 | }, 983 | { 984 | "cell_type": "markdown", 985 | "metadata": {}, 986 | "source": [ 987 | "Answer: 156.109097" 988 | ] 989 | }, 990 | { 991 | "cell_type": "markdown", 992 | "metadata": {}, 993 | "source": [ 994 | "16.. Quiz Question: What features in this model have non-zero coefficients?" 995 | ] 996 | }, 997 | { 998 | "cell_type": "code", 999 | "execution_count": 54, 1000 | "metadata": {}, 1001 | "outputs": [ 1002 | { 1003 | "data": { 1004 | "text/plain": [ 1005 | "array([ -0.00000000e+00, -0.00000000e+00, 1.06108902e+04,\n", 1006 | " 1.63380252e+02, 0.00000000e+00, -0.00000000e+00,\n", 1007 | " -0.00000000e+00, 0.00000000e+00, 0.00000000e+00,\n", 1008 | " 5.06451687e+05, 4.19600436e+04, 0.00000000e+00,\n", 1009 | " 1.16253554e+05, 0.00000000e+00, 0.00000000e+00,\n", 1010 | " -2.61223488e+03, 0.00000000e+00])" 1011 | ] 1012 | }, 1013 | "execution_count": 54, 1014 | "metadata": {}, 1015 | "output_type": "execute_result" 1016 | } 1017 | ], 1018 | "source": [ 1019 | "l1_penalty = 156.109097\n", 1020 | "# Learn a model on TRAINING data using the specified l1_penalty.\n", 1021 | "model = linear_model.Lasso(alpha=l1_penalty, normalize=True)\n", 1022 | "model_5 = model.fit(training[all_features], training['price'])\n", 1023 | "model_5.coef_" 1024 | ] 1025 | }, 1026 | { 1027 | "cell_type": "markdown", 1028 | "metadata": {}, 1029 | "source": [ 1030 | "Answer: feature 3, 4, 10,11,13,16" 1031 | ] 1032 | }, 1033 | { 1034 | "cell_type": "code", 1035 | "execution_count": 56, 1036 | "metadata": {}, 1037 | "outputs": [ 1038 | { 1039 | "data": { 1040 | "text/plain": [ 1041 | "['bedrooms',\n", 1042 | " 'bedrooms_square',\n", 1043 | " 'bathrooms',\n", 1044 | " 'sqft_living',\n", 1045 | " 'sqft_living_sqrt',\n", 1046 | " 'sqft_lot',\n", 1047 | " 'sqft_lot_sqrt',\n", 1048 | " 'floors',\n", 1049 | " 'floors_square',\n", 1050 | " 'waterfront',\n", 1051 | " 'view',\n", 1052 | " 'condition',\n", 1053 | " 'grade',\n", 1054 | " 'sqft_above',\n", 1055 | " 'sqft_basement',\n", 1056 | " 'yr_built',\n", 1057 | " 'yr_renovated']" 1058 | ] 1059 | }, 1060 | "execution_count": 56, 1061 | "metadata": {}, 1062 | "output_type": "execute_result" 1063 | } 1064 | ], 1065 | "source": [ 1066 | "all_features" 1067 | ] 1068 | }, 1069 | { 1070 | 
"cell_type": "markdown", 1071 | "metadata": {}, 1072 | "source": [ 1073 | "Answer:'bathrooms','sqft_living','waterfront','view','grade','yr_built'" 1074 | ] 1075 | } 1076 | ], 1077 | "metadata": { 1078 | "kernelspec": { 1079 | "display_name": "Python 3", 1080 | "language": "python", 1081 | "name": "python3" 1082 | }, 1083 | "language_info": { 1084 | "codemirror_mode": { 1085 | "name": "ipython", 1086 | "version": 3 1087 | }, 1088 | "file_extension": ".py", 1089 | "mimetype": "text/x-python", 1090 | "name": "python", 1091 | "nbconvert_exporter": "python", 1092 | "pygments_lexer": "ipython3", 1093 | "version": "3.6.3" 1094 | } 1095 | }, 1096 | "nbformat": 4, 1097 | "nbformat_minor": 2 1098 | } 1099 | -------------------------------------------------------------------------------- /Regression Week 5 - Lasso/W5 - Assignment 2/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/RachelPengmkt/Machine-Learning-University-of-Washington/c09c64cb3ef864608928f2bd2095ead15ccc17a0/Regression Week 5 - Lasso/W5 - Assignment 2/.DS_Store -------------------------------------------------------------------------------- /Regression Week 5 - Lasso/W5 - Assignment 2/Yun-week-5-lasso-regression-assignment-2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Regression Week 5: LASSO (coordinate descent)\n" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "In this notebook, you will implement your very own LASSO solver via coordinate descent. You will:\n", 15 | "* Write a function to normalize features\n", 16 | "* Implement coordinate descent for LASSO\n", 17 | "* Explore effects of L1 penalty" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": null, 23 | "metadata": { 24 | "collapsed": true 25 | }, 26 | "outputs": [], 27 | "source": [ 28 | "# Fire up pandas, scikitlearn and numpy\n", 29 | "import pandas as pd\n", 30 | "import numpy as np\n", 31 | "from sklearn import datasets, linear_model" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "# Load in house sales data" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": null, 44 | "metadata": { 45 | "collapsed": true 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "sales = pd.read_csv('kc_house_data.csv')\n", 50 | "train_data = pd.read_csv('kc_house_train_data.csv')\n", 51 | "test_data = pd.read_csv('kc_house_test_data.csv')" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "# Import useful functions from previous notebook" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "3.. Next, from Module 2 (Multiple Regression), copy and paste the ‘get_numpy_data’ function (or equivalent) that takes a data set, a list of features (e.g. [‘sqft_living’, ‘bedrooms’]), to be used as inputs, and a name of the output (e.g. ‘price’). This function returns a ‘feature_matrix’ (2D array) consisting of first a column of ones followed by columns containing the values of the input features in the data set in the same order as the input list. It also returns an ‘output_array’ which is an array of the values of the output in the data set (e.g. ‘price’)." 
66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 152, 71 | "metadata": { 72 | "collapsed": true 73 | }, 74 | "outputs": [], 75 | "source": [ 76 | "def get_numpy_data(df, features, output):\n", 77 | "    df['constant'] = 1 # this is how you add a constant column to a pandas DataFrame\n", 78 | "    # add the column 'constant' to the front of the features list so that we can extract it along with the others:\n", 79 | "    features = ['constant'] + features # this is how you combine two lists\n", 80 | "    # select the columns of DataFrame given by the features list into the DataFrame features_df (now including constant):\n", 81 | "    features_df = df[features]\n", 82 | "    #print(df[features])\n", 83 | "    # the following line will convert the features_df into a numpy matrix:\n", 84 | "    feature_matrix = features_df.values\n", 85 | "    # assign the output column of the DataFrame to the pandas Series output_sarray\n", 86 | "    output_sarray = df[output]\n", 87 | "    # the following will convert the Series into a numpy array via its .values attribute\n", 88 | "    output_array = output_sarray.values\n", 89 | "    return(feature_matrix, output_array)" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "4.. Similarly, copy and paste the ‘predict_output’ function (or equivalent) from Module 2. This function accepts a 2D array ‘feature_matrix’ and a 1D array ‘weights’ and returns a 1D array ‘predictions’." 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "metadata": { 103 | "collapsed": true 104 | }, 105 | "outputs": [], 106 | "source": [ 107 | "#A function to predict output given regression weights\n", 108 | "def predict_output(feature_matrix, weights):\n", 109 | "    # assume feature_matrix is a numpy matrix containing the features as columns and weights is a corresponding numpy array\n", 110 | "    # create the predictions vector by using np.dot()\n", 111 | "    predictions = np.dot(feature_matrix, weights)\n", 112 | "    return(predictions)" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "5.. In the house dataset, features vary wildly in their relative magnitude: ‘sqft_living’ is very large overall compared to ‘bedrooms’, for instance. As a result, the weight for ‘sqft_living’ would be much smaller than the weight for ‘bedrooms’. This is problematic because “small” weights are dropped first as l1_penalty goes up.\n", 120 | "\n", 121 | "To give equal consideration to all features, we need to normalize features as discussed in the lectures: we divide each feature by its 2-norm so that the transformed feature has norm 1." 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": {}, 127 | "source": [ 128 | "6.. Write a short function called ‘normalize_features(feature_matrix)’, which normalizes columns of a given feature matrix. The function should return a pair ‘(normalized_features, norms)’, where the second item contains the norms of original features. As discussed in the lectures, we will use these norms to normalize the test data in the same way as we normalized the training data.\n", 129 | "\n", 130 | "Let's see how we can do this normalization easily with Numpy: let us first consider a small matrix." 
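For reference, before the worked small-matrix example, the normalization described above in formula form (standard 2-norm definition, not specific to this assignment):

```latex
\mathrm{norms}_j = \lVert X_{:,j} \rVert_2 = \sqrt{\sum_i X_{ij}^2},
\qquad
\tilde{X}_{ij} = \frac{X_{ij}}{\mathrm{norms}_j},
\qquad
\lVert \tilde{X}_{:,j} \rVert_2 = 1 .
```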
131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": null, 136 | "metadata": {}, 137 | "outputs": [], 138 | "source": [ 139 | "X = np.array([[3.,5.,8.],[4.,12.,15.]])\n", 140 | "print (X)" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "Numpy provides a shorthand for computing 2-norms of each column:" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": null, 153 | "metadata": {}, 154 | "outputs": [], 155 | "source": [ 156 | "norms = np.linalg.norm(X, axis=0) # gives [norm(X[:,0]), norm(X[:,1]), norm(X[:,2])]\n", 157 | "print (norms)" 158 | ] 159 | }, 160 | { 161 | "cell_type": "markdown", 162 | "metadata": {}, 163 | "source": [ 164 | "To normalize, apply element-wise division:" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": null, 170 | "metadata": {}, 171 | "outputs": [], 172 | "source": [ 173 | "print (X / norms) # gives [X[:,0]/norm(X[:,0]), X[:,1]/norm(X[:,1]), X[:,2]/norm(X[:,2])]" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "Using the shorthand we just covered, write a short function called `normalize_features(feature_matrix)`, which normalizes columns of a given feature matrix. The function should return a pair `(normalized_features, norms)`, where the second item contains the norms of original features. As discussed in the lectures, we will use these norms to normalize the test data in the same way as we normalized the training data. " 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": null, 186 | "metadata": { 187 | "collapsed": true 188 | }, 189 | "outputs": [], 190 | "source": [ 191 | "def normalize_features(features):\n", 192 | " #calculate norms\n", 193 | " norms = np.linalg.norm(features, axis=0)\n", 194 | " #calculate normalized features\n", 195 | " normalized_features = features/norms\n", 196 | " return (normalized_features, norms)" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "To test the function, run the following:" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": null, 209 | "metadata": {}, 210 | "outputs": [], 211 | "source": [ 212 | "features, norms = normalize_features(np.array([[3.,6.,9.],[4.,8.,12.]]))\n", 213 | "print (features)\n", 214 | "# should print\n", 215 | "# [[ 0.6 0.6 0.6]\n", 216 | "# [ 0.8 0.8 0.8]]\n", 217 | "print (norms)\n", 218 | "# should print\n", 219 | "# [5. 10. 15.]" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "# Implementing Coordinate Descent with normalized features" 227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": {}, 232 | "source": [ 233 | "We seek to obtain a sparse set of weights by minimizing the LASSO cost function\n", 234 | "```\n", 235 | "SUM[ (prediction - output)^2 ] + lambda*( |w[1]| + ... + |w[k]|).\n", 236 | "```\n", 237 | "(By convention, we do not include `w[0]` in the L1 penalty term. We never want to push the intercept to zero.)\n", 238 | "\n", 239 | "The absolute value sign makes the cost function non-differentiable, so simple gradient descent is not viable (you would need to implement a method called subgradient descent). Instead, we will use **coordinate descent**: at each iteration, we will fix all weights but weight `i` and find the value of weight `i` that minimizes the objective. 
That is, we look for\n", 240 | "```\n", 241 | "argmin_{w[i]} [ SUM[ (prediction - output)^2 ] + lambda*( |w[1]| + ... + |w[k]|) ]\n", 242 | "```\n", 243 | "where all weights other than `w[i]` are held to be constant. We will optimize one `w[i]` at a time, circling through the weights multiple times. \n", 244 | " 1. Pick a coordinate `i`\n", 245 | " 2. Compute `w[i]` that minimizes the cost function `SUM[ (prediction - output)^2 ] + lambda*( |w[1]| + ... + |w[k]|)`\n", 246 | " 3. Repeat Steps 1 and 2 for all coordinates, multiple times" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | "For this assignment, we use **cyclical coordinate descent with normalized features**, where we cycle through coordinates 0 to (d-1) in order, and assume the features were normalized as discussed above. The formula for optimizing each coordinate is as follows:\n", 254 | "```\n", 255 | " ┌ (ro[i] + lambda/2) if ro[i] < -lambda/2\n", 256 | "w[i] = ├ 0 if -lambda/2 <= ro[i] <= lambda/2\n", 257 | " └ (ro[i] - lambda/2) if ro[i] > lambda/2\n", 258 | "```\n", 259 | "where\n", 260 | "```\n", 261 | "ro[i] = SUM[ [feature_i]*(output - prediction + w[i]*[feature_i]) ].\n", 262 | "```\n", 263 | "\n", 264 | "Note that we do not regularize the weight of the constant feature (intercept) `w[0]`, so, for this weight, the update is simply:\n", 265 | "```\n", 266 | "w[0] = ro[i]\n", 267 | "```" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "## Effect of L1 penalty" 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": {}, 280 | "source": [ 281 | "Let us consider a simple model with 2 features:" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": null, 287 | "metadata": { 288 | "collapsed": true 289 | }, 290 | "outputs": [], 291 | "source": [ 292 | "simple_features = ['sqft_living', 'bedrooms']\n", 293 | "my_output = 'price'\n", 294 | "(simple_feature_matrix, output) = get_numpy_data(sales, simple_features, my_output)" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": {}, 300 | "source": [ 301 | "Don't forget to normalize features:" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": null, 307 | "metadata": { 308 | "collapsed": true 309 | }, 310 | "outputs": [], 311 | "source": [ 312 | "simple_feature_matrix, norms = normalize_features(simple_feature_matrix)" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": {}, 318 | "source": [ 319 | "We assign some random set of initial weights and inspect the values of `ro[i]`:" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": null, 325 | "metadata": { 326 | "collapsed": true 327 | }, 328 | "outputs": [], 329 | "source": [ 330 | "weights = np.array([1., 4., 1.])" 331 | ] 332 | }, 333 | { 334 | "cell_type": "markdown", 335 | "metadata": {}, 336 | "source": [ 337 | "Use `predict_output()` to make predictions on this data." 
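Before the step-by-step cells that follow, here is an optional vectorized view of the same computation; `compute_ro` is a hypothetical helper name, not part of the assignment, and it relies only on `predict_output` and numpy as defined above:

```python
def compute_ro(feature_matrix, output, weights):
    # ro[j] = SUM[ feature_j * (output - prediction + weights[j] * feature_j) ]
    prediction = predict_output(feature_matrix, weights)
    errors = output - prediction
    return np.array([
        (feature_matrix[:, j] * (errors + weights[j] * feature_matrix[:, j])).sum()
        for j in range(feature_matrix.shape[1])
    ])

# e.g. ro = compute_ro(simple_feature_matrix, output, weights); inspect ro[1], ro[2]
```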
338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": null, 343 | "metadata": {}, 344 | "outputs": [], 345 | "source": [ 346 | "prediction = predict_output(simple_feature_matrix, weights)" 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "Compute the values of `ro[i]` for each feature in this simple model, using the formula given above:\n", 354 | "```\n", 355 | "ro[i] = SUM[ [feature_i]*(output - prediction + w[i]*[feature_i]) ]\n", 356 | "```\n", 357 | "\n", 358 | "*Hint: You can get a Numpy vector for feature_i using:*\n", 359 | "```\n", 360 | "simple_feature_matrix[:,i]\n", 361 | "```" 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "metadata": {}, 367 | "source": [ 368 | "***QUIZ QUESTION***\n", 369 | "\n", 370 | "Recall that, whenever `ro[i]` falls between `-l1_penalty/2` and `l1_penalty/2`, the corresponding weight `w[i]` is sent to zero. Now suppose we were to take one step of coordinate descent on either feature 1 or feature 2. What range of values of `l1_penalty` **would not** set `w[1]` zero, but **would** set `w[2]` to zero, if we were to take a step in that coordinate? " 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": null, 376 | "metadata": {}, 377 | "outputs": [], 378 | "source": [ 379 | "#calculate ro[1] approach 1:\n", 380 | "i=1\n", 381 | "feature_i = simple_feature_matrix[:,i]\n", 382 | "w = weights\n", 383 | "ro_i = (feature_i *(output - prediction + w[i]*feature_i)).sum()\n", 384 | "ro_i" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": null, 390 | "metadata": {}, 391 | "outputs": [], 392 | "source": [ 393 | "#calculate ro[1] approach 2:\n", 394 | "feature_1 = simple_feature_matrix[:,1]\n", 395 | "w = weights\n", 396 | "ro_1 = (feature_1 *(output - prediction + w[1]*feature_1)).sum()\n", 397 | "ro_1" 398 | ] 399 | }, 400 | { 401 | "cell_type": "code", 402 | "execution_count": null, 403 | "metadata": {}, 404 | "outputs": [], 405 | "source": [ 406 | "2*ro_1" 407 | ] 408 | }, 409 | { 410 | "cell_type": "code", 411 | "execution_count": null, 412 | "metadata": {}, 413 | "outputs": [], 414 | "source": [ 415 | "-2*ro_1" 416 | ] 417 | }, 418 | { 419 | "cell_type": "code", 420 | "execution_count": null, 421 | "metadata": {}, 422 | "outputs": [], 423 | "source": [ 424 | "#calculate ro[2]\n", 425 | "feature_2 = simple_feature_matrix[:,2]\n", 426 | "w = weights\n", 427 | "ro_2 = (feature_2 *(output - prediction + w[2]*feature_2)).sum()\n", 428 | "ro_2" 429 | ] 430 | }, 431 | { 432 | "cell_type": "code", 433 | "execution_count": null, 434 | "metadata": {}, 435 | "outputs": [], 436 | "source": [ 437 | "2*ro_2" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": null, 443 | "metadata": { 444 | "collapsed": true 445 | }, 446 | "outputs": [], 447 | "source": [ 448 | "# values of l1_penalty that would not set w[1] to zero: l1_penalty < 2*ro_1 (equivalently ro_1 > l1_penalty/2; ro_1 is positive here)" 449 | ] 450 | }, 451 | { 452 | "cell_type": "code", 453 | "execution_count": null, 454 | "metadata": { 455 | "collapsed": true 456 | }, 457 | "outputs": [], 458 | "source": [ 459 | "# values of l1_penalty that would set w[2] to zero: l1_penalty >= 2*ro_2 (equivalently ro_2 <= l1_penalty/2)" 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "#the combined range of l1_penalty: [2*ro_2, 2*ro_1]\n", 467 | "\n", 468 | "Answer: [161933397.33247894, 175878941.64650351]\n", 469 | "\n", 470 | "(Check: l1_penalty = 1.7e8 gives l1_penalty/2 = 8.5e7, which is above ro_2 ≈ 8.10e7, so w[2] is zeroed, but below ro_1 ≈ 8.79e7, so w[1] survives.)\n" 471 | ] 472 | }, 473 | { 474 | 
"cell_type": "markdown", 475 | "metadata": {}, 476 | "source": [ 477 | "***QUIZ QUESTION***\n", 478 | "\n", 479 | "What range of values of `l1_penalty` would set **both** `w[1]` and `w[2]` to zero, if we were to take a step in that coordinate? " 480 | ] 481 | }, 482 | { 483 | "cell_type": "code", 484 | "execution_count": null, 485 | "metadata": { 486 | "collapsed": true 487 | }, 488 | "outputs": [], 489 | "source": [ 490 | "# values of l1_penalty that would set w[1] to zero: l1_penalty >= 2*ro_1\n", 491 | "# values of l1_penalty that would set w[2] to zero: l1_penalty >= 2*ro_2" 492 | ] 493 | }, 494 | { 495 | "cell_type": "markdown", 496 | "metadata": {}, 497 | "source": [ 498 | "#the combined range of l1_penalty (since ro_1 > ro_2, the binding constraint is the first one): l1_penalty >= 2*ro_1\n", 499 | "\n", 500 | "Answer: l1_penalty >= 175878941.64650351" 501 | ] 502 | }, 503 | { 504 | "cell_type": "markdown", 505 | "metadata": {}, 506 | "source": [ 507 | "So we can say that ro[i] quantifies the significance of the i-th feature: the larger ro[i] is, the more likely it is for the i-th feature to be retained." 508 | ] 509 | }, 510 | { 511 | "cell_type": "markdown", 512 | "metadata": {}, 513 | "source": [ 514 | "## Single Coordinate Descent Step" 515 | ] 516 | }, 517 | { 518 | "cell_type": "markdown", 519 | "metadata": {}, 520 | "source": [ 521 | "Using the formula above, implement coordinate descent that minimizes the cost function over a single feature i. Note that the intercept (weight 0) is not regularized. The function should accept feature matrix, output, current weights, l1 penalty, and index of feature to optimize over. The function should return the new weight for feature i." 522 | ] 523 | }, 524 | { 525 | "cell_type": "code", 526 | "execution_count": null, 527 | "metadata": { 528 | "collapsed": true 529 | }, 530 | "outputs": [], 531 | "source": [ 532 | "def lasso_coordinate_descent_step(i, feature_matrix, output, weights, l1_penalty):\n", 533 | "    # compute prediction\n", 534 | "    prediction = predict_output(feature_matrix, weights)\n", 535 | "    # compute ro[i] = SUM[ [feature_i]*(output - prediction + weight[i]*[feature_i]) ]\n", 536 | "    ro_i = (feature_matrix[:,i] *(output - prediction + weights[i]*feature_matrix[:,i])).sum()\n", 537 | "    if i == 0: # intercept -- do not regularize\n", 538 | "        new_weight_i = ro_i\n", 539 | "    elif ro_i < -l1_penalty/2.:\n", 540 | "        new_weight_i = ro_i + l1_penalty/2\n", 541 | "    elif ro_i > l1_penalty/2.:\n", 542 | "        new_weight_i = ro_i - l1_penalty/2\n", 543 | "    else:\n", 544 | "        new_weight_i = 0.\n", 545 | "    \n", 546 | "    return new_weight_i" 547 | ] 548 | }, 549 | { 550 | "cell_type": "markdown", 551 | "metadata": {}, 552 | "source": [ 553 | "If you are using Numpy, test your function with the following snippet:" 554 | ] 555 | }, 556 | { 557 | "cell_type": "code", 558 | "execution_count": 126, 559 | "metadata": {}, 560 | "outputs": [ 561 | { 562 | "name": "stdout", 563 | "output_type": "stream", 564 | "text": [ 565 | "0.425558846691\n" 566 | ] 567 | } 568 | ], 569 | "source": [ 570 | "# should print 0.425558846691\n", 571 | "import math\n", 572 | "print (lasso_coordinate_descent_step(1, np.array([[3./math.sqrt(13),1./math.sqrt(10)],\n", 573 | "                   [2./math.sqrt(13),3./math.sqrt(10)]]), np.array([1., 1.]), np.array([1., 4.]), 0.1))" 574 | ] 575 | }, 576 | { 577 | "cell_type": "markdown", 578 | "metadata": {}, 579 | "source": [ 580 | "## Cyclical coordinate descent " 581 | ] 582 | }, 583 | { 584 | "cell_type": "markdown", 585 | "metadata": {}, 586 | "source": [ 587 | "Now that we have a 
function that optimizes the cost function over a single coordinate, let us implement cyclical coordinate descent where we optimize coordinates 0, 1, ..., (d-1) in order and repeat.\n", 588 | "\n", 589 | "When do we know to stop? Each time we scan all the coordinates (features) once, we measure the change in weight for each coordinate. If no coordinate changes by more than a specified threshold, we stop." 590 | ] 591 | }, 592 | { 593 | "cell_type": "markdown", 594 | "metadata": {}, 595 | "source": [ 596 | "For each iteration:\n", 597 | "1. As you loop over features in order and perform coordinate descent, measure how much each coordinate changes.\n", 598 | "2. After the loop, if the maximum change across all coordinates falls below the tolerance, stop. Otherwise, go back to step 1.\n", 599 | "\n", 600 | "Return weights\n", 601 | "\n", 602 | "**IMPORTANT: when computing a new weight for coordinate i, make sure to incorporate the new weights for coordinates 0, 1, ..., i-1. One good way is to update your weights variable in-place. See the following pseudocode for illustration.**\n", 603 | "```\n", 604 | "for i in range(len(weights)):\n", 605 | "    old_weights_i = weights[i] # remember old value of weight[i], as it will be overwritten\n", 606 | "    # the following line uses new values for weight[0], weight[1], ..., weight[i-1]\n", 607 | "    # and old values for weight[i], ..., weight[d-1]\n", 608 | "    weights[i] = lasso_coordinate_descent_step(i, feature_matrix, output, weights, l1_penalty)\n", 609 | "    \n", 610 | "    # use old_weights_i to compute change in coordinate\n", 611 | "    ...\n", 612 | "```" 613 | ] 614 | }, 615 | { 616 | "cell_type": "code", 617 | "execution_count": 127, 618 | "metadata": { 619 | "collapsed": true 620 | }, 621 | "outputs": [], 622 | "source": [ 623 | "def lasso_coordinate_descent_step(i, feature_matrix, output, weights, l1_penalty):\n", 624 | "    # compute prediction\n", 625 | "    prediction = predict_output(feature_matrix, weights)\n", 626 | "    # compute ro[i] = SUM[ [feature_i]*(output - prediction + weight[i]*[feature_i]) ]\n", 627 | "    ro_i = (feature_matrix[:,i] *(output - prediction + weights[i]*feature_matrix[:,i])).sum()\n", 628 | "    if i == 0: # intercept -- do not regularize\n", 629 | "        new_weight_i = ro_i\n", 630 | "    elif ro_i < -l1_penalty/2.:\n", 631 | "        new_weight_i = ro_i + l1_penalty/2\n", 632 | "    elif ro_i > l1_penalty/2.:\n", 633 | "        new_weight_i = ro_i - l1_penalty/2\n", 634 | "    else:\n", 635 | "        new_weight_i = 0.\n", 636 | "    \n", 637 | "    return new_weight_i" 638 | ] 639 | }, 640 | { 641 | "cell_type": "markdown", 642 | "metadata": {}, 643 | "source": [ 644 | "For each iteration:\n", 645 | "1. As you loop over features in order and perform coordinate descent, measure how much each coordinate changes.\n", 646 | "2. After the loop, if the maximum change across all coordinates falls below the tolerance, stop. Otherwise, go back to step 1."
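Once the function below is implemented, a call might look like the following sketch. The initial weights, penalty, and tolerance here are illustrative placeholders, not the assignment's graded settings:

```python
# Illustrative call only -- the penalty and tolerance values are assumptions.
# np.zeros gives a float array, which matters for the in-place weight updates.
initial_weights = np.zeros(simple_feature_matrix.shape[1])
weights = lasso_cyclical_coordinate_descent(simple_feature_matrix, output,
                                            initial_weights,
                                            l1_penalty=1e7, tolerance=1.0)
```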
646 | ] 647 | }, 648 | { 649 | "cell_type": "code", 650 | "execution_count": 168, 651 | "metadata": { 652 | "collapsed": true 653 | }, 654 | "outputs": [], 655 | "source": [ 656 | "def lasso_cyclical_coordinate_descent(feature_matrix, output, initial_weights, l1_penalty, tolerance):\n", 657 | " converged = False \n", 658 | " weights = np.array(initial_weights) # make sure it's a numpy array\n", 659 | " while not converged:\n", 660 | " # while we haven't reached the tolerance yet, update each feature's weight\n", 661 | " changes = []\n", 662 | " for i in range(len(weights)):\n", 663 | " #print('i is:', i)\n", 664 | " old_weights_i = weights[i] # remember old value of weight[i], as it will be overwritten\n", 665 | " #print('old_weights_i', old_weights_i)\n", 666 | " # the following line uses new values for weight[0], weight[1], ..., weight[i-1]\n", 667 | " # and old values for weight[i], ..., weight[d-1]\n", 668 | " weights[i] = lasso_coordinate_descent_step(i, feature_matrix, output, weights, l1_penalty)\n", 669 | " #print('new_weights_i', weights[i])\n", 670 | " # use old_weights_i to compute change in coordinate\n", 671 | " changes_i = np.abs(weights[i]-old_weights_i)\n", 672 | " #print('changes_i', changes_i)\n", 673 | " changes.append(changes_i) \n", 674 | " if max(changes)