├── .gitignore ├── Dimension Reduction.ipynb ├── Introductory Machine Learning.ipynb ├── README.md ├── Unsupervised Learning.ipynb ├── data ├── neg_tweets.txt ├── pos_tweets.txt ├── sunnyData.csv └── weather.csv ├── ml_bayes.ipynb ├── ml_classification.ipynb ├── ml_optimization.ipynb └── tree_modeling.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | .ipynb_checkpoints/ 3 | __pycache__/ 4 | *.pyc 5 | *.pyo 6 | -------------------------------------------------------------------------------- /Dimension Reduction.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Setup\n", 8 | "\n", 9 | "This guide was written in R 3.2.3 and Python 3.5.\n", 10 | "\n", 11 | "\n", 12 | "### R and R Studio\n", 13 | "\n", 14 | "Download [R](https://www.r-project.org/) and [R Studio](https://www.rstudio.com/products/rstudio/download/).\n", 15 | "\n", 16 | "\n", 17 | "### Packages\n", 18 | "\n", 19 | "```\n", 20 | "install.packages(\"\")\n", 21 | "```" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": { 27 | "collapsed": true 28 | }, 29 | "source": [ 30 | "## Principal Components Regression\n", 31 | "\n" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "## Final Words\n", 39 | "\n", 40 | "### Resources" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": null, 46 | "metadata": { 47 | "collapsed": true 48 | }, 49 | "outputs": [], 50 | "source": [] 51 | } 52 | ], 53 | "metadata": { 54 | "kernelspec": { 55 | "display_name": "Python 3", 56 | "language": "python", 57 | "name": "python3" 58 | }, 59 | "language_info": { 60 | "codemirror_mode": { 61 | "name": "ipython", 62 | "version": 3 63 | }, 64 | "file_extension": ".py", 65 | "mimetype": "text/x-python", 66 | "name": "python", 67 | "nbconvert_exporter": 
"python", 68 | "pygments_lexer": "ipython3", 69 | "version": "3.6.1" 70 | } 71 | }, 72 | "nbformat": 4, 73 | "nbformat_minor": 2 74 | } 75 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Machine Learning with Python 2 | 3 | 4 | ## 1.0 Background 5 | 6 | Recall from data structures the different types of tree structures: binary, red-black, and splay trees. In tree-based modeling, we build on these structures to make classification predictions. 7 | 8 | Tree-based machine learning is popular because it is often accurate and is easy to interpret. Unlike linear models, tree-based models capture non-linear relationships well. The general structure is as follows: 9 | 10 | 11 | ## 2.0 Decision Trees 12 | 13 | Decision trees are a type of supervised learning algorithm, used mostly in classification, that works for both categorical and continuous input and output variables. This type of model is built from nodes: internal nodes represent tests on attributes, and the end nodes (leaves) of each branch represent class labels. The connections between nodes are called edges; each edge represents a 'decision' that separates the data from the previous node based on some criterion. 14 | 15 | ![alt text](https://www.analyticsvidhya.com/wp-content/uploads/2016/04/dt.png "Logo Title Text 1") 16 | 17 | Looks familiar, right? 18 | 19 | ### 2.1 Nodes 20 | 21 | As mentioned above, nodes are an important part of the structure of decision trees. In this section, we'll review the different types of nodes. 22 | 23 | #### 2.1.1 Root Node 24 | 25 | The root node is the node at the very top. It represents the entire population or sample because it has yet to be divided by any edges. 26 | 27 | #### 2.1.2 Decision Node 28 | 29 | Decision nodes are the nodes that occur between the root node and the leaves of your decision tree.
It's considered a decision node because it is the result of an edge and then splits again, either into more decision nodes or into leaves. 30 | 31 | #### 2.1.3 Leaves/Terminal Nodes 32 | 33 | As mentioned before, leaves are the final nodes at the bottom of the decision tree, and in classification each leaf represents a class label. They're also called terminal nodes because no further nodes split off from them. 34 | 35 | #### 2.1.4 Parent and Child Nodes 36 | 37 | A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are its child nodes. 38 | 39 | ### 2.2 Pros & Cons 40 | 41 | #### 2.2.1 Pros 42 | 43 | 1. Easy to understand: Decision tree output is fairly easy to understand, since reading and interpreting it doesn't require any statistical knowledge. Its graphical representation is very intuitive, and users can easily relate it to their hypotheses. 44 | 45 | 2. Useful in data exploration: A decision tree is one of the fastest ways to identify the most significant variables and the relationships between two or more variables. With the help of decision trees, we can create new variables / features that have better power to predict the target variable (see the article "Trick to enhance power of regression model" for one such trick). For example, if we are working on a problem where information is available in hundreds of variables, a decision tree will help identify the most significant ones. 46 | 47 | 3. Less data cleaning required: It requires less data cleaning than some other modeling techniques, and it is fairly robust to outliers and missing values. 48 | 49 | 4. Data type is not a constraint: It can handle both numerical and categorical variables. 50 | 51 | 5. Non-parametric method: A decision tree is considered a non-parametric method. This means that decision trees make no assumptions about the distribution of the data or the structure of the classifier.
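To make the node terminology concrete, here is a minimal pure-Python sketch of how a tree might choose the test at a single node, which is also how it surfaces the "most significant" variable mentioned in item 2. It assumes Gini impurity as the split criterion and binary threshold tests of the form `row[feature] <= threshold` on numeric features; the text above does not name a criterion, and `gini` and `best_split` are hypothetical helper names.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_impurity) for the best test
    row[feature_index] <= threshold, or None if no test separates the rows."""
    n = len(rows)
    best = None
    for f in range(len(rows[0])):
        # Candidate thresholds: every distinct value observed for this feature.
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # each edge must receive at least one row
            # Impurity of the two child nodes, weighted by their sizes.
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best

# Toy data: feature 0 is uninformative, feature 1 separates the classes perfectly.
rows = [[5, 1.0], [3, 1.5], [4, 2.0], [5, 7.0], [3, 8.0], [4, 9.0]]
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(rows, labels))  # (1, 2.0, 0.0)
```

Applied recursively to each child, the same search produces decision nodes until the leaves are pure or a stopping constraint is reached; such constraints are one of the overfitting remedies listed under 2.2.2 Cons.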
 52 | 53 | #### 2.2.2 Cons 54 | 55 | 1. Overfitting: Overfitting is one of the most common practical difficulties with decision tree models. It can be mitigated by setting constraints on model parameters and by pruning (discussed in detail below). 56 | 57 | 2. Not ideal for continuous variables: When working with continuous numerical variables, a decision tree loses information when it groups the values into discrete categories. 58 | 59 | 60 | 61 | In this output, the rows show results for trees with different numbers of nodes. The column `xerror` is the cross-validation error and `CP` is the complexity parameter. 62 | 63 | ### 2.3 Pruning Decision Trees 64 | 65 | Decision tree pruning is a technique that reduces the size of a decision tree by removing sections (nodes) that provide little power to classify instances. This reduces the complexity of the final classifier, which can improve predictive accuracy by reducing overfitting. 66 | 67 | Ultimately, our aim is to minimize the cross-validation error. First, we select the complexity parameter with the smallest cross-validation error: 68 | 69 | 70 | ## 3.0 Random Forests 71 | 72 | Recall ensemble learning from the Optimization lecture. Random Forests are an ensemble learning method for classification and regression. They work by combining individual decision trees through bagging, which helps reduce overfitting. 73 | 74 | ### 3.1 Algorithm 75 | 76 | First, we create many decision trees through bagging. Each tree is grown to its maximum size and left unpruned. 77 | 78 | To inject randomness, we make sure that each split is based on a randomly selected subset of attributes, which reduces the correlation between different trees. 79 | 80 | For classification, the forest then predicts by majority vote across its trees.
More concretely, we begin by drawing K bootstrap samples from the training data, sampling with replacement. 81 | 82 | Next, we fit an individual tree tᵢ to each sample, and every regression tree predicts a value for the unseen data. Lastly, we average those predictions with the formula: 83 | 84 | ![alt text](https://github.com/lesley2958/ml-tree-modeling/blob/master/rf-pred.png?raw=true "Logo Title Text 1") 85 | 86 | where ŷ is the predicted response and x = [x1, ..., xN]ᵀ ∈ X is the input vector. 87 | 88 | 89 | ### 3.2 Advantages 90 | 91 | Random Forests let us learn non-linear relationships with a simple algorithm and good performance. Training is also fast, and the method is resistant to overfitting. 92 | 93 | Another strength of Random Forests is that increasing the number of trees decreases the variance without increasing the bias, so the bias-variance tradeoff is less of a concern when choosing how many trees to grow. 94 | 95 | The averaging step also allows the real structure of the data to emerge, because the noise in the individual trees tends to cancel out. 96 | 97 | ### 3.3 Limitations 98 | 99 | Unfortunately, random forests have high memory consumption because they build and store many trees. There's also little performance gain from larger training datasets.
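The regression steps above can be sketched in pure Python. To keep the example short, it assumes depth-1 regression trees ("stumps") as the base learners, a deliberate simplification of the fully grown trees the text describes; because each stump has a single split, drawing one random attribute subset per tree is the same as drawing one per split here. All function names are hypothetical.

```python
import random

def fit_stump(rows, targets, features):
    """Fit a one-split regression tree ("stump") that may only test the given features.
    Returns (feature, threshold, left_mean, right_mean)."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            if not left or not right:
                continue
            left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
            # Squared error of predicting each side's mean.
            sse = (sum((t - left_mean) ** 2 for t in left)
                   + sum((t - right_mean) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left_mean, right_mean)
    if best is None:
        # No usable split in this sample: fall back to predicting the sample mean.
        mean = sum(targets) / len(targets)
        return (features[0], float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, row):
    f, threshold, left_mean, right_mean = stump
    return left_mean if row[f] <= threshold else right_mean

def fit_forest(rows, targets, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: n indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Random subset of attributes (one split per stump, so one subset per split).
        features = rng.sample(range(d), n_features)
        forest.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], features))
    return forest

def predict_forest(forest, row):
    # y_hat = (1/K) * sum of t_i(x): average the K trees' predictions.
    return sum(predict_stump(tree, row) for tree in forest) / len(forest)

# Toy regression data: feature 0 is x, feature 1 is a coarser version of x.
rows = [[x, x // 2] for x in range(10)]
targets = [0.0 if x < 5 else 10.0 for x in range(10)]
forest = fit_forest(rows, targets)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [9, 4]))
```

Each stump sees a different bootstrap sample and feature, so individual predictions vary; `predict_forest` is the averaging step that combines them.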
 100 | 101 | 102 | ========================================================================================== 103 | 104 | 105 | # Machine Learning Optimization 106 | 107 | ## Table of Contents 108 | 109 | - [0.0 Setup](#00-setup) 110 | + [0.1 Python and Pip](#01-python-and-pip) 111 | + [0.2 Libraries](#02-libraries) 112 | - [1.0 Background](#10-background) 113 | - [2.0 Ensemble Learning](#20-ensemble-learning) 114 | - [3.0 Bagging](#30-bagging) 115 | + [3.1 Algorithm](#31-algorithm) 116 | - [4.0 Boosting](#40-boosting) 117 | + [4.1 Algorithm](#41-algorithm) 118 | + [4.2 Boosting in R](#42-boosting-in-r) 119 | - [5.0 AdaBoosting](#50-adaboosting) 120 | + [5.1 Benefits](#51-benefits) 121 | + [5.2 Limits](#52-limits) 122 | + [5.3 AdaBoost in R](#53-adaboost-in-r) 123 | 124 | ## 0.0 Setup 125 | 126 | This guide was written in Python 3.6. 127 | 128 | ### 0.1 Python and Pip 129 | 130 | Download [Python](https://www.python.org/downloads/) and [Pip](https://pip.pypa.io/en/stable/installing/). 131 | 132 | ### 0.2 Libraries 133 | 134 | Let's install the modules we'll need for this tutorial. Open up your terminal and enter the following commands to install the needed Python modules: 135 | 136 | ``` 137 | pip3 install scipy 138 | pip3 install numpy 139 | ``` 140 | 141 | ## 1.0 Background 142 | 143 | 144 | ## 2.0 Ensemble Learning 145 | 146 | Ensemble learning allows us to combine the predictions of multiple learning algorithms; the combined set of learners is what we call the "ensemble". By doing this, we can often get better predictive performance than a single learner would give. 147 | 148 | One drawback is increased computation time and reduced interpretability. 149 | 150 | 151 | ## 3.0 Bagging 152 | 153 | Bagging (bootstrap aggregating) is a technique where we reuse the same training algorithm several times on different subsets of the training data.
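Bagging rests on bootstrap samples, and the 63.2% figure quoted in 3.1 Algorithm below can be checked numerically. This pure-Python sketch (helper names are hypothetical) draws bootstrap samples of the same size as the dataset and measures what fraction of the original examples each one contains. For a dataset of size N, each example is missed by all N draws with probability (1 - 1/N)^N, so the expected fraction is 1 - (1 - 1/N)^N, which tends to 1 - 1/e ≈ 0.632 as N grows.

```python
import math
import random

def bootstrap_sample(data, m, rng):
    """Draw m items from data uniformly at random, with replacement."""
    return [data[rng.randrange(len(data))] for _ in range(m)]

def mean_unique_fraction(n, trials=200, seed=0):
    """Average fraction of n distinct examples that appear in a size-n bootstrap sample."""
    rng = random.Random(seed)
    data = list(range(n))  # n distinct examples
    total = 0.0
    for _ in range(trials):
        total += len(set(bootstrap_sample(data, n, rng))) / n
    return total / trials

n = 1000
print(mean_unique_fraction(n))  # simulated average over 200 bootstrap samples
print(1 - (1 - 1 / n) ** n)     # exact expectation for this n
print(1 - 1 / math.e)           # limit as n grows, about 0.632
```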
 154 | 155 | ### 3.1 Algorithm 156 | 157 | Given a training dataset D of size N, bagging generates new training sets Di of size M by sampling from D with replacement, so some observations may be repeated within each Di. 158 | 159 | If we set M to N, then each Di contains on average about 63.2% (1 - 1/e) of the unique examples in D; the rest of its entries are duplicates. 160 | 161 | The final step is to train a separate classifier Ci on each Di, then combine their predictions, for example by majority vote for classification or by averaging for regression. 162 | 163 | 164 | ## 4.0 Boosting 165 | 166 | Boosting is an optimization technique that allows us to combine multiple classifiers to improve classification accuracy. In boosting, each individual classifier only needs to be slightly better than chance. 167 | 168 | Boosting involves training classifiers on the subset of the training data that is most informative given the current classifiers. 169 | 170 | ### 4.1 Algorithm 171 | 172 | The general boosting algorithm first fits a simple model to a subsample of the data. Next, we identify misclassified observations (ones that are hard to predict), and we focus subsequent learners on getting these samples right. Lastly, we combine these weak learners to form a more complex but more accurate predictor. 173 | 174 | ## 5.0 AdaBoosting 175 | 176 | AdaBoost (adaptive boosting) refines this idea: instead of resampling, it reweights the training examples, increasing the weights of misclassified examples so that the next weak learner focuses on them. 177 | 178 | ### 5.1 Benefits 179 | 180 | Aside from being easy to implement, AdaBoost is attractive because the final model is a simple weighted combination of multiple classifiers, and these classifiers can also be of different types. 181 | 182 | ### 5.2 Limits 183 | 184 | On the other hand, AdaBoost is sensitive to noisy data and outliers, because points that keep being misclassified receive ever larger weights.
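The reweighting idea in 5.0 can be sketched in pure Python. This assumes the standard discrete AdaBoost update (learner weight α = ½·ln((1 - ε)/ε) for weighted error ε, example weights multiplied by exp(-α·y·h(x)) and then renormalized), labels in {-1, +1}, and one-dimensional threshold stumps as the weak learners. The function names are hypothetical. On the toy data, no single stump classifies more than four of the six points correctly, but three boosted stumps classify all of them.

```python
import math

def stump_predict(threshold, polarity, x):
    """Weak learner: +polarity to the right of the threshold, -polarity to the left."""
    return polarity if x > threshold else -polarity

def fit_adaboost(xs, ys, rounds):
    """Discrete AdaBoost on 1-D inputs with labels in {-1, +1}.
    Returns a list of (threshold, polarity, alpha) weak learners."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform example weights
    points = sorted(set(xs))
    # Candidate thresholds: one below every point, plus midpoints between neighbours.
    thresholds = [points[0] - 1] + [(a + b) / 2 for a, b in zip(points, points[1:])]
    model = []
    for _ in range(rounds):
        # Pick the stump with the lowest *weighted* error.
        best = None
        for t in thresholds:
            for p in (1, -1):
                err = sum(w for w, x, y in zip(weights, xs, ys)
                          if stump_predict(t, p, x) != y)
                if best is None or err < best[0]:
                    best = (err, t, p)
        err, t, p = best
        if err >= 0.5:
            break  # no weak learner better than chance; stop boosting
        err = max(err, 1e-10)  # avoid dividing by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((t, p, alpha))
        # Reweight: misclassified examples (y * h(x) = -1) get heavier, correct ones lighter.
        weights = [w * math.exp(-alpha * y * stump_predict(t, p, x))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict_adaboost(model, x):
    # Sign of the alpha-weighted vote of the weak learners.
    score = sum(alpha * stump_predict(t, p, x) for t, p, alpha in model)
    return 1 if score > 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single threshold stump fits all six points
model = fit_adaboost(xs, ys, rounds=3)
print([predict_adaboost(model, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

Because misclassified points gain weight after each round, later stumps concentrate on the examples that earlier ones got wrong, which is the behavior described in 4.1.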
 185 | -------------------------------------------------------------------------------- /Unsupervised Learning.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Setup\n", 8 | "\n", 9 | "This guide was written in Python 3.5.\n", 10 | "\n", 11 | "### Python and Pip\n", 12 | "\n", 13 | "Download [Python](https://www.python.org/downloads/) and [Pip](https://pip.pypa.io/en/stable/installing/).\n", 14 | "\n", 15 | "### Libraries\n", 16 | "\n", 17 | "Let's install the modules we'll need for this tutorial. Open up your terminal and enter the following commands to install the needed Python modules: \n", 18 | "\n", 19 | "```\n", 20 | "pip3 install numpy matplotlib\n", 21 | "pip3 install scikit-learn\n", 22 | "```" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "## Introduction\n", 30 | "\n", 31 | "As we've covered before, there are two general categories that machine learning falls into. The first is supervised learning, which we've covered with regression analysis, decision trees, and support vector machines. \n", 32 | "\n", 33 | "Recall that in supervised learning, your explanatory variables X come with a target variable Y. In contrast, unsupervised learning has no labels, so we have a lot of X's with no Y's. In unsupervised learning, all we can do is try our best to extract some meaning from the data's underlying structure and run some checks to make sure that our methods are robust.\n", 34 | "\n", 35 | "### Clustering \n", 36 | "\n", 37 | "One example of an unsupervised learning algorithm is clustering! Clustering is exactly what it sounds like. It's a way of grouping “similar” data points together into clusters or subgroups, while keeping each group as distinct as possible. \n", 38 | "\n", 39 | "In this way, data points belonging to different clusters will be quite different from each other, too.
This is useful because we'll often come across datasets that exhibit this kind of grouped structure. Now, you might be wondering how we decide whether two points are similar. That's a fair question, and there are two ideas we use to answer it: 1. similarity and 2. the cluster centroid. We'll go into detail on what these two things mean in the next sections. \n", 40 | "\n", 41 | "### Similarity \n", 42 | "\n", 43 | "Intuitively, it makes sense that similar things should be close to each other, while different things should be farther apart. So to formalize the notion of similarity, we choose a distance metric that can quantify exactly how \"close\" two points are to each other. The most commonly used distance metric is the Euclidean distance, which we should all be pretty familiar with (think: the distance formula from middle school), and that's what we'll be using in our example today. We'll introduce some other distance metrics towards the end.\n", 44 | "\n", 45 | "### Cluster Centroid\n", 46 | "\n", 47 | "The cluster centroid is the most representative feature of the entire cluster. We say \"feature\" instead of \"point\" because the centroid is not necessarily an existing point in the cluster. You can find it by averaging the values of all the points belonging to a specific group. In k-means, the centroid serves as the summary of all the points in its cluster: it is the single point we use to represent them.\n", 48 | "\n", 49 | "\n", 50 | "## K-Means Clustering\n", 51 | "\n", 52 | "The k-means algorithm has a simple objective: given a set of data points, it tries to separate them into k distinct clusters. It uses the same principle that we mentioned earlier: keep the data points within each cluster as similar as possible. You have to provide the value of k to the algorithm, so you should have a general idea of how many clusters you're expecting to see in your data.
This isn't a precise science, but we can use visualization techniques to help us choose a proper k. \n", 53 | "\n", 54 | "So let’s begin by doing just that. Remember that clustering is an unsupervised learning method, so we’re never going to have a perfect answer for our final clusters, but we'll do our best to make sure that the results we get are reasonable and replicable. \n", 55 | "\n", 56 | "By replicable, we mean that someone else could arrive at the same results using a different starting point. By reasonable, we mean that our results should show some correlation with what we expect to encounter in real life.\n", 57 | "\n", 58 | "The following image is just an example of the visualization we might get. Notice the three colors and the ways in which the points could be separated; this suggests setting k to 3. Right now we’re operating under the assumption that we know how many clusters we want, but we’ll go into more detail about relaxing this assumption and how to choose the best possible k at the end of the workshop.\n", 59 | "\n", 60 | "![alt text](https://camo.githubusercontent.com/6e540cb12555953bf43925fc20d46b6da1768017/687474703a2f2f707562732e7273632e6f72672f73657276696365732f696d616765732f525343707562732e65506c6174666f726d2e536572766963652e46726565436f6e74656e742e496d616765536572766963652e7376632f496d616765536572766963652f41727469636c65696d6167652f323031322f414e2f6332616e3136313232622f6332616e3136313232622d66332e676966 \"Logo Title Text 1\")\n", 61 | "\n", 62 | "### Centroid Initialization\n", 63 | "\n", 64 | "First, we initialize three random cluster centroids. Every iteration of k-means will \"correct\" them towards better clusters, which is why a random start is acceptable. Note, however, that k-means only converges to a local optimum, so the starting point can affect the final clusters; in practice, implementations such as scikit-learn's KMeans use a smarter initialization (k-means++) and run the algorithm from several starting points.\n", 65 | "\n", 66 | "As we explained before, these centroids are our “representative points” -- they contain all the information that we need about other points in the same cluster.
It makes sense to think about these centroids as being the physical center of each cluster, so let’s pretend that our randomly initialized cluster centers are the actual centroids, and group our points accordingly. Here we use our distance metric of choice, in this case the Euclidean distance. For every data point, we compute its distance to each of the three cluster centroids, and we assign the point to the cluster whose centroid is closest. This makes sense, because intuitively we’re grouping points that are close together.\n", 67 | "\n", 68 | "\n", 69 | "### Cluster Formation\n", 70 | "\n", 71 | "Now we have something that's starting to resemble three distinct clusters! But remember that we need to update the centroids that we started with -- we've just added a bunch of new data points to each cluster, so we need our representative point, or centroid, to reflect that.\n", 72 | "\n", 73 | "So we'll just average all the values within each cluster and call that our new centroid. The new centroids sit further \"within\" the data than the old ones. Notice that we're not quite done yet -- we have some straggling points that don't really seem to belong to any cluster. Let's run another iteration of k-means and see if that separates the clusters better. Recall that we're just computing the distance from each data point to each centroid, and re-assigning the points that are now closer to the centroid of another cluster.\n", 74 | "\n", 75 | "\n", 76 | "### Iteration\n", 77 | "\n", 78 | "We keep recomputing the assignments and the centroids in every iteration using the steps above. After a few iterations, you may notice that the clusters stop changing after a certain point. This turns out to be a good criterion for stopping the iterations! At that point we're just wasting time and computational resources.
So let's formalize this idea of a stopping criterion. We define a small value, ε, and terminate the algorithm when the change in the cluster centroids is less than ε. This way, ε serves as a measure of how much error we can tolerate.\n" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "## Image Segmentation\n", 86 | "\n", 87 | "Now we'll move on to a k-means example with images! \n", 88 | "\n", 89 | "Images often have a few dominant colors -- for example, the bulk of the image is often made up of the foreground color and the background color. In this example, we'll write some code that uses scikit-learn's k-means clustering implementation to find what these dominant colors may be.\n", 90 | "\n", 91 | "Once we know what the most important colors are in an image, we can compress (or \"quantize\") the image by re-expressing it using only the set of k colors that we get from the algorithm. We'll be analyzing the two following images:\n", 92 | "\n", 93 | "![alt text](https://github.com/adicu/AccessibleML/blob/master/datasets/kmeans/imgs/leo_bb.png?raw=true \"Logo Title Text 1\")\n", 94 | "\n", 95 | "![alt text](https://github.com/adicu/AccessibleML/blob/master/datasets/kmeans/imgs/mario.png?raw=true \"Logo Title Text 1\")\n", 96 | "\n", 97 | "We'll be using the following modules, so make sure to import them:\n" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 1, 103 | "metadata": { 104 | "collapsed": true 105 | }, 106 | "outputs": [], 107 | "source": [ 108 | "import numpy as np\n", 109 | "import matplotlib.pyplot as plt\n", 110 | "import matplotlib.image as mpimg\n", 111 | "from sklearn.cluster import KMeans\n", 112 | "from sklearn.utils import shuffle\n", 113 | "from time import time" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "We begin this exercise by reading in the image as a matrix and normalizing it:\n"
 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 2, 126 | "metadata": { 127 | "collapsed": true 128 | }, 129 | "outputs": [], 130 | "source": [ 131 | "img = mpimg.imread(\"./leo.png\")\n", 132 | "img = img * 1.0 / img.max()" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "An image is represented here as a three-dimensional array of floating-point numbers, which can take values from 0 to 1. If we look at `img.shape`, we'll find that the first two dimensions index the pixel rows and columns, and the last dimension is the color channel. There are three color channels (one each for red, green, and blue). The set of three channel values at a single (row, column) position is referred to as a \"pixel\".\n" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 3, 145 | "metadata": { 146 | "collapsed": true 147 | }, 148 | "outputs": [], 149 | "source": [ 150 | "width, height, num_channels = img.shape\n", 151 | "num_pixels = width * height" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "We're going to use a small random sample of 10% of the image's pixels to find our clusters:\n" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 5, 164 | "metadata": { 165 | "collapsed": true 166 | }, 167 | "outputs": [], 168 | "source": [ 169 | "num_sample_pixels = num_pixels // 10" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "Next we need to reshape the image data into a single long array of pixels (instead of a two-dimensional array of pixels) in order to take our sample.\n" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": 6, 182 | "metadata": { 183 | "collapsed": true 184 | }, 185 | "outputs": [], 186 | "source": [ 187 | "img_reshaped = np.reshape(img, (num_pixels, num_channels))\n", 188 | "img_sample = shuffle(img_reshaped, random_state=0, n_samples=num_sample_pixels)" 189 | ]
 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "Now that we have our data, let's construct our k-means object and feed it the sample. It will find the best k clusters, as determined by a distance function. We're going to try to find the 20 colors that best represent the colors in the picture, so we set k to 20:\n" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 7, 201 | "metadata": { 202 | "collapsed": true 203 | }, 204 | "outputs": [], 205 | "source": [ 206 | "K = 20" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "Here, we instantiate the `KMeans` object just as we have done with other machine learning models. `t0` records the start time so that we can measure how long fitting, the next step, takes. Lastly, we print the elapsed time. Note: run this code in a single cell so that the timing estimate is accurate!\n" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 8, 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "name": "stdout", 223 | "output_type": "stream", 224 | "text": [ 225 | "K-means clustering complete. Elapsed time: 47.97915315628052 seconds\n" 226 | ] 227 | } 228 | ], 229 | "source": [ 230 | "t0 = time()\n", 231 | "kmeans = KMeans(n_clusters=K, random_state=0)\n", 232 | "kmeans.fit(img_sample)\n", 233 | "print(\"K-means clustering complete. Elapsed time: {} seconds\".format(time() - t0))" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "metadata": {}, 239 | "source": [ 240 | "The center of each cluster represents a color that was significant in the image. We can grab the values of these colors from `kmeans.cluster_centers_`.
We can also call `kmeans.predict()` to match each pixel in the image to the closest color, which tells us the size of each cluster (and also serves as a way to quantize the image).\n" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": 9, 246 | "metadata": {}, 247 | "outputs": [ 248 | { 249 | "data": { 250 | "text/plain": [ 251 | "array([[ 0.59594023, 0.37377197, 0.23699242],\n", 252 | " [ 0.07824585, 0.06161205, 0.04534107],\n", 253 | " [ 0.98117697, 0.98098147, 0.97990966],\n", 254 | " [ 0.29123059, 0.28127983, 0.22996978],\n", 255 | " [ 0.88017613, 0.67859817, 0.51104909],\n", 256 | " [ 0.52016801, 0.51832098, 0.43664253],\n", 257 | " [ 0.37142357, 0.3536061 , 0.2840144 ],\n", 258 | " [ 0.27072299, 0.13391563, 0.05868939],\n", 259 | " [ 0.97850031, 0.85233569, 0.70350802],\n", 260 | " [ 0.47486466, 0.27632579, 0.16393569],\n", 261 | " [ 0.67109919, 0.48759544, 0.33595228],\n", 262 | " [ 0.01431993, 0.01277512, 0.01052323],\n", 263 | " [ 0.47390169, 0.43343523, 0.33971789],\n", 264 | " [ 0.21901846, 0.21610278, 0.17843075],\n", 265 | " [ 0.57573938, 0.59766436, 0.536443 ],\n", 266 | " [ 0.17545493, 0.07990164, 0.03137593],\n", 267 | " [ 0.3704485 , 0.19818981, 0.1059725 ],\n", 268 | " [ 0.15592512, 0.15792704, 0.13496453],\n", 269 | " [ 0.93447602, 0.77906561, 0.61215806],\n", 270 | " [ 0.78221744, 0.57008743, 0.40647745]], dtype=float32)" 271 | ] 272 | }, 273 | "execution_count": 9, 274 | "metadata": {}, 275 | "output_type": "execute_result" 276 | } 277 | ], 278 | "source": [ 279 | "kmeans.cluster_centers_" 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": {}, 285 | "source": [ 286 | "As you can see, there are K cluster centers, each of which is an RGB color." 287 | ] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "metadata": {}, 292 | "source": [ 293 | "Now, we can assign every pixel in the image to its nearest cluster center and see how long that takes:" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": 10, 299 |
"metadata": {}, 300 | "outputs": [ 301 | { 302 | "name": "stdout", 303 | "output_type": "stream", 304 | "text": [ 305 | "k-means labeling complete. Elapsed time: 0.39964914321899414 seconds\n" 306 | ] 307 | } 308 | ], 309 | "source": [ 310 | "t0 = time()\n", 311 | "labels = kmeans.predict(img_reshaped)\n", 312 | "print(\"k-means labeling complete. Elapsed time: {} seconds\".format(time() - t0))" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": {}, 318 | "source": [ 319 | "Labeling should be much faster than fitting; here it took under a second! Next, we can construct a histogram of the number of pixels in each cluster:" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 11, 325 | "metadata": { 326 | "collapsed": true 327 | }, 328 | "outputs": [], 329 | "source": [ 330 | "n, bins, patches = plt.hist(labels, bins=range(K+1))\n", 331 | "for p, color in zip(patches, kmeans.cluster_centers_):\n", 332 | " plt.setp(p, 'facecolor', color)" 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | "metadata": {}, 338 | "source": [ 339 | "As you might be able to tell from the histogram above, the most dominant color in the scene is the background color, followed by a large drop to the foreground colors. This isn't all that surprising, since visually we can see that the space is mostly filled with the background color -- that's why it's called the \"background\".\n" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "Now, let's redraw the scene using only the cluster centers.
This can be used for image compression, since for each pixel we only need to store an index into the list of cluster centers, plus the color of each center, rather than the full color of every pixel in the image.\n" 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": 12, 352 | "metadata": { 353 | "collapsed": true 354 | }, 355 | "outputs": [], 356 | "source": [ 357 | "quantized_img = np.zeros(img.shape)\n", 358 | "for i in range(width):\n", 359 | " for j in range(height):\n", 360 | " # We need to do some math here to get the correct\n", 361 | " # index position in the labels array\n", 362 | " index = i * height + j\n", 363 | " quantized_img[i][j] = kmeans.cluster_centers_[labels[index]]\n", 364 | "\n", 365 | "quantized_imgplot = plt.imshow(quantized_img)" 366 | ] 367 | }, 368 | { 369 | "cell_type": "markdown", 370 | "metadata": {}, 371 | "source": [ 372 | "![alt text]( \"Logo Title Text 1\")\n", 373 | "\n", 374 | "Notice that the image looks similar, but that the gradients are no longer as smooth and there are a few image artifacts scattered throughout. This is because we're only using the k best colors, which leaves out the intermediate shades along each gradient." 375 | ] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "metadata": {}, 380 | "source": [ 381 | "## Limitations and Extensions\n", 382 | "\n", 383 | "In our very first example, we started with k = 3 centroids. In case you're wondering how we arrived at this magic number and why, read on.\n", 384 | "\n", 385 | "### Known Number of Centroids \n", 386 | "\n", 387 | "Sometimes, you may be in a situation where the number of clusters is provided to you beforehand. For example, you may be asked to categorize a wide range of bodily actions by the three main subdivisions of the brain (cerebrum, cerebellum, and medulla).
\n", 388 | "\n", 389 | "Here you know that you are looking for three main clusters, where each cluster represents the part of the brain that a data point is grouped into. So in this situation, you expect to have three centroids." 390 | ] 391 | }, 392 | { 393 | "cell_type": "markdown", 394 | "metadata": {}, 395 | "source": [ 396 | "### Unknown Number of Centroids\n", 397 | "\n", 398 | "However, in other situations you may not even know how many centroids to choose for your data. Two extreme outcomes can happen.\n", 399 | "\n", 400 | "#### Extreme Cases\n", 401 | "\n", 402 | "On one hand, you could end up making each point its own representative (a perfect centroid), at the risk of losing any grouping tendencies. This is usually called the overfitting problem. While each point perfectly represents itself, it gives you no general information about the data as a whole and will be unable to tell you anything relevant about new data that is coming in.\n", 403 | "\n", 404 | "On the other hand, you could end up choosing only one centroid for all the data (a perfect grouping). Since there is no way to generalize an enormous volume of data to one point alone, this approach loses the relevant distinguishing features of the data. This is kind of like saying that all the people in the world drink water, so we can cluster them all by this feature. In machine learning terminology, this is called the underfitting problem: we are generalizing all of our data to a potentially trivial common feature.\n", 405 | "\n", 406 | "#### Stability\n", 407 | "\n", 408 | "Unfortunately, there's no easy way to determine the optimal value of k. It's a hard problem: we have to balance choosing the number of clusters that makes the most sense for our data against making sure that we don't overfit our model to the exact dataset that we have.
There are a few ways to address this, and we'll briefly mention them here.\n", 409 | "\n", 410 | "The most intuitive approach is based on stability. If the clusters we obtain represent a true, underlying pattern in our data, they shouldn't change very much on separate but similar samples. So if we randomly subsample or split our data into smaller parts and run the clustering algorithm again, the cluster memberships shouldn't drastically change. If they did, that would indicate that our clusters were too finely tuned to the random noise in our data. Therefore, we can compute a stability score for each candidate value of k and observe which value gives us the most stable clusters. This idea of perturbation is important throughout machine learning and will come up time and time again.\n", 411 | "\n", 412 | "We can also use penalization approaches, which use criteria such as AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) to penalize model complexity and keep the value of k under control.\n" 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": null, 418 | "metadata": { 419 | "collapsed": true 420 | }, 421 | "outputs": [], 422 | "source": [] 423 | } 424 | ], 425 | "metadata": { 426 | "kernelspec": { 427 | "display_name": "Python 3", 428 | "language": "python", 429 | "name": "python3" 430 | }, 431 | "language_info": { 432 | "codemirror_mode": { 433 | "name": "ipython", 434 | "version": 3 435 | }, 436 | "file_extension": ".py", 437 | "mimetype": "text/x-python", 438 | "name": "python", 439 | "nbconvert_exporter": "python", 440 | "pygments_lexer": "ipython3", 441 | "version": "3.6.1" 442 | } 443 | }, 444 | "nbformat": 4, 445 | "nbformat_minor": 2 446 | } 447 | -------------------------------------------------------------------------------- /data/pos_tweets.txt: -------------------------------------------------------------------------------- 1 | " I cheer myself up when I'm down by 
listening to my playlist called, Genius: Ballads and Cellos. I love my iPod and my taste of music." 2 | " just watched the movie Wanted... it was pretty darn good." 3 | " now I'm happy " 4 | "--plotting like i'm mike..'game plan:pass the ball to lebron AT ALL TIMES and DONT FOUL'..certainly we'll win haha..go cavs goooo!" 5 | "@ mcdonalds with my litto sis aka cuzin lol cristyyyyy " 6 | "@ PBnJen : Thanks for the great tour and making me even more excited to work in PR! You Rock and so does S" 7 | "@_Chelsea_Marie does target ship things to london? thanks so much! im such a demi fan shes amazing! " 8 | "@adeline_sky that sounds fantastic! You're amazing! We need to watch some Muse gigs too! Shall we do it Saturday night?" 9 | "@adrianogarcia Ok, tend to use a dongle, but nice to know I could, even nicer to know iPhones can't " 10 | "@agallerylondon Hi! Great to see your tweet follow. Thanks! How's the weekends for you?" 11 | "@Alex_Jeffreys Welcome back. Emailed you earlier this morning with some stuff. We'll speak real soon. What a time in Vegas we all had " 12 | "@AlexAllTimeLow i have a house boat in seattle you could all get wasted on next time youre here...can hols 100 people " 13 | "@AlexAllTimeLow Sydney Aquarium is the best place in the world When youre done walk over the bridge and go to the mall (YYY)" 14 | "@AlexAllTimeLow thankyou so much for taking photos with us and being so awesome. Enjoy sydney " 15 | "@alix_says, it's just like school except there's even more homework and everyone's a lot nicer (normally) " 16 | "@AllThingsFresh beautiful " 17 | "@AmyEutsler connection's pretty good in Sydney today... Maybe you should come down for a visit??? " 18 | "@AmyLee27 Hi Amy. I great. I took a vacation day today in order to attend a Executive Committee meeting tonight. Obama makes me so proud " 19 | "@amysav83 i hve 2 leave 4 work :O i posted a youtube vid for you to watch. hope you like it. i may log on at work. have a great day " 20 | "@Andipham Sweet! 
it's really close to me..must be very nice! try posting on Craig'slist with images..should get some fast repsponse " 21 | "@angiekaybee yep i saw him in february and i get to see @dfcook on may 8th. looking forward to seeing him again " 22 | "@AnnaSaccone Love your new cards! I would definitely hire you "4";"@supricky06 that was one of the most enjoyable experiences I've had on YouTube. Well done http://bit.ly/1a7zPw" 23 | "@arataka yay! you found them!!! " 24 | "@ashleeadams doesn't make you look fat at all man. Good photo. " 25 | "@ashleytisdale when are you coming to ireland? i'd love to see you in ireland and yay for starbucks" 26 | "@athena422 Right on...I saw 'Ryche in Vegas at the end of April and they were awesome as usual " 27 | "@Audrey_O Girl, what are you contemplating. We ARE going to Vegas! The concert? Well, that part is up in the air! haha" 28 | "@BadBoyOfOpera We have here, that being Sussex in England! absolutely Loving it " 29 | "@batistini21 lols you go for the cavs. orlando owned them today " 30 | "@bbrannan I'm and good and glad you are too! Don't you think we should get the new twitter, Oprah to share in twitter causes? " 31 | "@BennyDavro Honey, you're not only on the 1st page, you're within about 100 votes of Obama. Rock that grass skirt, luv. " 32 | "@beycah Good for you Becca, have a good sleep. " 33 | "@billyraycyrus heey, brazil loves you and miley " 34 | "@bodycoach I'll look into that cho, I personally like strawberry but I'm going to become coach soon " 35 | "@Bparker_Seattle I totally 4 got about Golden Girls that is a gr8 Show " 36 | "@BradmanTV Thanks....it seemed the right moment. I love your show. Your songs bring me joy." 37 | "@BrandiPowers You too follow a fellow @Lakers fan beautiful " 38 | "@BrazillofBLAK cant wait 2 hear it " 39 | "@BrazilTour you are very welcome! " 40 | "@BrianViloria Not this year, Mr. LeBron James! Hahaha! 
" 41 | "@brisvegasbetty thank you I'm glad you liked it " 42 | "@CaterinaFalcone You are taking vacation??? OMG it's about time! I'm not going to miss you bc you're gonna bring me starbucks everyday! " 43 | "@catharinamcfly think of London and you'll survive! Can't believe that it's less then a year until I graduate! London Here I come! " 44 | "@CCArquette ps. please try to see if you get Cougar Town to be broadcast internationally!!! im brazilian but such a great fan of yours! " 45 | "@chanc Can't wait to see you in 4 sleeps and 3 days! " 46 | "@chasepino I wish I was as cool as you.. " 47 | "@CheapyD sweet, new episode of cagcast, thnx cheapy " 48 | "@ChrisCavs What a beautiful day for a birthday! Happy b-day Chris! " 49 | "@chrisjohnski iPhone say Chirs is here http://bit.ly/kIpFw start stalking him LOL " 50 | "@cobaltcow that's so true! Thnx man, that means an awful lot to me " 51 | "@comeagainjen hey x] just wanted to say your awesome, and angilena or however your spell it has nothing on you " 52 | "@Cowbelly Enjoy your time at Pike's Place Market today...it is a perfect Seattle day for the market and for a photo shoot " 53 | "@Cronotriggers I'm very tempted to buy it on the PSN/PSP, in hopes Capcom will follow up with a RE2 release " 54 | "@crystalchappell Good luck N ur meets. Idea to save GL - go on Oprah " 55 | "@csquaredsmiles TWATLIGHT loool.. I know.. now they relate rain to it.. haha thats why i wanna move to england is always cold! " 56 | "@daNanner Yeah, just finished #castle. Was pretty good. " 57 | "@danielkirkley Lead Me To The Cross is one of my favorites!! Great job, D!!! " 58 | "@danlee29 I'm glad you were disappointed and the Lakers destroyed them. " 59 | "@DanniAsheOnline Awwww she called me sexy nice! you rock Danni! you just made my night gurl! muah! I'll give you props anytime! " 60 | "@DanniAsheOnline I think that would work nicely gurl! go for it! I believe many, many people would sign up! " 61 | "@Dannymcfly Hey Danny how are things ? 
this your last night in brazill ? please writeback love karina xx" 62 | "@Dannymcfly heyhey and they love youuu! hahahaha! having a good time over in brazil then?" 63 | "@danzelikman Here's to hoping you come home with a Las Vegas bailout. " 64 | "@DaRealSunisaKim Thanks for the Twitter add, Sunisa! I got to meet you once at a HIN show here in the DC area and you were a sweetheart. " 65 | "@DaveBos its okay ive done it once didnt but Wooo Stayed up longer then expected lol " 66 | "@DaveMatthewsB New album is fantastic! See you in London, can't wait! " 67 | "@daydreamer20 Good post. " 68 | "@daygan Wow! This Ubuntu feature is really cool ?? again! Got More??" 69 | "@ddlovato ohh myy goshh!! ,, are you the opening act for the jonas brothers at wembley in London on the 15th of june ?? xx " 70 | "@ddlovato Demi I'll deff see you on Brazil (São Paulo) on the show " 71 | "@ddlovato I HAVE A SURPRISE FOR YOU WHEN U COME TO LONDON cnt wait to seeya ! " 72 | "@ddlovato yesterday " 73 | "@DebbieFletcher I saw them on Oprah the other day, they are really good! " 74 | "@divabat Hahahaaaaa! I'd like to meet your Mum someday, and talk about Oprah with her, Her likes and Dislikes, and here her opinion on it " 75 | "@DJTinaSapp Ha! I'll get it back to you as soon as possible! " 76 | "@dmf71 rrrrrr you so very sweet a big hi to you!!!!!!! " 77 | "@DONNASAWR Thank you Donna. " 78 | "@DonnieWahlberg YOU give me joy....and you have for years. You are such an amazing man and we are all lucky to know you. " 79 | "@dornx give me time and ill come with you sa london! haha. good luck with that! " 80 | "@dpburland yes i think i might just do that hehe might watch sleepless in seattle my fave " 81 | "@Dr_DinaSadik Awesome! u'll definitely know some swear words then! haha" 82 | "@DrMollieMarti Best of luck to you Dr and your new Rockstar sisters. I know some women that are into MLM. I'll tell them about you. 
" 83 | "@DrSecret Nice to meet you too buddy " 84 | "@dsiegel99 now if i could get to san diego the same night i could celebrate it twice lol thanks " 85 | "@DwightHoward great job Dwight! I pray you also win against the Cavs. " 86 | "@egodbois it's good to see you here as well on our facebook! Thanks for being a fan " 87 | "@eliizabetty maybe we will meet there. i want to go and study in london/england too " 88 | "@EP31 we have soccer, too~! The New England Revolution, aka the Revs! TAYLOR TWELLMAN FTW" 89 | "@feliciaday Hey can you tell me if we can use Xbox 360 to use Twitter as Tickers/popups? That would be Sweet! " 90 | "@ferretfreakx4 I had so much fun tonight! And I'm totally stealing all your pictures when you upload them. hehehe" 91 | "@FlipFlopsPearls YEAHH!!! Email me it! I will ship out the blanket and a few lil extras "4";"@tiranw thats bc u love the CAVS LOL *WINK WINK" 92 | "@fourzoas Good night! " 93 | "@FoxxFiles nah...The Cavs are done Go Magic!" 94 | "@GabrielRossi the only Tweet that could top this would be a DM from President Obama asking me to become the czar of youth empowerment " 95 | "@garpu Great! I've actually always wanted to visit Seattle, so perhaps some day. " 96 | "@GLEETV you guys should come to Seattle!!! It's a pretty big city " 97 | "@googoodolls http://twitpic.com/56m0y - Really nice! Grettings from Brazil =P i love goo goo dolls (L'" 98 | "@guardiantech sounds great can't wait for E3 and the WWDC. New iPhone and PSP Go, possible new Xbox (wishful thinking)" 99 | "@guykawasaki congrats on 100,000 followers. Wow almost 4 times of what I have " 100 | "@hannahpoulton good morning! you sound very chirpy " 101 | "@harper I'm in the same boat as you. Happily, there is a McDonalds close by so we can both enjoy a greasy breakfast." 102 | "@hertbeat it is still only 1st coffee of the day for me ! Happy Tuesday, off to see Jeff Dunham and Achmed the dead terrorist tonight ! " 103 | "@hitwithafish very nice pics! Such a cute family! 
" 104 | "@HollywoodHansM lol like how gud kobe iz!! lol and its gunn be the lakers " 105 | "@Honey3223 Lurkers now that was interesting " 106 | "@hotglassblower Oprah would have touched your feminine side today and since you luv her som much I even DVR'd it " 107 | "@iamdiddy your telling me.. just finished doing 200 crunches! step it up Diddy ! LETS GO!" 108 | "@iamjonathancook Good morning Jonathan its 1:51am in Sydney having midnight snacks...stupid time difference,u couldve joined me!haha xx" 109 | "@ijustine the gym is an awesome place! work it out " 110 | "@ILUVNKOTB he wants u to follow who he follows on twitter. some very nice organizations " 111 | "@imagejennation @whitrt we found a great Chinese place to hang out at " 112 | "@IndieArtDesign well you will be excited to know, Kevin will be the new featured artist for our next Sydney flyer " 113 | "@IntriguingDs Ahh...I love your music. Missed you in Seattle. " 114 | "@ivegotnerve Yes, I am in love with someone who lives in England. But I was able to see him last May, 3rd " 115 | "@JamesDeen you are my own personal Jeepers " 116 | "@jaredleto come to england and play an eco friendly gig.would love that " 117 | "@jaspreetgill http://twitpic.com/6ubr9 - woooho! they get better and better! I'm watching ur videos on YouTube right now haha, as I'v ..." 118 | "@JBsFanArgentina Hey I luv this pic!!! was amazing of the last CHAT of The JB in FACEBOOK! " 119 | "@jcutaia day 3 of mcdonalds breakfast " 120 | "@Jeff_Sparxxx well in that case, you know what i mean hahaha. i love a good __________. (fill in the blank with the role of your choice.) " 121 | "@JenniferRosex3 I know! I love the song hello seattle" 122 | "@jenocidal I think my iPhone was thinking the same thing because the next song was cake, I will survive. thank you." 123 | "@Jeremeyxvx you know you broke edge in vegas happy 8th year brahhh" 124 | "@JonasAustralia for sure!! Except the new moon bit. 
IMAGINE @JONASBROTHERS LIVE in AUSTRALIA OMJ " 125 | "@JonasAustralia i voted like ten times " 126 | "@JonasAustralia This Friday as in 5th June Friday?! Argh. OMG. Which website?! Im so excited! " 127 | "@JonasBrothers I LOVE YOU MR PRESIDENT" 128 | "@Jonasbrothers Nice skillz Nick x love always, Marjorie " 129 | "@jonasnessica heyy sweets!! I started reading your fanfic! It's awesome so far ignre what I wrote on YouTube "4";"doesn't want to sleep just won a free ipod case woot! WWDC Contest. btw there is a new girl at work...very nice!!!" 130 | "@JonathanRKnight Hi Jon! Great to hear from you! See you on the cruise, I cannot wait! Hope all is well on the Knight bus! You are loved!" 131 | "@JonathanRKnight i'd share my latte with you. just got back from a starbucks run myself " 132 | "@kailaengland I like a girl who has style too. Cocktail dress " 133 | "@katayy what do you think of Kelly clarkson? her new album has been pumping through my MP3 player for weeks now " 134 | "@kayotae Just had a martini myself. " 135 | "@kennyseattle1 Hi Kenny !! Welcome to twitterville and get ready to waste tons of hours having fun on here. See u live at 5am on Q13 FOX " 136 | "@keyannaaa the new story of your life will include magic fountain with me " 137 | "@kmacable...you're sweet...yea I'm ok! " 138 | "@KVAY2K Hope you are having a wonderful night. I will be up again all night. Sent you a message on YOUTUBE. " 139 | "@Lakers ready to win tonight!!! " 140 | "@lakersnation its not really my style, but thats a really well done wallpaper, thanks for making it. keep up the good work " 141 | "@lancearmstrong http://twitpic.com/6vb49 - wow lance great pic brother! you my idol! my mom just beat breast cancer! " 142 | "@LanceGross lakers all day everyday!!!!! " 143 | "@Larissa_Ione I LOVED Taming the Fire! I emailed sydney croft with all the reasons why " 144 | "@laurieann444 I am so happy to hear they played Follow My Way. They were practicing b4 show in Seattle. Excellent!!!! 
More 4 U 2Nite!!! " 145 | "@LBheart_Jessica Yes... I'm going to Kenya in August for 10 days w/ Tumaini International, loving on cute AIDS orphans. SO excited! " 146 | "@Lee_Knight lmao! thanks Lee XD, would u like to join in our craziness as well lolol ROFL come " 147 | "@lhawthorn What a coincidence! I was just stalking your Twitter feed and the GSoC news. "4";"@ColinCancer HAHA very funny! So what or who encouraged you to create a Twitter?" 148 | "@lightinthesky wow. enjoy! " 149 | "@lilyroseallen i was at your sydney concert on wednesday. used my fake id to get drunk, and hooked up with a hot english boy.woo! thanks " 150 | "@LisaNoelRuocco oh and i like your new hair too. i think it really suits you. " 151 | "@LittleFletcher Hi Carrie! How are you? Girl, Brazil love you! Come visit here! Have a nice day! xx" 152 | "@llmatticusll yuuuup! 9 miles it's for cancer! for a good deed!" 153 | "@LOlakers7 AWESOME GAME TODAY " 154 | "@londonbridget13 ill always have your back!! " 155 | "@Lovely_London Pretty good. How about you? " 156 | "@LovelyCharles Get an iPhone. " 157 | "@luvsJonasandVFC YUP! I'm seeing them in August! woot woot! my first concert was with Corbin Bleu, Drake Bell, and Aly" 158 | "@m3L1nd4 not in youtube,, but in dvd.. about the story it's mix between meteor garden and hana yori dango.. you must watch it.. " 159 | "@MacNH the boy needs to get his ass back to New England where he's properly appreciated " 160 | "@malave585 And you said we were through back during the Celtics! haha. Hopefully you guys will get LeBron. Bring some excitement to NY" 161 | "@mandyblake I was just about to X out of here for the night and saw your post, I started laughing! You are funny!! " 162 | "@mari_chiquitita nice pics from last years festival. How about we plan to meet there next year! " 163 | "@markhoppus hey mark!! so glad to see you guys all back together seen youse when you played newcastle arena , england, great show!! 
" 164 | "@maryaflower McDonalds coffee beat starbucks by a mile in a blind taste test " 165 | "@McChelsea get a glass of wine, a nice book, and just chillax out there. oh wait, it's seattle, i'll probably rain. hah " 166 | "@mdietrich that`s why you should go for holiday as soon ! (whithout iPhone " 167 | "@meowmixfever we should go to the grocery store and buy a large box of them that'd make my life complete haha _Myana" 168 | "@Michael_Cera starbucks is the best. i work there, i should know what're you getting there?" 169 | "@MichaelNi Trader Joe's is one of my favorite things about Seattle. " 170 | "@michellebranch The month of June is also officially LGBT Pride month, declared by Obama! Celebrate " 171 | "@mileycyrus BRAZIL LOVES U SO MUCH!! YOU ARE AWESOME, MILEY! COME TO BRAZIL.. WE CAN'T FOR THIS DAY COME " 172 | "@mileycyrus http://twitpic.com/7e01t - ahh..i´ve got the same ipode ..which song so you listen?love ya " 173 | "@mileycyrus i think u should come to south america all your argentinians and brazilians fans are going crazy hahaha! please reply " 174 | "@mileycyrus i voted for you! i hope u win!!!! and Taylor Swift too!!! " 175 | "@mileycyrus Miley takes a tranquilizer for sleep which is very good, I take the time #mileybrazil" 176 | "@MissSydneyJ Im good, lol... I feel awake " 177 | "@mistadee - I've got Tweetie for the iPod Touch - recommended " 178 | "@mitchelmusso I love your new hair You have to come to england sometime!" 179 | "@mollieandjackie Love and Theft is AMAZING! They opened for Taylor Swift back last year!!! Miss ya'll!!! " 180 | "@Mpits its super easy lol I wish everyone would jump on the bandwagon " 181 | "@Mr_LasVegas Breakfast menu runs 24/7 in Vegas ...I remember having breakfast at 1pm after a big night..." 182 | "@ms_elli i could make it perfect for you. Come to Starbucks in England!! " 183 | "@Msdebramaye I heard about that contest! Congrats girl!! " 184 | "@mydesire I saw that earlier on Darker Sights/Sounds. 
I subscribe to that blog in my google reader. Yummmminesssssss." 185 | "@NaiveLondonGirl good for you girl! recommend papers, tea and toast as well... " 186 | "@NasaCaligeek I think even the coolest ppl have a geek inside waiting patiently to get out... " 187 | "@nick_carter Don't forget my dear, you'll always be in my mind and my heart ok? BIG kiss and special affection from me (Brazil)! I ? u!" 188 | "@nickjonas Are You Gunna Go Shopping? =D London Rocks For Shopping x" 189 | "@nicolerichie only one of the best sappy love stories ever " 190 | "@nicolerichie that was my favorite show/series when i was little! " 191 | "@nicolerichie YES! lol...when you live in canada, anne part of your childhood. I grew up with all the books and tv series " 192 | "@nikhilnarayanan saw the tvc yesterday.. worked for me " 193 | "@NILANTI atleast ur not the only 1 up. Watch the lakers kick ass again " 194 | "@noahbeaar Hi Noah! Omgsh I Think Your So Cute, You Look Like Miley Awwww! How Are Youu? Im 14 " 195 | "@NostraSeamus If Obama took a stand, it would give the Ahmedinejad supporters more to fight against. We're too famous for this party " 196 | "@NYFab Happy graduations! Say hi to Obama for us!! (he's a bowler)" 197 | "@obama_binladen sure tell me about the project that you received funding for! thats great news " 198 | "@Ocartti hey!! yay! tweet tweet! hah. oh n if I want to follow oprah I can!! she asked me too! " 199 | "@officialmiranda Thats good! I'm from brazil n i love ICarly Did you follow your fans too?" 200 | "@OldmanHo lol yeah, this time pretty much the whole thing so close again, but looks like Cavs are being totally outplayed by Magic!" 201 | "@omgimlondon get me one of those coffe cakes from starbucks while u there " 202 | "@Oprah ...now I am curious to watch it. And will, tomorrow Sunday if I can. Thanks! " 203 | "@Oprah Good nite Oprah, good show today " 204 | "@Oprah I can't WAIT to see that movie! " 205 | "@Oprah np gurl! that was a great show! 
" 206 | "@Oprah Phylicia Rashad is my idol...I dream of meeting her. I love you too Oprah... " 207 | "@Oprah Sweet Dreams O. You did good with your " 208 | "@Oprah We are so happy to have you back in the Tarheel State, even if you are in Blue Devil country " 209 | "@oprah welcome to twitter! " 210 | "@oprah, nite! that's a really cute pup by the way. " 211 | "@patriciaco HAHA. Sometimes I like fantasizing that I'm Taylor Swift. I'm a psycho like that. " 212 | "@pcdnicole Hi Nicole, glad you loved Sydney! Was amazing working with you the other night at 301! Looking forward to you coming back " 213 | "@petrobrasbrazil i have a new follower which i like @petrobrasbrazil please take care of nature " 214 | "@phyonetizen Thanks for adding me on Facebook, Phyo " 215 | "@pocketfulofme hahaha i so should of! but my nails were wet. hehe.but he was helpful he flipped the magazine pages for me " 216 | "@psproduction omg they r having a daddy n me princess ball out here in staten island 4 fathers' day!!! N katie n u popped into my head! " 217 | "@purrplexa And its a wonderful thing! I'm all in favor of using the entire English language (shit, I use #expletives too, I think) " 218 | "@r_keith_hill Thans for your response. Ihad already find this answer " 219 | "@rachhiiee_ jenny knows i love er. " 220 | "@rashmid congratulations!! tht makes a lot of us very very happy. " 221 | "@ravensymone hayy! big fan love your music and your show !! " 222 | "@realjohngreen Yeah, Dutch people are up as well "0";"I wanna be in a punk rock band again " 223 | "@rhonda_brown thanks for the tip on Sam's Club! " 224 | "@RobKardashian Rob, Your my fav Kardashian, Please say Hi to me " 225 | "@RoundSparrow Awesome - hope it was great! I am about 2 hours away from seeing it at the Sydney (now second world viewing) premiere " 226 | "@SashaBaby22 haha! Ps, nah but for the west to win? Oh well. Me n my baby daddy Lebron gonn cuddle " 227 | "@scrambledeggos Seattle is very nice. 
So green it reminds me of Germany " 228 | "@sdownes1972 thx Stu will do! " 229 | "@seanmurphymusic I'am in love with Taylor Swift's " 230 | "@seattlemommy you rock!! i need to get motivated to run a race... " 231 | "@SeattleWillow Have a nice time work! " 232 | "@secretfanofu big news Andrew: I'm moving up to SF at the end of the month! So making the most of my last month in LA. " 233 | "@seodubai Go #Lakers go! Good morning!" 234 | "@shannamichelle @levifig YES - We all want to see all the pics! I see a facebook album in the future! " 235 | "@Shantymanfan I got your package! Wow, I'm so excited! My very own one! So, do I plant it in a pot for now, and water it till it grows? " 236 | "@ShawnORourke lol tru dat! thanks bro! " 237 | "@shaycarl I'm watching your videos in youtube you are so funny!! )" 238 | "@SilMuri yeah that'd be awesome... You're gonna be the new Oprah! Freakin' yeah *ninja-rolls over to Sil to eat chocolate*" 239 | "@SkinFaceMcGee You're description of this Sat. morning was perfection...thanks. Seattle is following suite." 240 | "@socalgurl83 LOL nice, thanks for translating " 241 | "@Soraspsp now that summer is here you can come and hang out with aunt beth more often " 242 | "@special72 Thanks, Special 72! " 243 | "@spicebugsmom A few more for you to follow: @aims7 @alirushton @dillyh @Oprah @timescolonist @JohnCleese ...Welcome to Twitter mums " 244 | "@StarChile.....ahhh.....alright. Thanks. Congrat's on your team's win today " 245 | "@stasia19 i got work at 8:30 so i'm going to bed now. i shall ttyl, have fun doing your project STARBUCKS DATE SOON ?" 246 | "@stevonelson I like that one. I've used it on a project here." 247 | "@stretcharmy Dim Mak was sick! ripped it apart! Vegas was cool n fun, SF was dope, people went nuts! LA friday was fire ! yeah was good!" 248 | "@suitelifeofkell that's freakin' cooool i love twitter .haha" 249 | "@suitelifeofkell yayyy! lol. i just requested herrr. what did she say? 
" 250 | "@SwitchItNow ahhh shiner Bock in Austin YES! thanks for the Twitter @reply so I can relive too " 251 | "@sydney_sider yes thanks I think they're amazing too! the images were taken by @insidecuisine photographer the very talented @rovingrob " 252 | "@sydneymckinley im not gonna lie, i absolutely LOVED JONAS. hahahaha, those boys can do no wrong " 253 | "@tashlee Oh, and good to hear from you again! i'm taking some students to Sydney, you should join us for lunch/dinner if you can." 254 | "@taylorswift13 It's National Listen To Taylor Swift Day " 255 | "@taylorswift13 Taylor Swift I think you're so pretty it makes my heart melt everytime I see your face. " 256 | "@taylorswift13 you're soooo talented and I wish I could go to one of your concerts! Ily miss taylor swift! " 257 | "@TayStarr awww thats my job, im like mcdonalds girl, i love to see you smile " 258 | "@TDLQ Well it was just so so close the whole time! But at least they pulled it out! WOOT LAKERS! " 259 | "@teleken It's a feat of USB engineering! Makes every day a party. " 260 | "@the_real_shaq you TOTALLY own3d @oprah! Congrats! http://digg.com/d1p1zg" 261 | "@thedebbyryan debby *-* hi, i'm carol from brazil, and i love you so much *-* please reply me " 262 | "@ThePartyScene " 263 | "@theragingocean BonJour Spacecowboy,I wish I was either of it .I work with Kids, can go out whenever I want and go wherever I want to " 264 | "@tommcfly I are welcome always, Brazil is your house now " 265 | "@tommcfly thought you'd think this is cool. my cousin is going to personally meet the head of NASA to discuss future projects. Cool, huh? " 266 | "@tommcfly will you watch Green Day on UK this year? can't wait to see them in Brazil... Billie is trying to learn some portuguese! " 267 | "@Triciepop Yall goin to Vegas? Make sure yall hit me up so we can drink till we drown Lol. " 268 | "@triplejHack true off to the #startrekmovie world premiere soonish. 
enjoy talking about supa-fast interwebs with conroy " 269 | "@tru_artiste my dude ima have the pics up soon prolly facebook" 270 | "@TrueGabe yep to the wine, and feijoa sorbet, muffins, chutney etc... we have a large tree always open to suggestions tho " 271 | "@TunerKid Hey, hey, hey, hey! I'm ur 28th follower. @oprah sent me. "0";"pray for me please, the ex is threatening to start sh** at my/our babies 1st Birthday party. what a jerk. and I still have a headache " 272 | "@tweetshrink Hah, yea, I suppose "4";"@David_Henrie haha i WISH i coudl meet you.. you should stop by seattle some time home of the STARBUKS "4";"Ok bed time. I wish I didn't have an exam on thursday otherwise I would just go around harassing people to vote! lol. Night guys! " 273 | "@twinsmvb i will be rooting for twins in seattle this weekend! shhh, don't tell my mariners friends " 274 | "@VampireBill Goodnight and take care " 275 | "@VegasRex Slater's wife got him with a bottle pretty well. Nice visit to UMC " 276 | "@veldagraydon I looked for @Oprah too after she mentioned that she tweeted on twitter and found her too. Have a glorious day. " 277 | "@wale I'm gonna havta temp stop fllwing u while ur talkin abt kobe bc I loveeeeeeee him " 278 | "@warp I love that Aussie cattle dog story " 279 | "@WCEllis I feeling MUCH better about the Lakers' situation now. Thank you for asking " 280 | "@wendy_bowser Thank you for the kind words! I appreciate it. Have a great night " 281 | "@wmmarc great interview! Glad to see it " 282 | "@Wossy Oh, that is so close to being a dream lineup for me. Will watch it if/when it finally shows in Msia. If not, youtube! " 283 | "@youngcj86 I was basically just saying, that Kobe's competition will be LeBron eventually. I'm not Anti-Kobe, I'm just Pro-LeBron " 284 | "@YourWebChick Starbucks? What's your favorite drink? I gotta tell you... I'm more of a Dunkin Donuts man. " 285 | "@zinedistro Your welcome. Looking forward to the event!! Not long until May 1 now...." 
286 | "@zoecello Love the " 287 | "*ahem* I think @LiZAmtl @sabrina215 @Etown_Jenn " 288 | "#Iremember when the Lakers won title #15 about 4 1/2 hours ago #golakers" 289 | "#xboxe3 Gameplay looks awesome, just as always from Bungie! " 290 | "1 vs 100 on Xbox Live was fun " 291 | "100 followers - that is exciting " 292 | "aaaaahhhh.... finally done. next stop zippy's. " 293 | "About to head to Sydney. Drinks, friends, ink, Amity Affliction, Getaway Plan and Elora Danan! Fun times await! " 294 | "Adding some music on my Ipod " 295 | "ahhhh time to feed the tribe lol chicken casserole with veg yummo " 296 | "All rockstars are back home: while 'some of us' freshen up, 'others' watch Magic/Lakers game, then we'll celebrate Sweden Rock in Florida " 297 | "also, SOOOOO excited to see Taylor Swift, in Omaha, August 9th!! " 298 | "Amazing night in Dallas with my Advanced TV Production class Happy Birhtday, Sam!!" 299 | "and now off to bed after an amazing night chatting with a pretty amazing guy ( you know who you are)" 300 | "And the west was won! Yeah lakers! " 301 | "Another good day nitey nite everyone! First day of school tomorrow!!! Ahh I'm kinda nervous yet excited...til tomorrow!" 302 | "another gorgeous day. ah see england isnt always that bad..... " 303 | "Anyone else love the fact that they #Lakers are not even playing tonight but #Kobe is #5 on trending topics! " 304 | "As a birthday gift i took my sis to Starbucks for the first time...shes gives it two thumbs up " 305 | "As I lay me down to sleep I pray the Lord our souls he keeps! And that Lebron comes to the ny Knicks!!! let's WAKE UP!! (via @iamdiddy)" 306 | "Ashton is going to be Oprah! All hail the Twitter King! LOL!" 307 | "At Starbucks, waiting for some fwends Rainy day, hot choco and jazz. Just perfect!" 308 | "At work...only 2 more days to go and then i am off for 7 more days yay!!!" 309 | "aww, I love the dutch. i'm probably going today to protest Mcdonalds or KFC for @peta2 " 310 | "Back from school. 
NO MORE SCHOOL FOR TWO WEEKS :)" 311 | "back in seattle. woot woot. Drinkin in Bellevue. I see seattle has warm weather now. Thanks for keeping the heat on while we were gone. " 312 | "Back in Sydney. *Hugs*, Everyone " 313 | "being sexy and really happy " 314 | "Best " 315 | "Blackberry Bold and Kingston 8gb microsd card en route " 316 | "bookin a hotel in sydney this week for july 10 hehe YESS" 317 | "brand new 8GB ipod touch for sale! at reduced price of course. " 318 | "BTW, husband watched Obama " 319 | "By the way... I never imagined that I'd have 111 followers. Thank you all, even the robots " 320 | "byebye lebron.. maybe next year " 321 | "can't wait for the lakers to lose " 322 | "can't wait for vegas " 323 | "Can't wait to see stuff on LittleBigPlanet Portable for PSP. " 324 | "cant wait for the lakers game 2nite " 325 | "cant wait till 3am, mcdonalds breakfast " 326 | "Cavs watch party at the Q tonight again " 327 | "Celebrating Phil being one year cancer free! " 328 | "Chrisette Michelle just came on the ipod . she's so mellow " 329 | "Cookin dinner for the family and luvin it sendin luv to my hubbie,tw,luvladyt,tenise,nasa" 330 | "Dallas vegas goodness http://twitpic.com/3lzt1 On my way to to the SusCon " 331 | "Dang. Heroes and The Hills amazed me tonight! off to do more studying!" 332 | "day 2 is almost over! i can't wait to eat wade...hehe..." 333 | "Diddy has courtside seats at the game...he will see up close and personal the whupping Magic is going to put on the Lakers!! " 334 | "Does anyone else really really LOVE the Lakers? " 335 | "Doing Beat the Bridge race in Seattle this morning! Should b beautiful " 336 | "done with all preparation... starting revision now.. i have my signals ans systems exam today .. wish me luck people.... " 337 | "Eatin nachos watchin the game.! all i gOtta say is LebrOn is a Beast. ! dO anybOdy feel me.? " 338 | "eating mcdonalds and watching b work out! yum! I need to start working out. 
flabs do not look good on the beach! lols" 339 | "England have won the rugby they played like a well oiled machine " 340 | "England lost to the Netherlands in the cricket!!! hahahahahahahaha that's pure gold " 341 | "England trip update: just saw stonehenge and now heading to london! " 342 | "England winning by 2 goals to nil right now " 343 | "Enjoying a nice cup of tea and watching Norway's winning song on youtube " 344 | "even though i should be in bed im watching videos on youtube i love watching the final dance in step up 2..... http://tinyurl.com/mg8ppc" 345 | "Finally almost home " 346 | "Fine-tuning part of a song Maddy and I r making. Sounding good. I feel pro. Not. Hahaha" 347 | "Finished the Moonwalk at 7hr45mins, and will have raised 1,000GBP for breast cancer. A great night. To contribute: http://bit.ly/6JW9f" 348 | "First day of shows is done and looking forward to Lakers v Magic finals " 349 | "FOLLOW -" 350 | "follow @skate4cancer ! you know you all want to " 351 | "Fox and Hound for the Cavs game....RISE UP!!!! " 352 | "FREE UNLIMITED RINGTONES!!! - http://tinyurl.com/freeringring - USA ONLY - Awesome 4 iphone " 353 | "Getting my hair cut while texting with my brother and getting updates about Iran " 354 | "give me good songs for my ipod please " 355 | "Going On The London Eye Today Wooooo Going To Take Lots Of Pictures So Add Me On Facebook To See Them When I Tweet That I'm Back Okay. " 356 | "Going to Brians graduation. Wearing Taylor Swift's designed dress. " 357 | "going to pick up @30comau in a sec its our anniversary today " 358 | "Going to Seattle again! #boarding SFO to SEA - http://v.cristdr.com/3MK" 359 | "going to see " 360 | "Going to take a week long trip to seattle by myself soon " 361 | "going to the sunshine coast on Thursday should be pretty awesome, i can update my tan " 362 | "Good luck for Lauren and the rest of the cast of TBS, can't wait for something to leak " 363 | "Good morning everyone! 
It is such a beautiful day here in New England! " 364 | "GOOD MORNING MY LADYS and Jon lol! " 365 | "Good morning to all followers. I wish you a nice Tuesday and good luck with your business. | Allen Verfolgern einen sonnigen Dienstag. " 366 | "Good night and good day twitters! " 367 | "Goodmorning cali hi to my vegas family " 368 | "gosh, i love taylor swift in the least creepiest way, well"4";"@jtripodi hi! You're awsome! K? Bye " 369 | "got a great first verse and chorus goin' i love this song " 370 | "Got me a spangly new PSP-3000 today, in Mystic Silver, with a copy of Resistance Retribution. Happy bunny! " 371 | "Gran Turismo on PSP! FF13 to be on PC! What more can I ask for?! Gran Turismo 5 on PC perhaps? " 372 | "Great day in London, lovely to see all! " 373 | "great song @Sharpatoulas: " 374 | "Great stuff this wk: my bike ride in the wild on Sun."4";"With brian liz sam ryan john and cooper going to starbucks " 375 | "great time at @starbucks with @algarcia3505 " 376 | "great weekend w/family " 377 | "Guess who has a new job?! Oh yeah...i do! And i'm going to be working with @lebron1004! Yay!! " 378 | "GuiPulp is open " 379 | "ha im soo obsessed with taylor swift's album she just soo talented " 380 | "Hacked my PSP with ChickHEN, now it's running 5.00-M33 " 381 | "had a great ending to a great day! Being brave pays off. Thank you Jesus-seriously. Now off to watch the Hills and then bed!" 382 | "Had a great lunch with one of my " 383 | "Had a great night..really enjoying life " 384 | "had a nice lunch at McDonalds so sleepy now" 385 | "Had a piece of fried chicken, some PSP luv and now off to bed. " 386 | "Had my first Iced Carmel Macchiato from Starbucks this year today!! Then got to enjoy the sun and have class outside in the garden! " 387 | "had the bestest day in london with dominic " 388 | "Had to happen, @Oprah is on twitter, and only after 24 hours, she's got 260,371 followers and counting... You go girlfriend!" 
389 | "haha realized today my dad says " 390 | "HAHAHA! The LeBron/Kobe commercials are cheering me up " 391 | "HAHAHAHAHAHA seriously uh well we can wait 4 michelle 2 get here and we can go get something from mcdonalds or sonic either 1 its good" 392 | "hanging out with biology til 4am woo !" 393 | "hanging out with Emily. I love her. She's wonderful. " 394 | "Happy birthday sydney elizabeth mitchin " 395 | "Happy birthday to my little sis abi!!! Celebrating with my family sister is now nine! so crazy got her a taylor swift poster " 396 | "Happy Morning, la toat? lumea! " 397 | "Heading to @Starbucks with Keri for a Vanilla Non-Fat Latte and Coffee Cake... YUM " 398 | "Hey @daughter_4oprah! tomorrow is your mom's birthday right? tell her happy bday for me!! tnx! is she going to be on oprah??" 399 | "hey guess was @magicmanil the Lakers won and KOBE is mvp just thought I would tell ya haha" 400 | "hey! there's a sci-fi festival in London this week! http://www.sci-fi-london.com/ http://bit.ly/sfqtb" 401 | "hi Dianna ...this is so cool ... i just recently started twittering " 402 | "Home just in time for Chelsea Lately " 403 | "Home sweet home after a long day in D.C. with Bennett, I had so much fun! I saw the original Obama " 404 | "http://bit.ly/16BoLQ FTSK's version of Taylor Swift's LOVE STORY )) love it! I love them both! " 405 | "http://twitpic.com/5c9gs - A picture of Taylor Swift. I really like this picture. I have an exact one on my walls of my room! " 406 | "http://twitpic.com/5oll7 - We are on board....The Mini stored with the other Minis.... England here we come! " 407 | "http://twitpic.com/6r413 - On the way to Vegas " 408 | "hung out by Notre Dame today hoping for a glimpse of Obama...and I got it! the energy was amazing!" 409 | "I *want* a PSP Go. BBC says it will be at the E3 expo.. can't wait " 410 | "I am heckling some LeBron fans at the sportsbook. 
Oh how sweet " 411 | "i am working on my media room design and i love love love my client profile " 412 | "i can't wait for the release of Monster Hunter Freedom Unite! and also awaiting little big planet and pixel junk monsters on the psp." 413 | "i cant get over Taylor swift songs, repeated day and long. She's so awesome! " 414 | "i drew a cute baby zebra i think he is my favorite so far " 415 | "i enjoyed shopping with my mam i really cant wait till london " 416 | "I got back from Brazil last week, and everyone there is good. Here we are in full force preparing for the arrival of our boy " 417 | "I have a BIG pool tournament today. If we win, we go to Vegas. Everybody send good vibes my way, please! " 418 | "I have a great fun day with my best friend!! Watch pride and prejudice, eat mcdonalds and taking funny videos " 419 | "i just woke up and im ready to eat my McDonalds " 420 | "I know am the owner of the ultimate iPod thanks to @grantcrusor... My music collection is quite serious. That is all. Good night folks " 421 | "i look forward to seeing mr. mcdonalds on monday " 422 | "I LOVE @Health4UandPets u guys r the best!! " 423 | "i love miley cyrus and taylor swift...theyre music always makes me feel better " 424 | "i love my grandma " 425 | "I love my new tiny cute little iPod! Thank you @Santino_gq! Xoxoxox!!! " 426 | "I loved today " 427 | "i think all musicians should release instrumental versions of their previous albums, like taylor swift did. i love instrumental versions! " 428 | "i think it's awesome that @mishacollins has more votes than @oprah on the tweeterwall. keep up the work! " 429 | "I want to go straight to South of Brazil,, I heard that the average temperature is 5º Celsius... " 430 | "I want to meet Kevin Jonas in person. and Coco Martin. and Taylor Swift. " 431 | "I was watching Pride and Prejudice, one of the best movies ever, I love this movie! 
and the book too " 432 | "I wonder what jon thinks when he see's all his tweets, i picture him and jordan busting a nut laughing at us. oh yeah.. jordan..." 433 | "I'm getting stuff ready for my super awesome day at Sydney and meeting with John Green. SOO EXCITED!!" 434 | "I'm going to prepare a tea... and then, try to have a nice conversation with a lovely person from England " 435 | "I'm looking some videos of McFly in Brazil, always awsome " 436 | "i'm really happy. i get to see taylor swift and i got my TOMS shoes today " 437 | "I'm so Taylor Swift and Katy Perry. Love them. " 438 | "i'm thinking i'm blessed that ive a cmputr " 439 | "I've got a new toy and have joined iphonedom I love it!" 440 | "im eating a chocolate crakle. i bet you all are jelous. its a mini party " 441 | "im glad you had a good time i wntd to do something nice for u and we did! what r u up to?" 442 | "im going to danniis party nd im gona dress up as a giant duck yaay!!" 443 | "IN LONDON Camp site with my Iphone heheheheeh in a good mood " 444 | "In sunny Sunderland for a wedding. Out tonight, England match tomorrow and wedding itself Sunday. Wedding in haunted house " 445 | "in watched hollywood.tv videos on youtube i love the way alison says " 446 | "inspiration is 98% hard work , 1% luck, and 1% youtube. " 447 | "instead i told my mom to go to mcdonalds hahaha" 448 | "Is currently downloading the taylor swift album lol" 449 | "is going to picnic it up then watch Game 2 of the Lakers-Magic series Go Lakers!" 450 | "is home. Hmmm how lovely to be in 85 degree weather. " 451 | "is humming " 452 | "Is in starbucks with her mummy " 453 | "is interested to see what #Sony plans for their #E3 conference: they might get a new PSP customer out of this is in me if it's good " 454 | "is loving the new Taylor Swift song.. and has renewed love for Last Kiss by Pearl Jam.. " 455 | "Is lying in bed with a babe " 456 | "is watchin Taylor Swift-The Fearless Tour on NBC!!!!!!!!!!!!!!!!!! 
" 457 | "It's all over and England are the winners with 4 goals to 0. Roll on Wednesday when I can actually watch on terrestrial TV " 458 | "It's Amazing Lakers mission Completed " 459 | "its amazing how a starbucks caramel frappucino can relieve stress..itz seriously an exordianry drink..am feelin much the better now! " 460 | "Jasmine Tea, New iPod, Girls Next Door, Anchorman, dinner in the oven, clean teeth, puppy asleep next to me, the weekend. Oh so good! " 461 | "Jus beat #Barcelona FC 4 - Nil on #PSP... Damn it feels good " 462 | "Just ayden and i hanging @ mcdonalds, having breakfast " 463 | "Just built a shelf for my xbox http://twitpic.com/6idya" 464 | "Just downloaded taylor swift because bauer has got me hooked " 465 | "Just got back from market-market, bought the 8GB memory card for the psp, already. " 466 | "just got home from soccer. Mcdonalds is sooo good " 467 | "just playing psp .. " 468 | "Just saw Sunshine Cleaning. I love Amy Adams " 469 | "just watched Oprah and learned all about " 470 | "Karaok� @ Figa on Mondays feels like a Gossip Girl episode! lots of fun. XOXO " 471 | "kevin ruddy and obama are following me !! haha i fell loved " 472 | "L.A. vs. Orlando...I have faith in L.A. Lakers. I'm a true fan! Go Lakers!!! " 473 | "Lakers and Orlando on NBA Finals! Woohoo! Sorry guys.. but your Nuggets " 474 | "lakers are going to the finals " 475 | "Lakers are going to the finals baby! I heart my boyfriend " 476 | "LAKERS FTW BABYY!! my boys did work tonight!!! " 477 | "Lakers up 13 at half ! Looking good !! " 478 | "Lakers win...Kobe is such a beast I swear....Lebron is still the GOAT ....Magic played a good game though bad calls saved the Lakers ass" 479 | "Lakers! Going to the finals!! Weee! " 480 | "Lakers=WorldCHhampions!!! Wooo!! Dangg The " 481 | "lazy river, hot tub and fire were so relaxing. Magic are kicking the Cavs' butt! It's going to be a fantastic night " 482 | "Lebron have you seen my 3 Championship rings! 
LOL " 483 | "Left home this morning without my iPhone. Spent an ENTIRE day outdoors without cell, Internet, texts, etc. It was AWESOME! " 484 | "Let's Go Lakers! Kick em in the Nuggs! haha." 485 | "liked seeing President Obama visit 5 Guys " 486 | "Listened to Joanna Wang singing Lets Start from Here on YouTube. Beautiful song, beautiful voice! " 487 | "Listening to Ana Free =D love you Ana. From brazil" 488 | "Listening to Beatles songs on Pandora so get hyped up for Beatles Rock Band #xboxE3" 489 | "listening to dashboard " 490 | "Listening to love story by taylor swift in the car and singing along " 491 | "LOL LOL if you're looking for good laugh, check out the homie @astronautKI facebook page " 492 | "love all the drunk pics on facebook of me " 493 | "Love how my sister thinks she's Miley Cyrus or Taylor Swift " 494 | "Love that the Obamas are bringing back date night. Men, take note http://bit.ly/ebPBZ" 495 | "loves chocolate milk and that is GF YEAH.." 496 | "loving life... and loving you " 497 | "Lunch date with @Londonmitch to go to Leathenhall Market's cheese shop was really nice " 498 | "Made my evening: Starbucks barista complimented me on my hamsa scarf. When he heard I designed it and sell them on etsy he was " 499 | "Matt came to visit! canton with him for dinner then prolly stopping somewhere for drinks and the cavs " 500 | "May be going to London next week " 501 | "me and rosa are going 2 start our own youtube channel! i'm super excited that we get 2 share our RaNdOmNeSs with the world!!!! " 502 | "Modern warfare 2 gameplay looks goood #xboxe3" 503 | "Momz just made it back from Vegas, yayyyyy! " 504 | "Monsters Vs Aliens in 3D was fantastic ... Ginormica is my new favourite superhero " 505 | "More songs on my iPod. Love it! " 506 | "Morning everyone! What a beautiful Day...Yay! " 507 | "Morning Tweetland, a long day ahead! Hope everyone has a great day " 508 | "My hair is blue. 
" 509 | "My Lakers won, now its time to see Magic win against the Celts " 510 | "my version of coffee is a large sprite from mcdonalds " 511 | "n i thought i was addicted to facebook. this is proof that i like to hear myself talk " 512 | "new music videos today in my YouTube Channel. thanks and please subscribe. lol. http://bit.ly/17NsuD" 513 | "new video on youtube YAYY its shuffling though~ not singing xDD a video for that is comin right up " 514 | "Nice run last night in the rain. Nobody about just me and some twittering birds!! Spinning tonight #triathlon training " 515 | "nice song by Taylor Swift http://bit.ly/hcsm7" 516 | "Nice, my contract was extended for another month " 517 | "Not long till LONDON BABY!!! and then the EMIRATES ON SUNDAY!!!! YEAH!! cant wait... " 518 | "OBAMA SPECIAL ON NBC! Whooo, hooray Brian Williams! " 519 | "Obama was awesome at my graduation tonight! What a hunk " 520 | "of course he makes me feel better " 521 | "Off to London today to film with the gorgeous Davina McCall " 522 | "off to shipmates to watch the laker game lets go lakers!!!" 523 | "Oh man! I just got an idea while filming in the car! i need to talk to my fellow youtuber @sezrules about this!" 524 | "OK all... Off to sleep on my magnetic mattress. http://bit.ly/hPNrI G'night all! Sweet Dreams. " 525 | "OK bedtime for me... We made it through another Monday!!! YAY!!! Night Tweets! *poof*" 526 | "ok... headed to bed. tomorrow I open shop for freddy and eddy " 527 | "omg! well..here is the best news of all time in Brazil with McFly: http://bit.ly/zp3If Look at this boys! " 528 | "On the bus and ready to go about to watch twilight on Meredith's ipod." 
529 | "on the train on the way into london to then get another train to see his bestest friend " 530 | "On the way to vegas looking all fierce " 531 | "on youtube trying to find some good videos to see " 532 | "ouuu cant wait for PSP Go i hope kingdom hearts is made for that instead of normal psp cause this is so much better" 533 | "pierce surprised me w/ 2 koi fish today! he's the best!!! did i mention he's remodeling the whole apt?i'm so grateful for him in my life!" 534 | "playing and singing some taylor swift on piano. she writes realli beautiful songs " 535 | "playing with my new toys " 536 | "President obama will be arraive to ksa riyadh tomorrow....welcome there we love you so much " 537 | "psyched about my starbucks card. thanks lynn! " 538 | "Raining Thursday in Brazil!!! And My mom's Birthay... That woman is a warrior... Have a great day everyone!" 539 | "Red Devils champion of England for the 3rd time in a row. Next stop CL title " 540 | "road trip with the boy. so far we have discovered an Obama cafe. haha" 541 | "rolling down the windows, listening to music, " 542 | "Rye143gg (1:58:00 AM): Lol. Ur really pretty....so pls don't pull a britney " 543 | "Sawzy is helping me pack for Vegas " 544 | "Seattle is sunny atm. Need to go take advantage of it before it rains again. I'm outta here for a bit.. " 545 | "Shopping at paperdoll and chilling with @joannasaw " 546 | "SissyDawnie: @CokieTheCat - Marvelous on 10 cancer-free years!!!!! YAY!! *** Thanks! Yeah. We're all pretty psyched about that! " 547 | "skateboarder snowboarder breakdancer surfer NASA person then ill be famous" 548 | "So they pulled it off? Congrats to the Cavs. I know Usher is happy. " 549 | "sooo excited....i just bought tickets to Taylor Swift. ahhhhhh " 550 | "Sounds good to me - a national beer day : http://tinyurl.com/cspspj Now, can we do this over here? " 551 | "spent the evening outside in the beautiful warm weather. 
finally feels like spring " 552 | "starbucks " 553 | "starbucks to the desk, this is the life " 554 | "starbucks with the soon-to-be bride and her sister " 555 | "Still recovering from a longg morning yesterday at the Today Show with Taylor Swift!! SO FUNN " 556 | "Still up. Playin cards with the girls. " 557 | "taylor swift was " 558 | "Taylor Swift's songs just officially completed my music life. (: Well, not really. I just really love all of her songs. " 559 | "taylor swift's songs make me happy, i don't know why haha " 560 | "Thank gosh for whom ever invented nasal wash! " 561 | "Thank you for #followfriday shout outs @RecipeFeast @freshfromFL @SeattleTallPopp @calm_compassion @CherylDLee and @bacieabbracci " 562 | "Thank you, Afrin Nasal Spray! Also, I got a giant teacup tonight! " 563 | "Thanks to my new followers! " 564 | "That's the Birthday Boy's iPod Touch and DSi set-up and sorted. Just to think I would have been happy with a ball + boots at that age! " 565 | "The Best Day - Taylor Swift " 566 | "THE LAKERS ARE SO GOING TO WIN AND ADVANCE TO THE FINALS " 567 | "The Lakers got it.. hahaah this is why i love LA" 568 | "thinks the moon looks fantabulous tonight. I'm headed to mcdonalds with my Mom. " 569 | "This is awesome, England! Keep it up! " 570 | "This latest episode of NewNowNext is like full of awesome! " 571 | "Those who picked the Cavs are so awwwwfully quiet now! Nananana....nananana....hey hey....goodbye!" 572 | "Time to break out Howard's puppet! Bye bye LeBron! " 573 | "Today is National Cancer Survivors Day!!! Congratulations today is a celebration!! woohoo " 574 | "Today was so great. My roommate and I played some soccer and I also got a nice run in on this sunny day Time for some reading and sleep." 575 | "totally looking forward to my adventure day with @elissaislegit tomorrow Lakers, Zac, the hood, SAYWHHA!?!?! " 576 | "tweeting from my brand new Sony PSP... just because I CAN!! 
" 577 | "Two major blasts from the past peeps today on Twitter and Facebook. Fun. " 578 | "Uploading pics from Sunday night out - it was a good night, lots of fun, and I got to know my boyfriend's sister better " 579 | "using my new app p-twit for psp and i love it! snitter and p-twit are the best! go and try it yourself.. " 580 | "w00t w00t! momz talks of possible goin' back to texas de brazil 4 another multi-celebration some time soon " 581 | "Waiting for Game 2 To Start! GO LAKERS " 582 | "Waking up in Vegas - Katy Perry i like this song " 583 | "waking up with my nespresso " 584 | "wants the new psp. It's SOOOOO COOL!! " 585 | "was sitting next to the " 586 | "Watched all 3 hours of " 587 | "watchin oprah, then workout time!!...need to get my butt back into my workout routine! " 588 | "watchin the cavs game with friends " 589 | "watching funny youtube vidoes " 590 | "Watching Oprah that I had taped from earlier! It's was about Twitter today! " 591 | "watching the taylor swift special on dateline NBC " 592 | "watching what happens in vegas and lovin it " 593 | "watching you belong to mee video --taylor swift is amazing, love her...my new idol lol " 594 | "Wedding dress shopping at a swanky boutique in Seattle with my best friend....never in a million years could I have imagined this day " 595 | "Went to Calgary with some of the lads (Gaspar had to sit in the boot ) to watch the Flames versus LA Kings. Calgary won 4-1. Nice night." 596 | "What an awesome day Boy am I grateful " 597 | "What I love from the White House special: Barack's favorite chocolates- now as gifts to visitors: Fran's from Seattle!" 598 | "What you know about them Lakers! Haha. . .yeaaaaah baby. " 599 | "Whirlpool Galaxy Deep Field : http://apod.nasa.gov/apod/ap090526.html what an amazing universe " 600 | "Whoo lakers won!! Now I can breathe haha " 601 | "Woo CAVS. Happy Mother's Day! " 602 | "Woohoo! Todays a good day.. Going to get my new ipod! 
" 603 | "Wow Announcement of new Halo game from Bungie HALO: Reach - this rumor was true too, scheduled for 2010 #xboxe3" 604 | "Wow, another sunny day in London! Maybe it really has changed its mind about being a miserable sod all the time " 605 | "Writin' my 4 pages of notes for my Media Exam this Afternoon, while listenin' to Keith Urban, Taylor Swift n' Dierks Bentley " 606 | "yay verizon might get an iphone " 607 | "Yayy! Just got my ticket for the What Happens In Vegas Tour! " 608 | "yeah guys i will totally bring dublin to the party. he would love the company. " 609 | "Yeah Lakers!!!!!! I am so stinkin' excited!!! Now the cavs need to shut the magic out for the best finals possible!! " 610 | "You belong with me - Taylor Swift " 611 | "You belong with me by Taylor Swift. How come I don't know about this awesome song? Slacking off now." 612 | "You Belong With Me-Taylor Swift ? " 613 | "You Belong With Me"4";"YAY!!!! LAKERS WIN " 614 | "you guys should check out fred on youtube hes our favorite youtube star " 615 | "you know twitter is going places when oprah starts tweeting. oh and larry king! " 616 | "Youtube and Facebook ftw! " 617 | "Youtube comedy people are just amazing. 
" -------------------------------------------------------------------------------- /data/sunnyData.csv: -------------------------------------------------------------------------------- 1 | Text,IsSunny 2 | sunny,1 3 | rainy,-1 4 | sunny sunny,1 5 | sunny rainy,-1 6 | rainy sunny,-1 7 | rainy rainy,-1 8 | sunny sunny sunny,1 9 | sunny rainy sunny,1 10 | sunny sunny rainy,1 11 | rainy sunny sunny,1 12 | rainy rainy sunny,-1 -------------------------------------------------------------------------------- /data/weather.csv: -------------------------------------------------------------------------------- 1 | Weather,Play 2 | Sunny,No 3 | Overcast,Yes 4 | Rainy,Yes 5 | Sunny,Yes 6 | Sunny,Yes 7 | Overcast,Yes 8 | Rainy,No 9 | Rainy,No 10 | Sunny,Yes 11 | Rainy,Yes 12 | Sunny,No 13 | Overcast,Yes 14 | Overcast,Yes 15 | Rainy,No 16 | -------------------------------------------------------------------------------- /ml_bayes.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Setup\n", 8 | "\n", 9 | "This guide was written in Python 3.6.\n", 10 | "\n", 11 | "### Python and Pip\n", 12 | "\n", 13 | "If you haven't already, please download [Python](https://www.python.org/downloads/) and [Pip](https://pip.pypa.io/en/stable/installing/).\n", 14 | "\n", 15 | "Let's install the modules we'll need for this tutorial. Open up your terminal and enter the following commands to install the needed python modules: \n", 16 | "\n", 17 | "```\n", 18 | "pip3 install scikit-learn==0.18.1\n", 19 | "pip3 install nltk==3.2.4\n", 20 | "```" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "## Introduction\n", 28 | "\n", 29 | "In this tutorial set, we'll review the Naive Bayes Algorithm used in the field of machine learning. 
Naive Bayes is based on Bayes Theorem and predicts the class of a given data point; it is typically very fast compared to other classification algorithms. \n", 30 | "\n", 31 | "Because it assumes independence among predictors, the Naive Bayes model is easy to build and particularly useful for large datasets. Despite its simplicity, Naive Bayes can sometimes outperform far more sophisticated classification methods.\n", 32 | "\n", 33 | "This tutorial assumes you have prior programming experience in Python and a background in probability. While I will review some of the principles of probability, this tutorial is **not** intended to teach you these fundamental concepts. If you need some background on this material, please see my tutorial [here](https://github.com/lesley2958/intro-stats).\n", 34 | "\n", 35 | "\n", 36 | "### Bayes Theorem\n", 37 | "\n", 38 | "Recall Bayes Theorem, which provides a way of calculating the *posterior probability*: \n", 39 | "\n", 40 | "$$ P(A \\mid B) = \\frac{P(B \\mid A) \\, P(A)}{P(B)} $$" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "Before we go into more specifics of the Naive Bayes Algorithm, we'll work through a classification example: determining whether a sports team will play based on the weather. Specifically, we'll classify whether or not a team will play if it is sunny.\n", 48 | "\n", 49 | "To start, we'll load in the data, which you can find [here](https://github.com/lesley2958/ml-bayes/blob/master/data/weather.csv)." 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 1, 55 | "metadata": { 56 | "collapsed": true 57 | }, 58 | "outputs": [], 59 | "source": [ 60 | "import pandas as pd\n", 61 | "f1 = pd.read_csv(\"./data/weather.csv\")" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "Before we go any further, let's take a look at the dataset we're working with. 
It consists of 2 columns (excluding the index), *Weather* and *Play*. Each entry in the *Weather* column is one of three weather categories: `Sunny`, `Overcast`, or `Rainy`. The *Play* column is a binary value of `Yes` or `No` and indicates whether or not the sports team played that day." 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": 2, 74 | "metadata": {}, 75 | "outputs": [ 76 | { 77 | "data": { 78 | "text/html": [ 79 | "
\n", 80 | "\n", 93 | "\n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | "
WeatherPlay
0SunnyNo
1OvercastYes
2RainyYes
\n", 119 | "
" 120 | ], 121 | "text/plain": [ 122 | " Weather Play\n", 123 | "0 Sunny No\n", 124 | "1 Overcast Yes\n", 125 | "2 Rainy Yes" 126 | ] 127 | }, 128 | "execution_count": 2, 129 | "metadata": {}, 130 | "output_type": "execute_result" 131 | } 132 | ], 133 | "source": [ 134 | "f1.head(3)" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "#### Frequency Table\n", 142 | "\n", 143 | "If you recall from probability theory, frequencies are an important step toward calculating the probability of a given class. In this section of the tutorial, we'll convert the dataset into frequency tables using the `groupby()` function. First, we retrieve the frequencies of each combination of the weather and play columns: " 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 3, 149 | "metadata": {}, 150 | "outputs": [ 151 | { 152 | "name": "stdout", 153 | "output_type": "stream", 154 | "text": [ 155 | "Weather Play\n", 156 | "Overcast Yes 4\n", 157 | "Rainy No 3\n", 158 | " Yes 2\n", 159 | "Sunny No 2\n", 160 | " Yes 3\n", 161 | "dtype: int64\n" 162 | ] 163 | } 164 | ], 165 | "source": [ 166 | "df = f1.groupby(['Weather','Play']).size()\n", 167 | "print(df)" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "It will also come in handy to split the frequencies by weather and by yes/no. 
Let's start with the three weather frequencies:" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": 4, 180 | "metadata": {}, 181 | "outputs": [ 182 | { 183 | "name": "stdout", 184 | "output_type": "stream", 185 | "text": [ 186 | " Play\n", 187 | "Weather \n", 188 | "Overcast 4\n", 189 | "Rainy 5\n", 190 | "Sunny 5\n" 191 | ] 192 | } 193 | ], 194 | "source": [ 195 | "df2 = f1.groupby('Weather').count()\n", 196 | "print(df2)" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "And now for the frequencies of yes and no:" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": 5, 209 | "metadata": {}, 210 | "outputs": [ 211 | { 212 | "name": "stdout", 213 | "output_type": "stream", 214 | "text": [ 215 | " Weather\n", 216 | "Play \n", 217 | "No 5\n", 218 | "Yes 9\n" 219 | ] 220 | } 221 | ], 222 | "source": [ 223 | "df1 = f1.groupby('Play').count()\n", 224 | "print(df1)" 225 | ] 226 | }, 227 | { 228 | "cell_type": "markdown", 229 | "metadata": {}, 230 | "source": [ 231 | "#### Likelihood Table\n", 232 | "\n", 233 | "The frequencies of each class are important in calculating the likelihood, or the probability that a certain class will occur. Using the frequency tables we just created, we'll find the likelihoods of each weather condition and of yes/no. 
We'll accomplish this by adding a new column that divides the frequency column by the total number of observations:" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": 6, 239 | "metadata": {}, 240 | "outputs": [ 241 | { 242 | "name": "stdout", 243 | "output_type": "stream", 244 | "text": [ 245 | " Weather Likelihood\n", 246 | "Play \n", 247 | "No 5 0.357143\n", 248 | "Yes 9 0.642857\n", 249 | " Play Likelihood\n", 250 | "Weather \n", 251 | "Overcast 4 0.285714\n", 252 | "Rainy 5 0.357143\n", 253 | "Sunny 5 0.357143\n" 254 | ] 255 | } 256 | ], 257 | "source": [ 258 | "df1['Likelihood'] = df1['Weather']/len(f1)\n", 259 | "df2['Likelihood'] = df2['Play']/len(f1)\n", 260 | "print(df1)\n", 261 | "print(df2)" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "Now, we're able to use the Naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the predicted outcome.\n" 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "#### Calculation\n", 276 | "\n", 277 | "Now, let's get back to our question: *Will the team play if the weather is sunny?*\n", 278 | "\n", 279 | "From this question, we can set up Bayes Theorem. Because the *known* factor is that it is sunny, $P(A \\mid B)$ becomes $P(Yes \\mid Sunny)$. From there, it's just a matter of plugging in probabilities. 
\n", 280 | "\n", 281 | "$$ P(Yes \\mid Sunny) = \\frac{P(Sunny \\mid Yes) \\, P(Yes)}{P(Sunny)} $$\n", 282 | "\n", 283 | "Since we already created the likelihood tables, we can look up `P(Sunny)` and `P(Yes)` directly in them:" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 8, 289 | "metadata": { 290 | "collapsed": true 291 | }, 292 | "outputs": [], 293 | "source": [ 294 | "ps = df2['Likelihood']['Sunny']\n", 295 | "py = df1['Likelihood']['Yes']" 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": {}, 301 | "source": [ 302 | "That leaves us with $P(Sunny \\mid Yes)$. This is the probability that the weather is sunny given that the team played that day. In `df`, we see that the number of `Yes` days under `Sunny` is 3. We divide this number by the total number of `Yes` days, which we can get from `df1`:" 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": 9, 308 | "metadata": { 309 | "collapsed": true 310 | }, 311 | "outputs": [], 312 | "source": [ 313 | "psy = df['Sunny']['Yes']/df1['Weather']['Yes']" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": {}, 319 | "source": [ 320 | "And finally, we can plug these variables into Bayes Theorem: " 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": 10, 326 | "metadata": {}, 327 | "outputs": [ 328 | { 329 | "name": "stdout", 330 | "output_type": "stream", 331 | "text": [ 332 | "0.6\n" 333 | ] 334 | } 335 | ], 336 | "source": [ 337 | "p = (psy*py)/ps\n", 338 | "print(p)" 339 | ] 340 | }, 341 | { 342 | "cell_type": "markdown", 343 | "metadata": {}, 344 | "source": [ 345 | "This tells us that $P(Yes \\mid Sunny) = \\frac{(3/9)(9/14)}{5/14} = 0.6$, i.e., there's a 60% probability of the team playing if it's sunny. Because this is a binary classification of yes or no, a posterior probability greater than 50% means we predict that the team *will* play. 
" 346 | ] 347 | }, 348 | { 349 | "cell_type": "markdown", 350 | "metadata": {}, 351 | "source": [ 352 | "### Naive Bayes Evaluation\n", 353 | "\n", 354 | "Every classifier has pros and cons, whether in terms of computational cost, accuracy, or something else. In this section, we'll review the pros and cons of Naive Bayes.\n", 355 | "\n", 356 | "#### Pros\n", 357 | "\n", 358 | "Naive Bayes is easy to implement and fast at predicting the class of test data. It also performs well in multi-class prediction.\n", 359 | "\n", 360 | "When the assumption of independence holds, the Naive Bayes classifier can perform better than other models such as logistic regression, and it needs less training data to do so.\n", 361 | "\n", 362 | "Naive Bayes also tends to perform better with categorical input variables than with numerical ones, which is one reason it works well for text classification. For numerical variables, a distribution must be assumed, most commonly the normal distribution." 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": {}, 368 | "source": [ 369 | "#### Cons\n", 370 | "\n", 371 | "If a categorical variable has a category in the test data that was not observed in the training data set, the model will assign it a probability of 0 and will be unable to make a meaningful prediction. This is referred to as “Zero Frequency”. To solve this, we can use a smoothing technique such as Laplace estimation.\n", 372 | "\n", 373 | "Another limitation of Naive Bayes is the assumption of independent predictors. In practice, it is rare to find a set of predictors that are completely independent.\n" 374 | ] 375 | }, 376 | { 377 | "cell_type": "markdown", 378 | "metadata": {}, 379 | "source": [ 380 | "## Naive Bayes Types\n", 381 | "\n", 382 | "With `scikit-learn`, we can implement Naive Bayes models in Python. 
We'll review three common types of Naive Bayes models in the following sections: Gaussian, Multinomial, and Bernoulli.\n", 383 | "\n", 384 | "### Gaussian\n", 385 | "\n", 386 | "The Gaussian Naive Bayes model is used for classification and assumes that, within each class, the features follow a normal distribution. \n", 387 | "\n", 388 | "We begin an example by importing the needed modules:" 389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": 11, 394 | "metadata": { 395 | "collapsed": true 396 | }, 397 | "outputs": [], 398 | "source": [ 399 | "from sklearn.naive_bayes import GaussianNB\n", 400 | "import numpy as np" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": {}, 406 | "source": [ 407 | "As always, we need predictor and target variables, so we assign those:" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 12, 413 | "metadata": { 414 | "collapsed": true 415 | }, 416 | "outputs": [], 417 | "source": [ 418 | "x = np.array([[-3,7],[1,5], [1,2], [-2,0], [2,3], [-4,0], [-1,1], [1,1], [-2,2], [2,7], [-4,1], [-2,7]])\n", 419 | "y = np.array([3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 4, 4])" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": {}, 425 | "source": [ 426 | "Now we can initialize the Gaussian classifier:\n" 427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": 14, 432 | "metadata": { 433 | "collapsed": true 434 | }, 435 | "outputs": [], 436 | "source": [ 437 | "model = GaussianNB()" 438 | ] 439 | }, 440 | { 441 | "cell_type": "markdown", 442 | "metadata": {}, 443 | "source": [ 444 | "Now we can train the model on the training data:\n" 445 | ] 446 | }, 447 | { 448 | "cell_type": "code", 449 | "execution_count": 15, 450 | "metadata": {}, 451 | "outputs": [ 452 | { 453 | "data": { 454 | "text/plain": [ 455 | "GaussianNB(priors=None)" 456 | ] 457 | }, 458 | "execution_count": 15, 459 | "metadata": {}, 460 | "output_type": "execute_result" 461 | } 462 | ], 463 | "source": [ 464 |
"model.fit(x, y)" 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "execution_count": 17, 470 | "metadata": {}, 471 | "outputs": [ 472 | { 473 | "name": "stdout", 474 | "output_type": "stream", 475 | "text": [ 476 | "[3 4]\n" 477 | ] 478 | } 479 | ], 480 | "source": [ 481 | "predicted = model.predict([[1,2],[3,4]])\n", 482 | "print(predicted)" 483 | ] 484 | }, 485 | { 486 | "cell_type": "markdown", 487 | "metadata": {}, 488 | "source": [ 489 | "### Multinomial\n", 490 | "\n", 491 | "MultinomialNB implements the multinomial Naive Bayes algorithm and is one of the two classic Naive Bayes variants used in text classification. This classifier is suitable for classification with discrete features (such as word counts for text classification). \n", 492 | "\n", 493 | "The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts may also work.\n", 494 | "\n", 495 | "First, we need some data, so we import numpy and \n" 496 | ] 497 | }, 498 | { 499 | "cell_type": "code", 500 | "execution_count": 18, 501 | "metadata": { 502 | "collapsed": true 503 | }, 504 | "outputs": [], 505 | "source": [ 506 | "import numpy as np\n", 507 | "x = np.random.randint(5, size=(6, 100))\n", 508 | "y = np.array([1, 2, 3, 4, 5, 6])" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "metadata": {}, 514 | "source": [ 515 | "Now let's build the Multinomial Naive Bayes model: \n" 516 | ] 517 | }, 518 | { 519 | "cell_type": "code", 520 | "execution_count": 19, 521 | "metadata": {}, 522 | "outputs": [ 523 | { 524 | "data": { 525 | "text/plain": [ 526 | "MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)" 527 | ] 528 | }, 529 | "execution_count": 19, 530 | "metadata": {}, 531 | "output_type": "execute_result" 532 | } 533 | ], 534 | "source": [ 535 | "from sklearn.naive_bayes import MultinomialNB\n", 536 | "clf = MultinomialNB()\n", 537 | "clf.fit(x, y)" 538 | ] 539 | }, 540 | { 541 | "cell_type": "markdown", 542 | "metadata": 
{}, 543 | "source": [ 544 | "Let's try an example:\n" 545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": 20, 550 | "metadata": {}, 551 | "outputs": [ 552 | { 553 | "name": "stdout", 554 | "output_type": "stream", 555 | "text": [ 556 | "[3]\n" 557 | ] 558 | } 559 | ], 560 | "source": [ 561 | "print(clf.predict(x[2:3]))" 562 | ] 563 | }, 564 | { 565 | "cell_type": "markdown", 566 | "metadata": {}, 567 | "source": [ 568 | "### Bernoulli\n", 569 | "\n", 570 | "Like MultinomialNB, this classifier is suitable for discrete data. BernoulliNB implements the Naive Bayes training and classification algorithms for data that is distributed according to multivariate Bernoulli distributions, meaning there may be multiple features but each one is assumed to be a binary value. \n", 571 | "\n", 572 | "The decision rule for Bernoulli Naive Bayes is based on\n", 573 | "\n", 574 | "![alt text](https://github.com/lesley2958/ml-bayes/blob/master/bernoulli.png?raw=true \"Logo Title Text 1\")" 575 | ] 576 | }, 577 | { 578 | "cell_type": "code", 579 | "execution_count": 21, 580 | "metadata": { 581 | "collapsed": true 582 | }, 583 | "outputs": [], 584 | "source": [ 585 | "import numpy as np\n", 586 | "x = np.random.randint(2, size=(6, 100))\n", 587 | "y = np.array([1, 2, 3, 4, 4, 5])" 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "execution_count": 22, 593 | "metadata": {}, 594 | "outputs": [ 595 | { 596 | "name": "stdout", 597 | "output_type": "stream", 598 | "text": [ 599 | "[3]\n" 600 | ] 601 | } 602 | ], 603 | "source": [ 604 | "from sklearn.naive_bayes import BernoulliNB\n", 605 | "clf = BernoulliNB()\n", 606 | "clf.fit(x, y)\n", 607 | "print(clf.predict(x[2:3]))" 608 | ] 609 | }, 610 | { 611 | "cell_type": "markdown", 612 | "metadata": {}, 613 | "source": [ 614 | "### Tips for Improvement\n", 615 | "\n", 616 | "If continuous features don't have a normal distribution (recall the assumption of normal distribution), you can use different methods to 
convert them to an approximately normal distribution.\n", 617 | "\n", 618 | "As we mentioned before, if the test data set has a zero frequency issue, you can apply a smoothing technique such as “Laplace Correction” to predict the class.\n", 619 | "\n", 620 | "You can also remove highly correlated features, since correlated features are effectively counted twice by the model, which can over-inflate their importance." 621 | ] 622 | }, 623 | { 624 | "cell_type": "markdown", 625 | "metadata": {}, 626 | "source": [ 627 | "## Joint Models\n", 628 | "\n", 629 | "Suppose you have input data `x` and want to classify it into labels `y`. A generative model learns the joint probability distribution `p(x,y)`, while a discriminative model learns the conditional probability distribution `p(y|x)`.\n", 630 | "\n", 631 | "Here's a simple example with four `(x, y)` data points:\n", 632 | "\n", 633 | "```\n", 634 | "(1,0), (1,0), (2,0), (2, 1)\n", 635 | "```\n", 636 | "\n", 637 | "`p(x,y)` is\n", 638 | "\n", 639 | "```\n", 640 | " y=0 y=1\n", 641 | " -----------\n", 642 | "x=1 | 1/2 0\n", 643 | "x=2 | 1/4 1/4\n", 644 | "```\n", 645 | "\n", 646 | "Meanwhile,\n", 647 | "\n", 648 | "```\n", 649 | "p(y|x) is\n", 650 | "```\n", 651 | "\n", 652 | "```\n", 653 | " y=0 y=1\n", 654 | " -----------\n", 655 | "x=1 | 1 0\n", 656 | "x=2 | 1/2 1/2\n", 657 | "```\n", 658 | "\n", 659 | "Notice that if you add all 4 probabilities in the first chart, they add up to 1, but if you do the same for the second chart, they add up to 2. This is because each row of the second chart is a separate distribution over `y` for a fixed `x`, so each row sums to 1: `1+0=1` in the first row and `1/2+1/2 = 1` in the second.\n", 660 | "\n", 661 | "The distribution `p(y|x)` is the natural distribution for classifying a given example `x` into a class `y`, which is why algorithms that model this directly are called discriminative algorithms. \n", 662 | "\n", 663 | "Generative algorithms model `p(x, y)`, which can be transformed into `p(y|x)` by applying Bayes' rule and then used for classification. 
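\n",
"\n",
"For example, with the joint table above, the marginal for `x=2` is `p(x=2) = 1/4 + 1/4 = 1/2`. Dividing a joint probability by this marginal gives the matching entry of the conditional table:\n",
"\n",
"$$ p(y=0 \\mid x=2) = \\frac{p(x=2,\\, y=0)}{p(x=2)} = \\frac{1/4}{1/2} = \\frac{1}{2} $$\n",
"\n",
"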
However, the distribution `p(x, y)` can also be used for other purposes. For example, you could use \n", 664 | "`p(x,y)` to generate likely `(x, y)` pairs." 665 | ] 666 | }, 667 | { 668 | "cell_type": "code", 669 | "execution_count": null, 670 | "metadata": { 671 | "collapsed": true 672 | }, 673 | "outputs": [], 674 | "source": [] 675 | } 676 | ], 677 | "metadata": { 678 | "kernelspec": { 679 | "display_name": "Python 3", 680 | "language": "python", 681 | "name": "python3" 682 | }, 683 | "language_info": { 684 | "codemirror_mode": { 685 | "name": "ipython", 686 | "version": 3 687 | }, 688 | "file_extension": ".py", 689 | "mimetype": "text/x-python", 690 | "name": "python", 691 | "nbconvert_exporter": "python", 692 | "pygments_lexer": "ipython3", 693 | "version": "3.6.1" 694 | } 695 | }, 696 | "nbformat": 4, 697 | "nbformat_minor": 2 698 | } 699 | -------------------------------------------------------------------------------- /ml_classification.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Introduction\n", 8 | "\n", 9 | "Classification is a type of machine learning task whose output is one of a set of discrete labels. This is in contrast to regression, which involves a target variable on a continuous range of values.\n", 10 | "\n", 11 | "Throughout this tutorial we'll review different classification algorithms, but you'll notice that the workflow is consistent: split the dataset into training and testing datasets, fit the model on the training set, and classify each data point in the testing set. " 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## LDA vs QDA\n", 19 | "\n", 20 | "Discriminant Analysis is a statistical technique used to classify data into groups based on each data point's features. 
" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "### Linear Discriminant Analysis\n", 28 | "\n", 29 | "Linear Discriminant Analysis (LDA) is a linear classification algorithm used for when it can be assumed that the covariance is the same for all classes. It's mostly used as a dimension reduction technique in the pre-processing portion so that the different classes are separable and therefore easier to classify and avoid overfitting. \n", 30 | "\n", 31 | "LDA is similar to Principal Component Analysis (PCA), but LDA also considers the axes that maximize the separation between multiple classes. It works by finding the linear combinations of the original variables that gives the best separation between the groups in the data set. " 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "### Quadratic Discriminant Analysis\n", 39 | "\n", 40 | "Quadratic Discriminant Analysis, on the other hand, is used for heterogeneous variance-covariance matrices. Because QDA has more parameters to estimate, it's typically less accurate than LDA." 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "## Support Vector Machines\n", 48 | "\n", 49 | "Support Vector Machines (SVMs) is a machine learning algorithm used for classification tasks. Its primary goal is to find an optimal separating hyperplane that separates the data into its classes. This optimal separating hyperplane is the result of the maximization of the margin of the training data.\n", 50 | "\n", 51 | "### Separating Hyperplane\n", 52 | "\n", 53 | "Let's take a look at the scatterplot of the iris dataset that could easily be used by a SVM: \n", 54 | "\n", 55 | "![alt text](https://github.com/lesley2958/regression/blob/master/log-scatter.png?raw=true \"Logo Title Text 1\")\n", 56 | "\n", 57 | "Just by looking at it, it's fairly obvious how the two classes can be easily separated. 
The line which separates the two classes is called the separating hyperplane.\n", 58 | "\n", 59 | "In this two-dimensional example, the hyperplane is just a line, but SVMs can work in any number of dimensions, which is why we refer to it as a hyperplane.\n", 60 | "\n", 61 | "#### Optimal Separating Hyperplane\n", 62 | "\n", 63 | "Going off the scatter plot above, there are many possible separating hyperplanes. The job of the SVM is to find the optimal one. \n", 64 | "\n", 65 | "To accomplish this, we choose the separating hyperplane that maximizes its distance from the closest data points of each class, so that the hyperplane generalizes well.\n", 66 | "\n", 67 | "### Margins\n", 68 | "\n", 69 | "Given a hyperplane, we can compute the distance between the hyperplane and the closest data point. Doubling this distance gives the margin. There are no data points inside the margin. \n", 70 | "\n", 71 | "The larger the margin, the greater the distance between the hyperplane and the nearest data points, which is why we want to maximize the margin. \n", 72 | "\n", 73 | "![alt text](https://github.com/lesley2958/ml-classification/blob/master/margin.png?raw=true \"Logo Title Text 1\")\n", 74 | "\n", 75 | "### Equation\n", 76 | "\n", 77 | "Recall the equation of a hyperplane: $w^T x = 0$. Here, `w` and `x` are vectors. If we combine this equation with `y = ax + b`, we get:\n", 78 | "\n", 79 | "![alt text](https://github.com/lesley2958/ml-classification/blob/master/wt.png?raw=true \"Logo Title Text 1\")\n", 80 | "\n", 81 | "This is because we can rewrite the line as `y - ax - b = 0`. With $w = (-b, -a, 1)$ and $x = (1, x, y)$, this then becomes:\n", 82 | "\n", 83 | "$$ w^T x = -b \\times 1 + (-a) \\times x + 1 \\times y $$\n", 84 | "\n", 85 | "This is just another way of writing $w^T x = y - ax - b$. We use this form instead of the traditional `y = ax + b` because it generalizes more easily to more than 2 dimensions. 
\n", 86 | "\n", 87 | "#### Example\n", 88 | "\n", 89 | "Let's take a look at an example scatter plot with the hyperplane graphed: \n", 90 | "\n", 91 | "![alt text](https://github.com/lesley2958/ml-classification/blob/master/ex1.png?raw=true \"Logo Title Text 1\")\n", 92 | "\n", 93 | "Here, the hyperplane is $x_2 = -2x_1$. Let's turn this into the vector equivalent:\n", 94 | "\n", 95 | "![alt text](https://github.com/lesley2958/ml-classification/blob/master/vectex1.png?raw=true \"Logo Title Text 1\")\n", 96 | "\n", 97 | "Let's calculate the distance between point A and the hyperplane. We begin this process by projecting point A onto the hyperplane.\n", 98 | "\n", 99 | "![alt text](https://github.com/lesley2958/ml-classification/blob/master/projex1.png?raw=true \"Logo Title Text 1\")\n", 100 | "\n", 101 | "Point A is a vector from the origin to A. So if we project it onto the normal vector w: \n", 102 | "\n", 103 | "![alt text](https://github.com/lesley2958/ml-classification/blob/master/normex1.png?raw=true \"Logo Title Text 1\")\n", 104 | "\n", 105 | "This will get us the projected vector! With a = (3,4) for point A and w = (2,1) for the normal vector (from $2x_1 + x_2 = 0$), we can compute `||p||`. It'll take a few steps to get there.\n", 106 | "\n", 107 | "We begin by computing `||w||`: \n", 108 | "\n", 109 | "`||w||` = √(2² + 1²) = √5. If we divide the coordinates of w by `||w||`, we get the unit vector in the direction of w, u = (2/√5, 1/√5).\n", 110 | "\n", 111 | "Now, p is the orthogonal projection of a onto w, so p = (u · a) u, where u · a = (2·3 + 1·4)/√5 = 10/√5:\n", 112 | "\n", 113 | "![alt text](https://github.com/lesley2958/ml-classification/blob/master/orthproj.png?raw=true \"Logo Title Text 1\")\n", 114 | "\n", 115 | "#### Margin Computation\n", 116 | "\n", 117 | "Now that we have p = (4, 2) and `||p||` = √(4² + 2²) = 2√5, the distance between A and the hyperplane, the margin is:\n", 118 | "\n", 119 | "margin = 2||p|| = 4√5. 
This is the margin of the hyperplane!\n" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "## Iris Classification\n", 127 | "\n", 128 | "Let's perform the iris classification from earlier with a support vector machine model, using only the first two features (sepal length and sepal width):" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 1, 134 | "metadata": { 135 | "collapsed": true 136 | }, 137 | "outputs": [], 138 | "source": [ 139 | "import numpy as np\n", 140 | "import matplotlib.pyplot as plt\n", 141 | "from sklearn import svm, datasets\n", 142 | "iris = datasets.load_iris()\n", 143 | "X = iris.data[:, :2] # sepal length and sepal width\n", 144 | "y = iris.target" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "Now we can create an instance of a linear SVM and fit the data. To do this, we need to set the regularization parameter, C. " 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 2, 157 | "metadata": { 158 | "collapsed": true 159 | }, 160 | "outputs": [], 161 | "source": [ 162 | "C = 1.0 # regularization parameter\n", 163 | "svc = svm.SVC(kernel='linear', C=C).fit(X, y)" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "To visualize the classifier's decision regions, we first create a mesh grid that covers the feature space:" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 3, 176 | "metadata": {}, 177 | "outputs": [ 178 | { 179 | "data": { 180 | "text/plain": [ 181 | "" 182 | ] 183 | }, 184 | "execution_count": 3, 185 | "metadata": {}, 186 | "output_type": "execute_result" 187 | } 188 | ], 189 | "source": [ 190 | "x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1\n", 191 | "y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1\n", 192 | "h = (x_max - x_min)/100 # mesh step size\n", 193 | "xx, yy = np.meshgrid(np.arange(x_min, x_max, h),\n", 194 | "np.arange(y_min, y_max, h))\n", 195 | "plt.subplot(1, 1, 1)" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 
| "source": [ 202 | "Now we pull the prediction method on our data:" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 4, 208 | "metadata": { 209 | "collapsed": true 210 | }, 211 | "outputs": [], 212 | "source": [ 213 | "Z = svc.predict(np.c_[xx.ravel(), yy.ravel()])\n", 214 | "Z = Z.reshape(xx.shape)" 215 | ] 216 | }, 217 | { 218 | "cell_type": "markdown", 219 | "metadata": {}, 220 | "source": [ 221 | "Lastly, we visualize this with matplotlib:" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 5, 227 | "metadata": {}, 228 | "outputs": [ 229 | { 230 | "data": { 231 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEWCAYAAACJ0YulAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzsvXmUJHd15/u5EZFb7dXd1Xu3Wi21NiQkhMxiMRhkzwy2\nefAOlmc0YmAk80bAmGd7xvNsNO89sDlj7BGzeLB8jPSMBW0Dki0wCOQFvLANBiOB0IJoJKTe9641\nq7IyMyLu+yOiqnOJzMqsyqz1fs7J05mRv/rFzeyquPG739+9V1QVwzAMwwBwVtoAwzAMY/VgTsEw\nDMOYx5yCYRiGMY85BcMwDGMecwqGYRjGPOYUDMMwjHnMKRjrGhHJi8j+Ju8fFpGfanGuO0Tk663O\nvVyIyOtE5PhK2wHtfZ/G6sScgtERROQ1IvINEZkQkVER+V8i8mMi8ioRmRaRvoSf+a6IvCd+nhaR\n3xCR5+Lxh0Xkj0Rk31LsUtU+VX0hPsfHROQ/L2W+RnMbxnrBnIKxZERkAPgC8HvAJmAX8JtAUVW/\nCRwHbq35mWuBa4BPxYceBt4E3A4MAtcDjwM/uQwfYc0gIt5anNtYO5hTMDrBFQCq+ilVDVS1oKpf\nVNUn4/c/Dry95mfeDvyFql6Iww3/FHizqn5bVX1VnVDV31fVj9aeTETuFJHPV7x+TkT+rOL1MRG5\nIX6uInK5iNwFvBX4tTjs8/mKKW8QkSfjVc5DIpJt5UPPzR0//5iI/L6IPCoiUyLyLRG5rGLsVSLy\npXgVdUhE/kXFez8br5omY9t/o+K9ffF53iEiR4G/a8GuXxKR74vI7vj1G0XkCREZj1dzL60Ye1hE\nfl1EngSmRcSLj/3HRt9Js/mMdYCq2sMeS3oAA8AFoov/TwPDNe/vAXxgT/zaIVo9/O/x698BvtLG\n+fYD4/E8O4EjwPGK98YAJ36twOXx848B/7lmrsPAP8bzbAKeBd7V4Lx3AF+veF079wXgFYAHfAJ4\nMH6vFzgG3Bm/9zLgPHBN/P7rgOviz/NS4EzFd7MvPs/BeJ5cgl2vq/j87wO+A4zEr18GnAVeCbjA\nv4k/c6bi8z8R/x/lFvpOWpzvp1b6d9Iei3/YSsFYMqo6CbyG6OL1/wHnROQREdkWv38M+DLwtvhH\nfhLIAI/GrzcDp9o43wvAFHAD8Frgr4GTInIV8BPA11Q1bOMjfFhVT6rqKPD5eN7F8Oeq+o+q6hM5\nhbl53ggcVtUHNFoFfRf4NPDz8ef5sqo+paqhRqurT8Wfo5LfUNVpVS00OLeIyH8H/hnwelU9Fx+/\nC7hPVb+l0Sru40AReFXN5z9WM3ej76S
V+Yw1jDkFoyOo6rOqeoeq7gauJbrL/N2KIR/nolN4G9Fd\ndDl+fQHY0eYpv0J0h/za+PmXiS6kPxG/bofTFc9ngDpRfInzXAK8Mg63jIvIOFEoazuAiLxSRP5e\nRM6JyATwLmBLzdzHFjj3ENEF+7dVdaLi+CXAr9acew/R/0+zuZt9loXmM9Yw5hSMjqOqPyAKp1xb\ncfgzwG4ReT3wFiInMcffAK+Yi4G3yJxT+Cfx86+wsFNYqZLAx4jCY0MVjz5VfXf8/ieBR4jCa4PA\nRwCpmWMh28eIViQPiMjNNef+rZpz96jqpyrGtPO9tDKfsYYxp2AsmVhE/dUKYXMP8K+Ab86NUdVp\noh1GDwBHVPWxivf+BvgS8Oci8vJY7OwXkXeJyC80OO1XgNcTxcGPA18D3kAUivpug585Q6Q5LDdf\nAK4QkbeJSCp+/JiIXB2/3w+MquqsiLyCaAdW26jql4lWIJ+J54EonPeueDUiItIbC9v9i/wsnZ7P\nWGWYUzA6wRSR8PgtEZkmcgZPA79aM+7jROGHgwlz3Ar8BfAQMBH//E1Eq4g6VPWHQJ7IGczpGi8A\n/0tVgwZ2fhS4Jg57fLblT7dEVHWKKNZ/G3CSKDTzX4h0FYB/B3xARKaIhOI/XcK5vgT8AvB5Ebkx\ndr7/FriXaDXxPJFgvtj5OzqfsfoQVWuyYxiGYUTYSsEwDMOYx5yCYRiGMY85BcMwDGMecwqGYRjG\nPGuuANbg8Gbdtqud7eyGYawlnMOHKKVTpDJr7vK0qvnh6fHzqjqy0Lg1961v27Wb33v4iytthmEY\nXeLVj/06j37hCDsv27zSpqwrXvfBTx9pZZyFjwzDWBWU04+y49lnePQLR0BqE7qN5WLNrRQMw1h/\n3PvM7dz+ZyHPAI7nsv2SoZU2acNiTsEwjBWl746buX3fHgALGa0CLHxkGMaK8erHfp2j+/aAmENY\nLdhKwTCMFaFSP9i5f9NKm2PEmFMwDGNZKacf5du/5nGV94DpB6sQcwqGYSwrw7d/kKv27bFw0SrF\nNAXDMJaV/l+/v76FkLFqsJWCYRjLQm3YyFidmFMwDKPrzOUhXOXZLqPVjjkFwzC6iuUhrC1MUzAM\no2tYHsLaw1YKhmF0BctDWJuYUzAMo6NYHsLaxpyCYRgd5bXf+CpTnpW+Xqt0VVMQkcMi8pSIPCEi\njyW8LyLyYRF5XkSeFJEbu2mPYRjd53Dv2y0PYQ2zHCuF16vq+Qbv/TRwIH68EviD+F/DMNYYloew\nPljp8NGbgYOqqsA3RWRIRHao6qkVtsswjDawPIT1Q7e3pCrwRRF5XETuSnh/F3Cs4vXx+FgVInKX\niDwmIo9NjI12yVTDMBZD3x03c/ufhYA5hPVAt53Ca1T1RqIw0S+KyGsXM4mq3q+qN6nqTYPDtrXN\nMFYLO559xvIQ1hldDR+p6on437Mi8ufAK4CvVgw5AeypeL07PmYYxiqmUj+wPIT1Rdecgoj0Ao6q\nTsXP/xnwgZphjwDvEZEHiQTmCdMTDGN1Y/rB+qabK4VtwJ+LyNx5PqmqfyUi7wJQ1Y8AfwH8DPA8\nMAPc2UV7DMPoAJ8o7OFRLA9hvdI1p6CqLwDXJxz/SMVzBX6xWzYYhtFZyulH+W+fvYWrUg+stClG\nl1jpLamGYawRqsJG+22VsF4xp2AYxoJY+euNg5XONgyjKVb+emNhKwXDMBpi5a83HuYUDMOow8pf\nb1zMKRiGUYXlIWxsTFMwDKOK33L+b8AcwkbFnIJhGPOU04/y6U8cB7GGCBsVCx8ZhgFYHoIRYU7B\nMAzLQzDmsfCRYWxwLA/BqMRWCoaxgbE8BKMWcwqGsQGxPASjEeYUDGODYXkIRjNMUzCMDYblIRjN\nMKdgGBuIe5+53fIQjKZ03SmIiCsi3xWRLyS8d4eInBORJ+LH/9Ftewxjo9J3x81x2OgBE5WNhiyH\npvD
LwLPAQIP3H1LV9yyDHYaxYXn1Y7/Oo3PbTi0xzWhCV1cKIrIb+FngD7t5HsMwGlO97dQcgtGc\nbq8Ufhf4NaC/yZifE5HXAj8E/r2qHqsdICJ3AXcBbN25uxt2Gsa6w7adGouha05BRN4InFXVx0Xk\ndQ2GfR74lKoWReSdwMeBW2oHqer9wP0AV1x7vXbJZGOdoqrM+iHFIEQV0q6QS7k461hstW2nxmLp\nZvjoZuBNInIYeBC4RUT+pHKAql5Q1WL88g+Bl3fRHmODki8FFPyQUEGBYqBMzvqors/7izlBGcwh\nGO3TNaegqner6m5V3QfcBvydqv7ryjEisqPi5ZuIBGnD6BhBqJTD+ot/CBT9cPkN6jLl9KN8cff7\nrI6RsWiWPaNZRD4APKaqjwC/JCJvAnxgFLhjue0x1jd+gkOYf28drhTu++4nuN0LcVx3pU0x1ijL\n4hRU9cvAl+Pn76s4fjdw93LYYGxMnCaywXrTFCrLX5uobCwWy2g21jWeIw0dQ8ZdP7/+Vv7a6BTr\n56/CMBIQEQYyHl6FZ3AE+tMubrNlxBrC8hCMTmJVUo11jxM7hjDWEITIWax1LA/B6AbmFIwNw3rS\nECwPwegW5hQMY41h/ZSNbmKagrHmUFWmSz6jhTKjhTJTRZ+gydbT9YTlIRjdxlYKxppjqhRU5R+U\nQ2Wy6DOY9dZViCgJy0Mwuo05BWNN4YeamJCmRBnKudT6vVhaHoKxHFj4yFhTNAsTNcteXutYHoKx\nXNhKwVh25grRLWZbaLPcAm+d5B1UUk4/yt7v7avIQ7COaUZ3MadgLBuqynQ5oBRETsEVoTfttnUx\n9xzBE6mrWyRAxltfC9+5bafPgOUhGMuGOQVj2ZgqBlUX80CVqUUIxP0Zt8q5eI7Qu876I9i2U2Ol\nMKdgLAt+qIlVSRWY9UN62hCIRYS+tLekMNRqxvopGyvJ+lpvG6uWZgLxYnMMRGTdOYR7n7md//bZ\nW6yOkbFimFMwloXVLBCr6nxdpJXmvR86wlXeAyYoGytG152CiLgi8l0R+ULCexkReUhEnheRb4nI\nvm7bY6wMniOJF/+VFIjDWNMYm/UZn/WZmC2v6LbWym2nhrFSLMdf4y/TuM3mO4AxVb0c+B/Af1kG\ne4wVoj/tknVl/pqXcqLqpSshEGvsECpbdQYKk0V/RVYNVv7aWC10VWgWkd3AzwK/BfyHhCFvBn4j\nfv4wcK+IiK7XjuobHBGhJ+3Rs9KGEAnfQYPfsnaF76Vg5a+N1Ua3dx/9LvBrQH+D93cBxwBU1ReR\nCWAzcL5ykIjcBdwFsHXn7q4Za2wcmkWJlmulYOWvjdVI18JHIvJG4KyqPr7UuVT1flW9SVVvGhw2\nAW6j4Ych5SAkDMOOzdlM+HaXIZzVd8fN3P5n0ecxh2CsJrqpKdwMvElEDgMPAreIyJ/UjDkB7AEQ\nEQ8YBC500SZjDeGHIaOFMpPFgKlSwHgxoFAOOjK35wipBsJ3tsvCt9UxMlYzXfvtV9W7VXW3qu4D\nbgP+TlX/dc2wR4B/Ez+/NR5jeoJBGIZMFusdQMGPVg2doC/tkvWcqD0nkHaFwazX1dyHe5+53QRl\nY1Wz7HsBReQDIvKm+OVHgc0i8jyREP3e5bbHWJ0UG6nAwEypM6sFEaEn5TKcSzGcS9GX7v5OqPd+\n6AiA5SEYq5ZlKXOhql8Gvhw/f1/F8Vng55fDBmNt0Uzs7ZyysLxUlq8wjNWK1T4yukbRDwgV0g64\nbXYKSzkOxSB5ReAu8qIahIrGP7/c5TGq8xBsldANgnIJVHFS6QX/f1WV0C+3PH4jYU7B6DhFP2C6\nfPF+vgCkHKU/0/qvW9pzkHJA0nqhL92egwlCJV/y5/MSBOhNu6Td7kdPLQ+h+wSlIjNnj0YXeUAc\nl56tu/GyvR0Zv9Ewp2B0lCCodghzlENlthyQbSMpbDDjki8F+PHF3
AH6Mi6O0/rFXFWZKvlVeQkK\n5EsBgxlpujV1qVgeQvdRDZk+9SIaXlxVauAzffoI/bsP4Hip1sfvuQLHtUuifQNGRyk0E4j9sC2n\n4DgOA9ml3c37oTZMVCv6AT3p7vwJWD+E5cGfyZO4YVGhlB8nOzTS+vipcbJDW7pk6drBqqQaHSVc\nZX2Sm2Yud+mcO559xvIQlokw8CExyKhoHB6qHh/pCInjg/rxGxFbKRgdJeM5+AnhI2h+B6Ia1SJS\nVTznYp+EueOguIvon9CsLHeqjTBUq8yFjExQXh68TINKWiJ4ub7k8UK9HxExTSHGVgpGR8l4jcND\n/enkX7cgVCaKPpNFP8pcnvUp+kHV8cnixePt4DpCOmG7kiMkHl8qnyjEISNzCMuCm8ni5fqh6mZB\ncFIZvJ76kmtuJhc5ixbHb0QWXCmISAb4OWBf5XhV/UD3zDLWKk0T0qXeKagqk0W/6sZNIRarq1cc\nc8c9x2lLIO5NuaQcZdaPdjOlXYec53R8G2LltlNj+ejZupvS1BilqTFQJdU3SGZgc8P/356te9oa\nv9FoJXz0OWACeBwodtccY63TrEnNrB/QWyPs+nHuQDskzdMMESHjSdea+di205VFRMgMbCIz0Nrq\nrN3xG41W/rJ2q+obum6JsS5oXpK6vfGNWE1Stm07NdYbrdw6fUNEruu6JUZXUFXKQUgpCJuHdjo0\nf6pJnD4pWazZ+EZ0QyBeDFb+uh5VxS9MUy7k0Q6WOjeWj4YrBRF5iuimzAPuFJEXiMJHAqiqvnR5\nTDQWix8q+aJPyMUNFz2e01auwELzT9XoAT2eQ9YVZmvyFdwGwq4j0nC8J1Csua40mme5qaxjZNVO\nI/ziDDOnj1bcfCi5LbtI9w2uqF1GezQLH71x2awwOs5cD+KLf54RM36I5zpNt2ouZv45ZvyQ/rSL\n50ZtLVWVtOtEJaobCHm5lIvnat14gFRYf3w5BMFy+lHu++4n5l+/5yWfnH9udYzq0TBk+vQRqFkd\nFM6fwM3kcFPpFbLMaJeGTkFVjwCIyB+r6tsq3xORPwbelviDxqqgmYA76wf0LTGTt9n8xSCkL+21\nXFtIJNo2mjS+0fFuUk4/yvDtH+RuwBGPQH3GPvkoqdLP0nfHzTyzb48JyjX4hakGOWRKaWqM3KZt\ny26TsTha+Wt7SeULEXGBl3fHHKNTNFMPOiEtdHv+leLeZ25n+PYP4orHZcNXcOnQfgS477ufoJx+\nlOve8xoAcwg1aBDQ8Lci7Ez/C2N5aKYp3A38JyAnIpNzh4EScP8y2GYsgWbhoU7ceS80fxiG8zpB\n2hW8WBxWVcrxliPPkUU3tenUPJXc+8ztvPdDR3DF49Kh/fPHHfF474eO8MXdHlOe5SEkkZQ9DESZ\nwpYUtqZoFj76beC3ReS3VfXudicWkSzwVSATn+dhVX1/zZg7gA8R9WoGuFdV/7Ddcxn1OCLkPIeC\nXx3jdSU5w3cx86ccSKpooRoyXrx41zjrgycBPWmXqWJ1Oeyc55BrU/j2w7Aj88wxFy56L3D58BV1\n7186tJ+ht+/l6CMPmKjcACeVJt2/idLU6MWloghepqexwzBWJc1WCjfGT/+s4vk8qvqdBeYuAreo\nal5EUsDXReQvVfWbNeMeUtX3tGW10RK5lIvnCMUgjJrduA4Zt/36QUmEYZjoEABm/Powgq807Lmc\nqlhJLEQkcNf3WYjmaV9An3MIAlyW4BCqTw4nXxg1cbkB2U3b8HK9VZnCqd5ByxReYzRTG/9b/G8W\nuAn4HlH46KXAY8Crm02s0b60fPwyFT/WcLR5bZJyHVJdEGprt5AuaS4/pK9BXaRays0Ebj/Aa0NA\nbxQuSmL84FGue+f7eerB3zTH0AARIdXTT8rCRWuahn+Jqvp6VX09cAq4UVVvUtWXAy/jYrinKSLi\nisgTwFngS6r6rYRhPyciT4rIw
yKyp8E8d4nIYyLy2MTYaCunNtYpzUTsdtxUOw5hjpH77uG6296/\n8EDDWMO0clt1pao+NfdCVZ8WkatbmVxVA+AGERkC/lxErlXVpyuGfB74lKoWReSdwMeBWxLmuZ9Y\n3L7i2utttdEGc4KsaiTILlRILgxDCn6IApkmq4y0K8z6nbGxmfBdaU/Wc6IM6AZl71sV0PvuuLmh\nfrAQgs+e/YN4KUUdl1Jtdl0NGgaUZ/KgIV6ub74TWKPjhrHStOIUnhSRPwT+JH79VuDJdk6iquMi\n8vfAG4CnK45fqBj2h8A97cxrNKcchEyVquP4Wc+hp4EgWygHVcJ0KQhwJWAwW3/B8hwHIbmHclK5\neoC0A6WwfmyqgaNKsscTEgV0z5GG88zRln6QQGpvjit7HkF/6jJCv4gjwtEXJzl+OJ843i/kmT5z\nLH4VfSOZoRG8TI7pM0ep/KYyQyN1XcIMYyVo5dbqTuAZ4Jfjx/fjY00RkZF4hYCI5IB/CvygZsyO\nipdvAp5tzWxjIVSVfKle2J31Q8pB/d3t3B15LYFGF+ek8Y2WbI2O1zqEubFJlVUb2eNrdCkdyLhx\nYpvQl3bpT7tNBc2lOgQ8YeDnduGkHVzxSaVcXM9h7/4B+gbqnaaGYeQQNIwfCqoUx89Fmb+qdceD\nYqF9uwyjwyy4UlDVWeB/xI922AF8PE52c4A/VdUviMgHgMdU9RHgl0TkTYAPjAJ3tHkOowFNBdkg\nrAsLNROOi35Yt92zk0Jzu/bM+iFDqVTL4vRi9INa0pf2JgoaIsK2Hb3kJ8erjvuF5NVDQ1FElVJ+\nnFwmtyj7DKNTNNuS+qeq+i8qCuNVsVBBPFV9kkiUrj3+vorndwNt50AYS6PdjOPVJuJ0W1BOQrzk\nVYjjCE5C3odq+xVCraqosRpotlL45fhfK4y3BmkWX09qNpNpIhwnje+20Nxs/lab5SxFUK6ldHga\nSbj4l8sBzz11DtzqP6UoYSvBfYkke2VxSPV2rpqoX5qlNH4e1ZD0wDCpnG0TNVqj2ZbUU/HTnwLS\nqnqk8rE85hmLRUToSSX0L2ggyLqOQ8JwBMgmXAw9x6FRYnTSPhoHyCbo243s8RyHVML8jeyppJx+\nlL47bkbojEMA0EJI/m/PoeUQDRRVJSyFzOgeTh6e4PSR6vCR43pkN22vLokRl3zo3Vyf45DpyeDl\nOtM4fub8KaZP/Ijy9AT+zBQzp4+SP/liR+Y21j+t7D7aC9wnIvuIWnJ+Ffiaqj7RRbuMDpD1XFKO\nQzHuTZxyHVJO44zm/kyKkh9SKMe9jOPeCE5CtrGq0ijsn7RjNARSrkvGk9btyVbbk/GijOwke+bP\nvVRBuQmzT0xQPl4g85IBnLRD8bk85cPPc6jvTq72DtaNzwxswsv2UJoaBw1J9Q6QyuWYPPrDertn\nZxlI+ZT8pW1NDf0S5an6XJ6gOENxaoxM//CS5jfWP60Ize+H+R1E/xb4v4DfBTrTqcXoKq4j9LSR\n5Zv2HNIthGfKi+ijOVdSuxv2QOf0g2YE50vMfOV81bFb37qXpx4MEjOd3XSW3Obt868zXgFxBGo2\ndIWBorOT4C2tttLs+IWG75UmLphTMBZkwb82Efl/ROQvgS8ClwP/EdjdbcMMox2WwyE04mKmcyuO\nUpsM64Skv9q2BRhrjVZu2d5CtGX0UeArwD+oarGrVhkNCVXjfshRf2NXmhe4azTeD0MK5bnM5Siu\n3ywsU8tCiWJJdKtZTicF5Tmcfo/MVf1ISij9aBr/TPNf+QOFB3nYvxPnyMGmvRZKQSaxV7bruTjZ\nATwNcMMZ0BD1eubDSaFfojw9iYYhqZ5+3AZbV9MDmyhPjSW/1z9MWC5RnplEQyXV0zc/T6PjmazL\nlm05HEcYPTfLdL5BOnlMWI7tVI3tzDYdb6w+Wgkf3SgiA8DNRAlo94vIWVV9TdetM6ooBWFVQlr
B\nj3bp9KaSE7eKfsB0RSnTufEaKuWK65IfKg4wkKFlxyAiDTOUsy4UasIjaXfhjON26ZZ+kLmqj/6f\n2R59GEfoedUmZp+aJP+lsw1/ZvzgUW59520Nw0hzhOqyZd8lnD98BFVFQ8X1HIa3b4GwzNjhH4FC\nqCGO47Bp11by0w6zo6fiRYBSnDhPum+Y7Obtdf/vXjqL1zuIPz1RddxJpVGEqRPPV8xzjnT/ME46\ny+yFU3XHL7n2Mi6/eggQRGDPpf2cOjbNi89Vzz1HcWosnif65Yrm2VQVPjNWPws6BRG5FvgnwE8Q\nVUs9Bnyty3YZNTTKUC4FStrVuh4JqlrlECrHJxESJYz1tHgzr6oNM5Rd12XQk3iFovMlrTtZQnku\nXNRphyBph/6f2Y5UbsVyhMy1AxQPTVE+2jjreOS+e6JKqg99oOk5imEPw/uuwA2no3yGVC9+4DJ6\n+BBhRbZ5GISMnjhbdQyIE93GSPUN4GXrdyz1bt2NXxhmduxcvCV1E6mePqaO/bB6O2zcKrNui2x8\nfPMmwa1a3Qk79vRy/myBqYlS1Y+EgV/lEC7OM0qqdwAv29P0OzFWD62Ej36HaMfRh4Fvq2rz9aPR\nFcqhNqwpVPTDutDMooRgP2xYFynJnkaUYkE553RnL0I39YPUpT1o/F1XIikhc01/U6cwjyqnj4w3\nDSP5oYvPQLQa8SHrzSQ6zTqHUHGOcn4i0SkAeLle+iq2uJby44njmmVYn/zRKbbsrBamHRFGtuXq\nnII/kyex6pUq5ekJcwpriFbCR5a8tkpodBneSD1MuqEftITSkoY7v1p48DcXdAzV8wt1nqjjNLqt\naDK8lWPzmMi9HuiO8md0nGbx+CQBdzHx+4znEITKTDlguuRTjsM/ALPlgLFCmdFCmXzRx2lSxiHT\nBUG5Gwlpied5cTraMlqLrxSfmWppjpH77uFQsGDNyCqKYQZNWH0l2gIgQqqvscPp7Q3pz0zRl5pg\nsD/AyzbJsG7Arst31B0LQ+Xs6Zm6415P4x7Nzew0Vh/mFNYIIlE10FoaCbgiQiYh81eISljX4gCi\nykTRZ9YPKQbKVCkgXwoYL5SZ8S9WRS2FymRZ6UmYP+NK2y0xF6KbCWm1aEmZfOQUWg4JyyEahGg5\npPDEOOXj7VUxDf2gLtO54XnVoWdTfQ5BOpclu2VXdPGeu4CLkI4T45IY6i1w+tlDHD90lBM/PM6J\n7z9HrzdBdvPOunkyA5vJ9NeHoFzP5fRpnyAI44cSBCEnjuTJT9ZHkB3XI7claf4teFbkb03RehaR\nseKkXYehrFRsMW3ck1hVKSaIylGmsktWqNiS6pB2hbGEYkPNtIOywlDWa8mexbIS+Qel56e58JEX\nyVzRh6QdSj+aJrhQWvgHK3h3/kHOvfP9PPXQb7Y0Pgx88ufqM5GLM7P09nv077mC8vRk3JSnHzed\nSZwnnVaOPXu4SosI/IBzx86x65oDjOUOUJ6ZBFW8XD+eB+NHztfNE/gBYyfO8O2JMlu25nDcaEtq\nYaZxwat03xBetrdq/kZ2GquXZlVSP0+zNBvVN3XFIqMpjghZb2EBt9nFvOiH9Gc8+jMXlwylRoLm\nAudo1Z7FsJIJaToTMPtE8tbLVjlQeJCnoCVtwS/kk0P+qpTyE/SM9JEZWLgvdG+2zGhCSCjwA/yZ\nSRxvmMzAxazpcPpMw7kK46Nk00OcOj694HnncLxU1fzG2qPZSuG/LpsVxrKyFoTpFROUO8j4waMc\n6ruTK/WBBR1DtPMoWQhuZyuvNhGsxXHqymvQZGXXTjKjsX5o6BRU9SvLaYhRjR/qfOG4dEXhuEbH\na0k50nCvSZIQ3Gx8I9Ju6/a0Sq1+4G3NkH3pIGSE0qE8pR9NNzUyc3U/Pa8ahpRD8dkpZr4W1QLy\ntmXIXlf4OmvLAAAgAElEQVQxz/PN735nR5TzwzMEGrKp2EP
fUQdBmHaLnE1PEYqyqdzDULkHQarH\nz/bQdywafzGM1Dx3oVmp7VT/MIXRs5QmLwCK19NPdvNOXNelbyDFtp29uK5w7kyByXFNbgbkCJLp\nx88XKE+NoRqS6h0klR0CziV/l0ObG37VfnGG8tT4/Dxeri/6/ZydoZyvP94p/Nn4vCScN+G40T6t\nJK8dAH4buAaYz1lX1abreRHJEuU3ZOLzPDxXXK9iTAY4CLwcuAD8S1U93N5HWH/MlgNmansTO0LK\ngYKvdceTWlHOCdN1PZpdqetyNjfeERIrnzZyFq7AZPFijLmZPa1Q6xCyNw7S97oRcEBch8wV/ZSP\nzTD56ZOJBvW/eTuZK/vnP4/76k1krx9k5h8u0PcTI+AK4kg0z9F4ngTOX17ixfFThOdDUDjnuQxf\nPkDuGJzIjRMS9QQdTU8zUM4yvH2IIxNnqsZvunyQS58fQOZv25vnLojj0rN1DzNnj1UdzwyOMHPm\nOBpc1DT86Uny05Nc+U9uYv8Vw9H/nSNs2Zpj7MIsPyqNcO5IdViop7+Hs8fOU5o4P+80ytOTeLk+\nBndsZ+LU6arxvcODqJfcg2F27BzFiXM180T6QbFu/n56tu7uyAV6duxs3fypngHES1OarD+eG9ll\njmERtLI+fAD4A6L6R68nuoj/SQs/VwRuUdXrgRuAN4jIq2rGvAMYU9XLidp9/pdWDV+vhKpVDmEO\nP9Qqh1B5vFGWcsp1GM569KZcelIOg5nGFUr9MGxYCrvR3WIjexaTOHfvM7czfPsHccXjsuErkJxD\n3+tHkJSDxE7MSTuk9/SQPlC//dHZlCJzZT9SUQtKRHB6XPpu2RrNE4dKnLRDam8P6cvrd934PcqL\nY6cioTb+GIEfMDYxwbGeMULR+fBMKMpkapbD46frxo9OTJDfE/0/jtx3D4f8OwmD5ruRUj39DOy9\nktyWneQ2bad/9wGcTLbKIVQydeI4ruvgxJ/L9RxyObhwvP7Ov5AvUBo/R23GsV/IUw4zDFxyBQNb\nt9A/spmhSy7DG0queRn65SqHMD/PzCTFxPmnCGZb1yQaEZZLVQ5hbv7y9ESVo5s/PjNJMFu/ddZY\nmFacQk5V/xaQuMHObwA/u9APacRco9pU/Ki9WrwZ+Hj8/GHgJ2WDu/akJvYL0UwkFhEynkPWc3Gb\nxI8bOZbF0K5onSQop/b2oAk2Sdohc0W9U8jdkNy1TCQ5xu6knflVRSXTW4PE3IAgwVFD5BiSCtwF\nfsBo9uJF6e63fINDwTsS56iy13FJ9w2RHtiE46UojdfvDJrjzLH6i//o6bG2M6P9mUnESSG923D6\ntqNO4yJ2DXtPN0I12jW1RJqfN+F3N3YMRvu0siW1KCIO8JyIvAc4ATTIVKlGRFyixjyXA7+vqt+q\nGbKLqJYSquqLyASwGThfM89dwF0AW3eu76rdi/GInXCjK+WJGwrK5WQnpaGiCUWXtNRgvMalqms+\nYKN5nLBzmcWe1OzK0nB+tdBqprM0EXudhDCg6zjt2y9tCMri0HZmdCcEa2cR523ncxnztPKt/TLQ\nA/wSUez/bcC/aWVyVQ1U9Qai/guviIvrtY2q3q+qN6nqTYPDC2/LW8t4jrT9N92JDOKkRLfFkm2l\nSc8CGcqlIzPJf/+BMvtU/R3gzD8ml4ue+5nkeeq3nPaecEj6H3BcByfpuNYWjbs4fvNYjtSeHP1v\n3E6Y9/mVvY/yw+AOwiDEL0wzc/YY06ePUMqPz682BoczXHntMNdcv5mR7Tmym+uziufYf90ldcc2\n796S+L05rpN89yBCur/1jONUTwNBPJ4rcf6+YfxCnumzx5g+fZRSfiJxddX8vP2LOK9lUi+GVmof\nfRsgXi38kqq2lutfPce4iPw98Abg6Yq3TgB7gOMi4gGDRILzhkVE6M+4TBWDqj+BnOcQJiSkCXQk\nYcxxHHpT9ZVVPYGedLI
9KVcSj3sL3Bm2lKEcKBN/doLBn981f+crrjD9tQv4p2brx5dCis/lycR6\ng4hEfZRnAorP5cldP1h1PJjyE3skOIFwtbOLZ1MnAEU1Wm1cOrSD/r4+nj75o+g40WrjwIF99DlZ\nvvfcD6uOX3HtfjaVB+l5xSZIRTpH6pIe/r0c44sPTDJ9ZmY+Du7PTlOaGufqH7+OPfv6cdxo/NCm\nDNt29vLtiQuUalpsSipD/9bt8xfXuc/l+7D50v1cOPxCNDC2f2TfHvLTwvSZoxWzKNlN23HTrfc8\nEMeld9teps9UCuLxPKl04vFSfozS5GjF581Tzo/Ts21vy0Jw4/PuwEmlmEk4bolzi6OV3Uc3EYnN\n/fHrCeAXVPXxBX5uBCjHDiFH1IuhVkh+hGjV8Q/ArcDfabu3EOsQz4kyl8uhzjfHARhPyDhWIj0g\n4y3dMWQ8l5QjzMaN6TMVF/hae5z4j7nR8Ua0k5Dmn5zlwr0vkN7XE2UWH5lBZ+rLhwNIr0tmf2/V\nRUZEkJRD7rrBuuNOn0f6ij5Kh+pj1T2nHV7m7mZ6V0joKL1nPNKzLpveuZcd7OHc8QsEfsCWXZtJ\nuR7iwtZX7Uw4LkjFCsxJO7hTx0gxWyeMBsUCGWcW17uojbieQ/9gml1XX8q5E5uZHTuNhiGZoRF6\nhwYY2d5T97myWZfeoQFKu68g7ZUQVUqaYTZ08HIwsPdK/EIeVcXL9eK47Rc18HJ9DeepPU4Yxj0c\nagTo2Wn8Qj5eAXT+vIv5XEZEK9/cHwH/TlW/BiAiryFyEi9d4Od2AB+PdQUH+FNV/YKIfAB4TFUf\nAT4K/LGIPA+MArct8nOsO0SkqkdCM/G2FIRkWuxjvBCO4yT2VKi1Z6HjSczpB21lKAca5SYsQDoW\npqXmN9pJO4mCtZN2yBxIdgoQrRj6j17UBFLX9KABuBmX7fu2Vo3VIMT1ko/XBvgvnBrDc5S6/04N\nOX3kLDv2b6s67HkOm0dyjJ6bpXfb3vnjQ5vmOrhVz+96DptHsoyen6Xo168AxHFI9Q4kfuZ2aDRP\n7fHSdH3pDiAWuKfacgrtnNdYPK04hWDOIQCo6tdFpHEBlIvjngRelnD8fRXPZ4Gfb9HWDU2zy+5q\n36+1HAXttJi8gkiqPDp3PKxtD9d0/pCkmLaGmrz5JUhQtwGvSUmQVMJ24TBU/IRmSQ13Q4VKucF7\nK4Lj0lAg7lK/DWNptOIUviIi9wGfIvqf/ZfAl0XkRgBV/U4X7TNivCYZx60IuyvFclU4LR2eQX3l\nzOmzHP7+MYKyz47929lz2U4ccRC35gIUKIXvjTOWmuFMepJQQjaVe9la7MdJ2H9ROjwTtaerJdDE\nFQqhEgYh546NcuTZyJ6dl+1g577tDR3Vjst38Ow//pCjzx5Hw5CRPVt4yc3XcPrkNJtGsuzY3Yfr\nCmdPz3D2VHJmt6py5mTj/fmbR7Js392H60TznDk53bDPTidI9fSTWFtWhFTfIOXpSYqx3pDqGyTd\nP4S0uWtI5/IkJqMucun+IVJ9Q5a4tkhkoRB+LBA3QlX1ls6a1Jwrrr1ef+/hLy7nKVcNfqhMFf2q\na0GP55BtsVvacrPcBe2Ob57gtIwTxLEZx3XocTK8+qdvIr0tN3+RUFUIlO/8wT/GDiH6Rh0VckGK\na6Z2Ju408rZlIuHbE9BI+J764llyrxzC25ypml8D5YmDj3GGifm7esd16HUypPLCaHqmaiEhXgrH\ndQiK1eK3OA4vff0r2HlJP27s/IMgZHqqzI9+OM61N2yJ8io02oH53PfHOZfQ7wBg/5WDbN/Ze3Ee\nPyQ/VebJx891tT+OPzvDzJmjFTuOlNzmnVGpjPz4Rb1BBDeTo3f7vrYu6DPnTlCenljyPOud133w\n04+r6k0LjWtl99HrO2OSsVQ8RxjKevhhtMvFcxYWdleK5XYIRfE5GYyiFV9HGITMBLOcm
xpn9/aL\nvQdEhJnCLGeyk4QVN0WhKAW3zGhqmi3l+lQc/0yRC7//AqndOSTlUD4+g7slU+UQ5uefKXA6GK+e\nPwjJ+wVIU583EfgECRniGoZMnDnJnsuunj/mug49fSmyGY9vfvUUg0MZHFeYGCsSNkhCzOZctu+K\nVhrz83gOvf0pNo9kuXA2YUdXh/CyPfTvvZJgdjoSgrO9hH6ZwoWTiYK7PzPVsj4QlIrVDqFynjaF\nbCNiwXWaiGwTkY+KyF/Gr68RkYVTM42uIBLVLkq7zqp1CH133Mx7P3SEy4evWLaS11PebGJ+QYhy\n5sjZuuNjZ8bmy2dUjRdlPNWkPIJC+ViB0gvTaEnJXpt88Ro7PZ44v0qDm/ImK/YzR+ozlz3PYXhL\nFhQmxoqMnZ9t6BAgyn9IOofnOWza3PqW1MUiIni5PlI9/YjjNC59EZfeaBW/2TwzbWZfG0BryWsf\nA/4a2Bm//iHwK90yyFi7LFfLzCQ8dRqGQDI99fvVvXQqebBCSlsPx4X55D0XXqbBIlyTNw00i+Km\nMvW2hqFSKrUulJfLYeI5onmWX5iONJ7kmxppYzupOI3mkXodyWiJVr79Lar6pyJyN8yXo2j9t9HY\nELQqKGeu7id74xBOWij+IE/h8bGGJSraYdDP4eIQ1jQMcFTYe+VuVLUqxLNl1yYkIVwDsLXYOOSQ\nuaaf3I1DSEqYfXaKme+O0fOazXXzj+zejBPEydSV2gHgqENAWB9C0uSdZAduvKxuflXlzInWC82N\nnZ+NtI6keU4uvWBdu3i5/uRNSW1mIqd6+ikkzgPp/vr2psbCtLJSmBaRzcRfe1zpdGktqYx1RW2F\n00b0/uQIfW/YRnp3Dm9rltyrNzH0tr2RcLtEBOHG664h15fF9Vy8lIvruVz3iqsIvzIFIfMXRVVl\n9vEJUr31d+DiQMpLvlfq+2db6f/n20jtiuzv/fHNDP+rvUx+7uR85vD8/I9NcNX4dtKhG5XCCAVH\nhcumRxjwa1YuCtlcmkndm7DYcThxwqc4G+D7IX45xPdDDj01ymw7W2oVnnz8PMVi9Tw/aHOeTiGO\nQ9/2fdGqQJz5R8/WPTipdGfm8RqsBo2mtLJS+A9EmceXicj/AkaIso+NDUw5/SgA9333Ey0Jyk6/\nR+6GQaRi+6yTcmAgRfaafmafXFpFS2fAY9Mrt3PLj+9g8sIUfjlgaGQACYX8qbOc/6/Pkbm6H8k6\nzD4zxfSmMoVifZkLVTi5b4a9P6wWmp2hFNlrB5DURfsl5eAMphDP4fyHniNzTT+SieanFNJDmhsm\n9zDjlghE6fMzlByfH6XOV68SBMpln345Xh8IEWXi7Cjf/npIX38KxxWmJkqL2kY6ky/z7a+dpm8g\nheMsfp5O4WZy9O+5gqA0CxriZnJtb0etn0dxM9lFzWNEtLL76Dsi8hPAlUS/yodUtdx1y4xVy9zO\nIgHuprX8g9SubMOM49T+3iU7hdSuHBoojucwuKVa/E1f2kvxqUmKz14s2zU2UGjUcIyx6Un21hQC\nTu3KoqHWXbSdtBPN/8wUxe/XlwUThN7g4spgyptFlKpdUjBX2jphxRQLr+m+IfJTnfmzy0+unj9f\nEcHL5FbNPEYTpyAiPwYcU9XTsY7wcuDngCMi8huq2iB/3VjPLHaraTjdIOM4UMIOXKTC6WTBt9H8\n6bBxaCGdShB2G9nvhwRt2J8KG4ifGvdE1vrziGthEGP5aLZSuA/4KQAReS3wO8D/SdRF7X4shLTh\naNj7oAXKxwpoIUQrOqABECqz321fospeP0ju5ZHgWzyUZ+Zbo+hs8vyF702QvWHwokD8gzwj/3ie\nF3Mk7ljaU9pcb/+RGbSUML9C4XsTnE5PcDY7RYiyqdTDzuIQXsIupnlBXIOqhYGD4ASK79SopiJk\nTDBtiqpSnLwQ955W0r2DZIa2xDuTjHZpFnhzK1YD/
xK4X1U/rar/L1HTHGOD0KmtpuMPHiM4X0TL\nIWExICwETH7uFMFYeyuFvp/ZRu8tI3gjGdyhNLmXDzH89r2MP3yc4ELp4vwzAZOfPUXPj2+i7/UX\nx/fcNMSmt+3jxhuvreuydvlL99N3NuFiojD+ydr5fSY/e5Lnyic51jNGwS1TdH1OZyd5uv8kYUJd\nDEG4ZmoH2TCFo5H47IUOB/JbecnUTlL926Jy3Ujct3lvW8LrRmTmzFGKY2cJyyXUL1OcvED+5Iuo\nrqIaUGuIZisFV0Q8VfWBnyTufNbCzxnriE7WLgonfMYeOIozlEJSQnC+1HZ5BWcoRfaq/mrB13OQ\nHo/0rh7G/uhI1fzuUIrslfXjnV6PHTfsZOfLd3Hu+HnKxTLbL9kGIUyPn2P2e/Wrl3C8zNgfHcEd\nToEXzV+QEqMDM6hc/CAqUHYCzqen2Vqq396aDVNcP7mbWadMICE9QXo+8e7GoylO3v6bHHr0w4hk\nGOhpqcnhhiUoFqIEtpqM5tAvUZ6eIt2X3KbVaEyzlcKniIrhfQ4oAHOlsy/HtqRuCFrdatou4XiZ\n4Fz7DgEgtSObWFDOSTukLumpm99rMn7OgJHdW9h52Q4cz4nm2dtcsAzGLs6f94pIUlKYKJNeYim4\nebJhit4gU5eJvfOTH+Pa2/4z4SL6dW80/GIh+fdItXHWtNGUhnf8qvpbIvK3RH0RvljR/MYh0haM\ndcyieh8sA+GUT9IuHfVDgvFS6+ODMLnKqB8SjLcezkqHXuKmIVHINBGzW0KVky+MsnP/+m5BuxQc\n14sy/mr31oognoXdFkPTMJCqfjPh2A9bmVhE9gAHgW1Ef373q+r/rBnzOuBzwIvxoc+o6gdamd/o\nDknhIkU5lZngTGaKUEKGyj3sKQyT1ga/PgK5Hxsm97JYCH4+z8zXLjTcwdOWfccLhNM+kkrVCb6z\nT06Qe2V8Xi867/RXL+DnSxx55gRHf3CcwA/Ytm8bV16/nxQeTh/VNYpCEkNHjRjws6RCl6Lj12Qu\nS8PM6NBRzl1W5PTMKGEQsmVgmB3He/BmKjKi77uH6975fp566Dc5fWSc7Ze0nuWrGlIcP09pKiol\nneodIDO8dV12I/PiWkpa17Wovd7TxkW6+VviA78a5zn0A4+LyJdU9fs1476mqm/soh1GizTSD57v\nOcd4ema+xPT5dJ6JVIGXTu5K3GHT/79tJ315X5ScBmSvHSS9v4+xPzyMdqDOTjAWaQVzqCphOaTv\nlhFS+3prztvLtz71LcadAmEYnfv4oROcOXSaG8qXMPymXXjbM6AQFgKmvnCacHLBHlLzCMLVUzt4\nvvcs014p6pkdOlw2M0KmgdN8cf8Eo2MTcW4CnLpwjtHhNNeWd+CWLzqGA4UHedi/k6vdg219PzNn\njlXF2UtTY5QLefp3XY4s0D97rSEi9O64lJmzxwjLUTKi43rkRnavSye4HHTtW1PVU8Cp+PmUiDwL\n7AJqnYKxCmiUfzDrlBlLVwupCASEnEtPsaNYfTfmDqfIXN5XLey6gmQcsi8doPDY+JLsdDelSO/t\nqVoliAhOyiG9v7fqrl9cYbpQYFxmquLzqoqvAWdKY8gnQqTXRTyHcGJx+RIZ9XhJficl8QlFyYRe\nYsVWgNnNyuj4RYcAUfe2cslnYo/PphcuOrvxg0e59Z238dSDQcthpEThlag0d3l6Yl3WA3JTafp3\nXUbol0EV8VLWR2EJLMttg4jsI2rN+a2Et18tIt8Tkb8UkZcshz1GNc0S0qbdYuLlLRRlyqsvE+Ft\nyzQUdr3dS8849bYlC8eVTqiSyYl83bZTiO13I/t1Oli0Q6gkrR7ZMNXQIQAUBoLEC1bgB0x59T0N\nRu67h+tue3/LNgSlBn0RVPFnm5QEXwc4XgonlTaHsES6vr4SkT7g08CvqGptLYPvAJeoal5Efgb4\nLHAgYY67iLfEb
t25u8sWbywWSkhrJJaKQjaofy+Y8BNLfaofEozWC8HtEkyWk+cPotLQte/kerKJ\nbTRFIbdUIXgRpGeTnZfjOGS1iT2qLWkLURG45LKhlu9gtEJXVwoikiJyCJ9Q1c/Uvq+qk6qaj5//\nBZASkS0J4+5X1ZtU9abBYduJ0QlaTUjrDdLx3W81grAtYQ++f2qWcKyE1pSl1pBFZS7XzX9ilnCy\nXNHaMSaEcKKM1jSaGdoyGO0Qqt2c0kQI7iY9p4RsNl3n18QVNp2s7/sAF1cLoR9w+kjz8Jub7U2O\npYusy9CR0Xm65hQkWsN9FHhWVf97gzHb43GIyCtiey50yyYjop2ENEH4sdfewMjuqBewOELvYA+v\nfMPLyaaTL2LjDx2ndHgaDRT1Q/wLJSYeOh5vD1060lMtbqsqeMLk509Xn/d8kcmHTnD1xHYG/Ryi\nF1c4V09tb7x7qosIwpWj2xgeHpr/Pvv6e3gJu0lNNw57jNx3D4eCOxeeX4TeHftws71EKwbBSWXo\n27HPhFejJbr5W3Iz8DbgKRF5Ij72n4C9AKr6EaL6Se8WEZ8oQe42rbsFNDpJuwXt3E0p+q4Y5BUv\neTl+2ScMQtLZNGE5ZOaGkMI3x+p+Rgshk58+iaQFXEELnSs3kD7Qi5Nz63oiqyq9N29i8uH686Zw\nuSq/nYCQULStzmrdIDUtHPjRMGF6iNAB70LrMfC51UKzMJLjpejbsQ8NA1TVnIHRFt3cffR1GvXb\nuzjmXuDebtlgVLOYCqfe1sx8yWgv5UEc9nZSDqkdOQrUO4U5oo5qnfXxc1nLtYgI7rZs0/O6OLir\n6JbDKUlbS/V35x/kXJy70AriuM3/AA0jAbuF2CAstsJpMFpO3M2hfoh/rn73UbfxTyefU1UJOyBk\nr3YOFB7kKWg7oc0wWsWcwjpAVZkpBxRjkdVzhN6Ui+vIkgva+WeL+OeKeNsy813TVBUNlNmnJ+n7\np1vJXjsAnlA+OkP+S2cJRrvXxKUYn5MU885qLuI49XcNuuasMiQt9L5+hOxLBsARSkei7y1sobzG\n+MGjHOq7kyv1AXMMRldYX+mNG5Sp0kWHAOCHymTRp+h9qSMVTif+9ATFQ3nUVzRU/NNFJj55nP43\nbCN73QCSjnoMpPb2MPS2vXVCcKfxz9XvxddSiOY7I2R3m8F/sXu+tae4QnpfD8Nv24tkWvtzfHf+\nwWg3Ul1pB8NYOrZSWOMEoeInJHOVwxLug+13SEtCSyFTXzjN1KOnwREIFHckTWpntjpz2YkE3tz1\ng8z8Q3ca83lbM6RGsvVCsyNkrx9MFL5XE96OLO5IpqpXtTiCekL2unYzvlvLXTCMdrCVwhonaLBZ\ny3PSlPde2dkKpwrMhag2pxtmFnvbk7eqdgJ3c7o+R4FI+Pa2Zrt23k7hbk5OIHPSDu621r+3kfvu\n4ZB/J2GwcO6CYbSDOYU1jtsgpd8pl9l2tnsxdv98KbF8hJbDhmJwJwgulBKF77Ac4p9tUOKhw4SO\nErazjckB3Mjm4EIpuWxIKSQ40973dvdbvsGh4B1t/YxhLISFj9Y4riN4juCHPhDH8sMQ8cvs+tZX\nunbe4HyJ8okCqV25+RCShpEAXWij9HS7+GeLlE/PRqEr7+J58bWtkteLIcgoR/dOcW50FFVlcLCf\nfZObyTbIM5CsQ98/30bmQB9IlO099Vdn6oX7Ofufqq0C0wIazq8WLIxkdAJbKawDPvGjd9D3d59F\nZmcgDBh64RA3feR3yOQXcZFpg4lPn2T2yQnCUhhV+jwyw/gfH0Vnlt43oel5Hz7B7JOTkbgcKKW5\n83YwSa4WRTm0/XzkEEIFhYnxKZ5xj+P3JK8ahm7bTeZAb1Ql1hG8HVmG3rqHyUdOMfv0JFqO7X9x\nhrE/PooW27N//OBR3p1/MA4jmehsdAZbKaxxLiak/QGv+Ju/Xt6T+0r+b86R/5t
l3gpaVvJfOkv+\nS2eX7ZSF7crMTKFORwnDkNFdJbY+V60HeLuyOMPp6lLesRCfubqf/F+fJf/XnbH/1rfu5amHVlFW\nnrGmMaewhllsQprRPqVcCPn642EQMkMRqHYK7nCapKxqSTl4I50V4uea8fDCAziua2EkY0lY+GgN\n0mqFU6NzZPNufR9gwHUd+sL6XU/BuWJiXwUthZRPdlYQrwwjGcZSMaewxlhqhrKxOLLnhMHBfpzK\nfs4Cbspj+Fh9HwT/TJHyyQJaru6wpqWQ4tPd03rCoLt6jrH+Maewhrj3mdsZvv2DuOKZQ1gBLjs8\nzO7hEVKZFK7nsm3zZq7N78QtJe8+mnj4JDOPjxPO+ISlkOIPphg7eLQjfaqTuPst3+CQfycnX7Dq\n88biMU1hjbCYCqdGZ3ECYcdzPewgrtS6kL4eKDNfOc/MV8533TaIwkh3vx0+88iynM5Yp5hTWAOY\noGy0w6FYdN65f/NKm2KsQSx8tIpZK4Ky9Lo4Q8vf79ioZ/zg0YowUnfqTxnrm26249wjIn8vIt8X\nkWdE5JcTxoiIfFhEnheRJ0Xkxm7Zs9ZYC4Ky0+cyePtuNr/rUjb9wiVs+neXNmyCYywf4wePcutb\n9660GcYapZsrBR/4VVW9BngV8Isick3NmJ8GDsSPu4A/6KI9a4a1IigP3raH1M4c4jlIysHtTzH4\ncztt1bAKOFB4EFRNdDbapmtOQVVPqep34udTwLPArpphbwYOasQ3gSER2dEtm9YCa0VQ9nZlcfo8\nxK3ZeSNC7mWDK2OUMc/4waO85c1Xcih4h1VRNdpiWYRmEdkHvAz4Vs1bu4BjFa+Px8dO1fz8XUQr\nCbbu3N0tM1ectSQoO30eiRm7nuAO2kphNTB+8Cj0hVYXyWiLrgvNItIHfBr4FVVdVNaOqt6vqjep\n6k2Dw5s6a+AqYK0IypX4p2frVwlEGbulIzMrYJGRxN1v+UYcRjLR2WiNrjoFEUkROYRPqOpnEoac\nAPZUvN4dH9swzOkHq1lQTiKc8Jl9ZpKwIhFL/ZBwJmC2ixm7RnvMhZGSVnWGkUQ3dx8J8FHgWVX9\n7w2GPQK8Pd6F9CpgQlVPNRi77qjUD9aSQ5gj/1dnmf7bs/hniwTjJQqPjzP28SNQtgvQqkOx1YLR\nEuPR25sAAAqBSURBVN3UFG4G3gY8JSJPxMf+E7AXQFU/AvwF8DPA88AMsGEqeq0l/aAZs09OMvuk\nrQxWM+MHj/KWt1/JZz53iJMvjLJz//oLwRqdo2tOQVW/DomdByvHKPCL3bJhNbIW8g+M9cf4waNc\n987389RDH1hpU4xVjmU0LyPmEIwVx3IXjAUwp7BMrJWENGP9MnLfPZa7YCyIOYVlYK0kpBkbBLXc\nBaMxViW1y6wXQdlYH8yX1/6cmuhsJGIrhS6xFhPSjI3B+MGjXHfb+1faDGOVYk6hC5igbKwJLNPZ\nSMCcQocxQdlYC8yJzqia6GxUYU6hg5igbKwlxg8e5VBwp4nORhUmNHcIE5SNtcitb93LUw+a6Gxc\nxFYKS8QEZWMtMx9GsoJ5Row5hSVggrKxbrCCeUaMOYVFYoKysV6Y36Jqu5EMzCksir47bjZB2VhX\njNx3j+UuGIA5hbao1Q/MIRjrDlstbHjMKbSI6QfGeufiasFyFzYy5hRawPQDY6NwoPAgh/wN0+vK\nSKCb7Tj/SETOisjTDd5/nYhMiMgT8eN93bJlKVhCmrGRGD94lFvfupfQDyyMtEHp5krhY8AbFhjz\nNVW9IX6supZQc4Ky6QfGRsJE541N15yCqn4VWJO3GpaQZhhYXaQNykprCq8Wke+JyF+KyEsaDRKR\nu0TkMRF5bGKsu37GBGXDuLhaCP3AHMMGYyWdwneAS1T1euD3gM82Gqiq96vqTap60+Bw9+qzmKBs\nGBcZue8eDgUmOm80VswpqOqkqubj538BpER
ky0rZY4KyYdQzJzrbamHjsGJOQUS2i4jEz18R23Jh\nJWwxQdkwkpkPIwXBSptiLBPd3JL6KeAfgCtF5LiIvENE3iUi74qH3Ao8LSLfAz4M3Kaqy1qq0QRl\nw1iYA4UHORS8w1YLG4Su9VNQ1X+1wPv3Avd26/wLYYKyYbTG+MGj0BfOh5G2XzK00iYZXWSldx+t\nCCYoG0Z7vDv/YBxGsi5t650N5xRMUDaMpWC5C+udDdWO01pmGsbiGbnvHg713cmVPGBhpHXMhlgp\nmKBsGJ3h3XkrmLfeWfdOYU4/MEHZMDpHGFjuwnplXTuFSv3AHIJhdIa51YLlLqxP1q1TsIQ0w+ge\nd7/lGwC2WliHrDunYPqBYSwPh/w7rQTGOmRdOQVLSDOM5WH84FHLXVinrBunYAlphrFSWO7CemJd\nOAVLSDOMlWHkvnvmRWdzDOuDNe8UTFA2jJXl7rd8g0PBO1baDKNDrFmnYIKyYawiNLTVwjphTToF\nE5QNY/UwJzpHYSQTndc6a84pnC28aIKyYaxCbn3rXmBZW6IYXWDNOYXtZ0omKBvGKuRAIVotnHzh\ngoWR1jDd7Lz2RyJyVkSebvC+iMiHReR5EXlSRG5sZd6MmzGHYBirkMowkrF26eZK4WPAG5q8/9PA\ngfhxF/AHXbTFMIxlxOoirV265hRU9avAaJMhbwYOasQ3gSER2dEtewzDWB7ufss35sNIxtpjJZvs\n7AKOVbw+Hh87VTtQRO4iWk0A5Lf/1V8e6r55S2YLcH6ljVhG7POub1r/vH8F/P/t3X/InWUdx/H3\nhz0T2/wJi9A2nYgpq2BuYy5zalpCNcxQWUGk4B9CIVhoGFIKkpH5R1GU6VoJ1SzHJiKik/KxH4a6\nuTnd/FHmtCmlEjzlj2bJpz/u6zmc1bPpY+c+l7vP5wVjz7nOfd/7nB2e8z33dc79vfhBm1mGoYvP\n75FvZqN9YuU129cD19fOMR2SNtpeUjvHsOTxdlse7+io+e2jZ4F5fbfnlrGIiKikZlG4Ffhs+RbS\nMmDC9v9MHUVExPC0Nn0kaQ1wKjBH0k7gCmAmgO3rgNuBjwF/BF4BuvY9tn1qumsA8ni7LY93RMjO\nFYgREdHY565ojoiI9qQoRERET4pCCyTNkLRZ0m21swyDpB2SHpa0RdLG2nnaJukQSWslPSbpUUkf\nqJ2pLZKOLc/r5J+/S7q4dq42SfqCpG2SHpG0RtL+tTMNUz5TaIGkLwJLgINsr6idp22SdgBLbHft\nYp8pSboR+I3tVZL2A2bZ7nwHOEkzaL42foLtp2vnaYOkdwO/BRbYflXSL4Dbbf+4brLhyZnCgEma\nC3wcWFU7SwyepIOBk4EfAth+bRQKQnE68GRXC0KfMeAdksaAWcBzlfMMVYrC4H0L+BIwSquNGNgg\naVNpSdJlRwEvAD8qU4SrJM2uHWpIPgWsqR2iTbafBa4FnqFpuTNhe0PdVMOVojBAklYAz9veVDvL\nkJ1kexFN59vPSzq5dqAWjQGLgO/bPh54GbisbqT2lWmyM4Gba2dpk6RDaZp1HgUcDsyW9Jm6qYYr\nRWGwPgicWebYbwJOk/STupHaV95dYft5YD2wtG6iVu0Edtq+r9xeS1Mkuu6jwIO2/1o7SMs+DDxl\n+wXb/wLWASdWzjRUKQoDZPvLtufank9zqv0r251+lyFptqQDJ38GzgCmXFipC2z/BfizpGPL0OnA\n9oqRhuXTdHzqqHgGWCZpliTRPL+PVs40VPtEl9R4W3sXsL75/WEM+JntO+pGat1FwE/LlMqf6F6L\nlt2UYv8R4MLaWdpm+z5Ja4EHgX8Dmxmxlhf5SmpERPRk+igiInpSFCIioidFISIielIUIiKiJ0Uh\nIiJ6UhSicyRdXrpcbi2dPU8Y8PFPnaoD7p7GB/DvnSVpQd/tcUkjuah8tC/XKUSnlDbWK4BFtndJ\nmgPsVzn
W/+ss4DZG4yK5qCxnCtE1hwEv2t4FYPtF288BSFos6Z7SuO9OSYeV8XFJ3y5nFY9IWlrG\nl0r6fWl8d2/fVcxvqFzpvVrS/WX/T5Tx8yWtk3SHpD9IuqZvnwskPVH2uUHSdyWdSNNz6Jsl39Fl\n83PLdk9IWj6I/7gISFGI7tkAzCsvlt+TdAqApJnAd4BzbC8GVgNf69tvlu2FwOfKfQCPActL47uv\nAldPI8flNG1OlgIfonlRn+ymuhBYCbwfWClpnqTDga8Ay2h6aB0HYPte4FbgUtsLbT9ZjjFWjn0x\ncMU0ckXsVaaPolNsvyRpMbCc5sX455IuAzYC7wPuKi05ZtC0Rp60puz/a0kHSToEOBC4UdIxNO3B\nZ04jyhk0zREvKbf3B44oP//S9gSApO3AkcAc4B7bfyvjNwPv2cvx15W/NwHzp5ErYq9SFKJzbL8O\njAPjkh4GzqN58dxme09LZ/53vxcDVwF32/6kpPnlmG+WgLNtP77bYPOh966+odd5a7+Hk8d4q/tH\nTCnTR9EpZU3hY/qGFgJPA48D75xcT1nSTEnv7dtuZRk/iWZhlQngYJrlJwHOn2aUO4GLSqdNJB3/\nBts/AJwi6dCy4tfZfff9g+asJaJ1KQrRNQfQTPlsl7QVWABcafs14BzgG5IeArawe5/8f0raDFwH\nXFDGrgG+Xsan+278Kprppq2StpXbe1TWpLgauB/4HbADmCh33wRcWj6wPnrqI0QMRrqkxsiTNA5c\nYntj5RwHlM9ExmgWK1pte33NTDF6cqYQ8fZxpaQtNIsUPQXcUjlPjKCcKURERE/OFCIioidFISIi\nelIUIiKiJ0UhIiJ6UhQiIqLnP2dWwukYY/Y2AAAAAElFTkSuQmCC\n", 232 | "text/plain": [ 233 | "" 234 | ] 235 | }, 236 | "metadata": {}, 237 | "output_type": "display_data" 238 | } 239 | ], 240 | "source": [ 241 | "plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)\n", 242 | "plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)\n", 243 | "plt.xlabel('Sepal length')\n", 244 | "plt.ylabel('Sepal width')\n", 245 | "plt.xlim(xx.min(), xx.max())\n", 246 | "plt.title('SVC with linear kernel')\n", 247 | "plt.show()" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "### Kernels\n", 255 | "\n", 256 | "Classes aren't always easily separable. In these cases, we build a decision function that's polynomial instead of linear. This is done using the kernel trick.\n", 257 | "\n", 258 | "![alt text](https://github.com/lesley2958/regression/blob/master/svm.png?raw=true \"Logo Title Text 1\")\n", 259 | "\n", 260 | "If you modify the previous code with the kernel parameter, you'll get different results!" 
261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 6, 266 | "metadata": { 267 | "collapsed": true 268 | }, 269 | "outputs": [], 270 | "source": [ 271 | "svc = svm.SVC(kernel='rbf', C=1, gamma=1).fit(X, y)" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "A linear kernel is usually the best choice when you have a large number of features (>1000), because data is more likely to be linearly separable in a high-dimensional space. You can also use an RBF kernel, but remember to cross-validate its parameters to avoid overfitting." 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": {}, 284 | "source": [ 285 | "### Tuning\n", 286 | "\n", 287 | "Tuning the parameter values of a machine learning algorithm can substantially improve model performance. Let’s look at the parameters available for SVMs.\n", 288 | "\n", 289 | "#### Gamma \n", 290 | "\n", 291 | "Notice that we set gamma to 1 earlier. The higher the value of gamma, the more closely the model tries to fit the training data exactly, which can lead to an overfit model. Gamma only affects non-linear kernels such as `'rbf'`; the linear kernel ignores it.\n", 292 | "\n", 293 | "Let's refit the model with gamma values of 10 and 100: " 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": 7, 299 | "metadata": { 300 | "collapsed": true 301 | }, 302 | "outputs": [], 303 | "source": [ 304 | "svc = svm.SVC(kernel='rbf', C=1, gamma=10).fit(X, y)\n", 305 | "svc = svm.SVC(kernel='rbf', C=1, gamma=100).fit(X, y)" 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "metadata": {}, 311 | "source": [ 312 | "### Evaluation\n", 313 | "\n", 314 | "As with any other machine learning model, there are pros and cons.
 In this section, we'll review the pros and cons of this particular model.\n", 315 | "\n", 316 | "#### Pros\n", 317 | "\n", 318 | "The largest strength of SVMs is their effectiveness in high-dimensional spaces, even when the number of dimensions is greater than the number of samples. They also use only a subset of the training points in the decision function (the support vectors), so they are memory efficient.\n", 319 | "\n", 320 | "#### Cons\n", 321 | "\n", 322 | "On large datasets, the required training time becomes high. SVMs also perform poorly when the dataset is noisy. Lastly, SVMs don’t directly provide probability estimates; these must be calculated using an expensive five-fold cross-validation." 323 | ] 324 | }, 325 | { 326 | "cell_type": "markdown", 327 | "metadata": {}, 328 | "source": [ 329 | "## Linear SVM\n", 330 | "\n", 331 | "Many SVM implementations are based on the Sequential Minimal Optimization (SMO) algorithm, which works as follows:\n", 332 | "\n", 333 | "1. Iteratively select a pair of Lagrange multipliers ($\\alpha_1, \\alpha_2$).\n", 334 | "2. Perform a 1D optimization of the objective function on this pair, keeping all other multipliers fixed.\n", 335 | "\n", 336 | "### Challenge\n", 337 | "\n", 338 | "Implement a function `linear_kernel()` that computes the kernel matrix $K = \\Phi \\Phi^T$.\n", 339 | "\n", 340 | "Get a solution for the dual problem with `[alphas, beta0] = SMO(K, ... y, C)`, where C = 0.1.\n", 341 | "\n", 342 | "### Kernel Matrix\n", 343 | "\n", 344 | "With $K_{i,j} = \\langle \\Phi_i, \\Phi_j \\rangle$, the kernel matrix is obtained with the formula $K = \\Phi \\Phi^T$. However, you can instead replace the scalar product with a different function, such as the Radial Basis Function (RBF):\n", 345 | "\n", 346 | "$\\mathrm{RBF}(\\Phi_i, \\Phi_j) = \\exp(-\\gamma \\|\\Phi_i - \\Phi_j\\|^2)$\n", 347 | "\n", 348 | "#### Challenge \n", 349 | "\n", 350 | "Implement a function `rbf_kernel()` and compute K with it. Set $\\gamma = 1$ and C = 1.
\n", 351 | "\n", 352 | "How do $\\gamma$ and C influence the results?\n", 353 | "\n", 354 | "How can you automatically find the parameters $\\gamma$ and C? Implement it." 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": null, 360 | "metadata": { 361 | "collapsed": true 362 | }, 363 | "outputs": [], 364 | "source": [] 365 | } 366 | ], 367 | "metadata": { 368 | "kernelspec": { 369 | "display_name": "Python 3", 370 | "language": "python", 371 | "name": "python3" 372 | }, 373 | "language_info": { 374 | "codemirror_mode": { 375 | "name": "ipython", 376 | "version": 3 377 | }, 378 | "file_extension": ".py", 379 | "mimetype": "text/x-python", 380 | "name": "python", 381 | "nbconvert_exporter": "python", 382 | "pygments_lexer": "ipython3", 383 | "version": "3.6.1" 384 | } 385 | }, 386 | "nbformat": 4, 387 | "nbformat_minor": 2 388 | } 389 | -------------------------------------------------------------------------------- /ml_optimization.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## 2.0 Ensemble Learning\n", 8 | "\n", 9 | "Ensemble learning combines the predictions of multiple learning algorithms, which together make up what we call the *ensemble*. By employing this method, we can often get better predictive performance than we would with a single learner.\n", 10 | "\n", 11 | "While this can improve performance, it comes at the cost of increased computation time and reduced interpretability. \n", 12 | "\n", 13 | "\n", 14 | "## 3.0 Bagging\n", 15 | "\n", 16 | "Bagging is a technique where we reuse the same training algorithm several times on different subsets of the training data.
\n", 17 | "\n", 18 | "### 3.1 Algorithm\n", 19 | "\n", 20 | "Given a training dataset D of size N, bagging generates new training sets $D_i$ of size M by sampling from D with replacement. Because of the replacement, some observations may appear more than once in each $D_i$. \n", 21 | "\n", 22 | "If we set M = N, then for large N each $D_i$ contains on average about 63.2% ($1 - 1/e$) of the unique observations in D; the rest are duplicates.\n", 23 | "\n", 24 | "The final step is to train a classifier $C_i$ on each $D_i$ separately and combine their predictions, for example by voting or averaging. " 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## 4.0 Boosting\n", 32 | "\n", 33 | "Boosting is an optimization technique that allows us to combine multiple classifiers to improve classification accuracy. In boosting, each individual classifier only needs to be slightly better than chance. \n", 34 | "\n", 35 | "Boosting involves training classifiers on a subset of the training data that is most informative given the current classifiers. \n", 36 | "\n", 37 | "### 4.1 Algorithm\n", 38 | "\n", 39 | "The general boosting algorithm first fits a simple model to a subsample of the data. Next, we identify the misclassified observations (the ones that are hard to predict) and focus subsequent learners on getting these samples right. Lastly, we combine these weak learners to form a more complex but more accurate predictor.\n", 40 | "\n", 41 | "### 4.2 Boosting in R\n", 42 | "\n", 43 | "First, we load the required packages: " 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "## 5.0 AdaBoost\n", 51 | "\n", 52 | "Now, instead of resampling, we can reweight misclassified training examples.\n", 53 | "\n", 54 | "### 5.1 Benefits\n", 55 | "\n", 56 | "Aside from being easy to implement, AdaBoost is useful because it is a simple combination of multiple classifiers, and these classifiers can be of different types.
\n", 57 | "\n", 58 | "### 5.2 Limits\n", 59 | "\n", 60 | "On the other hand, AdaBoost is sensitive to noisy data and outliers: because it keeps increasing the weights of misclassified training points, mislabeled examples can come to dominate training. \n", 61 | "\n", 62 | "### 5.3 AdaBoost in R\n", 63 | "\n", 64 | "We begin by loading the required package" 65 | ] 66 | } 67 | ], 68 | "metadata": { 69 | "kernelspec": { 70 | "display_name": "Python 3", 71 | "language": "python", 72 | "name": "python3" 73 | }, 74 | "language_info": { 75 | "codemirror_mode": { 76 | "name": "ipython", 77 | "version": 3 78 | }, 79 | "file_extension": ".py", 80 | "mimetype": "text/x-python", 81 | "name": "python", 82 | "nbconvert_exporter": "python", 83 | "pygments_lexer": "ipython3", 84 | "version": "3.6.1" 85 | } 86 | }, 87 | "nbformat": 4, 88 | "nbformat_minor": 2 89 | } 90 | -------------------------------------------------------------------------------- /tree_modeling.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Background\n", 8 | "\n", 9 | "Recall from data structures the different types of tree structures: binary, red-black, and splay trees. In tree-based modeling, we build on these structures to make classification predictions. \n", 10 | "\n", 11 | "Tree-based machine learning is popular because it is often accurate and stable, as well as easy to interpret. Unlike linear models, tree-based models map non-linear relationships well. The general structure is as follows: " 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Decision Trees\n", 19 | "\n", 20 | "Decision trees are a type of supervised learning algorithm, used mainly for classification, that works with both categorical and continuous input and output variables. 
In this type of model, internal nodes represent tests on attributes, and the end nodes (leaves) of each branch represent class labels.
 Between these nodes are what we call edges, which represent a 'decision' that separates the data from the previous node based on some criterion. \n", 21 | "\n", 22 | "![](https://imgur.com/sLOjIzF.png)\n", 23 | "\n", 24 | "Looks familiar, right? " 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "### Nodes\n", 32 | "\n", 33 | "As mentioned above, nodes are an important part of the structure of Decision Trees. In this section, we'll review different types of nodes.\n", 34 | "\n", 35 | "#### Root Node\n", 36 | "\n", 37 | "The root node is the node at the very top. It represents an entire population or sample because it has yet to be divided by any edges. \n", 38 | "\n", 39 | "#### Decision Node\n", 40 | "\n", 41 | "Decision Nodes are the nodes that occur between the root node and the leaves of your decision tree. It's considered a decision node because it results from an edge and then splits again, into either more decision nodes or leaves.\n", 42 | "\n", 43 | "#### Leaves/Terminal Nodes\n", 44 | "\n", 45 | "As mentioned before, leaves are the final nodes at the bottom of the decision tree that represent a class label in classification. They're also called terminal nodes because no further nodes split off of them. \n", 46 | "\n", 47 | "#### Parent and Child Nodes\n", 48 | "\n", 49 | "A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are its child nodes.\n", 50 | "\n" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "### Pros & Cons\n", 58 | "\n", 59 | "#### Pros\n", 60 | "\n", 61 | "1. Easy to Understand: Decision tree output is easy to understand, since reading and interpreting it doesn't require any statistical knowledge. Its graphical representation is very intuitive, and users can easily relate it to their own hypotheses.\n", 62 | "\n", 63 | "2. 
Useful in Data exploration: A decision tree is one of the fastest ways to identify the most significant variables and the relationships between two or more variables. With the help of decision trees, we can also create new variables/features that have better power to predict the target variable. For example, if we are working on a problem where information is available in hundreds of variables, a decision tree can help identify the most significant ones.\n", 64 | "\n", 65 | "3. Less data cleaning required: It requires less data cleaning compared to some other modeling techniques. To a fair degree, it is not influenced by outliers and missing values.\n", 66 | "\n", 67 | "4. Data type is not a constraint: It can handle both numerical and categorical variables.\n", 68 | "\n", 69 | "5. Non Parametric Method: A decision tree is considered to be a non-parametric method. This means that decision trees make no assumptions about the space distribution or the classifier structure.\n", 70 | "\n", 71 | "#### Cons\n", 72 | "\n", 73 | "1. Overfitting: Overfitting is one of the most common practical difficulties for decision tree models. It can be mitigated by setting constraints on model parameters and by pruning (discussed in detail below).\n", 74 | "\n", 75 | "2. Not fit for continuous variables: When working with continuous numerical variables, a decision tree loses information when it splits them into discrete categories." 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "## Pruning Decision Trees\n", 83 | "\n", 84 | "Decision Tree pruning is a technique that reduces the size of decision trees by removing sections (nodes) of the tree that provide little power to classify instances.
 This is useful because it reduces the complexity of the final classifier, which can increase predictive accuracy by reducing overfitting. \n", 85 | "\n", 86 | "Ultimately, our aim is to minimize the cross-validation error, so we choose the complexity parameter that gives the smallest cross-validation error." 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "## Random Forests\n", 94 | "\n", 95 | "Recall the *ensemble* learning method from previous lectures. Random Forests are an ensemble learning method that can be used for both classification and regression. It works by combining individual decision trees through **bagging**, which helps prevent the model from overfitting. \n", 96 | "\n", 97 | "What's remarkable about random forests is that they are reliable for a wide range of prediction tasks without much tuning, unlike SVMs, which often require a great deal of tuning.\n", 98 | "\n", 99 | "### Algorithm\n", 100 | "\n", 101 | "First, we create many decision trees through bagging. Each tree is allowed to grow to its maximum size and is left unpruned. \n", 102 | "\n", 103 | "To inject randomness, each split is based on a randomly selected subset of attributes, which reduces the correlation between different trees. \n", 104 | "\n", 105 | "For classification, the forest then predicts by majority vote over the trees' categories. For regression, we begin by drawing K bootstrap samples from the training data with replacement. \n", 106 | "\n", 107 | "Next, we fit an individual tree $t_i$ to each sample and have every regression tree predict a value for the unseen data. Lastly, we combine those predictions with the formula:\n", 108 | "\n", 109 | "![alt text](https://imgur.com/dN32Fvy.png \"Logo Title Text 1\")\n", 110 | "\n", 111 | "where $\\hat{y}$ is the response vector and $x = [x_1, \\dots, x_N]^T \\in X$ are the input parameters.
\n", 112 | "\n", 113 | "\n", 114 | "### Advantages\n", 115 | "\n", 116 | "Random Forests let us learn non-linear relationships with a simple algorithm and good performance. Training is also fast, and the method is resistant to overfitting.\n", 117 | "\n", 118 | "Another useful property of Random Forests is that increasing the number of trees decreases the variance without increasing the bias, so the bias-variance tradeoff is less of a concern when choosing how many trees to grow. \n", 119 | "\n", 120 | "Averaging also lets the real structure of the data reveal itself, as the noisy signals of individual trees cancel out. \n", 121 | "\n", 122 | "### Limitations \n", 123 | "\n", 124 | "Unfortunately, random forests have high memory consumption because many trees must be built and stored. There's also often little performance gain from larger training datasets. " 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "metadata": {}, 131 | "outputs": [], 132 | "source": [] 133 | } 134 | ], 135 | "metadata": { 136 | "kernelspec": { 137 | "display_name": "Python 3", 138 | "language": "python", 139 | "name": "python3" 140 | }, 141 | "language_info": { 142 | "codemirror_mode": { 143 | "name": "ipython", 144 | "version": 3 145 | }, 146 | "file_extension": ".py", 147 | "mimetype": "text/x-python", 148 | "name": "python", 149 | "nbconvert_exporter": "python", 150 | "pygments_lexer": "ipython3", 151 | "version": "3.6.5" 152 | } 153 | }, 154 | "nbformat": 4, 155 | "nbformat_minor": 2 156 | } 157 | --------------------------------------------------------------------------------
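The kernel-matrix challenge in `ml_classification.ipynb` asks for `linear_kernel()` and `rbf_kernel()`. Below is a minimal NumPy sketch of both. It assumes `Phi` is an `(n_samples, n_features)` array whose rows are the feature vectors $\Phi_i$. The small `Phi` matrix is a hypothetical toy input, and the `SMO` solver from the challenge is not included.

```python
import numpy as np


def linear_kernel(Phi):
    """Kernel matrix K = Phi Phi^T, i.e. K[i, j] = <Phi_i, Phi_j>."""
    return Phi @ Phi.T


def rbf_kernel(Phi, gamma=1.0):
    """RBF kernel matrix: K[i, j] = exp(-gamma * ||Phi_i - Phi_j||^2)."""
    sq_norms = np.sum(Phi ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 <a, b>, computed for all pairs at once
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * Phi @ Phi.T
    # Clip tiny negative values caused by floating-point round-off
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-gamma * sq_dists)


# Hypothetical toy feature matrix: three samples, two features
Phi = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_lin = linear_kernel(Phi)
K_rbf = rbf_kernel(Phi, gamma=1.0)
print(K_lin)
print(np.round(K_rbf, 4))
```

Instead of a hand-written SMO solver, either matrix could also be passed to scikit-learn via `svm.SVC(kernel='precomputed', C=1).fit(K, y)`, which expects an `(n_samples, n_samples)` Gram matrix.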
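The last question in that challenge asks how to find $\gamma$ and C automatically. A common answer is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` and assumes, as the plotting cell suggests, that `X` holds the first two iris features (sepal length and width). The grid values are illustrative, not tuned.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

# Assumption: the notebook's X/y are the first two iris features and the class labels
iris = datasets.load_iris()
X = iris.data[:, :2]  # sepal length, sepal width
y = iris.target

# Illustrative, log-spaced grid of candidate values
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.1, 1, 10, 100]}

# Evaluate every (C, gamma) pair of an RBF-kernel SVC with 5-fold cross-validation
search = GridSearchCV(svm.SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"mean cross-validated accuracy: {search.best_score_:.3f}")
```

Comparing `search.cv_results_["mean_test_score"]` across gamma values is one way to observe the overfitting effect described in the Gamma section.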
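The 63.2% figure in the bagging section of `ml_optimization.ipynb` follows from the chance that a given observation is never drawn in N draws, $(1 - 1/N)^N$. The expected unique fraction is therefore $1 - (1 - 1/N)^N \to 1 - 1/e \approx 0.632$. A quick simulation can check this; `N` and the number of trials below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
N = 10_000     # size of the original dataset D (illustrative)
n_trials = 20  # number of bootstrap samples D_i to draw

fractions = []
for _ in range(n_trials):
    # One bootstrap sample D_i: M = N row indices drawn with replacement
    idx = rng.integers(0, N, size=N)
    # Fraction of the original observations that appear at least once in D_i
    fractions.append(np.unique(idx).size / N)

mean_fraction = float(np.mean(fractions))
print(f"average unique fraction: {mean_fraction:.3f}")
print(f"limit 1 - 1/e:           {1 - np.exp(-1):.3f}")
```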
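The random forest algorithm in `tree_modeling.ipynb` has four steps: draw bootstrap samples, grow unpruned trees, use a random subset of attributes at each split, and combine the trees' predictions. The sketch below implements it for regression with scikit-learn's `DecisionTreeRegressor`. It assumes the formula in the image is the usual average $\hat{y} = \frac{1}{K}\sum_{i=1}^{K} t_i(x)$. `fit_forest` and `predict_forest` are hypothetical helper names, and the toy data is made up. In practice, `sklearn.ensemble.RandomForestRegressor` packages the same idea.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_forest(X, y, n_trees=50, max_features="sqrt", seed=0):
    """Fit K = n_trees unpruned regression trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: n row indices drawn with replacement
        idx = rng.integers(0, n, size=n)
        # No max_depth, so each tree grows to full size (unpruned);
        # max_features draws a random subset of attributes at every split
        tree = DecisionTreeRegressor(
            max_features=max_features,
            random_state=int(rng.integers(0, 2**31 - 1)),
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_forest(trees, X):
    """Average the trees' predictions: y_hat(x) = (1/K) * sum_i t_i(x)."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)


# Hypothetical toy data: only the first of four features carries signal
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 4))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

trees = fit_forest(X, y, n_trees=50)
y_hat = predict_forest(trees, X)
mse = float(np.mean((y_hat - y) ** 2))
print(f"training MSE: {mse:.4f}")
```

Each tree sees different rows and different candidate attributes at each split, so the trees are less correlated. That lower correlation is what makes the averaging step effective.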