explained well by Brandon Rohrer [here](https://channel9.msdn.com/blogs/Cloud-and-Enterprise-Premium/Data-Science-for-Rest-of-Us)\n",
196 | "\n",
197 | "As far as algorithms for learning a model (i.e. running some training data through an algorithm), it's nice to think of them in two different ways (with the help of the [machine learning wikipedia article](https://en.wikipedia.org/wiki/Machine_learning)).\n",
198 | "\n",
199 | "The first way of thinking about ML is by the type of information or **input** given to a system. Given that criterion, there are three classical categories:\n",
200 | "1. Supervised learning - we get the data and the labels\n",
201 | "2. Unsupervised learning - we only get the data (no labels)\n",
202 | "3. Reinforcement learning - reward/penalty based information (feedback)\n",
203 | "\n",
204 | "Another way of categorizing ML approaches is to think of the desired **output**:\n",
205 | "1. Classification\n",
206 | "2. Regression\n",
207 | "3. Clustering\n",
208 | "4. Density estimation\n",
209 | "5. Dimensionality reduction\n",
210 | "\n",
211 | "--> This second approach (by desired output) is how `sklearn` categorizes its ML algorithms.
(photo from http://www.madlantern.com/photography/wild-iris)
\n",
56 | "\n",
57 | "### Labels (species names/classes):\n",
58 | "What you might have to do before using a learner in `sklearn`:
\n",
363 | "1. Non-numerics transformed to numeric (tip: use the applymap() method from `pandas`)\n",
364 | "* Fill in missing values\n",
365 | "* Standardization\n",
366 | "* Normalization\n",
367 | "* Encoding categorical features (e.g. one-hot encoding or dummy variables)\n",
368 | "\n",
369 | "Features should end up in a numpy.ndarray (hence numeric) and labels in a list.\n",
370 | "\n",
371 | "Data options:\n",
372 | "* Use pre-processed [datasets](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets) from scikit-learn\n",
373 | "* [Create your own](http://scikit-learn.org/stable/datasets/index.html#sample-generators)\n",
374 | "* Read from a file\n",
375 | "\n",
376 | "If you use your own data or \"real-world\" data you will likely have to do some data wrangling and need to leverage `pandas` for some data manipulation."
377 | ]
378 | },
379 | {
380 | "cell_type": "markdown",
381 | "metadata": {},
382 | "source": [
383 | "#### Standardization - make our data look like a standard Gaussian distribution (commonly needed for `sklearn` learners)"
384 | ]
385 | },
386 | {
387 | "cell_type": "markdown",
388 | "metadata": {},
389 | "source": [
390 | "> FYI: you'll commonly see the data or feature set (ML word for data without its labels) represented as a capital X and the targets or labels (if we have them) represented as a lowercase y. This is because the data is a 2D array or list of lists and the targets are a 1D array or simple list."
391 | ]
392 | },
393 | {
394 | "cell_type": "code",
395 | "execution_count": null,
396 | "metadata": {
397 | "collapsed": false
398 | },
399 | "outputs": [],
400 | "source": [
401 | "# Standardization aka scaling\n",
402 | "from sklearn import preprocessing, datasets\n",
403 | "\n",
404 | "# make sure we have iris loaded\n",
405 | "iris = datasets.load_iris()\n",
406 | "\n",
407 | "X, y = iris.data, iris.target\n",
408 | "\n",
409 | "# scale it to a gaussian distribution\n",
410 | "X_scaled = preprocessing.scale(X)\n",
411 | "\n",
412 | "# how does it look now\n",
413 | "pd.DataFrame(X_scaled).head()"
414 | ]
415 | },
416 | {
417 | "cell_type": "code",
418 | "execution_count": null,
419 | "metadata": {
420 | "collapsed": false
421 | },
422 | "outputs": [],
423 | "source": [
424 | "# let's just confirm our standardization worked (mean is 0 w/ unit variance)\n",
425 | "pd.DataFrame(X_scaled).describe()\n",
426 | "\n",
427 | "# also could:\n",
428 | "#print(X_scaled.mean(axis = 0))\n",
429 | "#print(X_scaled.std(axis = 0))"
430 | ]
431 | },
432 | {
433 | "cell_type": "markdown",
434 | "metadata": {},
435 | "source": [
436 | "> PRO TIP: To save our standardization and reapply later (say to the test set or some new data), create a transformer object like so:\n",
437 | "```python\n",
438 | "scaler = preprocessing.StandardScaler().fit(X_train)\n",
439 | "# apply to a new dataset (e.g. test set):\n",
440 | "scaler.transform(X_test)\n",
441 | "```"
442 | ]
443 | },
444 | {
445 | "cell_type": "markdown",
446 | "metadata": {},
447 | "source": [
448 | "#### Normalization - scaling samples individually to have unit norm\n",
449 | "* This type of scaling is really important if doing some downstream transformations and learning (see sklearn docs [here](http://scikit-learn.org/stable/modules/preprocessing.html#normalization) for more) where similarity of pairs of samples is examined\n",
450 | "* A basic intro to normalization and the unit vector can be found [here](http://freetext.org/Introduction_to_Linear_Algebra/Basic_Vector_Operations/Normalization/)"
451 | ]
452 | },
453 | {
454 | "cell_type": "code",
455 | "execution_count": null,
456 | "metadata": {
457 | "collapsed": false
458 | },
459 | "outputs": [],
460 | "source": [
461 | "# Normalization - scaling samples to unit norm\n",
462 | "from sklearn import preprocessing, datasets\n",
463 | "\n",
464 | "# make sure we have iris loaded\n",
465 | "iris = datasets.load_iris()\n",
466 | "\n",
467 | "X, y = iris.data, iris.target\n",
468 | "\n",
469 | "# normalize each sample (row) to unit L1 norm\n",
470 | "X_norm = preprocessing.normalize(X, norm='l1')\n",
471 | "\n",
472 | "# how does it look now\n",
473 | "pd.DataFrame(X_norm).tail()"
474 | ]
475 | },
476 | {
477 | "cell_type": "code",
478 | "execution_count": null,
479 | "metadata": {
480 | "collapsed": false
481 | },
482 | "outputs": [],
483 | "source": [
484 | "# let's just confirm our normalization worked (each row should now have unit L1 norm)\n",
485 | "pd.DataFrame(X_norm).describe()\n",
486 | "\n",
487 | "# cumulative sum of normalized and original data:\n",
488 | "#print(pd.DataFrame(X_norm.cumsum().reshape(X.shape)).tail())\n",
489 | "#print(pd.DataFrame(X).cumsum().tail())\n",
490 | "\n",
491 | "# unit norm (convert to unit vectors) - all row sums should be 1 now\n",
492 | "X_norm.sum(axis = 1)"
493 | ]
494 | },
495 | {
496 | "cell_type": "markdown",
497 | "metadata": {},
498 | "source": [
499 | "> PRO TIP: To save our normalization (like standardization above) and reapply later (say to the test set or some new data), create a transformer object like so:\n",
500 | "```python\n",
501 | "normalizer = preprocessing.Normalizer().fit(X_train)\n",
502 | "# apply to a new dataset (e.g. test set):\n",
503 | "normalizer.transform(X_test)\n",
504 | "```"
505 | ]
506 | },
507 | {
508 | "cell_type": "markdown",
509 | "metadata": {},
510 | "source": [
511 | "Created by a Microsoft Employee.\n",
512 | "\t\n",
513 | "The MIT License (MIT)

"We can be explicit and use the `train_test_split` method in scikit-learn ( [train_test_split](http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html) ) as in (and as shown above for `iris` data):
\n",
478 | "\n",
479 | "```python\n",
480 | "# Create some data by hand and place 70% into a training set and the rest into a test set\n",
481 | "# Here we are using labeled features (X - feature data, y - labels) in our made-up data\n",
482 | "import numpy as np\n",
483 | "from sklearn import linear_model\n",
484 | "from sklearn.cross_validation import train_test_split\n",
485 | "X, y = np.arange(10).reshape((5, 2)), range(5)\n",
486 | "X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.70)\n",
487 | "clf = linear_model.LinearRegression()\n",
488 | "clf.fit(X_train, y_train)\n",
489 | "```\n",
490 | "\n",
491 | "OR\n",
492 | "\n",
493 | "Be more concise and let `cross_val_score` handle the splitting and scoring for you:\n",
494 | "\n",
495 | "```python\n",
496 | "import numpy as np\n",
497 | "from sklearn import cross_validation, linear_model\n",
498 | "X, y = np.arange(10).reshape((5, 2)), range(5)\n",
499 | "clf = linear_model.LinearRegression()\n",
500 | "score = cross_validation.cross_val_score(clf, X, y)\n",
501 | "```\n",
502 | "\n",
503 | "There is also a `cross_val_predict` method to create estimates rather than scores, which is very useful for evaluating models with cross-validation ( [cross_val_predict](http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.cross_val_predict.html) ) (a short sketch follows below)"
504 | ]
505 | },
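The `cross_val_predict` sketch below is an editor's illustration rather than a cell from this notebook: it assumes a scikit-learn version where these helpers live in `sklearn.model_selection` (the link above points to the older `sklearn.cross_validation` location), and the `LogisticRegression` classifier is just an arbitrary choice of estimator.

```python
# Minimal sketch: cross-validated scores vs. cross-validated predictions on iris
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score, cross_val_predict

iris = datasets.load_iris()
X, y = iris.data, iris.target

clf = linear_model.LogisticRegression(max_iter=200)

# cross_val_score returns one score per fold...
scores = cross_val_score(clf, X, y, cv=5)
print(scores)

# ...while cross_val_predict returns a cross-validated prediction for every sample,
# which can then be fed to any metric (confusion matrix, classification report, etc.)
y_cv_pred = cross_val_predict(clf, X, y, cv=5)
print(y_cv_pred[:10])
```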
506 | {
507 | "cell_type": "markdown",
508 | "metadata": {},
509 | "source": [
510 | "Created by a Microsoft Employee.\n",
511 | "\t\n",
512 | "The MIT License (MIT)
\n",
513 | "Copyright (c) 2016 Micheleen Harris"
514 | ]
515 | }
516 | ],
517 | "metadata": {
518 | "kernelspec": {
519 | "display_name": "Python 3",
520 | "language": "python",
521 | "name": "python3"
522 | },
523 | "language_info": {
524 | "codemirror_mode": {
525 | "name": "ipython",
526 | "version": 3
527 | },
528 | "file_extension": ".py",
529 | "mimetype": "text/x-python",
530 | "name": "python",
531 | "nbconvert_exporter": "python",
532 | "pygments_lexer": "ipython3",
533 | "version": "3.4.3"
534 | }
535 | },
536 | "nbformat": 4,
537 | "nbformat_minor": 0
538 | }
539 |
--------------------------------------------------------------------------------
/06.Model Evaluation.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Evaluating Models"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": null,
13 | "metadata": {
14 | "collapsed": true
15 | },
16 | "outputs": [],
17 | "source": [
18 | "# Imports for python 2/3 compatibility\n",
19 | "\n",
20 | "from __future__ import absolute_import, division, print_function, unicode_literals\n",
21 | "\n",
22 | "# For python 2, uncomment this (requires the `future` package):\n",
23 | "# from builtins import range"
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "metadata": {},
29 | "source": [
30 | "### Evaluating using metrics\n",
31 | "* Confusion matrix - visually inspect quality of a classifier's predictions (more [here](http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html)) - very useful to see if a particular class is problematic\n",
32 | "\n",
33 | "Here, we will process some data, classify it with SVM (see [here](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC) for more info), and view the quality of the classification with a confusion matrix."
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": null,
39 | "metadata": {
40 | "collapsed": false
41 | },
42 | "outputs": [],
43 | "source": [
44 | "import pandas as pd\n",
45 | "\n",
46 | "# import model algorithm and data\n",
47 | "from sklearn import svm, datasets\n",
48 | "\n",
49 | "# import splitter\n",
50 | "from sklearn.model_selection import train_test_split\n",
51 | "\n",
52 | "# import metrics\n",
53 | "from sklearn.metrics import confusion_matrix\n",
54 | "\n",
55 | "# feature data (X) and labels (y)\n",
56 | "iris = datasets.load_iris()\n",
57 | "X, y = iris.data, iris.target\n",
58 | "\n",
59 | "# split data into training and test sets\n",
60 | "X_train, X_test, y_train, y_test = \\\n",
61 | " train_test_split(X, y, test_size = 0.30, random_state = 42)"
62 | ]
63 | },
64 | {
65 | "cell_type": "code",
66 | "execution_count": null,
67 | "metadata": {
68 | "collapsed": false
69 | },
70 | "outputs": [],
71 | "source": [
72 | "# perform the classification step and run a prediction on test set from above\n",
73 | "clf = svm.SVC(kernel = 'linear', C = 0.01)\n",
74 | "y_pred = clf.fit(X_train, y_train).predict(X_test)\n",
75 | "\n",
76 | "pd.DataFrame({'Prediction': iris.target_names[y_pred],\n",
77 | " 'Actual': iris.target_names[y_test]})"
78 | ]
79 | },
80 | {
81 | "cell_type": "code",
82 | "execution_count": null,
83 | "metadata": {
84 | "collapsed": false
85 | },
86 | "outputs": [],
87 | "source": [
88 | "# accuracy score\n",
89 | "clf.score(X_test, y_test)"
90 | ]
91 | },
92 | {
93 | "cell_type": "code",
94 | "execution_count": null,
95 | "metadata": {
96 | "collapsed": true
97 | },
98 | "outputs": [],
99 | "source": [
100 | "# Define a plotting function for confusion matrices \n",
101 | "# (from http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html)\n",
102 | "\n",
103 | "import matplotlib.pyplot as plt\n",
104 | "import numpy as np\n",
105 | "\n",
106 | "def plot_confusion_matrix(cm, target_names, title = 'The Confusion Matrix', cmap = plt.cm.YlOrRd):\n",
107 | " plt.imshow(cm, interpolation = 'nearest', cmap = cmap)\n",
108 | " plt.tight_layout()\n",
109 | " \n",
110 | " # Add feature labels to x and y axes\n",
111 | " tick_marks = np.arange(len(target_names))\n",
112 | " plt.xticks(tick_marks, target_names, rotation=45)\n",
113 | " plt.yticks(tick_marks, target_names)\n",
114 | " \n",
115 | " plt.ylabel('True Label')\n",
116 | " plt.xlabel('Predicted Label')\n",
117 | " \n",
118 | " plt.colorbar()"
119 | ]
120 | },
121 | {
122 | "cell_type": "markdown",
123 | "metadata": {},
124 | "source": [
125 | "Numbers in confusion matrix:\n",
126 | "* on-diagonal - counts of points for which the predicted label is equal to the true label\n",
127 | "* off-diagonal - counts of mislabeled points"
128 | ]
129 | },
130 | {
131 | "cell_type": "code",
132 | "execution_count": null,
133 | "metadata": {
134 | "collapsed": false
135 | },
136 | "outputs": [],
137 | "source": [
138 | "%matplotlib inline\n",
139 | "\n",
140 | "cm = confusion_matrix(y_test, y_pred)\n",
141 | "\n",
142 | "# see the actual counts\n",
143 | "print(cm)\n",
144 | "\n",
145 | "# visually inspect how well the classifier matched predictions to true labels\n",
146 | "plot_confusion_matrix(cm, iris.target_names)"
147 | ]
148 | },
149 | {
150 | "cell_type": "markdown",
151 | "metadata": {},
152 | "source": [
153 | "* Classification reports - a text report with important classification metrics (e.g. precision, recall)"
154 | ]
155 | },
156 | {
157 | "cell_type": "code",
158 | "execution_count": null,
159 | "metadata": {
160 | "collapsed": false
161 | },
162 | "outputs": [],
163 | "source": [
164 | "from sklearn.metrics import classification_report\n",
165 | "\n",
166 | "# Using the test and prediction sets from above\n",
167 | "print(classification_report(y_test, y_pred, target_names = iris.target_names))"
168 | ]
169 | },
170 | {
171 | "cell_type": "code",
172 | "execution_count": null,
173 | "metadata": {
174 | "collapsed": false
175 | },
176 | "outputs": [],
177 | "source": [
178 | "# Another example with some toy data\n",
179 | "\n",
180 | "y_test = ['cat', 'dog', 'mouse', 'mouse', 'cat', 'cat']\n",
181 | "y_pred = ['mouse', 'dog', 'cat', 'mouse', 'cat', 'mouse']\n",
182 | "\n",
183 | "# How did our predictor do?\n",
184 | "print(classification_report(y_test, ___, target_names = ___)) # <-- fill in the blanks"
185 | ]
186 | },
187 | {
188 | "cell_type": "markdown",
189 | "metadata": {},
190 | "source": [
191 | "QUICK QUESTION: Is it better to have too many false positives or too many false negatives?"
192 | ]
193 | },
194 | {
195 | "cell_type": "markdown",
196 | "metadata": {},
197 | "source": [
198 | "### Evaluating Models and Under/Over-Fitting\n",
199 | "* Over-fitting or under-fitting can be visualized as below and tuned, as we will see later, with `GridSearchCV` parameter tuning\n",
200 | "* A validation curve gives one an idea of the relationship of model complexity to model performance.\n",
201 | "* For this examination it would help to understand the idea of the [bias-variance tradeoff](https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff).\n",
202 | "* A learning curve helps answer the question of whether there is an added benefit to adding more training data to a model. It is also a tool for investigating whether an estimator is more affected by variance error or bias error (a short sketch of both curves follows below)."
203 | ]
204 | },
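Neither curve is actually computed in this notebook; the sketch below is an editor's illustration of what that could look like, assuming `validation_curve` and `learning_curve` from `sklearn.model_selection` and using the iris data with a linear SVC purely as an example estimator.

```python
# Minimal sketch: validation curve (model complexity) and learning curve (training-set size)
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import validation_curve, learning_curve
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# Validation curve: vary the SVC regularization parameter C with 5-fold cross-validation.
# Low train AND test scores suggest underfitting; a high train score with a much lower
# test score suggests overfitting.
param_range = np.logspace(-3, 2, 6)
train_scores, test_scores = validation_curve(
    SVC(kernel='linear'), X, y,
    param_name='C', param_range=param_range, cv=5)
print(train_scores.mean(axis=1))
print(test_scores.mean(axis=1))

# Learning curve: score as a function of how much training data is used
# (shuffle so each training subset contains all three iris classes)
train_sizes, lc_train_scores, lc_test_scores = learning_curve(
    SVC(kernel='linear'), X, y, cv=5, shuffle=True, random_state=0)
print(train_sizes)
print(lc_test_scores.mean(axis=1))
```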
205 | {
206 | "cell_type": "markdown",
207 | "metadata": {
208 | "collapsed": true
209 | },
210 | "source": [
211 | "PARTING THOUGHT: When a given parameter is increased or decreased, does it cause overfitting or underfitting? What are the implications of each case?"
212 | ]
213 | },
214 | {
215 | "cell_type": "markdown",
216 | "metadata": {
217 | "collapsed": true
218 | },
219 | "source": [
220 | "Created by a Microsoft Employee.\n",
221 | "\t\n",
222 | "The MIT License (MIT)
\n",
223 | "Copyright (c) 2016 Micheleen Harris"
224 | ]
225 | }
226 | ],
227 | "metadata": {
228 | "kernelspec": {
229 | "display_name": "Python 3",
230 | "language": "python",
231 | "name": "python3"
232 | },
233 | "language_info": {
234 | "codemirror_mode": {
235 | "name": "ipython",
236 | "version": 3
237 | },
238 | "file_extension": ".py",
239 | "mimetype": "text/x-python",
240 | "name": "python",
241 | "nbconvert_exporter": "python",
242 | "pygments_lexer": "ipython3",
243 | "version": "3.4.3"
244 | }
245 | },
246 | "nbformat": 4,
247 | "nbformat_minor": 0
248 | }
249 |
--------------------------------------------------------------------------------
/07.Pipelines and Parameter Tuning.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Search for best parameters and create a pipeline"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "### Easy reading...create and use a pipeline"
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "> Pipelining (as an aside to this section)\n",
22 | "* `Pipeline(steps=[...])` - where `steps` is an ordered list of (name, transformer/estimator) pairs; the data is put through each step in order, and each step's parameters can later be addressed by name (e.g. for grid search) using the `stepname__parameter` convention\n",
23 | "* For example, here we do a transformation (SelectKBest) and a classification (SVC) all at once in a pipeline we set up.\n",
24 | "\n",
25 | "See a full example [here](http://scikit-learn.org/stable/auto_examples/feature_stacker.html)\n",
26 | "\n",
27 | "Note: If you wish to perform multiple transformations in your pipeline try [FeatureUnion](http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.FeatureUnion.html#sklearn.pipeline.FeatureUnion)"
28 | ]
29 | },
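A minimal `FeatureUnion` sketch in the spirit of the linked feature_stacker example (an editor's illustration, not a cell from this notebook; the PCA + SelectKBest + SVC combination is an assumed example):

```python
# Combine PCA components and univariate feature selection into one feature matrix,
# then feed the combined features to an SVC, all inside a single Pipeline
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

combined_features = FeatureUnion([("pca", PCA(n_components=2)),
                                  ("univ_select", SelectKBest(k=1))])

pipeline = Pipeline([("features", combined_features),
                     ("svm", SVC(kernel="linear"))])

pipeline.fit(X, y)
print(pipeline.score(X, y))   # training accuracy, just to show the pipeline runs end to end
```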
30 | {
31 | "cell_type": "code",
32 | "execution_count": null,
33 | "metadata": {},
34 | "outputs": [],
35 | "source": [
36 | "# Imports for python 2/3 compatibility\n",
37 | "\n",
38 | "from __future__ import absolute_import, division, print_function, unicode_literals\n",
39 | "\n",
40 | "# For python 2, uncomment this (requires the `future` package):\n",
41 | "# from builtins import range"
42 | ]
43 | },
44 | {
45 | "cell_type": "code",
46 | "execution_count": null,
47 | "metadata": {},
48 | "outputs": [],
49 | "source": [
50 | "from sklearn.model_selection import train_test_split\n",
51 | "from sklearn.svm import SVC\n",
52 | "from sklearn.pipeline import Pipeline\n",
53 | "from sklearn.feature_selection import SelectKBest, chi2\n",
54 | "from sklearn.datasets import load_iris\n",
55 | "\n",
56 | "iris = load_iris()\n",
57 | "X, y = iris.data, iris.target\n",
58 | "\n",
59 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)\n",
60 | "\n",
61 | "# a feature selection instance\n",
62 | "selection = SelectKBest(chi2, k = 2)\n",
63 | "\n",
64 | "# classification instance\n",
65 | "clf = SVC(kernel = 'linear')\n",
66 | "\n",
67 | "# make a pipeline\n",
68 | "pipeline = Pipeline([(\"feature selection\", selection), (\"classification\", clf)])\n",
69 | "\n",
70 | "# train the model\n",
71 | "pipeline.fit(X, y)"
72 | ]
73 | },
74 | {
75 | "cell_type": "code",
76 | "execution_count": null,
77 | "metadata": {},
78 | "outputs": [],
79 | "source": [
80 | "# Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks.\n",
81 | "# Homepage: http://rasbt.github.io/mlxtend/\n",
82 | "\n",
83 | "!pip install msgpack mlxtend"
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": null,
89 | "metadata": {},
90 | "outputs": [],
91 | "source": [
92 | "import numpy as np\n",
     "import matplotlib.pyplot as plt  # needed for plt.subplots below\n",
93 | "from mlxtend.plotting import plot_decision_regions\n",
94 | "\n",
95 | "# Obtain estimated test set labels using the pipeline we created\n",
96 | "y_pred = pipeline.predict(X_test)\n",
97 | "\n",
98 | "# We use mlxtend to show the decision regions of the final SVC\n",
99 | "fig, axarr = plt.subplots(1, 2, figsize=(12,5), sharex=True, sharey=True)\n",
100 | "\n",
101 | "# Plot the decision regions for X_train/y_train. The final SVC inside the pipeline was fit on the\n",
102 | "# 2 features chosen by SelectKBest, so we apply that same transform to the raw data before plotting:\n",
103 | "X_train_transformed = selection.transform(X_train)\n",
104 | "X_test_transformed = selection.transform(X_test)\n",
105 | "\n",
106 | "plot_decision_regions(X_train_transformed, y_train, clf=clf, legend=2, ax= axarr[0])\n",
107 | "axarr[0].set_title(\"Decision Region (Trained)\")\n",
108 | "\n",
109 | "plot_decision_regions(X_test_transformed, y_pred, clf=clf, legend=2, ax= axarr[1])\n",
110 | "axarr[1].set_title(\"Decision Region (Predicted)\")\n"
111 | ]
112 | },
113 | {
114 | "cell_type": "markdown",
115 | "metadata": {},
116 | "source": [
117 | "### Last, but not least, Searching Parameter Space with `GridSearchCV`"
118 | ]
119 | },
120 | {
121 | "cell_type": "code",
122 | "execution_count": null,
123 | "metadata": {},
124 | "outputs": [],
125 | "source": [
126 | "from sklearn.model_selection import GridSearchCV\n",
127 | "\n",
128 | "from sklearn.preprocessing import PolynomialFeatures\n",
129 | "from sklearn.linear_model import LinearRegression\n",
130 | "\n",
131 | "poly = PolynomialFeatures(include_bias = False)\n",
132 | "lm = LinearRegression()\n",
133 | "\n",
134 | "pipeline = Pipeline([(\"polynomial_features\", poly),\n",
135 | " (\"linear_regression\", lm)])\n",
136 | "\n",
137 | "param_grid = dict(polynomial_features__degree = list(range(1, 30, 2)),\n",
138 | " linear_regression__normalize = [False, True])\n",
139 | "\n",
140 | "grid_search = GridSearchCV(pipeline, param_grid=param_grid)\n",
141 | "grid_search.fit(X, y)\n",
142 | "print(grid_search.best_params_)"
143 | ]
144 | },
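As a small follow-on (assuming the `grid_search` and `X` objects from the cell above are still in memory), the fitted `GridSearchCV` also exposes the best cross-validation score and, with the default `refit=True`, a best estimator that can be used directly:

```python
# Editor's sketch, continuing from the cell above
print(grid_search.best_score_)            # mean cross-validated score of the best parameter combination
best_model = grid_search.best_estimator_  # the pipeline refit on all of the data with the best parameters
print(best_model.predict(X[:5]))          # use it like any other fitted estimator
```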
145 | {
146 | "cell_type": "markdown",
147 | "metadata": {},
148 | "source": [
149 | "Created by a Microsoft Employee.\n",
150 | "\t\n",
151 | "The MIT License (MIT)
\n",
152 | "Copyright (c) 2016 Micheleen Harris"
153 | ]
154 | }
155 | ],
156 | "metadata": {
157 | "kernelspec": {
158 | "display_name": "Python 3",
159 | "language": "python",
160 | "name": "python3"
161 | },
162 | "language_info": {
163 | "codemirror_mode": {
164 | "name": "ipython",
165 | "version": 3
166 | },
167 | "file_extension": ".py",
168 | "mimetype": "text/x-python",
169 | "name": "python",
170 | "nbconvert_exporter": "python",
171 | "pygments_lexer": "ipython3",
172 | "version": "3.6.5"
173 | }
174 | },
175 | "nbformat": 4,
176 | "nbformat_minor": 1
177 | }
178 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2016 Micheleen Harris
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in
13 | all copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21 | THE SOFTWARE.
22 |
23 |
--------------------------------------------------------------------------------
/LICENSE_Jake_Vanderplas:
--------------------------------------------------------------------------------
1 | Copyright (c) 2015, Jake Vanderplas
2 | All rights reserved.
3 |
4 | Redistribution and use in source and binary forms, with or without modification,
5 | are permitted provided that the following conditions are met:
6 |
7 | Redistributions of source code must retain the above copyright notice, this
8 | list of conditions and the following disclaimer.
9 |
10 | Redistributions in binary form must reproduce the above copyright notice, this
11 | list of conditions and the following disclaimer in the documentation and/or
12 | other materials provided with the distribution.
13 |
14 | Neither the name of the {organization} nor the names of its
15 | contributors may be used to endorse or promote products derived from
16 | this software without specific prior written permission.
17 |
18 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
19 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
20 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
21 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
22 | ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
23 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
24 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
25 | ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
26 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
27 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
28 |
--------------------------------------------------------------------------------
/Notebook_anatomy.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Basic Anatomy of a Notebook and General Guide\n",
8 | "* Note this a is Python 3-flavored Jupyter notebook"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "### My Disclaimers:\n",
16 | "1. Notebooks are no substitute for an IDE for developing apps.\n",
17 | "* Notebooks are not suitable for debugging code (yet).\n",
18 | "* They are no substitute for publication-quality publishing; however, they are very useful for interactive blogging\n",
19 | "* My main use of notebooks is for interactive teaching, mostly, and as a playground for some code that I might like to share at some point (I can add useful and pretty markup text, pics, videos, etc.)\n",
20 | "* I'm also a fan because github renders ipynb files nicely (even better than r-markdown for some reason)."
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {},
26 | "source": [
27 | "## Shortcuts!!!\n",
28 | "* A complete list is [here](https://sowingseasons.com/blog/reference/2016/01/jupyter-keyboard-shortcuts/23298516), but these are my favorites:\n",
29 | "\n",
30 | "Mode | What | Shortcut\n",
31 | "------------- | ------------- | -------------\n",
32 | "Either (Press `Esc` to enter) | Run cell | Shift-Enter\n",
33 | "Command | Add cell below | B\n",
34 | "Command | Add cell above | A\n",
35 | "Command | Delete a cell | d-d\n",
36 | "Command | Go into edit mode | Enter\n",
37 | "Edit (Press `Enter` to enable) | Indent | Ctrl-]\n",
38 | "Edit | Unindent | Ctrl-[\n",
39 | "Edit | Comment section | Ctrl-/\n",
40 | "Edit | Function introspection | Shift-Tab\n",
41 | "\n",
42 | "Try some below"
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 1,
48 | "metadata": {
49 | "collapsed": false
50 | },
51 | "outputs": [
52 | {
53 | "name": "stdout",
54 | "output_type": "stream",
55 | "text": [
56 | "hello world!\n"
57 | ]
58 | }
59 | ],
60 | "source": [
61 | "print('hello world!')"
62 | ]
63 | },
64 | {
65 | "cell_type": "markdown",
66 | "metadata": {},
67 | "source": [
68 | "### In this figure are a few labels of notebook parts I will refer to\n",
69 | ""
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {},
75 | "source": [
76 | "\n",
77 | "#### OK, change this cell to markdown to see some examples (you'll recognize this if you speak markdown)\n",
78 | "# This will be Heading1\n",
79 | "1. first thing\n",
80 | "* second thing\n",
81 | "* third thing\n",
82 | "\n",
83 | "A horizontal rule:\n",
84 | "\n",
85 | "---\n",
86 | "> Indented text\n",
87 | "\n",
88 | "Code snippet:\n",
89 | "\n",
90 | "```python\n",
91 | "import numpy as np\n",
92 | "a2d = np.random.randn(100).reshape(10, 10)\n",
93 | "```\n",
94 | "\n",
95 | "LaTeX inline equation:\n",
96 | "\n",
97 | "$\\Delta =\\sum_{i=1}^N w_i (x_i - \\bar{x})^2$\n",
98 | "\n",
99 | "LaTeX table:\n",
100 | "\n",
101 | "First Header | Second Header\n",
102 | "------------- | -------------\n",
103 | "Content Cell | Content Cell\n",
104 | "Content Cell | Content Cell\n",
105 | "\n",
106 | "HTML:\n",
107 | "\n",
108 | ""
109 | ]
110 | },
111 | {
112 | "cell_type": "markdown",
113 | "metadata": {},
114 | "source": [
115 | "### As you can see on your jupyter homepage, you can open up any notebook\n",
116 | "NB: You can return to the homepage by clicking the Jupyter icon in the very upper left corner at any time\n",
117 | "### You can also Upload a notebook (button on upper right)\n",
118 | "\n",
119 | "### As well as start a new notebook with a specific kernel (button to the right of Upload)\n",
120 | "\n",
121 | "\n",
122 | "> So, what's that number after `In` or `Out`? That's the order of running this cell relative to other cells (useful for keeping track of what order cells have been run). When you save this notebook that number along with any output shown will also be saved. To reset a notebook go to Cell -> All Output -> Clear and then Save it.\n",
123 | "\n",
124 | "You can do something like this to render a publicly available notebook on github statically (this I do as a backup for presentations and course stuff):\n",
125 | "\n",
126 | "```\n",
127 | "http://nbviewer.jupyter.org/github/
\n",
130 | "http://nbviewer.jupyter.org/github/michhar/rpy2_sample_notebooks/blob/master/TestingRpy2.ipynb\n",
131 | "\n",
132 | "
\n",
133 | "Also, you can upload or start a new interactive, free notebook by going here:
\n",
134 | "https://tmpnb.org\n",
135 | "
\n",
136 | "\n",
137 | "> The nifty thing about Jupyter notebooks (and the .ipynb files which you can download and upload) is that you can share these. They are just written in JSON language. I put them up in places like GitHub and point people in that direction. \n",
138 | "\n",
139 | "> Some people (like [this guy](http://www.r-bloggers.com/why-i-dont-like-jupyter-fka-ipython-notebook/) who misses the point I think) really dislike notebooks, but they are really good for what they are good at - sharing code ideas plus neat notes and stuff in dev, teaching interactively, even chaining languages together in a polyglot style. And doing all of this on github works really well (as long as you remember to always clear your output before checking in - version control can get a bit crazy otherwise).\n",
140 | "\n",
141 | "### Some additional features\n",
142 | "* tab completion\n",
143 | "* function introspection\n",
144 | "* help"
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": 2,
150 | "metadata": {
151 | "collapsed": true
152 | },
153 | "outputs": [],
154 | "source": [
155 | "import json"
156 | ]
157 | },
158 | {
159 | "cell_type": "code",
160 | "execution_count": null,
161 | "metadata": {
162 | "collapsed": true
163 | },
164 | "outputs": [],
165 | "source": [
166 | "# hit Tab at end of this to see all methods\n",
167 | "json.\n",
168 | "\n",
169 | "# hit Shift-Tab within parenthesis of method to see full docstring\n",
170 | "json.loads()"
171 | ]
172 | },
173 | {
174 | "cell_type": "code",
175 | "execution_count": null,
176 | "metadata": {
177 | "collapsed": true
178 | },
179 | "outputs": [],
180 | "source": [
181 | "?sum()"
182 | ]
183 | },
184 | {
185 | "cell_type": "code",
186 | "execution_count": null,
187 | "metadata": {
188 | "collapsed": false
189 | },
190 | "outputs": [],
191 | "source": [
192 | "import json\n",
193 | "?json"
194 | ]
195 | },
196 | {
197 | "cell_type": "markdown",
198 | "metadata": {
199 | "collapsed": true
200 | },
201 | "source": [
202 | "The MIT License (MIT)
\n",
203 | "Copyright (c) 2016 Micheleen Harris"
204 | ]
205 | },
206 | {
207 | "cell_type": "code",
208 | "execution_count": null,
209 | "metadata": {
210 | "collapsed": true
211 | },
212 | "outputs": [],
213 | "source": []
214 | }
215 | ],
216 | "metadata": {
217 | "kernelspec": {
218 | "display_name": "Python 3",
219 | "language": "python",
220 | "name": "python3"
221 | },
222 | "language_info": {
223 | "codemirror_mode": {
224 | "name": "ipython",
225 | "version": 3
226 | },
227 | "file_extension": ".py",
228 | "mimetype": "text/x-python",
229 | "name": "python",
230 | "nbconvert_exporter": "python",
231 | "pygments_lexer": "ipython3",
232 | "version": "3.5.1"
233 | }
234 | },
235 | "nbformat": 4,
236 | "nbformat_minor": 0
237 | }
238 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## What's in this tutorial
2 |
3 | The notebooks are a modular introduction to machine learning in python using `scikit-learn` with examples and tips.
4 |
5 | The material is in jupyter notebook format and was designed to be compatible with Python >= 2.6 or >= 3.3. To use these notebooks interactively (intended use), you will need a jupyter/ipython notebook install (see below).
6 |
7 | Also, included is a brief introductory guide to jupyter notebooks in Notebook_anatomy notebook. If you are unfamiliar with jupyter/ipython notebooks, please take some time to look at this file.
8 |
9 | ## Installation Notes
10 |
11 | > For a quick deployment, simply click the `launch binder` link at the bottom of this page. However, we recommend a local install for more customizable setups, flexibility and possibilities.
12 |
13 | ### Setting up a development environment
14 |
15 | > Note: the requirements.txt file above is a snapshot of the latest `pip` installed packages from a successful ML ecosystem. `conda` should install the best dependencies for the `scikit-learn` used and may have different versions.
16 |
17 | It is generally best practice to have a distinct development environment for various Python projects. There are multiple options available to do this such as virtualenv and Conda. For this project, we will be using the [Conda](https://www.continuum.io/why-anaconda) environment.
18 |
19 | To get started, you can install [miniconda3](http://conda.pydata.org/docs/install/quick.html) to get python3 as well as python2.
20 |
21 | If you already have Python installed, you can install Conda via `pip`:
22 |
23 | ```
24 | pip install auxlib conda
25 | ```
26 |
27 | ### Initializing a Conda environment
28 |
29 | * To setup a python 2.7 development environment in addition to your python 3 conda install for this project (done after installing [miniconda3](http://conda.pydata.org/docs/install/quick.html)), you can run:
30 | * `conda create --name sklearn python=2`
31 | * This installs into `C:\Miniconda3\envs\python2\` so I added this to system path (on Windows)
32 | * On Linux and OS/X, this depends on where the Python Framework is installed. On OS/X using Homebrew, this installs into `/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/envs/python2/bin`
33 | * See [here](http://conda.pydata.org/docs/py2or3.html) for more detailed instructions
34 |
35 | * To activate the development environment, from the `bin` folder of your conda environment, run
36 | * Windows: `activate sklearn`
37 | * Linux/OSX: `source activate sklearn`
38 |
39 | * Ensure ipython/ipython2 is installed in the Python environment
40 | * Windows: `c:\Miniconda3\envs\python2\Scripts\ipython2.exe kernel install --name python2 --display-name "Python 2"`
41 | * Linux/OSX: `ipython2 kernel install --name python2 --display-name "Python 2"` (may need `sudo`)
42 |
43 | * If, at any point, you desire to exit the development environment, simply type the following:
44 | * Windows: `deactivate`
45 | * Linux/OSX: `source deactivate`
46 |
47 |
48 | ### Installing jupyter notebook locally
49 |
50 | The easiest way to install [jupyter notebook](http://jupyter.org/) is via `conda install`
51 | * Run `conda install jupyter` from your terminal. Linux/OSX may require `sudo` permissions.
52 | * Navigate to the directory containing this repository, and execute `jupyter notebook`. This will start a notebook service locally for accessing notebooks in your browser. Drill down on the home page to your notebook of interest.
53 |
54 | For a notebook primer go to `Notebook_anatomy.ipynb` in this repo. The very short story is: to execute a cell, just hit Shift-Enter. There are many more shortcuts in the primer.
55 |
56 | ## Installing python packages
57 |
58 | This tutorial requires the following packages:
59 |
60 | * numpy version 1.5 or later: http://www.numpy.org/
61 | * scipy version 0.10 or later: http://www.scipy.org/
62 | * pandas http://pandas.pydata.org/
63 | * matplotlib version 1.3 or later: http://matplotlib.org/
64 | * scikit-learn version 0.14 or later: http://scikit-learn.org
65 | * jupyter http://jupyter.readthedocs.org/en/latest/install.html
66 |
67 | You can use your development environment of choice, but if you used `conda` as described above, simply run:
68 | ```
69 | $ conda install numpy scipy pandas matplotlib scikit-learn jupyter
70 | ```
71 |
72 | We have also provided a requirements.txt file above for use with pip.
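If you prefer `pip`, that snapshot can be installed into your active environment with:

```
pip install -r requirements.txt
```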
73 |
74 | ## Other install options
75 |
76 | There are many different ways to install python and the package ecosystem for machine learning. They are not all going to be covered here, but essentially you have the following choices:
77 |
78 | 1. anaconda/miniconda aka conda (shown above)
79 | 2. download python and pip install packages
80 | 3. use a docker image ([this](https://hub.docker.com/r/wi3o/skflow-jupyternb/) is one for jupyter+sklearn+skflow+tensorflow)
81 | 4. [Google cloud platform](https://cloud.google.com/) has a jupyter notebook service called Datalab (quickstart [here](https://cloud.google.com/datalab/docs/quickstart)). It has tensorflow pre-installed (needed for next tutorial).
82 | 5. Click the Binder link at the bottom of this page to deploy a notebook setup.
83 |
84 | Or a combination of the above.
85 |
86 | A quick tip if you are installing in a non-conda way with `pip` on Windows: many of the data analysis packages are tricky to install (compiled dependencies). A nice "unofficial" repository of binaries for packages like `numpy` and a myriad of others is created and maintained by Christoph Gohlke. This site is [here](http://www.lfd.uci.edu/~gohlke/pythonlibs/).
87 |
88 | ## What's next
89 |
90 | The next tutorial in this workshop is on `tensorflow` and the installation instructions are in this [README](https://github.com/PythonWorkshop/intro-to-tensorflow/blob/master/README.md)
91 |
92 | [](http://mybinder.org/repo/PythonWorkshop/intro-to-sklearn)
93 |
--------------------------------------------------------------------------------
/Resources.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### Some References\n",
8 | "* [The iris dataset and an intro to sklearn explained on the Kaggle blog](http://blog.kaggle.com/2015/04/22/scikit-learn-video-3-machine-learning-first-steps-with-the-iris-dataset/)\n",
9 | "* [sklearn: Conference Notebooks and Presentation from Open Data Science Conf 2015](https://github.com/amueller/odscon-sf-2015) by Andreas Mueller\n",
10 | "* [real-world example set of notebooks for learning ML from Open Data Science Conf 2015](https://github.com/cmmalone/malone_OpenDataSciCon) by Katie Malone\n",
11 | "* [PyCon 2015 Workshop, Scikit-learn tutorial](https://www.youtube.com/watch?v=L7R4HUQ-eQ0) by Jake VanDerplas (Univ of Washington, eScience Dept)\n",
12 | "* [Data Science for the Rest of Us](https://channel9.msdn.com/blogs/Cloud-and-Enterprise-Premium/Data-Science-for-Rest-of-Us) great introductory webinar (no math) by Brandon Rohrer (Microsoft)\n",
13 | "* [A Few Useful Things to Know about Machine Learning](http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf) with useful ML \"folk wisdom\" by Pedro Domingos (Univ of Washington, CS Dept)\n",
14 | "* [Machine Learning 101](http://www.astroml.org/sklearn_tutorial/general_concepts.html) associated with `sklearn` docs\n",
15 | "\n",
16 | "### Some Datasets\n",
17 | "* [Machine learning datasets](http://mldata.org/)\n",
18 | "* [Make your own with sklearn](http://scikit-learn.org/stable/datasets/index.html#sample-generators)\n",
19 | "* [Kaggle datasets](https://www.kaggle.com/datasets)\n",
20 | "\n",
21 | "### Contact Info\n",
22 | "\n",
23 | "Micheleen Harris
\n",
24 | "email: michhar@microsoft.com"
25 | ]
26 | },
27 | {
28 | "cell_type": "code",
29 | "execution_count": null,
30 | "metadata": {
31 | "collapsed": true
32 | },
33 | "outputs": [],
34 | "source": []
35 | }
36 | ],
37 | "metadata": {
38 | "kernelspec": {
39 | "display_name": "Python 3",
40 | "language": "python",
41 | "name": "python3"
42 | },
43 | "language_info": {
44 | "codemirror_mode": {
45 | "name": "ipython",
46 | "version": 3
47 | },
48 | "file_extension": ".py",
49 | "mimetype": "text/x-python",
50 | "name": "python",
51 | "nbconvert_exporter": "python",
52 | "pygments_lexer": "ipython3",
53 | "version": "3.5.1"
54 | }
55 | },
56 | "nbformat": 4,
57 | "nbformat_minor": 0
58 | }
59 |
--------------------------------------------------------------------------------
/Untitled.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 6,
6 | "metadata": {
7 | "collapsed": false
8 | },
9 | "outputs": [
10 | {
11 | "data": {
12 | "text/plain": [
13 | "str"
14 | ]
15 | },
16 | "execution_count": 6,
17 | "metadata": {},
18 | "output_type": "execute_result"
19 | }
20 | ],
21 | "source": [
22 | "s1 = u'my string'\n",
23 | "type(s1)"
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": 3,
29 | "metadata": {
30 | "collapsed": true
31 | },
32 | "outputs": [],
33 | "source": [
34 | "from __future__ import unicode_literals"
35 | ]
36 | },
37 | {
38 | "cell_type": "code",
39 | "execution_count": 7,
40 | "metadata": {
41 | "collapsed": true
42 | },
43 | "outputs": [],
44 | "source": []
45 | },
46 | {
47 | "cell_type": "code",
48 | "execution_count": 8,
49 | "metadata": {
50 | "collapsed": false
51 | },
52 | "outputs": [
53 | {
54 | "data": {
55 | "text/plain": [
56 | "dict_values([175, 166, 192])"
57 | ]
58 | },
59 | "execution_count": 8,
60 | "metadata": {},
61 | "output_type": "execute_result"
62 | }
63 | ],
64 | "source": [
65 | "heights = {'Fred': 175, 'Anne': 166, 'Joe': 192}\n",
66 | "heights.values()"
67 | ]
68 | },
69 | {
70 | "cell_type": "code",
71 | "execution_count": null,
72 | "metadata": {
73 | "collapsed": true
74 | },
75 | "outputs": [],
76 | "source": []
77 | }
78 | ],
79 | "metadata": {
80 | "kernelspec": {
81 | "display_name": "Python 3",
82 | "language": "python",
83 | "name": "python3"
84 | },
85 | "language_info": {
86 | "codemirror_mode": {
87 | "name": "ipython",
88 | "version": 3
89 | },
90 | "file_extension": ".py",
91 | "mimetype": "text/x-python",
92 | "name": "python",
93 | "nbconvert_exporter": "python",
94 | "pygments_lexer": "ipython3",
95 | "version": "3.4.3"
96 | }
97 | },
98 | "nbformat": 4,
99 | "nbformat_minor": 0
100 | }
101 |
--------------------------------------------------------------------------------
/fig_code/ML_flow_chart.py:
--------------------------------------------------------------------------------
1 | """
2 | Tutorial Diagrams
3 | -----------------
4 |
5 | This script plots the flow-charts used in the scikit-learn tutorials.
6 | """
7 |
8 | import numpy as np
9 | import pylab as pl
10 | from matplotlib.patches import Circle, Rectangle, Polygon, Arrow, FancyArrow
11 |
12 | def create_base(box_bg = '#CCCCCC',
13 | arrow1 = '#88CCFF',
14 | arrow2 = '#88FF88',
15 | supervised=True):
16 | fig = pl.figure(figsize=(9, 6), facecolor='w')
17 | ax = pl.axes((0, 0, 1, 1),
18 | xticks=[], yticks=[], frameon=False)
19 | ax.set_xlim(0, 9)
20 | ax.set_ylim(0, 6)
21 |
22 | patches = [Rectangle((0.3, 3.6), 1.5, 1.8, zorder=1, fc=box_bg),
23 | Rectangle((0.5, 3.8), 1.5, 1.8, zorder=2, fc=box_bg),
24 | Rectangle((0.7, 4.0), 1.5, 1.8, zorder=3, fc=box_bg),
25 |
26 | Rectangle((2.9, 3.6), 0.2, 1.8, fc=box_bg),
27 | Rectangle((3.1, 3.8), 0.2, 1.8, fc=box_bg),
28 | Rectangle((3.3, 4.0), 0.2, 1.8, fc=box_bg),
29 |
30 | Rectangle((0.3, 0.2), 1.5, 1.8, fc=box_bg),
31 |
32 | Rectangle((2.9, 0.2), 0.2, 1.8, fc=box_bg),
33 |
34 | Circle((5.5, 3.5), 1.0, fc=box_bg),
35 |
36 | Polygon([[5.5, 1.7],
37 | [6.1, 1.1],
38 | [5.5, 0.5],
39 | [4.9, 1.1]], fc=box_bg),
40 |
41 | FancyArrow(2.3, 4.6, 0.35, 0, fc=arrow1,
42 | width=0.25, head_width=0.5, head_length=0.2),
43 |
44 | FancyArrow(3.75, 4.2, 0.5, -0.2, fc=arrow1,
45 | width=0.25, head_width=0.5, head_length=0.2),
46 |
47 | FancyArrow(5.5, 2.4, 0, -0.4, fc=arrow1,
48 | width=0.25, head_width=0.5, head_length=0.2),
49 |
50 | FancyArrow(2.0, 1.1, 0.5, 0, fc=arrow2,
51 | width=0.25, head_width=0.5, head_length=0.2),
52 |
53 | FancyArrow(3.3, 1.1, 1.3, 0, fc=arrow2,
54 | width=0.25, head_width=0.5, head_length=0.2),
55 |
56 | FancyArrow(6.2, 1.1, 0.8, 0, fc=arrow2,
57 | width=0.25, head_width=0.5, head_length=0.2)]
58 |
59 | if supervised:
60 | patches += [Rectangle((0.3, 2.4), 1.5, 0.5, zorder=1, fc=box_bg),
61 | Rectangle((0.5, 2.6), 1.5, 0.5, zorder=2, fc=box_bg),
62 | Rectangle((0.7, 2.8), 1.5, 0.5, zorder=3, fc=box_bg),
63 | FancyArrow(2.3, 2.9, 2.0, 0, fc=arrow1,
64 | width=0.25, head_width=0.5, head_length=0.2),
65 | Rectangle((7.3, 0.85), 1.5, 0.5, fc=box_bg)]
66 | else:
67 | patches += [Rectangle((7.3, 0.2), 1.5, 1.8, fc=box_bg)]
68 |
69 | for p in patches:
70 | ax.add_patch(p)
71 |
72 | pl.text(1.45, 4.9, "Training\nText,\nDocuments,\nImages,\netc.",
73 | ha='center', va='center', fontsize=14)
74 |
75 | pl.text(3.6, 4.9, "Feature\nVectors",
76 | ha='left', va='center', fontsize=14)
77 |
78 | pl.text(5.5, 3.5, "Machine\nLearning\nAlgorithm",
79 | ha='center', va='center', fontsize=14)
80 |
81 | pl.text(1.05, 1.1, "New Text,\nDocument,\nImage,\netc.",
82 | ha='center', va='center', fontsize=14)
83 |
84 | pl.text(3.3, 1.7, "Feature\nVector",
85 | ha='left', va='center', fontsize=14)
86 |
87 | pl.text(5.5, 1.1, "Predictive\nModel",
88 | ha='center', va='center', fontsize=12)
89 |
90 | if supervised:
91 | pl.text(1.45, 3.05, "Labels",
92 | ha='center', va='center', fontsize=14)
93 |
94 | pl.text(8.05, 1.1, "Expected\nLabel",
95 | ha='center', va='center', fontsize=14)
96 | pl.text(8.8, 5.8, "Supervised Learning Model",
97 | ha='right', va='top', fontsize=18)
98 |
99 | else:
100 | pl.text(8.05, 1.1,
101 | "Likelihood\nor Cluster ID\nor Better\nRepresentation",
102 | ha='center', va='center', fontsize=12)
103 | pl.text(8.8, 5.8, "Unsupervised Learning Model",
104 | ha='right', va='top', fontsize=18)
105 |
106 |
107 |
108 | def plot_supervised_chart(annotate=False):
109 | create_base(supervised=True)
110 | if annotate:
111 | fontdict = dict(color='r', weight='bold', size=14)
112 | pl.text(1.9, 4.55, 'X = vec.fit_transform(input)',
113 | fontdict=fontdict,
114 | rotation=20, ha='left', va='bottom')
115 | pl.text(3.7, 3.2, 'clf.fit(X, y)',
116 | fontdict=fontdict,
117 | rotation=20, ha='left', va='bottom')
118 | pl.text(1.7, 1.5, 'X_new = vec.transform(input)',
119 | fontdict=fontdict,
120 | rotation=20, ha='left', va='bottom')
121 | pl.text(6.1, 1.5, 'y_new = clf.predict(X_new)',
122 | fontdict=fontdict,
123 | rotation=20, ha='left', va='bottom')
124 |
125 | def plot_unsupervised_chart():
126 | create_base(supervised=False)
127 |
128 |
129 | if __name__ == '__main__':
130 | plot_supervised_chart(False)
131 | plot_supervised_chart(True)
132 | plot_unsupervised_chart()
133 | pl.show()
134 |
135 |
136 |
--------------------------------------------------------------------------------
/fig_code/__init__.py:
--------------------------------------------------------------------------------
1 | from .data import *
2 | from .figures import *
3 |
4 | from .sgd_separator import plot_sgd_separator
5 | from .linear_regression import plot_linear_regression
6 | from .helpers import plot_iris_knn
7 |
--------------------------------------------------------------------------------
/fig_code/data.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 |
4 | def linear_data_sample(N=40, rseed=0, m=3, b=-2):
5 | rng = np.random.RandomState(rseed)
6 |
7 | x = 10 * rng.rand(N)
8 | dy = m / 2 * (1 + rng.rand(N))
9 | y = m * x + b + dy * rng.randn(N)
10 |
11 | return (x, y, dy)
12 |
13 |
14 | def linear_data_sample_big_errs(N=40, rseed=0, m=3, b=-2):
15 | rng = np.random.RandomState(rseed)
16 |
17 | x = 10 * rng.rand(N)
18 | dy = m / 2 * (1 + rng.rand(N))
19 | dy[20:25] *= 10
20 | y = m * x + b + dy * rng.randn(N)
21 |
22 | return (x, y, dy)
23 |
24 |
25 | def sample_light_curve(phased=True):
26 | from astroML.datasets import fetch_LINEAR_sample
27 | data = fetch_LINEAR_sample()
28 | t, y, dy = data[18525697].T
29 |
30 | if phased:
31 | P_best = 0.580313015651
32 | t /= P_best
33 |
34 | return (t, y, dy)
35 |
36 |
37 | def sample_light_curve_2(phased=True):
38 | from astroML.datasets import fetch_LINEAR_sample
39 | data = fetch_LINEAR_sample()
40 | t, y, dy = data[10022663].T
41 |
42 | if phased:
43 | P_best = 0.61596079804
44 | t /= P_best
45 |
46 | return (t, y, dy)
47 |
48 |
--------------------------------------------------------------------------------
/fig_code/figures.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib.pyplot as plt
3 | import warnings
4 |
5 |
6 | def plot_venn_diagram():
7 | fig, ax = plt.subplots(subplot_kw=dict(frameon=False, xticks=[], yticks=[]))
8 | ax.add_patch(plt.Circle((0.3, 0.3), 0.3, fc='red', alpha=0.5))
9 | ax.add_patch(plt.Circle((0.6, 0.3), 0.3, fc='blue', alpha=0.5))
10 | ax.add_patch(plt.Rectangle((-0.1, -0.1), 1.1, 0.8, fc='none', ec='black'))
11 | ax.text(0.2, 0.3, '$x$', size=30, ha='center', va='center')
12 | ax.text(0.7, 0.3, '$y$', size=30, ha='center', va='center')
13 | ax.text(0.0, 0.6, '$I$', size=30)
14 | ax.axis('equal')
15 |
16 |
17 | def plot_example_decision_tree():
18 | fig = plt.figure(figsize=(10, 4))
19 | ax = fig.add_axes([0, 0, 0.8, 1], frameon=False, xticks=[], yticks=[])
20 | ax.set_title('Example Decision Tree: Animal Classification', size=24)
21 |
22 | def text(ax, x, y, t, size=20, **kwargs):
23 | ax.text(x, y, t,
24 | ha='center', va='center', size=size,
25 | bbox=dict(boxstyle='round', ec='k', fc='w'), **kwargs)
26 |
27 | text(ax, 0.5, 0.9, "How big is\nthe animal?", 20)
28 | text(ax, 0.3, 0.6, "Does the animal\nhave horns?", 18)
29 | text(ax, 0.7, 0.6, "Does the animal\nhave two legs?", 18)
30 | text(ax, 0.12, 0.3, "Are the horns\nlonger than 10cm?", 14)
31 | text(ax, 0.38, 0.3, "Is the animal\nwearing a collar?", 14)
32 | text(ax, 0.62, 0.3, "Does the animal\nhave wings?", 14)
33 | text(ax, 0.88, 0.3, "Does the animal\nhave a tail?", 14)
34 |
35 | text(ax, 0.4, 0.75, "> 1m", 12, alpha=0.4)
36 | text(ax, 0.6, 0.75, "< 1m", 12, alpha=0.4)
37 |
38 | text(ax, 0.21, 0.45, "yes", 12, alpha=0.4)
39 | text(ax, 0.34, 0.45, "no", 12, alpha=0.4)
40 |
41 | text(ax, 0.66, 0.45, "yes", 12, alpha=0.4)
42 | text(ax, 0.79, 0.45, "no", 12, alpha=0.4)
43 |
44 | ax.plot([0.3, 0.5, 0.7], [0.6, 0.9, 0.6], '-k')
45 | ax.plot([0.12, 0.3, 0.38], [0.3, 0.6, 0.3], '-k')
46 | ax.plot([0.62, 0.7, 0.88], [0.3, 0.6, 0.3], '-k')
47 | ax.plot([0.0, 0.12, 0.20], [0.0, 0.3, 0.0], '--k')
48 | ax.plot([0.28, 0.38, 0.48], [0.0, 0.3, 0.0], '--k')
49 | ax.plot([0.52, 0.62, 0.72], [0.0, 0.3, 0.0], '--k')
50 | ax.plot([0.8, 0.88, 1.0], [0.0, 0.3, 0.0], '--k')
51 | ax.axis([0, 1, 0, 1])
52 |
53 |
54 | def visualize_tree(estimator, X, y, boundaries=True,
55 | xlim=None, ylim=None):
56 | estimator.fit(X, y)
57 |
58 | if xlim is None:
59 | xlim = (X[:, 0].min() - 0.1, X[:, 0].max() + 0.1)
60 | if ylim is None:
61 | ylim = (X[:, 1].min() - 0.1, X[:, 1].max() + 0.1)
62 |
63 | x_min, x_max = xlim
64 | y_min, y_max = ylim
65 | xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
66 | np.linspace(y_min, y_max, 100))
67 | Z = estimator.predict(np.c_[xx.ravel(), yy.ravel()])
68 |
69 | # Put the result into a color plot
70 | Z = Z.reshape(xx.shape)
71 | plt.figure()
72 | plt.pcolormesh(xx, yy, Z, alpha=0.2, cmap='rainbow')
73 | plt.clim(y.min(), y.max())
74 |
75 | # Plot also the training points
76 | plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='rainbow')
77 | plt.axis('off')
78 |
79 | plt.xlim(x_min, x_max)
80 | plt.ylim(y_min, y_max)
81 | plt.clim(y.min(), y.max())
82 |
83 | # Plot the decision boundaries
84 | def plot_boundaries(i, xlim, ylim):
85 | if i < 0:
86 | return
87 |
88 | tree = estimator.tree_
89 |
90 | if tree.feature[i] == 0:
91 | plt.plot([tree.threshold[i], tree.threshold[i]], ylim, '-k')
92 | plot_boundaries(tree.children_left[i],
93 | [xlim[0], tree.threshold[i]], ylim)
94 | plot_boundaries(tree.children_right[i],
95 | [tree.threshold[i], xlim[1]], ylim)
96 |
97 | elif tree.feature[i] == 1:
98 | plt.plot(xlim, [tree.threshold[i], tree.threshold[i]], '-k')
99 | plot_boundaries(tree.children_left[i], xlim,
100 | [ylim[0], tree.threshold[i]])
101 | plot_boundaries(tree.children_right[i], xlim,
102 | [tree.threshold[i], ylim[1]])
103 |
104 | if boundaries:
105 | plot_boundaries(0, plt.xlim(), plt.ylim())
106 |
107 |
108 | def plot_tree_interactive(X, y):
109 | from sklearn.tree import DecisionTreeClassifier
110 |
111 | def interactive_tree(depth=1):
112 | clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
113 | visualize_tree(clf, X, y)
114 |
115 | from IPython.html.widgets import interact
116 | return interact(interactive_tree, depth=[1, 5])
117 |
118 |
119 | def plot_kmeans_interactive(min_clusters=1, max_clusters=6):
120 | from IPython.html.widgets import interact
121 | from sklearn.metrics.pairwise import euclidean_distances
122 | from sklearn.datasets.samples_generator import make_blobs
123 |
124 | with warnings.catch_warnings():
125 | #warnings.filterwarnings('ignore')
126 |
127 | from sklearn.datasets import load_iris
128 | from sklearn.decomposition import PCA
129 |
130 | iris = load_iris()
131 | X, y = iris.data, iris.target
132 | pca = PCA(n_components = 0.95) # keep 95% of variance
133 | X = pca.fit_transform(X)
134 | #X = X[:, 1:3]
135 |
136 |
137 | def _kmeans_step(frame=0, n_clusters=3):
138 | rng = np.random.RandomState(2)
139 | labels = np.zeros(X.shape[0])
140 | centers = rng.randn(n_clusters, 2)
141 |
142 | nsteps = frame // 3
143 |
144 | for i in range(nsteps + 1):
145 | old_centers = centers
146 | if i < nsteps or frame % 3 > 0:
147 | dist = euclidean_distances(X, centers)
148 | labels = dist.argmin(1)
149 |
150 | if i < nsteps or frame % 3 > 1:
151 | centers = np.array([X[labels == j].mean(0)
152 | for j in range(n_clusters)])
153 | nans = np.isnan(centers)
154 | centers[nans] = old_centers[nans]
155 |
156 |
157 | # plot the data and cluster centers
158 | plt.scatter(X[:, 0], X[:, 1], c=labels, s=50, cmap='rainbow',
159 | vmin=0, vmax=n_clusters - 1);
160 | plt.scatter(old_centers[:, 0], old_centers[:, 1], marker='o',
161 | c=np.arange(n_clusters),
162 | s=200, cmap='rainbow')
163 | plt.scatter(old_centers[:, 0], old_centers[:, 1], marker='o',
164 | c='black', s=50)
165 |
166 | # plot new centers if third frame
167 | if frame % 3 == 2:
168 | for i in range(n_clusters):
169 | plt.annotate('', centers[i], old_centers[i],
170 | arrowprops=dict(arrowstyle='->', linewidth=1))
171 | plt.scatter(centers[:, 0], centers[:, 1], marker='o',
172 | c=np.arange(n_clusters),
173 | s=200, cmap='rainbow')
174 | plt.scatter(centers[:, 0], centers[:, 1], marker='o',
175 | c='black', s=50)
176 |
177 | plt.xlim(-4, 4)
178 | plt.ylim(-2, 10)
179 |
180 | if frame % 3 == 1:
181 | plt.text(3.8, 9.5, "1. Reassign points to nearest centroid",
182 | ha='right', va='top', size=14)
183 | elif frame % 3 == 2:
184 | plt.text(3.8, 9.5, "2. Update centroids to cluster means",
185 | ha='right', va='top', size=14)
186 |
187 |
188 |     return interact(_kmeans_step, frame=(0, 50),
189 |                     n_clusters=(min_clusters, max_clusters))
190 |
191 |
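# Note on _kmeans_step above (editorial comment, not part of the original file):
# each "frame" of the slider advances one third of a k-means iteration.
# frame % 3 == 1 re-assigns every point to its nearest centroid (the assignment
# step), and frame % 3 == 2 moves each centroid to the mean of its assigned
# points (the update step), so the two halves of Lloyd's algorithm can be
# stepped through and inspected separately.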
192 | def plot_image_components(x, coefficients=None, mean=0, components=None,
193 | imshape=(8, 8), n_components=6, fontsize=12):
194 | if coefficients is None:
195 | coefficients = x
196 |
197 | if components is None:
198 | components = np.eye(len(coefficients), len(x))
199 |
200 | mean = np.zeros_like(x) + mean
201 |
202 |
203 | fig = plt.figure(figsize=(1.2 * (5 + n_components), 1.2 * 2))
204 | g = plt.GridSpec(2, 5 + n_components, hspace=0.3)
205 |
206 | def show(i, j, x, title=None):
207 | ax = fig.add_subplot(g[i, j], xticks=[], yticks=[])
208 | ax.imshow(x.reshape(imshape), interpolation='nearest')
209 | if title:
210 | ax.set_title(title, fontsize=fontsize)
211 |
212 | show(slice(2), slice(2), x, "True")
213 |
214 | approx = mean.copy()
215 | show(0, 2, np.zeros_like(x) + mean, r'$\mu$')
216 | show(1, 2, approx, r'$1 \cdot \mu$')
217 |
218 | for i in range(0, n_components):
219 | approx = approx + coefficients[i] * components[i]
220 | show(0, i + 3, components[i], r'$c_{0}$'.format(i + 1))
221 | show(1, i + 3, approx,
222 | r"${0:.2f} \cdot c_{1}$".format(coefficients[i], i + 1))
223 | plt.gca().text(0, 1.05, '$+$', ha='right', va='bottom',
224 | transform=plt.gca().transAxes, fontsize=fontsize)
225 |
226 | show(slice(2), slice(-2, None), approx, "Approx")
227 |
228 |
229 | def plot_pca_interactive(data, n_components=6):
230 | from sklearn.decomposition import PCA
231 |     from ipywidgets import interact
232 |
233 | pca = PCA(n_components=n_components)
234 | Xproj = pca.fit_transform(data)
235 |
236 | def show_decomp(i=0):
237 | plot_image_components(data[i], Xproj[i],
238 | pca.mean_, pca.components_)
239 |
240 | interact(show_decomp, i=(0, data.shape[0] - 1));
241 |
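A minimal usage sketch for `plot_pca_interactive` (not part of the file above). It assumes these helpers are importable as a `fig_code` package, matching the directory name, and uses scikit-learn's 8x8 digits images, which fit the default `imshape=(8, 8)`:

```python
from sklearn.datasets import load_digits
from fig_code import plot_pca_interactive  # assumed import path; adjust to the actual module name

digits = load_digits()                      # 8x8 grayscale digit images, flattened to 64 features
plot_pca_interactive(digits.data, n_components=6)  # the slider picks which image to decompose
```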
--------------------------------------------------------------------------------
/fig_code/helpers.py:
--------------------------------------------------------------------------------
1 | """
2 | Small helpers for code that is not shown in the notebooks
3 | """
4 |
5 | from sklearn import neighbors, datasets, linear_model
6 | import pylab as pl
7 | import numpy as np
8 | from matplotlib.colors import ListedColormap
9 |
10 | # Create color maps for 3-class classification problem, as with iris
11 | cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
12 | cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])
13 |
14 | def plot_iris_knn():
15 | iris = datasets.load_iris()
16 | X = iris.data[:, :2] # we only take the first two features. We could
17 | # avoid this ugly slicing by using a two-dim dataset
18 | y = iris.target
19 |
20 | knn = neighbors.KNeighborsClassifier(n_neighbors=3)
21 | knn.fit(X, y)
22 |
23 | x_min, x_max = X[:, 0].min() - .1, X[:, 0].max() + .1
24 | y_min, y_max = X[:, 1].min() - .1, X[:, 1].max() + .1
25 | xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
26 | np.linspace(y_min, y_max, 100))
27 | Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])
28 |
29 | # Put the result into a color plot
30 | Z = Z.reshape(xx.shape)
31 | pl.figure()
32 | pl.pcolormesh(xx, yy, Z, cmap=cmap_light)
33 |
34 | # Plot also the training points
35 | pl.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold)
36 | pl.xlabel('sepal length (cm)')
37 | pl.ylabel('sepal width (cm)')
38 | pl.axis('tight')
39 |
40 |
41 | def plot_polynomial_regression():
42 | rng = np.random.RandomState(0)
43 | x = 2*rng.rand(100) - 1
44 |
45 |     f = lambda t: 1.2 * t**2 + .1 * t**3 - .4 * t**5 - .5 * t**9
46 | y = f(x) + .4 * rng.normal(size=100)
47 |
48 | x_test = np.linspace(-1, 1, 100)
49 |
50 | pl.figure()
51 | pl.scatter(x, y, s=4)
52 |
53 | X = np.array([x**i for i in range(5)]).T
54 | X_test = np.array([x_test**i for i in range(5)]).T
55 | regr = linear_model.LinearRegression()
56 | regr.fit(X, y)
57 | pl.plot(x_test, regr.predict(X_test), label='4th order')
58 |
59 | X = np.array([x**i for i in range(10)]).T
60 | X_test = np.array([x_test**i for i in range(10)]).T
61 | regr = linear_model.LinearRegression()
62 | regr.fit(X, y)
63 | pl.plot(x_test, regr.predict(X_test), label='9th order')
64 |
65 | pl.legend(loc='best')
66 | pl.axis('tight')
67 | pl.title('Fitting a 4th and a 9th order polynomial')
68 |
69 | pl.figure()
70 | pl.scatter(x, y, s=4)
71 | pl.plot(x_test, f(x_test), label="truth")
72 | pl.axis('tight')
73 | pl.title('Ground truth (9th order polynomial)')
74 |
75 |
76 |
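In `plot_polynomial_regression` above, the design matrices are built by hand with `[x**i for i in range(5)]` (powers 0 through 4, hence the "4th order" label). As a side note, a sketch not part of the file, the same matrices can be produced with `np.vander` or scikit-learn's `PolynomialFeatures`:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(-1, 1, 5)

# Same columns as np.array([x**i for i in range(5)]).T: powers 0 through 4
X_vander = np.vander(x, 5, increasing=True)

# For a single feature, PolynomialFeatures gives the same matrix
# (include_bias=True adds the x**0 column)
X_poly = PolynomialFeatures(degree=4, include_bias=True).fit_transform(x[:, None])

assert np.allclose(X_vander, X_poly)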
--------------------------------------------------------------------------------
/fig_code/linear_regression.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib.pyplot as plt
3 | from sklearn.linear_model import LinearRegression
4 |
5 |
6 | def plot_linear_regression():
7 | a = 0.5
8 | b = 1.0
9 |
10 |     # x uniformly distributed between 0 and 30
11 | x = 30 * np.random.random(20)
12 |
13 | # y = a*x + b with noise
14 | y = a * x + b + np.random.normal(size=x.shape)
15 |
16 |     # create and fit a linear regression model (a regressor, despite the `clf` name)
17 |     clf = LinearRegression()
18 |     clf.fit(x[:, None], y)  # x[:, None] reshapes (20,) into the 2D (20, 1) array sklearn expects
19 |
20 | # predict y from the data
21 | x_new = np.linspace(0, 30, 100)
22 | y_new = clf.predict(x_new[:, None])
23 |
24 | # plot the results
25 | ax = plt.axes()
26 | ax.scatter(x, y)
27 | ax.plot(x_new, y_new)
28 |
29 | ax.set_xlabel('x')
30 | ax.set_ylabel('y')
31 |
32 | ax.axis('tight')
33 |
34 |
35 | if __name__ == '__main__':
36 | plot_linear_regression()
37 | plt.show()
38 |
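Since the script draws data from y = 0.5*x + 1.0 plus Gaussian noise, a quick sanity check (a sketch, not part of the file) is to fit the same model on freshly generated data and compare the learned `coef_` and `intercept_` with those true values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = 30 * rng.random(20)                        # same recipe as the script: x in [0, 30)
y = 0.5 * x + 1.0 + rng.normal(size=x.shape)   # a = 0.5, b = 1.0 plus noise

model = LinearRegression().fit(x[:, None], y)  # same 2D-reshape trick as the script
print(model.coef_[0], model.intercept_)        # should land close to 0.5 and 1.0
```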
--------------------------------------------------------------------------------
/fig_code/sgd_separator.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib.pyplot as plt
3 | from sklearn.linear_model import SGDClassifier
4 | from sklearn.datasets import make_blobs
5 |
6 | def plot_sgd_separator():
7 | # we create 50 separable points
8 | X, Y = make_blobs(n_samples=50, centers=2,
9 | random_state=0, cluster_std=0.60)
10 |
11 | # fit the model
12 |     clf = SGDClassifier(loss="hinge", alpha=0.01,
13 |                         max_iter=200, fit_intercept=True)
14 | clf.fit(X, Y)
15 |
16 | # plot the line, the points, and the nearest vectors to the plane
17 | xx = np.linspace(-1, 5, 10)
18 | yy = np.linspace(-1, 5, 10)
19 |
20 | X1, X2 = np.meshgrid(xx, yy)
21 | Z = np.empty(X1.shape)
22 | for (i, j), val in np.ndenumerate(X1):
23 | x1 = val
24 | x2 = X2[i, j]
25 | p = clf.decision_function(np.array([[x1, x2]]))
26 | Z[i, j] = p[0]
27 | levels = [-1.0, 0.0, 1.0]
28 | linestyles = ['dashed', 'solid', 'dashed']
29 | colors = 'k'
30 |
31 | ax = plt.axes()
32 | ax.contour(X1, X2, Z, levels, colors=colors, linestyles=linestyles)
33 | ax.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired)
34 |
35 | ax.axis('tight')
36 |
37 |
38 | if __name__ == '__main__':
39 | plot_sgd_separator()
40 | plt.show()
41 |
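The per-point `np.ndenumerate` loop above works, but `decision_function` accepts a whole 2D array, so the grid can be evaluated in a single call, mirroring the `np.c_` pattern used in `plot_iris_knn`. A self-contained sketch (not part of the file):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier

# Same setup as plot_sgd_separator: 50 separable points, linear SVM via SGD
X, Y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.60)
clf = SGDClassifier(loss="hinge", alpha=0.01, max_iter=200, fit_intercept=True).fit(X, Y)

xx = np.linspace(-1, 5, 10)
yy = np.linspace(-1, 5, 10)
X1, X2 = np.meshgrid(xx, yy)

# Stack the grid coordinates into an (n_points, 2) array, evaluate the decision
# function in one call, then reshape back to the grid shape for contouring.
Z = clf.decision_function(np.c_[X1.ravel(), X2.ravel()]).reshape(X1.shape)
```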
--------------------------------------------------------------------------------
/fig_code/svm_gui.py:
--------------------------------------------------------------------------------
1 | """
2 | ==========
3 | Libsvm GUI
4 | ==========
5 |
6 | A simple graphical frontend for Libsvm mainly intended for didactic
7 | purposes. You can create data points by point and click and visualize
8 | the decision region induced by different kernels and parameter settings.
9 |
10 | To create positive examples click the left mouse button; to create
11 | negative examples click the right button.
12 |
13 | If all examples are from the same class, it uses a one-class SVM.
14 |
15 | """
16 | from __future__ import division, print_function
17 |
18 | print(__doc__)
19 |
20 | # Author: Peter Prettenhoer