├── AdaBoost └── Adaboost.ipynb ├── DecisonTree ├── DT.ipynb ├── dt.py └── mytree.pdf ├── EM └── em.ipynb ├── KNearestNeighbors └── KNN.ipynb ├── LeastSquaresMethod ├── README.md └── least_sqaure_method.ipynb ├── LogisticRegression └── LR.ipynb ├── NaiveBayes └── GaussianNB.ipynb ├── Perceptron ├── Iris_perceptron.ipynb └── README.md ├── README.md └── SVM ├── .ipynb_checkpoints └── support-vector-machine-checkpoint.ipynb └── support-vector-machine.ipynb /AdaBoost/Adaboost.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "collapsed": true 7 | }, 8 | "source": [ 9 | "# Boost\n", 10 | "\n", 11 | "Bagging and boosting are the two main approaches to building ensemble models. An ensemble model is composed of several base models, and its predictions are usually better than those of any single base model.\n", 12 | "\n", 13 | "- Bagging: each base model is trained on a different dataset obtained by randomly resampling the full sample; this resampling process is called bagging.\n", 14 | "\n", 15 | "- Boosting: each base model is trained on a reweighted dataset, where the samples misclassified by the previous base model receive larger weights, so that the new model focuses on the misclassified samples.\n", 16 | "\n", 17 | "### AdaBoost\n", 18 | "\n", 19 | "AdaBoost is short for Adaptive Boosting, i.e. a boosting algorithm that adapts to the training data.\n", 20 | "\n", 21 | "The algorithm proceeds as follows:\n", 22 | "\n", 23 | "1) Assign a weight to each training sample (x1, x2, ..., xN); the initial weights $w_{1}$ are all 1/N.\n", 24 | "\n", 25 | "2) Train on the weighted samples to obtain model $G_m$ (the initial model is G1).\n", 26 | "\n", 27 | "3) Compute the misclassification rate of model $G_m$: $e_m=\sum_{i=1}^Nw_iI(y_i\not= G_m(x_i))$\n", 28 | "\n", 29 | "4) Compute the coefficient of model $G_m$: $\alpha_m=0.5\log[(1-e_m)/e_m]$\n", 30 | "\n", 31 | "5) Update the weight vector $w_{m+1}$ from the misclassification rate $e_m$ and the current weight vector $w_m$.\n", 32 | "\n", 33 | "6) Compute the misclassification rate of the combined model $f(x)=\sum_{m=1}^M\alpha_mG_m(x)$.\n", 34 | "\n", 35 | "7) Stop when the combined model's misclassification rate falls below a given threshold or the iteration limit is reached; otherwise, go back to step 2)" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 1, 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "import numpy as np\n", 45 | "import pandas as pd\n", 46 | "from sklearn.datasets import load_iris\n", 47 | "from sklearn.model_selection import train_test_split\n", 48 | "import matplotlib.pyplot as plt\n", 49 | "%matplotlib inline" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | 
"execution_count": 2, 55 | "metadata": { 56 | "collapsed": true 57 | }, 58 | "outputs": [], 59 | "source": [ 60 | "# data\n", 61 | "def create_data():\n", 62 | " iris = load_iris()\n", 63 | " df = pd.DataFrame(iris.data, columns=iris.feature_names)\n", 64 | " df['label'] = iris.target\n", 65 | " df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']\n", 66 | " data = np.array(df.iloc[:100, [0, 1, -1]])\n", 67 | " for i in range(len(data)):\n", 68 | " if data[i,-1] == 0:\n", 69 | " data[i,-1] = -1\n", 70 | " # print(data)\n", 71 | " return data[:,:2], data[:,-1]" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 3, 77 | "metadata": { 78 | "collapsed": true 79 | }, 80 | "outputs": [], 81 | "source": [ 82 | "X, y = create_data()\n", 83 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 4, 89 | "metadata": {}, 90 | "outputs": [ 91 | { 92 | "data": { 93 | "text/plain": [ 94 | "" 95 | ] 96 | }, 97 | "execution_count": 4, 98 | "metadata": {}, 99 | "output_type": "execute_result" 100 | }, 101 | { 102 | "data": { 103 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGZ9JREFUeJzt3X9sHOWdx/H394yv8bWAReMWsJNLCihqSSLSugTICXGg\nXkqaQoRQlAiKQhE5ELpS0aNqKtQfqBJISLRQdEQBdBTBBeVoGihHgjgoKkUklZMg5y4pKhxtY8MV\nE5TQHKYE93t/7Dqx12vvzu6O93me/bwky97Zyfj7zMA3m5nPPGPujoiIpOWvml2AiIg0npq7iEiC\n1NxFRBKk5i4ikiA1dxGRBKm5i4gkSM1dRCRBau4iIglScxcRSdBx1a5oZm1AHzDo7stL3rsAeBx4\nvbhos7vfOtX2Zs6c6XPmzMlUrIhIq9u5c+fb7t5Vab2qmztwI7APOGGS918obfpTmTNnDn19fRl+\nvYiImNnvq1mvqtMyZtYDfAm4v56iRERkelR7zv1HwDeBv0yxznlm1m9mW83szHIrmNlaM+szs76h\noaGstYqISJUqNnczWw685e47p1htFzDb3RcCPwa2lFvJ3Te4e6+793Z1VTxlJCIiNarmnPsS4BIz\nWwbMAE4ws4fd/crRFdz93TE/P2Vm/2JmM9397caXLCJSnyNHjjAwMMD777/f7FImNWPGDHp6emhv\nb6/pz1ds7u6+DlgHR1Mx/zy2sReXnwz80d3dzM6m8C+CAzVVJCKSs4GBAY4//njmzJmDmTW7nAnc\nnQMHDjAwMMDcuXNr2kaWtMw4ZnZdsYj1wOXA9Wb2ITAMrHI9BUREAvX+++8H29gBzIyPf/zj1HNt\nMlNzd/fngeeLP68fs/we4J6aqxAJ2Jbdg9zx9Cu8cXCYUzs7uHnpPFYs6m52WVKnUBv7qHrrq/mT\nu0gr2LJ7kHWb9zB8ZASAwYPDrNu8B0ANXoKm6QdEpnDH068cbeyjho+McMfTrzSpIknFtm3bmDdv\nHqeffjq33357w7ev5i4yhTcODmdaLlKNkZERbrjhBrZu3crevXvZuHEje/fubejv0GkZkSmc2tnB\nYJlGfmpnRxOqkWZp9HWXX//615x++ul86lOfAmDVqlU8/vjjfOYzn2lUyfrkLjKVm5fOo6O9bdyy\njvY2bl46r0kVyXQbve4yeHAY59h1ly27B2ve5uDgILNmzTr6uqenh8HB2rdXjpq7yBRWLOrmtssW\n0N3ZgQHdnR3cdtkCXUxtIbFed9FpGZEKVizqVjNvYXlcd+nu7mb//v1HXw8MDNDd3dj/xvTJXURk\nCpNdX6nnusvnP/95fvvb3/L666/zwQcf8Oijj3LJJZfUvL1y1NxFRKaQx3WX4447jnvuuYelS5fy\n6U9/mpUrV3LmmWUn0639dzR0ayIiiRk9Jdfou5SXLVvGsmXLGlFiWWruIiIVxHjdRadlREQSpOYu\nIpIgNXcRkQSpuYuIJEjNXUQkQWrukowtuwdZcvtzzP3Wf7Dk9ufqmvtDJG9f/epX+cQnPsH8+fNz\n2b6auyQhj8mdRPK0Zs0atm3bltv21dwlCbFO7iSR6N8EP5wP3+ssfO/fVPcmzz//fE466aQGFFee\nbmKSJOihGpKb/k3w86/BkeJ/S4f2F14DLFzZvLoq0Cd3SUIekzuJAPDsrcca+6gjw4XlAVNzlyTo\noRqSm0MD2ZYHQqdlJAl5Te4kwok9hVMx5ZYHTM1dkhHj5E4SgYu+M/6cO0B7R2F5HVavXs3zzz/P\n22+/TU9PD9///ve55ppr6iz2GDV3qVujHx4sEpTRi6bP3lo4FXNiT6Gx13kxdePGjQ0obnJq7lKX\n0Xz5aAxxNF8OqMFLOhauDDoZU44uqEpdlC8XCZOau9RF+XKJlbs3u4Qp1VufmrvURflyidGMGTM4\ncOBAsA3e3Tlw4AAzZsyoeRs65y51uXnpvHHn3EH5cglfT08PA
wMDDA0NNbuUSc2YMYOentrjlmru\nUhflyyVG7e3tzJ07t9ll5Krq5m5mbUAfMOjuy0veM+AuYBnwHrDG3Xc1slAJl/LlIuHJ8sn9RmAf\ncEKZ9y4Gzih+LQbuLX4XaSnK/EsoqrqgamY9wJeA+ydZ5VLgIS/YDnSa2SkNqlEkCppTXkJSbVrm\nR8A3gb9M8n43MHbyhYHiMpGWocy/hKRiczez5cBb7r6z3l9mZmvNrM/M+kK+Si1SC2X+JSTVfHJf\nAlxiZr8DHgUuNLOHS9YZBGaNed1TXDaOu29w91537+3q6qqxZJEwKfMvIanY3N19nbv3uPscYBXw\nnLtfWbLaE8BVVnAOcMjd32x8uSLh0pzyEpKac+5mdh2Au68HnqIQg3yVQhTy6oZUJxIRZf4lJNas\n2297e3u9r6+vKb9bRCRWZrbT3Xsrrac7VCVYt2zZw8Yd+xlxp82M1Ytn8YMVC5pdlkgU1NwlSLds\n2cPD2/9w9PWI+9HXavAilWlWSAnSxh1lnlk5xXIRGU/NXYI0Msm1oMmWi8h4au4SpDazTMtFZDw1\ndwnS6sWzMi0XkfF0QVWCNHrRVGkZkdoo5y4iEhHl3KUuV9z3Ei++9s7R10tOO4lHrj23iRU1j+Zo\nlxjpnLtMUNrYAV587R2uuO+lJlXUPJqjXWKl5i4TlDb2SstTpjnaJVZq7iJT0BztEis1d5EpaI52\niZWau0yw5LSTMi1PmeZol1ipucsEj1x77oRG3qppmRWLurntsgV0d3ZgQHdnB7ddtkBpGQmecu4i\nIhFRzl3qkle2O8t2lS8XqZ2au0wwmu0ejQCOZruBupprlu3mVYNIq9A5d5kgr2x3lu0qXy5SHzV3\nmSCvbHeW7SpfLlIfNXeZIK9sd5btKl8uUh81d5kgr2x3lu0qXy5SH11QlQlGL1g2OqmSZbt51SDS\nKpRzFxGJiHLuOYsxgx1jzSJSGzX3GsSYwY6xZhGpnS6o1iDGDHaMNYtI7dTcaxBjBjvGmkWkdmru\nNYgxgx1jzSJSOzX3GsSYwY6xZhGpnS6o1iDGDHaMNYtI7Srm3M1sBvBL4CMU/jJ4zN2/W7LOBcDj\nwOvFRZvd/daptqucu4hIdo3Muf8ZuNDdD5tZO/ArM9vq7ttL1nvB3ZfXUqxMj1u27GHjjv2MuNNm\nxurFs/jBigV1rxtKfj6UOkRCULG5e+Gj/eHiy/biV3Nua5Wa3bJlDw9v/8PR1yPuR1+XNu0s64aS\nnw+lDpFQVHVB1czazOxl4C3gGXffUWa188ys38y2mtmZDa1S6rZxx/6ql2dZN5T8fCh1iISiqubu\n7iPufhbQA5xtZvNLVtkFzHb3hcCPgS3ltmNma82sz8z6hoaG6qlbMhqZ5NpKueVZ1g0lPx9KHSKh\nyBSFdPeDwC+AL5Ysf9fdDxd/fgpoN7OZZf78Bnfvdfferq6uOsqWrNrMql6eZd1Q8vOh1CESiorN\n3cy6zKyz+HMH8AXgNyXrnGxW+D/fzM4ubvdA48uVWq1ePKvq5VnWDSU/H0odIqGoJi1zCvATM2uj\n0LQ3ufuTZnYdgLuvBy4HrjezD4FhYJU3ay5hKWv0Qmg1CZgs64aSnw+lDpFQaD53EZGIaD73nOWV\nqc6SL89z21nGF+O+iE7/Jnj2Vjg0ACf2wEXfgYUrm12VBEzNvQZ5Zaqz5Mvz3HaW8cW4L6LTvwl+\n/jU4Ukz+HNpfeA1q8DIpTRxWg7wy1Vny5XluO8v4YtwX0Xn21mONfdSR4cJykUmoudcgr0x1lnx5\nntvOMr4Y90V0Dg1kWy6CmntN8spUZ8mX57ntLOOLcV9E58SebMtFUHOvSV6Z6iz58jy3nWV8Me6L\n6Fz0HWgv+cuyvaOwXGQSuqBag7wy1Vny5XluO8v4YtwX0Rm9aKq0jGSgnLuISESUc5cJQsiuS+SU\nt4+GmnuLCCG7LpFT3j4qu
qDaIkLIrkvklLePipp7iwghuy6RU94+KmruLSKE7LpETnn7qKi5t4gQ\nsusSOeXto6ILqi0ihOy6RE55+6go5y4iEhHl3Ivyymtn2W4o85Irux6Y1DPjqY8viybsi6Sbe155\n7SzbDWVecmXXA5N6Zjz18WXRpH2R9AXVvPLaWbYbyrzkyq4HJvXMeOrjy6JJ+yLp5p5XXjvLdkOZ\nl1zZ9cCknhlPfXxZNGlfJN3c88prZ9luKPOSK7semNQz46mPL4sm7Yukm3teee0s2w1lXnJl1wOT\nemY89fFl0aR9kfQF1bzy2lm2G8q85MquByb1zHjq48uiSftCOXcRkYgo556zEPLzV9z3Ei++9s7R\n10tOO4lHrj237hpEkvLkTbDzQfARsDb43BpYfmf92w08x5/0Ofe8jGbGBw8O4xzLjG/ZPTht2y1t\n7AAvvvYOV9z3Ul01iCTlyZug74FCY4fC974HCsvrMZpdP7Qf8GPZ9f5NdZfcKGruNQghP1/a2Cst\nF2lJOx/MtrxaEeT41dxrEEJ+XkSq4CPZllcrghy/mnsNQsjPi0gVrC3b8mpFkONXc69BCPn5Jaed\nVHYbky0XaUmfW5NtebUiyPGruddgxaJubrtsAd2dHRjQ3dnBbZctaEh+vtrtPnLtuRMaudIyIiWW\n3wm91xz7pG5thdf1pmUWroQv3w0nzgKs8P3LdweVllHOXUQkIg3LuZvZDOCXwEeK6z/m7t8tWceA\nu4BlwHvAGnffVUvhlWTNl8c2h3mWud9T3xe55oizZJ/zqiPP8QWewa5L1rGlvC+mUM1NTH8GLnT3\nw2bWDvzKzLa6+/Yx61wMnFH8WgzcW/zeUFnnJI9tDvMsc7+nvi9ynQN7NPs8ajT7DBMbfF515Dm+\nlOdSzzq2lPdFBRXPuXvB4eLL9uJX6bmcS4GHiutuBzrN7JTGlpo9Xx7bHOZZ5n5PfV/kmiPOkn3O\nq448xxdBBrtmWceW8r6ooKoLqmbWZmYvA28Bz7j7jpJVuoGxHWiguKx0O2vNrM/M+oaGhjIXmzUH\nHltuPMvc76nvi1xzxFmyz3nVkef4Ishg1yzr2FLeFxVU1dzdfcTdzwJ6gLPNbH4tv8zdN7h7r7v3\ndnV1Zf7zWXPgseXGs8z9nvq+yDVHnCX7nFcdeY4vggx2zbKOLeV9UUGmKKS7HwR+AXyx5K1BYOwE\n5T3FZQ2VNV8e2xzmWeZ+T31f5JojzpJ9zquOPMcXQQa7ZlnHlvK+qKBiczezLjPrLP7cAXwB+E3J\nak8AV1nBOcAhd3+z0cVmzZfnlUfPyw9WLODKc2Yf/aTeZsaV58wum5ZJfV/kmiPOkn3Oq448xxdB\nBrtmWceW8r6ooGLO3cwWAj8B2ij8ZbDJ3W81s+sA3H19MQp5D4VP9O8BV7v7lCF25dxFRLJrWM7d\n3fuBRWWWrx/zswM3ZC1SRETykfzDOqK7cUemR5YbW0K4CSbPG3diu0krhOMRgaSbe3Q37sj0yHJj\nSwg3weR5405sN2mFcDwikfTEYdHduCPTI8uNLSHcBJPnjTux3aQVwvGIRNLNPbobd2R6ZLmxJYSb\nYPK8cSe2m7RCOB6RSLq5R3fjjkyPLDe2hHATTJ437sR2k1YIxyMSSTf36G7ckemR5caWEG6CyfPG\nndhu0grheEQi6eYe3Y07Mj2y3NgSwk0wed64E9tNWiEcj0joYR0iIhFp2E1MIi0vy4M9QhFbzaFk\n10OpowHU3EWmkuXBHqGIreZQsuuh1NEgSZ9zF6lblgd7hCK2mkPJrodSR4OouYtMJcuDPUIRW82h\nZNdDqaNB1NxFppLlwR6hiK3mULLrodTRIGruIlPJ8mCPUMRWcyjZ9VDqaBA1d5GpZHmwRyhiqzmU\n7HoodTSIcu4iIhFRzl2mT4zZ4LxqzitfHuM+lqZSc5f6xJgNzqvmvPLlMe5jaTqdc5f6xJg
Nzqvm\nvPLlMe5jaTo1d6lPjNngvGrOK18e4z6WplNzl/rEmA3Oq+a88uUx7mNpOjV3qU+M2eC8as4rXx7j\nPpamU3OX+sSYDc6r5rzy5THuY2k65dxFRCJSbc5dn9wlHf2b4Ifz4Xudhe/9m6Z/u3nVIJKRcu6S\nhryy4Fm2qzy6BESf3CUNeWXBs2xXeXQJiJq7pCGvLHiW7SqPLgFRc5c05JUFz7Jd5dElIGrukoa8\nsuBZtqs8ugREzV3SkFcWPMt2lUeXgFTMuZvZLOAh4JOAAxvc/a6SdS4AHgdeLy7a7O5TXkVSzl1E\nJLtGzuf+IfANd99lZscDO83sGXffW7LeC+6+vJZiJUAxzh+epeYYxxcC7bdoVGzu7v4m8Gbx5z+Z\n2T6gGyht7pKKGPPayqPnT/stKpnOuZvZHGARsKPM2+eZWb+ZbTWzMxtQmzRLjHlt5dHzp/0Wlarv\nUDWzjwE/Bb7u7u+WvL0LmO3uh81sGbAFOKPMNtYCawFmz55dc9GSsxjz2sqj50/7LSpVfXI3s3YK\njf0Rd99c+r67v+vuh4s/PwW0m9nMMuttcPded+/t6uqqs3TJTYx5beXR86f9FpWKzd3MDHgA2Ofu\nZecuNbOTi+thZmcXt3ugkYXKNIoxr608ev6036JSzWmZJcBXgD1m9nJx2beB2QDuvh64HLjezD4E\nhoFV3qy5hKV+oxfHYkpFZKk5xvGFQPstKprPXUQkIo3MuUuolDke78mbYOeDhQdSW1vh8Xb1PgVJ\nJFJq7rFS5ni8J2+CvgeOvfaRY6/V4KUFaW6ZWClzPN7OB7MtF0mcmnuslDkez0eyLRdJnJp7rJQ5\nHs/asi0XSZyae6yUOR7vc2uyLRdJnJp7rDR3+HjL74Tea459Ure2wmtdTJUWpZy7iEhElHOvwZbd\ng9zx9Cu8cXCYUzs7uHnpPFYs6m52WY2Tei4+9fGFQPs4GmruRVt2D7Ju8x6GjxTSFYMHh1m3eQ9A\nGg0+9Vx86uMLgfZxVHTOveiOp1852thHDR8Z4Y6nX2lSRQ2Wei4+9fGFQPs4KmruRW8cHM60PDqp\n5+JTH18ItI+jouZedGpnR6bl0Uk9F5/6+EKgfRwVNfeim5fOo6N9/A0vHe1t3Lx0XpMqarDUc/Gp\njy8E2sdR0QXVotGLpsmmZVKfizv18YVA+zgqyrmLiESk2py7TsuIxKB/E/xwPnyvs/C9f1Mc25am\n0WkZkdDlmS9Xdj1Z+uQuEro88+XKridLzV0kdHnmy5VdT5aau0jo8syXK7ueLDV3kdDlmS9Xdj1Z\nau4ioctz7n49FyBZyrmLiEREOXcRkRam5i4ikiA1dxGRBKm5i4gkSM1dRCRBau4iIglScxcRSZCa\nu4hIgio2dzObZWa/MLO9ZvbfZnZjmXXMzO42s1fNrN/MPptPuVIXzdst0jKqmc/9Q+Ab7r7LzI4H\ndprZM+6+d8w6FwNnFL8WA/cWv0soNG+3SEup+Mnd3d90913Fn/8E7ANKHyx6KfCQF2wHOs3slIZX\nK7XTvN0iLSXTOXczmwMsAnaUvNUN7B/zeoCJfwFgZmvNrM/M+oaGhrJVKvXRvN0iLaXq5m5mHwN+\nCnzd3d+t5Ze5+wZ373X33q6urlo2IbXSvN0iLaWq5m5m7RQa+yPuvrnMKoPArDGve4rLJBSat1uk\npVSTljHgAWCfu985yWpPAFcVUzPnAIfc/c0G1in10rzdIi2lmrTMEuArwB4ze7m47NvAbAB3Xw88\nBSwDXgXeA65ufKlSt4Ur1cxFWkTF5u7uvwKswjoO3NCookREpD66Q1VEJEFq7iIiCVJzFxFJkJq7\niEiC1NxFRBKk5i4ikiA1dxGRBFkhot6EX2w2BPy+Kb+8spnA280uIkcaX7xSHhtofNX4W3evODlX\n05p7yMysz917m11HXjS+eKU8NtD4GkmnZUREEqTmLiK
SIDX38jY0u4CcaXzxSnlsoPE1jM65i4gk\nSJ/cRUQS1NLN3czazGy3mT1Z5r0LzOyQmb1c/IrqkUVm9jsz21Osva/M+2Zmd5vZq2bWb2afbUad\ntapifLEfv04ze8zMfmNm+8zs3JL3Yz9+lcYX7fEzs3lj6n7ZzN41s6+XrJP78avmYR0puxHYB5ww\nyfsvuPvyaayn0f7e3SfL1F4MnFH8WgzcW/wek6nGB3Efv7uAbe5+uZn9NfA3Je/HfvwqjQ8iPX7u\n/gpwFhQ+QFJ45OjPSlbL/fi17Cd3M+sBvgTc3+xamuRS4CEv2A50mtkpzS5KwMxOBM6n8HhL3P0D\ndz9Yslq0x6/K8aXiIuA1dy+9YTP349eyzR34EfBN4C9TrHNe8Z9MW83szGmqq1Ec+E8z22lma8u8\n3w3sH/N6oLgsFpXGB/Eev7nAEPCvxdOG95vZR0vWifn4VTM+iPf4jbUK2Fhmee7HryWbu5ktB95y\n951TrLYLmO3uC4EfA1umpbjG+Tt3P4vCP/9uMLPzm11Qg1UaX8zH7zjgs8C97r4I+D/gW80tqaGq\nGV/Mxw+A4ummS4B/b8bvb8nmTuGh35eY2e+AR4ELzezhsSu4+7vufrj481NAu5nNnPZKa+Tug8Xv\nb1E433d2ySqDwKwxr3uKy6JQaXyRH78BYMDddxRfP0ahGY4V8/GrOL7Ij9+oi4Fd7v7HMu/lfvxa\nsrm7+zp373H3ORT+2fScu185dh0zO9nMrPjz2RT21YFpL7YGZvZRMzt+9GfgH4D/KlntCeCq4lX7\nc4BD7v7mNJdak2rGF/Pxc/f/Bfab2bzioouAvSWrRXv8qhlfzMdvjNWUPyUD03D8Wj0tM46ZXQfg\n7uuBy4HrzexDYBhY5fHc8fVJ4GfF/zeOA/7N3beVjO8pYBnwKvAecHWTaq1FNeOL+fgB/BPwSPGf\n9v8DXJ3Q8YPK44v6+BU/dHwB+Mcxy6b1+OkOVRGRBLXkaRkRkdSpuYuIJEjNXUQkQWruIiIJUnMX\nEUmQmruISILU3EVEEqTmLiKSoP8H2fNC9uxjMHwAAAAASUVORK5CYII=\n", 104 | "text/plain": [ 105 | "" 106 | ] 107 | }, 108 | "metadata": {}, 109 | "output_type": "display_data" 110 | } 111 | ], 112 | "source": [ 113 | "plt.scatter(X[:50,0],X[:50,1], label='0')\n", 114 | "plt.scatter(X[50:,0],X[50:,1], label='1')\n", 115 | "plt.legend()" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "----\n", 123 | "\n", 124 | "### AdaBoost in Python" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 59, 130 | "metadata": { 131 | "collapsed": true 132 | }, 133 | "outputs": [], 134 | "source": [ 135 | "class AdaBoost:\n", 136 | " def __init__(self, n_estimators=50, learning_rate=1.0):\n", 137 | " self.clf_num = n_estimators\n", 138 | " self.learning_rate = learning_rate\n", 139 | " \n", 140 | " def init_args(self, datasets, labels):\n", 141 | " \n", 142 | " self.X = datasets\n", 143 
| " self.Y = labels\n", 144 | " self.M, self.N = datasets.shape\n", 145 | " \n", 146 | " # number and set of weak classifiers\n", 147 | " self.clf_sets = []\n", 148 | " \n", 149 | " # initialize sample weights\n", 150 | " self.weights = [1.0/self.M]*self.M\n", 151 | " \n", 152 | " # G(x) coefficients alpha\n", 153 | " self.alpha = []\n", 154 | " \n", 155 | " def _G(self, features, labels, weights):\n", 156 | " m = len(features)\n", 157 | " error = 100000.0 # large sentinel, effectively infinity\n", 158 | " best_v = 0.0\n", 159 | " # a single feature dimension\n", 160 | " features_min = min(features)\n", 161 | " features_max = max(features)\n", 162 | " n_step = (features_max - features_min + self.learning_rate) // self.learning_rate\n", 163 | " # print('n_step:{}'.format(n_step))\n", 164 | " direct, compare_array = None, None\n", 165 | " for i in range(1, int(n_step)):\n", 166 | " v = features_min + self.learning_rate * i\n", 167 | " \n", 168 | " if v not in features:\n", 169 | " # weighted misclassification error for both threshold directions\n", 170 | " compare_array_positive = np.array([1 if features[k] > v else -1 for k in range(m)])\n", 171 | " weight_error_positive = sum([weights[k] for k in range(m) if compare_array_positive[k] != labels[k]])\n", 172 | " \n", 173 | " compare_array_nagetive = np.array([-1 if features[k] > v else 1 for k in range(m)])\n", 174 | " weight_error_nagetive = sum([weights[k] for k in range(m) if compare_array_nagetive[k] != labels[k]])\n", 175 | "\n", 176 | " if weight_error_positive < weight_error_nagetive:\n", 177 | " weight_error = weight_error_positive\n", 178 | " _compare_array = compare_array_positive\n", 179 | " direct = 'positive'\n", 180 | " else:\n", 181 | " weight_error = weight_error_nagetive\n", 182 | " _compare_array = compare_array_nagetive\n", 183 | " direct = 'nagetive'\n", 184 | " \n", 185 | " # print('v:{} error:{}'.format(v, weight_error))\n", 186 | " if weight_error < error:\n", 187 | " error = weight_error\n", 188 | " compare_array = _compare_array\n", 189 | " best_v = v\n", 190 | " return best_v, direct, error, compare_array\n", 191 | " \n", 192 | " # compute alpha\n", 
193 | " def _alpha(self, error):\n", 194 | " return 0.5 * np.log((1-error)/error)\n", 195 | " \n", 196 | " # 规范化因子\n", 197 | " def _Z(self, weights, a, clf):\n", 198 | " return sum([weights[i]*np.exp(-1*a*self.Y[i]*clf[i]) for i in range(self.M)])\n", 199 | " \n", 200 | " # 权值更新\n", 201 | " def _w(self, a, clf, Z):\n", 202 | " for i in range(self.M):\n", 203 | " self.weights[i] = self.weights[i]*np.exp(-1*a*self.Y[i]*clf[i])/ Z\n", 204 | " \n", 205 | " # G(x)的线性组合\n", 206 | " def _f(self, alpha, clf_sets):\n", 207 | " pass\n", 208 | " \n", 209 | " def G(self, x, v, direct):\n", 210 | " if direct == 'positive':\n", 211 | " return 1 if x > v else -1 \n", 212 | " else:\n", 213 | " return -1 if x > v else 1 \n", 214 | " \n", 215 | " def fit(self, X, y):\n", 216 | " self.init_args(X, y)\n", 217 | " \n", 218 | " for epoch in range(self.clf_num):\n", 219 | " best_clf_error, best_v, clf_result = 100000, None, None\n", 220 | " # 根据特征维度, 选择误差最小的\n", 221 | " for j in range(self.N):\n", 222 | " features = self.X[:, j]\n", 223 | " # 分类阈值,分类误差,分类结果\n", 224 | " v, direct, error, compare_array = self._G(features, self.Y, self.weights)\n", 225 | " \n", 226 | " if error < best_clf_error:\n", 227 | " best_clf_error = error\n", 228 | " best_v = v\n", 229 | " final_direct = direct\n", 230 | " clf_result = compare_array\n", 231 | " axis = j\n", 232 | " \n", 233 | " # print('epoch:{}/{} feature:{} error:{} v:{}'.format(epoch, self.clf_num, j, error, best_v))\n", 234 | " if best_clf_error == 0:\n", 235 | " break\n", 236 | " \n", 237 | " # 计算G(x)系数a\n", 238 | " a = self._alpha(best_clf_error)\n", 239 | " self.alpha.append(a)\n", 240 | " # 记录分类器\n", 241 | " self.clf_sets.append((axis, best_v, final_direct))\n", 242 | " # 规范化因子\n", 243 | " Z = self._Z(self.weights, a, clf_result)\n", 244 | " # 权值更新\n", 245 | " self._w(a, clf_result, Z)\n", 246 | " \n", 247 | "# print('classifier:{}/{} error:{:.3f} v:{} direct:{} a:{:.5f}'.format(epoch+1, self.clf_num, error, best_v, final_direct, a))\n", 248 
| "# print('weight:{}'.format(self.weights))\n", 249 | "# print('\\n')\n", 250 | " \n", 251 | " def predict(self, feature):\n", 252 | " result = 0.0\n", 253 | " for i in range(len(self.clf_sets)):\n", 254 | " axis, clf_v, direct = self.clf_sets[i]\n", 255 | " f_input = feature[axis]\n", 256 | " result += self.alpha[i] * self.G(f_input, clf_v, direct)\n", 257 | " # sign\n", 258 | " return 1 if result > 0 else -1\n", 259 | " \n", 260 | " def score(self, X_test, y_test):\n", 261 | " right_count = 0\n", 262 | " for i in range(len(X_test)):\n", 263 | " feature = X_test[i]\n", 264 | " if self.predict(feature) == y_test[i]:\n", 265 | " right_count += 1\n", 266 | " \n", 267 | " return right_count / len(X_test)" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "### 例8.1" 275 | ] 276 | }, 277 | { 278 | "cell_type": "code", 279 | "execution_count": 22, 280 | "metadata": { 281 | "collapsed": true 282 | }, 283 | "outputs": [], 284 | "source": [ 285 | "X = np.arange(10).reshape(10, 1)\n", 286 | "y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": 23, 292 | "metadata": {}, 293 | "outputs": [ 294 | { 295 | "name": "stdout", 296 | "output_type": "stream", 297 | "text": [ 298 | "classifier:1/3 error:0.300 v:2.5 direct:nagetive a:0.42365\n", 299 | "weight:[0.071428571428571425, 0.071428571428571425, 0.071428571428571425, 0.071428571428571425, 0.071428571428571425, 0.071428571428571425, 0.16666666666666663, 0.16666666666666663, 0.16666666666666663, 0.071428571428571425]\n", 300 | "\n", 301 | "\n", 302 | "classifier:2/3 error:0.214 v:8.5 direct:nagetive a:0.64964\n", 303 | "weight:[0.045454545454545463, 0.045454545454545463, 0.045454545454545463, 0.16666666666666669, 0.16666666666666669, 0.16666666666666669, 0.10606060606060606, 0.10606060606060606, 0.10606060606060606, 0.045454545454545463]\n", 304 | "\n", 305 | "\n", 306 | "classifier:3/3 error:0.182 v:5.5 
direct:nagetive a:0.75204\n", 307 | "weight:[0.12499999999999996, 0.12499999999999996, 0.12499999999999996, 0.10185185185185185, 0.10185185185185185, 0.10185185185185185, 0.064814814814814797, 0.064814814814814797, 0.064814814814814797, 0.12499999999999996]\n", 308 | "\n", 309 | "\n" 310 | ] 311 | } 312 | ], 313 | "source": [ 314 | "clf = AdaBoost(n_estimators=3, learning_rate=0.5)\n", 315 | "clf.fit(X, y)" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": 50, 321 | "metadata": { 322 | "collapsed": true 323 | }, 324 | "outputs": [], 325 | "source": [ 326 | "X, y = create_data()\n", 327 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)" 328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": 51, 333 | "metadata": {}, 334 | "outputs": [ 335 | { 336 | "data": { 337 | "text/plain": [ 338 | "0.8484848484848485" 339 | ] 340 | }, 341 | "execution_count": 51, 342 | "metadata": {}, 343 | "output_type": "execute_result" 344 | } 345 | ], 346 | "source": [ 347 | "clf = AdaBoost(n_estimators=10, learning_rate=0.2)\n", 348 | "clf.fit(X_train, y_train)\n", 349 | "clf.score(X_test, y_test)" 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": 60, 355 | "metadata": {}, 356 | "outputs": [ 357 | { 358 | "name": "stdout", 359 | "output_type": "stream", 360 | "text": [ 361 | "average score:63.061%\n" 362 | ] 363 | } 364 | ], 365 | "source": [ 366 | "# average score over 100 runs\n", 367 | "result = []\n", 368 | "for i in range(1, 101):\n", 369 | " X, y = create_data()\n", 370 | " X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)\n", 371 | " clf = AdaBoost(n_estimators=100, learning_rate=0.2)\n", 372 | " clf.fit(X_train, y_train)\n", 373 | " r = clf.score(X_test, y_test)\n", 374 | " # print('{}/100 score:{}'.format(i, r))\n", 375 | " result.append(r)\n", 376 | "\n", 377 | "print('average score:{:.3f}%'.format(100 * sum(result) / len(result)))" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | 
"metadata": {}, 383 | "source": [ 384 | "-----\n", 385 | "# sklearn.ensemble.AdaBoostClassifier\n", 386 | "\n", 387 | "- algorithm: only AdaBoostClassifier has this parameter, because scikit-learn implements two AdaBoost classification algorithms, SAMME and SAMME.R. They differ mainly in how weak-learner weights are measured: SAMME weights each weak learner by its classification performance on the sample set (the multi-class extension of the binary AdaBoost described above), while SAMME.R weights it using the predicted class probabilities. Because SAMME.R works with continuous probability values, it usually converges faster, so SAMME.R is the default value of algorithm. The default is generally fine, but note that with SAMME.R the weak learner base_estimator must support probability prediction; SAMME has no such restriction.\n", 388 | "\n", 389 | "- n_estimators: available in both AdaBoostClassifier and AdaBoostRegressor; the maximum number of boosting iterations, i.e. the maximum number of weak learners. Too few weak learners tend to underfit, too many tend to overfit, so a moderate value is usually chosen. The default is 50. In practice n_estimators is tuned together with learning_rate below.\n", 390 | "\n", 391 | "- learning_rate: available in both AdaBoostClassifier and AdaBoostRegressor; the weight-shrinkage coefficient ν applied to each weak learner\n", 392 | "\n", 393 | "- base_estimator: available in both AdaBoostClassifier and AdaBoostRegressor; the weak classification or regression learner. In principle any classifier or regressor can be used, as long as it supports sample weights. A CART decision tree or an MLP neural network is the usual choice." 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": 57, 399 | "metadata": {}, 400 | "outputs": [ 401 | { 402 | "data": { 403 | "text/plain": [ 404 | "AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None,\n", 405 | " learning_rate=0.5, n_estimators=100, random_state=None)" 406 | ] 407 | }, 408 | "execution_count": 57, 409 | "metadata": {}, 410 | "output_type": "execute_result" 411 | } 412 | ], 413 | "source": [ 414 | "from sklearn.ensemble import AdaBoostClassifier\n", 415 | "clf = AdaBoostClassifier(n_estimators=100, learning_rate=0.5)\n", 416 | "clf.fit(X_train, y_train)" 417 | ] 418 | }, 419 | { 420 | "cell_type": "code", 421 | "execution_count": 58, 422 | "metadata": {}, 423 | "outputs": [ 424 | { 425 | "data": { 426 | "text/plain": [ 427 | "0.90909090909090906" 428 | ] 429 | }, 430 | "execution_count": 58, 431 | "metadata": {}, 432 | "output_type": "execute_result" 433 | } 434 | ], 435 | "source": [ 436 | "clf.score(X_test, y_test)" 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 
441 | "execution_count": null, 442 | "metadata": { 443 | "collapsed": true 444 | }, 445 | "outputs": [], 446 | "source": [] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": null, 451 | "metadata": { 452 | "collapsed": true 453 | }, 454 | "outputs": [], 455 | "source": [] 456 | } 457 | ], 458 | "metadata": { 459 | "kernelspec": { 460 | "display_name": "Python 3", 461 | "language": "python", 462 | "name": "python3" 463 | }, 464 | "language_info": { 465 | "codemirror_mode": { 466 | "name": "ipython", 467 | "version": 3 468 | }, 469 | "file_extension": ".py", 470 | "mimetype": "text/x-python", 471 | "name": "python", 472 | "nbconvert_exporter": "python", 473 | "pygments_lexer": "ipython3", 474 | "version": "3.6.1" 475 | } 476 | }, 477 | "nbformat": 4, 478 | "nbformat_minor": 2 479 | } 480 | -------------------------------------------------------------------------------- /DecisonTree/DT.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Decision Tree\n", 8 | "\n", 9 | "- ID3 (based on information gain)\n", 10 | "- C4.5 (based on the information gain ratio)\n", 11 | "- CART (based on the Gini index)" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "#### entropy: $H(X) = -\sum_{i=1}^{n}p_i\log{p_i}$\n", 19 | "\n", 20 | "#### conditional entropy: $H(Y|X)=\sum_{x}P(x)H(Y|X=x)=-\sum_{x,y}P(x,y)\log{P(y|x)}$\n", 21 | "\n", 22 | "#### information gain: $g(D, A)=H(D)-H(D|A)$\n", 23 | "\n", 24 | "#### information gain ratio: $g_R(D, A) = \frac{g(D,A)}{H(A)}$\n", 25 | "\n", 26 | "#### gini index: $Gini(D)=\sum_{k=1}^{K}p_k(1-p_k)=1-\sum_{k=1}^{K}p_k^2$" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 1, 32 | "metadata": { 33 | "collapsed": true 34 | }, 35 | "outputs": [], 36 | "source": [ 37 | "import numpy as np\n", 38 | "import pandas as pd\n", 39 | "import matplotlib.pyplot as plt\n", 40 | "%matplotlib inline\n", 41 | "\n", 42 | "from 
sklearn.datasets import load_iris\n", 43 | "from sklearn.model_selection import train_test_split\n", 44 | "\n", 45 | "from collections import Counter\n", 46 | "import math\n", 47 | "from math import log\n", 48 | "\n", 49 | "import pprint" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "书上题目5.1" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 2, 62 | "metadata": { 63 | "collapsed": true 64 | }, 65 | "outputs": [], 66 | "source": [ 67 | "# 书上题目5.1\n", 68 | "def create_data():\n", 69 | " datasets = [['青年', '否', '否', '一般', '否'],\n", 70 | " ['青年', '否', '否', '好', '否'],\n", 71 | " ['青年', '是', '否', '好', '是'],\n", 72 | " ['青年', '是', '是', '一般', '是'],\n", 73 | " ['青年', '否', '否', '一般', '否'],\n", 74 | " ['中年', '否', '否', '一般', '否'],\n", 75 | " ['中年', '否', '否', '好', '否'],\n", 76 | " ['中年', '是', '是', '好', '是'],\n", 77 | " ['中年', '否', '是', '非常好', '是'],\n", 78 | " ['中年', '否', '是', '非常好', '是'],\n", 79 | " ['老年', '否', '是', '非常好', '是'],\n", 80 | " ['老年', '否', '是', '好', '是'],\n", 81 | " ['老年', '是', '否', '好', '是'],\n", 82 | " ['老年', '是', '否', '非常好', '是'],\n", 83 | " ['老年', '否', '否', '一般', '否'],\n", 84 | " ]\n", 85 | " labels = [u'年龄', u'有工作', u'有自己的房子', u'信贷情况', u'类别']\n", 86 | " # 返回数据集和每个维度的名称\n", 87 | " return datasets, labels" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": 3, 93 | "metadata": { 94 | "collapsed": true 95 | }, 96 | "outputs": [], 97 | "source": [ 98 | "datasets, labels = create_data()" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": 4, 104 | "metadata": { 105 | "collapsed": true 106 | }, 107 | "outputs": [], 108 | "source": [ 109 | "train_data = pd.DataFrame(datasets, columns=labels)" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 5, 115 | "metadata": {}, 116 | "outputs": [ 117 | { 118 | "data": { 119 | "text/html": [ 120 | "
\n", 121 | "(HTML table output omitted)
" 269 | ], 270 | "text/plain": [ 271 | " 年龄 有工作 有自己的房子 信贷情况 类别\n", 272 | "0 青年 否 否 一般 否\n", 273 | "1 青年 否 否 好 否\n", 274 | "2 青年 是 否 好 是\n", 275 | "3 青年 是 是 一般 是\n", 276 | "4 青年 否 否 一般 否\n", 277 | "5 中年 否 否 一般 否\n", 278 | "6 中年 否 否 好 否\n", 279 | "7 中年 是 是 好 是\n", 280 | "8 中年 否 是 非常好 是\n", 281 | "9 中年 否 是 非常好 是\n", 282 | "10 老年 否 是 非常好 是\n", 283 | "11 老年 否 是 好 是\n", 284 | "12 老年 是 否 好 是\n", 285 | "13 老年 是 否 非常好 是\n", 286 | "14 老年 否 否 一般 否" 287 | ] 288 | }, 289 | "execution_count": 5, 290 | "metadata": {}, 291 | "output_type": "execute_result" 292 | } 293 | ], 294 | "source": [ 295 | "train_data" 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": null, 301 | "metadata": { 302 | "collapsed": true 303 | }, 304 | "outputs": [], 305 | "source": [] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": 6, 310 | "metadata": { 311 | "collapsed": true 312 | }, 313 | "outputs": [], 314 | "source": [ 315 | "# 熵\n", 316 | "def calc_ent(datasets):\n", 317 | " data_length = len(datasets)\n", 318 | " label_count = {}\n", 319 | " for i in range(data_length):\n", 320 | " label = datasets[i][-1]\n", 321 | " if label not in label_count:\n", 322 | " label_count[label] = 0\n", 323 | " label_count[label] += 1\n", 324 | " ent = -sum([(p/data_length)*log(p/data_length, 2) for p in label_count.values()])\n", 325 | " return ent\n", 326 | "\n", 327 | "# 经验条件熵\n", 328 | "def cond_ent(datasets, axis=0):\n", 329 | " data_length = len(datasets)\n", 330 | " feature_sets = {}\n", 331 | " for i in range(data_length):\n", 332 | " feature = datasets[i][axis]\n", 333 | " if feature not in feature_sets:\n", 334 | " feature_sets[feature] = []\n", 335 | " feature_sets[feature].append(datasets[i])\n", 336 | " cond_ent = sum([(len(p)/data_length)*calc_ent(p) for p in feature_sets.values()])\n", 337 | " return cond_ent\n", 338 | "\n", 339 | "# 信息增益\n", 340 | "def info_gain(ent, cond_ent):\n", 341 | " return ent - cond_ent\n", 342 | "\n", 343 | "def 
info_gain_train(datasets):\n", 344 | " count = len(datasets[0]) - 1\n", 345 | " ent = calc_ent(datasets)\n", 346 | " best_feature = []\n", 347 | " for c in range(count):\n", 348 | " c_info_gain = info_gain(ent, cond_ent(datasets, axis=c))\n", 349 | " best_feature.append((c, c_info_gain))\n", 350 | " print('特征({}) - info_gain - {:.3f}'.format(labels[c], c_info_gain))\n", 351 | " # 比较大小\n", 352 | " best_ = max(best_feature, key=lambda x: x[-1])\n", 353 | " return '特征({})的信息增益最大,选择为根节点特征'.format(labels[best_[0]])" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": 7, 359 | "metadata": {}, 360 | "outputs": [ 361 | { 362 | "name": "stdout", 363 | "output_type": "stream", 364 | "text": [ 365 | "特征(年龄) - info_gain - 0.083\n", 366 | "特征(有工作) - info_gain - 0.324\n", 367 | "特征(有自己的房子) - info_gain - 0.420\n", 368 | "特征(信贷情况) - info_gain - 0.363\n" 369 | ] 370 | }, 371 | { 372 | "data": { 373 | "text/plain": [ 374 | "'特征(有自己的房子)的信息增益最大,选择为根节点特征'" 375 | ] 376 | }, 377 | "execution_count": 7, 378 | "metadata": {}, 379 | "output_type": "execute_result" 380 | } 381 | ], 382 | "source": [ 383 | "info_gain_train(np.array(datasets))" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "metadata": { 389 | "collapsed": true 390 | }, 391 | "source": [ 392 | "---\n", 393 | "\n", 394 | "利用ID3算法生成决策树,例5.3" 395 | ] 396 | }, 397 | { 398 | "cell_type": "code", 399 | "execution_count": 8, 400 | "metadata": { 401 | "collapsed": true 402 | }, 403 | "outputs": [], 404 | "source": [ 405 | "# 定义节点类 二叉树\n", 406 | "class Node:\n", 407 | " def __init__(self, root=True, label=None, feature_name=None, feature=None):\n", 408 | " self.root = root\n", 409 | " self.label = label\n", 410 | " self.feature_name = feature_name\n", 411 | " self.feature = feature\n", 412 | " self.tree = {}\n", 413 | " self.result = {'label:': self.label, 'feature': self.feature, 'tree': self.tree}\n", 414 | "\n", 415 | " def __repr__(self):\n", 416 | " return '{}'.format(self.result)\n", 417 | 
"\n", 418 | " def add_node(self, val, node):\n", 419 | " self.tree[val] = node\n", 420 | "\n", 421 | " def predict(self, features):\n", 422 | " if self.root is True:\n", 423 | " return self.label\n", 424 | " return self.tree[features[self.feature]].predict(features)\n", 425 | " \n", 426 | "class DTree:\n", 427 | " def __init__(self, epsilon=0.1):\n", 428 | " self.epsilon = epsilon\n", 429 | " self._tree = {}\n", 430 | "\n", 431 | " # 熵\n", 432 | " @staticmethod\n", 433 | " def calc_ent(datasets):\n", 434 | " data_length = len(datasets)\n", 435 | " label_count = {}\n", 436 | " for i in range(data_length):\n", 437 | " label = datasets[i][-1]\n", 438 | " if label not in label_count:\n", 439 | " label_count[label] = 0\n", 440 | " label_count[label] += 1\n", 441 | " ent = -sum([(p/data_length)*log(p/data_length, 2) for p in label_count.values()])\n", 442 | " return ent\n", 443 | "\n", 444 | " # 经验条件熵\n", 445 | " def cond_ent(self, datasets, axis=0):\n", 446 | " data_length = len(datasets)\n", 447 | " feature_sets = {}\n", 448 | " for i in range(data_length):\n", 449 | " feature = datasets[i][axis]\n", 450 | " if feature not in feature_sets:\n", 451 | " feature_sets[feature] = []\n", 452 | " feature_sets[feature].append(datasets[i])\n", 453 | " cond_ent = sum([(len(p)/data_length)*self.calc_ent(p) for p in feature_sets.values()])\n", 454 | " return cond_ent\n", 455 | "\n", 456 | " # 信息增益\n", 457 | " @staticmethod\n", 458 | " def info_gain(ent, cond_ent):\n", 459 | " return ent - cond_ent\n", 460 | "\n", 461 | " def info_gain_train(self, datasets):\n", 462 | " count = len(datasets[0]) - 1\n", 463 | " ent = self.calc_ent(datasets)\n", 464 | " best_feature = []\n", 465 | " for c in range(count):\n", 466 | " c_info_gain = self.info_gain(ent, self.cond_ent(datasets, axis=c))\n", 467 | " best_feature.append((c, c_info_gain))\n", 468 | " # 比较大小\n", 469 | " best_ = max(best_feature, key=lambda x: x[-1])\n", 470 | " return best_\n", 471 | "\n", 472 | " def train(self, 
train_data):\n", 473 | " \"\"\"\n", 474 | " input:数据集D(DataFrame格式),特征集A,阈值eta\n", 475 | " output:决策树T\n", 476 | " \"\"\"\n", 477 | " _, y_train, features = train_data.iloc[:, :-1], train_data.iloc[:, -1], train_data.columns[:-1]\n", 478 | " # 1,若D中实例属于同一类Ck,则T为单节点树,并将类Ck作为结点的类标记,返回T\n", 479 | " if len(y_train.value_counts()) == 1:\n", 480 | " return Node(root=True,\n", 481 | " label=y_train.iloc[0])\n", 482 | "\n", 483 | " # 2, 若A为空,则T为单节点树,将D中实例树最大的类Ck作为该节点的类标记,返回T\n", 484 | " if len(features) == 0:\n", 485 | " return Node(root=True, label=y_train.value_counts().sort_values(ascending=False).index[0])\n", 486 | "\n", 487 | " # 3,计算最大信息增益 同5.1,Ag为信息增益最大的特征\n", 488 | " max_feature, max_info_gain = self.info_gain_train(np.array(train_data))\n", 489 | " max_feature_name = features[max_feature]\n", 490 | "\n", 491 | " # 4,Ag的信息增益小于阈值eta,则置T为单节点树,并将D中是实例数最大的类Ck作为该节点的类标记,返回T\n", 492 | " if max_info_gain < self.epsilon:\n", 493 | " return Node(root=True, label=y_train.value_counts().sort_values(ascending=False).index[0])\n", 494 | "\n", 495 | " # 5,构建Ag子集\n", 496 | " node_tree = Node(root=False, feature_name=max_feature_name, feature=max_feature)\n", 497 | "\n", 498 | " feature_list = train_data[max_feature_name].value_counts().index\n", 499 | " for f in feature_list:\n", 500 | " sub_train_df = train_data.loc[train_data[max_feature_name] == f].drop([max_feature_name], axis=1)\n", 501 | "\n", 502 | " # 6, 递归生成树\n", 503 | " sub_tree = self.train(sub_train_df)\n", 504 | " node_tree.add_node(f, sub_tree)\n", 505 | "\n", 506 | " # pprint.pprint(node_tree.tree)\n", 507 | " return node_tree\n", 508 | "\n", 509 | " def fit(self, train_data):\n", 510 | " self._tree = self.train(train_data)\n", 511 | " return self._tree\n", 512 | "\n", 513 | " def predict(self, X_test):\n", 514 | " return self._tree.predict(X_test)" 515 | ] 516 | }, 517 | { 518 | "cell_type": "code", 519 | "execution_count": 9, 520 | "metadata": { 521 | "collapsed": true 522 | }, 523 | "outputs": [], 524 | 
"source": [ 525 | "datasets, labels = create_data()\n", 526 | "data_df = pd.DataFrame(datasets, columns=labels)\n", 527 | "dt = DTree()\n", 528 | "tree = dt.fit(data_df)" 529 | ] 530 | }, 531 | { 532 | "cell_type": "code", 533 | "execution_count": 10, 534 | "metadata": { 535 | "scrolled": true 536 | }, 537 | "outputs": [ 538 | { 539 | "data": { 540 | "text/plain": [ 541 | "{'label:': None, 'feature': 2, 'tree': {'否': {'label:': None, 'feature': 1, 'tree': {'否': {'label:': '否', 'feature': None, 'tree': {}}, '是': {'label:': '是', 'feature': None, 'tree': {}}}}, '是': {'label:': '是', 'feature': None, 'tree': {}}}}" 542 | ] 543 | }, 544 | "execution_count": 10, 545 | "metadata": {}, 546 | "output_type": "execute_result" 547 | } 548 | ], 549 | "source": [ 550 | "tree" 551 | ] 552 | }, 553 | { 554 | "cell_type": "code", 555 | "execution_count": 11, 556 | "metadata": {}, 557 | "outputs": [ 558 | { 559 | "data": { 560 | "text/plain": [ 561 | "'否'" 562 | ] 563 | }, 564 | "execution_count": 11, 565 | "metadata": {}, 566 | "output_type": "execute_result" 567 | } 568 | ], 569 | "source": [ 570 | "dt.predict(['老年', '否', '否', '一般'])" 571 | ] 572 | }, 573 | { 574 | "cell_type": "markdown", 575 | "metadata": {}, 576 | "source": [ 577 | "---\n", 578 | "\n", 579 | "## sklearn.tree.DecisionTreeClassifier\n", 580 | "\n", 581 | "### criterion : string, optional (default=”gini”)\n", 582 | "The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain." 
583 | ] 584 | }, 585 | { 586 | "cell_type": "code", 587 | "execution_count": 12, 588 | "metadata": { 589 | "collapsed": true 590 | }, 591 | "outputs": [], 592 | "source": [ 593 | "# data\n", 594 | "def create_data():\n", 595 | " iris = load_iris()\n", 596 | " df = pd.DataFrame(iris.data, columns=iris.feature_names)\n", 597 | " df['label'] = iris.target\n", 598 | " df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']\n", 599 | " data = np.array(df.iloc[:100, [0, 1, -1]])\n", 600 | " # print(data)\n", 601 | " return data[:,:2], data[:,-1]\n", 602 | "\n", 603 | "X, y = create_data()\n", 604 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)" 605 | ] 606 | }, 607 | { 608 | "cell_type": "code", 609 | "execution_count": 13, 610 | "metadata": { 611 | "collapsed": true 612 | }, 613 | "outputs": [], 614 | "source": [ 615 | "from sklearn.tree import DecisionTreeClassifier\n", 616 | "\n", 617 | "from sklearn.tree import export_graphviz\n", 618 | "import graphviz" 619 | ] 620 | }, 621 | { 622 | "cell_type": "code", 623 | "execution_count": 14, 624 | "metadata": {}, 625 | "outputs": [ 626 | { 627 | "data": { 628 | "text/plain": [ 629 | "DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,\n", 630 | " max_features=None, max_leaf_nodes=None,\n", 631 | " min_impurity_split=1e-07, min_samples_leaf=1,\n", 632 | " min_samples_split=2, min_weight_fraction_leaf=0.0,\n", 633 | " presort=False, random_state=None, splitter='best')" 634 | ] 635 | }, 636 | "execution_count": 14, 637 | "metadata": {}, 638 | "output_type": "execute_result" 639 | } 640 | ], 641 | "source": [ 642 | "clf = DecisionTreeClassifier()\n", 643 | "clf.fit(X_train, y_train,)" 644 | ] 645 | }, 646 | { 647 | "cell_type": "code", 648 | "execution_count": 15, 649 | "metadata": {}, 650 | "outputs": [ 651 | { 652 | "data": { 653 | "text/plain": [ 654 | "0.93333333333333335" 655 | ] 656 | }, 657 | "execution_count": 15, 658 | "metadata": {}, 659 | 
"output_type": "execute_result" 660 | } 661 | ], 662 | "source": [ 663 | "clf.score(X_test, y_test)" 664 | ] 665 | }, 666 | { 667 | "cell_type": "code", 668 | "execution_count": 18, 669 | "metadata": { 670 | "collapsed": true 671 | }, 672 | "outputs": [], 673 | "source": [ 674 | "tree_pic = export_graphviz(clf, out_file=\"mytree.pdf\")\n", 675 | "with open('mytree.pdf') as f:\n", 676 | " dot_graph = f.read()" 677 | ] 678 | }, 679 | { 680 | "cell_type": "code", 681 | "execution_count": 19, 682 | "metadata": {}, 683 | "outputs": [ 684 | { 685 | "data": { 686 | "image/svg+xml": [ 687 | "\r\n", 688 | "\r\n", 690 | "\r\n", 692 | "\r\n", 693 | "\r\n", 695 | "\r\n", 696 | "Tree\r\n", 697 | "\r\n", 698 | "\r\n", 699 | "0\r\n", 700 | "\r\n", 701 | "X[0] <= 5.45\r\n", 702 | "gini = 0.4996\r\n", 703 | "samples = 70\r\n", 704 | "value = [34, 36]\r\n", 705 | "\r\n", 706 | "\r\n", 707 | "1\r\n", 708 | "\r\n", 709 | "X[1] <= 2.85\r\n", 710 | "gini = 0.2392\r\n", 711 | "samples = 36\r\n", 712 | "value = [31, 5]\r\n", 713 | "\r\n", 714 | "\r\n", 715 | "0->1\r\n", 716 | "\r\n", 717 | "\r\n", 718 | "True\r\n", 719 | "\r\n", 720 | "\r\n", 721 | "8\r\n", 722 | "\r\n", 723 | "X[1] <= 3.35\r\n", 724 | "gini = 0.1609\r\n", 725 | "samples = 34\r\n", 726 | "value = [3, 31]\r\n", 727 | "\r\n", 728 | "\r\n", 729 | "0->8\r\n", 730 | "\r\n", 731 | "\r\n", 732 | "False\r\n", 733 | "\r\n", 734 | "\r\n", 735 | "2\r\n", 736 | "\r\n", 737 | "gini = 0.0\r\n", 738 | "samples = 4\r\n", 739 | "value = [0, 4]\r\n", 740 | "\r\n", 741 | "\r\n", 742 | "1->2\r\n", 743 | "\r\n", 744 | "\r\n", 745 | "\r\n", 746 | "\r\n", 747 | "3\r\n", 748 | "\r\n", 749 | "X[0] <= 5.3\r\n", 750 | "gini = 0.0605\r\n", 751 | "samples = 32\r\n", 752 | "value = [31, 1]\r\n", 753 | "\r\n", 754 | "\r\n", 755 | "1->3\r\n", 756 | "\r\n", 757 | "\r\n", 758 | "\r\n", 759 | "\r\n", 760 | "4\r\n", 761 | "\r\n", 762 | "gini = 0.0\r\n", 763 | "samples = 27\r\n", 764 | "value = [27, 0]\r\n", 765 | "\r\n", 766 | "\r\n", 767 | "3->4\r\n", 768 
| "\r\n", 769 | "\r\n", 770 | "\r\n", 771 | "\r\n", 772 | "5\r\n", 773 | "\r\n", 774 | "X[1] <= 3.2\r\n", 775 | "gini = 0.32\r\n", 776 | "samples = 5\r\n", 777 | "value = [4, 1]\r\n", 778 | "\r\n", 779 | "\r\n", 780 | "3->5\r\n", 781 | "\r\n", 782 | "\r\n", 783 | "\r\n", 784 | "\r\n", 785 | "6\r\n", 786 | "\r\n", 787 | "gini = 0.0\r\n", 788 | "samples = 1\r\n", 789 | "value = [0, 1]\r\n", 790 | "\r\n", 791 | "\r\n", 792 | "5->6\r\n", 793 | "\r\n", 794 | "\r\n", 795 | "\r\n", 796 | "\r\n", 797 | "7\r\n", 798 | "\r\n", 799 | "gini = 0.0\r\n", 800 | "samples = 4\r\n", 801 | "value = [4, 0]\r\n", 802 | "\r\n", 803 | "\r\n", 804 | "5->7\r\n", 805 | "\r\n", 806 | "\r\n", 807 | "\r\n", 808 | "\r\n", 809 | "9\r\n", 810 | "\r\n", 811 | "gini = 0.0\r\n", 812 | "samples = 31\r\n", 813 | "value = [0, 31]\r\n", 814 | "\r\n", 815 | "\r\n", 816 | "8->9\r\n", 817 | "\r\n", 818 | "\r\n", 819 | "\r\n", 820 | "\r\n", 821 | "10\r\n", 822 | "\r\n", 823 | "gini = 0.0\r\n", 824 | "samples = 3\r\n", 825 | "value = [3, 0]\r\n", 826 | "\r\n", 827 | "\r\n", 828 | "8->10\r\n", 829 | "\r\n", 830 | "\r\n", 831 | "\r\n", 832 | "\r\n", 833 | "\r\n" 834 | ], 835 | "text/plain": [ 836 | "" 837 | ] 838 | }, 839 | "execution_count": 19, 840 | "metadata": {}, 841 | "output_type": "execute_result" 842 | } 843 | ], 844 | "source": [ 845 | "graphviz.Source(dot_graph)" 846 | ] 847 | }, 848 | { 849 | "cell_type": "markdown", 850 | "metadata": {}, 851 | "source": [ 852 | "----" 853 | ] 854 | }, 855 | { 856 | "cell_type": "code", 857 | "execution_count": null, 858 | "metadata": { 859 | "collapsed": true 860 | }, 861 | "outputs": [], 862 | "source": [] 863 | } 864 | ], 865 | "metadata": { 866 | "kernelspec": { 867 | "display_name": "Python 3", 868 | "language": "python", 869 | "name": "python3" 870 | }, 871 | "language_info": { 872 | "codemirror_mode": { 873 | "name": "ipython", 874 | "version": 3 875 | }, 876 | "file_extension": ".py", 877 | "mimetype": "text/x-python", 878 | "name": "python", 879 | 
"nbconvert_exporter": "python", 880 | "pygments_lexer": "ipython3", 881 | "version": "3.6.1" 882 | } 883 | }, 884 | "nbformat": 4, 885 | "nbformat_minor": 2 886 | } 887 | -------------------------------------------------------------------------------- /DecisonTree/dt.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Time : 2018/1/3 14:02 3 | # @Author : wangzy 4 | 5 | import numpy as np 6 | import pandas as pd 7 | 8 | from math import log 9 | 10 | 11 | # 定义节点类 二叉树 12 | class Node: 13 | def __init__(self, root=True, label=None, feature_name=None, feature=None): 14 | self.root = root 15 | self.label = label 16 | self.feature_name = feature_name 17 | self.feature = feature 18 | self.tree = {} 19 | self.result = {'label:': self.label, 'feature': self.feature, 'tree': self.tree} 20 | 21 | def __repr__(self): 22 | return '{}'.format(self.result) 23 | 24 | def add_node(self, val, node): 25 | self.tree[val] = node 26 | 27 | def predict(self, features): 28 | if self.root is True: 29 | return self.label 30 | return self.tree[features[self.feature]].predict(features) 31 | 32 | 33 | # 书上题目5.1 34 | def create_data(): 35 | datasets = [['青年', '否', '否', '一般', '否'], 36 | ['青年', '否', '否', '好', '否'], 37 | ['青年', '是', '否', '好', '是'], 38 | ['青年', '是', '是', '一般', '是'], 39 | ['青年', '否', '否', '一般', '否'], 40 | ['中年', '否', '否', '一般', '否'], 41 | ['中年', '否', '否', '好', '否'], 42 | ['中年', '是', '是', '好', '是'], 43 | ['中年', '否', '是', '非常好', '是'], 44 | ['中年', '否', '是', '非常好', '是'], 45 | ['老年', '否', '是', '非常好', '是'], 46 | ['老年', '否', '是', '好', '是'], 47 | ['老年', '是', '否', '好', '是'], 48 | ['老年', '是', '否', '非常好', '是'], 49 | ['老年', '否', '否', '一般', '否'], 50 | ] 51 | labels = [u'年龄', u'有工作', u'有自己的房子', u'信贷情况', u'类别'] 52 | # 返回数据集和每个维度的名称 53 | return datasets, labels 54 | 55 | 56 | class DTree: 57 | def __init__(self, epsilon=0.1): 58 | self.epsilon = epsilon 59 | self._tree = {} 60 | 61 | # 熵 62 | @staticmethod 63 | def 
calc_ent(datasets): 64 | data_length = len(datasets) 65 | label_count = {} 66 | for i in range(data_length): 67 | label = datasets[i][-1] 68 | if label not in label_count: 69 | label_count[label] = 0 70 | label_count[label] += 1 71 | ent = -sum([(p/data_length)*log(p/data_length, 2) for p in label_count.values()]) 72 | return ent 73 | 74 | # 经验条件熵 75 | def cond_ent(self, datasets, axis=0): 76 | data_length = len(datasets) 77 | feature_sets = {} 78 | for i in range(data_length): 79 | feature = datasets[i][axis] 80 | if feature not in feature_sets: 81 | feature_sets[feature] = [] 82 | feature_sets[feature].append(datasets[i]) 83 | cond_ent = sum([(len(p)/data_length)*self.calc_ent(p) for p in feature_sets.values()]) 84 | return cond_ent 85 | 86 | # 信息增益 87 | @staticmethod 88 | def info_gain(ent, cond_ent): 89 | return ent - cond_ent 90 | 91 | def info_gain_train(self, datasets): 92 | count = len(datasets[0]) - 1 93 | ent = self.calc_ent(datasets) 94 | best_feature = [] 95 | for c in range(count): 96 | c_info_gain = self.info_gain(ent, self.cond_ent(datasets, axis=c)) 97 | best_feature.append((c, c_info_gain)) 98 | # 比较大小 99 | best_ = max(best_feature, key=lambda x: x[-1]) 100 | return best_ 101 | 102 | def train(self, train_data): 103 | """ 104 | input:数据集D(DataFrame格式),特征集A,阈值eta 105 | output:决策树T 106 | """ 107 | _, y_train, features = train_data.iloc[:, :-1], train_data.iloc[:, -1], train_data.columns[:-1] 108 | # 1,若D中实例属于同一类Ck,则T为单节点树,并将类Ck作为结点的类标记,返回T 109 | if len(y_train.value_counts()) == 1: 110 | return Node(root=True, 111 | label=y_train.iloc[0]) 112 | 113 | # 2, 若A为空,则T为单节点树,将D中实例数最大的类Ck作为该节点的类标记,返回T 114 | if len(features) == 0: 115 | return Node(root=True, label=y_train.value_counts().sort_values(ascending=False).index[0]) 116 | 117 | # 3,计算最大信息增益 同5.1,Ag为信息增益最大的特征 118 | max_feature, max_info_gain = self.info_gain_train(np.array(train_data)) 119 | max_feature_name = features[max_feature] 120 | 121 | # 4,Ag的信息增益小于阈值eta,则置T为单节点树,并将D中实例数最大的类Ck作为该节点的类标记,返回T
122 | if max_info_gain < self.epsilon: 123 | return Node(root=True, label=y_train.value_counts().sort_values(ascending=False).index[0]) 124 | 125 | # 5,构建Ag子集 126 | node_tree = Node(root=False, feature_name=max_feature_name, feature=max_feature) 127 | 128 | feature_list = train_data[max_feature_name].value_counts().index 129 | for f in feature_list: 130 | sub_train_df = train_data.loc[train_data[max_feature_name] == f].drop([max_feature_name], axis=1) 131 | 132 | # 6, 递归生成树 133 | sub_tree = self.train(sub_train_df) 134 | node_tree.add_node(f, sub_tree) 135 | 136 | # pprint.pprint(node_tree.tree) 137 | return node_tree 138 | 139 | def fit(self, train_data): 140 | self._tree = self.train(train_data) 141 | return self._tree 142 | 143 | def predict(self, X_test): 144 | return self._tree.predict(X_test) 145 | 146 | 147 | if __name__ == '__main__': 148 | datasets, labels = create_data() 149 | data_df = pd.DataFrame(datasets, columns=labels) 150 | dt = DTree() 151 | tree = dt.fit(data_df) 152 | print(dt.predict(['老年', '否', '否', '一般'])) 153 | 154 | -------------------------------------------------------------------------------- /DecisonTree/mytree.pdf: -------------------------------------------------------------------------------- 1 | digraph Tree { 2 | node [shape=box] ; 3 | 0 [label="X[0] <= 5.45\ngini = 0.4996\nsamples = 70\nvalue = [34, 36]"] ; 4 | 1 [label="X[1] <= 2.85\ngini = 0.2392\nsamples = 36\nvalue = [31, 5]"] ; 5 | 0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ; 6 | 2 [label="gini = 0.0\nsamples = 4\nvalue = [0, 4]"] ; 7 | 1 -> 2 ; 8 | 3 [label="X[0] <= 5.3\ngini = 0.0605\nsamples = 32\nvalue = [31, 1]"] ; 9 | 1 -> 3 ; 10 | 4 [label="gini = 0.0\nsamples = 27\nvalue = [27, 0]"] ; 11 | 3 -> 4 ; 12 | 5 [label="X[1] <= 3.2\ngini = 0.32\nsamples = 5\nvalue = [4, 1]"] ; 13 | 3 -> 5 ; 14 | 6 [label="gini = 0.0\nsamples = 1\nvalue = [0, 1]"] ; 15 | 5 -> 6 ; 16 | 7 [label="gini = 0.0\nsamples = 4\nvalue = [4, 0]"] ; 17 | 5 -> 7 ; 18 | 8 [label="X[1] <= 
3.35\ngini = 0.1609\nsamples = 34\nvalue = [3, 31]"] ; 19 | 0 -> 8 [labeldistance=2.5, labelangle=-45, headlabel="False"] ; 20 | 9 [label="gini = 0.0\nsamples = 31\nvalue = [0, 31]"] ; 21 | 8 -> 9 ; 22 | 10 [label="gini = 0.0\nsamples = 3\nvalue = [3, 0]"] ; 23 | 8 -> 10 ; 24 | } -------------------------------------------------------------------------------- /EM/em.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# EM算法\n", 8 | "\n", 9 | "# Expectation Maximization algorithm\n", 10 | "\n", 11 | "### Maximum likelihood function\n", 12 | "\n", 13 | "[likelihood & maximum likelihood](http://fangs.in/post/thinkstats/likelihood/)\n", 14 | "\n", 15 | "> 在统计学中,似然函数(likelihood function,通常简写为likelihood,似然)是一个非常重要的内容,在非正式场合似然和概率(Probability)几乎是一对同义词,但是在统计学中似然和概率却是两个不同的概念。概率是在特定环境下某件事情发生的可能性,也就是结果没有产生之前依据环境所对应的参数来预测某件事情发生的可能性,比如抛硬币,抛之前我们不知道最后是哪一面朝上,但是根据硬币的性质我们可以推测任何一面朝上的可能性均为50%,这个概率只有在抛硬币之前才是有意义的,抛完硬币后的结果便是确定的;而似然刚好相反,是在确定的结果下去推测产生这个结果的可能环境(参数),还是抛硬币的例子,假设我们随机抛掷一枚硬币1,000次,结果500次人头朝上,500次数字朝上(实际情况一般不会这么理想,这里只是举个例子),我们很容易判断这是一枚标准的硬币,两面朝上的概率均为50%,这个过程就是我们运用出现的结果来判断这个事情本身的性质(参数),也就是似然。" 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "$$P(Y|\\theta) = \\prod_{j=1}^n[\\pi p^{y_j}(1-p)^{1-y_j}+(1-\\pi) q^{y_j}(1-q)^{1-y_j}]$$\n", 23 | "\n", 24 | "### E step:\n", 25 | "\n", 26 | "$$\\mu_j^{i+1}=\\frac{\\pi (p^i)^{y_j}(1-p^i)^{1-y_j}}{\\pi (p^i)^{y_j}(1-p^i)^{1-y_j}+(1-\\pi) (q^i)^{y_j}(1-q^i)^{1-y_j}}$$" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 1, 32 | "metadata": { 33 | "collapsed": true 34 | }, 35 | "outputs": [], 36 | "source": [ 37 | "import numpy as np\n", 38 | "import math" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 2, 44 | "metadata": { 45 | "collapsed": true 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "pro_A, pro_B, pro_C = 0.5, 0.5,
0.5\n", 50 | "\n", 51 | "def pmf(i, pro_A, pro_B, por_C):\n", 52 | " pro_1 = pro_A * math.pow(pro_B, data[i]) * math.pow((1-pro_B), 1-data[i])\n", 53 | " pro_2 = pro_A * math.pow(pro_C, data[i]) * math.pow((1-pro_C), 1-data[i])\n", 54 | " return pro_1 / (pro_1 + pro_2)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "### M step:\n", 62 | "\n", 63 | "$$\\pi^{i+1}=\\frac{1}{n}\\sum_{j=1}^n\\mu^{i+1}_j$$\n", 64 | "\n", 65 | "$$p^{i+1}=\\frac{\\sum_{j=1}^n\\mu^{i+1}_jy_i}{\\sum_{j=1}^n\\mu^{i+1}_j}$$\n", 66 | "\n", 67 | "$$q^{i+1}=\\frac{\\sum_{j=1}^n(1-\\mu^{i+1}_jy_i)}{\\sum_{j=1}^n(1-\\mu^{i+1}_j)}$$" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 3, 73 | "metadata": {}, 74 | "outputs": [], 75 | "source": [ 76 | "class EM:\n", 77 | " def __init__(self, prob):\n", 78 | " self.pro_A, self.pro_B, self.pro_C = prob\n", 79 | " \n", 80 | " # e_step\n", 81 | " def pmf(self, i):\n", 82 | " pro_1 = self.pro_A * math.pow(self.pro_B, data[i]) * math.pow((1-self.pro_B), 1-data[i])\n", 83 | " pro_2 = (1 - self.pro_A) * math.pow(self.pro_C, data[i]) * math.pow((1-self.pro_C), 1-data[i])\n", 84 | " return pro_1 / (pro_1 + pro_2)\n", 85 | " \n", 86 | " # m_step\n", 87 | " def fit(self, data):\n", 88 | " count = len(data)\n", 89 | " print('init prob:{}, {}, {}'.format(self.pro_A, self.pro_B, self.pro_C))\n", 90 | " for d in range(count):\n", 91 | " _ = yield\n", 92 | " _pmf = [self.pmf(k) for k in range(count)]\n", 93 | " pro_A = 1/ count * sum(_pmf)\n", 94 | " pro_B = sum([_pmf[k]*data[k] for k in range(count)]) / sum([_pmf[k] for k in range(count)])\n", 95 | " pro_C = sum([(1-_pmf[k])*data[k] for k in range(count)]) / sum([(1-_pmf[k]) for k in range(count)])\n", 96 | " print('{}/{} pro_a:{:.3f}, pro_b:{:.3f}, pro_c:{:.3f}'.format(d+1, count, pro_A, pro_B, pro_C))\n", 97 | " self.pro_A = pro_A\n", 98 | " self.pro_B = pro_B\n", 99 | " self.pro_C = pro_C\n", 100 | " " 101 | ] 102 | }, 103 | { 104 | "cell_type": 
"code", 105 | "execution_count": 4, 106 | "metadata": {}, 107 | "outputs": [], 108 | "source": [ 109 | "data=[1,1,0,1,0,0,1,0,1,1]" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 5, 115 | "metadata": {}, 116 | "outputs": [ 117 | { 118 | "name": "stdout", 119 | "output_type": "stream", 120 | "text": [ 121 | "init prob:0.5, 0.5, 0.5\n" 122 | ] 123 | } 124 | ], 125 | "source": [ 126 | "em = EM(prob=[0.5, 0.5, 0.5])\n", 127 | "f = em.fit(data)\n", 128 | "next(f)" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 6, 134 | "metadata": {}, 135 | "outputs": [ 136 | { 137 | "name": "stdout", 138 | "output_type": "stream", 139 | "text": [ 140 | "1/10 pro_a:0.500, pro_b:0.600, pro_c:0.600\n" 141 | ] 142 | } 143 | ], 144 | "source": [ 145 | "# 第一次迭代\n", 146 | "f.send(1)" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": 7, 152 | "metadata": {}, 153 | "outputs": [ 154 | { 155 | "name": "stdout", 156 | "output_type": "stream", 157 | "text": [ 158 | "2/10 pro_a:0.500, pro_b:0.600, pro_c:0.600\n" 159 | ] 160 | } 161 | ], 162 | "source": [ 163 | "# 第二次\n", 164 | "f.send(2)" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 8, 170 | "metadata": {}, 171 | "outputs": [ 172 | { 173 | "name": "stdout", 174 | "output_type": "stream", 175 | "text": [ 176 | "init prob:0.4, 0.6, 0.7\n" 177 | ] 178 | } 179 | ], 180 | "source": [ 181 | "em = EM(prob=[0.4, 0.6, 0.7])\n", 182 | "f2 = em.fit(data)\n", 183 | "next(f2)" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 9, 189 | "metadata": {}, 190 | "outputs": [ 191 | { 192 | "name": "stdout", 193 | "output_type": "stream", 194 | "text": [ 195 | "1/10 pro_a:0.406, pro_b:0.537, pro_c:0.643\n" 196 | ] 197 | } 198 | ], 199 | "source": [ 200 | "f2.send(1)" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": 10, 206 | "metadata": {}, 207 | "outputs": [ 208 | { 209 | "name": "stdout", 210 | 
"output_type": "stream", 211 | "text": [ 212 | "2/10 pro_a:0.406, pro_b:0.537, pro_c:0.643\n" 213 | ] 214 | } 215 | ], 216 | "source": [ 217 | "f2.send(2)" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": null, 223 | "metadata": { 224 | "collapsed": true 225 | }, 226 | "outputs": [], 227 | "source": [] 228 | } 229 | ], 230 | "metadata": { 231 | "kernelspec": { 232 | "display_name": "Python 3", 233 | "language": "python", 234 | "name": "python3" 235 | }, 236 | "language_info": { 237 | "codemirror_mode": { 238 | "name": "ipython", 239 | "version": 3 240 | }, 241 | "file_extension": ".py", 242 | "mimetype": "text/x-python", 243 | "name": "python", 244 | "nbconvert_exporter": "python", 245 | "pygments_lexer": "ipython3", 246 | "version": "3.6.1" 247 | } 248 | }, 249 | "nbformat": 4, 250 | "nbformat_minor": 2 251 | } 252 | -------------------------------------------------------------------------------- /KNearestNeighbors/KNN.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## K近邻" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "#### 距离度量" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": { 21 | "collapsed": true 22 | }, 23 | "outputs": [], 24 | "source": [ 25 | "import math\n", 26 | "from itertools import combinations" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "- p = 1 曼哈顿距离\n", 34 | "- p = 2 欧氏距离\n", 35 | "- p = inf 闵式距离minkowski_distance " 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 2, 41 | "metadata": { 42 | "collapsed": true 43 | }, 44 | "outputs": [], 45 | "source": [ 46 | "def L(x, y, p=2):\n", 47 | " # x1 = [1, 1], x2 = [5,1]\n", 48 | " if len(x) == len(y) and len(x) > 1:\n", 49 | " sum = 0\n", 50 | " for i in range(len(x)):\n", 51 | " 
sum += math.pow(abs(x[i] - y[i]), p)\n", 52 | " return math.pow(sum, 1/p)\n", 53 | " else:\n", 54 | " return 0" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 3, 60 | "metadata": { 61 | "collapsed": true 62 | }, 63 | "outputs": [], 64 | "source": [ 65 | "# 课本例3.1\n", 66 | "x1 = [1, 1]\n", 67 | "x2 = [5, 1]\n", 68 | "x3 = [4, 4]" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": 4, 74 | "metadata": {}, 75 | "outputs": [ 76 | { 77 | "name": "stdout", 78 | "output_type": "stream", 79 | "text": [ 80 | "(4.0, '1-[5, 1]')\n", 81 | "(4.0, '1-[5, 1]')\n", 82 | "(3.7797631496846193, '1-[4, 4]')\n", 83 | "(3.5676213450081633, '1-[4, 4]')\n" 84 | ] 85 | } 86 | ], 87 | "source": [ 88 | "# x1, x2\n", 89 | "for i in range(1, 5):\n", 90 | " r = { '1-{}'.format(c):L(x1, c, p=i) for c in [x2, x3]}\n", 91 | " print(min(zip(r.values(), r.keys())))" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "python实现,遍历所有数据点,找出n个距离最近的点的分类情况,少数服从多数" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": 5, 104 | "metadata": { 105 | "collapsed": true 106 | }, 107 | "outputs": [], 108 | "source": [ 109 | "import numpy as np\n", 110 | "import pandas as pd\n", 111 | "import matplotlib.pyplot as plt\n", 112 | "%matplotlib inline\n", 113 | "\n", 114 | "from sklearn.datasets import load_iris\n", 115 | "from sklearn.model_selection import train_test_split\n", 116 | "\n", 117 | "from collections import Counter" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": 6, 123 | "metadata": { 124 | "collapsed": true 125 | }, 126 | "outputs": [], 127 | "source": [ 128 | "# data\n", 129 | "iris = load_iris()\n", 130 | "df = pd.DataFrame(iris.data, columns=iris.feature_names)\n", 131 | "df['label'] = iris.target\n", 132 | "df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']\n", 133 | "# data = np.array(df.iloc[:100, [0, 1, -1]])" 134 | ] 135 | }, 
136 | { 137 | "cell_type": "code", 138 | "execution_count": 7, 139 | "metadata": {}, 140 | "outputs": [ 141 | { 142 | "data": { 143 | "text/plain": [ 144 | "" 145 | ] 146 | }, 147 | "execution_count": 7, 148 | "metadata": {}, 149 | "output_type": "execute_result" 150 | }, 151 | { 152 | "data": { 153 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAH9lJREFUeJzt3X+UXWV97/H312E00xKYCxkVZpIGgWYBgRIYiRCKINJI\nSCFFSmGh3gALLlyquKhxGUsV0QpKBUFWSYNY5MINjTQG5OeiIAjIDycJJhiMQNFmhtwSwkoACZCM\n3/vH3nMyc5gfZ585zzn72efzWmvWnL3Pnj3fZx+Yb/Z+nu/zmLsjIiIC8J5GByAiIvmhpCAiIiVK\nCiIiUqKkICIiJUoKIiJSoqQgIiIlSgoiIlKipCAiIiVKCiIiUrJT6F9gZi1AD9Dn7nPL3jsauB14\nMd21zN0vHe18kyZN8qlTpwaIVESkuFasWPGKu3eMdVzwpABcCDwL7DLC+4+UJ4vRTJ06lZ6enpoE\nJiLSLMzsd5UcF/TxkZl1AScA3w/5e0REpDZC9yl8F/gi8IdRjjnCzFab2T1mdsBwB5jZuWbWY2Y9\nGzduDBKoiIgETApmNhd42d1XjHLYSmCKux8EfA9YPtxB7r7Y3bvdvbujY8xHYiIiUqWQfQqzgBPN\nbA4wAdjFzG52908NHODurw16fbeZ/bOZTXL3VwLGJSJSlW3bttHb28tbb73V6FBGNGHCBLq6umht\nba3q54MlBXdfCCyE0iijLwxOCOn+DwL/7e5uZoeR3LlsChWTiMh49Pb2MnHiRKZOnYqZNTqcd3F3\nNm3aRG9vL3vttVdV56jH6KMhzOw8AHdfBJwCnG9m24GtwGmuVX9EJKfeeuut3CYEADNj9913Zzx9\nr3VJCu7+EPBQ+nrRoP3XAtfWIwaRelu+qo8r7lvHS5u3smd7GwtmT2PejM5GhyXjlNeEMGC88dX9\nTkGkGSxf1cfCZWvYuq0fgL7NW1m4bA2AEoPkmqa5EAngivvWlRLCgK3b+rnivnUNikiK4t5772Xa\ntGnss88+XH755TU/v5KCSAAvbd6aab9IJfr7+7ngggu45557WLt2LUuWLGHt2rU1/R16fCQSwJ7t\nbfQNkwD2bG9rQDTSKLXuV3rqqafYZ599+NCHPgTAaaedxu23387+++9fq5B1pyASwoLZ02hrbRmy\nr621hQWzpzUoIqm3gX6lvs1bcXb0Ky1f1Vf1Ofv6+pg8eXJpu6uri76+6s83HCUFkQDmzejkspMP\npLO9DQM629u47OQD1cncRGLtV9LjI5FA5s3oVBJoYiH6lTo7O1m/fn1pu7e3l87O2v43pjsFEZEA\nRuo/Gk+/0oc//GGee+45XnzxRd555x1uvfVWTjzxxKrPNxwlBRGRAEL0K+20005ce+21zJ49m/32\n249TTz2VAw4YdnLp6n9HTc8mIiLAjiLFWle1z5kzhzlz5tQixGEpKYiIBBJjv5IeH4mISImSgoiI\nlCgpiIhIiZKCiIiUKCmIiEiJkoI0veWr+ph1+YPs9aW7mHX5g+Oam0YktLPOOov3v//9TJ8+Pcj5\nlRSkqYWYtEwkpPnz53PvvfcGO7+SgjS1WCctk0isXgpXTYdL2pPvq5eO+5RHHXUUu+22Ww2CG56K\n16SpaTEcCWb1UvjJ52Bb+t/SlvXJNsBBpzYurjHoTkGaW
ohJy0QAeODSHQlhwLatyf4cU1KQpqbF\ncCSYLb3Z9ueEHh9JUws1aZkIu3Ylj4yG259jSgrS9GKctEwicOxXhvYpALS2JfvH4fTTT+ehhx7i\nlVdeoauri6997WucffbZ4wx2ByUFaZhaL2oukisDnckPXJo8Mtq1K0kI4+xkXrJkSQ2CG5mSgjTE\nQH3AwHDQgfoAQIlBiuOgU3M90mg46miWhlB9gEg+KSlIQ6g+QGLl7o0OYVTjjU9JQRpC9QESowkT\nJrBp06bcJgZ3Z9OmTUyYMKHqc6hPQRpiwexpQ/oUQPUBkn9dXV309vaycePGRocyogkTJtDVVf2w\nVyUFaQjVB0iMWltb2WuvvRodRlDBk4KZtQA9QJ+7zy17z4CrgTnAm8B8d18ZOibJB9UHiORPPe4U\nLgSeBXYZ5r3jgX3Tr5nAdel3kaaimg3Ji6AdzWbWBZwAfH+EQ04CbvLEE0C7me0RMiaRvNGaDpIn\noUcffRf4IvCHEd7vBAZPDtKb7hNpGqrZkDwJlhTMbC7wsruvqMG5zjWzHjPryXOvv0g1VLMheRLy\nTmEWcKKZ/Ra4FfiYmd1cdkwfMHnQdle6bwh3X+zu3e7e3dHRESpekYZQzYbkSbCk4O4L3b3L3acC\npwEPuvunyg67A/iMJT4CbHH3DaFiEskjrekgeVL3OgUzOw/A3RcBd5MMR32eZEjqmfWOR6TRVLMh\neWJ5LdceSXd3t/f09DQ6DBGRqJjZCnfvHus4VTRL4Vy8fA1LnlxPvzstZpw+czLfmHdgo8MSiYKS\nghTKxcvXcPMT/1Xa7ncvbSsxiIxNs6RKoSx5cpg1cUfZLyJDKSlIofSP0Ec20n4RGUpJQQqlxSzT\nfhEZSklBCuX0mZMz7ReRodTRLIUy0Jms0Uci1VGdgohIE1CdgjTEGdc/zmMvvFranrX3btxyzuEN\njKhxtEaCxEh9ClIz5QkB4LEXXuWM6x9vUESNozUSJFZKClIz5QlhrP1FpjUSJFZKCiIBaI0EiZWS\ngkgAWiNBYqWkIDUza+/dMu0vMq2RILFSUpCaueWcw9+VAJp19NG8GZ1cdvKBdLa3YUBnexuXnXyg\nRh9J7qlOQUSkCahOQRoi1Nj8LOdVfYBI9ZQUpGYGxuYPDMUcGJsPjOuPcpbzhopBpFmoT0FqJtTY\n/CznVX2AyPgoKUjNhBqbn+W8qg8QGR8lBamZUGPzs5xX9QEi46OkIDUTamx+lvOqPkBkfNTRLDUz\n0JFb65E/Wc4bKgaRZqE6BRGRJqA6hZyKcQx9jDGLSHWUFOooxjH0McYsItVTR3MdxTiGPsaYRaR6\nSgp1FOMY+hhjFpHqKSnUUYxj6GOMWUSqp6RQRzGOoY8xZhGpnjqa6yjGMfQxxiwi1QtWp2BmE4Cf\nAe8jST63uftXy445GrgdeDHdtczdLx3tvKpTEBHJLg91Cm8DH3P3N8ysFXjUzO5x9yfKjnvE3ecG\njEPG6eLla1jy5Hr63Wkx4/SZk/nGvAPHfWxe6h/yEodIHoyZFMzsfcAngamDjx/rX/Se3IK8kW62\npl9xlU8LFy9fw81P/Fdpu9+9tF3+xz7LsXmpf8hLHCJ5UUlH8+3AScB24PeDvsZkZi1m9jTwMnC/\nuz85zGFHmNlqM7vHzA6oMG6pkyVPrq94f5Zj81L/kJc4RPKiksdHXe7+iWpO7u79wMFm1g782Mym\nu/szgw5ZCUxJHzHNAZYD+5afx8zOBc4FmDJlSjWhSJX6R+hzGm5/lmPzUv+QlzhE8qKSO4Wfm9nw\nD4Ur5O6bgZ8Cnyjb/5q7v5G+vhtoNbNJw/z8Ynfvdvfujo6O8YQiGbWYVbw/y7F5qX/ISxwieTFi\nUjCzNWa2GjgSWGlm69LHPAP7R2VmHekdAmbWBhwH/LrsmA+aJX8xzOywNJ5N1TdHau30mZMr3p/l\n2LzUP+QlDpG8GO3x0
<base64-encoded PNG omitted: scatter plot of 'sepal length' vs. 'sepal width' for the two classes>\n", 154 | "text/plain": [ 155 | "" 156 | ] 157 | }, 158 | "metadata": {}, 159 | "output_type": "display_data" 160 | } 161 | ], 162 | "source": [ 163 | "plt.scatter(df[:50]['sepal length'], df[:50]['sepal width'], label='0')\n", 164 | "plt.scatter(df[50:100]['sepal length'], df[50:100]['sepal width'], label='1')\n", 165 | "plt.xlabel('sepal length')\n", 166 | "plt.ylabel('sepal width')\n", 167 | 
"plt.legend()" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 8, 173 | "metadata": {}, 174 | "outputs": [], 175 | "source": [ 176 | "data = np.array(df.iloc[:100, [0, 1, -1]])\n", 177 | "X, y = data[:,:-1], data[:,-1]\n", 178 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 9, 184 | "metadata": {}, 185 | "outputs": [], 186 | "source": [ 187 | "class KNN:\n", 188 | " def __init__(self, X_train, y_train, n_neighbors=3, p=2):\n", 189 | " \"\"\"\n", 190 | " parameter: n_neighbors 临近点个数\n", 191 | " parameter: p 距离度量\n", 192 | " \"\"\"\n", 193 | " self.n = n_neighbors\n", 194 | " self.p = p\n", 195 | " self.X_train = X_train\n", 196 | " self.y_train = y_train\n", 197 | " \n", 198 | " def predict(self, X):\n", 199 | " # 取出n个点\n", 200 | " knn_list = []\n", 201 | " for i in range(self.n):\n", 202 | " dist = np.linalg.norm(X - self.X_train[i], ord=self.p)\n", 203 | " knn_list.append((dist, self.y_train[i]))\n", 204 | " \n", 205 | " for i in range(self.n, len(self.X_train)):\n", 206 | " max_index = knn_list.index(max(knn_list, key=lambda x: x[0]))\n", 207 | " dist = np.linalg.norm(X - self.X_train[i], ord=self.p)\n", 208 | " if knn_list[max_index][0] > dist:\n", 209 | " knn_list[max_index] = (dist, self.y_train[i])\n", 210 | " \n", 211 | " # 统计\n", 212 | " knn = [k[-1] for k in knn_list]\n", 213 | " count_pairs = Counter(knn)\n", 214 | " max_count = sorted(count_pairs, key=lambda x:x)[-1]\n", 215 | " return max_count\n", 216 | " \n", 217 | " def score(self, X_test, y_test):\n", 218 | " right_count = 0\n", 219 | " n = 10\n", 220 | " for X, y in zip(X_test, y_test):\n", 221 | " label = self.predict(X)\n", 222 | " if label == y:\n", 223 | " right_count += 1\n", 224 | " return right_count / len(X_test)" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 10, 230 | "metadata": {}, 231 | "outputs": [], 232 | "source": [ 233 
| "clf = KNN(X_train, y_train)" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": 11, 239 | "metadata": {}, 240 | "outputs": [ 241 | { 242 | "data": { 243 | "text/plain": [ 244 | "1.0" 245 | ] 246 | }, 247 | "execution_count": 11, 248 | "metadata": {}, 249 | "output_type": "execute_result" 250 | } 251 | ], 252 | "source": [ 253 | "clf.score(X_test, y_test)" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": 12, 259 | "metadata": {}, 260 | "outputs": [ 261 | { 262 | "name": "stdout", 263 | "output_type": "stream", 264 | "text": [ 265 | "Test Point: 1.0\n" 266 | ] 267 | } 268 | ], 269 | "source": [ 270 | "test_point = [6.0, 3.0]\n", 271 | "print('Test Point: {}'.format(clf.predict(test_point)))" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": 13, 277 | "metadata": {}, 278 | "outputs": [ 279 | { 280 | "data": { 281 | "text/plain": [ 282 | "" 283 | ] 284 | }, 285 | "execution_count": 13, 286 | "metadata": {}, 287 | "output_type": "execute_result" 288 | }, 289 | { 290 | "data": { 291 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XucVXW9//HXx3ESTHRSsGQGwtvhoQIBjqKiqaEHRUoj\nM00z1Eekx5NlRUl6vKWpUWnqIzlYlhdEyUNYpmippHmBuAkqmnq8MKO/I1KgCBqMn98fa81m2M5l\nr7332nuttd/Px2MeM+u71/7O57v2g/mw1vp+1tfcHREREYCtqh2AiIgkh5KCiIjkKCmIiEiOkoKI\niOQoKYiISI6SgoiI5CgpiIhIjpKCiIjkKCmIiEjO1nH/AjOrAxYCre4+Pu+1w4C7gZfDptnufml3\n/fXt29cHDRoUQ6QiItm1aNGit9y9X0/7xZ4UgG8CK4Dtu3j90fxk0Z1BgwaxcOHCsgQmIlIrzOzV\nQvaL9fKRmTUBxwC/jPP3iIhIecR9T+Ea4HvAB93sc5CZLTOz+8xsn852MLNJZrbQzBauWrUqlkBF\nRCTGpGBm44E33X1RN7stBga6+zDgOmBOZzu5+3R3b3b35n79erwkJiIiRYrznsJo4HNmNg7oBWxv\nZre5+yntO7j72x1+vtfMfmFmfd39rRjjEpEE2bhxIy0tLbz33nvVDiUTevXqRVNTE/X19UW9P7ak\n4O5TgCmQm2X03Y4JIWz/BPB/7u5mtj/BmcvquGISkeRpaWmhT58+DBo0CDOrdjip5u6sXr2alpYW\ndt1116L6qMTsoy2Y2ZkA7j4NOB44y8w2ARuAE12r/ojUlPfee08JoUzMjJ122olS7r1WJCm4+zxg\nXvjztA7t1wPXVyIGkUqbs6SVqfc/z+trNtC/oTeTxw7muBGN1Q4rkZQQyqfUY1nxMwWRWjBnSStT\nZi9nw8Y2AFrXbGDK7OUASgySaHrMhUgMpt7/fC4htNuwsY2p9z9fpYhECqOkIBKD19dsiNQuhZsx\nAwYNgq22Cr7PmFFaf2vWrOEXv/hFUe+95pprWL9+fWkB5Lnwwgv585//3O0+8+bN4/HHHy/r722n\npCASg/4NvSO1S2FmzIBJk+DVV8E9+D5pUmmJIWlJ4dJLL+WII47odh8lBZGUmTx2ML3r67Zo611f\nx+Sxg6sUUTacfz7k/w1evz5oL9Z5553HSy+9xPDhw5k8eTJTp05lv/32Y9iwYVx00UUAvPvuuxxz\nzDF86lOfYsiQIdx5551ce+21vP766xx++OEcfvjhXfa/3Xbbce6557LPPvswZsyY3MygpUuXcsAB\nBzBs2DA+//nP889//hOAiRMnctdddwHBs94uuugiRo4cydChQ3nuued45ZVXmDZtGldffTXDhw/n\n0UcfLX7wnVBSEInBcSMauWLCUBobemNAY0NvrpgwVDeZS/Taa9HaC3HllVey++67s3TpUo488khe\neOEFFixYwNKlS1m0aBGPPPIIc+fOpX///jz11FM8/fTTHHXUUZxzzjn079+fhx9+mIcffrjL/t99\n912am5t55plnOPTQQ7nkkksAOPXUU7nqqqtYtmwZQ4cOzbXn69u3L4sXL+ass87iJz/5CYMGDeLM\nM8/k3HPPZenSpRxyyCHFD74Tmn0kEpPjRjQqCZTZwIHBJaPO2svhgQce4IEHHmDEiBEArFu3jhde\neIFDDjmE73znO3z/+99n/Pjxkf4Qb7XVVnzpS18C4JRTTmHChAmsXbuWNWvWcOihhwLw1a9+lS9+\n8Yudvn/ChAkA7LvvvsyePbuU4RVESUFEUuPyy4N7CB0vIW27bdBeDu7OlClT+PrXv/6h1xYvXsy9\n997LBRdcwJgxY7jwwguL+h1R6wi22WYbAOrq6ti0aVNRvzMKXT4SkdQ4+WSYPh0++UkwC75Pnx60\nF6tPnz688847AIwdO5abbrqJdevWAdDa2sqbb77J66+/zrbbb
sspp5zC5MmTWbx48Yfe25UPPvgg\nd4/g9ttv5+CDD2aHHXbgYx/7WO5+wK233po7a4gac7npTEFEUuXkk0tLAvl22mknRo8ezZAhQzj6\n6KP58pe/zIEHHggEN4lvu+02XnzxRSZPnsxWW21FfX09N9xwAwCTJk3iqKOOyt1b6MxHP/pRFixY\nwGWXXcbOO+/MnXfeCcDNN9/MmWeeyfr169ltt9349a9/XXDMn/3sZzn++OO5++67ue6668p6X8HS\n9qih5uZm18prItmxYsUK9tprr2qHEZvtttsud+ZRKZ0dUzNb5O7NPb1Xl49ERCRHl49ERMpg1KhR\nvP/++1u03XrrrRU/SyiVkoKISBnMnz+/2iGUhS4fiYhIjpKCiIjk6PKR1DwthiOymc4UpKa1L4bT\numYDzubFcOYsaa12aFJBc+fOZfDgweyxxx5ceeWV1Q6nqpQUpKZpMRxpa2vj7LPP5r777uPZZ59l\n5syZPPvss9UOq2p0+UhqmhbDSZ9yX+5bsGABe+yxB7vtthsAJ554InfffTd77713uUJOFZ0pSE3T\nYjjpEsflvtbWVgYMGJDbbmpqorW1di8fKilITdNiOOmiy33x0+UjqWntlx00+ygd4rjc19jYyMqV\nK3PbLS0tNDbW7uevpCA1T4vhpEf/ht60dpIASrnct99++/HCCy/w8ssv09jYyB133MHtt99eSpip\npstHUjVzlrQy+sqH2PW8PzL6yoc0DVR6FMflvq233prrr7+esWPHstdee3HCCSewzz77lBpqaulM\nQaqi/YZh+/Xh9huGgP7XLl2K63LfuHHjGDduXDlCTD0lBamK7m4YKilId3S5L166fCRVofoAkWRS\nUpCqUH2ASDIpKUhVqD5AJJl0T0GqQvUBIskUe1IwszpgIdDq7uPzXjPg58A4YD0w0d0Xxx2TJINu\nGIokTyUuH30TWNHFa0cDe4Zfk4AbKhCPSOKoZqO6Tj/9dHbeeWeGDBlS7VCqLtakYGZNwDHAL7vY\n5VjgFg88CTSY2S5xxiSSNFrTofomTpzI3Llzqx1GIsR9pnAN8D3ggy5ebwRWdthuCdtEaoYe8hbR\nsllw9RC4uCH4vmxWyV1++tOfZscddyxDcOkXW1Iws/HAm+6+qAx9TTKzhWa2cNWqVWWITiQ5VLMR\nwbJZ8IdzYO1KwIPvfzinLIlBAnGeKYwGPmdmrwB3AJ8xs9vy9mkFBnTYbgrbtuDu09292d2b+/Xr\nF1e8IlWhmo0IHrwUNuYly40bgnYpi9iSgrtPcfcmdx8EnAg85O6n5O32e+BUCxwArHX3N+KKSSSJ\nVLMRwdqWaO0SWcXrFMzsTAB3nwbcSzAd9UWCKamnVToekWpTzUYEOzSFl446aZeyqEhScPd5wLzw\n52kd2h04uxIxiCSZajYKNObC4B5Cx0tI9b2D9hKcdNJJzJs3j7feeoumpiYuueQSzjjjjBKDTSdV\nNEvmXDBnOTPnr6TNnTozTho1gMuOG1rtsKQchp0QfH/w0uCS0Q5NQUJoby/SzJkzyxBcNigpSKZc\nMGc5tz35Wm67zT23rcSQEcNOKDkJSNf0QDzJlJnzO7ne3E27iGxJSUEypc09Urskg+vzKZtSj6WS\ngmRKnVmkdqm+Xr16sXr1aiWGMnB3Vq9eTa9evYruQ/cUJFNOGjVgi3sKHdslmZqammhpaUFPKyiP\nXr160dRU/BRdJQXJlPabyZp9lB719fXsuuuu1Q5DQpa2U7bm5mZfuHBhtcMQEUkVM1vk7s097acz\nBSmrk298gsde+kdue/TuOzLjawdWMaLqmbOkVVXKkjq60Sxlk58QAB576R+cfOMTVYqoerRGgqSV\nkoKUTX5C6Kk9y7RGgqSVkoJIDLRGgqSVkoJIDLRGgqSVkoKUzejdO1/OsKv2LNMaCZJWSgpSNjO+\nduCHEkCtzj46bkQjV0wYS
mNDbwxobOjNFROGavaRJJ7qFEREaoDqFKQq4pqbH6Vf1QeIFE9JQcqm\nfW5++1TM9rn5QEl/lKP0G1cMIrVC9xSkbOKamx+lX9UHiJRGSUHKJq65+VH6VX2ASGmUFKRs4pqb\nH6Vf1QeIlEZJQcomrrn5UfpVfYBIaXSjWcqm/UZuuWf+ROk3rhhEaoXqFEREaoDqFBIqjXPo0xiz\niBRHSaGC0jiHPo0xi0jxdKO5gtI4hz6NMYtI8ZQUKiiNc+jTGLOIFE9JoYLSOIc+jTGLSPGUFCoo\njXPo0xiziBRPN5orKI1z6NMYs4gUL7Y6BTPrBTwCbEOQfO5y94vy9jkMuBt4OWya7e6Xdtev6hRE\nRKJLQp3C+8Bn3H2dmdUDfzWz+9z9ybz9HnX38THGISW6YM5yZs5fSZs7dWacNGoAlx03tOR9k1L/\nkJQ4RJKgx6RgZtsAXwAGddy/p//Re3AKsi7crA+/0lU+LVwwZzm3PflabrvNPbed/8c+yr5JqX9I\nShwiSVHIjea7gWOBTcC7Hb56ZGZ1ZrYUeBP4k7vP72S3g8xsmZndZ2b7FBi3VMjM+SsLbo+yb1Lq\nH5ISh0hSFHL5qMndjyqmc3dvA4abWQPwOzMb4u5Pd9hlMTAwvMQ0DpgD7Jnfj5lNAiYBDBw4sJhQ\npEhtXdxz6qw9yr5JqX9IShwiSVHImcLjZtb5ReECufsa4GHgqLz2t919XfjzvUC9mfXt5P3T3b3Z\n3Zv79etXSigSUZ1Zwe1R9k1K/UNS4hBJii6TgpktN7NlwMHAYjN7PrzM097eLTPrF54hYGa9gSOB\n5/L2+YRZ8BfDzPYP41ld/HCk3E4aNaDg9ij7JqX+ISlxiCRFd5ePSp0RtAtws5nVEfyxn+Xu95jZ\nmQDuPg04HjjLzDYBG4ATPW3P8s649hvEhcwoirJvUuofkhKHSFL0WKdgZre6+1d6aqsU1SmIiERX\nzjqFLWYEhf/z37fYwGpdXHPio9QHxNl3lPGl8VikzrJZ8OClsLYFdmiCMRfCsBOqHZUkWJdJwcym\nAD8AepvZ2+3NwL+A6RWILXPimhMfpT4gzr6jjC+NxyJ1ls2CP5wDG8OZVGtXBtugxCBd6vJGs7tf\n4e59gKnuvn341cfdd3L3KRWMMTPimhMfpT4gzr6jjC+NxyJ1Hrx0c0Jot3FD0C7She7OFEaGP/62\nw8857r44tqgyKq458VHqA+LsO8r40ngsUmdtS7R2Ebq/p/DT8HsvoBl4iuDy0TBgIXBgvKFlT/+G\n3rR28kev1DnxdWad/tHrqm4grr6jjC+NxyJ1dmgKLhl11i7She4uHx3u7ocDbwAjw+KxfYERQGul\nAsySuObER6kPiLPvKONL47FInTEXQn1ekq3vHbSLdKGQ2UeD3X15+4a7P21me8UYU2bFNSc+Sn1A\nnH1HGV8aj0XqtN9M1uwjiaCQOoWZBA/Auy1sOhnYzt1Pijm2TqlOQUQkunLWKZwGnAV8M9x+BLih\nhNgkZZJQeyApp3qJ1OgxKbj7e8DV4ZfUmCTUHkjKqV4iVbp7IN6s8Pvy8EF4W3xVLkSppiTUHkjK\nqV4iVbo7U2i/XKSlMmtYEmoPJOVUL5Eq3U1JfSP88QjgI+7+asevyoQn1RZlvQGtTSCd6qouQvUS\niVTIIjsDgf82s/81s9+a2TfMbHjcgUkyJKH2QFJO9RKpUsiN5osgt1DO14DJwDVAXXfvk2xIQu2B\npJzqJVKlkDqFC4DRwHbAEuCvwKMdLi9VlOoURESiK2edwgRgE/BH4C/AE+7+fonxJV5c8+2j9JuU\ndQFUe5AwWZ/zn/XxRVGFY1HI5aORZrY9wdnCkcB0M3vT3Q+ONbIqimu+fZR+k7IugGoPEibrc/6z\nPr4oqnQserzRbGZDCB5t8VXgSwQPw3sotogSIK759lH6Tcq6AKo9SJisz/nP+viiqNKxKOT
y0ZUE\nj7a4Fvibu2+MNaIEiGu+fZR+k7IugGoPEibrc/6zPr4oqnQsejxTcPfx7v5jd3+8FhICxDffPkq/\nXT3/v9LrAqj2IGGyPuc/6+OLokrHopA6hZoT13z7KP0mZV0A1R4kTNbn/Gd9fFFU6VgUcvmo5sQ1\n3z5Kv0lZF0C1BwmT9Tn/WR9fFFU6Fj3WKSSN6hRERKIruU7BzP4AdJkx3P1zRcZW05JQ/3DyjU/w\n2Ev/yG2P3n1HZnxNS26LbOGeb8Oi34C3gdXBvhNh/M9K7zfhdRjdXT76ScWiqBFJqH/ITwgAj730\nD06+8QklBpF293wbFv5q87a3bd4uJTGkoA6ju6ek/qW7r0oGmRVJqH/ITwg9tYvUpEW/idZeqBTU\nYfR4o9nM9gSuAPYGerW3u/tuMcaVSUmofxCRAnhbtPZCpaAOo5Apqb8mWJN5E3A4cAtwW5xBZVUS\n6h9EpADWxUOgu2ovVArqMApJCr3d/UGCmUqvuvvFwDHxhpVNSah/GL37jp320VW7SE3ad2K09kKl\noA6jkKTwvpltBbxgZv9pZp8neIy2RHTciEaumDCUxobeGNDY0JsrJgwtS/1Dof3O+NqBH0oAmn0k\nkmf8z6D5jM1nBlYXbJc6+2jYCfDZa2GHAYAF3z97bWJuMkNh6ynsB6wAGoAfAjsAP3b3J+MP78NU\npyAiEl3Z1lNw97+FHW4FnOPu7xQYQC+CB+ltE/6eu9pXceuwjwE/B8YB64GJ7r64kP6jilofkLY1\nBKKsvZD1YxHrPPAoc9fjiiNCvzNmwPnnw2uvwcCBcPnlcPLJ5ek7daKOLcvHohuFzD5qJrjZ3Cfc\nXguc7u6Lenjr+8Bn3H2dmdUDfzWz+/LOMI4G9gy/RhHc0B4VfRjdi1ofkLY1BKKsvZD1YxHrPPAo\nc9fjiiNCvzNmwKRJsH59sP3qq8E2dJEYUjCHvmhRx5blY9GDQu4p3AT8h7sPcvdBwNkESaJbHlgX\nbtaHX/nXqo4Fbgn3fRJoMLNdCo6+QFHrA9K2hkCUtReyfixinQceZe56XHFE6Pf88zcnhHbr1wft\npfadOlHHluVj0YNCkkKbuz/avuHufyWYntojM6szs6XAm8Cf3H1+3i6NQMe/XC1hW34/k8xsoZkt\nXLVqVSG/egtR5/Gnbd5/lLUXsn4sYp0HHmXuelxxROj3tdc62a+b9jTMoS9a1LFl+Vj0oJCk8Bcz\n+28zO8zMDjWzXwDzzGykmY3s7o3u3ubuw4EmYP9wFbfI3H26uze7e3O/fv0ivz/qPP60zfuPsvZC\n1o9FrPPAo8xdjyuOCP0OHNj5rl21p2EOfdGiji3Lx6IHhSSFTwH/BlwEXAzsBYwAfkqBz0dy9zXA\nw8BReS+1Ah0XCGgK28oqan1A2tYQiLL2QtaPRazzwKPMXY8rjgj9Xn45bLvtlm3bbhu0l9p36kQd\nW5aPRQ8KmX10eDEdm1k/YKO7rzGz3sCRwFV5u/0e+E8zu4PgBvNad3+jmN/XnahrAqRtDYEoay9k\n/VjE+gz69pvJhcw+iiuOCP2230wuePZRltcyiDq2LB+LHhRSp/Bx4EdAf3c/2sz2Bg5091/18L5h\nwM1AHcEZySx3v9TMzgRw92nhlNTrCc4g1gOnuXu3RQiqUxARia5sdQrAbwhmG7XPWfg7cCfQbVJw\n92UEl5ny26d1+NkJZjOJiEgCFHJPoa+7zwI+AHD3TUCJjwpMvjlLWhl95UPset4fGX3lQ8xZUvZb\nHZJGy2bB1UPg4obg+7JZ5dk3LlFjSML40tZvxhRypvCume1EWGNgZgcAa2ONqspSV7AllRGloCkJ\nxU9xFmwloDgvEf1mUCFnCt8muCG8u5k9RvDo7G/EGlWVpa5gSyojSkFTEoqf4izYSkBxXiL6zaBC\nZh8tNrNDgcGAAc+7+8bYI6ui1BVsSWVEKWhKQvFTnAV
bCSjOS0S/GdTjmYKZfZFgTYVngOOAO3sq\nWku71BVsSWVEKWhKQvFTnAVbCSjOS0S/GVTI5aP/cvd3zOxgYAzBrKMb4g2rulJXsCWVEaWgKQnF\nT3EWbCWgOC8R/WZQQc8+Cr8fA9zo7n8EPhJfSNUX12I4knJRFkhJwmIqUWNIwvjS1m8GFVK8dg/B\noyeOBEYCG4AF7v6p+MP7MBWviYhEV87itRMIKo5/Ej6yYhdgcqkBimRelAV5kiJtMSdlIZykxFEG\nhcw+Wg/M7rD9BlD25xOJZEqUBXmSIm0xJ6X2IClxlEkh9xREJKooC/IkRdpiTkrtQVLiKBMlBZE4\nRFmQJynSFnNSag+SEkeZKCmIxCHKgjxJkbaYk1J7kJQ4ykRJQSQOURbkSYq0xZyU2oOkxFEmSgoi\ncRj/M2g+Y/P/sq0u2E7iDdt2aYs5KbUHSYmjTHqsU0ga1SmIiERXzjoFkXikcW53XDHHVR+QxmMs\nVaWkINWRxrndccUcV31AGo+xVJ3uKUh1pHFud1wxx1UfkMZjLFWnpCDVkca53XHFHFd9QBqPsVSd\nkoJURxrndscVc1z1AWk8xlJ1SgpSHWmc2x1XzHHVB6TxGEvVKSlIdaRxbndcMcdVH5DGYyxVpzoF\nEZEaUGidgs4URJbNgquHwMUNwfdlsyrfb1wxiESkOgWpbXHN5Y/Sr+oJJEF0piC1La65/FH6VT2B\nJIiSgtS2uObyR+lX9QSSIEoKUtvimssfpV/VE0iCKClIbYtrLn+UflVPIAmipCC1La65/FH6VT2B\nJEhsdQpmNgC4Bfg44MB0d/953j6HAXcDL4dNs92927trqlMQEYkuCespbAK+4+6LzawPsMjM/uTu\nz+bt96i7j48xDqmkND6/P0rMaRxfEui4pUZsScHd3wDeCH9+x8xWAI1AflKQrEjjfHvVE8RPxy1V\nKnJPwcwGASOA+Z28fJCZLTOz+8xsn0rEIzFJ43x71RPET8ctVWKvaDaz7YD/Ab7l7m/nvbwYGOju\n68xsHDAH2LOTPiYBkwAGDhwYc8RStDTOt1c9Qfx03FIl1jMFM6snSAgz3H12/uvu/ra7rwt/vheo\nN7O+new33d2b3b25X79+cYYspUjjfHvVE8RPxy1VYksKZmbAr4AV7t7pM4DN7BPhfpjZ/mE8q+OK\nSWKWxvn2qieIn45bqsR5+Wg08BVguZktDdt+AAwEcPdpwPHAWWa2CdgAnOhpe5a3bNZ+0zBNs0yi\nxJzG8SWBjluqaD0FEZEakIQ6BUkqzRnf0j3fhkW/AW8LVj3bd2Lpq56JpJSSQq3RnPEt3fNtWPir\nzdvetnlbiUFqkJ59VGs0Z3xLi34TrV0k45QUao3mjG/J26K1i2SckkKt0ZzxLVldtHaRjFNSqDWa\nM76lfSdGaxfJOCWFWqNn929p/M+g+YzNZwZWF2zrJrPUKNUpiIjUANUpVNCcJa1Mvf95Xl+zgf4N\nvZk8djDHjWisdljlk/W6hqyPLwl0jFNDSaFEc5a0MmX2cjZsDGartK7ZwJTZywGykRiyXteQ9fEl\ngY5xquieQomm3v98LiG027Cxjan3P1+liMos63UNWR9fEugYp4qSQoleX7MhUnvqZL2uIevjSwId\n41RRUihR/4bekdpTJ+t1DVkfXxLoGKeKkkKJJo8dTO/6LQudetfXMXns4CpFVGZZr2vI+viSQMc4\nVXSjuUTtN5MzO/so68/Cz/r4kkDHOFVUpyAiUgMKrVPQ5SORLFs2C64eAhc3BN+XzUpH31I1unwk\nklVx1geo9iCzdKYgklVx1geo9iCzlBREsirO+gDVHmSWkoJIVsVZH6Dag8xSUhDJqjjrA1R7kFlK\nCiJZFefaGVqXI7NUpyAiUgNUpyAiIpEpKYiISI6SgoiI5CgpiIhIjpKCiIjkKCmIiEiOkoKIiOQo\nKYiISE5sScHMBpj
Zw2b2rJk9Y2bf7GQfM7NrzexFM1tmZiPjikdKoOfmi9SMONdT2AR8x90Xm1kf\nYJGZ/cndn+2wz9HAnuHXKOCG8LskhZ6bL1JTYjtTcPc33H1x+PM7wAogf+HiY4FbPPAk0GBmu8QV\nkxRBz80XqSkVuadgZoOAEcD8vJcagZUdtlv4cOLAzCaZ2UIzW7hq1aq4wpTO6Ln5IjUl9qRgZtsB\n/wN8y93fLqYPd5/u7s3u3tyvX7/yBijd03PzRWpKrEnBzOoJEsIMd5/dyS6twIAO201hmySFnpsv\nUlPinH1kwK+AFe7+sy52+z1wajgL6QBgrbu/EVdMUgQ9N1+kpsQ5+2g08BVguZktDdt+AAwEcPdp\nwL3AOOBFYD1wWozxSLGGnaAkIFIjYksK7v5XwHrYx4Gz44pBRESiUUWziIjkKCmIiEiOkoKIiOQo\nKYiISI6SgoiI5CgpiIhIjpKCiIjkWFAqkB5mtgp4tdpxdKEv8Fa1g4iRxpdeWR4baHyF+KS79/jw\nuNQlhSQzs4Xu3lztOOKi8aVXlscGGl856fKRiIjkKCmIiEiOkkJ5Ta92ADHT+NIry2MDja9sdE9B\nRERydKYgIiI5SgpFMLM6M1tiZvd08tphZrbWzJaGX6laoszMXjGz5WHsCzt53czsWjN70cyWmdnI\nasRZrALGl/bPr8HM7jKz58xshZkdmPd62j+/nsaX2s/PzAZ3iHupmb1tZt/K2yf2zy/ORXay7JvA\nCmD7Ll5/1N3HVzCecjvc3buaE300sGf4NQq4IfyeJt2ND9L9+f0cmOvux5vZR4Bt815P++fX0/gg\npZ+fuz8PDIfgP54ESxP/Lm+32D8/nSlEZGZNwDHAL6sdS5UcC9zigSeBBjPbpdpBCZjZDsCnCZbB\nxd3/5e5r8nZL7edX4PiyYgzwkrvnF+rG/vkpKUR3DfA94INu9jkoPLW7z8z2qVBc5eLAn81skZlN\n6uT1RmBlh+2WsC0tehofpPfz2xVYBfw6vLz5SzP7aN4+af78ChkfpPfz6+hEYGYn7bF/fkoKEZjZ\neOBNd1/UzW6LgYHuPgy4DphTkeDK52B3H05wmnq2mX262gGVWU/jS/PntzUwErjB3UcA7wLnVTek\nsipkfGn+/AAIL4t9DvhtNX6/kkI0o4HPmdkrwB3AZ8zsto47uPvb7r4u/PleoN7M+lY80iK5e2v4\n/U2C65n75+3SCgzosN0UtqVCT+NL+efXArS4+/xw+y6CP6Idpfnz63F8Kf/82h0NLHb3/+vktdg/\nPyWFCNywjxqxAAADqElEQVR9irs3ufsggtO7h9z9lI77mNknzMzCn/cnOMarKx5sEczso2bWp/1n\n4N+Bp/N2+z1wajgL4gBgrbu/UeFQi1LI+NL8+bn7/wNWmtngsGkM8Gzebqn9/AoZX5o/vw5OovNL\nR1CBz0+zj8rAzM4EcPdpwPHAWWa2CdgAnOjpqRD8OPC78N/U1sDt7j43b3z3AuOAF4H1wGlVirUY\nhYwvzZ8fwDeAGeEliP8FTsvQ5wc9jy/Vn1/4n5Ujga93aKvo56eKZhERydHlIxERyVFSEBGRHCUF\nERHJUVIQEZEcJQUREclRUhCJKHwSZ1dPyP1Qexl+33FmtneH7Xlmltn1iKW6lBREku84YO8e9xIp\nAyUFyZywcvmPZvaUmT1tZl8K2/c1s7+ED8O7v/3pkuH/vH8ePsP+6bASFjPb38yeCB++9niHStpC\nY7jJzBaE7z82bJ9oZrPNbK6ZvWBmP+7wnjPM7O/he240s+vN7CCC5+BMDePbPdz9i+F+fzezQ8p0\n6ERU0SyZdBTwursfA8Ejl82snuABace6+6owUVwOnB6+Z1t3Hx4+IO8mYAjwHHCIu28ysyOAHwFf\nKDCG8wkeg3K6mTUAC8zsz+Frw4ERwPvA82Z2HdAG/BfBs3zeAR4CnnL3x83s98A97
n5XOB6Ard19\nfzMbB1wEHFHMgRLJp6QgWbQc+KmZXUXwx/RRMxtC8If+T+Ef1Tqg4zNjZgK4+yNmtn34h7wPcLOZ\n7UnwyO36CDH8O8HDE78bbvcCBoY/P+juawHM7Fngk0Bf4C/u/o+w/bfAv3XT/+zw+yJgUIS4RLql\npCCZ4+5/t2CZwnHAZWb2IMETUZ9x9wO7elsn2z8EHnb3z5vZIGBehDAM+EK4mtbmRrNRBGcI7doo\n7t9hex/Fvl+kU7qnIJljZv2B9e5+GzCV4JLM80A/C9f0NbN623IBlvb7DgcTPHlyLbADmx9LPDFi\nGPcD3+jwxM4RPez/N+BQM/uYmW3Nlpep3iE4axGJnZKCZNFQgmv4Swmut1/m7v8ieILmVWb2FLAU\nOKjDe94zsyXANOCMsO3HwBVhe9T/jf+Q4HLTMjN7JtzuUrjOw4+ABcBjwCvA2vDlO4DJ4Q3r3Tvv\nQaQ89JRUqXlmNg/4rrsvrHIc27n7uvBM4XfATe6ev3C7SKx0piCSHBeHZzdPAy+TwqUkJf10piAi\nIjk6UxARkRwlBRERyVFSEBGRHCUFERHJUVIQEZEcJQUREcn5/8iEC58zZruKAAAAAElFTkSuQmCC\n", 292 | "text/plain": [ 293 | "" 294 | ] 295 | }, 296 | "metadata": {}, 297 | "output_type": "display_data" 298 | } 299 | ], 300 | "source": [ 301 | "plt.scatter(df[:50]['sepal length'], df[:50]['sepal width'], label='0')\n", 302 | "plt.scatter(df[50:100]['sepal length'], df[50:100]['sepal width'], label='1')\n", 303 | "plt.plot(test_point[0], test_point[1], 'bo', label='test_point')\n", 304 | "plt.xlabel('sepal length')\n", 305 | "plt.ylabel('sepal width')\n", 306 | "plt.legend()" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": {}, 312 | "source": [ 313 | "# scikitlearn" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 14, 319 | "metadata": { 320 | "collapsed": true 321 | }, 322 | "outputs": [], 323 | "source": [ 324 | "from sklearn.neighbors import KNeighborsClassifier" 325 | ] 326 | }, 327 | { 328 | "cell_type": "code", 329 | "execution_count": 15, 330 | "metadata": {}, 331 | "outputs": [ 332 | { 333 | "data": { 334 | "text/plain": [ 335 | "KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n", 336 | " metric_params=None, n_jobs=1, n_neighbors=5, p=2,\n", 337 | " weights='uniform')" 338 | ] 339 | }, 340 | "execution_count": 15, 341 | "metadata": {}, 342 | "output_type": "execute_result" 343 | } 344 | ], 345 | "source": [ 346 | "clf_sk = 
KNeighborsClassifier()\n", 347 | "clf_sk.fit(X_train, y_train)" 348 | ] 349 | }, 350 | { 351 | "cell_type": "code", 352 | "execution_count": 16, 353 | "metadata": {}, 354 | "outputs": [ 355 | { 356 | "data": { 357 | "text/plain": [ 358 | "1.0" 359 | ] 360 | }, 361 | "execution_count": 16, 362 | "metadata": {}, 363 | "output_type": "execute_result" 364 | } 365 | ], 366 | "source": [ 367 | "clf_sk.score(X_test, y_test)" 368 | ] 369 | }, 370 | { 371 | "cell_type": "markdown", 372 | "metadata": { 373 | "collapsed": true 374 | }, 375 | "source": [ 376 | "### sklearn.neighbors.KNeighborsClassifier\n", 377 | "\n", 378 | "- n_neighbors: number of neighbors to use\n", 379 | "- p: power parameter for the Minkowski metric\n", 380 | "- algorithm: neighbor-search algorithm, one of {'auto', 'ball_tree', 'kd_tree', 'brute'}\n", 381 | "- weights: weight function applied to the neighbors" 382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": null, 387 | "metadata": { 388 | "collapsed": true 389 | }, 390 | "outputs": [], 391 | "source": [] 392 | } 393 | ], 394 | "metadata": { 395 | "kernelspec": { 396 | "display_name": "Python 3", 397 | "language": "python", 398 | "name": "python3" 399 | }, 400 | "language_info": { 401 | "codemirror_mode": { 402 | "name": "ipython", 403 | "version": 3 404 | }, 405 | "file_extension": ".py", 406 | "mimetype": "text/x-python", 407 | "name": "python", 408 | "nbconvert_exporter": "python", 409 | "pygments_lexer": "ipython3", 410 | "version": "3.6.1" 411 | } 412 | }, 413 | "nbformat": 4, 414 | "nbformat_minor": 2 415 | } 416 | -------------------------------------------------------------------------------- /LeastSquaresMethod/README.md: -------------------------------------------------------------------------------- 1 | # Least Squares Method 2 | 3 | -------------------------------------------------------------------------------- /LogisticRegression/LR.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Logistic 
Regression\n", 8 | "\n", 9 | "LR (logistic regression) is a classic classification method." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "\n", 17 | "Model: $f(x) = \frac{1}{1+e^{-wx}}$\n", 18 | "\n", 19 | "where $wx$ is the linear function $wx = w_0*x_0 + w_1*x_1 + w_2*x_2 + ... + w_n*x_n$ with $x_0 = 1$\n" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 1, 25 | "metadata": { 26 | "collapsed": true 27 | }, 28 | "outputs": [], 29 | "source": [ 30 | "from math import exp\n", 31 | "import numpy as np\n", 32 | "import pandas as pd\n", 33 | "import matplotlib.pyplot as plt\n", 34 | "%matplotlib inline\n", 35 | "\n", 36 | "from sklearn.datasets import load_iris\n", 37 | "from sklearn.model_selection import train_test_split" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 2, 43 | "metadata": { 44 | "collapsed": true 45 | }, 46 | "outputs": [], 47 | "source": [ 48 | "# data\n", 49 | "def create_data():\n", 50 | "    iris = load_iris()\n", 51 | "    df = pd.DataFrame(iris.data, columns=iris.feature_names)\n", 52 | "    df['label'] = iris.target\n", 53 | "    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']\n", 54 | "    data = np.array(df.iloc[:100, [0,1,-1]])\n", 55 | "    # print(data)\n", 56 | "    return data[:,:2], data[:,-1]" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 3, 62 | "metadata": { 63 | "collapsed": true 64 | }, 65 | "outputs": [], 66 | "source": [ 67 | "X, y = create_data()\n", 68 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": 4, 74 | "metadata": { 75 | "collapsed": true 76 | }, 77 | "outputs": [], 78 | "source": [ 79 | "class LogisticReressionClassifier:\n", 80 | "    def __init__(self, max_iter=200, learning_rate=0.01):\n", 81 | "        self.max_iter = max_iter\n", 82 | "        self.learning_rate = learning_rate\n", 83 | "    \n", 84 | "    def sigmoid(self, x):\n", 85 | "        return 1 / (1 + exp(-x))\n", 86 | "\n", 87 | 
" def data_matrix(self, X):\n", 88 | " data_mat = []\n", 89 | " for d in X:\n", 90 | " data_mat.append([1.0, *d])\n", 91 | " return data_mat\n", 92 | "\n", 93 | " def fit(self, X, y):\n", 94 | " # label = np.mat(y)\n", 95 | " data_mat = self.data_matrix(X) # m*n\n", 96 | " self.weights = np.zeros((len(data_mat[0]),1), dtype=np.float32)\n", 97 | "\n", 98 | " for iter_ in range(self.max_iter):\n", 99 | " for i in range(len(X)):\n", 100 | " result = self.sigmoid(np.dot(data_mat[i], self.weights))\n", 101 | " error = y[i] - result \n", 102 | " self.weights += self.learning_rate * error * np.transpose([data_mat[i]])\n", 103 | " print('LogisticRegression Model(learning_rate={},max_iter={})'.format(self.learning_rate, self.max_iter))\n", 104 | "\n", 105 | " # def f(self, x):\n", 106 | " # return -(self.weights[0] + self.weights[1] * x) / self.weights[2]\n", 107 | "\n", 108 | " def score(self, X_test, y_test):\n", 109 | " right = 0\n", 110 | " X_test = self.data_matrix(X_test)\n", 111 | " for x, y in zip(X_test, y_test):\n", 112 | " result = np.dot(x, self.weights)\n", 113 | " if (result > 0 and y == 1) or (result < 0 and y == 0):\n", 114 | " right += 1\n", 115 | " return right / len(X_test)" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 5, 121 | "metadata": {}, 122 | "outputs": [ 123 | { 124 | "name": "stdout", 125 | "output_type": "stream", 126 | "text": [ 127 | "LogisticRegression Model(learning_rate=0.01,max_iter=200)\n" 128 | ] 129 | } 130 | ], 131 | "source": [ 132 | "lr_clf = LogisticReressionClassifier()\n", 133 | "lr_clf.fit(X_train, y_train)" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 6, 139 | "metadata": {}, 140 | "outputs": [ 141 | { 142 | "data": { 143 | "text/plain": [ 144 | "1.0" 145 | ] 146 | }, 147 | "execution_count": 6, 148 | "metadata": {}, 149 | "output_type": "execute_result" 150 | } 151 | ], 152 | "source": [ 153 | "lr_clf.score(X_test, y_test)" 154 | ] 155 | }, 156 | { 157 | 
"cell_type": "code", 158 | "execution_count": 7, 159 | "metadata": {}, 160 | "outputs": [ 161 | { 162 | "data": { 163 | "text/plain": [ 164 | "" 165 | ] 166 | }, 167 | "execution_count": 7, 168 | "metadata": {}, 169 | "output_type": "execute_result" 170 | }, 171 | { 172 | "data": { 173 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl8VfW57/HPQxKSMMYQxgwmJIjIICCCCELAOiFVrFal\n2h61LQXpaXtsbfVebwdPe6r13mpbpoJWalWsI1qOw7GSMIMNg4AguhOGJEwhmDAlIdn53T920Bgz\nrCRr7zXkeb9evJK9slj7+e2lDyu/9d2/LcYYlFJK+UsnpwtQSillP23uSinlQ9rclVLKh7S5K6WU\nD2lzV0opH9LmrpRSPqTNXSmlfEibu1JK+ZA2d6WU8qFop544KSnJpKenO/X0SinlSZs3bz5mjOnd\n0n6Wm7uIRAF5QLExZnqDn2UDrwN76za9aox5uLnjpaenk5eXZ/XplVJKASKy38p+rbly/yGwG+jR\nxM/XNGz6SimlnGFpzl1EUoDrgSfDW45SSik7WL2h+gTwU6C2mX0uF5HtIvKWiAxtbAcRmSUieSKS\nV1JS0tpalVJKWdTitIyITAeOGmM2182tN2YLkGaMOSUi04DlwKCGOxljFgOLAcaMGaNrDSulHFFd\nXU1RURGVlZVOl9KkuLg4UlJSiImJadPftzLnPgG4oa5pxwE9RORZY8yd53Ywxpyo9/2bIrJARJKM\nMcfaVJVSSoVRUVER3bt3Jz09HRFxupwvMcZQWlpKUVERGRkZbTpGi9MyxpgHjTEpxph04HZgZf3G\nDiAi/aTuFRKRsXXHLW1TRUopFWaVlZX06tXLlY0dQETo1atXu36zaHPOXURmAxhjFgG3AHNEpAao\nAG43+hFPSikXc2tjP6e99bWquRtjcoHcuu8X1ds+D5jXrkqU8pHlW4t57J09HCyrYEBCPPdfM5gZ\no5KdLkt1ILr8gFI2W761mAdf3UFxWQUGKC6r4MFXd7B8a7HTpSkXefvttxk8eDBZWVk88sgjth9f\nm7tSNnvsnT1UVAe/sK2iOshj7+xxqCLlNsFgkLlz5/LWW2+xa9culi1bxq5du2x9DsfWllHKrw6W\nVbRqu3I/u6fZ3n//fbKyshg4cCAAt99+O6+//joXXXSRXSXrlbtSdhuQEN+q7crdwjHNVlxcTGpq\n6mePU1JSKC62d9pOm7tSNrv/msHEx0R9YVt8TBT3XzPYoYpUe3h1mk2nZZSy2blf1zUt4w/hmGZL\nTk6msLDws8dFRUUkJ9v734c2d6XCYMaoZG3mPjEgIZ7iRhp5e6bZLr30Uj755BP27t1LcnIyL7zw\nAs8//3x7yvwSnZZRSqlmhGOaLTo6mnnz5nHNNdcwZMgQbr31VoYObXS9xbY/h61HU0opnwnXNNu0\nadOYNm2aHSU2Spu7Ukq1wIvTbDoto5RSPqTNXSmlfEibu1JK+ZA2d6WU8iFt7kop5UPa3JVSygH3\n3HMPffr0YdiwYWE5vjZ3pQgtDjXhkZVkPPDfTHhkpa69rsLurrvu4u233w7b8bW5qw5PP1xDtWj7\ni/D4MPhlQujr9hfbfchJkyaRmJhoQ3GN0+auOjyvrvqnImT7i/CPH0B5IWBCX//xA1safDhpc1cd\nnn64hmrWew9DdYP/FqorQttdTJu76vD0wzVUs8qLWrfdJbS5qw5PP1xDNat
"&lt;base64 PNG data omitted: scatter plot of the two classes with the fitted separating line&gt;\n", 174 | "text/plain": [ 175 | "" 176 | ] 177 | }, 178 | "metadata": {}, 179 | "output_type": "display_data" 180 | } 181 | ], 182 | "source": [ 183 | "x_points = np.arange(4, 8)\n", 184 | "y_ = -(lr_clf.weights[1]*x_points + lr_clf.weights[0])/lr_clf.weights[2]\n", 185 | "plt.plot(x_points, y_)\n", 186 | "\n", 187 | "#lr_clf.show_graph()\n", 188 | "plt.scatter(X[:50,0],X[:50,1], label='0')\n", 189 | "plt.scatter(X[50:,0],X[50:,1], label='1')\n", 190 | "plt.legend()" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": { 196 | "collapsed": true 197 | }, 198 | "source": [ 199 | "## 
sklearn\n", 200 | "\n", 201 | "### sklearn.linear_model.LogisticRegression\n", 202 | "\n", 203 | "The solver parameter determines the optimization method used for the logistic regression loss function; four algorithms are available:\n", 204 | "- a) liblinear: implemented with the open-source liblinear library, which uses coordinate descent to iteratively optimize the loss function.\n", 205 | "- b) lbfgs: a quasi-Newton method that uses the matrix of second derivatives of the loss function (the Hessian) to iteratively optimize it.\n", 206 | "- c) newton-cg: another member of the Newton family, which also uses the Hessian of the loss function for iterative optimization.\n", 207 | "- d) sag: stochastic average gradient descent, a variant of gradient descent that computes the gradient from only a subset of the samples at each iteration, making it well suited to large datasets." 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": 8, 213 | "metadata": { 214 | "collapsed": true 215 | }, 216 | "outputs": [], 217 | "source": [ 218 | "from sklearn.linear_model import LogisticRegression" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": 9, 224 | "metadata": { 225 | "collapsed": true 226 | }, 227 | "outputs": [], 228 | "source": [ 229 | "clf = LogisticRegression(max_iter=200)" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": 10, 235 | "metadata": {}, 236 | "outputs": [ 237 | { 238 | "data": { 239 | "text/plain": [ 240 | "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", 241 | "          intercept_scaling=1, max_iter=200, multi_class='ovr', n_jobs=1,\n", 242 | "          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n", 243 | "          verbose=0, warm_start=False)" 244 | ] 245 | }, 246 | "execution_count": 10, 247 | "metadata": {}, 248 | "output_type": "execute_result" 249 | } 250 | ], 251 | "source": [ 252 | "clf.fit(X_train, y_train)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": 11, 258 | "metadata": {}, 259 | "outputs": [ 260 | { 261 | "data": { 262 | "text/plain": [ 263 | "1.0" 264 | ] 265 | }, 266 | "execution_count": 11, 267 | "metadata": {}, 268 | "output_type": "execute_result" 269 | } 270 | ], 271 | "source": [ 272 | "clf.score(X_test, y_test)" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": 12, 278 | "metadata": {}, 279 | "outputs": [ 280 | { 281 | "name": "stdout",
282 | "output_type": "stream", 283 | "text": [ 284 | "[[ 1.92455724 -3.20503018]] [-0.52573053]\n" 285 | ] 286 | } 287 | ], 288 | "source": [ 289 | "print(clf.coef_, clf.intercept_)" 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": 13, 295 | "metadata": {}, 296 | "outputs": [ 297 | { 298 | "data": { 299 | "text/plain": [ 300 | "" 301 | ] 302 | }, 303 | "execution_count": 13, 304 | "metadata": {}, 305 | "output_type": "execute_result" 306 | }, 307 | { 308 | "data": { 309 | "image/png": "&lt;base64 PNG data omitted: scatter plot of sepal length vs sepal width with the fitted decision line&gt;\n", 310 | "text/plain": [ 311 | "" 312 | ] 313 | }, 314 | 
"metadata": {}, 315 | "output_type": "display_data" 316 | } 317 | ], 318 | "source": [ 319 | "x_points = np.arange(4, 8)\n", 320 | "y_ = -(clf.coef_[0][0]*x_points + clf.intercept_)/clf.coef_[0][1]\n", 321 | "plt.plot(x_points, y_)\n", 322 | "\n", 323 | "plt.plot(X[:50, 0], X[:50, 1], 'bo', color='blue', label='0')\n", 324 | "plt.plot(X[50:, 0], X[50:, 1], 'bo', color='orange', label='1')\n", 325 | "plt.xlabel('sepal length')\n", 326 | "plt.ylabel('sepal width')\n", 327 | "plt.legend()" 328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": null, 333 | "metadata": { 334 | "collapsed": true 335 | }, 336 | "outputs": [], 337 | "source": [] 338 | } 339 | ], 340 | "metadata": { 341 | "kernelspec": { 342 | "display_name": "Python 3", 343 | "language": "python", 344 | "name": "python3" 345 | }, 346 | "language_info": { 347 | "codemirror_mode": { 348 | "name": "ipython", 349 | "version": 3 350 | }, 351 | "file_extension": ".py", 352 | "mimetype": "text/x-python", 353 | "name": "python", 354 | "nbconvert_exporter": "python", 355 | "pygments_lexer": "ipython3", 356 | "version": "3.6.1" 357 | } 358 | }, 359 | "nbformat": 4, 360 | "nbformat_minor": 2 361 | } 362 | -------------------------------------------------------------------------------- /NaiveBayes/GaussianNB.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Naive Bayes\n", 8 | "\n", 9 | "A classification method based on Bayes' theorem and the assumption that the features are conditionally independent given the class.\n", 10 | "\n", 11 | "Models:\n", 12 | "\n", 13 | "- Gaussian\n", 14 | "- Multinomial\n", 15 | "- Bernoulli" 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 1, 21 | "metadata": { 22 | "collapsed": true 23 | }, 24 | "outputs": [], 25 | "source": [ 26 | "import numpy as np\n", 27 | "import pandas as pd\n", 28 | "import matplotlib.pyplot as plt\n", 29 | "%matplotlib inline\n", 30 | "\n", 31 | "from sklearn.datasets import load_iris\n", 32 | "from 
sklearn.model_selection import train_test_split\n", 33 | "\n", 34 | "from collections import Counter\n", 35 | "import math" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 2, 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "# data\n", 45 | "def create_data():\n", 46 | " iris = load_iris()\n", 47 | " df = pd.DataFrame(iris.data, columns=iris.feature_names)\n", 48 | " df['label'] = iris.target\n", 49 | " df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']\n", 50 | " data = np.array(df.iloc[:100, :])\n", 51 | " # print(data)\n", 52 | " return data[:,:-1], data[:,-1]" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 3, 58 | "metadata": {}, 59 | "outputs": [], 60 | "source": [ 61 | "X, y = create_data()\n", 62 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 4, 68 | "metadata": {}, 69 | "outputs": [ 70 | { 71 | "data": { 72 | "text/plain": [ 73 | "(array([ 4.8, 3. 
, 1.4, 0.1]), 0.0)" 74 | ] 75 | }, 76 | "execution_count": 4, 77 | "metadata": {}, 78 | "output_type": "execute_result" 79 | } 80 | ], 81 | "source": [ 82 | "X_test[0], y_test[0]" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": {}, 88 | "source": [ 89 | "Reference: https://machinelearningmastery.com/naive-bayes-classifier-scratch-python/\n", 90 | "\n", 91 | "## GaussianNB (Gaussian Naive Bayes)\n", 92 | "\n", 93 | "Each feature's class-conditional likelihood is assumed to be Gaussian.\n", 94 | "\n", 95 | "Probability density function:\n", 96 | "$$P(x_i | y_k)=\frac{1}{\sqrt{2\pi\sigma^2_{y_k}}}\exp(-\frac{(x_i-\mu_{y_k})^2}{2\sigma^2_{y_k}})$$\n", 97 | "\n", 98 | "Mean: $\mu$; variance: $\sigma^2=\frac{\sum(X-\mu)^2}{N}$" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": 7, 104 | "metadata": { 105 | "collapsed": true 106 | }, 107 | "outputs": [], 108 | "source": [ 109 | "class NaiveBayes:\n", 110 | "    def __init__(self):\n", 111 | "        self.model = None\n", 112 | "\n", 113 | "    # mean\n", 114 | "    @staticmethod\n", 115 | "    def mean(X):\n", 116 | "        return sum(X) / float(len(X))\n", 117 | "\n", 118 | "    # standard deviation\n", 119 | "    def stdev(self, X):\n", 120 | "        avg = self.mean(X)\n", 121 | "        return math.sqrt(sum([pow(x-avg, 2) for x in X]) / float(len(X)))\n", 122 | "\n", 123 | "    # Gaussian probability density function\n", 124 | "    def gaussian_probability(self, x, mean, stdev):\n", 125 | "        exponent = math.exp(-(math.pow(x-mean,2)/(2*math.pow(stdev,2))))\n", 126 | "        return (1 / (math.sqrt(2*math.pi) * stdev)) * exponent\n", 127 | "\n", 128 | "    # summarize X_train as per-feature (mean, stdev) pairs\n", 129 | "    def summarize(self, train_data):\n", 130 | "        summaries = [(self.mean(i), self.stdev(i)) for i in zip(*train_data)]\n", 131 | "        return summaries\n", 132 | "\n", 133 | "    # compute the mean and standard deviation for each class\n", 134 | "    def fit(self, X, y):\n", 135 | "        labels = list(set(y))\n", 136 | "        data = {label:[] for label in labels}\n", 137 | "        for f, label in zip(X, y):\n", 138 | "            data[label].append(f)\n", 139 | "        self.model = {label: self.summarize(value) for label, value in data.items()}\n", 140 | "        return 'gaussianNB train 
done!'\n", 141 | "\n", 142 | " # 计算概率\n", 143 | " def calculate_probabilities(self, input_data):\n", 144 | " # summaries:{0.0: [(5.0, 0.37),(3.42, 0.40)], 1.0: [(5.8, 0.449),(2.7, 0.27)]}\n", 145 | " # input_data:[1.1, 2.2]\n", 146 | " probabilities = {}\n", 147 | " for label, value in self.model.items():\n", 148 | " probabilities[label] = 1\n", 149 | " for i in range(len(value)):\n", 150 | " mean, stdev = value[i]\n", 151 | " probabilities[label] *= self.gaussian_probability(input_data[i], mean, stdev)\n", 152 | " return probabilities\n", 153 | "\n", 154 | " # 类别\n", 155 | " def predict(self, X_test):\n", 156 | " # {0.0: 2.9680340789325763e-27, 1.0: 3.5749783019849535e-26}\n", 157 | " label = sorted(self.calculate_probabilities(X_test).items(), key=lambda x: x[-1])[-1][0]\n", 158 | " return label\n", 159 | "\n", 160 | " def score(self, X_test, y_test):\n", 161 | " right = 0\n", 162 | " for X, y in zip(X_test, y_test):\n", 163 | " label = self.predict(X)\n", 164 | " if label == y:\n", 165 | " right += 1\n", 166 | "\n", 167 | " return right / float(len(X_test))" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 8, 173 | "metadata": { 174 | "collapsed": true 175 | }, 176 | "outputs": [], 177 | "source": [ 178 | "model = NaiveBayes()" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 9, 184 | "metadata": {}, 185 | "outputs": [ 186 | { 187 | "data": { 188 | "text/plain": [ 189 | "'gaussianNB train done!'" 190 | ] 191 | }, 192 | "execution_count": 9, 193 | "metadata": {}, 194 | "output_type": "execute_result" 195 | } 196 | ], 197 | "source": [ 198 | "model.fit(X_train, y_train)" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": 10, 204 | "metadata": {}, 205 | "outputs": [ 206 | { 207 | "name": "stdout", 208 | "output_type": "stream", 209 | "text": [ 210 | "0.0\n" 211 | ] 212 | } 213 | ], 214 | "source": [ 215 | "print(model.predict([4.4, 3.2, 1.3, 0.2]))" 216 | ] 217 | }, 218 | { 219 | 
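The `calculate_probabilities` method above multiplies per-feature densities directly, and the inline example in its comments (values around 1e-27) shows how small these products already are for a handful of features; with many features they underflow to 0.0. A common remedy is to sum log-densities instead. A minimal standalone sketch, assuming per-class `(mean, stdev)` summaries in the same shape as the model's (the numbers below are the illustrative ones from the code comment):

```python
import math

# illustrative per-class summaries: {label: [(mean, stdev) per feature]}
summaries = {0.0: [(5.0, 0.37), (3.42, 0.40)],
             1.0: [(5.8, 0.449), (2.7, 0.27)]}

def log_gaussian(x, mean, stdev):
    # log of the Gaussian pdf; adding logs replaces multiplying tiny densities
    return -0.5 * math.log(2 * math.pi * stdev ** 2) - (x - mean) ** 2 / (2 * stdev ** 2)

def predict(sample):
    # argmax over the summed per-feature log-densities
    scores = {label: sum(log_gaussian(x, m, s) for x, (m, s) in zip(sample, feats))
              for label, feats in summaries.items()}
    return max(scores, key=scores.get)

print(predict([5.0, 3.4]))  # closest to the class-0.0 means
```

Because the logarithm is monotonic, the argmax is unchanged, so this variant never alters a prediction; it only avoids underflow.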
"cell_type": "code", 220 | "execution_count": 11, 221 | "metadata": {}, 222 | "outputs": [ 223 | { 224 | "data": { 225 | "text/plain": [ 226 | "1.0" 227 | ] 228 | }, 229 | "execution_count": 11, 230 | "metadata": {}, 231 | "output_type": "execute_result" 232 | } 233 | ], 234 | "source": [ 235 | "model.score(X_test, y_test)" 236 | ] 237 | }, 238 | { 239 | "cell_type": "markdown", 240 | "metadata": { 241 | "collapsed": true 242 | }, 243 | "source": [ 244 | "scikit-learn example\n", 245 | "\n", 246 | "# sklearn.naive_bayes" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": 12, 252 | "metadata": { 253 | "collapsed": true 254 | }, 255 | "outputs": [], 256 | "source": [ 257 | "from sklearn.naive_bayes import GaussianNB" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": 13, 263 | "metadata": {}, 264 | "outputs": [ 265 | { 266 | "data": { 267 | "text/plain": [ 268 | "GaussianNB(priors=None)" 269 | ] 270 | }, 271 | "execution_count": 13, 272 | "metadata": {}, 273 | "output_type": "execute_result" 274 | } 275 | ], 276 | "source": [ 277 | "clf = GaussianNB()\n", 278 | "clf.fit(X_train, y_train)" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": 14, 284 | "metadata": {}, 285 | "outputs": [ 286 | { 287 | "data": { 288 | "text/plain": [ 289 | "1.0" 290 | ] 291 | }, 292 | "execution_count": 14, 293 | "metadata": {}, 294 | "output_type": "execute_result" 295 | } 296 | ], 297 | "source": [ 298 | "clf.score(X_test, y_test)" 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": 16, 304 | "metadata": {}, 305 | "outputs": [ 306 | { 307 | "name": "stderr", 308 | "output_type": "stream", 309 | "text": [ 310 | "E:\Anaconda3\lib\site-packages\sklearn\utils\validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. 
Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.\n", 311 | "  DeprecationWarning)\n" 312 | ] 313 | }, 314 | { 315 | "data": { 316 | "text/plain": [ 317 | "array([ 0.])" 318 | ] 319 | }, 320 | "execution_count": 16, 321 | "metadata": {}, 322 | "output_type": "execute_result" 323 | } 324 | ], 325 | "source": [ 326 | "clf.predict([4.4,  3.2,  1.3,  0.2])" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": 17, 332 | "metadata": { 333 | "collapsed": true 334 | }, 335 | "outputs": [], 336 | "source": [ 337 | "from sklearn.naive_bayes import BernoulliNB, MultinomialNB  # Bernoulli and multinomial models" 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": null, 343 | "metadata": { 344 | "collapsed": true 345 | }, 346 | "outputs": [], 347 | "source": [] 348 | } 349 | ], 350 | "metadata": { 351 | "kernelspec": { 352 | "display_name": "Python 3", 353 | "language": "python", 354 | "name": "python3" 355 | }, 356 | "language_info": { 357 | "codemirror_mode": { 358 | "name": "ipython", 359 | "version": 3 360 | }, 361 | "file_extension": ".py", 362 | "mimetype": "text/x-python", 363 | "name": "python", 364 | "nbconvert_exporter": "python", 365 | "pygments_lexer": "ipython3", 366 | "version": "3.6.1" 367 | } 368 | }, 369 | "nbformat": 4, 370 | "nbformat_minor": 2 371 | } 372 | -------------------------------------------------------------------------------- /Perceptron/README.md: -------------------------------------------------------------------------------- 1 | # Perceptron 2 | 3 | 4 | 5 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # statistical-learning-method 2 | Notes on *Statistical Learning Methods* (《统计学习方法》), with Python implementations of the algorithms 3 | 4 | 5 | Chapter 1 [Least Squares Method](https://github.com/wzyonggege/statistical-learning-method/blob/master/LeastSquaresMethod/least_sqaure_method.ipynb) 6 | 7 | 
Chapter 2 [Perceptron](https://github.com/wzyonggege/statistical-learning-method/blob/master/Perceptron/Iris_perceptron.ipynb) 8 | 9 | Chapter 3 [k-Nearest Neighbors](https://github.com/wzyonggege/statistical-learning-method/blob/master/KNearestNeighbors/KNN.ipynb) 10 | 11 | Chapter 4 [Naive Bayes](https://github.com/wzyonggege/statistical-learning-method/blob/master/NaiveBayes/GaussianNB.ipynb) 12 | 13 | Chapter 5 [Decision Tree](https://github.com/wzyonggege/statistical-learning-method/blob/master/DecisonTree/DT.ipynb) 14 | 15 | Chapter 6 [Logistic Regression](https://github.com/wzyonggege/statistical-learning-method/blob/master/LogisticRegression/LR.ipynb) 16 | 17 | Chapter 7 [Support Vector Machine](https://github.com/wzyonggege/statistical-learning-method/blob/master/SVM/support-vector-machine.ipynb) 18 | 19 | Chapter 8 [AdaBoost](https://github.com/wzyonggege/statistical-learning-method/blob/master/AdaBoost/Adaboost.ipynb) 20 | 21 | Chapter 9 [EM Algorithm](https://github.com/wzyonggege/statistical-learning-method/blob/master/EM/em.ipynb) -------------------------------------------------------------------------------- /SVM/.ipynb_checkpoints/support-vector-machine-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Support Vector Machine" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "----\n", 15 | "Separating hyperplane: $w^Tx+b=0$\n", 16 | "\n", 17 | "Distance from a point to the hyperplane: $r=\frac{|w^Tx+b|}{||w||_2}$\n", 18 | "\n", 19 | "where $||w||_2$ is the 2-norm: $||w||_2=\sqrt[2]{\sum^m_{i=1}w_i^2}$\n", 20 | "\n", 21 | "Taking the hyperplane as the decision boundary, the samples satisfy:\n", 22 | "\n", 23 | "$w^Tx+b\ \geq+1$ for the positive class\n", 24 | "\n", 25 | "$w^Tx+b\ \leq-1$ for the negative class\n", 26 | "\n", 27 | "#### margin:\n", 28 | "\n", 29 | "**Functional margin**: $label(w^Tx+b)\ or\ y_i(w^Tx+b)$\n", 30 | "\n", 31 | "**Geometric margin**: $r=\frac{label(w^Tx+b)}{||w||_2}$; when a sample is correctly classified, its geometric margin is exactly its distance to the hyperplane\n", 32 | "\n", 33 | "To maximize the geometric margin, the basic SVM problem reduces to solving ($\frac{r^*}{||w||}$ is the geometric margin, ${r^*}$ the functional margin):\n", 34 | "\n", 35 | "$$\max\ \frac{r^*}{||w||}$$\n", 36 | "\n", 37 | 
"$$s.t.\ y_i({w^T}x_i+{b})\geq {r^*},\ i=1,2,..,m$$\n", 38 | "\n", 39 | "This maximizes the geometric margin while classifying every point correctly. The program is not convex as written, so we first (1) convert it into a convex problem and then (2) solve the dual problem via Lagrange multipliers and the KKT conditions.\n", 40 | "\n", 41 | "(1) Conversion to a convex problem:\n", 42 | "\n", 43 | "First set ${r^*}=1$ for convenience (it is only a reference scale and does not change the optimum):\n", 44 | "\n", 45 | "$$\max\ \frac{1}{||w||}$$\n", 46 | "\n", 47 | "$$s.t.\ y_i({w^T}x_i+{b})\geq {1},\ i=1,2,..,m$$\n", 48 | "\n", 49 | "Then replace $\max\ \frac{1}{||w||}$ with the equivalent convex problem $\min\ \frac{1}{2}||w||^2$; the factor 1/2 merely simplifies the derivative.\n", 50 | "\n", 51 | "$$\min\ \frac{1}{2}||w||^2$$\n", 52 | "\n", 53 | "$$s.t.\ y_i(w^Tx_i+b)\geq 1,\ i=1,2,..,m$$\n", 54 | "\n", 55 | "(2) Solving for the optimum with Lagrange multipliers and the KKT conditions:\n", 56 | "\n", 57 | "$$\min\ \frac{1}{2}||w||^2$$\n", 58 | "\n", 59 | "$$s.t.\ -y_i(w^Tx_i+b)+1\leq 0,\ i=1,2,..,m$$\n", 60 | "\n", 61 | "Combined into the Lagrangian:\n", 62 | "\n", 63 | "$$L(w, b, \alpha) = \frac{1}{2}||w||^2+\sum^m_{i=1}\alpha_i(-y_i(w^Tx_i+b)+1)$$\n", 64 | "\n", 65 | "By weak duality: $\min\ f(x)=\min \max\ L(w, b, \alpha)\geq \max \min\ L(w, b, \alpha)$\n", 66 | "\n", 67 | "From the KKT stationarity conditions:\n", 68 | "\n", 69 | "$$\frac{\partial }{\partial w}L(w, b, \alpha)=w-\sum\alpha_iy_ix_i=0,\ w=\sum\alpha_iy_ix_i$$\n", 70 | "\n", 71 | "$$\frac{\partial }{\partial b}L(w, b, \alpha)=\sum\alpha_iy_i=0$$\n", 72 | "\n", 73 | "Substituting back into $ L(w, b, \alpha)$:\n", 74 | "\n", 75 | "$\min\ L(w, b, \alpha)=\frac{1}{2}||w||^2+\sum^m_{i=1}\alpha_i(-y_i(w^Tx_i+b)+1)$\n", 76 | "\n", 77 | "$\qquad\qquad\qquad=\frac{1}{2}w^Tw-\sum^m_{i=1}\alpha_iy_iw^Tx_i-b\sum^m_{i=1}\alpha_iy_i+\sum^m_{i=1}\alpha_i$\n", 78 | "\n", 79 | "$\qquad\qquad\qquad=\frac{1}{2}w^T\sum\alpha_iy_ix_i-\sum^m_{i=1}\alpha_iy_iw^Tx_i+\sum^m_{i=1}\alpha_i$\n", 80 | "\n", 81 | "$\qquad\qquad\qquad=\sum^m_{i=1}\alpha_i-\frac{1}{2}\sum^m_{i=1}\alpha_iy_iw^Tx_i$\n", 82 | "\n", 83 | "$\qquad\qquad\qquad=\sum^m_{i=1}\alpha_i-\frac{1}{2}\sum^m_{i,j=1}\alpha_i\alpha_jy_iy_j(x_ix_j)$\n", 84 | "\n", 85 | "Finally, turn the max problem into a min problem:\n", 86 | "\n", 87 | "$\max\ 
\\sum^m_{i=1}\\alpha_i-\\frac{1}{2}\\sum^m_{i,j=1}\\alpha_i\\alpha_jy_iy_j(x_ix_j)=\\min \\frac{1}{2}\\sum^m_{i,j=1}\\alpha_i\\alpha_jy_iy_j(x_ix_j)-\\sum^m_{i=1}\\alpha_i$\n", 88 | "\n", 89 | "$s.t.\\ \\sum^m_{i=1}\\alpha_iy_i=0,$\n", 90 | "\n", 91 | "$ \\alpha_i \\geq 0,i=1,2,...,m$\n", 92 | "\n", 93 | "以上为SVM对偶问题的对偶形式\n", 94 | "\n", 95 | "-----\n", 96 | "#### kernel\n", 97 | "\n", 98 | "在低维空间计算获得高维空间的计算结果,也就是说计算结果满足高维(满足高维,才能说明高维下线性可分)。\n", 99 | "\n", 100 | "#### soft margin & slack variable\n", 101 | "\n", 102 | "引入松弛变量$\\xi\\geq0$,对应数据点允许偏离的functional margin 的量。\n", 103 | "\n", 104 | "目标函数:$\\min\\ \\frac{1}{2}||w||^2+C\\sum\\xi_i\\qquad s.t.\\ y_i(w^Tx_i+b)\\geq1-\\xi_i$ \n", 105 | "\n", 106 | "对偶问题:\n", 107 | "\n", 108 | "$$\\max\\ \\sum^m_{i=1}\\alpha_i-\\frac{1}{2}\\sum^m_{i,j=1}\\alpha_i\\alpha_jy_iy_j(x_ix_j)=\\min \\frac{1}{2}\\sum^m_{i,j=1}\\alpha_i\\alpha_jy_iy_j(x_ix_j)-\\sum^m_{i=1}\\alpha_i$$\n", 109 | "\n", 110 | "$$s.t.\\ C\\geq\\alpha_i \\geq 0,i=1,2,...,m\\quad \\sum^m_{i=1}\\alpha_iy_i=0,$$\n", 111 | "\n", 112 | "-----\n", 113 | "\n", 114 | "#### Sequential Minimal Optimization\n", 115 | "\n", 116 | "首先定义特征到结果的输出函数:$u=w^Tx+b$.\n", 117 | "\n", 118 | "因为$w=\\sum\\alpha_iy_ix_i$\n", 119 | "\n", 120 | "有$u=\\sum y_i\\alpha_iK(x_i, x)-b$\n", 121 | "\n", 122 | "\n", 123 | "----\n", 124 | "\n", 125 | "$\\max \\sum^m_{i=1}\\alpha_i-\\frac{1}{2}\\sum^m_{i=1}\\sum^m_{j=1}\\alpha_i\\alpha_jy_iy_j<\\phi(x_i)^T,\\phi(x_j)>$\n", 126 | "\n", 127 | "$s.t.\\ \\sum^m_{i=1}\\alpha_iy_i=0,$\n", 128 | "\n", 129 | "$ \\alpha_i \\geq 0,i=1,2,...,m$\n", 130 | "\n", 131 | "-----\n", 132 | "参考资料:\n", 133 | "\n", 134 | "[1] :[Lagrange Multiplier and KKT](http://blog.csdn.net/xianlingmao/article/details/7919597)\n", 135 | "\n", 136 | "[2] :[推导SVM](https://my.oschina.net/dfsj66011/blog/517766)\n", 137 | "\n", 138 | "[3] 
:[机器学习算法实践-支持向量机(SVM)算法原理](http://pytlab.org/2017/08/15/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0%E7%AE%97%E6%B3%95%E5%AE%9E%E8%B7%B5-%E6%94%AF%E6%8C%81%E5%90%91%E9%87%8F%E6%9C%BA-SVM-%E7%AE%97%E6%B3%95%E5%8E%9F%E7%90%86/)\n", 139 | "\n", 140 | "[4] :[Python实现SVM](http://blog.csdn.net/wds2006sdo/article/details/53156589)" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": 1, 146 | "metadata": {}, 147 | "outputs": [ 148 | { 149 | "name": "stderr", 150 | "output_type": "stream", 151 | "text": [ 152 | "E:\\Anaconda3\\lib\\site-packages\\sklearn\\cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.\n", 153 | " \"This module will be removed in 0.20.\", DeprecationWarning)\n" 154 | ] 155 | } 156 | ], 157 | "source": [ 158 | "import numpy as np\n", 159 | "import pandas as pd\n", 160 | "from sklearn.datasets import load_iris\n", 161 | "from sklearn.cross_validation import train_test_split\n", 162 | "import matplotlib.pyplot as plt\n", 163 | "%matplotlib inline" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": 2, 169 | "metadata": { 170 | "collapsed": true 171 | }, 172 | "outputs": [], 173 | "source": [ 174 | "# data\n", 175 | "def create_data():\n", 176 | " iris = load_iris()\n", 177 | " df = pd.DataFrame(iris.data, columns=iris.feature_names)\n", 178 | " df['label'] = iris.target\n", 179 | " df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']\n", 180 | " data = np.array(df.iloc[:100, [0, 1, -1]])\n", 181 | " for i in range(len(data)):\n", 182 | " if data[i,-1] == 0:\n", 183 | " data[i,-1] = -1\n", 184 | " # print(data)\n", 185 | " return data[:,:2], data[:,-1]" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | 
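The `create_data` helper above loops over the rows to relabel class 0 as -1, because the SVM formulation (and the SMO updates later in this notebook) assume labels in {-1, +1}. A vectorized sketch of just that relabeling step, on made-up rows in the same (feature, feature, label) layout:

```python
import numpy as np

# hypothetical stand-in for the iris slice: two features plus a {0, 1} label
data = np.array([[5.1, 3.5, 0.0],
                 [4.9, 3.0, 0.0],
                 [7.0, 3.2, 1.0],
                 [6.4, 3.2, 1.0]])

X, y = data[:, :2], data[:, -1].copy()
y[y == 0] = -1  # SVM label convention: {-1, +1}
print(y)  # [-1. -1.  1.  1.]
```

Boolean-mask assignment replaces the explicit Python loop and scales to the full dataset without change.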
"execution_count": 9, 191 | "metadata": { 192 | "collapsed": true 193 | }, 194 | "outputs": [], 195 | "source": [ 196 | "X, y = create_data()\n", 197 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": 10, 203 | "metadata": {}, 204 | "outputs": [ 205 | { 206 | "data": { 207 | "text/plain": [ 208 | "" 209 | ] 210 | }, 211 | "execution_count": 10, 212 | "metadata": {}, 213 | "output_type": "execute_result" 214 | }, 215 | { 216 | "data": { 217 | "image/png": "(base64 PNG of the scatter plot omitted)\n", 218 | "text/plain": [ 219 | "" 220 | ] 221 | }, 222 | "metadata": {}, 223 | "output_type": "display_data" 224 | } 225 | ], 226 | "source": [ 227 | "plt.scatter(X[:50,0],X[:50,1], label='0')\n", 228 | "plt.scatter(X[50:,0],X[50:,1], label='1')\n", 229 | "plt.legend()" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 
234 | "metadata": {}, 235 | "source": [ 236 | "----\n", 237 | "\n" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": 11, 243 | "metadata": { 244 | "collapsed": true 245 | }, 246 | "outputs": [], 247 | "source": [ 248 | "class SVM:\n", 249 | "    def __init__(self, max_iter=100, kernel='linear'):\n", 250 | "        self.max_iter = max_iter\n", 251 | "        self._kernel = kernel\n", 252 | "    \n", 253 | "    def init_args(self, features, labels):\n", 254 | "        self.m, self.n = features.shape\n", 255 | "        self.X = features\n", 256 | "        self.Y = labels\n", 257 | "        self.b = 0.0\n", 258 | "        \n", 259 | "        # cache E_i for every sample in a list\n", 260 | "        self.alpha = np.ones(self.m)\n", 261 | "        self.E = [self._E(i) for i in range(self.m)]\n", 262 | "        # penalty parameter for the slack variables\n", 263 | "        self.C = 1.0\n", 264 | "    \n", 265 | "    def _KKT(self, i):\n", 266 | "        y_g = self._g(i)*self.Y[i]\n", 267 | "        if self.alpha[i] == 0:\n", 268 | "            return y_g >= 1\n", 269 | "        elif 0 < self.alpha[i] < self.C:\n", 270 | "            return y_g == 1\n", 271 | "        else:\n", 272 | "            return y_g <= 1\n", 273 | "    \n", 274 | "    # g(x): decision value for sample X[i]\n", 275 | "    def _g(self, i):\n", 276 | "        r = self.b\n", 277 | "        for j in range(self.m):\n", 278 | "            r += self.alpha[j]*self.Y[j]*self.kernel(self.X[i], self.X[j])\n", 279 | "        return r\n", 280 | "    \n", 281 | "    # kernel function\n", 282 | "    def kernel(self, x1, x2):\n", 283 | "        if self._kernel == 'linear':\n", 284 | "            return sum([x1[k]*x2[k] for k in range(self.n)])\n", 285 | "        elif self._kernel == 'poly':\n", 286 | "            return (sum([x1[k]*x2[k] for k in range(self.n)]) + 1)**2\n", 287 | "    \n", 288 | "        return 0\n", 289 | "    \n", 290 | "    # E(i): difference between the prediction g(i) and the label y_i\n", 291 | "    def _E(self, i):\n", 292 | "        return self._g(i) - self.Y[i]\n", 293 | "    \n", 294 | "    def _init_alpha(self):\n", 295 | "        # outer loop: first scan the samples with 0 < alpha < C and check the KKT conditions\n", 296 | "        index_list = [i for i in range(self.m) if 0 < self.alpha[i] < self.C]\n", 297 | "        # otherwise fall back to the whole training set\n", 298 | "        non_satisfy_list = [i for i in range(self.m) if i not in index_list]\n", 299 | "        index_list.extend(non_satisfy_list)\n", 300 | "        \n", 301 | "        for i in index_list:\n", 302 | "            if self._KKT(i):\n", 303 | "                continue\n", 304 | "            \n", 305 | "            E1 = self.E[i]\n", 306 | "            # if E1 is positive, pick the sample with minimal E as the second variable; if negative, the maximal\n", 307 | "            if E1 >= 0:\n", 308 | "                j = min(range(self.m), key=lambda x: self.E[x])\n", 309 | "            else:\n", 310 | "                j = max(range(self.m), key=lambda x: self.E[x])\n", 311 | "            return i, j\n", 312 | "    \n", 313 | "    def _compare(self, _alpha, L, H):\n", 
314 | "        if _alpha > H:\n", 315 | "            return H\n", 316 | "        elif _alpha < L:\n", 317 | "            return L\n", 318 | "        else:\n", 319 | "            return _alpha\n", 320 | "    \n", 321 | "    def fit(self, features, labels):\n", 322 | "        self.init_args(features, labels)\n", 323 | "        \n", 324 | "        for t in range(self.max_iter):\n", 325 | "            # train\n", 326 | "            i1, i2 = self._init_alpha()\n", 327 | "            \n", 328 | "            # bounds on the new alpha2 from the box and equality constraints\n", 329 | "            if self.Y[i1] == self.Y[i2]:\n", 330 | "                L = max(0, self.alpha[i1]+self.alpha[i2]-self.C)\n", 331 | "                H = min(self.C, self.alpha[i1]+self.alpha[i2])\n", 332 | "            else:\n", 333 | "                L = max(0, self.alpha[i2]-self.alpha[i1])\n", 334 | "                H = min(self.C, self.C+self.alpha[i2]-self.alpha[i1])\n", 335 | "            \n", 336 | "            E1 = self.E[i1]\n", 337 | "            E2 = self.E[i2]\n", 338 | "            # eta=K11+K22-2K12\n", 339 | "            eta = self.kernel(self.X[i1], self.X[i1]) + self.kernel(self.X[i2], self.X[i2]) - 2*self.kernel(self.X[i1], self.X[i2])\n", 340 | "            if eta <= 0:\n", 341 | "                # print('eta <= 0')\n", 342 | "                continue\n", 343 | "            \n", 344 | "            alpha2_new_unc = self.alpha[i2] + self.Y[i2] * (E2 - E1) / eta\n", 345 | "            alpha2_new = self._compare(alpha2_new_unc, L, H)\n", 346 | "            \n", 347 | "            alpha1_new = self.alpha[i1] + self.Y[i1] * self.Y[i2] * (self.alpha[i2] - alpha2_new)\n", 348 | "            \n", 349 | "            b1_new = -E1 - self.Y[i1] * self.kernel(self.X[i1], self.X[i1]) * (alpha1_new-self.alpha[i1]) - self.Y[i2] * self.kernel(self.X[i2], self.X[i1]) * (alpha2_new-self.alpha[i2])+ self.b \n", 350 | "            b2_new = -E2 - self.Y[i1] * self.kernel(self.X[i1], self.X[i2]) * (alpha1_new-self.alpha[i1]) - self.Y[i2] * self.kernel(self.X[i2], self.X[i2]) * (alpha2_new-self.alpha[i2])+ self.b \n", 351 | "            \n", 352 | "            if 0 < alpha1_new < self.C:\n", 353 | "                b_new = b1_new\n", 354 | "            elif 0 < alpha2_new < self.C:\n", 355 | "                b_new = b2_new\n", 356 | "            else:\n", 357 | "                # otherwise take the midpoint\n", 358 | "                b_new = (b1_new + b2_new) / 2\n", 359 | "            \n", 360 | "            # update parameters\n", 361 | "            self.alpha[i1] = alpha1_new\n", 362 | "            self.alpha[i2] = 
alpha2_new\n", 363 | " self.b = b_new\n", 364 | " \n", 365 | " self.E[i1] = self._E(i1)\n", 366 | " self.E[i2] = self._E(i2)\n", 367 | " return 'train done!'\n", 368 | " \n", 369 | " def predict(self, data):\n", 370 | " r = self.b\n", 371 | " for i in range(self.m):\n", 372 | " r += self.alpha[i] * self.Y[i] * self.kernel(data, self.X[i])\n", 373 | " \n", 374 | " return 1 if r > 0 else -1\n", 375 | " \n", 376 | " def score(self, X_test, y_test):\n", 377 | " right_count = 0\n", 378 | " for i in range(len(X_test)):\n", 379 | " result = self.predict(X_test[i])\n", 380 | " if result == y_test[i]:\n", 381 | " right_count += 1\n", 382 | " return right_count / len(X_test)\n", 383 | " \n", 384 | " def _weight(self):\n", 385 | " # linear model\n", 386 | " yx = self.Y.reshape(-1, 1)*self.X\n", 387 | " self.w = np.dot(yx.T, self.alpha)\n", 388 | " return self.w" 389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": 12, 394 | "metadata": { 395 | "collapsed": true 396 | }, 397 | "outputs": [], 398 | "source": [ 399 | "svm = SVM(max_iter=200)" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": 13, 405 | "metadata": {}, 406 | "outputs": [ 407 | { 408 | "data": { 409 | "text/plain": [ 410 | "'train done!'" 411 | ] 412 | }, 413 | "execution_count": 13, 414 | "metadata": {}, 415 | "output_type": "execute_result" 416 | } 417 | ], 418 | "source": [ 419 | "svm.fit(X_train, y_train)" 420 | ] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": 14, 425 | "metadata": {}, 426 | "outputs": [ 427 | { 428 | "data": { 429 | "text/plain": [ 430 | "0.96" 431 | ] 432 | }, 433 | "execution_count": 14, 434 | "metadata": {}, 435 | "output_type": "execute_result" 436 | } 437 | ], 438 | "source": [ 439 | "svm.score(X_test, y_test)" 440 | ] 441 | }, 442 | { 443 | "cell_type": "markdown", 444 | "metadata": {}, 445 | "source": [ 446 | "## sklearn.svm.SVC" 447 | ] 448 | }, 449 | { 450 | "cell_type": "code", 451 | "execution_count": 15, 452 | 
"metadata": {}, 453 | "outputs": [ 454 | { 455 | "data": { 456 | "text/plain": [ 457 | "SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,\n", 458 | " decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',\n", 459 | " max_iter=-1, probability=False, random_state=None, shrinking=True,\n", 460 | " tol=0.001, verbose=False)" 461 | ] 462 | }, 463 | "execution_count": 15, 464 | "metadata": {}, 465 | "output_type": "execute_result" 466 | } 467 | ], 468 | "source": [ 469 | "from sklearn.svm import SVC\n", 470 | "clf = SVC()\n", 471 | "clf.fit(X_train, y_train)" 472 | ] 473 | }, 474 | { 475 | "cell_type": "code", 476 | "execution_count": 16, 477 | "metadata": {}, 478 | "outputs": [ 479 | { 480 | "data": { 481 | "text/plain": [ 482 | "1.0" 483 | ] 484 | }, 485 | "execution_count": 16, 486 | "metadata": {}, 487 | "output_type": "execute_result" 488 | } 489 | ], 490 | "source": [ 491 | "clf.score(X_test, y_test)" 492 | ] 493 | }, 494 | { 495 | "cell_type": "markdown", 496 | "metadata": {}, 497 | "source": [ 498 | "### sklearn.svm.SVC\n", 499 | "\n", 500 | "*(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False,tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None,random_state=None)*\n", 501 | "\n", 502 | "参数:\n", 503 | "\n", 504 | "- C:C-SVC的惩罚参数C?默认值是1.0\n", 505 | "\n", 506 | "C越大,相当于惩罚松弛变量,希望松弛变量接近0,即对误分类的惩罚增大,趋向于对训练集全分对的情况,这样对训练集测试时准确率很高,但泛化能力弱。C值小,对误分类的惩罚减小,允许容错,将他们当成噪声点,泛化能力较强。\n", 507 | "\n", 508 | "- kernel :核函数,默认是rbf,可以是‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ \n", 509 | " \n", 510 | " – 线性:u'v\n", 511 | " \n", 512 | " – 多项式:(gamma*u'*v + coef0)^degree\n", 513 | "\n", 514 | " – RBF函数:exp(-gamma|u-v|^2)\n", 515 | "\n", 516 | " – sigmoid:tanh(gamma*u'*v + coef0)\n", 517 | "\n", 518 | "\n", 519 | "- degree :多项式poly函数的维度,默认是3,选择其他核函数时会被忽略。\n", 520 | "\n", 521 | "\n", 522 | "- gamma : ‘rbf’,‘poly’ 和‘sigmoid’的核函数参数。默认是’auto’,则会选择1/n_features\n", 523 | "\n", 524 
| "\n", 525 | "- coef0 :核函数的常数项。对于‘poly’和 ‘sigmoid’有用。\n", 526 | "\n", 527 | "\n", 528 | "- probability :是否采用概率估计?.默认为False\n", 529 | "\n", 530 | "\n", 531 | "- shrinking :是否采用shrinking heuristic方法,默认为true\n", 532 | "\n", 533 | "\n", 534 | "- tol :停止训练的误差值大小,默认为1e-3\n", 535 | "\n", 536 | "\n", 537 | "- cache_size :核函数cache缓存大小,默认为200\n", 538 | "\n", 539 | "\n", 540 | "- class_weight :类别的权重,字典形式传递。设置第几类的参数C为weight*C(C-SVC中的C)\n", 541 | "\n", 542 | "\n", 543 | "- verbose :允许冗余输出?\n", 544 | "\n", 545 | "\n", 546 | "- max_iter :最大迭代次数。-1为无限制。\n", 547 | "\n", 548 | "\n", 549 | "- decision_function_shape :‘ovo’, ‘ovr’ or None, default=None3\n", 550 | "\n", 551 | "\n", 552 | "- random_state :数据洗牌时的种子值,int值\n", 553 | "\n", 554 | "\n", 555 | "主要调节的参数有:C、kernel、degree、gamma、coef0。" 556 | ] 557 | }, 558 | { 559 | "cell_type": "code", 560 | "execution_count": null, 561 | "metadata": { 562 | "collapsed": true 563 | }, 564 | "outputs": [], 565 | "source": [] 566 | } 567 | ], 568 | "metadata": { 569 | "kernelspec": { 570 | "display_name": "Python 3", 571 | "language": "python", 572 | "name": "python3" 573 | }, 574 | "language_info": { 575 | "codemirror_mode": { 576 | "name": "ipython", 577 | "version": 3 578 | }, 579 | "file_extension": ".py", 580 | "mimetype": "text/x-python", 581 | "name": "python", 582 | "nbconvert_exporter": "python", 583 | "pygments_lexer": "ipython3", 584 | "version": "3.6.1" 585 | } 586 | }, 587 | "nbformat": 4, 588 | "nbformat_minor": 2 589 | } 590 | -------------------------------------------------------------------------------- /SVM/support-vector-machine.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 支持向量机" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "----\n", 15 | "分离超平面:$w^Tx+b=0$\n", 16 | "\n", 17 | "点到直线距离:$r=\\frac{|w^Tx+b|}{||w||_2}$\n", 18 | "\n", 19 | 
"$||w||_2$为2-范数:$||w||_2=\sqrt[2]{\sum^m_{i=1}w_i^2}$\n", 20 | "\n", 21 | "直线为超平面,样本可表示为:\n", 22 | "\n", 23 | "$w^Tx+b\ \geq+1$\n", 24 | "\n", 25 | "$w^Tx+b\ \leq-1$\n", 26 | "\n", 27 | "#### margin:\n", 28 | "\n", 29 | "**函数间隔**:$label(w^Tx+b)\ or\ y_i(w^Tx+b)$\n", 30 | "\n", 31 | "**几何间隔**:$r=\frac{label(w^Tx+b)}{||w||_2}$,当数据被正确分类时,几何间隔就是点到超平面的距离\n", 32 | "\n", 33 | "为了求几何间隔最大,SVM基本问题可以转化为求解:($\frac{r^*}{||w||}$为几何间隔,${r^*}$为函数间隔)\n", 34 | "\n", 35 | "$$\max\ \frac{r^*}{||w||}$$\n", 36 | "\n", 37 | "$$(subject\ to)\ y_i({w^T}x_i+{b})\geq {r^*},\ i=1,2,..,m$$\n", 38 | "\n", 39 | "分类点几何间隔最大,同时被正确分类。但这个方程并非凸函数求解,所以要先①将方程转化为凸函数,②用拉格朗日乘子法和KKT条件求解对偶问题。\n", 40 | "\n", 41 | "①转化为凸函数:\n", 42 | "\n", 43 | "先令${r^*}=1$,方便计算(参照衡量,不影响评价结果)\n", 44 | "\n", 45 | "$$\max\ \frac{1}{||w||}$$\n", 46 | "\n", 47 | "$$s.t.\ y_i({w^T}x_i+{b})\geq {1},\ i=1,2,..,m$$\n", 48 | "\n", 49 | "再将$\max\ \frac{1}{||w||}$转化成$\min\ \frac{1}{2}||w||^2$求解凸函数,1/2是为了求导之后方便计算。\n", 50 | "\n", 51 | "$$\min\ \frac{1}{2}||w||^2$$\n", 52 | "\n", 53 | "$$s.t.\ y_i(w^Tx_i+b)\geq 1,\ i=1,2,..,m$$\n", 54 | "\n", 55 | "②用拉格朗日乘子法和KKT条件求解最优值:\n", 56 | "\n", 57 | "$$\min\ \frac{1}{2}||w||^2$$\n", 58 | "\n", 59 | "$$s.t.\ -y_i(w^Tx_i+b)+1\leq 0,\ i=1,2,..,m$$\n", 60 | "\n", 61 | "整合成:\n", 62 | "\n", 63 | "$$L(w, b, \alpha) = \frac{1}{2}||w||^2+\sum^m_{i=1}\alpha_i(-y_i(w^Tx_i+b)+1)$$\n", 64 | "\n", 65 | "推导:$\min\ f(x)=\min \max\ L(w, b, \alpha)\geq \max \min\ L(w, b, \alpha)$\n", 66 | "\n", 67 | "根据KKT条件:\n", 68 | "\n", 69 | "$$\frac{\partial }{\partial w}L(w, b, \alpha)=w-\sum\alpha_iy_ix_i=0,\ w=\sum\alpha_iy_ix_i$$\n", 70 | "\n", 71 | "$$\frac{\partial }{\partial b}L(w, b, \alpha)=\sum\alpha_iy_i=0$$\n", 72 | "\n", 73 | "代入$ L(w, b, \alpha)$\n", 74 | "\n", 75 | "$\min\ L(w, b, \alpha)=\frac{1}{2}||w||^2+\sum^m_{i=1}\alpha_i(-y_i(w^Tx_i+b)+1)$\n", 76 | "\n", 77 |
"$\\qquad\\qquad\\qquad=\\frac{1}{2}w^Tw-\\sum^m_{i=1}\\alpha_iy_iw^Tx_i-b\\sum^m_{i=1}\\alpha_iy_i+\\sum^m_{i=1}\\alpha_i$\n", 78 | "\n", 79 | "$\\qquad\\qquad\\qquad=\\frac{1}{2}w^T\\sum\\alpha_iy_ix_i-\\sum^m_{i=1}\\alpha_iy_iw^Tx_i+\\sum^m_{i=1}\\alpha_i$\n", 80 | "\n", 81 | "$\\qquad\\qquad\\qquad=\\sum^m_{i=1}\\alpha_i-\\frac{1}{2}\\sum^m_{i=1}\\alpha_iy_iw^Tx_i$\n", 82 | "\n", 83 | "$\\qquad\\qquad\\qquad=\\sum^m_{i=1}\\alpha_i-\\frac{1}{2}\\sum^m_{i,j=1}\\alpha_i\\alpha_jy_iy_j(x_ix_j)$\n", 84 | "\n", 85 | "再把max问题转成min问题:\n", 86 | "\n", 87 | "$\\max\\ \\sum^m_{i=1}\\alpha_i-\\frac{1}{2}\\sum^m_{i,j=1}\\alpha_i\\alpha_jy_iy_j(x_ix_j)=\\min \\frac{1}{2}\\sum^m_{i,j=1}\\alpha_i\\alpha_jy_iy_j(x_ix_j)-\\sum^m_{i=1}\\alpha_i$\n", 88 | "\n", 89 | "$s.t.\\ \\sum^m_{i=1}\\alpha_iy_i=0,$\n", 90 | "\n", 91 | "$ \\alpha_i \\geq 0,i=1,2,...,m$\n", 92 | "\n", 93 | "以上为SVM对偶问题的对偶形式\n", 94 | "\n", 95 | "-----\n", 96 | "#### kernel\n", 97 | "\n", 98 | "在低维空间计算获得高维空间的计算结果,也就是说计算结果满足高维(满足高维,才能说明高维下线性可分)。\n", 99 | "\n", 100 | "#### soft margin & slack variable\n", 101 | "\n", 102 | "引入松弛变量$\\xi\\geq0$,对应数据点允许偏离的functional margin 的量。\n", 103 | "\n", 104 | "目标函数:$\\min\\ \\frac{1}{2}||w||^2+C\\sum\\xi_i\\qquad s.t.\\ y_i(w^Tx_i+b)\\geq1-\\xi_i$ \n", 105 | "\n", 106 | "对偶问题:\n", 107 | "\n", 108 | "$$\\max\\ \\sum^m_{i=1}\\alpha_i-\\frac{1}{2}\\sum^m_{i,j=1}\\alpha_i\\alpha_jy_iy_j(x_ix_j)=\\min \\frac{1}{2}\\sum^m_{i,j=1}\\alpha_i\\alpha_jy_iy_j(x_ix_j)-\\sum^m_{i=1}\\alpha_i$$\n", 109 | "\n", 110 | "$$s.t.\\ C\\geq\\alpha_i \\geq 0,i=1,2,...,m\\quad \\sum^m_{i=1}\\alpha_iy_i=0,$$\n", 111 | "\n", 112 | "-----\n", 113 | "\n", 114 | "#### Sequential Minimal Optimization\n", 115 | "\n", 116 | "首先定义特征到结果的输出函数:$u=w^Tx+b$.\n", 117 | "\n", 118 | "因为$w=\\sum\\alpha_iy_ix_i$\n", 119 | "\n", 120 | "有$u=\\sum y_i\\alpha_iK(x_i, x)-b$\n", 121 | "\n", 122 | "\n", 123 | "----\n", 124 | "\n", 125 | "$\\max 
\\sum^m_{i=1}\\alpha_i-\\frac{1}{2}\\sum^m_{i=1}\\sum^m_{j=1}\\alpha_i\\alpha_jy_iy_j<\\phi(x_i)^T,\\phi(x_j)>$\n", 126 | "\n", 127 | "$s.t.\\ \\sum^m_{i=1}\\alpha_iy_i=0,$\n", 128 | "\n", 129 | "$ \\alpha_i \\geq 0,i=1,2,...,m$\n", 130 | "\n", 131 | "-----\n", 132 | "参考资料:\n", 133 | "\n", 134 | "[1] :[Lagrange Multiplier and KKT](http://blog.csdn.net/xianlingmao/article/details/7919597)\n", 135 | "\n", 136 | "[2] :[推导SVM](https://my.oschina.net/dfsj66011/blog/517766)\n", 137 | "\n", 138 | "[3] :[机器学习算法实践-支持向量机(SVM)算法原理](http://pytlab.org/2017/08/15/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0%E7%AE%97%E6%B3%95%E5%AE%9E%E8%B7%B5-%E6%94%AF%E6%8C%81%E5%90%91%E9%87%8F%E6%9C%BA-SVM-%E7%AE%97%E6%B3%95%E5%8E%9F%E7%90%86/)\n", 139 | "\n", 140 | "[4] :[Python实现SVM](http://blog.csdn.net/wds2006sdo/article/details/53156589)" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": 1, 146 | "metadata": {}, 147 | "outputs": [ 148 | { 149 | "name": "stderr", 150 | "output_type": "stream", 151 | "text": [ 152 | "E:\\Anaconda3\\lib\\site-packages\\sklearn\\cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. 
This module will be removed in 0.20.\n", 153 | " \"This module will be removed in 0.20.\", DeprecationWarning)\n" 154 | ] 155 | } 156 | ], 157 | "source": [ 158 | "import numpy as np\n", 159 | "import pandas as pd\n", 160 | "from sklearn.datasets import load_iris\n", 161 | "from sklearn.cross_validation import train_test_split\n", 162 | "import matplotlib.pyplot as plt\n", 163 | "%matplotlib inline" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": 2, 169 | "metadata": { 170 | "collapsed": true 171 | }, 172 | "outputs": [], 173 | "source": [ 174 | "# data\n", 175 | "def create_data():\n", 176 | " iris = load_iris()\n", 177 | " df = pd.DataFrame(iris.data, columns=iris.feature_names)\n", 178 | " df['label'] = iris.target\n", 179 | " df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']\n", 180 | " data = np.array(df.iloc[:100, [0, 1, -1]])\n", 181 | " for i in range(len(data)):\n", 182 | " if data[i,-1] == 0:\n", 183 | " data[i,-1] = -1\n", 184 | " # print(data)\n", 185 | " return data[:,:2], data[:,-1]" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": 9, 191 | "metadata": { 192 | "collapsed": true 193 | }, 194 | "outputs": [], 195 | "source": [ 196 | "X, y = create_data()\n", 197 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": 10, 203 | "metadata": {}, 204 | "outputs": [ 205 | { 206 | "data": { 207 | "text/plain": [ 208 | "" 209 | ] 210 | }, 211 | "execution_count": 10, 212 | "metadata": {}, 213 | "output_type": "execute_result" 214 | }, 215 | { 216 | "data": { 217 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGZ9JREFUeJzt3X9sHOWdx/H394yv8bWAReMWsJNLCihqSSLSugTICXGg\nXkqaQoRQlAiKQhE5ELpS0aNqKtQfqBJISLRQdEQBdBTBBeVoGihHgjgoKkUklZMg5y4pKhxtY8MV\nE5TQHKYE93t/7Dqx12vvzu6O93me/bwky97Zyfj7zMA3m5nPPGPujoiIpOWvml2AiIg0npq7iEiC\n1NxFRBKk5i4ikiA1dxGRBKm5i4gkSM1dRCRBau4iIglScxcRSdBx1a5oZm1AHzDo7stL3rsAeBx4\nvbhos7vfOtX2Zs6c6XPmzMlUrIhIq9u5c+fb7t5Vab2qmztwI7APOGGS918obfpTmTNnDn19fRl+\nvYiImNnvq1mvqtMyZtYDfAm4v56iRERkelR7zv1HwDeBv0yxznlm1m9mW83szHIrmNlaM+szs76h\noaGstYqISJUqNnczWw685e47p1htFzDb3RcCPwa2lFvJ3Te4e6+793Z1VTxlJCIiNarmnPsS4BIz\nWwbMAE4ws4fd/crRFdz93TE/P2Vm/2JmM9397caXLCJSnyNHjjAwMMD777/f7FImNWPGDHp6emhv\nb6/pz1ds7u6+DlgHR1Mx/zy2sReXnwz80d3dzM6m8C+CAzVVJCKSs4GBAY4//njmzJmDmTW7nAnc\nnQMHDjAwMMDcuXNr2kaWtMw4ZnZdsYj1wOXA9Wb2ITAMrHI9BUREAvX+++8H29gBzIyPf/zj1HNt\nMlNzd/fngeeLP68fs/we4J6aqxAJ2Jbdg9zx9Cu8cXCYUzs7uHnpPFYs6m52WVKnUBv7qHrrq/mT\nu0gr2LJ7kHWb9zB8ZASAwYPDrNu8B0ANXoKm6QdEpnDH068cbeyjho+McMfTrzSpIknFtm3bmDdv\nHqeffjq33357w7ev5i4yhTcODmdaLlKNkZERbrjhBrZu3crevXvZuHEje/fubejv0GkZkSmc2tnB\nYJlGfmpnRxOqkWZp9HWXX//615x++ul86lOfAmDVqlU8/vjjfOYzn2lUyfrkLjKVm5fOo6O9bdyy\njvY2bl46r0kVyXQbve4yeHAY59h1ly27B2ve5uDgILNmzTr6uqenh8HB2rdXjpq7yBRWLOrmtssW\n0N3ZgQHdnR3cdtkCXUxtIbFed9FpGZEKVizqVjNvYXlcd+nu7mb//v1HXw8MDNDd3dj/xvTJXURk\nCpNdX6nnusvnP/95fvvb3/L666/zwQcf8Oijj3LJJZfUvL1y1NxFRKaQx3WX4447jnvuuYelS5fy\n6U9/mpUrV3LmmWUn0639dzR0ayIiiRk9Jdfou5SXLVvGsmXLGlFiWWruIiIVxHjdRadlREQSpOYu\nIpIgNXcRkQSpuYuIJEjNXUQkQWrukowtuwdZcvtzzP3Wf7Dk9ufqmvtDJG9f/epX+cQnPsH8+fNz\n2b6auyQhj8mdRPK0Zs0atm3bltv21dwlCbFO7iSR6N8EP5wP3+ssfO/fVPcmzz//fE466aQGFFee\nbmKSJOihGpKb/k3w86/BkeJ/S4f2F14DLFzZvLoq0Cd3SUIekzuJAPDsrcca+6gjw4XlAVNzlyTo\noRqSm0MD2ZYHQqdlJAl5Te4kwok9hVMx5ZYHTM1dkhHj5E4SgYu+M/6cO0B7R2F5HVavXs3zzz/P\n22+/TU9PD9///ve55ppr6iz2GDV3qVujHx4sEpTRi6bP3lo4FXNiT6Gx13kxdePGjQ0obnJq7lKX\n0Xz5aAxxNF8OqMFLOhauDDoZU44uqEpdlC8XCZOau9RF+XKJlbs3u4Qp1VufmrvURflyidGMGTM4\ncOBAsA3e3Tlw4AAzZsyoeRs65y51uXnpvHHn3EH5cglfT08PA
wMDDA0NNbuUSc2YMYOentrjlmru\nUhflyyVG7e3tzJ07t9ll5Krq5m5mbUAfMOjuy0veM+AuYBnwHrDG3Xc1slAJl/LlIuHJ8sn9RmAf\ncEKZ9y4Gzih+LQbuLX4XaSnK/EsoqrqgamY9wJeA+ydZ5VLgIS/YDnSa2SkNqlEkCppTXkJSbVrm\nR8A3gb9M8n43MHbyhYHiMpGWocy/hKRiczez5cBb7r6z3l9mZmvNrM/M+kK+Si1SC2X+JSTVfHJf\nAlxiZr8DHgUuNLOHS9YZBGaNed1TXDaOu29w91537+3q6qqxZJEwKfMvIanY3N19nbv3uPscYBXw\nnLtfWbLaE8BVVnAOcMjd32x8uSLh0pzyEpKac+5mdh2Au68HnqIQg3yVQhTy6oZUJxIRZf4lJNas\n2297e3u9r6+vKb9bRCRWZrbT3Xsrrac7VCVYt2zZw8Yd+xlxp82M1Ytn8YMVC5pdlkgU1NwlSLds\n2cPD2/9w9PWI+9HXavAilWlWSAnSxh1lnlk5xXIRGU/NXYI0Msm1oMmWi8h4au4SpDazTMtFZDw1\ndwnS6sWzMi0XkfF0QVWCNHrRVGkZkdoo5y4iEhHl3KUuV9z3Ei++9s7R10tOO4lHrj23iRU1j+Zo\nlxjpnLtMUNrYAV587R2uuO+lJlXUPJqjXWKl5i4TlDb2SstTpjnaJVZq7iJT0BztEis1d5EpaI52\niZWau0yw5LSTMi1PmeZol1ipucsEj1x77oRG3qppmRWLurntsgV0d3ZgQHdnB7ddtkBpGQmecu4i\nIhFRzl3qkle2O8t2lS8XqZ2au0wwmu0ejQCOZruBupprlu3mVYNIq9A5d5kgr2x3lu0qXy5SHzV3\nmSCvbHeW7SpfLlIfNXeZIK9sd5btKl8uUh81d5kgr2x3lu0qXy5SH11QlQlGL1g2OqmSZbt51SDS\nKpRzFxGJiHLuOYsxgx1jzSJSGzX3GsSYwY6xZhGpnS6o1iDGDHaMNYtI7dTcaxBjBjvGmkWkdmru\nNYgxgx1jzSJSOzX3GsSYwY6xZhGpnS6o1iDGDHaMNYtI7Srm3M1sBvBL4CMU/jJ4zN2/W7LOBcDj\nwOvFRZvd/daptqucu4hIdo3Muf8ZuNDdD5tZO/ArM9vq7ttL1nvB3ZfXUqxMj1u27GHjjv2MuNNm\nxurFs/jBigV1rxtKfj6UOkRCULG5e+Gj/eHiy/biV3Nua5Wa3bJlDw9v/8PR1yPuR1+XNu0s64aS\nnw+lDpFQVHVB1czazOxl4C3gGXffUWa188ys38y2mtmZDa1S6rZxx/6ql2dZN5T8fCh1iISiqubu\n7iPufhbQA5xtZvNLVtkFzHb3hcCPgS3ltmNma82sz8z6hoaG6qlbMhqZ5NpKueVZ1g0lPx9KHSKh\nyBSFdPeDwC+AL5Ysf9fdDxd/fgpoN7OZZf78Bnfvdfferq6uOsqWrNrMql6eZd1Q8vOh1CESiorN\n3cy6zKyz+HMH8AXgNyXrnGxW+D/fzM4ubvdA48uVWq1ePKvq5VnWDSU/H0odIqGoJi1zCvATM2uj\n0LQ3ufuTZnYdgLuvBy4HrjezD4FhYJU3ay5hKWv0Qmg1CZgs64aSnw+lDpFQaD53EZGIaD73nOWV\nqc6SL89z21nGF+O+iE7/Jnj2Vjg0ACf2wEXfgYUrm12VBEzNvQZ5Zaqz5Mvz3HaW8cW4L6LTvwl+\n/jU4Ukz+HNpfeA1q8DIpTRxWg7wy1Vny5XluO8v4YtwX0Xn21mONfdSR4cJykUmoudcgr0x1lnx5\nntvOMr4Y90V0Dg1kWy6CmntN8spUZ8mX57ntLOOLcV9E58SebMtFUHOvSV6Z6iz58jy3nWV8Me6L\n6Fz0HWgv+cuyvaOwXGQSuqBag7wy1Vny5XluO8v4YtwX0Rm9aKq0jGSgnLuISESUc5cJQsiuS+SU\nt4+GmnuLCCG7LpFT3j4qu
qDaIkLIrkvklLePipp7iwghuy6RU94+KmruLSKE7LpETnn7qKi5t4gQ\nsusSOeXto6ILqi0ihOy6RE55+6go5y4iEhHl3Ivyymtn2W4o85Irux6Y1DPjqY8viybsi6Sbe155\n7SzbDWVecmXXA5N6Zjz18WXRpH2R9AXVvPLaWbYbyrzkyq4HJvXMeOrjy6JJ+yLp5p5XXjvLdkOZ\nl1zZ9cCknhlPfXxZNGlfJN3c88prZ9luKPOSK7semNQz46mPL4sm7Yukm3teee0s2w1lXnJl1wOT\nemY89fFl0aR9kfQF1bzy2lm2G8q85MquByb1zHjq48uiSftCOXcRkYgo556zEPLzV9z3Ei++9s7R\n10tOO4lHrj237hpEkvLkTbDzQfARsDb43BpYfmf92w08x5/0Ofe8jGbGBw8O4xzLjG/ZPTht2y1t\n7AAvvvYOV9z3Ul01iCTlyZug74FCY4fC974HCsvrMZpdP7Qf8GPZ9f5NdZfcKGruNQghP1/a2Cst\nF2lJOx/MtrxaEeT41dxrEEJ+XkSq4CPZllcrghy/mnsNQsjPi0gVrC3b8mpFkONXc69BCPn5Jaed\nVHYbky0XaUmfW5NtebUiyPGruddgxaJubrtsAd2dHRjQ3dnBbZctaEh+vtrtPnLtuRMaudIyIiWW\n3wm91xz7pG5thdf1pmUWroQv3w0nzgKs8P3LdweVllHOXUQkIg3LuZvZDOCXwEeK6z/m7t8tWceA\nu4BlwHvAGnffVUvhlWTNl8c2h3mWud9T3xe55oizZJ/zqiPP8QWewa5L1rGlvC+mUM1NTH8GLnT3\nw2bWDvzKzLa6+/Yx61wMnFH8WgzcW/zeUFnnJI9tDvMsc7+nvi9ynQN7NPs8ajT7DBMbfF515Dm+\nlOdSzzq2lPdFBRXPuXvB4eLL9uJX6bmcS4GHiutuBzrN7JTGlpo9Xx7bHOZZ5n5PfV/kmiPOkn3O\nq448xxdBBrtmWceW8r6ooKoLqmbWZmYvA28Bz7j7jpJVuoGxHWiguKx0O2vNrM/M+oaGhjIXmzUH\nHltuPMvc76nvi1xzxFmyz3nVkef4Ishg1yzr2FLeFxVU1dzdfcTdzwJ6gLPNbH4tv8zdN7h7r7v3\ndnV1Zf7zWXPgseXGs8z9nvq+yDVHnCX7nFcdeY4vggx2zbKOLeV9UUGmKKS7HwR+AXyx5K1BYOwE\n5T3FZQ2VNV8e2xzmWeZ+T31f5JojzpJ9zquOPMcXQQa7ZlnHlvK+qKBiczezLjPrLP7cAXwB+E3J\nak8AV1nBOcAhd3+z0cVmzZfnlUfPyw9WLODKc2Yf/aTeZsaV58wum5ZJfV/kmiPOkn3Oq448xxdB\nBrtmWceW8r6ooGLO3cwWAj8B2ij8ZbDJ3W81s+sA3H19MQp5D4VP9O8BV7v7lCF25dxFRLJrWM7d\n3fuBRWWWrx/zswM3ZC1SRETykfzDOqK7cUemR5YbW0K4CSbPG3diu0krhOMRgaSbe3Q37sj0yHJj\nSwg3weR5405sN2mFcDwikfTEYdHduCPTI8uNLSHcBJPnjTux3aQVwvGIRNLNPbobd2R6ZLmxJYSb\nYPK8cSe2m7RCOB6RSLq5R3fjjkyPLDe2hHATTJ437sR2k1YIxyMSSTf36G7ckemR5caWEG6CyfPG\nndhu0grheEQi6eYe3Y07Mj2y3NgSwk0wed64E9tNWiEcj0joYR0iIhFp2E1MIi0vy4M9QhFbzaFk\n10OpowHU3EWmkuXBHqGIreZQsuuh1NEgSZ9zF6lblgd7hCK2mkPJrodSR4OouYtMJcuDPUIRW82h\nZNdDqaNB1NxFppLlwR6hiK3mULLrodTRIGruIlPJ8mCPUMRWcyjZ9VDqaBA1d5GpZHmwRyhiqzmU\n7HoodTSIcu4iIhFRzl2mT4zZ4LxqzitfHuM+lqZSc5f6xJgNzqvmvPLlMe5jaTqdc5f6xJg
Nzqvm\nvPLlMe5jaTo1d6lPjNngvGrOK18e4z6WplNzl/rEmA3Oq+a88uUx7mNpOjV3qU+M2eC8as4rXx7j\nPpamU3OX+sSYDc6r5rzy5THuY2k65dxFRCJSbc5dn9wlHf2b4Ifz4Xudhe/9m6Z/u3nVIJKRcu6S\nhryy4Fm2qzy6BESf3CUNeWXBs2xXeXQJiJq7pCGvLHiW7SqPLgFRc5c05JUFz7Jd5dElIGrukoa8\nsuBZtqs8ugREzV3SkFcWPMt2lUeXgFTMuZvZLOAh4JOAAxvc/a6SdS4AHgdeLy7a7O5TXkVSzl1E\nJLtGzuf+IfANd99lZscDO83sGXffW7LeC+6+vJZiJUAxzh+epeYYxxcC7bdoVGzu7v4m8Gbx5z+Z\n2T6gGyht7pKKGPPayqPnT/stKpnOuZvZHGARsKPM2+eZWb+ZbTWzMxtQmzRLjHlt5dHzp/0Wlarv\nUDWzjwE/Bb7u7u+WvL0LmO3uh81sGbAFOKPMNtYCawFmz55dc9GSsxjz2sqj50/7LSpVfXI3s3YK\njf0Rd99c+r67v+vuh4s/PwW0m9nMMuttcPded+/t6uqqs3TJTYx5beXR86f9FpWKzd3MDHgA2Ofu\nZecuNbOTi+thZmcXt3ugkYXKNIoxr608ev6036JSzWmZJcBXgD1m9nJx2beB2QDuvh64HLjezD4E\nhoFV3qy5hKV+oxfHYkpFZKk5xvGFQPstKprPXUQkIo3MuUuolDke78mbYOeDhQdSW1vh8Xb1PgVJ\nJFJq7rFS5ni8J2+CvgeOvfaRY6/V4KUFaW6ZWClzPN7OB7MtF0mcmnuslDkez0eyLRdJnJp7rJQ5\nHs/asi0XSZyae6yUOR7vc2uyLRdJnJp7rDR3+HjL74Tea459Ure2wmtdTJUWpZy7iEhElHOvwZbd\ng9zx9Cu8cXCYUzs7uHnpPFYs6m52WY2Tei4+9fGFQPs4GmruRVt2D7Ju8x6GjxTSFYMHh1m3eQ9A\nGg0+9Vx86uMLgfZxVHTOveiOp1852thHDR8Z4Y6nX2lSRQ2Wei4+9fGFQPs4KmruRW8cHM60PDqp\n5+JTH18ItI+jouZedGpnR6bl0Uk9F5/6+EKgfRwVNfeim5fOo6N9/A0vHe1t3Lx0XpMqarDUc/Gp\njy8E2sdR0QXVotGLpsmmZVKfizv18YVA+zgqyrmLiESk2py7TsuIxKB/E/xwPnyvs/C9f1Mc25am\n0WkZkdDlmS9Xdj1Z+uQuEro88+XKridLzV0kdHnmy5VdT5aau0jo8syXK7ueLDV3kdDlmS9Xdj1Z\nau4ioctz7n49FyBZyrmLiEREOXcRkRam5i4ikiA1dxGRBKm5i4gkSM1dRCRBau4iIglScxcRSZCa\nu4hIgio2dzObZWa/MLO9ZvbfZnZjmXXMzO42s1fNrN/MPptPuVIXzdst0jKqmc/9Q+Ab7r7LzI4H\ndprZM+6+d8w6FwNnFL8WA/cWv0soNG+3SEup+Mnd3d90913Fn/8E7ANKHyx6KfCQF2wHOs3slIZX\nK7XTvN0iLSXTOXczmwMsAnaUvNUN7B/zeoCJfwFgZmvNrM/M+oaGhrJVKvXRvN0iLaXq5m5mHwN+\nCnzd3d+t5Ze5+wZ373X33q6urlo2IbXSvN0iLaWq5m5m7RQa+yPuvrnMKoPArDGve4rLJBSat1uk\npVSTljHgAWCfu985yWpPAFcVUzPnAIfc/c0G1in10rzdIi2lmrTMEuArwB4ze7m47NvAbAB3Xw88\nBSwDXgXeA65ufKlSt4Ur1cxFWkTF5u7uvwKswjoO3NCookREpD66Q1VEJEFq7iIiCVJzFxFJkJq7\niEiC1NxFRBKk5i4ikiA1dxGRBFkhot6EX2w2BPy+Kb+8spnA280uIkcaX7xSHhtofNX4W3evODlX\n05p7yMysz917m11HXjS+eKU8NtD4GkmnZUREEqTmLiK
SIDX38jY0u4CcaXzxSnlsoPE1jM65i4gk\nSJ/cRUQS1NLN3czazGy3mT1Z5r0LzOyQmb1c/IrqkUVm9jsz21Osva/M+2Zmd5vZq2bWb2afbUad\ntapifLEfv04ze8zMfmNm+8zs3JL3Yz9+lcYX7fEzs3lj6n7ZzN41s6+XrJP78avmYR0puxHYB5ww\nyfsvuPvyaayn0f7e3SfL1F4MnFH8WgzcW/wek6nGB3Efv7uAbe5+uZn9NfA3Je/HfvwqjQ8iPX7u\n/gpwFhQ+QFJ45OjPSlbL/fi17Cd3M+sBvgTc3+xamuRS4CEv2A50mtkpzS5KwMxOBM6n8HhL3P0D\ndz9Yslq0x6/K8aXiIuA1dy+9YTP349eyzR34EfBN4C9TrHNe8Z9MW83szGmqq1Ec+E8z22lma8u8\n3w3sH/N6oLgsFpXGB/Eev7nAEPCvxdOG95vZR0vWifn4VTM+iPf4jbUK2Fhmee7HryWbu5ktB95y\n951TrLYLmO3uC4EfA1umpbjG+Tt3P4vCP/9uMLPzm11Qg1UaX8zH7zjgs8C97r4I+D/gW80tqaGq\nGV/Mxw+A4ummS4B/b8bvb8nmTuGh35eY2e+AR4ELzezhsSu4+7vufrj481NAu5nNnPZKa+Tug8Xv\nb1E433d2ySqDwKwxr3uKy6JQaXyRH78BYMDddxRfP0ahGY4V8/GrOL7Ij9+oi4Fd7v7HMu/lfvxa\nsrm7+zp373H3ORT+2fScu185dh0zO9nMrPjz2RT21YFpL7YGZvZRMzt+9GfgH4D/KlntCeCq4lX7\nc4BD7v7mNJdak2rGF/Pxc/f/Bfab2bzioouAvSWrRXv8qhlfzMdvjNWUPyUD03D8Wj0tM46ZXQfg\n7uuBy4HrzexDYBhY5fHc8fVJ4GfF/zeOA/7N3beVjO8pYBnwKvAecHWTaq1FNeOL+fgB/BPwSPGf\n9v8DXJ3Q8YPK44v6+BU/dHwB+Mcxy6b1+OkOVRGRBLXkaRkRkdSpuYuIJEjNXUQkQWruIiIJUnMX\nEUmQmruISILU3EVEEqTmLiKSoP8H2fNC9uxjMHwAAAAASUVORK5CYII=\n", 218 | "text/plain": [ 219 | "" 220 | ] 221 | }, 222 | "metadata": {}, 223 | "output_type": "display_data" 224 | } 225 | ], 226 | "source": [ 227 | "plt.scatter(X[:50,0],X[:50,1], label='0')\n", 228 | "plt.scatter(X[50:,0],X[50:,1], label='1')\n", 229 | "plt.legend()" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": {}, 235 | "source": [ 236 | "----\n", 237 | "\n" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": 11, 243 | "metadata": { 244 | "collapsed": true 245 | }, 246 | "outputs": [], 247 | "source": [ 248 | "class SVM:\n", 249 | " def __init__(self, max_iter=100, kernel='linear'):\n", 250 | " self.max_iter = max_iter\n", 251 | " self._kernel = kernel\n", 252 | " \n", 253 | " def init_args(self, features, labels):\n", 254 | " self.m, self.n = features.shape\n", 255 | " self.X = features\n", 256 | " self.Y = labels\n", 257 
| " self.b = 0.0\n", 258 | " \n", 259 | " # 将Ei保存在一个列表里\n", 260 | " self.alpha = np.ones(self.m)\n", 261 | " self.E = [self._E(i) for i in range(self.m)]\n", 262 | " # 松弛变量\n", 263 | " self.C = 1.0\n", 264 | " \n", 265 | " def _KKT(self, i):\n", 266 | " y_g = self._g(i)*self.Y[i]\n", 267 | " if self.alpha[i] == 0:\n", 268 | " return y_g >= 1\n", 269 | " elif 0 < self.alpha[i] < self.C:\n", 270 | " return y_g == 1\n", 271 | " else:\n", 272 | " return y_g <= 1\n", 273 | " \n", 274 | " # g(x)预测值,输入xi(X[i])\n", 275 | " def _g(self, i):\n", 276 | " r = self.b\n", 277 | " for j in range(self.m):\n", 278 | " r += self.alpha[j]*self.Y[j]*self.kernel(self.X[i], self.X[j])\n", 279 | " return r\n", 280 | " \n", 281 | " # 核函数\n", 282 | " def kernel(self, x1, x2):\n", 283 | " if self._kernel == 'linear':\n", 284 | " return sum([x1[k]*x2[k] for k in range(self.n)])\n", 285 | " elif self._kernel == 'poly':\n", 286 | " return (sum([x1[k]*x2[k] for k in range(self.n)]) + 1)**2\n", 287 | " \n", 288 | " return 0\n", 289 | " \n", 290 | " # E(x)为g(x)对输入x的预测值和y的差\n", 291 | " def _E(self, i):\n", 292 | " return self._g(i) - self.Y[i]\n", 293 | " \n", 294 | " def _init_alpha(self):\n", 295 | " # 外层循环首先遍历所有满足0<alpha<C的样本点,检验是否满足KKT\n", 296 | " index_list = [i for i in range(self.m) if 0 < self.alpha[i] < self.C]\n", 297 | " # 否则遍历整个训练集\n", 298 | " non_satisfy_list = [i for i in range(self.m) if i not in index_list]\n", 299 | " index_list.extend(non_satisfy_list)\n", 300 | " \n", 301 | " for i in index_list:\n", 302 | " if self._KKT(i):\n", 303 | " continue\n", 304 | " \n", 305 | " E1 = self.E[i]\n", 306 | " # 如果E1是正的,选择E最小的点作为E2;如果E1是负的,选择E最大的点作为E2\n", 307 | " if E1 >= 0:\n", 308 | " j = min(range(self.m), key=lambda x: self.E[x])\n", 309 | " else:\n", 310 | " j = max(range(self.m), key=lambda x: self.E[x])\n", 311 | " return i, j\n", 312 | " \n", 313 | " def _compare(self, _alpha, L, H):\n", 314 | " if _alpha > H:\n", 315 | " return H\n", 316 | " elif _alpha < L:\n", 317 | " return L\n", 318 | " else:\n", 319 | " return _alpha \n", 320 | " \n", 321 | " def fit(self, features, labels):\n", 322 | " self.init_args(features, labels)\n", 323 | " \n", 324 | " for t in range(self.max_iter):\n", 325 | " # train\n", 326 | " i1, i2 = self._init_alpha()\n", 327 | " \n", 328 | " # 边界\n", 329 | " if self.Y[i1] == self.Y[i2]:\n", 330 | " L = max(0, self.alpha[i1]+self.alpha[i2]-self.C)\n", 331 | " H = min(self.C, self.alpha[i1]+self.alpha[i2])\n", 332 | " 
else:\n", 333 | " L = max(0, self.alpha[i2]-self.alpha[i1])\n", 334 | " H = min(self.C, self.C+self.alpha[i2]-self.alpha[i1])\n", 335 | " \n", 336 | " E1 = self.E[i1]\n", 337 | " E2 = self.E[i2]\n", 338 | " # eta=K11+K22-2K12\n", 339 | " eta = self.kernel(self.X[i1], self.X[i1]) + self.kernel(self.X[i2], self.X[i2]) - 2*self.kernel(self.X[i1], self.X[i2])\n", 340 | " if eta <= 0:\n", 341 | " # print('eta <= 0')\n", 342 | " continue\n", 343 | " \n", 344 | " alpha2_new_unc = self.alpha[i2] + self.Y[i2] * (E2 - E1) / eta\n", 345 | " alpha2_new = self._compare(alpha2_new_unc, L, H)\n", 346 | " \n", 347 | " alpha1_new = self.alpha[i1] + self.Y[i1] * self.Y[i2] * (self.alpha[i2] - alpha2_new)\n", 348 | " \n", 349 | " b1_new = -E1 - self.Y[i1] * self.kernel(self.X[i1], self.X[i1]) * (alpha1_new-self.alpha[i1]) - self.Y[i2] * self.kernel(self.X[i2], self.X[i1]) * (alpha2_new-self.alpha[i2])+ self.b \n", 350 | " b2_new = -E2 - self.Y[i1] * self.kernel(self.X[i1], self.X[i2]) * (alpha1_new-self.alpha[i1]) - self.Y[i2] * self.kernel(self.X[i2], self.X[i2]) * (alpha2_new-self.alpha[i2])+ self.b \n", 351 | " \n", 352 | " if 0 < alpha1_new < self.C:\n", 353 | " b_new = b1_new\n", 354 | " elif 0 < alpha2_new < self.C:\n", 355 | " b_new = b2_new\n", 356 | " else:\n", 357 | " # 选择中点\n", 358 | " b_new = (b1_new + b2_new) / 2\n", 359 | " \n", 360 | " # 更新参数\n", 361 | " self.alpha[i1] = alpha1_new\n", 362 | " self.alpha[i2] = alpha2_new\n", 363 | " self.b = b_new\n", 364 | " \n", 365 | " self.E[i1] = self._E(i1)\n", 366 | " self.E[i2] = self._E(i2)\n", 367 | " return 'train done!'\n", 368 | " \n", 369 | " def predict(self, data):\n", 370 | " r = self.b\n", 371 | " for i in range(self.m):\n", 372 | " r += self.alpha[i] * self.Y[i] * self.kernel(data, self.X[i])\n", 373 | " \n", 374 | " return 1 if r > 0 else -1\n", 375 | " \n", 376 | " def score(self, X_test, y_test):\n", 377 | " right_count = 0\n", 378 | " for i in range(len(X_test)):\n", 379 | " result = self.predict(X_test[i])\n", 
380 | " if result == y_test[i]:\n", 381 | " right_count += 1\n", 382 | " return right_count / len(X_test)\n", 383 | " \n", 384 | " def _weight(self):\n", 385 | " # linear model\n", 386 | " yx = self.Y.reshape(-1, 1)*self.X\n", 387 | " self.w = np.dot(yx.T, self.alpha)\n", 388 | " return self.w" 389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": 12, 394 | "metadata": { 395 | "collapsed": true 396 | }, 397 | "outputs": [], 398 | "source": [ 399 | "svm = SVM(max_iter=200)" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": 13, 405 | "metadata": {}, 406 | "outputs": [ 407 | { 408 | "data": { 409 | "text/plain": [ 410 | "'train done!'" 411 | ] 412 | }, 413 | "execution_count": 13, 414 | "metadata": {}, 415 | "output_type": "execute_result" 416 | } 417 | ], 418 | "source": [ 419 | "svm.fit(X_train, y_train)" 420 | ] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": 14, 425 | "metadata": {}, 426 | "outputs": [ 427 | { 428 | "data": { 429 | "text/plain": [ 430 | "0.96" 431 | ] 432 | }, 433 | "execution_count": 14, 434 | "metadata": {}, 435 | "output_type": "execute_result" 436 | } 437 | ], 438 | "source": [ 439 | "svm.score(X_test, y_test)" 440 | ] 441 | }, 442 | { 443 | "cell_type": "markdown", 444 | "metadata": {}, 445 | "source": [ 446 | "## sklearn.svm.SVC" 447 | ] 448 | }, 449 | { 450 | "cell_type": "code", 451 | "execution_count": 15, 452 | "metadata": {}, 453 | "outputs": [ 454 | { 455 | "data": { 456 | "text/plain": [ 457 | "SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,\n", 458 | " decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',\n", 459 | " max_iter=-1, probability=False, random_state=None, shrinking=True,\n", 460 | " tol=0.001, verbose=False)" 461 | ] 462 | }, 463 | "execution_count": 15, 464 | "metadata": {}, 465 | "output_type": "execute_result" 466 | } 467 | ], 468 | "source": [ 469 | "from sklearn.svm import SVC\n", 470 | "clf = SVC()\n", 471 | 
"clf.fit(X_train, y_train)" 472 | ] 473 | }, 474 | { 475 | "cell_type": "code", 476 | "execution_count": 16, 477 | "metadata": {}, 478 | "outputs": [ 479 | { 480 | "data": { 481 | "text/plain": [ 482 | "1.0" 483 | ] 484 | }, 485 | "execution_count": 16, 486 | "metadata": {}, 487 | "output_type": "execute_result" 488 | } 489 | ], 490 | "source": [ 491 | "clf.score(X_test, y_test)" 492 | ] 493 | }, 494 | { 495 | "cell_type": "markdown", 496 | "metadata": {}, 497 | "source": [ 498 | "### sklearn.svm.SVC\n", 499 | "\n", 500 | "*(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False,tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None,random_state=None)*\n", 501 | "\n", 502 | "参数:\n", 503 | "\n", 504 | "- C:C-SVC的惩罚参数C,默认值是1.0\n", 505 | "\n", 506 | "C越大,相当于惩罚松弛变量,希望松弛变量接近0,即对误分类的惩罚增大,趋向于对训练集全分对的情况,这样对训练集测试时准确率很高,但泛化能力弱。C值小,对误分类的惩罚减小,允许容错,将它们当成噪声点,泛化能力较强。\n", 507 | "\n", 508 | "- kernel :核函数,默认是rbf,可以是‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ \n", 509 | " \n", 510 | " – 线性:u'v\n", 511 | " \n", 512 | " – 多项式:(gamma*u'*v + coef0)^degree\n", 513 | "\n", 514 | " – RBF函数:exp(-gamma|u-v|^2)\n", 515 | "\n", 516 | " – sigmoid:tanh(gamma*u'*v + coef0)\n", 517 | "\n", 518 | "\n", 519 | "- degree :多项式poly函数的维度,默认是3,选择其他核函数时会被忽略。\n", 520 | "\n", 521 | "\n", 522 | "- gamma : ‘rbf’,‘poly’ 和‘sigmoid’的核函数参数。默认是‘auto’,则会选择1/n_features\n", 523 | "\n", 524 | "\n", 525 | "- coef0 :核函数的常数项。对于‘poly’和 ‘sigmoid’有用。\n", 526 | "\n", 527 | "\n", 528 | "- probability :是否采用概率估计,默认为False\n", 529 | "\n", 530 | "\n", 531 | "- shrinking :是否采用shrinking heuristic方法,默认为True\n", 532 | "\n", 533 | "\n", 534 | "- tol :停止训练的误差值大小,默认为1e-3\n", 535 | "\n", 536 | "\n", 537 | "- cache_size :核函数cache缓存大小,默认为200\n", 538 | "\n", 539 | "\n", 540 | "- class_weight :类别的权重,字典形式传递。设置第几类的参数C为weight*C(C-SVC中的C)\n", 541 | "\n", 542 | "\n", 543 | "- verbose :是否允许冗余输出,默认为False\n", 544 | "\n", 545 | "\n", 546 | "- max_iter :最大迭代次数。-1为无限制。\n", 
547 | "\n", 548 | "\n", 549 | "- decision_function_shape :‘ovo’, ‘ovr’ or None, default=None\n", 550 | "\n", 551 | "\n", 552 | "- random_state :数据洗牌时的种子值,int值\n", 553 | "\n", 554 | "\n", 555 | "主要调节的参数有:C、kernel、degree、gamma、coef0。" 556 | ] 557 | }, 558 | { 559 | "cell_type": "code", 560 | "execution_count": null, 561 | "metadata": { 562 | "collapsed": true 563 | }, 564 | "outputs": [], 565 | "source": [] 566 | } 567 | ], 568 | "metadata": { 569 | "kernelspec": { 570 | "display_name": "Python 3", 571 | "language": "python", 572 | "name": "python3" 573 | }, 574 | "language_info": { 575 | "codemirror_mode": { 576 | "name": "ipython", 577 | "version": 3 578 | }, 579 | "file_extension": ".py", 580 | "mimetype": "text/x-python", 581 | "name": "python", 582 | "nbconvert_exporter": "python", 583 | "pygments_lexer": "ipython3", 584 | "version": "3.6.1" 585 | } 586 | }, 587 | "nbformat": 4, 588 | "nbformat_minor": 2 589 | } 590 | --------------------------------------------------------------------------------
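A quick way to sanity-check the SMO step implemented in the notebook's `SVM.fit` is to exercise the box-constraint logic on its own. The sketch below is a standalone illustration, not part of the repository: `clip` mirrors the notebook's `_compare` helper, and `bounds` mirrors the L/H computation at the top of `fit` (both function names are ours, introduced only for this example).

```python
def clip(alpha, low, high):
    # Clamp the unconstrained alpha2 update into the feasible box [low, high],
    # as _compare does in the notebook's SVM class.
    return max(low, min(alpha, high))

def bounds(alpha1, alpha2, y1, y2, C=1.0):
    # Feasible interval [L, H] for the new alpha2, derived from the equality
    # constraint alpha1*y1 + alpha2*y2 = const together with 0 <= alpha <= C.
    if y1 == y2:
        return max(0.0, alpha1 + alpha2 - C), min(C, alpha1 + alpha2)
    return max(0.0, alpha2 - alpha1), min(C, C + alpha2 - alpha1)

L, H = bounds(0.3, 0.9, 1, 1, C=1.0)  # a same-label pair of multipliers
print(clip(1.4, L, H))  # unconstrained update overshoots H, so it is clipped -> 1.0
```

Clipping is what keeps the pair update feasible: since the two multipliers move along the line alpha1*y1 + alpha2*y2 = const, restricting alpha2 to [L, H] preserves both the box constraint and the equality constraint sum(alpha_i * y_i) = 0.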