├── A gd.ipynb
├── 3 kNN.ipynb
├── 6.1 logistic.ipynb
├── D bp.ipynb
├── 4 nb.ipynb
├── 9.2 gmm.ipynb
├── C lagrange.ipynb
├── 2 perceptron.ipynb
├── 1 introduction.ipynb
├── .ipynb_checkpoints
│   └── 1 introduction-checkpoint.ipynb
├── 8 boosting.ipynb
├── B newton.ipynb
├── 9.1 em.ipynb
├── 5 dt.ipynb
├── 6.2 me.ipynb
├── 10 hmm.ipynb
├── 11 crf.ipynb
└── 7 svm.ipynb

/A gd.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Assume that $f\\left(x\\right)$ is a function on $R^{n}$ with continuous first-order partial derivatives. Consider the unconstrained optimization problem \n",
    "\\begin{align*} \\\\ & \\min_{x \\in R^{n}} f \\left( x \\right) \\end{align*} \n",
    "and let $x^{*}$ denote a minimum point of the objective function $f\\left(x\\right)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since $f\\left(x\\right)$ has continuous first-order partial derivatives, if the value of the $k$-th iterate is $x^{\\left(k\\right)}$, then $f\\left(x\\right)$ can be approximated near $x^{\\left(k\\right)}$ by its first-order Taylor expansion\n",
    "\\begin{align*} \\\\ & f\\left(x\\right) \\approx f\\left(x^{\\left(k\\right)}\\right) + g_{k}^{T} \\left(x-x^{\\left(k\\right)}\\right)\\end{align*}\n",
    "where $g_{k}=g\\left( x^{\\left(k\\right)} \\right)=\\nabla f \\left( x^{\\left(k\\right)}\\right)$ is the gradient of $f\\left(x\\right)$ at $x^{\\left(k\\right)}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The $\\left(k+1\\right)$-th iterate is\n",
    "\\begin{align*} \\\\ & x^{\\left( k+1 \\right)} \\leftarrow x^{\\left( k \\right)} + \\lambda_{k} p_{k} \\end{align*} \n",
    "where $p_{k}$ is the search direction, taken to be the negative gradient $p_{k}= - \\nabla f \\left( x^{\\left(k\\right)} \\right)$, and $\\lambda_{k}$ is the step size, determined by a one-dimensional line search, i.e. $\\lambda_{k}$ satisfies\n",
    "\\begin{align*} \\\\ & f \\left( x^{\\left(k\\right)}+\\lambda_{k}p_{k} \\right)=\\min_{\\lambda \\geq 0} f \\left( x^{\\left(k\\right)}+\\lambda p_{k} \\right) \\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Gradient descent algorithm: \n",
    "Input: objective function $f \\left( x \\right)$, gradient function $g\\left( x \\right)=\\nabla f \\left( x \\right)$, tolerance $\\varepsilon$ \n",
    "Output: a minimum point $x^{*}$ of $f \\left( x \\right)$\n",
    "1. Choose an initial value $x^{\\left(0\\right)} \\in R^{n}$ and set $k=0$\n",
    "2. Compute $f \\left( x^{\\left(k\\right)}\\right)$\n",
    "3. Compute the gradient $g_{k}=g\\left( x^{\\left(k\\right)} \\right)$; if $\\| g_{k} \\| < \\varepsilon $, stop the iteration and set $x^{*}=x^{\\left(k\\right)}$; otherwise set $p_{k}=-g\\left(x^{\\left(k\\right)}\\right)$ and find $\\lambda_{k}$ such that\n",
    "\\begin{align*} \\\\ & f \\left( x^{\\left(k\\right)}+\\lambda_{k}p_{k} \\right)=\\min_{\\lambda \\geq 0} f \\left( x^{\\left(k\\right)}+\\lambda p_{k} \\right) \\end{align*} \n",
    "4. Set $x^{\\left(k+1\\right)}= x^{\\left(k\\right)}+\\lambda_{k}p_{k}$ and compute $f \\left( x^{\\left(k+1\\right)} \\right)$; \n",
    "if $\\| f \\left( x^{\\left(k+1\\right)} \\right) - f \\left( x^{\\left(k\\right)} \\right) \\| < \\varepsilon $ or $\\| x^{\\left(k+1\\right)} - x^{\\left(k\\right)} \\| < \\varepsilon $, stop the iteration and set $x^{*}=x^{\\left(k+1\\right)}$\n",
    "5. Otherwise set $k=k+1$ and go to step 3."
   ]
  },
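
A minimal NumPy sketch of the algorithm above; the quadratic test function and the step-halving line search are illustrative assumptions rather than part of the original notebook:

```python
import numpy as np

def gradient_descent(f, grad, x0, eps=1e-6, max_iter=1000):
    """Steepest descent with a crude step-halving line search."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:          # step 3: stop when ||g_k|| < eps
            break
        p = -g                               # search direction p_k = -grad f(x_k)
        lam, fx = 1.0, f(x)
        while f(x + lam * p) > fx and lam > 1e-12:   # shrink lambda until f decreases
            lam *= 0.5
        x_new = x + lam * p                  # step 4: x_{k+1} = x_k + lam_k * p_k
        if abs(f(x_new) - fx) < eps:         # alternative stopping criterion of step 4
            return x_new
        x = x_new
    return x

# Toy objective: f(x) = (x1 - 1)^2 + 10 * x2^2, minimum at (1, 0).
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * x[1] ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * x[1]])
print(gradient_descent(f, grad, np.array([5.0, 3.0])))  # ~ [1. 0.]
```
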
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
--------------------------------------------------------------------------------
/3 kNN.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "kNN algorithm: \n",
    "Input: training set $T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\}$, where $x_{i} \\in \\mathcal{X} \\subseteq R^{n}$ is the feature vector of an instance and $y_{i} \\in \\mathcal{Y} = \\left\\{ c_{1}, c_{2}, \\cdots, c_{K} \\right\\}$ is its class, $i = 1, 2, \\cdots, N$; a query feature vector $x$ \n",
    "Output: the class $y$ of instance $x$ \n",
    "1. Using the given distance metric, find the $k$ points in the training set $T$ nearest to $x$; the neighborhood of $x$ covering these $k$ points is denoted $N_{k} \\left( x \\right)$; \n",
    "2. In $N_{k} \\left( x \\right)$, decide the class $y$ of $x$ by the classification decision rule:\n",
    "\\begin{align*} \\\\ & y = \\arg \\max_{c_{j}} \\sum_{x_{i} \\in N_{k} \\left( x \\right)} I \\left( y_{i} = c_{j} \\right), \\quad i=1,2, \\cdots, N; \\quad j=1,2,\\cdots,K \\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let the feature space $\\mathcal{X}$ be the $n$-dimensional real vector space $R^{n}$, with $x_{i},x_{j} \\in \\mathcal{X}$, $x_{i} = \\left( x_{i}^{\\left( 1 \\right)},x_{i}^{\\left( 2 \\right) },\\cdots,x_{i}^{\\left( n \\right) } \\right)^{T}$, $x_{j} = \\left( x_{j}^{\\left( 1 \\right)},x_{j}^{\\left( 2 \\right) },\\cdots,x_{j}^{\\left( n \\right) } \\right)^{T}$. The $L_{p}$ distance between $x_{i}$ and $x_{j}$ is\n",
    "\\begin{align*} \\\\ & L_{p} \\left( x_{i},x_{j} \\right) = \\left( \\sum_{l=1}^{n} \\left| x_{i}^{\\left(l\\right)} - x_{j}^{\\left( l \\right)} \\right|^{p} \\right)^{\\dfrac{1}{p}}\\end{align*} \n",
    "where $p \\geq 1$. When $p=2$ it is called the Euclidean distance,\n",
    "\\begin{align*} \\\\ & L_{2} \\left( x_{i},x_{j} \\right) = \\left( \\sum_{l=1}^{n} \\left| x_{i}^{\\left(l\\right)} - x_{j}^{\\left( l \\right)} \\right|^{2} \\right)^{\\dfrac{1}{2}}\\end{align*} \n",
    "When $p=1$ it is called the Manhattan distance,\n",
    "\\begin{align*} \\\\ & L_{1} \\left( x_{i},x_{j} \\right) = \\sum_{l=1}^{n} \\left| x_{i}^{\\left(l\\right)} - x_{j}^{\\left( l \\right)} \\right| \\end{align*} \n",
    "When $p=\\infty$ it is the maximum of the coordinate distances,\n",
    "\\begin{align*} \\\\ & L_{\\infty} \\left( x_{i},x_{j} \\right) = \\max_{l} \\left| x_{i}^{\\left(l\\right)} - x_{j}^{\\left( l \\right)} \\right| \\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Majority voting rule: if the classification loss is the 0-1 loss and the classification function is\n",
    "\\begin{align*} \\\\ & f: R^{n} \\to \\left\\{ c_{1}, c_{2}, \\cdots, c_{K} \\right\\} \\end{align*} \n",
    "then the misclassification probability is\n",
    "\\begin{align*} \\\\ & P \\left( Y \\neq f \\left( X \\right) \\right) = 1 - P \\left( Y = f\\left( X \\right) \\right) \\end{align*} \n",
    "For a given instance $x \\in \\mathcal{X}$, let $N_{k} \\left( x \\right)$ be the set of its $k$ nearest training instances. If the class assigned to the region covering $N_{k} \\left( x \\right)$ is $c_{j}$, the misclassification rate is\n",
"\\begin{align*} \\\\ & \\dfrac{1}{k} \\sum_{x_{i} \\in N_{k} \\left( x \\right)} I \\left( y_{i} \\neq c_{j}\\right) = 1 -\\dfrac{1}{k} \\sum_{x_{i} \\in N_{k} \\left( x \\right)} I \\left( y_{i} = c_{j}\\right) \\end{align*} \n", 43 | "即经验风险最小化等价于多数表决规则。" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "平衡kd树构造算法: \n", 51 | "输入:$k$维空间数据集$T = \\left\\{ x_{1}, x_{2}, \\cdots, x_{N} \\right\\}$,其中$x_{i} = \\left( x_{i}^{\\left(1\\right)}, x_{i}^{\\left(1\\right)},\\cdots,x_{i}^{\\left(k\\right)} \\right)^{T}, i = 1, 2, \\cdots, N$; \n", 52 | "输出:kd树 \n", 53 | "1. 开始:构造根结点,根结点对应于包涵$T$的$k$维空间的超矩形区域。 \n", 54 | "选择$x^{\\left( 1 \\right)}$为坐标轴,以$T$中所欲实例的$x^{\\left( 1 \\right)}$坐标的中位数为切分点,将根结点对应的超矩形区域切分成两个子区域。切分由通过切分点并与坐标轴$x^{\\left( 1 \\right)}$垂直的超平面实现。 \n", 55 | "由根结点生成深度为1的左、右子结点:坐子结点对应坐标$x^{\\left( 1 \\right)}$小于切分点的子区域,右子结点对应于坐标$x^{\\left( 1 \\right)}$大与切分点的子区域。 \n", 56 | "将落在切分超平面上的实例点保存在跟结点。\n", 57 | "2. 重复:对深度为j的结点,选择$x^{\\left( l \\right)}$为切分坐标轴,$l = j \\left(\\bmod k \\right) + 1 $,以该结点的区域中所由实例的$x^{\\left( l \\right)}$坐标的中位数为切分点,将该结点对应的超矩形区域切分为两个子区域。切分由通过切分点并与坐标轴$x^{\\left( l \\right)}$垂直的超平面实现。 \n", 58 | "由根结点生成深度为$j+1$的左、右子结点:坐子结点对应坐标$x^{\\left( l \\right)}$小于切分点的子区域,右子结点对应于坐标$x^{\\left( l \\right)}$大与切分点的子区域。 \n", 59 | "将落在切分超平面上的实例点保存在跟结点。\n", 60 | "3. 直到两个子区域没有实例存在时停止。" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "kd树的最近邻搜索算法: \n", 68 | "输入:kd树;目标点$x$ \n", 69 | "输出:$x$的最近邻 \n", 70 | "1. 在kd树中找出包含目标点$x$的叶结点:从跟结点出发,递归地向下访问kd树。若目标点$x$当前维的坐标小于切分点的坐标,则移动到左子结点,否则移动到右子结点。直到子结点为叶结点为止。 \n", 71 | "2. 以此叶结点为“当前最近点”。\n", 72 | "3. 递归地向上回退,在每个结点进行以下操作: \n", 73 | "3.1 如果该结点保存的实例点比当前最近点距离目标点更近,则以该实例点为“当前最近点”。 \n", 74 | "3.2 当前最近点一定存在于该结点一个子结点对应的区域。检查该子结点的父结点的另一子结点对应的区域是否有更近的点。具体地,检查另一子结点对应的区域是否与以目标点为球心、以目标点与“当前最近点”间的距离为半径的超球体相交。 \n", 75 | "如果相交,可能在另一个子结点对应的区域内存在距目标点更近的点,移动到另一个子结点。接着,递归地进行最近邻搜索; \n", 76 | "如果不相交,向上回退。 \n", 77 | "4. 
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
--------------------------------------------------------------------------------
/6.1 logistic.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "The binomial logistic regression model is the following conditional probability distribution:\n",
    "\\begin{align*} \\\\& P \\left( Y = 1 | x \\right) = \\dfrac{1}{1+\\exp{\\left(-\\left(w \\cdot x + b \\right)\\right)}}\n",
    "\\\\ & \\quad\\quad\\quad\\quad = \\dfrac{\\exp{\\left(w \\cdot x + b \\right)}}{\\left( 1+\\exp{\\left(-\\left(w \\cdot x + b \\right)\\right)}\\right) \\cdot \\exp{\\left(w \\cdot x + b \\right)}}\n",
    "\\\\ & \\quad\\quad\\quad\\quad = \\dfrac{\\exp{\\left(w \\cdot x + b \\right)}}{1+\\exp{\\left( w \\cdot x + b \\right)}}\\\\& P \\left( Y = 0 | x \\right) = 1- P \\left( Y = 1 | x \\right)\n",
    "\\\\ & \\quad\\quad\\quad\\quad=1- \\dfrac{\\exp{\\left(w \\cdot x + b \\right)}}{1+\\exp{\\left( w \\cdot x + b \\right)}}\n",
    "\\\\ & \\quad\\quad\\quad\\quad=\\dfrac{1}{1+\\exp{\\left( w \\cdot x + b \\right)}}\\end{align*}\n",
    "where $x \\in R^{n}$ is the input, $Y \\in \\left\\{ 0, 1 \\right\\}$ is the output, and $w \\in R^{n}$ and $b \\in R$ are parameters: $w$ is called the weight vector, $b$ the bias, and $w \\cdot x$ is the inner product of $w$ and $x$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "The weight vector and the input vector can be augmented, i.e. $w = \\left( w^{\\left(1\\right)},w^{\\left(2\\right)},\\cdots,w^{\\left(n\\right)},b \\right)^{T}$, $x = \\left( x^{\\left(1\\right)},x^{\\left(2\\right)},\\cdots,x^{\\left(n\\right)},1 \\right)^{T}$; the logistic regression model then reads:\n",
    "\\begin{align*} \\\\& P \\left( Y = 1 | x \\right) = \\dfrac{\\exp{\\left(w \\cdot x \\right)}}{1+\\exp{\\left( w \\cdot x \\right)}}\\\\& P \\left( Y = 0 | x \\right) =\\dfrac{1}{1+\\exp{\\left( w \\cdot x \\right)}}\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The odds of an event are the ratio of the probability $p$ that the event occurs to the probability $1-p$ that it does not, i.e.\n",
    "\\begin{align*} \\\\& \\dfrac{p}{1-p}\\end{align*} \n",
    "The log odds (logit function) of the event is\n",
    "\\begin{align*} \\\\& logit\\left( p \\right) = \\log \\dfrac{p}{1-p}\\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the logistic regression model,\n",
    "\\begin{align*} \\\\& \\log \\dfrac{P \\left( Y = 1 | x \\right)}{1-P \\left( Y = 1 | x \\right)} = w \\cdot x\\end{align*} \n",
    "i.e. the log odds of the output $Y=1$ is a linear function of the input $x$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given the training set\n",
    "\\begin{align*} \\\\& T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\} \\end{align*} \n",
    "where $x_{i} \\in R^{n+1}, y_{i} \\in \\left\\{ 0, 1 \\right\\}, i = 1, 2, \\cdots, N$."
   ]
  },
"markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "设:\n", 61 | "\\begin{align*} \\\\& P \\left( Y =1 | x \\right) = \\pi \\left( x \\right) ,\\quad P \\left( Y =0 | x \\right) = 1 - \\pi \\left( x \\right) \\end{align*} \n", 62 | "似然函数\n", 63 | "\\begin{align*} \\\\& l \\left( w \\right) = \\prod_{i=1}^{N} P \\left( y_{i} | x_{i} \\right) \n", 64 | "\\\\ & = P \\left( Y = 1 | x_{i} , w \\right) \\cdot P \\left( Y = 0 | x_{i}, w \\right) \n", 65 | "\\\\ & = \\prod_{i=1}^{N} \\left[ \\pi \\left( x_{i} \\right) \\right]^{y_{i}}\\left[ 1 - \\pi \\left( x_{i} \\right) \\right]^{1 - y_{i}}\\end{align*} " 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "对数似然函数\n", 73 | "\\begin{align*} \\\\& L \\left( w \\right) = \\log l \\left( w \\right) \n", 74 | "\\\\ & = \\sum_{i=1}^{N} \\left[ y_{i} \\log \\pi \\left( x_{i} \\right) + \\left( 1 - y_{i} \\right) \\log \\left( 1 - \\pi \\left( x_{i} \\right) \\right) \\right]\n", 75 | "\\\\ & = \\sum_{i=1}^{N} \\left[ y_{i} \\log \\dfrac{\\pi \\left( x_{i} \\right)}{1- \\pi \\left( x_{i} \\right)} + \\log \\left( 1 - \\pi \\left( x_{i} \\right) \\right) \\right]\n", 76 | "\\\\ & = \\sum_{i=1}^{N} \\left[ y_{i} \\left( w \\cdot x_{i} \\right) - \\log \\left( 1 + \\exp \\left( w \\cdot x \\right) \\right) \\right]\\end{align*} " 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "假设$w$的极大似然估计值是$\\hat{w}$,则学得得逻辑斯谛回归模型\n", 84 | "\\begin{align*} \\\\& P \\left( Y = 1 | x \\right) = \\dfrac{\\exp{\\left(\\hat{w} \\cdot x \\right)}}{1+\\exp{\\left( \\hat{w} \\cdot x \\right)}}\\\\& P \\left( Y = 0 | x \\right) =\\dfrac{1}{1+\\exp{\\left( \\hat{w} \\cdot x \\right)}}\\end{align*}" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": { 90 | "collapsed": true 91 | }, 92 | "source": [ 93 | "假设离散型随机变量$Y$的取值集合$\\left\\{ 1, 2, \\cdots, K \\right\\}$,则多项逻辑斯谛回归模型\n", 94 | "\\begin{align*} \\\\& P \\left( Y = k | x \\right) = \\dfrac{\\exp{\\left(w_{k} \\cdot x \\right)}}{1+ \\sum_{k=1}^{K-1}\\exp{\\left( w_{k} \\cdot x \\right)}}, \\quad k=1,2,\\cdots,K-1\n", 95 | "\\\\ & P \\left( Y = K | x \\right) = 1 - \\sum_{k=1}^{K-1} P \\left( Y = k | x \\right)\n", 96 | "\\\\ & = 1 - \\sum_{k=1}^{K-1} \\dfrac{\\exp{\\left(w_{k} \\cdot x \\right)}}{1+ \\sum_{k=1}^{K-1}\\exp{\\left( w_{k} \\cdot x \\right)}}\n", 97 | "\\\\ & = \\dfrac{1}{1+ \\sum_{k=1}^{K-1}\\exp{\\left( w_{k} \\cdot x \\right)}}\\end{align*}" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": null, 103 | "metadata": { 104 | "collapsed": true 105 | }, 106 | "outputs": [], 107 | "source": [] 108 | } 109 | ], 110 | "metadata": { 111 | "kernelspec": { 112 | "display_name": "Python 2", 113 | "language": "python", 114 | "name": "python2" 115 | }, 116 | "language_info": { 117 | "codemirror_mode": { 118 | "name": "ipython", 119 | "version": 2 120 | }, 121 | "file_extension": ".py", 122 | "mimetype": "text/x-python", 123 | "name": "python", 124 | "nbconvert_exporter": "python", 125 | "pygments_lexer": "ipython2", 126 | "version": "2.7.11" 127 | } 128 | }, 129 | "nbformat": 4, 130 | "nbformat_minor": 0 131 | } 132 | -------------------------------------------------------------------------------- /D bp.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "输入 \\begin{align*} & x=\\left( x_{1},x_{2},\\ldots ,x_{j},\\ldots ,x_{n}\\right) ^{T} \\end{align*} " 8 | ] 9 | 
--------------------------------------------------------------------------------
/D bp.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Input \\begin{align*} & x=\\left( x_{1},x_{2},\\ldots ,x_{j},\\ldots ,x_{n}\\right) ^{T} \\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Input layer (layer 1) \\begin{align*} & a^{1}=\\left( a_{1}^{1},a_{2}^{1},\\ldots ,a_{j}^{1},\\ldots ,a_{n}^{1}\\right) ^{T}\\\\ & a_{j}^{1}=x_{j}\\quad\\left( j=1,2,\\ldots ,n\\right) \\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hidden layer (layer 2) \\begin{align*} & a^{2}=\\left( a_{1}^{2},a_{2}^{2},\\ldots ,a_{j}^{2},\\ldots ,a_{m}^{2}\\right) ^{T}\\\\ & a_{j}^{2}=\\sigma \\left( z_{j}^{2}\\right) \\\\ & z_{j}^{2}= \\sum _{k}w_{jk}^{2}\\cdot a_{k}^{1}+b_{j}^{2}\\quad\\left( j=1,2,\\ldots ,m\\right) \\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Output layer (layer 3) \\begin{align*} & a^{3}=\\left( a_{1}^{3},a_{2}^{3},\\ldots ,a_{j}^{3},\\ldots ,a_{p}^{3}\\right) ^{T}\\\\ & a_{j}^{3}=\\sigma \\left( z_{j}^{3}\\right) \\\\ & z_{j}^{3}= \\sum _{k}w_{jk}^{3}\\cdot a_{k}^{2}+b_{j}^{3}\\quad\\left( j=1,2,\\ldots ,p\\right) \\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Predicted output \\begin{align*} & \\hat y=\\left( \\hat y_{1},\\hat y_{2},\\ldots ,\\hat y_{j},\\ldots ,\\hat y_{p}\\right) ^{T}\\\\ & \\hat y_{j}=a_{j}^{3}\\quad\\left( j=1,2,\\ldots ,p\\right)\\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Actual output \\begin{align*} & y=\\left( y_{1},y_{2},\\ldots ,y_{j},\\ldots ,y_{p}\\right) ^{T} \\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Loss on a single sample $x$ \\begin{align*} & C_{x}=\\dfrac {1} {2}\\left\\| y-\\widehat {y}\\right\\| ^{2}=\\dfrac {1} {2}\\sum _{j}\\left( y_{j}-\\widehat {y}_{j}\\right) ^{2} \\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Empirical loss \\begin{align*} & C=\\dfrac {1} {N}\\sum _{x}C_{x} \\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Error of the $j$-th neuron in layer $l$ \\begin{align*} & \\delta _{j}^{l}\\equiv \\dfrac {\\partial C_{x}} {\\partial z_{j}^{l}} \\quad\\left( l=2,3\\right)\\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Output-layer error \\begin{align*} & \\delta _{j}^{3}=\\dfrac {\\partial C_{x}} {\\partial z_{j}^{3}} \\\\ & =\\dfrac {\\partial C_{x}} {\\partial a_{j}^{3}}\\cdot\\dfrac {\\partial a_{j}^{3}} {\\partial z_{j}^{3}} \\\\ & =\\dfrac {\\partial C_{x}} {\\partial a_{j}^{3}}\\cdot \\sigma '\\left( z_{j}^{3}\\right) \\\\& =\\dfrac {\\partial \\left( \\dfrac {1} {2}\\sum _{j}\\left( y_{j}-\\widehat {y}_{j}\\right) ^{2}\\right) } {\\partial a_{j}^{3}}\\cdot \\sigma'\\left( z_{j}^{3}\\right) \\\\& = \\left(a_{j}^{3}-y_{j} \\right) \\cdot \\sigma'\\left( z_{j}^{3} \\right)\\quad\\left( j=1,2,\\ldots ,p\\right)\\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hidden-layer error \\begin{align*} & \\delta _{j}^{2}=\\dfrac {\\partial C_{x}} {\\partial z_{j}^{2}}\\\\ & =\\sum _{k}\\dfrac {\\partial C_{x}} {\\partial z_{k}^{3}}\\cdot \\dfrac {\\partial z_{k}^{3}} {\\partial z_{j}^{2}} \\\\ & = \\sum _{k} \\dfrac {\\partial z_{k}^{3}} {\\partial z_{j}^{2}}\\cdot\\delta _{k}^{3}\\\\ & = \\sum _{k} \\dfrac {\\partial \\left( \\sum _{j}w_{kj}^{3}\\cdot a_{j}^{2}+b_{k}^{3}\\right)} {\\partial 
z_{j}^{2}}\\cdot\\delta _{k}^{3}\\\\ & = \\sum _{k} \\dfrac {\\partial \\left( \\sum _{j}w_{kj}^{3}\\cdot \\sigma \\left( z_{j}^{2}\\right)+b_{k}^{3}\\right)} {\\partial z_{j}^{2}}\\cdot\\delta _{k}^{3}\\\\ & = \\sum _{k} w_{kj}^{3}\\cdot \\sigma '\\left( z_{j}^{2}\\right) \\cdot\\delta _{k}^{3} \\\\ & = \\sigma '\\left( z_{j}^{2}\\right) \\cdot\\sum _{k} w_{kj}^{3} \\delta _{k}^{3} \\quad\\left( j=1,2,\\ldots ,m\\right)\\quad\\left( k=1,2,\\ldots ,p\\right)\\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Gradient of the sample loss $C_{x}$ with respect to the biases of the hidden layer (layer 2) / output layer (layer 3) \\begin{align*} & \\dfrac {\\partial C_{x}} {\\partial b_{j}^{l}}=\\dfrac {\\partial C_{x}} {\\partial z_{j}^{l}}\\cdot \\dfrac {\\partial z_{j}^{l}} {\\partial b_{j}^{l}}=\\delta _{j}^{l}\\cdot \\dfrac {\\partial \\left( \\sum _{k}w_{jk}^{l}a_{k}^{l-1}+b_{j}^{l}\\right) } {\\partial b_{j}^{l}}=\\delta _{j}^{l}\\quad\\left( l=2,3\\right)\\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Gradient of the sample loss $C_{x}$ with respect to the weights of the hidden layer (layer 2) / output layer (layer 3) \\begin{align*} & \\dfrac {\\partial C_{x}} {\\partial w_{jk}^{l}}=\\dfrac {\\partial C_{x}} {\\partial z_{j}^{l}}\\cdot \\dfrac {\\partial z_{j}^{l}} {\\partial w_{jk}^{l}}=\\delta _{j}^{l}\\cdot \\dfrac {\\partial \\left( \\sum _{k}w_{jk}^{l}a_{k}^{l-1}+b_{j}^{l}\\right) } {\\partial w_{jk}^{l}}=\\delta _{j}^{l}\\cdot a_{k}^{l-1}\\quad\\left( l=2,3\\right)\\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Error backpropagation algorithm: \n",
    "1. Input $x$: set the corresponding activation $a^{1}$ for the input layer; \n",
    "2. Forward pass: for each $l$ ($l=2,3$) compute\\begin{align*} & z_{j}^{l}= \\sum _{k}w_{jk}^{l}\\cdot a_{k}^{l-1}+b_{j}^{l}\\\\ & a_{j}^{l}=\\sigma \\left( z_{j}^{l}\\right) \\end{align*} \n",
    "3. Compute the output-layer error $\\delta _{j}^{3}$; \n",
    "4. Backpropagate the error: compute the hidden-layer error $\\delta _{j}^{2}$; \n",
    "5. Output: the gradients $\\dfrac {\\partial C_{x}} {\\partial b_{j}^{l}}$ and $\\dfrac {\\partial C_{x}} {\\partial w_{jk}^{l}}$ of the sample loss with respect to the biases and weights of the hidden layer (layer 2) and output layer (layer 3)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
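
A direct transcription of these equations for one sample, assuming NumPy and randomly initialized weights; the layer sizes n=3, m=4, p=2 are arbitrary:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
dsigmoid = lambda z: sigmoid(z) * (1.0 - sigmoid(z))

def backprop(x, y, W2, b2, W3, b3):
    """Gradients of C_x = 0.5*||y - a3||^2 for one sample, per the equations above."""
    a1 = x
    z2 = W2 @ a1 + b2; a2 = sigmoid(z2)      # forward pass, layer 2
    z3 = W3 @ a2 + b3; a3 = sigmoid(z3)      # forward pass, layer 3
    delta3 = (a3 - y) * dsigmoid(z3)         # output-layer error
    delta2 = dsigmoid(z2) * (W3.T @ delta3)  # backpropagated hidden-layer error
    # dC/db_j^l = delta_j^l ; dC/dw_jk^l = delta_j^l * a_k^{l-1} (outer products)
    return delta2, delta2[:, None] * a1[None, :], delta3, delta3[:, None] * a2[None, :]

rng = np.random.default_rng(0)
n, m, p = 3, 4, 2
W2, b2 = rng.normal(size=(m, n)), np.zeros(m)
W3, b3 = rng.normal(size=(p, m)), np.zeros(p)
db2, dW2, db3, dW3 = backprop(rng.normal(size=n), np.array([0.0, 1.0]), W2, b2, W3, b3)
print(dW2.shape, dW3.shape)  # (4, 3) (2, 4)
```
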
--------------------------------------------------------------------------------
/4 nb.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The training set\n",
    "\\begin{align*} \\\\& T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\} \\end{align*} \n",
    "is generated i.i.d. according to $P \\left( X, Y \\right)$, where $x_{i} \\in \\mathcal{X} \\subseteq R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ c_{1}, c_{2}, \\cdots, c_{K} \\right\\}, i = 1, 2, \\cdots, N$; $x_{i}$ is the feature vector of the $i$-th instance and $y_{i}$ is the class label of $x_{i}$. \n",
    "$X$ is a random vector defined on the input space $\\mathcal{X}$, $Y$ is a random variable defined on the output space $\\mathcal{Y}$, and $P \\left( X, Y \\right)$ is the joint probability distribution of $X$ and $Y$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conditional independence assumption:\n",
    "\\begin{align*} \\\\& P \\left( X = x | Y = c_{k} \\right) = P \\left( X^{\\left( 1 \\right)} = x^{\\left( 1 \\right)} , \\cdots, X^{\\left( n \\right)} = x^{\\left( n \\right)} | Y = c_{k}\\right) \n",
    "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad = \\prod_{j=1}^{n} P \\left( X^{\\left( j \\right)} = x^{\\left( j \\right)} | Y = c_{k} \\right) \\end{align*} \n",
    "i.e. the features used for classification are conditionally independent given the class."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From\n",
    "\\begin{align*} \\\\& P \\left( X = x, Y = c_{k} \\right) = P \\left(X = x | Y = c_{k} \\right) P \\left( Y = c_{k} \\right)\n",
    "\\\\ & P \\left( X = x, Y = c_{k} \\right) = P \\left( Y = c_{k}| X = x \\right) P \\left( X = x \\right)\\end{align*} \n",
    "we get\n",
    "\\begin{align*} \\\\& P \\left(X = x | Y = c_{k} \\right) P \\left( Y = c_{k} \\right) = P \\left( Y = c_{k}| X = x \\right) P \\left( X = x \\right)\n",
    "\\\\ & P \\left( Y = c_{k}| X = x \\right) = \\dfrac{P \\left(X = x | Y = c_{k} \\right) P \\left( Y = c_{k} \\right)}{P \\left( X = x \\right)} \n",
    "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad = \\dfrac{P \\left(X = x | Y = c_{k} \\right) P \\left( Y = c_{k} \\right)}{\\sum_{k} P \\left( X = x, Y = c_{k} \\right)}\n",
    "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad = \\dfrac{P \\left(X = x | Y = c_{k} \\right) P \\left( Y = c_{k} \\right)}{\\sum_{k} P \\left(X = x | Y = c_{k} \\right) P \\left( Y = c_{k} \\right)}\n",
    "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad = \\dfrac{ P \\left( Y = c_{k} \\right)\\prod_{j=1}^{n} P \\left( X^{\\left( j \\right)} = x^{\\left( j \\right)} | Y = c_{k} \\right)}{\\sum_{k} P \\left( Y = c_{k} \\right)\\prod_{j=1}^{n} P \\left( X^{\\left( j \\right)} = x^{\\left( j \\right)} | Y = c_{k} \\right)}\\end{align*} \n",
    "Since the denominator is the same for every class, the naive Bayes classifier can be written as\n",
    "\\begin{align*} \\\\& y = f \\left( x \\right) = \\arg \\max_{c_{k}} \\dfrac{ P \\left( Y = c_{k} \\right)\\prod_{j=1}^{n} P \\left( X^{\\left( j \\right)} = x^{\\left( j \\right)} | Y = c_{k} \\right)}{\\sum_{k} P \\left( Y = c_{k} \\right)\\prod_{j=1}^{n} P \\left( X^{\\left( j \\right)} = x^{\\left( j \\right)} | Y = c_{k} \\right)}\n",
    "\\\\ & \\quad\\quad\\quad = \\arg \\max_{c_{k}} P \\left( Y = c_{k} \\right)\\prod_{j=1}^{n} P \\left( X^{\\left( j \\right)} = x^{\\left( j \\right)} | Y = c_{k} \\right)\\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Maximum likelihood estimates of the naive Bayes model parameters: \n",
    "1. Maximum likelihood estimate of the prior $P \\left( Y = c_{k} \\right)$ \n",
    "\\begin{align*} \\\\& P \\left( Y = c_{k} \\right) = \\dfrac{\\sum_{i=1}^{N} I \\left( y_{i} = c_{k} \\right)}{N} \\quad k = 1, 2, \\cdots, K\\end{align*} \n",
    "2. Let the set of possible values of the $j$-th feature $x^{\\left( j \\right)}$ be $\\left\\{ a_{j1}, a_{j2}, \\cdots, a_{j S_{j}} \\right\\}$; the maximum likelihood estimate of the conditional probability $P \\left( X^{\\left( j \\right)} = a_{jl} | Y = c_{k} \\right)$ is\n",
    "\\begin{align*} \\\\& P \\left( X^{\\left( j \\right)} = a_{jl} | Y = c_{k} \\right) = \\dfrac{\\sum_{i=1}^{N} I \\left(x_{i}^{\\left( j \\right)}=a_{jl}, y_{i} = c_{k} \\right)}{\\sum_{i=1}^{N} I \\left( y_{i} = c_{k} \\right)}\n",
    "\\\\ & j = 1, 2, \\cdots, n;\\quad l = 1, 2, \\cdots, S_{j};\\quad k = 1, 2, \\cdots, K\\end{align*} \n",
    "where $x_{i}^{\\left( j \\right)}$ is the $j$-th feature of the $i$-th sample, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, and $I$ is the indicator function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Naive Bayes algorithm: \n",
    "Input: training set $T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\}$, where $x_{i}= \\left( x_{i}^{\\left(1\\right)},x_{i}^{\\left(2\\right)},\\cdots, x_{i}^{\\left(n\\right)} \\right)^{T}$, $x_{i}^{\\left( j \\right)}$ is the $j$-th feature of the $i$-th sample, $x_{i}^{\\left( j \\right)} \\in \\left\\{ a_{j1}, a_{j2}, \\cdots, a_{j S_{j}} \\right\\}$, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, $j = 1, 2, \\cdots, n; \\quad l = 1, 2, \\cdots, S_{j}$, and $y_{i} \\in \\left\\{ c_{1}, c_{2}, \\cdots, c_{K} \\right\\}$; an instance $x$; \n",
    "Output: the class of instance $x$\n",
    "1. Compute the prior and conditional probabilities\n",
    "\\begin{align*} \\\\ & P \\left( Y = c_{k} \\right) = \\dfrac{\\sum_{i=1}^{N} I \\left( y_{i} = c_{k} \\right)}{N} \\quad k = 1, 2, \\cdots, K\n",
    "\\\\ & P \\left( X^{\\left( j \\right)} = a_{jl} | Y = c_{k} \\right) = \\dfrac{\\sum_{i=1}^{N} I \\left(x_{i}^{\\left( j \\right)}=a_{jl}, y_{i} = c_{k} \\right)}{\\sum_{i=1}^{N} I \\left( y_{i} = c_{k} \\right)}\n",
    "\\\\ & j = 1, 2, \\cdots, n;\\quad l = 1, 2, \\cdots, S_{j};\\quad k = 1, 2, \\cdots, K\\end{align*} \n",
    "2. For the given instance $x=\\left( x^{\\left( 1 \\right)}, x^{\\left( 2 \\right)}, \\cdots, x^{\\left( n \\right)}\\right)^{T}$, compute\n",
    "\\begin{align*} \\\\ & P \\left( Y = c_{k} \\right)\\prod_{j=1}^{n} P \\left( X^{\\left( j \\right)} = x^{\\left( j \\right)} | Y = c_{k} \\right) \\quad k=1,2,\\cdots,K\\end{align*} \n",
    "3. Determine the class of instance $x$ \n",
    "\\begin{align*} \\\\& y = f \\left( x \\right) = \\arg \\max_{c_{k}} P \\left( Y = c_{k} \\right)\\prod_{j=1}^{n} P \\left( X^{\\left( j \\right)} = x^{\\left( j \\right)} | Y = c_{k} \\right) \\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Bayesian estimates of the naive Bayes model parameters \n",
    "1. Bayesian estimate of the conditional probabilities\n",
    "\\begin{align*} \\\\& P_{\\lambda} \\left( X^{\\left( j \\right)} = a_{jl} | Y = c_{k} \\right) = \\dfrac{\\sum_{i=1}^{N} I \\left(x_{i}^{\\left( j \\right)}=a_{jl}, y_{i} = c_{k} \\right) + \\lambda}{\\sum_{i=1}^{N} I \\left( y_{i} = c_{k} \\right) + S_{j} \\lambda} \\end{align*} \n",
    "where $\\lambda \\geq 0$. When $\\lambda = 0$ this is the maximum likelihood estimate; when $\\lambda = 1$ it is called Laplace smoothing. \n",
    "2. Bayesian estimate of the prior\n",
    "\\begin{align*} \\\\& P \\left( Y = c_{k} \\right) = \\dfrac{\\sum_{i=1}^{N} I \\left( y_{i} = c_{k} \\right) + \\lambda}{N + K \\lambda}\\end{align*} "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
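
A sketch of both estimation and prediction with Laplace smoothing ($\lambda = 1$); the 15-point, two-feature dataset is illustrative, with the second feature coded 0/1/2:

```python
import numpy as np
from collections import defaultdict

def train_nb(X, y, lam=1.0):
    """Bayesian-smoothed estimates of priors and conditionals (lam=1 is Laplace)."""
    N, n = X.shape
    classes, counts = np.unique(y, return_counts=True)
    prior = {c: (cnt + lam) / (N + len(classes) * lam) for c, cnt in zip(classes, counts)}
    cond = defaultdict(dict)
    for j in range(n):
        values = np.unique(X[:, j])          # the S_j possible values of feature j
        for c in classes:
            rows = X[y == c, j]
            for v in values:
                cond[(j, c)][v] = (np.sum(rows == v) + lam) / (len(rows) + len(values) * lam)
    return prior, cond

def predict_nb(prior, cond, x):
    # argmax_c P(Y=c) * prod_j P(X^(j) = x^(j) | Y=c)
    scores = {c: prior[c] * np.prod([cond[(j, c)][x[j]] for j in range(len(x))])
              for c in prior}
    return max(scores, key=scores.get)

X = np.array([[1,0],[1,1],[1,1],[1,0],[1,0],[2,0],[2,1],[2,1],[2,2],[2,2],
              [3,2],[3,1],[3,1],[3,2],[3,2]])
y = np.array([-1,-1,1,1,-1,-1,-1,1,1,1,1,1,1,1,-1])
prior, cond = train_nb(X, y)
print(predict_nb(prior, cond, np.array([2, 0])))  # -> -1
```
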
--------------------------------------------------------------------------------
/9.2 gmm.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Gaussian mixture model\\begin{align*} \\\\& P \\left( y | \\theta \\right) = \\sum_{k=1}^{K} \\alpha_{k} \\phi \\left( y | \\theta_{k} \\right) \\end{align*} \n",
    "where the $\\alpha_{k}$ are coefficients, $\\alpha_{k} \\geq 0 $, $\\sum_{k=1}^{K} \\alpha_{k} = 1$; $\\phi \\left( y | \\theta_{k} \\right)$ is the Gaussian density with $\\theta_{k} = \\left( \\mu_{k} , \\sigma_{k}^{2} \\right)$,\\begin{align*} \\\\& \\phi \\left( y | \\theta_{k} \\right) = \\dfrac{1}{\\sqrt{2 \\pi} \\sigma_{k}} \\exp \\left( - \\dfrac{\\left( y - \\mu_{k} \\right)^2}{2 \\sigma_{k}^{2}} \\right)\\end{align*} called the $k$-th component."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assume the observations $\\left( y_{1}, y_{2}, \\cdots, y_{N} \\right)$ are generated by the Gaussian mixture model\\begin{align*} \\\\& P \\left( y | \\theta \\right) = \\sum_{k=1}^{K} \\alpha_{k} \\phi \\left( y | \\theta_{k} \\right) \\end{align*} \n",
    "where $\\theta = \\left( \\alpha_{1}, \\alpha_{2}, \\cdots, \\alpha_{K}; \\theta_{1}, \\theta_{2}, \\cdots, \\theta_{K}\\right)$ are the model parameters."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The latent variable $\\gamma_{jk}$ is a 0-1 variable indicating that observation $y_{j}$ comes from the $k$-th component\\begin{align*} \\\\& \\gamma_{jk} = \\begin{cases} 1, & \\text{the } j\\text{-th observation comes from the } k\\text{-th component}\\\\ 0, & \\text{otherwise}\\end{cases} \\quad \\quad \\quad \\quad \\quad \\left( j = 1, 2, \\cdots, N; k = 1, 2, \\cdots, K \\right)\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Complete data\\begin{align*} \\\\& \\left( y_{j}, \\gamma_{j1}, \\gamma_{j2}, \\cdots, \\gamma_{jK}\\right) \\quad j = 1,2, \\cdots, N\\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Complete-data likelihood\\begin{align*} \\\\& P \\left( y, \\gamma | \\theta \\right) = \\prod_{j=1}^{N} P \\left( y_{j}, \\gamma_{j1}, \\gamma_{j2}, \\cdots, \\gamma_{jK} | \\theta \\right) \\\\ & = \\prod_{k=1}^{K} \\prod_{j=1}^{N} \\left[ \\alpha_{k} \\phi \\left( y_{j} | \\theta_{k} \\right)\\right]^{\\gamma_{jk}} \\\\ & = \\prod_{k=1}^{K} \\alpha_{k}^{n_{k}}\\prod_{j=1}^{N} \\left[ \\phi \\left( y_{j} | \\theta_{k} \\right)\\right]^{\\gamma_{jk}} \\\\& = \\prod_{k=1}^{K} \\alpha_{k}^{n_{k}}\\prod_{j=1}^{N} \\left[ \\dfrac{1}{\\sqrt{2 \\pi} \\sigma_{k}} \\exp \\left( - \\dfrac{\\left( y_{j} - \\mu_{k} \\right)^2}{2 \\sigma_{k}^{2}} \\right) \\right]^{\\gamma_{jk}} \\end{align*} \n",
    "where $n_{k} = \\sum_{j=1}^{N} \\gamma_{jk}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Complete-data log-likelihood\\begin{align*} \\\\& \\log P \\left( y, \\gamma | \\theta \\right) \n",
    "= \\sum_{k=1}^{K} \\left\\{ \\sum_{j=1}^{N} \\gamma_{jk} \\log \\alpha_{k} + \\sum_{j=1}^{N} \\gamma_{jk}\\left[ \\log \\left( \\dfrac{1}{ \\sqrt{2 \\pi} } \\right) - \\log \\sigma_{k} - \\dfrac{1}{ 2 \\sigma_{k}^{2} } \\left( y_{j} - \\mu_{k} \\right)^{2} \\right]\\right\\} \\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The $Q\\left( \\theta, \\theta^{\\left( i \\right)} \\right)$ function \\begin{align*} \\\\& Q \\left( \\theta , \\theta^{\\left( i \\right)} \\right) \n",
    "= E \\left[ \\log P \\left( y, \\gamma | \\theta \\right) | y, \\theta^{ \\left( i \\right) }\\right] \n",
    "\\\\ & = E \\left\\{ \\sum_{k=1}^{K} \\left\\{ \\sum_{j=1}^{N} \\gamma_{jk} \\log \\alpha_{k} + \\sum_{j=1}^{N} \\gamma_{jk}\\left[ \\log \\left( \\dfrac{1}{ \\sqrt{2 \\pi} } \\right) - \\log \\sigma_{k} - \\dfrac{1}{ 2 \\sigma_{k}^{2} } \\left( y_{j} - \\mu_{k} \\right)^{2} \\right]\\right\\}\\right\\} \n",
    "\\\\ & = \\sum_{k=1}^{K} \\left\\{ \\sum_{j=1}^{N} E \\left( \\gamma_{jk} \\right) \\log \\alpha_{k} + \\sum_{j=1}^{N} E \\left( \\gamma_{jk} \\right)\\left[ \\log \\left( \\dfrac{1}{ \\sqrt{2 \\pi} } \\right) - \\log \\sigma_{k} - \\dfrac{1}{ 2 \\sigma_{k}^{2} } \\left( y_{j} - \\mu_{k} \\right)^{2} \\right]\\right\\} \n",
    "\\\\ & =\\sum_{k=1}^{K} \\left\\{ \\sum_{j=1}^{N} \\hat{\\gamma}_{jk} \\log \\alpha_{k} + \\sum_{j=1}^{N} \\hat{\\gamma}_{jk}\\left[ \\log \\left( \\dfrac{1}{ \\sqrt{2 \\pi} } \\right) - \\log \\sigma_{k} - \\dfrac{1}{ 2 \\sigma_{k}^{2} } \\left( y_{j} - \\mu_{k} \\right)^{2} \\right]\\right\\} \\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "where the responsiveness $\\hat{\\gamma}_{jk}$ of component $k$ to observation $y_{j}$ is the probability, under the current model parameters, that the $j$-th observation comes from the $k$-th component.\n",
    "\\begin{align*} \\\\& \\hat{\\gamma}_{jk} = E \\left( \\gamma_{jk} | y, \\theta \\right) = P \\left( \\gamma_{jk} = 1 | y, \\theta \\right) \n",
    "\\\\ & = \\dfrac{P \\left( \\gamma_{jk} = 1, y_{j} | \\theta \\right)}{ \\sum_{k=1}^{K} P \\left( \\gamma_{jk} = 1, y_{j} | \\theta \\right)}\n",
    "\\\\ & = \\dfrac{\\alpha_{k} \\phi \\left( y_{j} | \\theta_{k} \\right) }{\\sum_{k=1}^{K} \\alpha_{k} \\phi \\left( y_{j} | \\theta_{k} \\right) } \\quad \\quad \\quad \\left( j = 1, 2, \\cdots, N; k = 1, 2, \\cdots, K \\right) \\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Maximizing the $Q\\left( \\theta, \\theta^{\\left( i \\right)} \\right)$ function over $\\theta$,\n",
    "\\begin{align*} \\theta^{\\left( i+1 \\right)} = \\arg \\max_{\\theta} Q\\left(\\theta, \\theta^{\\left( i \\right)} \\right) \\end{align*} \n",
    "gives \\begin{align*} \\\\ & \\hat{\\mu}_{k} = \\dfrac{\\sum_{j=1}^{N} \\hat{\\gamma}_{jk} y_{j}}{\\sum_{j=1}^{N} \\hat{\\gamma}_{jk}}, \\quad k = 1, 2, \\cdots, K \n",
    "\\\\ & \\hat{\\sigma}_{k}^2 = \\dfrac{\\sum_{j=1}^{N} \\hat{\\gamma}_{jk} \\left( y_{j} - \\hat{\\mu}_{k}\\right)^2}{\\sum_{j=1}^{N} \\hat{\\gamma}_{jk}}, \\quad k = 1, 2, \\cdots, K\n",
    "\\\\ & \\hat{\\alpha}_{k} = \\dfrac{\\sum_{j=1}^{N} \\hat{\\gamma}_{jk} }{N}, \\quad k = 1, 2, \\cdots, K\\end{align*} "
   ]
  },
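
A vectorized sketch of these E and M steps for a one-dimensional mixture with $K=2$, assuming synthetic data drawn from two Gaussians:

```python
import numpy as np

def em_gmm(y, K=2, n_iter=100, seed=0):
    """EM for a 1-D Gaussian mixture, following the E and M steps above."""
    rng = np.random.default_rng(seed)
    N = len(y)
    alpha = np.full(K, 1.0 / K)
    mu = rng.choice(y, K, replace=False)
    sigma2 = np.full(K, np.var(y))
    for _ in range(n_iter):
        # E step: responsiveness g[j, k] of component k for observation y_j
        dens = (np.exp(-(y[:, None] - mu) ** 2 / (2 * sigma2))
                / np.sqrt(2 * np.pi * sigma2))
        g = alpha * dens
        g /= g.sum(axis=1, keepdims=True)
        # M step: update mu_k, sigma_k^2 and alpha_k
        nk = g.sum(axis=0)
        mu = (g * y[:, None]).sum(axis=0) / nk
        sigma2 = (g * (y[:, None] - mu) ** 2).sum(axis=0) / nk
        alpha = nk / N
    return alpha, mu, sigma2

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 700)])
print(em_gmm(y))   # alpha ~ (0.3, 0.7), mu ~ (0, 5), up to component order
```
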
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "EM algorithm for estimating the parameters of a Gaussian mixture model: \n",
    "Input: observations $y_{1}, y_{2}, \\cdots, y_{N}$; the Gaussian mixture model; \n",
    "Output: the Gaussian mixture model parameters\n",
    "1. Choose initial values for the parameters and start the iteration \n",
    "2. $E$ step: compute the responsiveness of component $k$ to observation $y_{j}$\n",
    "\\begin{align*} \\\\& \\hat{\\gamma}_{jk} = \\dfrac{\\alpha_{k} \\phi \\left( y_{j} | \\theta_{k} \\right) }{\\sum_{k=1}^{K} \\alpha_{k} \\phi \\left( y_{j} | \\theta_{k} \\right) } \\quad \\quad \\quad j = 1, 2, \\cdots, N; k = 1, 2, \\cdots, K \n",
    " \\end{align*} \n",
    "3. $M$ step: compute the model parameters for the next iteration\n",
    "\\begin{align*} \\\\ & \\hat{\\mu}_{k} = \\dfrac{\\sum_{j=1}^{N} \\hat{\\gamma}_{jk} y_{j}}{\\sum_{j=1}^{N} \\hat{\\gamma}_{jk}}, \\quad k = 1, 2, \\cdots, K \n",
    "\\\\ & \\hat{\\sigma}_{k}^2 = \\dfrac{\\sum_{j=1}^{N} \\hat{\\gamma}_{jk} \\left( y_{j} - \\hat{\\mu}_{k}\\right)^2}{\\sum_{j=1}^{N} \\hat{\\gamma}_{jk}}, \\quad k = 1, 2, \\cdots, K\n",
    "\\\\ & \\hat{\\alpha}_{k} = \\dfrac{\\sum_{j=1}^{N} \\hat{\\gamma}_{jk} }{N}, \\quad k = 1, 2, \\cdots, K\\end{align*} \n",
    "4. Repeat steps 2 and 3 until convergence."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
--------------------------------------------------------------------------------
/C lagrange.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Assume $f \\left( x \\right), c_{i} \\left( x \\right), h_{j} \\left( x \\right)$ are continuously differentiable functions defined on $R^{n}$. \n",
    "The constrained optimization problem\n",
    "\\begin{align*} \\\\& \\min_{x \\in R^{n}} \\quad f \\left( x \\right) \n",
    "\\\\ & s.t. \\quad c_{i} \\left( x \\right) \\leq 0, \\quad i = 1,2, \\cdots, k\n",
    "\\\\ & \\quad \\quad h_{j} \\left( x \\right) = 0, \\quad j=1,2, \\cdots, l\\end{align*} \n",
    "is called the primal optimization problem, or primal problem."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Introduce the generalized Lagrangian\n",
    "\\begin{align*} \\\\& L \\left( x, \\alpha, \\beta \\right) = f \\left( x \\right) + \\sum_{i=1}^{k} \\alpha_{i} c_{i} \\left( x \\right) + \\sum_{j=1}^{l} \\beta_{j} h_{j}\\left( x \\right) \\end{align*} \n",
    "where $x=\\left(x^{\\left( 1 \\right)}, x^{\\left( 2 \\right)}, \\cdots, x^{\\left( n \\right) } \\right)^{T} \\in R^{n}$, and $\\alpha_{i}, \\beta_{j}$ are Lagrange multipliers with $\\alpha_{i} \\geq 0$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define the following function of $x$:\n",
    "\\begin{align*} \\\\& \\theta_{P} \\left( x \\right) = \\max_{\\alpha, \\beta; \\alpha_{i} \\geq 0} L \\left( x, \\alpha, \\beta \\right) \\end{align*} \n",
    "Suppose some $x$ violates a constraint of the primal problem, i.e. there exists $i$ with $c_{i} \\left( x \\right) > 0$ or $j$ with $h_{j} \\left( x \\right) \\neq 0$. If $c_{i} \\left( x \\right) > 0$, we can let $\\alpha_{i} \\to +\\infty$, making $\\theta_{P} \\left( x \\right)=+\\infty$; if $h_{j} \\left( x \\right) \\neq 0$, we can choose $\\beta_{j}$ so that $ \\beta_{j} h_{j} \\left( x \\right) \\to +\\infty$, making $\\theta_{P} \\left( x \\right)=+\\infty$. Take all remaining $\\alpha_{i}, \\beta_{j}$ equal to 0. \n",
    "That is,\n",
    "\\begin{align*} \\\\& \\theta_{P} \\left( x \\right) = \\max_{\\alpha, \\beta; \\alpha_{i} \\geq 0} \\left[ f \\left( x \\right) + \\sum_{i=1}^{k} \\alpha_{i} c_{i} \\left( x \\right) + \\sum_{j=1}^{l} \\beta_{j} h_{j}\\left( x \\right)\\right] = +\\infty \\end{align*} \n",
    "Suppose instead that $x$ satisfies the constraints of the primal problem, i.e. $c_{i} \\left( x \\right) \\leq 0$ and $h_{j} \\left( x \\right) = 0$; \n",
    "then\n",
    "\\begin{align*} \\\\& \\theta_{P} \\left( x \\right) =\\max_{\\alpha, \\beta; \\alpha_{i} \\geq 0} \\left[ f \\left( x \\right) + \\sum_{i=1}^{k} \\alpha_{i} c_{i} \\left( x \\right) + \\sum_{j=1}^{l} \\beta_{j} h_{j}\\left( x \\right)\\right]= f \\left( x \\right) \\end{align*} \n",
    "Combining the two cases,\n",
    "\\begin{align*} \\theta_{P} \\left( x \\right) = \\left\\{\n",
    "\\begin{aligned} \n",
    "\\ & f \\left( x \\right), & x \\text{ satisfies the primal constraints}\n",
    "\\\\ & +\\infty, & \\text{otherwise}\n",
    "\\end{aligned}\n",
    "\\right.\\end{align*} \n",
    "Hence the minimization problem\n",
    "\\begin{align*} \\\\& \\min_{x} \\theta_{P} \\left( x \\right) = \\min_{x} \\max_{\\alpha, \\beta; \\alpha_{i} \\geq 0} L \\left( x, \\alpha, \\beta \\right)\\end{align*} \n",
    "is equivalent to the primal optimization problem, i.e. the two have the same solutions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \\\\& \\min_{x} \\max_{\\alpha, \\beta; \\alpha_{i} \\geq 0} L \\left( x, \\alpha, \\beta \\right)\\end{align*} is called the minimax problem of the generalized Lagrangian."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define the optimal value of the primal problem\n",
    "\\begin{align*} \\\\& p^{*} = \\min_{x} \\theta_{P} \\left( x \\right) \\end{align*} \n",
    "called the value of the primal problem."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define the following function of $\\alpha, \\beta$:\n",
    "\\begin{align*} \\\\& \\theta_{D} \\left( \\alpha, \\beta \\right) = \\min_{x} L \\left( x, \\alpha, \\beta \\right)\\end{align*} \n",
    "The maximization problem\n",
    "\\begin{align*} \\\\& \\max_{\\alpha,\\beta;\\alpha_{i} \\geq 0} \\theta_{D} \\left( \\alpha, \\beta \\right) = \\max_{\\alpha,\\beta;\\alpha_{i} \\geq 0} \\min_{x} L \\left( x, \\alpha, \\beta \\right) \\end{align*} \n",
    "is called the maximin problem of the generalized Lagrangian."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Writing the maximin problem of the generalized Lagrangian as a constrained optimization problem,\n",
    "\\begin{align*} \\\\& \\max_{\\alpha,\\beta;\\alpha_{i} \\geq 0} \\theta_{D} \\left( \\alpha, \\beta \\right) = \\max_{\\alpha,\\beta;\\alpha_{i} \\geq 0} \\min_{x} L \\left( x, \\alpha, \\beta \\right) \n",
    "\\\\ & \\quad s.t. \\quad \\alpha_{i} \\geq 0, \\quad i =1,2, \\cdots, k \\end{align*} \n",
    "is called the dual problem of the primal problem."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define the optimal value of the dual problem\n",
    "\\begin{align*} \\\\& d^{*} = \\max_{\\alpha, \\beta;\\alpha_{i} \\geq 0} \\theta_{D} \\left( \\alpha, \\beta \\right) \\end{align*} \n",
    "called the value of the dual problem."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "If both the primal problem and the dual problem have optimal solutions, then\n",
    "\\begin{align*} \\\\& d^{*} = \\max_{\\alpha,\\beta;\\alpha_{i} \\geq 0} \\min_{x} L \\left( x, \\alpha, \\beta \\right) \\leq \\min_{x} \\max_{\\alpha, \\beta; \\alpha_{i} \\geq 0} L \\left( x, \\alpha, \\beta \\right) = p^{*}\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the primal problem and its dual, assume that $f \\left( x \\right)$ and $c_{i} \\left( x \\right)$ are convex functions, that $h_{j} \\left( x \\right)$ are affine functions, and that the inequality constraints $c_{i} \\left( x \\right)$ are strictly feasible, i.e. there exists $x$ with $c_{i} \\left( x \\right) < 0$ for all $i$. Then there exist $x^{*}, \\alpha^{*}, \\beta^{*}$ such that $x^{*}$ is a solution of the primal problem, $\\alpha^{*}, \\beta^{*}$ is a solution of the dual problem, and\n",
    "\\begin{align*} \\\\& p^{*}=d^{*} = L \\left( x^{*}, \\alpha^{*}, \\beta^{*} \\right) \\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the primal problem and its dual, assume that $f \\left( x \\right)$ and $c_{i} \\left( x \\right)$ are convex functions, that $h_{j} \\left( x \\right)$ are affine functions, and that the inequality constraints $c_{i} \\left( x \\right)$ are strictly feasible, i.e. there exists $x$ with $c_{i} \\left( x \\right) < 0$ for all $i$. Then $x^{*}$ and $\\alpha^{*}, \\beta^{*}$ are solutions of the primal and dual problems, respectively, if and only if $x^{*}, \\alpha^{*}, \\beta^{*} $ satisfy the following Karush-Kuhn-Tucker (KKT) conditions:\n",
    "\\begin{align*} \\\\& \\nabla _{x} L \\left( x^{*}, \\alpha^{*}, \\beta^{*} \\right) = 0 \n",
    "\\\\ & \\nabla _{\\alpha} L \\left( x^{*}, \\alpha^{*}, \\beta^{*} \\right) = 0 \n",
    "\\\\ & \\nabla _{\\beta} L \\left( x^{*}, \\alpha^{*}, \\beta^{*} \\right) = 0 \n",
    "\\\\ & \\alpha_{i}^{*} c_{i} \\left( x^{*} \\right) = 0,\\quad i= 1, 2, \\cdots, k \n",
    "\\\\ & c_{i} \\left( x^{*} \\right) \\leq 0, \\quad i=1,2, \\cdots, k\n",
    "\\\\ & \\alpha_{i}^{*} \\geq 0, \\quad i=1,2, \\cdots, k\n",
    "\\\\ & h_{j} \\left( x^{*} \\right) = 0, \\quad j=1,2, \\cdots, l\\end{align*}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
--------------------------------------------------------------------------------
/2 perceptron.ipynb:
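
A numeric check of these conditions on a small problem whose solution is known analytically: min $x_1^2 + x_2^2$ subject to $c(x) = 1 - x_1 - x_2 \leq 0$, with $x^{*} = (1/2, 1/2)$ and $\alpha^{*} = 1$ (the problem and tolerances are illustrative):

```python
import numpy as np

# Primal: min x1^2 + x2^2  s.t.  c(x) = 1 - x1 - x2 <= 0.
# Analytic solution: x* = (0.5, 0.5) with multiplier alpha* = 1.
f_grad = lambda x: 2 * x
c = lambda x: 1.0 - x[0] - x[1]
c_grad = np.array([-1.0, -1.0])

def kkt_holds(x, alpha, tol=1e-9):
    stationary = np.allclose(f_grad(x) + alpha * c_grad, 0, atol=tol)  # grad_x L = 0
    primal_ok = c(x) <= tol                 # c(x*) <= 0
    dual_ok = alpha >= -tol                 # alpha* >= 0
    slack_ok = abs(alpha * c(x)) <= tol     # complementary slackness alpha* c(x*) = 0
    return stationary and primal_ok and dual_ok and slack_ok

print(kkt_holds(np.array([0.5, 0.5]), 1.0))   # True
print(kkt_holds(np.array([1.0, 1.0]), 0.0))   # False: feasible but not stationary
```
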
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Assume the input space $\\mathcal{X} \\subseteq R^{n}$ and the output space $\\mathcal{Y} = \\left\\{+1, -1 \\right\\}$. The input $x \\in \\mathcal{X}$ is the feature vector of an instance, a point of the input space; the output $y \\in \\mathcal{Y}$ is the class of the instance. The function from the input space to the output space\n",
    "\\begin{align*} \\\\& f \\left( x \\right) = sign \\left( w \\cdot x + b \\right) \\end{align*} \n",
    "is called the perceptron, where $w$ and $b$ are the perceptron model parameters: $w \\in R^{n}$ is called the weight vector, $b \\in R$ the bias, and $w \\cdot x$ denotes the inner product of $w$ and $x$. $sign$ is the sign function, \n",
    "\\begin{align*} sign \\left( x \\right) = \\left\\{\n",
    "\\begin{aligned} \n",
    "\\ & +1, x \\geq 0\n",
    "\\\\ & -1, x<0\n",
    "\\end{aligned}\n",
    "\\right.\\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The perceptron is a linear classification model, a discriminative model. Its hypothesis space is the set of all linear classification models (linear classifiers) defined on the feature space, i.e. the function set $\\left\\{ f | f \\left( x \\right) = w \\cdot x + b \\right\\}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The linear equation\n",
    "\\begin{align*} \\\\& w \\cdot x + b = 0 \\end{align*} \n",
    "corresponds to a hyperplane $S$ in the feature space $R^{n}$, where $w$ is the normal vector and $b$ the intercept of the hyperplane. $S$ divides the feature space into two parts, whose points are classified as positive and negative; $S$ is called the separating hyperplane."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given a dataset\n",
    "\\begin{align*} \\\\& T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\} \\end{align*} \n",
    "where $x_{i} \\in \\mathcal{X} = R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ +1, -1 \\right\\}, i = 1, 2, \\cdots, N$, if there exists a hyperplane $S$\n",
    "\\begin{align*} \\\\& w \\cdot x + b = 0 \\end{align*} \n",
    "that separates the positive and negative instances of the dataset entirely onto its two sides — i.e. $w \\cdot x_{i} + b > 0$ for all instances with $y_{i}=+1$ and $w \\cdot x_{i} + b < 0$ for all instances with $y_{i}=-1$ — the dataset $T$ is called linearly separable; otherwise $T$ is linearly inseparable."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Distance from any point $x_{0}$ of the input space $R^{n}$ to the hyperplane $S$:\n",
    "\\begin{align*} \\\\& \\dfrac{1}{\\| w \\|} \\left| w \\cdot x_{0} + b \\right| \\end{align*} \n",
    "where $\\| w \\|$ is the $L_{2}$ norm of $w$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For misclassified data $\\left( x_{i}, y_{i} \\right)$: $y_{i}=-1$ when $w \\cdot x_{i} + b > 0$ and $y_{i}=+1$ when $w \\cdot x_{i} + b < 0$, so\n",
    "\\begin{align*} \\\\& -y_{i} \\left( w \\cdot x_{i} + b \\right) > 0 \\end{align*} \n",
    "and the distance from a misclassified point $x_{i}$ to the separating hyperplane is\n",
    "\\begin{align*} \\\\& -\\dfrac{1}{\\| w \\|} y_{i}\\left( w \\cdot x_{i} + b \\right) \\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With $M$ the set of points misclassified by the hyperplane $S$, the total distance from all misclassified points to $S$ is:\n",
    "\\begin{align*} \\\\& -\\dfrac{1}{\\| w \\|} \\sum_{x_{i} \\in M} y_{i} \\left( w \\cdot x_{i} + b \\right) \\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given a training set\n",
    "\\begin{align*} \\\\& T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\} \\end{align*} \n",
    "where $x_{i} \\in \\mathcal{X} = R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ +1, -1 \\right\\}, i = 1, 2, \\cdots, N$, the loss function of the perceptron $sign \\left( w \\cdot x + b \\right)$ is defined as\n",
    "\\begin{align*} \\\\& L \\left( w, b \\right) = -\\sum_{x_{i} \\in M} y_{i} \\left( w \\cdot x_{i} + b \\right) \\end{align*} \n",
"其中,$M$为误分类点的集合。" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": {}, 88 | "source": [ 89 | "给定训练数据集\n", 90 | "\\begin{align*} \\\\& T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\} \\end{align*} \n", 91 | "其中,$x_{i} \\in \\mathcal{X} = R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ +1, -1 \\right\\}, i = 1, 2, \\cdots, N$。求参数$w$和$b$,使其为以下损失函数极小化问题的解\n", 92 | "\\begin{align*} \\\\& \\min_{w,b} L \\left( w, b \\right) = -\\sum_{x_{i} \\in M} y_{i} \\left( w \\cdot x_{i} + b \\right) \\end{align*} \n", 93 | "其中,$M$为误分类点的集合。" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "假设误分类点集合$M$是固定的,则损失函数$L \\left( w, b \\right)$的梯度\n", 101 | "\\begin{align*} \\\\& \\nabla _{w} L \\left( w, b \\right) = -\\sum_{x_{i} \\in M} y_{i} x_{i} \n", 102 | "\\\\ & \\nabla _{b} L \\left( w, b \\right) = -\\sum_{x_{i} \\in M} y_{i} \\end{align*} " 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "随机选取一个误分类点$\\left( x_{i}, y_{i} \\right)$,对$w, b$进行更新:\n", 110 | "\\begin{align*} \\\\& w \\leftarrow w + \\eta y_{i} x_{i} \n", 111 | "\\\\ & b \\leftarrow b + \\eta y_{i} \\end{align*} \n", 112 | "其中,$\\eta \\left( 0 < \\eta \\leq 1 \\right)$是步长,称为学习率。" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "感知机算法(原始形式): \n", 120 | "输入:训练数据集$T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\}$,其中$x_{i} \\in \\mathcal{X} = R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ +1, -1 \\right\\}, i = 1, 2, \\cdots, N $;学习率$\\eta \\left( 0 < \\eta \\leq 1 \\right)$。 \n", 121 | "输出:$w,b$;感知机模型$f \\left( x \\right) = sign \\left( w \\cdot x + b \\right)$\n", 122 | "1. 选取初值$w_{0},b_{0}$ \n", 123 | "2. 在训练集中选取数据$\\left( x_{i}, y_{i} \\right)$ \n", 124 | "3. 如果$y_{i} \\left( w \\cdot x_{i} + b \\right) \\leq 0$ \n", 125 | "\\begin{align*} \\\\& w \\leftarrow w + \\eta y_{i} x_{i} \n", 126 | "\\\\ & b \\leftarrow b + \\eta y_{i} \\end{align*} \n", 127 | "4. 转至2,直至训练集中没有误分类点。" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "设$w,b$修改n次,则$w,b$关于$\\left( x_{i}, y_{i} \\right)$的增量分别是$\\alpha_{i} y_{i} x_{i}$和$\\alpha_{i} y_{i}$,其中$\\alpha_{i} = n_{i} \\eta$。$w,b$可表示为\n", 135 | "\\begin{align*} \\\\& w = \\sum_{i=1}^{N} \\alpha_{i} y_{i} x_{i} \n", 136 | "\\\\ & b = \\sum_{i=1}^{N} \\alpha_{i} y_{i} \\end{align*} \n", 137 | "其中,$\\alpha_{i} \\geq 0, i=1,2, \\cdots, N$" 138 | ] 139 | }, 140 | { 141 | "cell_type": "markdown", 142 | "metadata": {}, 143 | "source": [ 144 | "感知机算法(对偶形式): \n", 145 | "输入:训练数据集$T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\}$,其中$x_{i} \\in \\mathcal{X} = R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ +1, -1 \\right\\}, i = 1, 2, \\cdots, N $;学习率$\\eta \\left( 0 < \\eta \\leq 1 \\right)$。 \n", 146 | "输出:$\\alpha,b$;感知机模型$f \\left( x \\right) = sign \\left( \\sum_{j=1}^{N} \\alpha_{j} y_{j} x_{j} \\cdot x + b \\right)$ ,其中$\\alpha = \\left( \\alpha_{1}, \\alpha_{2}, \\cdots, \\alpha_{N} \\right)^{T}$\n", 147 | "1. $\\alpha \\leftarrow 0, b \\leftarrow 0$ \n", 148 | "2. 在训练集中选取数据$\\left( x_{i}, y_{i} \\right)$ \n", 149 | "3. 
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose $w,b$ are updated $n$ times in total; then the increments of $w$ and $b$ due to $\\left( x_{i}, y_{i} \\right)$ are $\\alpha_{i} y_{i} x_{i}$ and $\\alpha_{i} y_{i}$ respectively, where $\\alpha_{i} = n_{i} \\eta$ and $n_{i}$ is the number of updates caused by point $i$. Thus $w,b$ can be written as\n",
    "\\begin{align*} \\\\& w = \\sum_{i=1}^{N} \\alpha_{i} y_{i} x_{i} \n",
    "\\\\ & b = \\sum_{i=1}^{N} \\alpha_{i} y_{i} \\end{align*} \n",
    "where $\\alpha_{i} \\geq 0, i=1,2, \\cdots, N$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Perceptron algorithm (dual form): \n",
    "Input: training set $T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\}$, where $x_{i} \\in \\mathcal{X} = R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ +1, -1 \\right\\}, i = 1, 2, \\cdots, N $; learning rate $\\eta \\left( 0 < \\eta \\leq 1 \\right)$. \n",
    "Output: $\\alpha,b$; the perceptron model $f \\left( x \\right) = sign \\left( \\sum_{j=1}^{N} \\alpha_{j} y_{j} x_{j} \\cdot x + b \\right)$, where $\\alpha = \\left( \\alpha_{1}, \\alpha_{2}, \\cdots, \\alpha_{N} \\right)^{T}$\n",
    "1. $\\alpha \\leftarrow 0, b \\leftarrow 0$ \n",
    "2. Pick a data point $\\left( x_{i}, y_{i} \\right)$ from the training set \n",
    "3. If $y_{i} \\left( \\sum_{j=1}^{N} \\alpha_{j} y_{j} x_{j} \\cdot x_{i} + b \\right) \\leq 0$ \n",
    "\\begin{align*} \\\\& \\alpha_{i} \\leftarrow \\alpha_{i} + \\eta\n",
    "\\\\ & b \\leftarrow b + \\eta y_{i} \\end{align*} \n",
    "4. Go to step 2, until no point in the training set is misclassified."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the dual form the training instances appear only through inner products, so the $Gram$ matrix of pairwise inner products can be precomputed and stored:\n",
    "\\begin{align*} \\\\& G = \\left[ x_{i} \\cdot x_{j} \\right]_{N \\times N} \\end{align*} "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
"损失函数(代价函数)来度量预测错误的程度,是预测输出$f\\left(X\\right)$和实际输出$Y$的非负实值函数,记作$L \\left(Y, f \\left( X \\right) \\right)$。" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "0-1损失函数\n", 70 | "\\begin{align*} L \\left(Y, f \\left( X \\right) \\right) = \\left\\{\n", 71 | "\\begin{aligned} \n", 72 | "\\ & 1, Y \\neq f \\left( X \\right)\n", 73 | "\\\\ & 0, Y = f \\left( X \\right)\n", 74 | "\\end{aligned}\n", 75 | "\\right.\\end{align*} " 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "平方损失函数\n", 83 | "\\begin{align*} L \\left(Y, f \\left( X \\right) \\right) = \\left( Y - f \\left( X \\right) \\right)^{2} \\end{align*} " 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "绝对值损失函数\n", 91 | "\\begin{align*} L \\left(Y, f \\left( X \\right) \\right) = \\left| Y - f \\left( X \\right) \\right| \\end{align*} " 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "绝对值损失函数(对数似然损失函数)\n", 99 | "\\begin{align*} L \\left(Y, f \\left( X \\right) \\right) = - \\log P \\left( Y | X \\right) \\end{align*} " 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "风险损失(期望损失)是模型$f\\left(X\\right)$关于联合概率分布$P\\left(X,Y\\right)$的平均意义下的损失\n", 107 | "\\begin{align*} R_{exp} \\left( f \\right) = E_{P} \\left[L \\left(Y, f \\left( X \\right) \\right) \\right] = \\int_{\\mathcal{X} \\times \\mathcal{Y}} L \\left(Y, f \\left( X \\right) \\right) P \\left(x,y\\right) dxdy \\end{align*} " 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "经验风险(经验损失)是模型$f\\left(X\\right)$关于训练数据集\n", 115 | "\\begin{align*} \\\\& T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\} \\end{align*} \n", 116 | "的平均损失\n", 117 | "\\begin{align*} R_{emp} \\left( f \\right) = \\dfrac{1}{N} \\sum_{i=1}^{N} L \\left(y_{i}, f \\left( x_{i} \\right) \\right) \\end{align*} " 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "经验风险最小化\n", 125 | "\\begin{align*} \\min_{f \\in \\mathcal{F}} \\dfrac{1}{N} \\sum_{i=1}^{N} L \\left(y_{i}, f \\left( x_{i} \\right) \\right) \\end{align*}\n", 126 | "其中,$\\mathcal{F}$是假设空间。" 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "结构风险最小化\n", 134 | "\\begin{align*} \\min_{f \\in \\mathcal{F}} \\dfrac{1}{N} \\sum_{i=1}^{N} L \\left(y_{i}, f \\left( x_{i} \\right) \\right) + \\lambda J \\left(f\\right) \\end{align*}\n", 135 | "其中,$J \\left(f\\right)$是模型复杂度,是增则化项,是定义在建设空间$\\mathcal{F}$上的泛函;$\\lambda \\geq 0$是系数,用以权衡风险和模型复杂度。" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "正则化项可以是参数向量的$L_{2}$范数\n", 143 | "\\begin{align*} L_{2} = \\| w \\|\\end{align*} \n", 144 | "其中,$\\|w\\|$表示参数向量$w$的$L_{2}$范数。 \n", 145 | "正则化项可以是参数向量的$L_{1}$范数\n", 146 | "\\begin{align*} L_{1} = \\| w \\|_{1} \\end{align*} \n", 147 | "其中,$\\|w\\|_{1}$表示参数向量$w$的$L_{1}$范数。" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "训练误差是模型$Y = \\hat f \\left(X\\right)$关于训练数据集的平均损失\n", 155 | "\\begin{align*} R_{emp} \\left( \\hat f \\right) = \\dfrac{1}{N} \\sum_{i=1}^{N} L \\left(y_{i}, \\hat f \\left( x_{i} \\right) \\right) \\end{align*} \n", 156 | "其中,$N$是训练样本容量。" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 
161 | "metadata": {}, 162 | "source": [ 163 | "测试误差是模型$Y = \\hat f \\left(X\\right)$关于测试数据集的平均损失\n", 164 | "\\begin{align*} e_{test} = \\dfrac{1}{N'} \\sum_{i=1}^{N'} L \\left(y_{i}, \\hat f \\left( x_{i} \\right) \\right) \\end{align*} \n", 165 | "其中,$N'$是测试样本容量。" 166 | ] 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "metadata": {}, 171 | "source": [ 172 | "当损失函数是0-1损失,测试误差即测试集上的误差率\n", 173 | "\\begin{align*} e_{test} = \\dfrac{1}{N‘} \\sum_{i=1}^{N’} I \\left( y_{i} \\neq \\hat f \\left(x_{i} \\right) \\right) \\end{align*} \n", 174 | "其中,$I$是指示函数,即$y \\neq \\hat f \\left( x \\right)$时为1,否则为0。" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "测试集上的准确率\n", 182 | "\\begin{align*} r_{test} = \\dfrac{1}{N‘} \\sum_{i=1}^{N’} I \\left( y_{i} = \\hat f \\left(x_{i} \\right) \\right) \\end{align*} \n", 183 | "则,$r_{test} + e_{test} = 1 $。" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "生成方法由数据学习联合概率分布$P\\left(X,Y\\right)$,然后求出条件概率分布$P\\left(Y|X\\right)$作为预测的模型,即生成模型\n", 191 | "\\begin{align*} P\\left(Y|X\\right) = \\dfrac{P\\left(X,Y\\right)}{P\\left(X\\right)}\\end{align*} \n", 192 | "判别方法由数据直接学习决策函数$f\\left(X\\right)$或者条件概率分布$P\\left(Y|X\\right)$作为预测的模型,即判别模型。" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "TP——将正类预测为正类;\n", 200 | "FN——将正类预测为负类;\n", 201 | "FP——将负类预测为正类;\n", 202 | "TN——将负类预测为负类。 \n", 203 | "精确率\n", 204 | "\\begin{align*} P = \\dfrac{TP}{TP+FP}\\end{align*} \n", 205 | "召回率\n", 206 | "\\begin{align*} R = \\dfrac{TP}{TP+FN}\\end{align*} \n", 207 | "$F_{1}$值是精确率和召回率的调和均值\n", 208 | "\\begin{align*} \\\\ & \\dfrac{2}{F_{1}} = \\dfrac{1}{P} + \\dfrac{1}{R} \n", 209 | "\\\\ & F_{1} = \\dfrac{2TP}{2TP+FP+FN}\\end{align*} " 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": null, 215 | "metadata": { 216 | "collapsed": true 217 | }, 218 | "outputs": [], 219 | "source": [] 220 | } 221 | ], 222 | "metadata": { 223 | "kernelspec": { 224 | "display_name": "Python 3", 225 | "language": "python", 226 | "name": "python3" 227 | }, 228 | "language_info": { 229 | "codemirror_mode": { 230 | "name": "ipython", 231 | "version": 3 232 | }, 233 | "file_extension": ".py", 234 | "mimetype": "text/x-python", 235 | "name": "python", 236 | "nbconvert_exporter": "python", 237 | "pygments_lexer": "ipython3", 238 | "version": "3.7.1" 239 | } 240 | }, 241 | "nbformat": 4, 242 | "nbformat_minor": 1 243 | } 244 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/1 introduction-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "第$j$个输入实例$x$的特征向量\n", 8 | "\\begin{align*} \\\\& x_{j} = \\left( x_{j}^{\\left(1\\right)},x_{j}^{\\left(2\\right)}, \\cdots, x_{j}^{\\left(i\\right)}, \\cdots, x_{j}^{\\left(n\\right)} \\right)^{T}, \\quad i=1,2,\\cdots,n; \\quad j=1,2,\\cdots,N \\end{align*} \n", 9 | "其中,$x_{j}^{\\left(i\\right)}$表示第$j$个输入实例的第$i$个特征。" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "监督学习的训练数据集合由输入(特征向量)与输出对组成\n", 17 | "\\begin{align*} \\\\& T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\} \\end{align*} " 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | 
"假设空间$\\mathcal{F}$定义为决策函数的集合\n", 25 | "\\begin{align*} \\\\& \\mathcal{F} = \\left\\{ f | Y = f \\left( X \\right) \\right\\} \\end{align*}\n", 26 | "其中,$X$是定义在输入空间$\\mathcal{X}$上的变量,$Y$是定义在输入空间$\\mathcal{}$上的变量。" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": { 32 | "collapsed": true 33 | }, 34 | "source": [ 35 | "假设空间$\\mathcal{F}$通常是由一个参数向量决定的函数族\n", 36 | "\\begin{align*} \\\\& \\mathcal{F} = \\left\\{ f | Y = f_{\\theta} \\left( X \\right), \\theta \\in R^{n} \\right\\} \\end{align*}\n", 37 | "其中,参数向量$\\theta$取值于$n$维向量空间$R^{n}$,称为参数空间。" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "假设空间$\\mathcal{F}$也可定义为条件概率的集合\n", 45 | "\\begin{align*} \\\\& \\mathcal{F} = \\left\\{ P | P \\left( Y | X \\right) \\right\\} \\end{align*}\n", 46 | "其中,$X$是定义在输入空间$\\mathcal{X}$上的随机变量,$Y$是定义在输入空间$\\mathcal{}$上的随机变量。" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "假设空间$\\mathcal{F}$通常是由一个参数向量决定的概率分布族\n", 54 | "\\begin{align*} \\\\& \\mathcal{F} = \\left\\{ P | P_{\\theta} \\left( Y | X \\right), \\theta \\in R^{n} \\right\\} \\end{align*}\n", 55 | "其中,参数向量$\\theta$取值于$n$维向量空间$R^{n}$,称为参数空间。" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "损失函数(代价函数)来度量预测错误的程度,是预测输出$f\\left(X\\right)$和实际输出$Y$的非负实值函数,记作$L \\left(Y, f \\left( X \\right) \\right)$。" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "0-1损失函数\n", 70 | "\\begin{align*} L \\left(Y, f \\left( X \\right) \\right) = \\left\\{\n", 71 | "\\begin{aligned} \n", 72 | "\\ & 1, Y \\neq f \\left( X \\right)\n", 73 | "\\\\ & 0, Y = f \\left( X \\right)\n", 74 | "\\end{aligned}\n", 75 | "\\right.\\end{align*} " 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "平方损失函数\n", 83 | "\\begin{align*} L \\left(Y, f \\left( X \\right) \\right) = \\left( Y - f \\left( X \\right) \\right)^{2} \\end{align*} " 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "绝对值损失函数\n", 91 | "\\begin{align*} L \\left(Y, f \\left( X \\right) \\right) = \\left| Y - f \\left( X \\right) \\right| \\end{align*} " 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "绝对值损失函数(对数似然损失函数)\n", 99 | "\\begin{align*} L \\left(Y, f \\left( X \\right) \\right) = - \\log P \\left( Y | X \\right) \\end{align*} " 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "风险损失(期望损失)是模型$f\\left(X\\right)$关于联合概率分布$P\\left(X,Y\\right)$的平均意义下的损失\n", 107 | "\\begin{align*} R_{exp} \\left( f \\right) = E_{P} \\left[L \\left(Y, f \\left( X \\right) \\right) \\right] = \\int_{\\mathcal{X} \\times \\mathcal{Y}} L \\left(Y, f \\left( X \\right) \\right) P \\left(x,y\\right) dxdy \\end{align*} " 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "经验风险(经验损失)是模型$f\\left(X\\right)$关于训练数据集\n", 115 | "\\begin{align*} \\\\& T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\} \\end{align*} \n", 116 | "的平均损失\n", 117 | "\\begin{align*} R_{emp} \\left( f \\right) = \\dfrac{1}{N} \\sum_{i=1}^{N} L \\left(y_{i}, f \\left( x_{i} \\right) \\right) \\end{align*} " 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "经验风险最小化\n", 125 | "\\begin{align*} \\min_{f \\in 
231 | "version": 3 232 | }, 233 | "file_extension": ".py", 234 | "mimetype": "text/x-python", 235 | "name": "python", 236 | "nbconvert_exporter": "python", 237 | "pygments_lexer": "ipython3", 238 | "version": "3.7.1" 239 | } 240 | }, 241 | "nbformat": 4, 242 | "nbformat_minor": 1 243 | } 244 | -------------------------------------------------------------------------------- /8 boosting.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "collapsed": true 7 | }, 8 | "source": [ 9 | "AdaBoost算法: \n", 10 | "输入:训练数据集$T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\}$,其中$x_{i} \\in \\mathcal{X} \\subseteq R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ +1, -1 \\right\\}, i = 1, 2, \\cdots, N$;弱学习算法 \n", 11 | "输出:分类器$G\\left(x\\right)$ \n", 12 | "1. 初始化训练数据的权值分布\n", 13 | "\\begin{align*} \\\\ & D_{1}=\\left(w_{11},w_{12},\\cdots,w_{1N}\\right), \\quad w_{1i} = \\dfrac{1}{N}, \\quad i=1,2,\\cdots,N\\end{align*} \n", 14 | "2. 对$m=1,2,\\cdots,M$ \n", 15 | "2.1 使用具有权值分布$D_{m}$的训练数据集学习,得到基本分类器\n", 16 | "\\begin{align*} \\\\ & G_{m}\\left(x\\right): \\mathcal{X} \\to \\left\\{ -1, +1\\right\\} \\end{align*} \n", 17 | "2.2 计算$G_{m}\\left(x\\right)$在训练数据集上的分类误差率 \n", 18 | "\\begin{align*} \\\\& e_{m} = P\\left(G_{m}\\left(x_{i}\\right) \\neq y_{i}\\right)\n", 19 | "\\\\ & = \\sum_{i=1}^{N} w_{mi} I \\left(G_{m}\\left(x_{i}\\right) \\neq y_{i} \\right) \\end{align*} \n", 20 | "2.3 计算$G_{m} \\left(x\\right)$的系数 \n", 21 | "\\begin{align*} \\\\ & \\alpha_{m} = \\dfrac{1}{2} \\log \\dfrac{1-e_{m}}{e_{m}} \\end{align*}\n", 22 | "2.4 更新训练数据集的权值分布\n", 23 | "\\begin{align*} \\\\ & D_{m+1}=\\left(w_{m+1,1},\\cdots,w_{m+1,i},\\cdots,w_{m+1,N}\\right)\n", 24 | "\\\\ & w_{m+1,i} = \\dfrac{w_{mi}}{Z_{m}} \\exp \\left(- \\alpha_{m} y_{i} G_{m}\\left(x_{i}\\right)\\right), \n", 25 | "\\\\ & \\quad \\quad = \\left\\{\n", 26 | "\\begin{aligned} \n", 27 | "\\ & \\dfrac{w_{mi}}{Z_{m}} \\exp \\left(- \\alpha_{m} \\right), G_{m}\\left(x_{i}\\right) = y_{i}\n", 28 | "\\\\ & \\dfrac{w_{mi}}{Z_{m}} \\exp \\left( \\alpha_{m} \\right), G_{m}\\left(x_{i}\\right) \\neq y_{i}\n", 29 | "\\end{aligned}\n", 30 | "\\right. \\quad i=1,2,\\cdots,N \\end{align*}\n", 31 | "其中,$Z_{m}$是规范化因子\n", 32 | "\\begin{align*} \\\\ & Z_{m}= \\sum_{i=1}^{N} w_{mi} \\exp \\left(- \\alpha_{m} y_{i}, G_{m}\\left(x_{i}\\right)\\right)\\end{align*} \n", 33 | "3. 
38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "加法模型\n", 44 | "\\begin{align*} \\\\ & f \\left( x \\right) = \\sum_{m=1}^{M} \\beta_{m} b\\left(x;\\gamma_{m}\\right) \\end{align*} \n", 45 | "其中,$b\\left(x;\\gamma_{m}\\right)$为基函数,$\\beta_{m}$为基函数系数,$\\gamma_{m}$为基函数参数。" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "在给定训练数据及损失函数$L\\left(y,f\\left(x\\right)\\right)$的条件下,学习加法模型$f\\left(x\\right)$成为经验风险极小化问题\n", 53 | "\\begin{align*} \\\\ & \\min_{\\beta_{m},\\gamma_{m}} \\sum_{i=1}^{N} L \\left( y_{i}, \\sum_{m=1}^{M} \\beta_{m} b\\left(x_{i};\\gamma_{m}\\right) \\right) \\end{align*} " 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "学习加法模型,从前向后每一步只学习一个基函数及其系数,即每步只优化\n", 61 | "\\begin{align*} \\\\ & \\min_{\\beta,\\gamma} \\sum_{i=1}^{N} L \\left( y_{i}, \\beta b\\left(x_{i};\\gamma\\right) \\right) \\end{align*} " 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "前向分步算法: \n", 69 | "输入:训练数据集$T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\}$,损失函数$L\\left(y,f\\left(x\\right)\\right)$;基函数集$\\left\\{b\\left(x;\\gamma\\right)\\right\\}$ \n", 70 | "输出:加法模型$f\\left(x\\right)$ \n", 71 | "1. 初始化$f_{0}\\left(x\\right)=0$ \n", 72 | "2. 对$m=1,2,\\cdots,M$ \n", 73 | "2.1 极小化损失函数\n", 74 | "\\begin{align*} \\\\ & \\left(\\beta_{m},\\gamma_{m}\\right) = \\arg \\min_{\\beta,\\gamma} \\sum_{i=1}^{N} L \\left( y_{i},f_{m-1} \\left(x_{i}\\right) + \\beta b\\left(x_{i};\\gamma \\right)\\right) \\end{align*} \n", 75 | "得到参数$\\beta_{m},\\gamma_{m}$ \n", 76 | "2.2 更新 \n", 77 | "\\begin{align*} \\\\& f_{m} \\left(x\\right) = f_{m-1} \\left(x\\right) + \\beta_{m} b\\left(x;\\gamma_{m}\\right) \\end{align*} \n", 78 | "3. 得到加法模型\n", 79 | "\\begin{align*} \\\\ & f \\left( x \\right) = f_{M} \\left( x \\right) = \\sum_{m=1}^{M} \\beta_{m} b \\left( x; \\gamma_{m} \\right) \\end{align*} " 80 | ]
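以平方损失为例,前向分步算法每一步等价于对当前残差拟合一个基函数。下面用单变量回归树桩作基函数给出一个示意(假设 x 已按升序排列,数据仅为演示):

```python
import numpy as np

def fit_stump_ls(x, r):
    """平方误差下求最优切分点 s 与两侧输出 c1、c2(单变量回归树桩)。"""
    best = None
    for s in (x[:-1] + x[1:]) / 2:             # 相邻点中点作为候选切分点
        c1, c2 = r[x <= s].mean(), r[x > s].mean()
        loss = ((r - np.where(x <= s, c1, c2)) ** 2).sum()
        if best is None or loss < best[0]:
            best = (loss, s, c1, c2)
    return best[1:]

def forward_stagewise(x, y, M=6):
    """前向分步 + 平方损失:每步对残差 r = y - f_{m-1}(x) 拟合一个树桩。"""
    f = np.zeros_like(y)
    stumps = []
    for _ in range(M):
        r = y - f                               # 当前模型拟合数据的残差
        s, c1, c2 = fit_stump_ls(x, r)
        f = f + np.where(x <= s, c1, c2)        # f_m = f_{m-1} + b(x; gamma_m)
        stumps.append((s, c1, c2))
    return stumps, f

x = np.arange(1.0, 11.0)
y = np.array([5.56, 5.70, 5.91, 6.40, 6.80, 7.05, 8.90, 8.70, 9.00, 9.05])
stumps, f = forward_stagewise(x, y)
print(np.round(f, 2))
```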
81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "训练数据集\n", 87 | "\\begin{align*} \\\\& T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\} \\end{align*} \n", 88 | "其中,$x_{i} \\in \\mathcal{X} \\subseteq R^{n}, y_{i} \\in \\mathcal{Y} \\subseteq R, i = 1, 2, \\cdots, N$。" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "将输入空间$\\mathcal{X}$划分为$J$个互不相交的区域$R_{1},R_{2},\\cdots,R_{J}$,且在每个区域上确定输出的常量$c_{j}$,则回归树\n", 96 | "\\begin{align*} \\\\& T \\left(x; \\varTheta\\right) = \\sum_{j=1}^{J} c_{j} I \\left(x \\in R_{j}\\right) \\end{align*} \n", 97 | "其中,参数$\\varTheta = \\left\\{ \\left(R_{1}, c_{1}\\right),\\left(R_{2}, c_{2}\\right),\\cdots,\\left(R_{J}, c_{J}\\right) \\right\\}$表示树的区域划分和各区域上的常数。$J$是回归树的复杂度即叶结点个数。" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "回归提升树使用前向分步算法\n", 105 | "\\begin{align*} \\\\& f_{0}=0\n", 106 | "\\\\ & f_{m}\\left(x\\right) = f_{m-1}\\left(x\\right) + T \\left(x; \\varTheta_{m}\\right) \n", 107 | "\\\\ & f_{M} = \\sum_{m=1}^{M} T \\left(x; \\varTheta_{m}\\right) \\end{align*} " 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "在前向分步算法的第$m$步给定当前模型$f_{m-1}\\left(x\\right)$,模型参数\n", 115 | "\\begin{align*} \\\\& \\hat \\varTheta_{m} = \\arg \\min_{\\varTheta_{m}} \\sum_{i=1}^{N} L \\left( y_{i}, f_{m-1}\\left(x_{i}\\right) + T \\left( x_{i}; \\varTheta_{m} \\right) \\right) \\end{align*} \n", 116 | "得到第$m$棵树的参数$\\hat \\varTheta_{m}$" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "当采用平方误差损失函数时\n", 124 | "\\begin{align*} \\\\& L \\left( y, f_{m-1}\\left(x\\right)+T\\left(x;\\varTheta_{m}\\right)\\right) \n", 125 | "\\\\ & = \\left[y-f_{m-1}\\left(x\\right)-T\\left(x;\\varTheta_{m}\\right)\\right]^{2} \n", 126 | "\\\\ & = \\left[r-T\\left(x;\\varTheta_{m}\\right)\\right]^{2}\\end{align*} \n", 127 | "其中,$r=y-f_{m-1}\\left(x\\right)$是当前模型拟合数据的残差。" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "回归提升树算法: \n", 135 | "输入:训练数据集$T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\},x_{i} \\in \\mathcal{X} \\subseteq R^{n}, y_{i} \\in \\mathcal{Y} \\subseteq R, i = 1, 2, \\cdots, N$ \n", 136 | "输出:回归提升树$f_{M}\\left(x\\right)$ \n", 137 | "1. 初始化$f_{0}\\left(x\\right)=0$ \n", 138 | "2. 对$m=1,2,\\cdots,M$ \n", 139 | "2.1 计算残差\n", 140 | "\\begin{align*} \\\\ & r_{mi}=y_{i}-f_{m-1}\\left(x_{i}\\right),\\quad i=1,2,\\cdots,N \\end{align*} \n", 141 | "2.2 拟合残差$r_{mi}$学习一个回归树,得到$T\\left(x;\\varTheta_{m}\\right)$ \n", 142 | "2.3 更新$f_{m}=f_{m-1}\\left(x\\right)+T\\left(x;\\varTheta_{m}\\right)$ \n", 143 | "3. 
得到回归提升树\n", 144 | "\\begin{align*} \\\\ & f_{M} \\left( x \\right) = \\sum_{m=1}^{M} T \\left(x;\\varTheta_{m}\\right) \\end{align*} " 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "梯度提升算法: \n", 152 | "输入:训练数据集$T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\},x_{i} \\in \\mathcal{X} \\subseteq R^{n}, y_{i} \\in \\mathcal{Y} \\subseteq R, i = 1, 2, \\cdots, N$,损失函数$L\\left(y,f\\left(x\\right)\\right)$ \n", 153 | "输出:回归树$\\hat f\\left(x\\right)$ \n", 154 | "1. 初始化\n", 155 | "\\begin{align*} \\\\ & f_{0}\\left(x\\right) = \\arg \\min_{c} \\sum_{i=1}^{N} L \\left(y_{i},c\\right) \\end{align*} \n", 156 | "2. 对$m=1,2,\\cdots,M$ \n", 157 | "2.1 对$i=1,2,\\cdots,N$计算\n", 158 | "\\begin{align*} \\\\ & r_{mi}=- \\left[ \\dfrac {\\partial L \\left(y_{i},f\\left(x_{i}\\right) \\right)}{\\partial f \\left(x_{i} \\right)}\\right]_{f\\left(x\\right)=f_{m-1}\\left(x\\right)} \\end{align*} \n", 159 | "2.2 对$r_{mi}$拟合回归树,得到第$m$棵树的叶结点区域$R_{mj},j=1,2,\\cdots,J$ \n", 160 | "2.3 对$j=1,2,\\cdots,J$计算\n", 161 | "\\begin{align*} \\\\ & c_{mj}=\\arg \\min_{c} \\sum_{x_{i} \\in R_{mj}} L \\left( y_{i},f_{m-1} \\left(x_{i}\\right)+c \\right) \\end{align*} \n", 162 | "2.4 更新$f_{m}\\left(x\\right)= f_{m-1}\\left(x\\right) + \\sum_{j=1}^{J} c_{mj} I \\left(x \\in R_{mj} \\right)$\n", 163 | "3. 得到回归树\n", 164 | "\\begin{align*} \\\\ & \\hat f \\left( x \\right) = f_{M} \\left( x \\right) = \\sum_{m=1}^{M} \\sum_{j=1}^{J} c_{mj} I \\left( x \\in R_{mj} \\right) \\end{align*} " 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": null, 170 | "metadata": { 171 | "collapsed": true 172 | }, 173 | "outputs": [], 174 | "source": [] 175 | } 176 | ], 177 | "metadata": { 178 | "kernelspec": { 179 | "display_name": "Python 2", 180 | "language": "python", 181 | "name": "python2" 182 | }, 183 | "language_info": { 184 | "codemirror_mode": { 185 | "name": "ipython", 186 | "version": 2 187 | }, 188 | "file_extension": ".py", 189 | "mimetype": "text/x-python", 190 | "name": "python", 191 | "nbconvert_exporter": "python", 192 | "pygments_lexer": "ipython2", 193 | "version": "2.7.11" 194 | } 195 | }, 196 | "nbformat": 4, 197 | "nbformat_minor": 0 198 | } 199 | -------------------------------------------------------------------------------- /B newton.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "无约束最优化问题\n", 8 | "\\begin{align*} \\\\& \\min_{x \\in R^{n}} f\\left(x\\right)\\end{align*} \n", 9 | "其中$x^{*}$为目标函数的极小点。" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "设$f\\left(x\\right)$具有二阶连续偏导数,若第$k$次迭代值为$x^{\\left(k\\right)}$,则可将$f\\left(x\\right)$在$x^{\\left(k\\right)}$附近进行二阶泰勒展开\n", 17 | "\\begin{align*} \\\\& f\\left(x\\right) = f\\left(x^{\\left(k\\right)}\\right)+g_{k}^{T}\\left(x-x^{\\left(k\\right)}\\right)+\\dfrac{1}{2}\\left(x-x^{\\left(k\\right)}\\right)^{T} H\\left(x^{\\left(k\\right)}\\right)\\left(x-x^{\\left(x\\right)}\\right)\\end{align*} \n", 18 | "其中,$g_{k}=g\\left(x^{\\left(k\\right)}\\right)=\\nabla f\\left(x^{\\left(k\\right)}\\right)$是$f\\left(x\\right)$的梯度向量在点$x^{\\left(k\\right)}$的值,$H\\left(x^{\\left(k\\right)}\\right)$是$f\\left(x\\right)$的海赛矩阵\n", 19 | "\\begin{align*} \\\\& H\\left(x\\right)=\\left[\\dfrac{\\partial^{2}f}{\\partial x_{i} \\partial x_{j}}\\right]_{n \\times 
n}\\end{align*} \n", 20 | "在点$x^{\\left(k\\right)}$的值。" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "函数$f\\left(x\\right)$有极值的必要条件是在极值点处一阶导数为0,即梯度向量为0。特别的当$H\\left(x^{\\left(k\\right)}\\right)$是正定矩阵时,函数$f\\left(x\\right)$的极值为极小值。" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "假设$x^{\\left(k+1\\right)}$满足\n", 35 | "\\begin{align*} \\\\& \\nabla f\\left(x^{\\left(k+1\\right)}\\right)=0\\end{align*} \n", 36 | "根据二阶泰勒展开,得\n", 37 | "\\begin{align*} \\\\& \\nabla f\\left(x\\right)=g_{k}+H_{k}\\left(x-x^{\\left(x\\right)}\\right)\\end{align*} \n", 38 | "其中,$H_{k}=H\\left(x^{\\left(k\\right)}\\right)$,则\n", 39 | "\\begin{align*} \\\\& g_{k}+H_{k}\\left(x^{\\left(k+1\\right)}-x^{\\left(x\\right)}\\right)=0\n", 40 | "\\\\ & x^{\\left(k+1\\right)}=x^{\\left(k\\right)}-H_{k}^{-1}g_{k}\\end{align*} \n", 41 | "令\n", 42 | "\\begin{align*} \\\\& H_{k}p_{k}=-g_{k}\\end{align*} \n", 43 | "则\n", 44 | "\\begin{align*} \\\\& x^{\\left(k+1\\right)}=x^{\\left(k\\right)}+p_{k}\\end{align*} " 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": { 50 | "collapsed": true 51 | }, 52 | "source": [ 53 | "牛顿法: \n", 54 | "输入:目标函数$f\\left(x\\right)$,梯度$g\\left(x\\right)=\\nabla f\\left(x\\right)$,海赛矩阵$H\\left(x\\right)$,精度要求$\\varepsilon$ \n", 55 | "输出:$f\\left(x\\right)$的极小点$x^{*}$ \n", 56 | "1. 取初始点$x^{\\left(0\\right)}$,置$k=0$ \n", 57 | "2. 计算$g_{k}=g\\left(x^{\\left(k\\right)}\\right)$ \n", 58 | "3. 若$\\|g_{k}\\| < \\varepsilon$则停止计算,得近似解$x^{*}=x^{\\left(k\\right)}$ \n", 59 | "4. 计算$H_{k}=H\\left(x^{\\left(k\\right)}\\right)$,并求$p_{k}$\n", 60 | "\\begin{align*} \\\\& H_{k}p_{k}=-g_{k}\\end{align*}\n", 61 | "5. 置$x^{\\left(k+1\\right)}=x^{\\left(k\\right)}+p_{k}$\n", 62 | "6. 置$k=k+1$,转2." 
64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "设牛顿法搜索方向是$p_{k}=-H_{k}^{-1}g_{k}$\n", 70 | "由\n", 71 | "\\begin{align*} \\\\& x^{\\left(k+1\\right)}=x^{\\left(k\\right)}-H_{k}^{-1}g_{k} \\end{align*} \n", 72 | "有\n", 73 | "\\begin{align*} \\\\& x=x^{\\left(k\\right)}-\\lambda H_{k}^{-1} g_{k}=x^{\\left(k\\right)}+\\lambda p_{k} \\end{align*}\n", 74 | "则$f\\left(x\\right)$在$x^{\\left(k\\right)}$的泰勒展开可近似为\n", 75 | "\\begin{align*} \\\\& f\\left(x\\right)=f\\left(x^{\\left(k\\right)}\\right)-\\lambda g_{k}^{T} H_{k}^{-1} g_{k}\\end{align*}\n", 76 | "由于$H_{k}^{-1}$正定,故$g_{k}^{T} H_{k}^{-1} g_{k} > 0$。当$\\lambda$为一个充分小的正数时,有$f\\left(x\\right) < f\\left(x^{\\left(k\\right)}\\right)$,即搜索方向$p_{k}$是下降方向。" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": { 82 | "collapsed": true 83 | }, 84 | "source": [ 85 | "取$x=x^{\\left(k+1\\right)}$,由\n", 86 | "\\begin{align*} \\\\& \\nabla f\\left(x\\right)=g_{k}+H_{k}\\left(x-x^{\\left(k\\right)}\\right)\\end{align*} \n", 87 | "得\n", 88 | "\\begin{align*} \\\\& g_{k+1}-g_{k}=H_{k}\\left(x^{\\left(k+1\\right)}-x^{\\left(k\\right)}\\right)\\end{align*} \n", 89 | "记$y_{k}=g_{k+1}-g_{k},\\delta_{k}=x^{\\left(k+1\\right)}-x^{\\left(k\\right)}$,则\n", 90 | "\\begin{align*} \\\\& y_{k}=H_{k}\\delta_{k}\n", 91 | "\\\\ & H_{k}^{-1}y_{k}=\\delta_{k}\\end{align*} \n", 92 | "称为拟牛顿条件。" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "DFP算法中选择$G_{k}$作为$H_{k}^{-1}$的近似,假设每一步迭代中矩阵$G_{k+1}$是由$G_{k}$加上两个附加项构成,即\n", 100 | "\\begin{align*} \\\\& G_{k+1}=G_{k}+P_{k}+Q_{k}\\end{align*}\n", 101 | "其中,$P_{k}$与$Q_{k}$是待定矩阵。则\n", 102 | "\\begin{align*} \\\\& G_{k+1}y_{k}=G_{k}y_{k}+P_{k}y_{k}+Q_{k}y_{k}\\end{align*}\n", 103 | "为使$G_{k+1}$满足拟牛顿条件,可使$P_{k}$与$Q_{k}$满足\n", 104 | "\\begin{align*} \\\\& P_{k}y_{k}=\\delta_{k}\n", 105 | "\\\\ & Q_{k}y_{k}=-G_{k}y_{k}\\end{align*}\n", 106 | "可取\n", 107 | "\\begin{align*} \\\\& P_{k}= \\dfrac{\\delta_{k} \\delta_{k}^{T}}{\\delta_{k}^{T} y_{k}}\n", 108 | "\\\\ & Q_{k}=- \\dfrac{G_{k}y_{k}y_{k}^{T}G_{k}}{y_{k}^{T}G_{k}y_{k}}\\end{align*}\n", 109 | "可得矩阵$G_{k+1}$的迭代公式\n", 110 | "\\begin{align*} \\\\& G_{k+1}=G_{k}+\\dfrac{\\delta_{k} \\delta_{k}^{T}}{\\delta_{k}^{T} y_{k}}- \\dfrac{G_{k}y_{k}y_{k}^{T}G_{k}}{y_{k}^{T}G_{k}y_{k}}\\end{align*}\n", 111 | "可以证明,如果初始矩阵$G_{0}$是正定的,则迭代过程中的每个矩阵$G_{k}$都是正定的。" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "DFP算法: \n", 119 | "输入:目标函数$f\\left(x\\right)$,梯度$g\\left(x\\right)=\\nabla f\\left(x\\right)$,精度要求$\\varepsilon$ \n", 120 | "输出:$f\\left(x\\right)$的极小点$x^{*}$ \n", 121 | "1. 取初始点$x^{\\left(0\\right)}$,取$G_{0}$为正定矩阵,置$k=0$ \n", 122 | "2. 计算$g_{k}=g\\left(x^{\\left(k\\right)}\\right)$ 若$\\|g_{k}\\| < \\varepsilon$则停止计算,得近似解$x^{*}=x^{\\left(k\\right)}$;否则,转3. \n", 123 | "3. 置$p_{k}=-G_{k}g_{k}$\n", 124 | "4. 一维搜索,求$\\lambda_{k}$使\n", 125 | "\\begin{align*} \\\\& f\\left(x^{\\left(k\\right)}+\\lambda_{k}p_{k}\\right)=\\min_{\\lambda \\geq 0} f\\left(x^{\\left(k\\right)}+\\lambda p_{k}\\right)\\end{align*}\n", 126 | "5. 置$x^{\\left(k+1\\right)}=x^{\\left(k\\right)}+\\lambda_{k} p_{k}$\n", 127 | "6. 计算$g_{k+1}=g\\left(x^{\\left(k+1\\right)}\\right)$,若$\\|g_{k+1}\\| < \\varepsilon$,则停止计算,得近似解$x^{*}=x^{\\left(k+1\\right)}$;否则,计算\n", 128 | "\\begin{align*} \\\\& G_{k+1}=G_{k}+\\dfrac{\\delta_{k} \\delta_{k}^{T}}{\\delta_{k}^{T} y_{k}}- \\dfrac{G_{k}y_{k}y_{k}^{T}G_{k}}{y_{k}^{T}G_{k}y_{k}}\\end{align*}\n", 129 | "7. 置$k=k+1$,转3."
130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "BFGS算法中选择$B_{k}$逼近海赛矩阵$H_{k}$,相应的拟牛顿条件\n", 137 | "\\begin{align*} \\\\& B_{k+1} \\delta_{k} = y_{k}\\end{align*}\n", 138 | "假设每一步迭代中矩阵$B_{k+1}$是由$B_{k}$加上两个附加项构成,即\n", 139 | "\\begin{align*} \\\\& B_{k+1}=B_{k}+P_{k}+Q_{k}\\end{align*}\n", 140 | "其中,$P_{k}$与$Q_{k}$是待定矩阵。则\n", 141 | "\\begin{align*} \\\\& B_{k+1}\\delta_{k}=B_{k}\\delta_{k}+P_{k}\\delta_{k}+Q_{k}\\delta_{k}\\end{align*}\n", 142 | "为使$B_{k+1}$满足拟牛顿条件,可使$P_{k}$与$Q_{k}$满足\n", 143 | "\\begin{align*} \\\\& P_{k}\\delta_{k}=y_{k}\n", 144 | "\\\\ & Q_{k}\\delta_{k}=-B_{k}\\delta_{k}\\end{align*}\n", 145 | "可取\n", 146 | "\\begin{align*} \\\\& P_{k}= \\dfrac{y_{k}y_{k}^{T}}{y_{k}^{T}\\delta_{k} }\n", 147 | "\\\\ & Q_{k}=- \\dfrac{B_{k}\\delta_{k}\\delta_{k}^{T}B_{k}}{\\delta_{k}^{T}B_{k}\\delta_{k}}\\end{align*}\n", 148 | "可得矩阵$B_{k+1}$的迭代公式\n", 149 | "\\begin{align*} \\\\& B_{k+1}=B_{k}+\\dfrac{y_{k}y_{k}^{T}}{y_{k}^{T}\\delta_{k} }- \\dfrac{B_{k}\\delta_{k}\\delta_{k}^{T}B_{k}}{\\delta_{k}^{T}B_{k}\\delta_{k}}\\end{align*}\n", 150 | "可以证明,如果初始矩阵$B_{0}$是正定的,则迭代过程中的每个矩阵$B_{k}$都是正定的。" 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": {}, 156 | "source": [ 157 | "BFGS算法: \n", 158 | "输入:目标函数$f\\left(x\\right)$,梯度$g\\left(x\\right)=\\nabla f\\left(x\\right)$,精度要求$\\varepsilon$ \n", 159 | "输出:$f\\left(x\\right)$的极小点$x^{*}$ \n", 160 | "1. 取初始点$x^{\\left(0\\right)}$,取$B_{0}$为正定矩阵,置$k=0$ \n", 161 | "2. 计算$g_{k}=g\\left(x^{\\left(k\\right)}\\right)$ 若$\\|g_{k}\\| < \\varepsilon$则停止计算,得近似解$x^{*}=x^{\\left(k\\right)}$;否则,转3. \n", 162 | "3. 由$B_{k}p_{k}=-g_{k}$求出$p_{k}$\n", 163 | "4. 一维搜索,求$\\lambda_{k}$使\n", 164 | "\\begin{align*} \\\\& f\\left(x^{\\left(k\\right)}+\\lambda_{k}p_{k}\\right)=\\min_{\\lambda \\geq 0} f\\left(x^{\\left(k\\right)}+\\lambda p_{k}\\right)\\end{align*}\n", 165 | "5. 置$x^{\\left(k+1\\right)}=x^{\\left(k\\right)}+\\lambda_{k} p_{k}$\n", 166 | "6. 计算$g_{k+1}=g\\left(x^{\\left(k+1\\right)}\\right)$,若$\\|g_{k+1}\\| < \\varepsilon$,则停止计算,得近似解$x^{*}=x^{\\left(k+1\\right)}$;否则,计算\n", 167 | "\\begin{align*} \\\\& B_{k+1}=B_{k}+\\dfrac{y_{k}y_{k}^{T}}{y_{k}^{T}\\delta_{k} }- \\dfrac{B_{k}\\delta_{k}\\delta_{k}^{T}B_{k}}{\\delta_{k}^{T}B_{k}\\delta_{k}}\\end{align*}\n", 168 | "7. 置$k=k+1$,转3."
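BFGS 迭代公式可直接按上面的秩二修正实现。下面是一个极简示意(为简化起见,用固定步长 $\lambda=1$ 代替一维搜索,目标函数沿用前面牛顿法的演示二次函数;这些简化均为本示意的假设):

```python
import numpy as np

def bfgs(grad, x0, eps=1e-8, max_iter=200):
    """BFGS:用 B_k 逼近海赛矩阵,按 y_k、delta_k 做秩二修正。"""
    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))                     # B_0 取正定矩阵(单位阵)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < eps:
            break
        p = np.linalg.solve(B, -g)         # 由 B_k p_k = -g_k 求 p_k
        x_new = x + p                      # 固定步长 lambda = 1(示意)
        g_new = grad(x_new)
        delta, yk = x_new - x, g_new - g
        Bd = B.dot(delta)
        # B_{k+1} = B_k + y y^T / (y^T delta) - B d d^T B / (d^T B d)
        B = B + np.outer(yk, yk) / np.dot(yk, delta) \
              - np.outer(Bd, Bd) / np.dot(delta, Bd)
        x, g = x_new, g_new
    return x

grad = lambda x: np.array([2*x[0] - 2*x[1], 4*x[1] - 2*x[0] - 4])
print(bfgs(grad, [0.0, 0.0]))              # 同样收敛到 (2, 2)
```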
169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "记\n", 176 | "\\begin{align*} \\\\& G_{k}=B_{k}^{-1},\\quad G_{k+1}=B_{k+1}^{-1}\\end{align*}\n", 177 | "两次应用Sherman-Morrison公式,得\n", 178 | "\\begin{align*} \\\\& G_{k+1}=\\left(I- \\dfrac{\\delta_{k}y_{k}^{T}}{\\delta_{k}^{T}y_{k}}\\right)G_{k}\\left(I-\\dfrac{\\delta_{k}y_{k}^{T}}{\\delta_{k}^{T}y_{k}}\\right)^{T}+\\dfrac{\\delta_{k}\\delta_{k}^{T}}{\\delta_{k}^{T}y_{k}}\\end{align*}\n", 179 | "称为BFGS算法关于$G_{k}$的迭代公式。" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "令由DFP算法$G_{k}$的迭代公式得到的$G_{k+1}$记作$G^{DFP}$,由BFGS算法$G_{k}$的迭代公式得到的$G_{k+1}$记作$G^{BFGS}$,\n", 187 | "由于$G^{DFP}$和$G^{BFGS}$均满足拟牛顿条件,\n", 188 | "则两者的线性组合\n", 189 | "\\begin{align*} \\\\& G_{k+1}=\\alpha G^{DFP}+\\left(1-\\alpha\\right) G^{BFGS}\\end{align*}\n", 190 | "也满足拟牛顿条件,而且是正定的。其中,$0 \\leq \\alpha \\leq 1$。该类算法称为Broyden类算法。" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": { 197 | "collapsed": true 198 | }, 199 | "outputs": [], 200 | "source": [] 201 | } 202 | ], 203 | "metadata": { 204 | "kernelspec": { 205 | "display_name": "Python 2", 206 | "language": "python", 207 | "name": "python2" 208 | }, 209 | "language_info": { 210 | "codemirror_mode": { 211 | "name": "ipython", 212 | "version": 2 213 | }, 214 | "file_extension": ".py", 215 | "mimetype": "text/x-python", 216 | "name": "python", 217 | "nbconvert_exporter": "python", 218 | "pygments_lexer": "ipython2", 219 | "version": "2.7.11" 220 | } 221 | }, 222 | "nbformat": 4, 223 | "nbformat_minor": 0 224 | } 225 | -------------------------------------------------------------------------------- /9.1 em.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "collapsed": true 7 | }, 8 | "source": [ 9 | "不完全数据:观测随机变量$Y$。 \n", 10 | "完全数据:观测随机变量$Y$和隐随机变量$Z$。" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "$Q$函数:完全数据的对数似然函数$\\log P \\left( Y , Z | \\theta \\right)$关于在给定观测数据$Y$和当前参数$\\theta_{\\left( i \\right)}$下对未观测数据$Z$的条件概率分布$P \\left( Z | Y, \\theta_{\\left( i \\right)} \\right)$的期望 \n", 18 | "\\begin{align*} & Q \\left( \\theta, \\theta_{\\left( i \\right)} \\right) = E_{Z} \\left[ \\log P \\left( Y, Z | \\theta \\right) | Y , \\theta_{\\left( i \\right)} \\right] \\end{align*} " 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "含有隐变量$Z$的概率模型,目标是极大化观测变量$Y$关于参数$\\theta$的对数似然函数,即 \\begin{align*} & \\max L \\left( \\theta \\right) = \\log P \\left( Y | \\theta \\right) \\\\ & = \\log \\sum_{Z} P \\left( Y,Z | \\theta \\right) \\\\ & = \\log \\left( \\sum_{Z} P \\left( Y|Z,\\theta \\right) P \\left( Z| \\theta \\right) \\right)\\end{align*} " 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "对数似然函数$L \\left( \\theta \\right)$与第$i$次迭代后的对数似然函数估计值$L \\left( \\theta^{\\left( i \\right)} \\right)$的差 \\begin{align*} & L \\left( \\theta \\right) - L \\left( \\theta^{\\left( i \\right)} \\right) = \\log \\left( \\sum_{Z} P \\left( Y|Z,\\theta \\right) P \\left( Z| \\theta \\right) \\right) - \\log P \\left( Y| \\theta^{ \\left( i \\right)} \\right) \\\\ & = \\log \\left( \\sum_{Z} P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) \\dfrac { P \\left( Y|Z,\\theta \\right) P \\left( Z| \\theta \\right)} {P \\left( Z | Y , 
\\theta^{\\left( i \\right)} \\right)} \\right) - \\log P \\left( Y| \\theta^{ \\left( i \\right)} \\right)\\\\ &\\geq \\sum_{Z} P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) \\log \\dfrac {P \\left( Y | Z, \\theta \\right) P \\left(Z|\\theta\\right)}{P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right)} - \\log P \\left( Y| \\theta^{ \\left( i \\right)} \\right) \\\\ & = \\sum_{Z} P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) \\log \\dfrac {P \\left( Y | Z, \\theta \\right) P \\left(Z|\\theta\\right)} {P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) P \\left(Y|\\theta^{\\left( i \\right)} \\right)}\\end{align*} " 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "令\\begin{align*} \\\\& B \\left( \\theta , \\theta^{\\left ( i \\right)} \\right) = L \\left( \\theta^{\\left ( i \\right)} \\right) + \\sum_{Z} P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) \\log \\dfrac {P \\left( Y | Z, \\theta \\right) P \\left(Z|\\theta\\right)} {P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) P \\left(Y|\\theta^{\\left( i \\right)} \\right)} \\end{align*} \n", 40 | "则 \\begin{align*} & L \\left( \\theta \\right) \\geq B \\left( \\theta, \\theta^{\\left( i \\right)} \\right) \\end{align*} \n", 41 | "即函数$B \\left( \\theta, \\theta^{\\left( i \\right)} \\right)$ 是$L \\left( \\theta \\right)$ 的一个下界。 \n", 42 | "选择$\\theta^{\\left( i+1 \\right)}$使$B \\left( \\theta, \\theta^{\\left( i \\right)} \\right) $达到极大,即 \\begin{align*} & \\theta^{\\left( i+1 \\right)}= \\arg \\max B \\left( \\theta, \\theta^{\\left( i \\right)} \\right) \\\\ & = \\arg \\max \\left( L \\left( \\theta^{\\left ( i \\right)} \\right) + \\sum_{Z} P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) \\log \\dfrac {P \\left( Y | Z, \\theta \\right) P \\left(Z|\\theta\\right)} {P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) P \\left(Y|\\theta^{\\left( i \\right)} \\right)} \\right) \\\\ & = \\arg \\max \\left( \\sum_{Z} P \\left( Z | Y, \\theta^{\\left( i \\right)} \\right) \\log \\left( P \\left( Y | Z, \\theta \\right) P \\left( Z | \\theta \\right) \\right) \\right) \\\\ & = \\arg \\max \\left( \\sum_{Z} P \\left( Z | Y, \\theta^{\\left( i \\right)} \\right) \\log P \\left( Y, Z | \\theta\\right) \\right) \\\\ & = \\arg \\max Q \\left( \\theta, \\theta^{\\left( i \\right)} \\right) \\end{align*}" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "EM算法: \n", 50 | "输入:观测随机变量数据$Y$,隐随机变量数据$Z$,联合分布$P\\left(Y,Z|\\theta\\right) $,条件分布$P\\left(Z|Y,\\theta\\right) $; \n", 51 | "输出:模型参数$\\theta$ \n", 52 | "1. 初值$\\theta^{\\left(0\\right)}$ \n", 53 | "2. $E$步:\\begin{align*} & Q\\left(\\theta,\\theta^{\\left(i\\right)}\\right)=E_{Z}\\left[\\log P\\left(Y,Z|\\theta\\right)|Y,\\theta^{\\left(i\\right)}\\right] \\\\ & = \\sum_{Z} \\log P\\left(Y,Z|\\theta \\right) \\cdot P\\left(Z|Y, \\theta^{\\left(i\\right)}\\right)\\end{align*} \n", 54 | "3. $M$步:\\begin{align*} & \\theta^{\\left( i+1 \\right)} = \\arg \\max Q\\left(\\theta, \\theta^{\\left( i \\right)} \\right)\\end{align*} \n", 55 | "4. 重复2.和3.,直到收敛。" 56 | ]
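把 E 步与 M 步落到具体模型上最容易理解。下面以两个伯努利分布的混合(即三硬币模型)为例给出一个极简 EM 示意:E 步计算观测来自第一枚硬币的后验概率 $\mu_i$,M 步对 $Q$ 函数求闭式极大(观测序列与初值仅为演示):

```python
import numpy as np

def em_three_coin(y, pi, p, q, n_iter=100):
    """EM 估计三硬币模型参数 theta = (pi, p, q),y 为 0/1 观测序列。"""
    y = np.asarray(y, dtype=float)
    for _ in range(n_iter):
        # E 步:mu_i = P(Z = 来自硬币B | y_i, theta^(i))
        a = pi * p**y * (1 - p)**(1 - y)
        b = (1 - pi) * q**y * (1 - q)**(1 - y)
        mu = a / (a + b)
        # M 步:极大化 Q 函数,参数有闭式解
        pi = mu.mean()
        p = (mu * y).sum() / mu.sum()
        q = ((1 - mu) * y).sum() / (1 - mu).sum()
    return pi, p, q

y = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
print(em_three_coin(y, pi=0.46, p=0.55, q=0.67))
```

可以观察到结果依赖初值:EM 只保证收敛到对数似然的稳定点,不保证全局极大。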
3.,直到收敛。" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": { 61 | "collapsed": true 62 | }, 63 | "source": [ 64 | "$F$函数:隐变量$Z$的概率分布为$\\tilde{P} \\left( Z \\right)$,关于分布$\\tilde{P}$与参数$\\theta$的函数\n", 65 | "\\begin{align*} \\\\ & F \\left( \\tilde{P}, \\theta \\right) = E_{\\tilde{P}} \\left[ \\log P \\left( Y, Z | \\theta \\right)\\right] + H \\left( \\tilde{P} \\right) \\end{align*} \n", 66 | "其中,$H \\left( \\tilde{P} \\right) = - E_{\\tilde{P}} \\left[ \\log \\tilde{P} \\left( Z\\right)\\right]$是分布$\\tilde{P} \\left( Z \\right)$的熵。" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "对于固定的$\\theta$,极大化$F$函数 \n", 74 | "\\begin{align*} \\\\ & \\max_{\\tilde{P}} F \\left( \\tilde{P}, \\theta \\right) \n", 75 | "\\\\ & s.t. \\sum_{Z} \\tilde{P}_{\\theta} \\left( Z \\right) = 1 \\end{align*} " 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "引入拉格朗日乘子$\\lambda$,构造拉格朗日函数\n", 83 | "\\begin{align*} \\\\ & L = E_{\\tilde{P}} \\left[ \\log P \\left( Y, Z | \\theta \\right)\\right] - E_{\\tilde{P}} \\left[ \\log \\tilde{P} \\left( Z\\right)\\right] + \\lambda \\left( 1 - \\sum_{Z} \\tilde{P} \\left( Z \\right) \\right) \n", 84 | "\\\\ & = \\sum_{Z} \\log P \\left( Y, Z | \\theta \\right) \\tilde{P} \\left( Z \\right) - \\sum_{Z} \\log P \\left( Z \\right) \\tilde{P} \\left( Z \\right) + \\lambda - \\lambda \\sum_{Z} \\tilde{P} \\left( Z \\right) \\end{align*} " 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "将其对$\\tilde{P} \\left( Z \\right)$求偏导,得\n", 92 | "\\begin{align*} \\\\ & \\dfrac {\\partial L}{\\partial \\tilde{P} \\left( Z \\right) } = \\log P \\left( Y, Z | \\theta \\right) - 1 - \\log P \\left( Z \\right) - \\lambda \\end{align*} " 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "令其等于0,得\n", 100 | "\\begin{align*} \\\\ & \\lambda = \\log P \\left( Y, Z | \\theta \\right) - 1 - \\log P \\left( Z \\right) \n", 101 | "\\\\ & \\dfrac{P \\left( Y, Z | \\theta \\right) }{\\tilde{P}_{\\theta} \\left( Z \\right) } = e^{1 + \\lambda } \n", 102 | "\\\\ & \\sum_{Z} P \\left( Y, Z | \\theta \\right) = e^{1 + \\lambda } \\sum_{Z} \\tilde{P}_{\\theta} \\left( Z \\right) \\end{align*} \n", 103 | "由于$\\sum_{Z} \\tilde{P}_{\\theta} \\left( Z \\right) = 1$,得 \n", 104 | "\\begin{align*} \\\\ & P \\left( Y \\right) = e^{1 + \\lambda } \\end{align*} \n", 105 | "代回,得 \n", 106 | "\\begin{align*} \\\\ & \\tilde{P}_{\\theta} \\left( Z \\right) = P \\left( Z | Y, \\theta \\right) \\end{align*} " 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": { 112 | "collapsed": true 113 | }, 114 | "source": [ 115 | "则\n", 116 | "\\begin{align*} \\\\ & F \\left( \\tilde{P}, \\theta \\right) = E_{\\tilde{P}} \\left[ \\log P \\left( Y, Z | \\theta \\right)\\right] + H \\left( \\tilde{P} \\right) \n", 117 | "\\\\ & = \\sum_{Z} \\log P \\left( Y, Z | \\theta \\right) \\tilde{P} \\left( Z \\right) - \\sum_{Z} \\log P \\left( Z \\right) \\tilde{P} \\left( Z \\right)\n", 118 | "\\\\ & = \\sum_{Z} \\tilde{P} \\left( Z \\right) \\log \\dfrac{P \\left( Y, Z | \\theta \\right) }{\\tilde{P} \\left( Z \\right) }\n", 119 | "\\\\ & = \\sum_{Z} \\tilde{P} \\left( Z \\right) \\log \\dfrac{P \\left( Z | Y, \\theta \\right) P \\left(Y | \\theta \\right) }{\\tilde{P} \\left( Z \\right) }\n", 120 | "\\\\ & = \\log P \\left(Y | \\theta \\right) \\sum_{Z} \\tilde{P} \\left( Z \\right) \n", 121 | "\\\\ & = \\log P \\left(Y | 
\\theta \\right) \n", 122 | "\\\\ & = L \\left( \\theta \\right) \\end{align*} " 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "对于使$F \\left( \\tilde{P}, \\theta \\right)$达到最大值的参数$\\theta^{*}$,有\n", 130 | "\\begin{align*} L \\left( \\theta^{*} \\right) = F \\left( \\tilde{P}_{\\theta^{*}}, \\theta^{*} \\right) = F \\left( \\tilde{P}^{*}, \\theta^{*} \\right)\\end{align*} \n", 131 | "即,如果$F \\left( \\tilde{P}, \\theta \\right)$在$\\tilde{P}^{*}, \\theta^{*}$达到局部极大值(全局最大值),则$L \\left( \\theta^{*} \\right)$在$\\tilde{P}^{*}, \\theta^{*}$也达到局部极大值(全局最大值)。" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": {}, 137 | "source": [ 138 | "由$\\tilde{P}_{\\theta} \\left( Z \\right) = P \\left( Z | Y, \\theta \\right)$,对固定的$\\theta^{\\left( i \\right) }$,\n", 139 | "\\begin{align*} \\tilde{P}^{\\left( i + 1 \\right)} \\left( Z \\right) = \\tilde{P}_{\\theta^{\\left( i \\right)}} \\left( Z \\right) = P \\left( Z | Y, \\theta^{\\left( i \\right) } \\right)\\end{align*} \n", 140 | "使$F \\left( \\tilde{P}, \\theta^{\\left( i \\right)} \\right)$极大化, \n", 141 | "则\n", 142 | "\\begin{align*} \\\\ & F \\left( \\tilde{P}^{\\left( i + 1 \\right)}, \\theta \\right) = E_{\\tilde{P}^{\\left( i + 1 \\right)}} \\left[ \\log P \\left( Y, Z | \\theta \\right)\\right] + H \\left( \\tilde{P}^{\\left( i + 1 \\right)} \\right) \n", 143 | "\\\\ & = \\sum_{Z} log P \\left(Y , Z | \\theta \\right) P \\left( Z | Y, \\theta^{\\left( i \\right)} \\right) + H \\left( \\tilde{P}^{\\left( i + 1 \\right)} \\right) \n", 144 | "\\\\ & =Q \\left( \\theta, \\theta^{\\left( i \\right)} \\right) + H \\left( \\tilde{P}^{\\left( i + 1 \\right)} \\right)\\end{align*} " 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "固定$\\tilde{P}^{\\left( i + 1 \\right)}$,求$\\theta^{\\left( i \\right)}$使$F \\left( \\tilde{P}^{\\left( i + 1 \\right)}, \\theta \\right)$极大化,得 \n", 152 | "\\begin{align*} \\theta^{\\left( i + 1 \\right)} = \\arg \\max_{\\theta} F \\left( \\tilde{P}^{\\left( i + 1 \\right)}, \\theta \\right) = \\arg \\max_{\\theta} Q \\left( \\theta, \\theta^{\\left( i \\right)} \\right) \\end{align*} \n", 153 | "即,由$EM$算法与$F$函数的极大-极大算法的到的参数估计序列$\\theta^{\\left( i \\right)},i = 1, 2, \\cdots,$是一致的。" 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": {}, 159 | "source": [ 160 | "$GEM$算法: \n", 161 | "输入:观测数据$Y$,$F$函数; \n", 162 | "输出:模型参数$\\theta$ \n", 163 | "1. 初值$\\theta^{\\left(0\\right)}$ \n", 164 | "2. 第$i+1$次迭代,第1步:记$\\theta^{\\left( i \\right)}$为参数$\\theta$的估计值,$\\tilde{P}^{\\left( i \\right)} $为函数$\\tilde{P}$的估计。求$\\tilde{P}^{\\left( i+1 \\right)} $使$\\tilde{P}$极大化$F \\left( \\tilde{P}^{\\left( i + 1 \\right)}, \\theta \\right)$\n", 165 | "3. 第2步:求$\\theta^{\\left( i \\right)}$使$F \\left( \\tilde{P}^{\\left( i + 1 \\right)}, \\theta \\right)$极大化\n", 166 | "4. 
重复(2)和(3),直到收敛。" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "metadata": { 173 | "collapsed": true 174 | }, 175 | "outputs": [], 176 | "source": [] 177 | } 178 | ], 179 | "metadata": { 180 | "kernelspec": { 181 | "display_name": "Python 2", 182 | "language": "python", 183 | "name": "python2" 184 | }, 185 | "language_info": { 186 | "codemirror_mode": { 187 | "name": "ipython", 188 | "version": 2 189 | }, 190 | "file_extension": ".py", 191 | "mimetype": "text/x-python", 192 | "name": "python", 193 | "nbconvert_exporter": "python", 194 | "pygments_lexer": "ipython2", 195 | "version": "2.7.11" 196 | } 197 | }, 198 | "nbformat": 4, 199 | "nbformat_minor": 0 200 | } 201 | -------------------------------------------------------------------------------- /5 dt.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "collapsed": true 7 | }, 8 | "source": [ 9 | "熵表示随机变量不确定性的度量。 \n", 10 | "设$X$是一个取有限个值的离散随机变量,其概率分布为\n", 11 | "\\begin{align*} P \\left( X = x_{i} \\right) = p_{i}, \\quad i =1, 2, \\cdots, n \\end{align*} \n", 12 | "则随机变量$X$的熵\n", 13 | "\\begin{align*} H \\left( X \\right) = H \\left( p \\right) = - \\sum_{i=1}^{n} p_{i} \\log p_{i} \\end{align*} \n", 14 | "其中,若$p_{i}=0$,则定义$0 \\log 0 = 0$" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "若\n", 22 | "\\begin{align*} p_{i} = \\dfrac{1}{n} \\end{align*} \n", 23 | "则\n", 24 | "\\begin{align*} \\\\ & H \\left( p \\right) = - \\sum_{i=1}^{n} p_{i} \\log p_{i} \n", 25 | "\\\\ & = - \\sum_{i=1}^{n} \\dfrac{1}{n} \\log \\dfrac{1}{n}\n", 26 | "\\\\ & = \\log n\\end{align*} \n", 27 | "由定义,得\n", 28 | "\\begin{align*} \\\\ & 0 \\leq H \\left( p \\right) \\leq \\log n\\end{align*} " 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "设有随机变量$\\left( X , Y \\right)$,其联合分布\n", 36 | "\\begin{align*} \\\\ & P \\left( X = x_{i}, Y = y_{j} \\right) = p_{ij}, \\quad i=1,2, \\cdots, n; \\quad j=1,2, \\cdots, m\\end{align*} " 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "随机变量$X$给定的条件下随机变量$Y$的条件熵\n", 44 | "\\begin{align*} \\\\ & H \\left( Y | X \\right) = \\sum_{i=1}^{n} p_{i} H \\left( Y | X = x_{i} \\right) \\end{align*} \n", 45 | "即,$X$给定条件下$Y$的条件概率分布的熵对$X$的数学期望。其中,$p_{i}=P \\left( X = x_{i} \\right), i= 1,2,\\cdots,n$。 \n", 46 | "条件熵$H \\left( Y | X \\right)$表示在已知随机变量$X$的条件下随机变量$Y$的不确定性。" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "特征$A$对训练集$D$的信息增益\n", 54 | "\\begin{align*} \\\\ & g \\left( D, A \\right) = H \\left( D \\right) - H \\left( D | A \\right) \\end{align*} \n", 55 | "即,集合$D$的经验熵$H \\left( D \\right)$与特征$A$给定条件下$D$的经验条件熵$H \\left( D | A \\right)$之差。 \n", 56 | "其中,当熵和条件熵由数据估计(极大似然估计)得到时,对应的熵和条件熵分别称为经验熵和经验条件熵。 \n", 57 | "信息增益$g \\left( X , Y \\right)$表示已知特征$X$的信息而使得类$Y$的信息的不确定性减少的程度。" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "设训练数据集为$D$,$\\left| D \\right|$表示其样本容量,即样本个数。 \n", 65 | "设有$K$个类$C_{k}, k=1,2,\\cdots,K$,$\\left| C_{k} \\right|$为属于类$C_{k}$的样本的个数,$\\sum_{k=1}^{K} \\left| C_{k} \\right| = \\left| D \\right|$。 \n", 66 | "设特征$A$有$n$个不同的特征取值$\\left\\{ a_{1},a_{2},\\cdots,a_{n}\\right\\}$,根据特征$A$的取值将$D$划分为$n$个子集$D_{1},D_{2},\\cdots,D_{n}$,$\\left| D_{i} \\right|$为$D_{i}$的样本数,$\\sum_{i=1}^{n}\\left| D_{i} \\right| = \\left| D \\right|$。 \n", 67 | 
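按上面熵、条件熵与信息增益的定义,可以用几行 NumPy 直接计算(数据为虚构小样本,仅作演示):

```python
import numpy as np

def entropy(labels):
    """经验熵 H(D) = -sum_k p_k log2 p_k。"""
    _, cnt = np.unique(labels, return_counts=True)
    p = cnt.astype(float) / cnt.sum()
    return -(p * np.log2(p)).sum()

def info_gain(feature, labels):
    """信息增益 g(D, A) = H(D) - H(D|A)。"""
    h_d = entropy(labels)
    h_cond = 0.0
    for a in np.unique(feature):
        mask = feature == a
        # H(D|A) = sum_i |D_i|/|D| * H(D_i)
        h_cond += mask.mean() * entropy(labels[mask])
    return h_d - h_cond

# 演示:特征"是否有工作"对类别(是否批准贷款)的信息增益
A = np.array(["是", "是", "否", "否", "否", "否"])
D = np.array([1, 1, 1, 0, 0, 1])
print(entropy(D), info_gain(A, D))
```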
"记子集$D_{i}$中属于类$C_{k}$的样本的集合为$D_{ik}$,即$D_{ik} = D_{i} \\cap C_{k}$,$\\left| D_{ik} \\right|$为$D_{ik}$的样本个数。" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "信息增益算法: \n", 75 | "输入:训练数据集$D$和特征$A$ \n", 76 | "输出:特征$A$对训练数据集$D$的信息增益$g \\left( D, A \\right) $\n", 77 | "1. 计算数据集$D$的经验熵$H\\left(D\\right)$ \n", 78 | "\\begin{align*} \\\\ & H \\left( D \\right) = -\\sum_{k=1}^{K} \\dfrac{\\left|C_{k}\\right|}{\\left| D \\right|}\\log_{2}\\dfrac{\\left|C_{k}\\right|}{\\left| D \\right|} \\end{align*}\n", 79 | "2. 计算特征$A$对数据集$D$的经验条件熵$H \\left( D | A \\right)$\n", 80 | "\\begin{align*} \\\\ & H \\left( D | A \\right) = \\sum_{i=1}^{n} \\dfrac{\\left| D_{i} \\right|}{\\left| D \\right|} H \\left( D_{i} \\right)\n", 81 | "\\\\ & = \\sum_{i=1}^{n} \\dfrac{\\left| D_{i} \\right|}{\\left| D \\right|} \\sum_{k=1}^{K} \\dfrac{\\left| D_{ik} \\right|}{\\left| D_{i} \\right|} \\log_{2} \\dfrac{\\left| D_{ik} \\right|}{\\left| D_{i} \\right|}\\end{align*}\n", 82 | "3. 计算信息增益\n", 83 | "\\begin{align*} \\\\ & g \\left( D, A \\right) = H \\left( D \\right) - H \\left( D | A \\right) \\end{align*}" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "特征$A$对训练集$D$的信息增益比\n", 91 | "\\begin{align*} \\\\ & g_{R} \\left( D, A \\right) = \\dfrac{g \\left( D, A \\right)}{H_{A} \\left(D\\right)}\\end{align*} \n", 92 | "即,信息增益$g\\left( D, A \\right)$与训练数据集$D$关于特征$A$的经验熵$H_{A}\\left(D\\right)$之比。 \n", 93 | "其中,\n", 94 | "\\begin{align*} \\\\ & H_{A} \\left( D \\right) = -\\sum_{i=1}^{n} \\dfrac{\\left|D_{i}\\right|}{\\left|D\\right|}\\log_{2}\\dfrac{\\left|D_{i}\\right|}{\\left|D\\right|}\\end{align*} " 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": { 100 | "collapsed": true 101 | }, 102 | "source": [ 103 | "ID3算法: \n", 104 | "输入:训练数据集$D$,特征$A$,阈值$\\varepsilon$ \n", 105 | "输出:决策树$T$\n", 106 | "1. 若$D$中所有实例属于同一类$C_{k}$,则$T$为单结点树,并将类$C_{k}$作为该结点的类标记,返回$T$; \n", 107 | "2. 若$A = \\emptyset$,则$T$为单结点树,并将$D$中实例数最大的类$C_{k}$作为该结点的类标记,返回$T$;\n", 108 | "3. 否则,计算$A$中各特征$D$的信息增益,选择信息增益最大的特征$A_{g}$\n", 109 | "\\begin{align*} \\\\ & A_{g} = \\arg \\max_{A} g \\left( D, A \\right) \\end{align*} \n", 110 | "4. 如果$A_{g}$的信息增益小于阈值$\\varepsilon$,则置$T$为单结点树,并将$D$中实例数量最大的类$C_{k}$作为该结点的类标记,返回$T$;\n", 111 | "5. 否则,对$A_{g}$的每一个可能值$a_{i}$,依$A_{g}=a_{i}$将$D$分割为若干非空子集$D_{i}$,将$D_{i}$中实例数对大的类作为标记,构建子结点,由结点及其子结点构成树$T$,返回$T$;\n", 112 | "6. 对第$i$个子结点,以$D_{i}$为训练集,以$A-\\left\\{A_{g}\\right\\}$为特征集,递归地调用步1.~步5.,得到子树$T_{i}$,返回$T_{i}$。" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": { 118 | "collapsed": true 119 | }, 120 | "source": [ 121 | "C4.5算法: \n", 122 | "输入:训练数据集$D$,特征$A$,阈值$\\varepsilon$ \n", 123 | "输出:决策树$T$\n", 124 | "1. 若$D$中所有实例属于同一类$C_{k}$,则$T$为单结点树,并将类$C_{k}$作为该结点的类标记,返回$T$; \n", 125 | "2. 若$A = \\emptyset$,则$T$为单结点树,并将$D$中实例数最大的类$C_{k}$作为该结点的类标记,返回$T$;\n", 126 | "3. 否则,计算$A$中各特征$D$的信息增益,选择信息增益比最大的特征$A_{g}$\n", 127 | "\\begin{align*} \\\\ & A_{g} = \\arg \\max_{A} g_{R} \\left( D, A \\right) \\end{align*} \n", 128 | "4. 如果$A_{g}$的信息增益小于阈值$\\varepsilon$,则置$T$为单结点树,并将$D$中实例数量最大的类$C_{k}$作为该结点的类标记,返回$T$;\n", 129 | "5. 否则,对$A_{g}$的每一个可能值$a_{i}$,依$A_{g}=a_{i}$将$D$分割为若干非空子集$D_{i}$,将$D_{i}$中实例数对大的类作为标记,构建子结点,由结点及其子结点构成树$T$,返回$T$;\n", 130 | "6. 
对第$i$个子结点,以$D_{i}$为训练集,以$A-\\left\\{A_{g}\\right\\}$为特征集,递归地调用步1.~步5.,得到子树$T_{i}$,返回$T_{i}$。" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "决策树的剪枝通过极小化决策树整体的损失函数或代价函数来实现。 \n", 138 | "设树$T$的叶结点个数为$\\left| T \\right|$,$t$是树$T$的叶结点,该叶结点有$N_{t}$个样本点,其中$k$类的样本点有$N_{tk}$个,$k=1,2,\\cdots,K$,$H_{t}\\left(T\\right)$为叶结点$t$上的经验熵, \n", 139 | "则决策树的损失函数\n", 140 | "\\begin{align*} \\\\ & C_{\\alpha} \\left( T \\right) = \\sum_{t=1}^{\\left| T \\right|} N_{t} H_{t} \\left( T \\right) + \\alpha \\left| T \\right| \\end{align*} \n", 141 | "其中,$\\alpha \\geq 0$为参数,经验熵\n", 142 | "\\begin{align*} \\\\ & H_{t} \\left( T \\right) = - \\sum_{k} \\dfrac{N_{tk}}{N_{t}} \\log \\dfrac{N_{tk}}{N_{t}} \\end{align*} " 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "损失函数中,记\n", 150 | "\\begin{align*} \\\\ & C \\left( T \\right) = \\sum_{t=1}^{\\left| T \\right|} N_{t} H_{t} \\left( T \\right) = - \\sum_{t=1}^{\\left| T \\right|} \\sum_{k=1}^{K} N_{tk} \\log \\dfrac{N_{tk}}{N_{t}} \\end{align*} \n", 151 | "则\n", 152 | "\\begin{align*} \\\\ & C_{\\alpha} \\left( T \\right) = C \\left( T \\right) + \\alpha \\left| T \\right| \\end{align*} \n", 153 | "其中,$C \\left( T \\right)$表示模型对训练数据的预测误差,即模型与训练数据的拟合程度,$\\left| T \\right|$表示模型复杂度,参数$\\alpha \\geq 0$控制两者之间的影响。" 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": {}, 159 | "source": [ 160 | "树的剪枝算法: \n", 161 | "输入:决策树$T$,参数$\\alpha$ \n", 162 | "输出:修剪后的子树$T_{\\alpha}$\n", 163 | "1. 计算每个结点的经验熵 \n", 164 | "2. 递归地从树的叶结点向上回缩 \n", 165 | "设一组叶结点回缩到其父结点之前与之后的整体树分别为$T_{B}$与$T_{A}$,其对应的损失函数值分别是$C_{\\alpha} \\left( T_{B} \\right)$与$C_{\\alpha} \\left( T_{A} \\right)$,如果\n", 166 | "\\begin{align*} \\\\ & C_{\\alpha} \\left( T_{A} \\right) \\leq C_{\\alpha} \\left( T_{B} \\right) \\end{align*}\n", 167 | "则进行剪枝,即将父结点变为新的叶结点。\n", 168 | "3. 返回2.,直到不能继续为止,得到损失函数最小的子树$T_{\\alpha}$" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": { 174 | "collapsed": true 175 | }, 176 | "source": [ 177 | "假设$X$与$Y$分别为输入和输出变量,并且$Y$是连续变量,给定训练数据集\n", 178 | "\\begin{align*} \\\\ & D = \\left\\{ \\left(x_{1},y_{1}\\right), \\left(x_{2},y_{2}\\right),\\cdots,\\left(x_{N},y_{N}\\right) \\right\\} \\end{align*} \n", 179 | "可选择第$j$个变量$x_{j}$及其取值$s$作为切分变量和切分点,并定义两个区域\n", 180 | "\\begin{align*} \\\\ & R_{1} \\left( j,s \\right) = \\left\\{ x | x_{j} \\leq s \\right\\}, \\quad R_{2} \\left( j,s \\right) = \\left\\{ x | x_{j} > s \\right\\} \\end{align*}\n", 181 | "最优切分变量$x_{j}$及最优切分点$s$\n", 182 | "\\begin{align*} \\\\ & j,s = \\arg \\min_{j,s} \\left[ \\min_{c_{1}} \\sum_{x_{i} \\in R_{1} \\left(j,s\\right)} \\left( y_{i} - c_{1} \\right)^{2} + \\min_{c_{2}} \\sum_{x_{i} \\in R_{2} \\left(j,s\\right)} \\left( y_{i} - c_{2} \\right)^{2}\\right] \\end{align*} \n", 183 | "其中,$c_{m}$是区域$R_{m}$上的回归决策树输出,是区域$R_{m}$上所有输入实例$x_{i}$对应的输出$y_{i}$的均值\n", 184 | "\\begin{align*} \\\\ & c_{m} = ave \\left( y_{i} | x_{i} \\in R_{m} \\right), \\quad m=1,2 \\end{align*} \n", 185 | "对每个区域$R_{1}$和$R_{2}$重复上述过程,将输入空间划分为$M$个区域$R_{1},R_{2},\\cdots,R_{M}$,在每个区域上的输出为$c_{m},m=1,2,\\cdots,M$,最小二乘回归树\n", 186 | "\\begin{align*} \\\\ & f \\left( x \\right) = \\sum_{m=1}^{M} c_{m} I \\left( x \\in R_{m} \\right) \\end{align*} " 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "最小二乘回归树生成算法: \n", 194 | "输入:训练数据集$D$ \n", 195 | "输出:回归树$f \\left( x \\right)$\n", 196 | "1. 
选择最优切分变量$x_{j}$与切分点$s$\n", 197 | "\\begin{align*} \\\\ & j,s = \\arg \\min_{j,s} \\left[ \\min_{c_{1}} \\sum_{x_{i} \\in R_{1} \\left(j,s\\right)} \\left( y_{i} - c_{1} \\right)^{2} + \\min_{c_{2}} \\sum_{x_{i} \\in R_{2} \\left(j,s\\right)} \\left( y_{i} - c_{2} \\right)^{2}\\right] \\end{align*} \n", 198 | "2. 用最优切分变量$x_{j}$与切分点$s$划分区域并决定相应的输出值 \n", 199 | "\\begin{align*} \\\\ & R_{1} \\left( j,s \\right) = \\left\\{ x | x_{j} \\leq s \\right\\}, \\quad R_{2} \\left( j,s \\right) = \\left\\{ x | x_{j} > s \\right\\} \n", 200 | "\\\\ & c_{m} = \\dfrac{1}{N_{m}} \\sum_{x_{i} \\in R_{m} \\left( j,s \\right)} y_{i}, \\quad m=1,2\\end{align*}\n", 201 | "3. 继续对两个子区域调用步骤1.和2.,直到满足停止条件\n", 202 | "4. 将输入空间划分为$M$个区域$R_{1},R_{2},\\cdots,R_{M}$,生成决策树\n", 203 | "\\begin{align*} \\\\ & f \\left( x \\right) = \\sum_{m=1}^{M} c_{m} I \\left( x \\in R_{m} \\right) \\end{align*} " 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": {}, 209 | "source": [ 210 | "分类问题中,假设有$K$个类,样本点属于第$k$类的概率为$p_{k}$,则概率分布的基尼指数\n", 211 | "\\begin{align*} \\\\ & Gini \\left( p \\right) = \\sum_{k=1}^{K} p_{k} \\left( 1 - p_{k} \\right) = 1 - \\sum_{k=1}^{K} p_{k}^{2} \\end{align*} " 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "对于二分类问题,若样本点属于第1类的概率为$p$,则概率分布的基尼指数\n", 219 | "\\begin{align*} \\\\ & Gini \\left( p \\right) = \\sum_{k=1}^{2} p_{k} \\left( 1 - p_{k} \\right) = 2p\\left(1-p\\right) \\end{align*} " 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "对于给定的样本集合$D$,其基尼指数\n", 227 | "\\begin{align*} \\\\ & Gini \\left( D \\right) = 1 - \\sum_{k=1}^{K} \\left( \\dfrac{\\left| C_{k} \\right|}{\\left| D \\right|} \\right)^{2}\\end{align*} \n", 228 | "其中,$C_{k}$是$D$中属于第$k$类的样本子集,$K$是类别个数。" 229 | ] 230 | }, 231 | { 232 | "cell_type": "markdown", 233 | "metadata": {}, 234 | "source": [ 235 | "如果样本集合$D$根据特征$A$是否取某一可能值$a$被分割成$D_{1}$和$D_{2}$两个部分,即\n", 236 | "\\begin{align*} \\\\ & D_{1} = \\left\\{ \\left(x,y\\right) | A\\left(x\\right)=a \\right\\}, \\quad D_{2} = D - D_{1} \\end{align*} \n", 237 | "则在特征$A$的条件下,集合$D$的基尼指数\n", 238 | "\\begin{align*} \\\\ & Gini \\left( D, A \\right) = \\dfrac{\\left| D_{1} \\right|}{\\left| D \\right|} Gini \\left( D_{1} \\right) + \\dfrac{\\left| D_{2} \\right|}{\\left| D \\right|} Gini \\left( D_{2} \\right)\\end{align*} " 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "基尼指数$Gini \\left( D \\right)$表示集合$D$的不确定性,基尼指数$Gini \\left( D,A \\right)$表示经$A=a$分割后集合$D$的不确定性。基尼指数值越大,样本集合的不确定性也越大。" 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": {}, 251 | "source": [ 252 | "CART生成算法: \n", 253 | "输入:训练数据集$D$,特征$A$,阈值$\\varepsilon$ \n", 254 | "输出:CART决策树$T$\n", 255 | "1. 设结点的训练数据集为$D$,对每一个特征$A$,对其可能取的每个值$a$,根据样本点对$A=a$的测试为“是”或“否”将$D$分割成$D_{1}$和$D_{2}$两部分,并计算$Gini\\left(D,A\\right)$\n", 256 | "2. 在所有可能的特征$A$以及其所有可能的切分点$a$中,选择基尼指数最小的特征及其对应的切分点作为最优特征与最优切分点。依此从现结点生成两个子结点,将训练数据集依特征分配到两个子结点中去。\n", 257 | "3. 对两个子结点递归地调用1.和2.,直至满足停止条件\n", 258 | "4. 
生成CART决策树$T$" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": { 264 | "collapsed": true 265 | }, 266 | "source": [ 267 | "对整体树$T_{0}$任意内部结点$t$,以$t$为单结点树的损失函数\n", 268 | "\\begin{align*} \\\\ & C_{\\alpha} \\left( t \\right) = C \\left( t \\right) + \\alpha \\end{align*} \n", 269 | "以$t$为根结点的子树$T_{t}$的损失函数\n", 270 | "\\begin{align*} \\\\ & C_{\\alpha} \\left( T_{t} \\right) = C \\left( T_{t} \\right) + \\alpha \\left| T_{t} \\right| \\end{align*} \n", 271 | "当$\\alpha = 0$及$\\alpha$充分小时,有不等式\n", 272 | "\\begin{align*} \\\\ & C_{\\alpha} \\left( T_{t} \\right) < C_{\\alpha} \\left( t \\right) \\end{align*} \n", 273 | "当$\\alpha$增大时,在某一$\\alpha$有\n", 274 | "\\begin{align*} \\\\ & \\quad\\quad C_{\\alpha} \\left( T_{t} \\right) = C_{\\alpha} \\left( t \\right) \n", 275 | "\\\\ & C \\left( T_{t} \\right) + \\alpha \\left| T_{t} \\right| = C \\left( t \\right) + \\alpha\n", 276 | "\\\\ & \\quad\\quad \\alpha = \\dfrac{C\\left( t \\right) - C \\left(T_{t}\\right)} { \\left| T_{t} \\right| -1 }\\end{align*} \n", 277 | "即$T_{t}$与$t$有相同的损失函数值,而$t$的结点少,因此对$T_{t}$进行剪枝。" 278 | ] 279 | }, 280 | { 281 | "cell_type": "markdown", 282 | "metadata": {}, 283 | "source": [ 284 | "CART剪枝算法: \n", 285 | "输入:CART决策树$T_{0}$ \n", 286 | "输出:最优决策树$T_{\\alpha}$\n", 287 | "1. 设$k=0, T=T_{0}$\n", 288 | "2. 设$\\alpha=+\\infty$\n", 289 | "3. 自下而上地对各内部结点$t$计算$ C\\left(T_{t}\\right),\\left| T_{t} \\right|$,以及\n", 290 | "\\begin{align*} \\\\ & g\\left(t\\right) = \\dfrac{C\\left( t \\right) - C \\left(T_{t}\\right)} { \\left| T_{t} \\right| -1 }\n", 291 | "\\\\ & \\alpha = \\min \\left( \\alpha, g\\left( t \\right) \\right) \\end{align*} \n", 292 | "其中,$T_{t}$表示以$t$为根结点的子树,$ C\\left(T_{t}\\right)$是对训练数据的预测误差,$\\left| T_{t} \\right|$是$T_{t}$的叶结点个数。\n", 293 | "4. 自下而上地访问内部结点$t$,如果有$g\\left(t\\right)=\\alpha$,则进行剪枝,并对叶结点$t$以多数表决法决定其类别,得到树$T$\n", 294 | "5. 设$k=k+1, \\alpha_{k}=\\alpha, T_{k}=T$\n", 295 | "6. 如果$T$不是由根结点单独构成的树,则回到步骤4.\n", 296 | "7. 
280 | {
281 | "cell_type": "markdown",
282 | "metadata": {},
283 | "source": [
284 | "CART剪枝算法: \n",
285 | "输入:CART决策树$T_{0}$ \n",
286 | "输出:最优决策树$T_{\alpha}$\n",
287 | "1. 设$k=0, T=T_{0}$\n",
288 | "2. 设$\alpha=+\infty$\n",
289 | "3. 自下而上地对各内部结点$t$计算$ C\left(T_{t}\right),\left| T_{t} \right|$,以及\n",
290 | "\begin{align*} \\ & g\left(t\right) = \dfrac{C\left( t \right) - C \left(T_{t}\right)} { \left| T_{t} \right| -1 }\n",
291 | "\\ & \alpha = \min \left( \alpha, g\left( t \right) \right) \end{align*} \n",
292 | "其中,$T_{t}$表示以$t$为根结点的子树,$ C\left(T_{t}\right)$是对训练数据的预测误差,$\left| T_{t} \right|$是$T_{t}$的叶结点个数。\n",
293 | "4. 自下而上地访问内部结点$t$,如果有$g\left(t\right)=\alpha$,则进行剪枝,并对叶结点$t$以多数表决法决定其类别,得到树$T$\n",
294 | "5. 设$k=k+1, \alpha_{k}=\alpha, T_{k}=T$\n",
295 | "6. 如果$T$不是由根结点单独构成的树,则回到步骤4.\n",
296 | "7. 采用交叉验证法在子树序列$T_{0},T_{1},\cdots,T_{n}$中选取最优子树$T_{\alpha}$"
297 | ]
298 | },
299 | {
300 | "cell_type": "code",
301 | "execution_count": null,
302 | "metadata": {
303 | "collapsed": true
304 | },
305 | "outputs": [],
306 | "source": []
307 | }
308 | ],
309 | "metadata": {
310 | "kernelspec": {
311 | "display_name": "Python 2",
312 | "language": "python",
313 | "name": "python2"
314 | },
315 | "language_info": {
316 | "codemirror_mode": {
317 | "name": "ipython",
318 | "version": 2
319 | },
320 | "file_extension": ".py",
321 | "mimetype": "text/x-python",
322 | "name": "python",
323 | "nbconvert_exporter": "python",
324 | "pygments_lexer": "ipython2",
325 | "version": "2.7.11"
326 | }
327 | },
328 | "nbformat": 4,
329 | "nbformat_minor": 0
330 | }
331 | 
--------------------------------------------------------------------------------
/6.2 me.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "训练数据集\n",
8 | "\begin{align*} \\& T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\} \end{align*} \n",
9 | "假设分类模型是条件概率分布$P \left( Y | X \right)$,$X \in \mathcal{X} \subseteq R^{n}$表示输入,$Y \in \mathcal{Y}$表示输出。给定输入$X$,以条件概率$P \left( Y | X \right)$输出$Y$。"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "特征函数$f \left( x, y \right)$描述输入$x$和输出$y$之间的某一事实,\n",
17 | "\begin{align*} f \left( x, y \right) = \left\{\n",
18 | "\begin{aligned} \n",
19 | "\ & 1, x与y满足某一事实\n",
20 | "\\ & 0, 否则\n",
21 | "\end{aligned}\n",
22 | "\right.\end{align*} "
23 | ]
24 | },
25 | {
26 | "cell_type": "markdown",
27 | "metadata": {},
28 | "source": [
29 | "特征函数$f \left( x, y \right)$关于经验分布$ \tilde{P} \left( X, Y \right) $的期望\n",
30 | "\begin{align*} \\& E_{ \tilde{P} } \left( f \right) = \sum_{x, y} \tilde{P} \left( x, y \right) f \left( x, y \right) \end{align*} "
31 | ]
32 | },
33 | {
34 | "cell_type": "markdown",
35 | "metadata": {},
36 | "source": [
37 | "特征函数$f \left( x, y \right)$关于模型$ P \left( Y | X \right) $与经验分布$ \tilde{P} \left( X \right) $的期望\n",
38 | "\begin{align*} \\& E_{ P } \left( f \right) = \sum_{x, y} \tilde{P} \left( x \right) P \left( y | x \right) f \left( x, y \right) \end{align*} "
39 | ]
40 | },
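{
"cell_type": "markdown",
"metadata": {},
"source": [
"下面在一个虚构的小数据集上示意经验分布$\tilde{P} \left( x, y \right)$与特征期望$E_{ \tilde{P} } \left( f \right)$的统计(示意草稿,特征函数f为自拟示例)。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 示意代码:统计经验分布 P~(x, y) 并计算特征期望 E_P~(f)\n",
"from collections import Counter\n",
"\n",
"data = [('sunny', 1), ('sunny', 1), ('rainy', 0), ('sunny', 0)]\n",
"N = float(len(data))\n",
"p_xy = {xy: c / N for xy, c in Counter(data).items()}    # P~(x, y)\n",
"\n",
"def f(x, y):\n",
"    # 示例特征函数:x 为 'sunny' 且 y = 1 时取 1,否则取 0\n",
"    return 1.0 if x == 'sunny' and y == 1 else 0.0\n",
"\n",
"E_f = sum(p * f(x, y) for (x, y), p in p_xy.items())\n",
"print(E_f)    # 2/4 = 0.5"
]
},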
s.t.\\quad E_{ P } \\left( f_{i} \\right) = E_{ \\tilde{P} } \\left( f_{i} \\right), i = 1,2, \\cdots, n \n", 59 | "\\\\ & \\sum_{y} P \\left( y | x \\right) = 1 \\end{align*} \n", 60 | "等价的 \n", 61 | "\\begin{align*} \\\\& \\min_{P \\in \\mathcal{C} } \\quad -H \\left( P \\right) = \\sum_{x,y} \\tilde{P} \\left( x \\right) P \\left( y | x \\right) \\log P \\left( y | x \\right) \n", 62 | "\\\\ & s.t.\\quad E_{ P } \\left( f_{i} \\right) - E_{ \\tilde{P} } \\left( f_{i} \\right) = 0, i = 1,2, \\cdots, n \n", 63 | "\\\\ & \\sum_{y} P \\left( y | x \\right) = 1 \\end{align*} " 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": { 69 | "collapsed": true 70 | }, 71 | "source": [ 72 | "最优化问题的求解: \n", 73 | "1. 引入拉格朗日乘子$w_{i}, i = 0,1, \\cdots, n$,定义拉格朗日函数$L \\left( P, w \\right)$\n", 74 | "\\begin{align*} \\\\& L \\left( P, w \\right) = - H \\left( P \\right) + w_{0} \\left( 1 - \\sum_{y} P \\left( y | x \\right) \\right) + \\sum_{i=1}^{n} w_{i} \\left( E_{P} \\left( f_{i} \\right) - E_{\\tilde{P}} \\left( f_{i} \\right) \\right) \n", 75 | "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x \\right) P \\left( y | x \\right) \\log P \\left( y | x \\right) + w_{0} \\left( 1 - \\sum_{y} P \\left( y | x \\right) \\right) \n", 76 | "\\\\ & \\quad + \\sum_{i=1}^{n} w_{i} \\left( \\sum_{x, y} \\tilde{P} \\left( x \\right) P \\left( y | x \\right) f_{i} \\left( x, y \\right) - \\sum_{x, y} \\tilde{P} \\left( x, y \\right) f_{i} \\left( x, y \\right) \\right) \\end{align*} \n", 77 | "2. 求$\\min_{P \\in \\mathcal{C} } L \\left( P, w \\right)$: \n", 78 | "记对偶函数$\\Psi \\left( w \\right) = min_{P \\in \\mathcal{C} } L \\left( P, w \\right) = L \\left( P_{w}, w \\right)$,其解记$P_{w} = \\arg \\min_{P \\in \\mathcal{C} } L \\left( P, w \\right) = P_{w} \\left( y | x \\right)$\n", 79 | "\\begin{align*} \\\\& \\dfrac {\\partial L \\left( P, w \\right)} {\\partial P \\left( y | x \\right)} = \\sum_{x,y} \\tilde{P} \\left( x \\right) \\left( \\log P \\left( y | x \\right) + 1 \\right) - \\sum_{y} w_{0} - \\sum_{x,y} \\left( \\tilde{P} \\left( x \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) \n", 80 | "\\\\ & \\quad = \\sum_{x,y} \\tilde{P} \\left( x \\right) \\left( \\log P \\left( y | x \\right) + 1 \\right) - \\sum_{x,y} P \\left( x \\right) w_{0} - \\sum_{x,y} \\left( \\tilde{P} \\left( x \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) \n", 81 | "\\\\ & \\quad = \\sum_{x,y} \\tilde{P} \\left( x \\right) \\left( \\log P \\left( y | x \\right) + 1 - w_{0} - \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) = 0\\end{align*} \n", 82 | "由于$\\tilde{P} \\left( x \\right) > 0 $,得\n", 83 | "\\begin{align*} \\\\ & \\log P \\left( y | x \\right) + 1 - w_{0} - \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right)=0\n", 84 | "\\\\ & P \\left( y | x \\right) = \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) + w_{0} -1 \\right) = \\dfrac{ \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) }{ \\exp \\left( 1 - w_{0} \\right)}\\end{align*} \n", 85 | "由于 \n", 86 | "\\begin{align*} \\\\& \\sum_{y} P \\left( y | x \\right) = 1 \\end{align*} \n", 87 | "则\n", 88 | "\\begin{align*} \\\\ & \\sum_{y} P \\left( y | x \\right) = \\sum_{y} \\dfrac{ \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) }{ \\exp \\left( 1 - w_{0} \\right)} = 1 \n", 89 | "\\\\ & \\sum_{y} \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) = \\exp \\left( 1 - w_{0} \\right)\\end{align*} \n", 90 | "代入,得\n", 91 | "\\begin{align*} 
\\\\ & P \\left( y | x \\right) = \\dfrac{1 }{Z_{w} \\left( x \\right)}\\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) \\end{align*} \n", 92 | "其中\n", 93 | "\\begin{align*} Z_{w} = \\sum_{y} \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) \\end{align*} \n", 94 | "$Z_{w}$称为规范化因子;$f_{i} \\left( x, y \\right)$是特征函数;$w_{i}$是特征的权值。\n", 95 | "3. 求$\\max_{w} \\Psi \\left( w \\right)$ \n", 96 | "将其解记为$w^{*}$,即\n", 97 | "\\begin{align*} w^{*} = \\arg \\max_{w} \\Psi \\left( w \\right) \\end{align*} " 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "已知训练数据的经验概率分布$\\tilde{P} \\left( X, Y \\right)$,则条件概率分布$P \\left( X | Y \\right)$的对数似然函数\n", 105 | "\\begin{align*} \\\\ & L_{\\tilde{P}} \\left( P_{w} \\right) = \\log \\prod_{x,y} P \\left( y | x \\right)^{\\tilde{P} \\left( x, y \\right)} \n", 106 | "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\log P \\left( y | x \\right)\n", 107 | "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\log \\dfrac{\\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right)}{Z_{w} \\left( x \\right) }\n", 108 | "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) - \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\log Z_{w} \\left( x \\right) \n", 109 | "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) - \\sum_{x} \\tilde{P} \\left( x \\right) \\log Z_{w} \\left( x \\right)\\end{align*} \n", 110 | "对偶函数\n", 111 | "\\begin{align*} \\\\ & \\Psi \\left( w \\right) = min_{P \\in \\mathcal{C} } L \\left( P, w \\right) = L \\left( P_{w}, w \\right) \\\\ & = - H \\left( P_{w} \\right) + w_{0} \\left( 1 - \\sum_{y} P_{w} \\left( y | x \\right) \\right) + \\sum_{i=1}^{n} w_{i} \\left( E_{\\tilde{P}} \\left( f_{i} \\right) - E_{P_{w}} \\left( f_{i} \\right) \\right) \n", 112 | "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x \\right) P_{w} \\left( y | x \\right) \\log P_{w} \\left( y | x \\right)\n", 113 | "\\\\& \\quad\\quad\\quad + w_{0} \\left( 1 - \\sum_{y} \\dfrac{1 }{Z_{w} \\left( x \\right)}\\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) \\right)\n", 114 | "\\\\ & \\quad\\quad\\quad + \\sum_{i=1}^{n} w_{i} \\left( \\sum_{x, y} \\tilde{P} \\left( x, y \\right) f_{i} \\left( x, y \\right) - \\sum_{x, y} \\tilde{P} \\left( x \\right) P_{w} \\left( y | x \\right) f_{i} \\left( x, y \\right) \\right) \n", 115 | "\\\\ & = \\sum_{x, y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) + \\sum_{x,y} \\tilde{P} \\left( x \\right) P_{w} \\left( y | x \\right) \\left( \\log P_{w} \\left( y | x \\right) - \\sum_{i=1}^{n} w_{i} f_{i} \\left(x, y \\right) \\right) \n", 116 | "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) - \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\log Z_{w} \\left( x \\right) \n", 117 | "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) - \\sum_{x} \\tilde{P} \\left( x \\right) \\log Z_{w} \\left( x \\right)\\end{align*} \n", 118 | "得\n", 119 | "\\begin{align*} \\\\ & L_{\\tilde{P}} \\left( P_{w} \\right) = \\Psi \\left( w \\right)\\end{align*} \n", 120 | "即,最大熵模型的极大似然估计等价于对偶函数极大化。" 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": { 126 | "collapsed": true 127 | }, 128 | "source": [ 129 | "已知最大熵模型\n", 130 | "\\begin{align*} \\\\ & P_{w} \\left( y | x 
\\right) = \\dfrac{1 }{Z_{w} \\left( x \\right)}\\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) \\end{align*} \n", 131 | "其中\n", 132 | "\\begin{align*} Z_{w} = \\sum_{y} \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) \\end{align*} \n", 133 | "$Z_{w}$称为规范化因子;$f_{i} \\left( x, y \\right)$是特征函数;$w_{i}$是特征的权值。" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "对数似然函数\n", 141 | "\\begin{align*} \\\\ & L \\left( w \\right) = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\log P_{w} \\left( y | x \\right)\n", 142 | "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) - \\sum_{x} \\tilde{P} \\left( x \\right) \\log Z_{w} \\left( x \\right) \\end{align*} " 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "对于给定的经验分布$\\tilde{P}$,模型参数从$w$到$w + \\delta$,对数似然函数的改变量\n", 150 | "\\begin{align*} \\\\ & L \\left( w + \\delta \\right) - L \\left( w \\right) = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\log P_{w + \\delta} \\left( y | x \\right) - \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\log P_{w} \\left( y | x \\right)\n", 151 | "\\\\ & = \\left( \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\left( w_{i} + \\delta_{i} \\right) f_{i} \\left( x, y \\right) - \\sum_{x} \\tilde{P} \\left( x \\right) \\log Z_{w + \\delta} \\left( x \\right) \\right) \n", 152 | "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad - \\left( \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) - \\sum_{x} \\tilde{P} \\left( x \\right) \\log Z_{w} \\left( x \\right) \\right)\n", 153 | "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) - \\sum_{x} \\tilde{P} \\left( x \\right) \\log \\dfrac{Z_{w + \\delta} \\left( x \\right)}{Z_{w} \\left( x \\right)}\\end{align*} " 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": {}, 159 | "source": [ 160 | "由\n", 161 | "\\begin{align*} - \\log \\alpha \\geq 1 - \\alpha, \\alpha > 0 \\end{align*} \n", 162 | "得\n", 163 | "\\begin{align*} \\\\ & L \\left( w + \\delta \\right) - L \\left( w \\right) \\geq \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\dfrac{Z_{w + \\delta} \\left( x \\right)}{Z_{w} \\left( x \\right)}\n", 164 | "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\dfrac{\\sum_{y} \\exp \\left( \\sum_{i=1}^{n} \\left( w_{i} + \\delta_{i} \\right) f_{i} \\left( x, y \\right) \\right)}{\\sum_{y} \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right)} \n", 165 | "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\sum_{y} \\dfrac{ \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right)}{\\sum_{y} \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right)} \\exp \\left( \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) \\right) \n", 166 | "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\sum_{y} P_{w} \\left( y | x \\right) \\exp \\left( \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( 
x, y \\right) \\right)\\end{align*} " 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "记\n", 174 | "\\begin{align*} \\\\ & A \\left( \\delta | w \\right) = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\sum_{y} P_{w} \\left( y | x \\right) \\exp \\left( \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) \\right)\\end{align*} \n", 175 | "则 \n", 176 | "\\begin{align*} \\\\ & L \\left( w + \\delta \\right) - L \\left( w \\right) \\geq A \\left( \\delta | w \\right)\\end{align*} \n", 177 | "即$ A \\left( \\delta | w \\right)$是对数似然函数改变量的一个下届。" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": { 183 | "collapsed": true 184 | }, 185 | "source": [ 186 | "引入\\begin{align*} \\\\ & f^{\\#} \\left( x, y \\right) = \\sum_{i} f_{i} \\left( x, y \\right) \\end{align*} \n", 187 | "表示所有特征在$\\left( x, y \\right)$出现的次数 \n", 188 | "则\n", 189 | "\\begin{align*} \\\\ & A \\left( \\delta | w \\right) = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\sum_{y} P_{w} \\left( y | x \\right) \\exp \\left( f^{\\#} \\left( x, y \\right) \\sum_{i=1}^{n} \\dfrac{\\delta_{i} f_{i} \\left( x, y \\right) }{f^{\\#} \\left( x, y \\right) } \\right)\\end{align*} " 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": {}, 195 | "source": [ 196 | "对任意$i$,有$\\dfrac{f_{i} \\left( x, y \\right)}{f^{\\#} \\left( x, y \\right)} \\geq 0$且$\\sum_{i=1}^{n} \\dfrac{f_{i} \\left( x, y \\right)}{f^{\\#} \\left( x, y \\right)} = 1$, \n", 197 | "根据Jensen不等式,得 \n", 198 | "\\begin{align*} \\\\ & \\exp \\left( \\sum_{i=1}^{n} \\dfrac{f_{i} \\left( x, y \\right)}{f^{\\#} \\left( x, y \\right)} \\delta_{i} f_{\\#} \\left( x, y \\right) ) \\right) \\leq \\sum_{i=1}^{n} \\dfrac{f_{i} \\left( x, y \\right)}{f^{\\#} \\left( x, y \\right)} \\exp \\left( \\delta_{i} f^{\\#} \\left(x, y\\right) \\right)\\end{align*} \n", 199 | "则\n", 200 | "\\begin{align*} \\\\ & A \\left( \\delta | w \\right) \\geq \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\sum_{y} P_{w} \\left( y | x \\right) \\sum_{i=1}^{n} \\left( \\dfrac{f_{i} \\left( x, y \\right)}{f^{\\#} \\left( x, y \\right)} \\right) \\exp \\left( \\delta_{i} f^{\\#} \\left(x, y\\right) \\right)\\end{align*} " 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "记\n", 208 | "\\begin{align*} \\\\ & B \\left( \\delta | w \\right) = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\sum_{y} P_{w} \\left( y | x \\right) \\sum_{i=1}^{n} \\left( \\dfrac{f_{i} \\left( x, y \\right)}{f^{\\#} \\left( x, y \\right)} \\right) \\exp \\left( \\delta_{i} f^{\\#} \\left(x, y\\right) \\right)\\end{align*} \n", 209 | "则 \n", 210 | "\\begin{align*} \\\\ & L \\left( w + \\delta \\right) - L \\left( w \\right) \\geq A \\left( \\delta | w \\right) \\geq B \\left( \\delta | w \\right)\\end{align*} \n", 211 | "即$ B \\left( \\delta | w \\right)$是对数似然函数改变量的一个新的(相对不紧的)下届。" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "求\n", 219 | "\\begin{align*} \\\\ & \\dfrac {\\partial B \\left( \\delta | w \\right) }{\\partial \\delta_{i}} = 
\\sum_{x,y} \\tilde{P} \\left( x, y \\right) f_{i} \\left( x, y \\right) - \\sum_{x} \\tilde{P} \\left( x \\right) \\sum_{y} P_{w} \\left( y | x \\right) f_{i} \\left( x, y \\right) \\exp \\left( \\delta_{i} f^{\\#} \\left(x, y\\right) \\right)\\end{align*} \n", 220 | "令$ \\dfrac {\\partial B \\left( \\delta | w \\right) }{\\partial \\delta_{i}} = 0 $ \n", 221 | "得 \n", 222 | "\\begin{align*} \\\\ & \\sum_{x,y} \\tilde{P} \\left( x, y \\right) f_{i} \\left( x, y \\right) = \\sum_{x, y} \\tilde{P} \\left( x \\right) P_{w} \\left( y | x \\right) f_{i} \\left( x, y \\right) \\exp \\left( \\delta_{i} f^{\\#} \\left(x, y\\right) \\right)\\end{align*} \n", 223 | "对$\\delta_{i}$求解可解得$\\delta$" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "改进的迭代尺度算法(IIS): \n", 231 | "输入:特征函数$f_{i},i=1, 2, \\cdots, n$,经验分布$\\tilde{P} \\left( x, y \\right)$,模型$P_{w} \\left( y | x \\right)$ \n", 232 | "输出:最优参数值$w_{i}^{*}$;最优模型$P_{w^{*}}$ \n", 233 | "1. 对所有$i \\in \\left\\{ 1, 2, \\cdots, n \\right\\}$,取$w_{i} = 0$;\n", 234 | "2. 对每一$i \\in \\left\\{ 1, 2, \\cdots, n \\right\\}$ \n", 235 | "2.1 令$\\delta_{i}$是方程\n", 236 | "\\begin{align*} \\\\ & \\sum_{x,y} \\tilde{P} \\left( x, y \\right) f_{i} \\left( x, y \\right) = \\sum_{x, y} \\tilde{P} \\left( x \\right) P_{w} \\left( y | x \\right) f_{i} \\left( x, y \\right) \\exp \\left( \\delta_{i} f^{\\#} \\left(x, y\\right) \\right) \\end{align*} \n", 237 | "的解 \n", 238 | "2.2 更新$w_{i}$的值\n", 239 | "\\begin{align*} \\\\ & w_{i} \\leftarrow w_{i} + \\delta_{i}\\end{align*} \n", 240 | "3. 如果不是所有$w_{i}$都收敛,重复步骤2." 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": null, 246 | "metadata": { 247 | "collapsed": true 248 | }, 249 | "outputs": [], 250 | "source": [] 251 | } 252 | ], 253 | "metadata": { 254 | "kernelspec": { 255 | "display_name": "Python 2", 256 | "language": "python", 257 | "name": "python2" 258 | }, 259 | "language_info": { 260 | "codemirror_mode": { 261 | "name": "ipython", 262 | "version": 2 263 | }, 264 | "file_extension": ".py", 265 | "mimetype": "text/x-python", 266 | "name": "python", 267 | "nbconvert_exporter": "python", 268 | "pygments_lexer": "ipython2", 269 | "version": "2.7.11" 270 | } 271 | }, 272 | "nbformat": 4, 273 | "nbformat_minor": 0 274 | } 275 | -------------------------------------------------------------------------------- /10 hmm.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "状态集合\\begin{align*} & Q=\\left\\{q_{1},q_{2},\\ldots ,q_{N}\\right\\} \\quad \\left| Q\\right| =N \\end{align*} " 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "观测集合\\begin{align*} & V=\\left\\{v_{1},v_{2},\\ldots ,v_{M}\\right\\} \\quad \\left| V\\right| =M \\end{align*} " 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "状态序列\\begin{align*} & I=\\left\\{i_{1},i_{2},\\ldots ,i_{t},\\ldots,i_{T}\\right\\} \\quad i_{t}\\in Q \\quad \\left(t=1,2,\\ldots,T \\right)\\end{align*} " 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "观测序列\\begin{align*} & O=\\left\\{o_{1},o_{2},\\ldots ,o_{t},\\ldots,o_{T}\\right\\} \\quad o_{t}\\in V \\quad \\left(t=1,2,\\ldots,T \\right)\\end{align*} " 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "状态转移矩阵 \\begin{align*} & 
A=\\left[a_{ij}\\right]_{N\\times N} \\end{align*}" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "在$t$时刻处于状态$q_{i}$的条件下,在$t+1$时刻转移到状态$q_{j}$的概率\\begin{align*} & a_{ij}= P\\left( i_{t+1}=q_{j}|i_{t}=q_{i}\\right) \\quad \\left(i=1,2,\\ldots,N \\right) \\quad \\left(j=1,2,\\ldots,M \\right)\\end{align*}" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "观测概率矩阵\\begin{align*} & B=\\left[b_{j}\\left(k\\right)\\right]_{N\\times M} \\end{align*}" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "在$t$时刻处于状态$q_{i}$的条件下,生成观测$v_{k}$的概率\\begin{align*} & b_{j}\\left(k\\right)= P\\left( o_{t}=v_{k}|i_{t}=q_{j}\\right) \\quad \\left(k=1,2,\\ldots,M \\right) \\quad \\left(j=1,2,\\ldots,N \\right)\\end{align*}" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "初始概率向量\\begin{align*} & \\pi =\\left( \\pi _{i}\\right) \\end{align*}" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "在时刻$t=1$处于状态$q_{i}$的概率\\begin{align*} & \\pi_{i} =P\\left( i_{1}=q_{i}\\right) \\quad \\left(i=1,2,\\ldots,N \\right) \\end{align*}" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "隐马尔科夫模型\\begin{align*} & \\lambda =\\left( A,B.\\pi \\right) \\end{align*}" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "隐马尔科夫模型基本假设:\n", 85 | "1. 齐次马尔科夫性假设:在任意时刻$t$的状态只依赖于时刻$t-1$的状态。\\begin{align*} & P\\left( i_{t}|i_{t-1},o_{t-1},\\ldots,i_{1},o_{1}\\right)=P\\left(i_{t}|i_{t-1}\\right) \\quad \\left(t=1,2,\\ldots,T\\right) \\end{align*}\n", 86 | "2. 观测独立性假设:任意时刻$t$的观测只依赖于时刻$t$的状态。\\begin{align*} & P\\left( o_{t}|i_{T},o_{T},i_{T-1},o_{T-1},\\ldots,i_{t+1},o_{t+1},i_{t},i_{t-1},o_{t-1},\\ldots,i_{1},o_{1}\\right)=P\\left(o_{t}|i_{t}\\right) \\quad \\left(t=1,2,\\ldots,T\\right) \\end{align*}" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "观测序列生成算法: \n", 94 | "输入:隐马尔科夫模型$\\lambda =\\left( A,B.\\pi \\right)$,观测序列长度$T$; \n", 95 | "输出:观测序列$O=\\left\\{o_{1},o_{2},\\ldots ,o_{t},\\ldots,o_{T}\\right\\}$;\n", 96 | "1. 由初始概率向量$\\pi$产生状态$i_{1}$;\n", 97 | "2. $t=1$;\n", 98 | "3. 由状态$i_{t}$的观测概率分布$b_{j}\\left(k\\right)$生成$o_{t}$;\n", 99 | "4. 由状态$i_{t}$的状态转移概率分布$a_{i_{t}i_{t+1}}$生成状态$i_{t+1} \\quad \\left(i_{t+1}=1,2,\\ldots,N\\right)$; \n", 100 | "5. $t=t+1$;如果$t 0$,计算\n", 158 | "\\begin{align*} \\\\ & b^{*} = y_{j} - \\sum_{i=1}^{N} \\alpha_{i}^{*} y_{i} \\left( x_{i} \\cdot x_{j} \\right) \\end{align*} \n", 159 | "3. 得到分离超平面\n", 160 | "\\begin{align*} \\\\ & w^{*} \\cdot x + b^{*} = 0 \\end{align*} \n", 161 | "以及分类决策函数 \n", 162 | "\\begin{align*} \\\\& f \\left( x \\right) = sign \\left( w^{*} \\cdot x + b^{*} \\right) \\end{align*} " 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": { 168 | "collapsed": true 169 | }, 170 | "source": [ 171 | "线性支持向量机(软间隔支持向量机):给定线性不可分训练数据集,通过求解凸二次规划问题 \n", 172 | "\\begin{align*} \\\\ & \\min_{w,b,\\xi} \\quad \\dfrac{1}{2} \\| w \\|^{2} + C \\sum_{i=1}^{N} \\xi_{i}\n", 173 | "\\\\ & s.t. 
\\quad y_{i} \\left( w \\cdot x_{i} + b \\right) \\geq 1 - \\xi_{i}\n", 174 | "\\\\ & \\xi_{i} \\geq 0, \\quad i=1,2, \\cdots, N \\end{align*} \n", 175 | "学习得到分离超平面为\n", 176 | "\\begin{align*} \\\\& w^{*} \\cdot x + b^{*} = 0 \\end{align*} \n", 177 | "以及相应的分类决策函数\n", 178 | "\\begin{align*} \\\\& f \\left( x \\right) = sign \\left( w^{*} \\cdot x + b^{*} \\right) \\end{align*} \n", 179 | "称为线型支持向量机。" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "最优化问题的求解: \n", 187 | "1. 引入拉格朗日乘子$\\alpha_{i} \\geq 0, \\mu_{i} \\geq 0, i = 1, 2, \\cdots, N$构建拉格朗日函数\n", 188 | "\\begin{align*} \\\\ & L \\left( w, b, \\xi, \\alpha, \\mu \\right) = \\dfrac{1}{2} \\| w \\|^{2} + C \\sum_{i=1}^{N} \\xi_{i} + \\sum_{i=1}^{N} \\alpha_{i} \\left[- y_{i} \\left( w \\cdot x_{i} + b \\right) + 1 - \\xi_{i} \\right] + \\sum_{i=1}^{N} \\mu_{i} \\left( -\\xi_{i} \\right)\n", 189 | "\\\\ & = \\dfrac{1}{2} \\| w \\|^{2} + C \\sum_{i=1}^{N} \\xi_{i} - \\sum_{i=1}^{N} \\alpha_{i} \\left[ y_{i} \\left( w \\cdot x_{i} + b \\right) -1 + \\xi_{i} \\right] - \\sum_{i=1}^{N} \\mu_{i} \\xi_{i} \\end{align*} \n", 190 | "其中,$\\alpha = \\left( \\alpha_{1}, \\alpha_{2}, \\cdots, \\alpha_{N} \\right)^{T}$以及$\\mu = \\left( \\mu_{1}, \\mu_{2}, \\cdots, \\mu_{N} \\right)^{T}$为拉格朗日乘子向量。 \n", 191 | "2. 求$\\min_{w,b}L \\left( w, b, \\xi, \\alpha, \\mu \\right)$:\n", 192 | "\\begin{align*} \\\\ & \\nabla_{w} L \\left( w, b, \\xi, \\alpha, \\mu \\right) = w - \\sum_{i=1}^{N} \\alpha_{i} y_{i} x_{i} = 0 \n", 193 | "\\\\ & \\nabla_{b} L \\left( w, b, \\xi, \\alpha, \\mu \\right) = -\\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0 \n", 194 | "\\\\ & \\nabla_{\\xi_{i}} L \\left( w, b, \\xi, \\alpha, \\mu \\right) = C - \\alpha_{i} - \\mu_{i} = 0 \\end{align*} \n", 195 | "得 \n", 196 | "\\begin{align*} \\\\ & w = \\sum_{i=1}^{N} \\alpha_{i} y_{i} x_{i} \n", 197 | "\\\\ & \\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0 \n", 198 | "\\\\ & C - \\alpha_{i} - \\mu_{i} = 0\\end{align*} \n", 199 | "代入拉格朗日函数,得\n", 200 | "\\begin{align*} \\\\ & L \\left( w, b, \\xi, \\alpha, \\mu \\right) = \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) + C \\sum_{i=1}^{N} \\xi_{i} - \\sum_{i=1}^{N} \\alpha_{i} y_{i} \\left[ \\left( \\sum_{j=1}^{N} \\alpha_{j} y_{j} x_{j} \\right) \\cdot x_{i} + b \\right] \n", 201 | "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad + \\sum_{i=1}^{N} \\alpha_{i} - \\sum_{i=1}^{N} \\alpha_{i} \\xi_{i} - \\sum_{i}^{N} \\mu_{i} \\xi_{i}\n", 202 | "\\\\ & = - \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) - \\sum_{i=1}^{N} \\alpha_{i} y_{i} b + \\sum_{i=1}^{N} \\alpha_{i} + \\sum_{i=1}^{N} \\xi_{i} \\left( C - \\alpha_{i} - \\mu_{i} \\right)\n", 203 | "\\\\ & = - \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) + \\sum_{i=1}^{N} \\alpha_{i} \\end{align*} \n", 204 | "即\n", 205 | "\\begin{align*} \\\\ & \\min_{w,b,\\xi}L \\left( w, b, \\xi, \\alpha, \\mu \\right) = - \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) + \\sum_{i=1}^{N} \\alpha_{i} \\end{align*} \n", 206 | "3.求$\\max_{\\alpha} \\min_{w,b, \\xi}L \\left( w, b, \\xi, \\alpha, \\mu \\right)$:\n", 207 | "\\begin{align*} \\\\ & \\max_{\\alpha} - \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} 
\\right) + \\sum_{i=1}^{N} \\alpha_{i} \n", 208 | "\\\\ & s.t. \\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0\n", 209 | "\\\\ & C - \\alpha_{i} - \\mu_{i} = 0\n", 210 | "\\\\ & \\alpha_{i} \\geq 0\n", 211 | "\\\\ & \\mu_{i} \\geq 0, \\quad i=1,2, \\cdots, N \\end{align*} \n", 212 | "等价的\n", 213 | "\\begin{align*} \\\\ & \\min_{\\alpha} \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) - \\sum_{i=1}^{N} \\alpha_{i} \n", 214 | "\\\\ & s.t. \\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0\n", 215 | "\\\\ & 0 \\leq \\alpha_{i} \\leq C , \\quad i=1,2, \\cdots, N \\end{align*} " 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "metadata": {}, 221 | "source": [ 222 | "线性支持向量机(软间隔支持向量机)学习算法: \n", 223 | "输入:训练数据集$T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\}$,其中$x_{i} \\in \\mathcal{X} = R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ +1, -1 \\right\\}, i = 1, 2, \\cdots, N$ \n", 224 | "输出:最大间隔分离超平面和分类决策函数 \n", 225 | "1. 选择惩罚参数$C \\geq 0$,构建并求解约束最优化问题\n", 226 | "\\begin{align*} \\\\ & \\min_{\\alpha} \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) - \\sum_{i=1}^{N} \\alpha_{i} \n", 227 | "\\\\ & s.t. \\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0\n", 228 | "\\\\ & 0 \\leq \\alpha_{i} \\leq C , \\quad i=1,2, \\cdots, N \\end{align*} \n", 229 | "求得最优解$\\alpha^{*} = \\left( \\alpha_{1}^{*}, \\alpha_{1}^{*}, \\cdots, \\alpha_{N}^{*} \\right) $。 \n", 230 | "2. 计算\n", 231 | "\\begin{align*} \\\\ & w^{*} = \\sum_{i=1}^{N} \\alpha_{i}^{*} y_{i} x_{i} \\end{align*} \n", 232 | "并选择$\\alpha^{*}$的一个分量$0 < \\alpha_{j}^{*} < C$,计算\n", 233 | "\\begin{align*} \\\\ & b^{*} = y_{j} - \\sum_{i=1}^{N} \\alpha_{i}^{*} y_{i} \\left( x_{i} \\cdot x_{j} \\right) \\end{align*} \n", 234 | "3. 
得到分离超平面\n", 235 | "\\begin{align*} \\\\ & w^{*} \\cdot x + b^{*} = 0 \\end{align*} \n", 236 | "以及分类决策函数 \n", 237 | "\\begin{align*} \\\\& f \\left( x \\right) = sign \\left( w^{*} \\cdot x + b^{*} \\right) \\end{align*} " 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "(软间隔)支持向量:线性不可分情况下,最优化问题的解$\\alpha^{*} = \\left( \\alpha_{1}^{*}, \\alpha_{2}^{*}, \\cdots, \\alpha_{N}^{*} \\right)^{T}$中对应于$\\alpha_{i}^{*} > 0$的样本点$\\left( x_{i}, y_{i} \\right)$的实例$x_{i}$。 " 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "实例$x_{i}$的几何间隔\n", 252 | "\\begin{align*} \\\\& \\gamma_{i} = \\dfrac{y_{i} \\left( w \\cdot x_{i} + b \\right)}{ \\| w \\|} = \\dfrac{| 1 - \\xi_{i} |}{\\| w \\|} \\end{align*} \n", 253 | "且$\\dfrac{1}{2} | H_{1}H_{2} | = \\dfrac{1}{\\| w \\|}$ \n", 254 | "则实例$x_{i}$到间隔边界的距离\n", 255 | "\\begin{align*} \\\\& \\left| \\gamma_{i} - \\dfrac{1}{\\| w \\|} \\right| = \\left| \\dfrac{| 1 - \\xi_{i} |}{\\| w \\|} - \\dfrac{1}{\\| w \\|} \\right| \n", 256 | "\\\\ & = \\dfrac{\\xi_{i}}{\\| w \\|}\\end{align*} \n", 257 | "\\begin{align*} \\xi_{i} \\geq 0 \\Leftrightarrow \\left\\{\n", 258 | "\\begin{aligned} \n", 259 | "\\ & \\xi_{i}=0, x_{i}在间隔边界上;\n", 260 | "\\\\ & 0 < \\xi_{i} < 1, x_{i}在间隔边界与分离超平面之间;\n", 261 | "\\\\ & \\xi_{i}=1, x_{i}在分离超平面上;\n", 262 | "\\\\ & \\xi_{i}>1, x_{i}在分离超平面误分类一侧;\n", 263 | "\\end{aligned}\n", 264 | "\\right.\\end{align*} " 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": {}, 270 | "source": [ 271 | "线性支持向量机(软间隔)的合页损失函数\n", 272 | "\\begin{align*} \\\\& L \\left( y \\left( w \\cdot x + b \\right) \\right) = \\left[ 1 - y \\left(w \\cdot x + b \\right) \\right]_{+} \\end{align*} \n", 273 | "其中,“+”为取正函数\n", 274 | "\\begin{align*} \\left[ z \\right]_{+} = \\left\\{\n", 275 | "\\begin{aligned} \n", 276 | "\\ & z, z > 0\n", 277 | "\\\\ & 0, z \\leq 0\n", 278 | "\\end{aligned}\n", 279 | "\\right.\\end{align*} " 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": {}, 285 | "source": [ 286 | "核函数 \n", 287 | "设$\\mathcal{X}$是输入空间(欧氏空间$R^{n}$的子集或离散集合),$\\mathcal{H}$是特征空间(希尔伯特空间),如果存在一个从$\\mathcal{X}$到$\\mathcal{H}$的映射\n", 288 | "\\begin{align*} \\\\& \\phi \\left( x \\right) : \\mathcal{X} \\to \\mathcal{H} \\end{align*} \n", 289 | "使得对所有$x,z \\in \\mathcal{X}$,函数$K \\left(x, z \\right)$满足条件 \n", 290 | "\\begin{align*} \\\\ & K \\left(x, z \\right) = \\phi \\left( x \\right) \\cdot \\phi \\left( z \\right) \\end{align*} \n", 291 | "则称$K \\left(x, z \\right)$为核函数,$\\phi \\left( x \\right)$为映射函数,式中$\\phi \\left( x \\right) \\cdot \\phi \\left( z \\right)$为$\\phi \\left( x \\right)$和$\\phi \\left( z \\right)$的内积。 " 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "metadata": {}, 297 | "source": [ 298 | "常用核函数: \n", 299 | "1. 多项式核函数\n", 300 | "\\begin{align*} \\\\& K \\left( x, z \\right) = \\left( x \\cdot z + 1 \\right)^{p} \\end{align*} \n", 301 | "2. 
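{
"cell_type": "markdown",
"metadata": {},
"source": [
"下面用几行代码示意合页损失$\left[ 1 - y \left( w \cdot x + b \right) \right]_{+}$的批量计算(示意草稿:假设已安装numpy,$w$、$b$与样本均为虚构)。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 示意代码:批量计算合页损失 [1 - y(w x + b)]_+\n",
"import numpy as np\n",
"\n",
"def hinge_loss(w, b, X, y):\n",
"    # X: (N, n) 特征矩阵;y: 取值 +1/-1 的标签\n",
"    margins = y * (X.dot(w) + b)\n",
"    return np.maximum(0.0, 1.0 - margins)\n",
"\n",
"X = np.array([[2.0, 1.0], [0.2, 0.2], [-1.0, -2.0]])\n",
"y = np.array([1, 1, -1])\n",
"w = np.array([1.0, 1.0])\n",
"print(hinge_loss(w, 0.0, X, y))    # 函数间隔不足 1 的样本损失为正"
]
},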
282 | {
283 | "cell_type": "markdown",
284 | "metadata": {},
285 | "source": [
286 | "核函数 \n",
287 | "设$\mathcal{X}$是输入空间(欧氏空间$R^{n}$的子集或离散集合),$\mathcal{H}$是特征空间(希尔伯特空间),如果存在一个从$\mathcal{X}$到$\mathcal{H}$的映射\n",
288 | "\begin{align*} \\& \phi \left( x \right) : \mathcal{X} \to \mathcal{H} \end{align*} \n",
289 | "使得对所有$x,z \in \mathcal{X}$,函数$K \left(x, z \right)$满足条件 \n",
290 | "\begin{align*} \\ & K \left(x, z \right) = \phi \left( x \right) \cdot \phi \left( z \right) \end{align*} \n",
291 | "则称$K \left(x, z \right)$为核函数,$\phi \left( x \right)$为映射函数,式中$\phi \left( x \right) \cdot \phi \left( z \right)$为$\phi \left( x \right)$和$\phi \left( z \right)$的内积。 "
292 | ]
293 | },
294 | {
295 | "cell_type": "markdown",
296 | "metadata": {},
297 | "source": [
298 | "常用核函数: \n",
299 | "1. 多项式核函数\n",
300 | "\begin{align*} \\& K \left( x, z \right) = \left( x \cdot z + 1 \right)^{p} \end{align*} \n",
301 | "2. 高斯核函数 \n",
302 | "\begin{align*} \\& K \left( x, z \right) = \exp \left( - \dfrac{\| x - z \|^{2}}{2 \sigma^{2}} \right) \end{align*} "
303 | ]
304 | },
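{
"cell_type": "markdown",
"metadata": {},
"source": [
"以下代码实现上述多项式核与高斯核,并用核函数构造Gram矩阵$\left[ K \left( x_{i}, x_{j} \right) \right]_{N \times N}$(示意草稿:假设已安装numpy,参数$p$、$\sigma$的取值仅作演示)。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 示意代码:多项式核、高斯核与 Gram 矩阵\n",
"import numpy as np\n",
"\n",
"def poly_kernel(x, z, p=2):\n",
"    # K(x, z) = (x . z + 1)^p\n",
"    return (np.dot(x, z) + 1.0) ** p\n",
"\n",
"def rbf_kernel(x, z, sigma=1.0):\n",
"    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))\n",
"    d = np.asarray(x) - np.asarray(z)\n",
"    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))\n",
"\n",
"X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])\n",
"G = np.array([[rbf_kernel(a, b) for b in X] for a in X])    # Gram 矩阵\n",
"print(G)"
]
},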
\\quad \\alpha_{1} + \\alpha_{2} = -\\sum_{i=3}^{N} \\alpha_{i} y_{i} = \\varsigma\n", 354 | "\\\\ & 0 \\leq \\alpha_{i} \\leq C , \\quad i=1,2 \\end{align*} \n", 355 | "其中,$K_{ij} = K \\left( x_{i}, x_{j} \\right), i,j = 1,2, \\cdots, N, \\varsigma$是常数,且省略了不含$\\alpha_{1}, \\alpha_{2}$的常数项。" 356 | ] 357 | }, 358 | { 359 | "cell_type": "markdown", 360 | "metadata": {}, 361 | "source": [ 362 | "设凸二次规划的对偶问题的初始可行解为$\\alpha_{1}^{old}, \\alpha_{2}^{old}$,最优解为$\\alpha_{1}^{new}, \\alpha_{2}^{new}$,且在沿着约束方向未经剪辑时$\\alpha_{2}$的最优解为$ \\alpha_{2}^{new,unc}$。" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": {}, 368 | "source": [ 369 | "由于$\\alpha_{2}^{new}$需要满足$0 \\leq \\alpha_{i} \\leq C$,所以最优解$\\alpha_{2}^{new}$的取值范围需满足\n", 370 | "\\begin{align*} \\\\ & L \\leq \\alpha_{2}^{new} \\leq H \\end{align*} \n", 371 | "其中,L与H是$\\alpha_{2}^{new}$所在的对角线段断点的界。 \n", 372 | "如果$y_{1} \\neq y_{2}$,则 \n", 373 | "\\begin{align*} \\\\ & L = \\max \\left( 0, \\alpha_{2}^{old} - \\alpha_{1}^{old} \\right), H = \\min \\left( C, C + \\alpha_{2}^{old} - \\alpha_{1}^{old} \\right) \\end{align*} \n", 374 | "如果$y_{1} = y_{2}$,则 \n", 375 | "\\begin{align*} \\\\ & L = \\max \\left( 0, \\alpha_{2}^{old} + \\alpha_{1}^{old} - C \\right), H = \\min \\left( C, \\alpha_{2}^{old} + \\alpha_{1}^{old} \\right) \\end{align*} " 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "metadata": {}, 381 | "source": [ 382 | "记\n", 383 | "\\begin{align*} \\\\ & g \\left( x \\right) = \\sum_{i=1}^{N} \\alpha_{i} y_{i} K \\left( x_{i}, x \\right) + b \\end{align*} \n", 384 | "令\n", 385 | "\\begin{align*} \\\\ & E_{i} = g \\left( x_{i} \\right) - y_{i} = \\left( \\sum_{j=1}^{N} \\alpha_{j} y_{j} K \\left( x_{j}, x_{i} \\right) + b \\right) - y_{i}, \\quad i=1,2\n", 386 | "\\\\ & v_{i} = \\sum_{j=3}^{N} \\alpha_{j} y_{j} K \\left( x_{i}, x_{j} \\right) = g \\left( x_{i} \\right) - \\sum_{j=1}^{2}\\alpha_{j} y_{j} K \\left( x_{i}, x_{j} \\right) - b, \\quad i=1,2\\end{align*} \n", 387 | "则\n", 388 | "\\begin{align*} \\\\ & W \\left( \\alpha_{1}, \\alpha_{2} \\right) = \\dfrac{1}{2} K_{11} \\alpha_{1}^{2} + \\dfrac{1}{2} K_{22} \\alpha_{2}^{2} + y_{1} y_{2} K_{12} \\alpha_{1} \\alpha_{2} \n", 389 | "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad - \\left( \\alpha_{1} + \\alpha_{2} \\right) + y_{1} v_{1} \\alpha_{1}+ y_{2} v_{2} \\alpha_{2} \n", 390 | "\\end{align*} " 391 | ] 392 | }, 393 | { 394 | "cell_type": "markdown", 395 | "metadata": {}, 396 | "source": [ 397 | "由于$\\alpha_{1} y_{1} = \\varsigma, y_{i}^{2} = 1$,可将$\\alpha_{1}$表示为\n", 398 | "\\begin{align*} \\\\ & \\alpha_{1} = \\left( \\varsigma - y_{2} \\alpha_{2} \\right) y_{1}\\end{align*} \n", 399 | "代入,得\n", 400 | "\\begin{align*} \\\\ & W \\left( \\alpha_{2} \\right) = \\dfrac{1}{2} K_{11} \\left[ \\left( \\varsigma - y_{2} \\alpha_{2} \\right) y_{1} \\right]^{2} + \\dfrac{1}{2} K_{22} \\alpha_{2}^{2} + y_{1} y_{2} K_{12} \\left( \\varsigma - y_{2} \\alpha_{2} \\right) y_{1} \\alpha_{2} \n", 401 | "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad - \\left[ \\left( \\varsigma - y_{2} \\alpha_{2} \\right) y_{1} + \\alpha_{2} \\right] + y_{1} v_{1} \\left( \\varsigma - y_{2} \\alpha_{2} \\right) y_{1} + y_{2} v_{2} \\alpha_{2}\n", 402 | "\\\\ & = \\dfrac{1}{2} K_{11} \\left( \\varsigma - y_{2} \\alpha_{2} \\right)^{2} + \\dfrac{1}{2} K_{22} \\alpha_{2}^{2} + y_{2} K_{12} \\left( \\varsigma - y_{2} \\alpha_{2} \\right) \\alpha_{2} \n", 403 | "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad - \\left( \\varsigma - y_{2} \\alpha_{2} \\right) y_{1} - \\alpha_{2} + v_{1} 
\\left( \\varsigma - y_{2} \\alpha_{2} \\right) + y_{2} v_{2} \\alpha_{2}\n", 404 | "\\end{align*} " 405 | ] 406 | }, 407 | { 408 | "cell_type": "markdown", 409 | "metadata": {}, 410 | "source": [ 411 | "对$\\alpha_{2}$求导\n", 412 | "\\begin{align*} \\\\ & \\dfrac {\\partial W}{\\partial \\alpha_{2}} = K_{11} \\alpha_{2} + K_{22} \\alpha_{2} -2 K_{12} \\alpha_{2}\n", 413 | "\\\\ & \\quad\\quad\\quad - K_{11} \\varsigma y_{2} + K_{12} \\varsigma y_{2} + y_{1} y_{2} -1 - v_{1} y_{2} + y_{2} v_{2} \\end{align*} \n", 414 | "令其为0,得\n", 415 | "\\begin{align*} \\\\ & \\left( K_{11} + K_{22} - 2 K_{12} \\right) \\alpha_{2} = y_{2} \\left( y_{2} - y_{1} + \\varsigma K_{11} - \\varsigma K_{12} + v_{1} - v_{2} \\right)\n", 416 | "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad = y_{2} \\left[ y_{2} - y_{1} + \\varsigma K_{11} - \\varsigma K_{12} + \\left( g \\left( x_{1} \\right) - \\sum_{j=1}^{2}\\alpha_{j} y_{j} K_1j - b \\right) \n", 417 | "\\\\ \\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad - \\left( g \\left( x_{2} \\right) - \\sum_{j=1}^{2}\\alpha_{j} y_{j} K_2j - b \\right) \\right]\\end{align*} " 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "将$\\varsigma = \\alpha_{1}^{old} y_{1} + \\alpha_{2}^{old} y_{2}$代入,得\n", 425 | "\\begin{align*} \\\\ & \\left( K_{11} + K_{22} - 2 K_{12} \\right) \\alpha_{2}^{new,unc} = y_{2} \\left( \\left( K_{11} + K_{22} - 2 K_{12} \\right) \\alpha_{2}^{old} y_{2} + y_{2} - y_{1} + g \\left( x_{1} \\right) - g \\left( x_{2} \\right) \\right)\n", 426 | "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad = \\left( K_{11} + K_{22} - 2 K_{12} \\right) \\alpha_{2}^{old} + y_{2} \\left( E_{1} - E_{2} \\right) \\end{align*} " 427 | ] 428 | }, 429 | { 430 | "cell_type": "markdown", 431 | "metadata": {}, 432 | "source": [ 433 | "令$\\eta = K_{11} + K_{22} - 2 K_{12}$代入,得\n", 434 | "\\begin{align*} \\\\ & \\alpha_{2}^{new,unc} = \\alpha_{2}^{old} + \\dfrac{y_{2} \\left( E_{1} - E_{2} \\right)}{\\eta}\\end{align*} " 435 | ] 436 | }, 437 | { 438 | "cell_type": "markdown", 439 | "metadata": {}, 440 | "source": [ 441 | "经剪辑后\n", 442 | "\\begin{align*} \\alpha_{2}^{new} = \\left\\{\n", 443 | "\\begin{aligned} \n", 444 | "\\ & H, \\alpha_{2}^{new,unc} > H\n", 445 | "\\\\ & \\alpha_{2}^{new,unc}, L \\leq \\alpha_{2}^{new,unc} \\leq H\n", 446 | "\\\\ & L, \\alpha_{2}^{new,unc} < L \n", 447 | "\\end{aligned}\n", 448 | "\\right.\\end{align*} " 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "由于$\\varsigma = \\alpha_{1}^{old} y_{1} + \\alpha_{2}^{old} y_{2}$及$\\varsigma = \\alpha_{1}^{new} y_{1} + \\alpha_{2}^{new} y_{2}$ \n", 456 | "则\n", 457 | "\\begin{align*} \\\\ & \\alpha_{1}^{old} y_{1} + \\alpha_{2}^{old} y_{2} = \\alpha_{1}^{new} y_{1} + \\alpha_{2}^{new} y_{2}\n", 458 | "\\\\ & \\quad\\quad\\quad\\quad \\alpha_{1}^{new} = \\alpha_{1}^{old} + y_{1} y_{2} \\left( \\alpha_{2}^{old} - \\alpha_{2}^{new} \\right) \\end{align*} " 459 | ] 460 | }, 461 | { 462 | "cell_type": "markdown", 463 | "metadata": { 464 | "collapsed": true 465 | }, 466 | "source": [ 467 | "由分量$0 < \\alpha_{1}^{new} < C$,则\n", 468 | "\\begin{align*} \\\\ & b_1^{new} = y_{1} - \\sum_{i=3}^{N} \\alpha_{i} y_{i} K_{i1} - \\alpha_{1}^{new} y_{1} K_{11} - \\alpha_{2}^{new} y_{2} K_{21} \\end{align*} \n", 469 | "由\n", 470 | "\\begin{align*} \\\\ & E_{1} = g \\left( x_{1} \\right) - y_{1} = \\left( \\sum_{j=1}^{N} \\alpha_{j} y_{j} K_{ij} + b \\right) - y_{1}\n", 471 | "\\\\ & = 
461 | {
462 | "cell_type": "markdown",
463 | "metadata": {
464 | "collapsed": true
465 | },
466 | "source": [
467 | "若分量$0 < \alpha_{1}^{new} < C$,则\n",
468 | "\begin{align*} \\ & b_1^{new} = y_{1} - \sum_{i=3}^{N} \alpha_{i} y_{i} K_{i1} - \alpha_{1}^{new} y_{1} K_{11} - \alpha_{2}^{new} y_{2} K_{21} \end{align*} \n",
469 | "由\n",
470 | "\begin{align*} \\ & E_{1} = g \left( x_{1} \right) - y_{1} = \left( \sum_{j=1}^{N} \alpha_{j} y_{j} K_{j1} + b \right) - y_{1}\n",
471 | "\\ & = \sum_{i=3}^{N} \alpha_{i} y_{i} K_{i1} + \alpha_{1}^{old} y_{1} K_{11} + \alpha_{2}^{old} y_{2} K_{21} + b^{old} - y_{1} \end{align*} \n",
472 | "则\n",
473 | "\begin{align*} \\ & y_{1} - \sum_{i=3}^{N} \alpha_{i} y_{i} K_{i1} = -E_{1} + \alpha_{1}^{old} y_{1} K_{11} + \alpha_{2}^{old} y_{2} K_{21} + b^{old} \end{align*} \n",
474 | "代入,得\n",
475 | "\begin{align*} \\ & b_1^{new} = -E_{1} - y_{1} K_{11} \left( \alpha_{1}^{new} - \alpha_{1}^{old} \right) - y_{2} K_{21} \left( \alpha_{2}^{new} - \alpha_{2}^{old} \right) + b^{old} \end{align*} \n",
476 | "同理,得\n",
477 | "\begin{align*} \\ & b_2^{new} = -E_{2} - y_{1} K_{12} \left( \alpha_{1}^{new} - \alpha_{1}^{old} \right) - y_{2} K_{22} \left( \alpha_{2}^{new} - \alpha_{2}^{old} \right) + b^{old} \end{align*} \n",
478 | "如果$\alpha_{1}^{new}, \alpha_{2}^{new}$满足$0 < \alpha_{i}^{new} < C, i = 1, 2$, \n",
479 | "则 \n",
480 | "\begin{align*} \\ & b^{new} = b_{1}^{new} = b_{2}^{new}\end{align*} \n",
481 | "否则\n",
482 | "\begin{align*} \\ & b^{new} = \dfrac{b_{1}^{new} + b_{2}^{new}}{2} \end{align*} "
483 | ]
484 | },
485 | {
486 | "cell_type": "markdown",
487 | "metadata": {},
488 | "source": [
489 | "更新$E_{i}$ \n",
490 | "\begin{align*} \\ & E_{i}^{new} = \sum_{S} y_{j} \alpha_{j} K \left( x_{i}, x_{j} \right) + b^{new} - y_{i} \end{align*} \n",
491 | "其中,$S$是所有支持向量$x_{j}$的集合。"
492 | ]
493 | },
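{
"cell_type": "markdown",
"metadata": {},
"source": [
"以下片段按上式更新$b$:当$\alpha_{1}^{new}, \alpha_{2}^{new}$均在区间$\left( 0, C \right)$内时取$b_{1}^{new}$(此时$b_{1}^{new} = b_{2}^{new}$),否则取两者的平均(示意草稿,输入数值均为虚构)。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 示意代码:SMO 中 b 的一次更新\n",
"def update_b(b_old, E1, E2, y1, y2, a1_old, a1_new, a2_old, a2_new,\n",
"             K11, K12, K21, K22, C):\n",
"    b1 = -E1 - y1 * K11 * (a1_new - a1_old) - y2 * K21 * (a2_new - a2_old) + b_old\n",
"    b2 = -E2 - y1 * K12 * (a1_new - a1_old) - y2 * K22 * (a2_new - a2_old) + b_old\n",
"    if 0 < a1_new < C and 0 < a2_new < C:\n",
"        return b1    # 此时 b1 = b2\n",
"    return (b1 + b2) / 2.0\n",
"\n",
"print(update_b(0.0, 0.5, -0.3, 1, -1, 0.2, 0.0, 0.4, 0.2,\n",
"               1.0, 0.1, 0.1, 1.0, 1.0))"
]
},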
494 | {
495 | "cell_type": "markdown",
496 | "metadata": {},
497 | "source": [
498 | "SMO算法: \n",
499 | "输入:训练数据集$T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}$,其中$x_{i} \in \mathcal{X} = R^{n}, y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}, i = 1, 2, \cdots, N$,精度$\varepsilon$; \n",
500 | "输出:近似解$\hat \alpha$ \n",
501 | "1. 取初始值$\alpha^{\left( 0 \right)} = 0$,令$k = 0$;\n",
502 | "2. 选取优化变量$\alpha_{1}^{\left( k \right)},\alpha_{2}^{\left( k \right)}$,求解\n",
503 | "\begin{align*} \\ & \min_{\alpha_{1}, \alpha_{2}} W \left( \alpha_{1}, \alpha_{2} \right) = \dfrac{1}{2} K_{11} \alpha_{1}^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2} \n",
504 | "\\ & \quad\quad\quad\quad\quad\quad - \left( \alpha_{1} + \alpha_{2} \right) + y_{1} \alpha_{1} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i1} + y_{2} \alpha_{2} \sum_{i=3}^{N} y_{i} \alpha_i K_{i2}\n",
505 | "\\ & s.t. \quad \alpha_{1} y_{1} + \alpha_{2} y_{2} = -\sum_{i=3}^{N} \alpha_{i} y_{i} = \varsigma\n",
506 | "\\ & 0 \leq \alpha_{i} \leq C , \quad i=1,2 \end{align*} \n",
507 | "求得最优解$\alpha_{1}^{\left( k+1 \right)},\alpha_{2}^{\left( k+1 \right)}$,更新$\alpha$为$\alpha^{\left( k+1 \right)}$;\n",
508 | "3. 若在精度$\varepsilon$范围内满足停机条件\n",
509 | "\begin{align*} \\ & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0\n",
510 | "\\ & 0 \leq \alpha_{i} \leq C, i = 1, 2, \cdots, N\n",
511 | "\end{align*} \n",
512 | "\begin{align*} y_{i} \cdot g \left( x_{i} \right) = \left\{\n",
513 | "\begin{aligned} \n",
514 | "\ & \geq 1, \left\{ x_{i} | \alpha_{i} = 0 \right\}\n",
515 | "\\ & = 1, \left\{ x_{i} | 0 < \alpha_{i} < C \right\}\n",
516 | "\\ & \leq 1, \left\{ x_{i} | \alpha_{i} = C \right\}\n",
517 | "\end{aligned}\n",
518 | "\right.\end{align*} \n",
519 | "则转4.;否则令$k = k + 1$,转2.; \n",
520 | "4. 取$\hat \alpha = \alpha^{\left( k + 1 \right)}$。"
521 | ]
522 | },
523 | {
524 | "cell_type": "code",
525 | "execution_count": null,
526 | "metadata": {
527 | "collapsed": true
528 | },
529 | "outputs": [],
530 | "source": []
531 | }
532 | ],
533 | "metadata": {
534 | "kernelspec": {
535 | "display_name": "Python 2",
536 | "language": "python",
537 | "name": "python2"
538 | },
539 | "language_info": {
540 | "codemirror_mode": {
541 | "name": "ipython",
542 | "version": 2
543 | },
544 | "file_extension": ".py",
545 | "mimetype": "text/x-python",
546 | "name": "python",
547 | "nbconvert_exporter": "python",
548 | "pygments_lexer": "ipython2",
549 | "version": "2.7.11"
550 | }
551 | },
552 | "nbformat": 4,
553 | "nbformat_minor": 0
554 | }
555 | --------------------------------------------------------------------------------