├── docs ├── .nojekyll ├── chapter2 │ ├── resources │ │ └── images │ │ │ └── lrank.png │ └── chapter2.md ├── _sidebar.md ├── index.html ├── chapter1 │ └── chapter1.md ├── README.md ├── chapter4 │ └── chapter4.md ├── chapter7 │ └── chapter7.md ├── chapter5 │ └── chapter5.md ├── chapter16 │ └── chapter16.md ├── chapter12 │ └── chapter12.md ├── chapter8 │ └── chapter8.md ├── chapter14 │ └── chapter14.md ├── chapter10 │ └── chapter10.md ├── chapter11 │ └── chapter11.md ├── chapter6 │ └── chapter6.md ├── chapter3 │ └── chapter3.md ├── chapter13 │ └── chapter13.md └── chapter9 │ └── chapter9.md ├── ISSUE_TEMPLATE.md ├── res ├── xigua.jpg ├── example.png └── qrcode.jpeg ├── CONTRIBUTING.md ├── .gitignore ├── PULL_REQUEST_TEMPLATE.md ├── README.md └── LICENSE /docs/.nojekyll: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | 请在这里写上你具体哪一个章节哪一个公式不理解,如果能写清哪里不理解那就最好啦~ -------------------------------------------------------------------------------- /res/xigua.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/juzhong180236/Pumpkinbook/HEAD/res/xigua.jpg -------------------------------------------------------------------------------- /res/example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/juzhong180236/Pumpkinbook/HEAD/res/example.png -------------------------------------------------------------------------------- /res/qrcode.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/juzhong180236/Pumpkinbook/HEAD/res/qrcode.jpeg -------------------------------------------------------------------------------- /docs/chapter2/resources/images/lrank.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/juzhong180236/Pumpkinbook/HEAD/docs/chapter2/resources/images/lrank.png -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # 使用说明 2 | 3 | 南瓜书仅仅是西瓜书的一些细微补充,内容都是以西瓜书的内容为前置知识进行表述的。所以,南瓜书的最佳使用方法是以西瓜书为主线,遇到自己推导不出来或者看不懂的公式时再来查阅南瓜书。 4 | 5 | ## 西瓜书待推导或待解析公式征集 6 | 7 | 若南瓜书里没有你想要查阅的公式,请[点击这里](https://github.com/datawhalechina/pumpkin-book/issues/1)提交你希望补充推导或者解析的公式编号,我们看到后会尽快进行补充。 -------------------------------------------------------------------------------- /docs/_sidebar.md: -------------------------------------------------------------------------------- 1 | - 目录 2 | - [第1章 绪论](chapter1/chapter1.md) 3 | - [第2章 模型评估](chapter2/chapter2.md) 4 | - [第3章 线性模型](chapter3/chapter3.md) 5 | - [第4章 决策树](chapter4/chapter4.md) 6 | - [第5章 神经网络](chapter5/chapter5.md) 7 | - [第6章 支持向量机](chapter6/chapter6.md) 8 | - [第7章 贝叶斯分类器](chapter7/chapter7.md) 9 | - [第8章 集成学习](chapter8/chapter8.md) 10 | - [第9章 聚类](chapter9/chapter9.md) 11 | - [第10章 降维与度量学习](chapter10/chapter10.md) 12 | - [第11章 特征选择与稀疏学习](chapter11/chapter11.md) 13 | - [第12章 计算学习理论](chapter12/chapter12.md) 14 | - [第13章 半监督学习](chapter13/chapter13.md) 15 | - [第14章 概率图模型](chapter14/chapter14.md) 16 | - [第16章 强化学习](chapter16/chapter16.md) 17 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Logs 2 | logs 3 | *.log 4 | 5 | # Runtime data 6 | pids 7 | *.pid 8 | *.seed 9 | 10 | # Directory for instrumented libs generated by jscoverage/JSCover 11 | lib-cov 12 | 13 | # Coverage directory used by tools like istanbul 14 | coverage 15 | 16 | # Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files) 17 | .grunt 18 | 19 | # Compiled binary addons (http://nodejs.org/api/addons.html) 20 | build/Release 21 | 22 | # Dependency directory 23 | # Deployed apps should consider commenting this line out: 24 | # see https://npmjs.org/doc/faq.html#Should-I-check-my-node_modules-folder-into-git 25 | node_modules 26 | 27 | _book/ 28 | book.pdf 29 | book.epub 30 | book.mobi 31 | 32 | .idea 33 | -------------------------------------------------------------------------------- /PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | # 提交PR后请联系 Datawhale 的同学 2 | 3 | 提交时请检查目录结构和公式是否符合规范,无问题即可提交PR 4 | 5 | ## 如果你想加入这个项目: 6 | 7 | 请直接 fork 我们,直接提交Pull Request,我们会不定时检查。 8 | 9 | ## How to Pull Request 10 | 11 | First time contributing to datawhalechina/pumpkin-book? 12 | 13 | If you know how to fix an issue, consider opening a pull request for it. 14 | 15 | ### 目录结构规范: 16 | 17 | ```markdown 18 | pumpkin-book 19 | ├─docs 20 | | ├─chapter1 # 第1章 21 | | | ├─resources # 资源文件夹 22 | | | | └─images # 图片资源 23 | | | └─chapter1.md # 第1章公式全解 24 | | ├─chapter2 25 | ... 26 | ``` 27 | 28 | ### 公式全解文档规范: 29 | 30 | ```markdown 31 | ## 公式编号 32 | 33 | $$(公式的LaTeX表达式)$$ 34 | 35 | [推导]:(公式推导步骤) or [解析]:(公式解析说明) 36 | 37 | ## 附录(可选) 38 | 39 | (附录内容) 40 | ``` 41 | -------------------------------------------------------------------------------- /docs/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 南瓜书PumpkinBook 6 | 7 | 8 | 9 | 10 | 11 | 12 |
13 | 23 | 24 | 25 | 26 | 30 | 31 | 32 | 33 | 34 | -------------------------------------------------------------------------------- /docs/chapter1/chapter1.md: -------------------------------------------------------------------------------- 1 | ## 1.2 2 | $$\begin{aligned} 3 | \sum_{f}E_{ote}(\mathfrak{L}_a\vert X,f) &= \sum_f\sum_h\sum_{x\in\mathcal{X}-X}P(x)\mathbb{I}(h(x)\neq f(x))P(h\vert X,\mathfrak{L}_a) \\ 4 | &=\sum_{x\in\mathcal{X}-X}P(x) \sum_hP(h\vert X,\mathfrak{L}_a)\sum_f\mathbb{I}(h(x)\neq f(x)) \\ 5 | &=\sum_{x\in\mathcal{X}-X}P(x) \sum_hP(h\vert X,\mathfrak{L}_a)\cfrac{1}{2}2^{\vert \mathcal{X} \vert} \\ 6 | &=\cfrac{1}{2}2^{\vert \mathcal{X} \vert}\sum_{x\in\mathcal{X}-X}P(x) \sum_hP(h\vert X,\mathfrak{L}_a) \\ 7 | &=2^{\vert \mathcal{X} \vert-1}\sum_{x\in\mathcal{X}-X}P(x) \cdot 1\\ 8 | \end{aligned}$$ 9 | 10 | [解析]: 11 | 12 | 第一步到第二步是因为$\sum_i^m\sum_j^n\sum_k^o a_ib_jc_k=\sum_i^m a_i \cdot \sum_j^n b_j \cdot \sum_k^o c_k$。 13 | 14 | 第二步到第三步:首先要知道此时$f$的定义为**任何能将样本映射到{0,1}的函数+均匀分布**,也即不止一个$f$且每个$f$出现的概率相等,例如样本空间只有两个样本时:$ \mathcal{X}=\{x_1,x_2\},\vert \mathcal{X} \vert=2$,那么所有的真实目标函数$f$为: 15 | $$\begin{aligned} 16 | f_1:f_1(x_1)=0,f_1(x_2)=0;\\ 17 | f_2:f_2(x_1)=0,f_2(x_2)=1;\\ 18 | f_3:f_3(x_1)=1,f_3(x_2)=0;\\ 19 | f_4:f_4(x_1)=1,f_4(x_2)=1; 20 | \end{aligned}$$ 21 | 一共$2^{\vert \mathcal{X} \vert}=2^2=4$个真实目标函数。所以此时通过算法$\mathfrak{L}_a$学习出来的模型$h(x)$对每个样本无论预测值为0还是1必然有一半的$f$与之预测值相等,例如,现在学出来的模型$h(x)$对$x_1$的预测值为1,也即$h(x_1)=1$,那么有且只有$f_3$和$f_4$与$h(x)$的预测值相等,也就是有且只有一半的$f$与它预测值相等,所以$\sum_f\mathbb{I}(h(x)\neq f(x)) = \cfrac{1}{2}2^{\vert \mathcal{X} \vert} $。 22 | 23 | 第三步一直到最后有点概率论的基础应该都能看懂了。 -------------------------------------------------------------------------------- /docs/README.md: -------------------------------------------------------------------------------- 1 | # 南瓜书PumpkinBook 2 | 3 | 周志华老师的《机器学习》(西瓜书)是机器学习领域的经典入门教材之一,周老师为了使尽可能多的读者通过西瓜书对机器学习有所了解, 所以在书中对部分公式的推导细节没有详述,但是这对那些想深究公式推导细节的读者来说可能“不太友好”,本书旨在对西瓜书里比较难理解的公式加以解析,以及对部分公式补充具体的推导细节,诚挚欢迎每一位西瓜书读者前来参与完善本书:一个人可以走的很快,但是一群人却可以走的更远。 4 | 5 | ## 使用说明 6 | 7 | 南瓜书仅仅是西瓜书的一些细微补充,内容都是以西瓜书的内容为前置知识进行表述的。所以,南瓜书的最佳使用方法是以西瓜书为主线,遇到自己推导不出来或者看不懂的公式时再来查阅南瓜书。 8 | 9 | 若南瓜书里没有你想要查阅的公式,可以[点击这里](https://github.com/datawhalechina/pumpkin-book/issues/1)提交你希望补充推导或者解析的公式编号,我们看到后会尽快进行补充。 10 | 11 | ## 选用的西瓜书版本 12 | 13 | 14 | 15 | > 书名:机器学习
16 | > 作者:周志华
17 | > 出版社:清华大学出版社
18 | > 版次:2016年1月第1版
19 | > 勘误表:http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/MLbook2016.htm
20 | > 全书共16章,大致分为3个部分:第1部分(第1~3章)介绍机器学习的基础知识;第2部分(第4~10章)讨论一些经典而常用的机器学习方法(决策树、神经网络、支持向量机、贝叶斯分类器、集成学习、聚类、降维与度量学习);第3部分(第11~16章)为进阶知识,内容涉及特征选择与稀疏学习、计算学习理论、半监督学习、概率图模型、规则学习以及强化学习等。 21 | 22 | ### 主要贡献者(按首字母排名) 23 | 24 | [@awyd234](https://github.com/awyd234) 25 | [@feijuan](https://github.com/feijuan) 26 | [@Ggmatch](https://github.com/Ggmatch) 27 | [@Heitao5200](https://github.com/Heitao5200) 28 | [@huaqing89](https://github.com/huaqing89) 29 | [@juxiao](https://github.com/juxiao) 30 | [@jbb0523](https://blog.csdn.net/jbb0523) 31 | [@LongJH](https://github.com/LongJH) 32 | [@LilRachel](https://github.com/LilRachel) 33 | [@LeoLRH](https://github.com/LeoLRH) 34 | [@Majingmin](https://github.com/Majingmin) 35 | [@MrBigFan](https://github.com/MrBigFan) 36 | [@Nono17](https://github.com/Nono17) 37 | [@spareribs](https://github.com/spareribs) 38 | [@sunchaothu](https://github.com/sunchaothu) 39 | [@StevenLzq](https://github.com/StevenLzq) 40 | [@Sm1les](https://github.com/Sm1les) 41 | [@shanry](https://github.com/shanry) 42 | [@Ye980226](https://github.com/Ye980226) 43 | 44 | ## 关注我们 45 | 46 |
Datawhale,一个专注于AI领域的学习圈子。初衷是for the learner,和学习者一起成长。目前加入学习社群的人数已经数千人,组织了机器学习,深度学习,数据分析,数据挖掘,爬虫,编程,统计学,Mysql,数据竞赛等多个领域的内容学习,微信搜索公众号Datawhale可以加入我们。
47 | 48 | ## LICENSE 49 | 50 | [GNU General Public License v3.0](https://github.com/datawhalechina/pumpkin-book/blob/master/LICENSE) -------------------------------------------------------------------------------- /docs/chapter4/chapter4.md: -------------------------------------------------------------------------------- 1 | ## 4.1 2 | $$Ent(D) =-\sum_{k=1}^{|y|}p_klog_{2}{p_k}$$ 3 | [解析]: 4 | 5 | 熵是度量样本集合纯度最常用的一种指标,代表一个系统中蕴含多少信息量,信息量越大表明一个系统不确定性就越大,就存在越多的可能性。 6 | 7 | 假定当前样本集合 $D$ 中第 $k$ 类样本所占的比例为 $p_k(k =1,2,...,|y|)$ ,则 $D$ 的信息熵为: 8 | 9 | $$ 10 | Ent(D) =-\sum_{k=1}^{|y|}p_klog_{2}{p_k} 11 | $$ 12 | 13 | 其中,当样本 $D$ 中 $|y|$ 类样本均匀分布时,这时信息熵最大,其值为 14 | $$ 15 | Ent(D) =-\sum_{k=1}^{|y|}\frac{1}{|y|}log_{2}{\frac{1}{|y|}} = \sum_{k=1}^{|y|}\frac{1}{|y|}log_{2}{|y|} = log_{2}{|y|} 16 | $$ 17 | 此时样本D的纯度越小; 18 | 19 | 相反,假设样本D中只有一类样本,此时信息熵最小,其值为 20 | $$ 21 | Ent(D) =-\sum_{k=1}^{|y|}\frac{1}{|y|}log_{2}{\frac{1}{|y|}} = -1log_21-0log_20-...-0log_20 = 0 22 | $$ 23 | 此时样本的纯度最大。 24 | 25 | ## 4.2 26 | $$ 27 | Gain(D,a) = Ent(D) - \sum_{v=1}^{V}\frac{|D^v|}{|D|}Ent({D^v}) 28 | $$ 29 | [解析]:假定在样本D中有某个**离散特征** $a$ 有 $V$ 个可能的取值 $(a^1,a^2,...,a^V)$,若使用特征 $a$ 来对样本集 $D$ 进行划分,则会产生 $V$ 个分支结点,其中第 $v$ 个分支结点包含了 $D$ 中所有在特征 $a$ 上取值为 $a^v$ 的样本,样本记为 $D^v$,由于根据离散特征a的每个值划分的 $V$ 个分支结点下的样本数量不一致,对于这 $V$ 个分支结点赋予权重 $\frac{|D^v|}{|D|}$,即样本数越多的分支结点的影响越大,特征 $a$ 对样本集 $D$ 进行划分所获得的“信息增益”为 30 | $$ 31 | Gain(D,a) = Ent(D) - \sum_{v=1}^{V}\frac{|D^v|}{|D|}Ent({D^v}) 32 | $$ 33 | 信息增益越大,表示使用特征a来对样本集进行划分所获得的纯度提升越大。 34 | 35 | **缺点**:由于在计算信息增益中倾向于特征值越多的特征进行优先划分,这样假设某个特征值的离散值个数与样本集 $D$ 个数相同(假设为样本编号),虽然用样本编号对样本进行划分,样本纯度提升最高,但是并不具有泛化能力。 36 | 37 | ## 4.3 38 | $$ 39 | Gain-ratio(D,a)=\frac{Gain(D,a)}{IV(a)} 40 | $$ 41 | [解析]:基于信息增益的缺点,$C4.5$ 算法不直接使用信息增益,而是使用一种叫增益率的方法来选择最优特征进行划分,对于样本集 $D$ 中的离散特征 $a$ ,增益率为 42 | $$ 43 | Gain-ratio(D,a)=\frac{Gain(D,a)}{IV(a)} 44 | $$ 45 | 其中, 46 | $$ 47 | IV(a)=-\sum_{v=1}^{V}\frac{|D^v|}{|D|}log_2\frac{|D^v|}{|D|} 48 | $$ 49 | IV(a) 是特征 a 的熵。 50 | 51 | 增益率对特征值较少的特征有一定偏好,因此 $C4.5$ **算法选择特征的方法是先从候选特征中选出信息增益高于平均水平的特征,再从这些特征中选择增益率最高的**。 52 | 53 | ## 4.5 54 | 55 | $$ 56 | \begin{aligned} 57 | Gini(D) &=\sum_{k=1}^{|y|}\sum_{k\neq{k'}}{p_k}{p_{k'}}\\ 58 | &=1-\sum_{k=1}^{|y|}p_k^2 59 | \end{aligned} 60 | $$ 61 | [推导]:假定当前样本集合 $D$ 中第 $k$ 类样本所占的比例为 $p_k(k =1,2,...,|y|)$,则 $D$ 的**基尼值**为 62 | $$ 63 | \begin{aligned} 64 | Gini(p) &=\sum_{k=1}^{|y|}\sum_{k\neq{k'}}{p_k}{p_{k'}}\\ 65 | &=\sum_{k=1}^{|y|}{p_k}{(1-p_k)} \\ 66 | &=1-\sum_{k=1}^{|y|}p_k^2 67 | \end{aligned} 68 | $$ 69 | 70 | ## 4.7 - 4.8 71 | 72 | [解析]:样本集 $D$ 中的**连续特征** $a$,假设特征 $a$ 有 $n$ 个不同的取值,对其进行大小排序,记为 $\lbrace{a^1,a^2,...,a^n}\rbrace$,根据特征 $a$ 可得到 $n-1$ 个划分点 $t$,划分点 $t$ 的集合为 73 | $$ 74 | T_a=\lbrace{\frac{a^i+a^{i+1}}{2}|1\leq{i}\leq{n-1}}\rbrace \tag {4.7} 75 | $$ 76 | 对于取值集合 $ T_a$ 中的每个 $t$ 值计算将特征 $a$ 离散为一个特征值只有两个值,分别是 $\lbrace{a} >t\rbrace$ 和 $\lbrace{a} \leq{t}\rbrace$ 的特征,计算新特征的信息增益,找到信息增益最大的 $t$ 值即为该特征的最优划分点。 77 | $$ 78 | \begin{aligned} 79 | Gain(D,a) &= \max\limits_{t \in T_a} \ Gain(D,a) \\ 80 | &= \max\limits_{t \in T_a} \ Ent(D)-\sum_{\lambda \in \{-,+\}} \frac{\left | D_t^{\lambda } \right |}{\left |D \right |}Ent(D_t^{\lambda }) \end{aligned} \tag{4.8} 81 | $$ 82 | 83 | ### 脚注:熵 84 | 85 | >熵的量度正是能量退化的指标。熵亦被用于计算一个系统中的失序现象,也就是计算该系统混乱的程度。熵是一个描述系统状态的函数,但是经常用熵的参考值和变化量进行分析比较,它在控制论、概率论、数论、天体物理、生命科学等领域都有重要应用,在不同的学科中也有引申出的更为具体的定义,是各领域十分重要的参量。 -------------------------------------------------------------------------------- /docs/chapter2/chapter2.md: -------------------------------------------------------------------------------- 1 | ## 2.20 2 | 3 | $$ AUC=\cfrac{1}{2}\sum_{i=1}^{m-1}(x_{i+1} - x_i)\cdot(y_i + y_{i+1}) $$ 4 | 5 | [解析]:由于图2.4(b)中给出的ROC曲线为横平竖直的标准折线,所以乍一看这个式子的时候很不理解其中的$ \cfrac{1}{2} $和$ (y_i + y_{i+1}) $代表着什么,因为对于横平竖直的标准折线用$ AUC=\sum_{i=1}^{m-1}(x_{i+1} - x_i) \cdot y_i $就可以求出AUC了,但是图2.4(b)中的ROC曲线只是个特例罢了,因为此图是所有样例的预测值均不相同时的情形,也就是说每次分类阈值变化的时候只会划分新增**1个**样例为正例,所以下一个点的坐标为$ (x+\cfrac{1}{m^-},y) $或$ (x,y+\cfrac{1}{m^+}) $,然而当模型对某个正样例和某个反样例给出的预测值相同时,便会划分新增**两个**样例为正例,于是其中一个分类正确一个分类错误,那么下一个点的坐标为$ (x+\cfrac{1}{m^-},y+\cfrac{1}{m^+}) $(当没有预测值相同的样例时,若采取按固定梯度改变分类阈值,也会出现一下划分新增两个甚至多个正例的情形,但是此种阈值选取方案画出的ROC曲线AUC值更小,不建议使用),此时ROC曲线中便会出现斜线,而不再是只有横平竖直的折线,所以用**梯形面积公式**就能完美兼容这两种分类阈值选取方案,也即 **(上底+下底)\*高\*$ \cfrac{1}{2} $** 6 | 7 | ## 2.21 8 | 9 | $$ l_{rank}=\cfrac{1}{m^+m^-}\sum_{x^+ \in D^+}\sum_{x^- \in D^-}(\mathbb{I}(f(x^+) roc曲线:接收者操作特征(receiveroperating characteristic),roc曲线上每个点反映着对同一信号刺激的感受性。**横轴:负正类率(false postive rate FPR)特异度**,划分实例中所有负例占所有负例的比例;(1-Specificity),**纵轴:真正类率(true postive rate TPR)灵敏度**,Sensitivity(正类覆盖率)。 44 | > 参考: 45 | > [ROC和AUC介绍以及如何计算AUC](http://alexkong.net/2013/06/introduction-to-auc-and-roc/) 46 | > [ROC曲线-阈值评价标准](http://blog.csdn.net/abcjennifer/article/details/7359370) -------------------------------------------------------------------------------- /docs/chapter7/chapter7.md: -------------------------------------------------------------------------------- 1 | ## 7.5 2 | $$R(c|\boldsymbol x)=1−P(c|\boldsymbol x)$$ 3 | [推导]:由式7.1和式7.4可得: 4 | $$R(c_i|\boldsymbol x)=1*P(c_1|\boldsymbol x)+1*P(c_2|\boldsymbol x)+...+0*P(c_i|\boldsymbol x)+...+1*P(c_N|\boldsymbol x)$$ 5 | 又$\sum_{j=1}^{N}P(c_j|\boldsymbol x)=1$,则: 6 | $$R(c_i|\boldsymbol x)=1-P(c_i|\boldsymbol x)$$ 7 | 此即为式7.5 8 | ## 7.8 9 | $$P(c|\boldsymbol x)=\cfrac{P(c)P(\boldsymbol x|c)}{P(\boldsymbol x)}$$ 10 | [解析]:最小化误差,也就是最大化P(c|x),但由于P(c|x)属于后验概率无法直接计算,由贝叶斯公式可计算出: 11 | $$P(c|\boldsymbol x)=\cfrac{P(c)P(\boldsymbol x|c)}{P(\boldsymbol x)}$$ 12 | $P(\boldsymbol x)$可以省略,因为我们比较的时候$P(\boldsymbol x)$一定是相同的,所以我们就是用历史数据计算出$P(c)$和$P(\boldsymbol x|c)$。 13 | 1. $P(c)$根据大数定律,当样本量到了一定程度且服从独立同分布,c的出现的频率就是c的概率。 14 | 2. $P(\boldsymbol x|c)$,因为$\boldsymbol x$在这里不对单一元素是个矩阵,涉及n个元素,不太好直接统计分类为c时,$\boldsymbol x$的概率,所以我们根据假设独立同分布,对每个$\boldsymbol x$的每个特征分别求概率 15 | $$P(\boldsymbol x|c)=P(x_1|c)*P(x_2|c)*P(x_3|c)...*P(x_n|c)$$ 16 | 这个式子就可以很方便的通过历史数据去统计了,比如特征n,就是在分类为c时特征n出现的概率,在数据集中应该是用1显示。 17 | 但是当某一概率为0时会导致整个式子概率为0,所以采用拉普拉斯修正 18 | 19 | 当样本属性独依赖时,也就是除了c多加一个依赖条件,式子变成了 20 | $$∏_{i=1}^n P(x_i|c,p_i)$$ 21 | $p_i$是$x_i$所依赖的属性 22 | 23 | 当样本属性相关性未知时,我们采用贝叶斯网的算法,对相关性进行评估,以找出一个最佳的分类模型。 24 | 25 | 当遇到不完整的训练样本时,可通过使用EM算法对模型参数进行评估来解决。 26 | 27 | ## 7.17-7.18 28 | $$P_{(\boldsymbol x_{i}|c)}\in[0,1]$$ 29 | $$p_{(\boldsymbol x_{i}| c)}$$ 30 | [解析]:式(7.17)所得$P_{(\boldsymbol x_{i}|c)}\in[0,1]$为条件概率,但式(7.18)所得$p_{(\boldsymbol x_{i}| c)}$为条件概率密度而非概率,其值并不在局限于区间[0,1]之内。 31 | 32 | ## 7.23 33 | 34 | $$P(c|\boldsymbol x)\propto{\sum_{i=1 \atop |D_{x_{i}}|\geq m'}^{d}}P(c,x_{i})\prod_{j=1}^{d}P(x_j|c,x_i)$$ 35 | [推导]: 36 | $$\begin{aligned} 37 | P(c|\boldsymbol x)&=\cfrac{P(\boldsymbol x,c)}{P(\boldsymbol x)}\\ 38 | &=\cfrac{P\left(x_{1}, x_{2}, \ldots, x_{d}, c\right)}{P(\boldsymbol x)}\\ 39 | &=\cfrac{P\left(x_{1}, x_{2}, \ldots, x_{d} | c\right) P(c)}{P(\boldsymbol x)} \\ 40 | &=\cfrac{P\left(x_{1}, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{d} | c, x_{i}\right) P\left(c, x_{i}\right)}{P(\boldsymbol x)} \\ 41 | \end{aligned}$$ 42 | $$\begin{aligned} 43 | P(c|\boldsymbol x)&\propto P(c,x_{i})P(x_{1},…,x_{i-1},x_{i+1},…,x_{d}|c,x_{i}) \\ 44 | &=P(c,x_{i})\prod _{j=1}^{d}P(x_j|c,x_i) 45 | \end{aligned}$$ 46 | $$P(c|\boldsymbol x)\propto\sum\limits_{i=1 \atop |D_{x_{i}}|\geq m'}^{d}P(c_{i}|\boldsymbol x_{i})\prod_{j=1}^{d}P(c_{i}|\boldsymbol x_{i})$$ 47 | 此即为式7.23,由于式(7.24)和式(7.25)的使用到了$|D_{c,x_{i}}|$与$|D_{c,x_{i},x_{j}}|$,若$|D_{x_{i}}|$集合中样本数量过少,则$|D_{c,x_{i}}|$与$|D_{c,x_{i},x_{j}}|$将会更小,因此在式(7.23)中要求$|D_{x_{i}}|$集合中样本数量不少于$m'$。 48 | 49 | 50 | 51 | 52 | ## 附录 53 | ### 勘误 54 | 7.3节例子计算有笔误,第152页的第9个等式应为$P_{(凹陷|\boldsymbol 是)}=\cfrac{5}{8}$。截止到目前(第28次印刷),西瓜书官方勘误修订仅在第8次印刷时修正了第3个等式$P_{(蜷缩|\boldsymbol 是)}$,但第9个等式仍未修正(若修正第84页的西瓜数据集3.0也可,但亦未修正)。 55 | 56 | #### sklearn调包 57 | 58 | ```python 59 | import numpy as np 60 | X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]) 61 | Y = np.array([1, 1, 1, 2, 2, 2]) 62 | from sklearn.naive_bayes import GaussianNB 63 | clf = GaussianNB() 64 | clf.fit(X, Y) 65 | GaussianNB(priors=None, var_smoothing=1e-09) 66 | print(clf.predict([[-0.8, -1]])) 67 | ``` 68 | #### 参数: 69 | priors : array-like, shape (n_classes,) 70 | Prior probabilities of the classes. If specified the priors are not adjusted according to the data. 71 | 72 | var_smoothing : float, optional (default=1e-9) 73 | Portion of the largest variance of all features that is added to variances for calculation stability. 74 | #### 贝叶斯应用 75 | 76 | 1. 中文分词 77 | 分词后,得分的假设是基于两词之间是独立的,后词的出现与前词无关 78 | 2. 统计机器翻译 79 | 统计机器翻译因为其简单,无需手动添加规则,迅速成为了机器翻译的事实标准。 80 | 3. 贝叶斯图像识别 81 | 首先是视觉系统提取图形的边角特征,然后使用这些特征自底向上地激活高层的抽象概念,然后使用一个自顶向下的验证来比较到底哪个概念最佳地解释了观察到的图像。 82 | -------------------------------------------------------------------------------- /docs/chapter5/chapter5.md: -------------------------------------------------------------------------------- 1 | ## 5.2 2 | $$\Delta w_i = \eta(y-\hat{y})x_i$$ 3 | [推导]:此处感知机的模型为: 4 | $$y=f(\sum_{i} w_i x_i - \theta)$$ 5 | 将$\theta$看成哑结点后,模型可化简为: 6 | $$y=f(\sum_{i} w_i x_i)=f(\boldsymbol w^T \boldsymbol x)$$ 7 | 其中$f$为阶跃函数。
根据《统计学习方法》§2可知,假设误分类点集合为$M$,$\boldsymbol x_i \in M$为误分类点,$\boldsymbol x_i$的真实标签为$y_i$,模型的预测值为$\hat{y_i}$,对于误分类点$\boldsymbol x_i$来说,此时$\boldsymbol w^T \boldsymbol x_i \gt 0,\hat{y_i}=1,y_i=0$或$\boldsymbol w^T \boldsymbol x_i \lt 0,\hat{y_i}=0,y_i=1$,综合考虑两种情形可得: 8 | $$(\hat{y_i}-y_i)\boldsymbol w \boldsymbol x_i>0$$ 9 | 所以可以推得损失函数为: 10 | $$L(\boldsymbol w)=\sum_{\boldsymbol x_i \in M} (\hat{y_i}-y_i)\boldsymbol w \boldsymbol x_i$$ 11 | 损失函数的梯度为: 12 | $$\nabla_w L(\boldsymbol w)=\sum_{\boldsymbol x_i \in M} (\hat{y_i}-y_i)\boldsymbol x_i$$ 13 | 随机选取一个误分类点$(\boldsymbol x_i,y_i)$,对$\boldsymbol w$进行更新: 14 | $$\boldsymbol w \leftarrow \boldsymbol w-\eta(\hat{y_i}-y_i)\boldsymbol x_i=\boldsymbol w+\eta(y_i-\hat{y_i})\boldsymbol x_i$$ 15 | 显然式5.2为$\boldsymbol w$的第$i$个分量$w_i$的变化情况 16 | ## 5.12 17 | $$\Delta \theta_j = -\eta g_j$$ 18 | [推导]:因为 19 | $$\Delta \theta_j = -\eta \cfrac{\partial E_k}{\partial \theta_j}$$ 20 | 又 21 | $$ 22 | \begin{aligned} 23 | \cfrac{\partial E_k}{\partial \theta_j} &= \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot\cfrac{\partial \hat{y}_j^k}{\partial \theta_j} \\ 24 | &= (\hat{y}_j^k-y_j^k) \cdot f’(\beta_j-\theta_j) \cdot (-1) \\ 25 | &= -(\hat{y}_j^k-y_j^k)f’(\beta_j-\theta_j) \\ 26 | &= g_j 27 | \end{aligned} 28 | $$ 29 | 所以 30 | $$\Delta \theta_j = -\eta \cfrac{\partial E_k}{\partial \theta_j}=-\eta g_j$$ 31 | ## 5.13 32 | $$\Delta v_{ih} = \eta e_h x_i$$ 33 | [推导]:因为 34 | $$\Delta v_{ih} = -\eta \cfrac{\partial E_k}{\partial v_{ih}}$$ 35 | 又 36 | $$ 37 | \begin{aligned} 38 | \cfrac{\partial E_k}{\partial v_{ih}} &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot \cfrac{\partial b_h}{\partial \alpha_h} \cdot \cfrac{\partial \alpha_h}{\partial v_{ih}} \\ 39 | &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot \cfrac{\partial b_h}{\partial \alpha_h} \cdot x_i \\ 40 | &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot f’(\alpha_h-\gamma_h) \cdot x_i \\ 41 | &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot w_{hj} \cdot f’(\alpha_h-\gamma_h) \cdot x_i \\ 42 | &= \sum_{j=1}^{l} (-g_j) \cdot w_{hj} \cdot f’(\alpha_h-\gamma_h) \cdot x_i \\ 43 | &= -f’(\alpha_h-\gamma_h) \cdot \sum_{j=1}^{l} g_j \cdot w_{hj} \cdot x_i\\ 44 | &= -b_h(1-b_h) \cdot \sum_{j=1}^{l} g_j \cdot w_{hj} \cdot x_i \\ 45 | &= -e_h \cdot x_i 46 | \end{aligned} 47 | $$ 48 | 所以 49 | $$\Delta v_{ih} = -\eta \cdot -e_h \cdot x_i=\eta e_h x_i$$ 50 | ## 5.14 51 | $$\Delta \gamma_h= -\eta e_h$$ 52 | [推导]:因为 53 | $$\Delta \gamma_h = -\eta \cfrac{\partial E_k}{\partial \gamma_h}$$ 54 | 又 55 | $$ 56 | \begin{aligned} 57 | \cfrac{\partial E_k}{\partial \gamma_h} &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot \cfrac{\partial b_h}{\partial \gamma_h} \\ 58 | &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot f’(\alpha_h-\gamma_h) \cdot (-1) \\ 59 | &= -\sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot w_{hj} \cdot f’(\alpha_h-\gamma_h)\\ 60 | &=e_h 61 | \end{aligned} 62 | $$ 63 | 所以 64 | $$\Delta \gamma_h= -\eta e_h$$ -------------------------------------------------------------------------------- /docs/chapter16/chapter16.md: -------------------------------------------------------------------------------- 1 | ## 16.2 2 | $$ 3 | Q_{n}(k)=\frac{1}{n}\left((n-1)\times Q_{n-1}(k)+v_{n}\right) 4 | $$ 5 | 6 | [推导]: 7 | $$ 8 | Q_{n}(k)=\frac{1}{n}\sum_{i=1}^{n}v_{i}=\frac{1}{n}\left(\sum_{i=1}^{n-1}v_{i}+v_{n}\right)=\frac{1}{n}\left((n-1)Q_{n-1}(k)+v_{n}\right) 9 | $$ 10 | 11 | ## 16.4 12 | 13 | $$ 14 | P(k)=\frac{e^{\frac{Q(k)}{\tau }}}{\sum_{i=1}^{K}e^{\frac{Q(i)}{\tau}}} 15 | $$ 16 | 17 | $$ 18 | \tau越小则平均奖赏高的摇臂被选取的概率越高 19 | $$ 20 | 21 | [解析]: 22 | $$ 23 | P(k)=\frac{e^{\frac{Q(k)}{\tau }}}{\sum_{i=1}^{K}e^{\frac{Q(i)}{\tau}}}\propto e^{\frac{Q(k)}{\tau }}\propto\frac{Q(k)}{\tau }\propto\frac{1}{\tau} 24 | $$ 25 | 26 | ## 16.7 27 | 28 | $$ 29 | \begin{aligned} 30 | V_{T}^{\pi}(x)&=\mathbb{E}_{\pi}[\frac{1}{T}\sum_{t=1}^{T}r_{t}\mid x_{0}=x]\\ 31 | &=\mathbb{E}_{\pi}[\frac{1}{T}r_{1}+\frac{T-1}{T}\frac{1}{T-1}\sum_{t=2}^{T}r_{t}\mid x_{0}=x]\\ 32 | &=\sum_{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(\frac{1}{T}R_{x\rightarrow x{}'}^{a}+\frac{T-1}{T}\mathbb{E}_{\pi}[\frac{1}{T-1}\sum_{t=1}^{T-1}r_{t}\mid x_{0}=x{}'])\\ 33 | &=\sum_{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(\frac{1}{T}R_{x\rightarrow x{}'}^{a}+\frac{T-1}{T}V_{T-1}^{\pi}(x{}')]) 34 | \end{aligned} 35 | $$ 36 | 37 | [解析]: 38 | 39 | 因为 40 | $$ 41 | \pi(x,a)=P(action=a|state=x) 42 | $$ 43 | 表示在状态x下选择动作a的概率,又因为动作事件之间两两互斥且和为动作空间,由全概率展开公式 44 | $$ 45 | P(A)=\sum_{i=1}^{\infty}P(B_{i})P(A\mid B_{i}) 46 | $$ 47 | 可得 48 | $$ 49 | \begin{aligned} 50 | &=\mathbb{E}_{\pi}[\frac{1}{T}r_{1}+\frac{T-1}{T}\frac{1}{T-1}\sum_{t=2}^{T}r_{t}\mid x_{0}=x]\\ 51 | &=\sum_{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(\frac{1}{T}R_{x\rightarrow x{}'}^{a}+\frac{T-1}{T}\mathbb{E}_{\pi}[\frac{1}{T-1}\sum_{t=1}^{T-1}r_{t}\mid x_{0}=x{}']) 52 | \end{aligned} 53 | $$ 54 | 其中 55 | $$ 56 | r_{1}=\pi(x,a)P_{x\rightarrow x{}'}^{a}R_{x\rightarrow x{}'}^{a} 57 | $$ 58 | 最后一个等式用到了递归形式。 59 | 60 | 61 | 62 | ## 16.8 63 | 64 | $$ 65 | V_{\gamma }^{\pi}(x)=\sum _{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(R_{x\rightarrow x{}'}^{a}+\gamma V_{\gamma }^{\pi}(x{}')) 66 | $$ 67 | 68 | [推导]: 69 | $$ 70 | \begin{aligned} 71 | V_{\gamma }^{\pi}(x)&=\mathbb{E}_{\pi}[\sum_{t=0}^{\infty }\gamma^{t}r_{t+1}\mid x_{0}=x]\\ 72 | &=\mathbb{E}_{\pi}[r_{1}+\sum_{t=1}^{\infty}\gamma^{t}r_{t+1}\mid x_{0}=x]\\ 73 | &=\mathbb{E}_{\pi}[r_{1}+\gamma\sum_{t=1}^{\infty}\gamma^{t-1}r_{t+1}\mid x_{0}=x]\\ 74 | &=\sum _{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(R_{x\rightarrow x{}'}^{a}+\gamma \mathbb{E}_{\pi}[\sum_{t=0}^{\infty }\gamma^{t}r_{t+1}\mid x_{0}=x{}'])\\ 75 | &=\sum _{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(R_{x\rightarrow x{}'}^{a}+\gamma V_{\gamma }^{\pi}(x{}')) 76 | \end{aligned} 77 | $$ 78 | 79 | ## 16.16 80 | 81 | $$ 82 | V^{\pi}(x)\leq V^{\pi{}'}(x) 83 | $$ 84 | 85 | [推导]: 86 | $$ 87 | \begin{aligned} 88 | V^{\pi}(x)&\leq Q^{\pi}(x,\pi{}'(x))\\ 89 | &=\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{\pi{}'(x)}(R_{x\rightarrow x{}'}^{\pi{}'(x)}+\gamma V^{\pi}(x{}'))\\ 90 | &\leq \sum_{x{}'\in X}P_{x\rightarrow x{}'}^{\pi{}'(x)}(R_{x\rightarrow x{}'}^{\pi{}'(x)}+\gamma Q^{\pi}(x{}',\pi{}'(x{}')))\\ 91 | &=\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{\pi{}'(x)}(R_{x\rightarrow x{}'}^{\pi{}'(x)}+\gamma \sum_{x{}'\in X}P_{x{}'\rightarrow x{}'}^{\pi{}'(x{}')}(R_{x{}'\rightarrow x{}'}^{\pi{}'(x{}')}+\gamma V^{\pi}(x{}')))\\ 92 | &=\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{\pi{}'(x)}(R_{x\rightarrow x{}'}^{\pi{}'(x)}+\gamma V^{\pi{}'}(x{}'))\\ 93 | &=V^{\pi{}'}(x) 94 | \end{aligned} 95 | $$ 96 | 其中,使用了动作改变条件 97 | $$ 98 | Q^{\pi}(x,\pi{}'(x))\geq V^{\pi}(x) 99 | $$ 100 | 以及状态-动作值函数 101 | $$ 102 | Q^{\pi}(x{}',\pi{}'(x{}'))=\sum_{x{}'\in X}P_{x{}'\rightarrow x{}'}^{\pi{}'(x{}')}(R_{x{}'\rightarrow x{}'}^{\pi{}'(x{}')}+\gamma V^{\pi}(x{}')) 103 | $$ 104 | 于是,当前状态的最优值函数为 105 | 106 | $$ 107 | V^{\ast}(x)=V^{\pi{}'}(x)\geq V^{\pi}(x) 108 | $$ 109 | 110 | 111 | 112 | ## 16.31 113 | 114 | $$ 115 | Q_{t+1}^{\pi}(x,a)=Q_{t}^{\pi}(x,a)+\alpha (R_{x\rightarrow x{}'}^{a}+\gamma Q_{t}^{\pi}(x{}',a{}')-Q_{t}^{\pi}(x,a)) 116 | $$ 117 | 118 | [推导]:对比公式16.29 119 | $$ 120 | Q_{t+1}^{\pi}(x,a)=Q_{t}^{\pi}(x,a)+\frac{1}{t+1}(r_{t+1}-Q_{t}^{\pi}(x,a)) 121 | $$ 122 | 以及由 123 | $$ 124 | \frac{1}{t+1}=\alpha 125 | $$ 126 | 可知 127 | $$ 128 | r_{t+1}=R_{x\rightarrow x{}'}^{a}+\gamma Q_{t}^{\pi}(x{}',a{}') 129 | $$ 130 | 而由γ折扣累积奖赏可估计得到。 131 | 132 | 133 | 134 | -------------------------------------------------------------------------------- /docs/chapter12/chapter12.md: -------------------------------------------------------------------------------- 1 | ## 12.4 2 | $$ 3 | f(E(x)) \leq E(f(x)) 4 | $$ 5 | [推导]:显然,对于任意凸函数,必然有: 6 | $$ 7 | f\left(\alpha x_{1}+(1-\alpha) x_{2}\right) \leq \alpha f\left(x_{1}\right)+(1-\alpha) f\left(x_{2}\right) 8 | $$ 9 | 10 | $$ 11 | f(E(x))=f\left(\frac{1}{m} \sum_{i}^{m} x_{i}\right)=f\left(\frac{m-1}{m} \frac{1}{m-1} \sum_{i}^{m-1} x_{i}+\frac{1}{m} x_{i}\right) 12 | $$ 13 | 取: 14 | $$ 15 | \alpha=\frac{m-1}{m} 16 | $$ 17 | 18 | 所以推得: 19 | $$ 20 | f(E(x)) \leq \frac{m-1}{m} f\left(\frac{1}{m-1} \sum_{i}^{m-1} x_{i}\right)+\frac{1}{m} f\left(x_{m}\right) 21 | $$ 22 | 以此类推得: 23 | $$ 24 | f(E(x)) \leq \frac{1}{m} f\left(x_{1}\right)+\frac{1}{m} f\left(x_{2}\right)+\ldots \ldots+\frac{1}{m} f\left(x_{m}\right)=E(f(x)) 25 | $$ 26 | 27 | ## 12.17 28 | $$ 29 | P(|\hat{E}(h)-E(h)| \geq \varepsilon) \leq 2 e^{-2 m \varepsilon^{2}} 30 | $$ 31 | [推导]:已知Hoeffding不等式:若$x_{1}, x_{2} \ldots . . . x_{m}$为$m$个独立变量,且满足$0 \leq x_{i} \leq 1$ ,则对任意$\varepsilon>0$,有: 32 | $$ 33 | P\left(\left|\frac{1}{m} \sum_{i}^{m} x_{i}-\frac{1}{m} \sum_{i}^{m} E\left(x_{i}\right)\right| \geq \varepsilon\right) \leq 2 e^{-2 m \varepsilon^{2}} 34 | $$ 35 | 将$x_{i}$替换成损失函数$l\left(h\left(x_{i}\right) \neq y_{i}\right)$,显然$0 \leq l\left(h\left(x_{i}\right) \neq y_{i}\right) \leq 1$,且独立,带入Hoeffiding不等式可得: 36 | $$ 37 | P\left( | \frac{1}{m} \sum_{i}^{m} l\left(h\left(x_{i}\right) \neq y_{i}\right)-\frac{1}{m} \sum_{i}^{m} E\left(l\left(x_{i}\right) \neq y_{i}\right)\right) | \geq \varepsilon ) \leq 2 e^{-2 m \varepsilon^{2}} 38 | $$ 39 | 其中: 40 | $$ 41 | \hat{E}(h)=\frac{1}{m} \sum_{i}^{m} l\left(h\left(x_{i}\right) \neq y_{i}\right) 42 | $$ 43 | 44 | $$ 45 | E(h)=P_{x \in \mathbb{D}} l(h(x) \neq y)=E(l(h(x) \neq y))=\frac{1}{m} \sum_{i}^{m} E\left(l\left(h\left(x_{i}\right) \neq y_{i}\right)\right) 46 | $$ 47 | 48 | 所以有: 49 | $$ 50 | P(|\hat{E}(h)-E(h)| \geq \varepsilon) \leq 2 e^{-2 m \varepsilon^{2}} 51 | $$ 52 | 53 | ## 12.18 54 | $$ 55 | \hat{E}(h)-\sqrt{\frac{\ln (2 / \delta)}{2 m}} \leq E(h) \leq \hat{E}(h)+\sqrt{\frac{\ln (2 / \delta)}{2 m}} 56 | $$ 57 | [推导]:由(12.17)可知: 58 | $$ 59 | P(|\hat{E}(h)-E(h)| \geq \varepsilon) \leq 2 e^{-2 m \varepsilon^{2}} 60 | $$ 61 | 成立 62 | 63 | 即: 64 | $$ 65 | P(|\hat{E}(h)-E(h)| \leq \varepsilon) \geq 1-2 e^{-2 m \varepsilon^{2}} 66 | $$ 67 | 取$\delta=2 e^{-2 m \varepsilon^{2}}$,则$\varepsilon=\sqrt{\frac{\ln (2 / \delta)}{2 m}}$ 68 | 69 | 所以$|\hat{E}(h)-E(h)| \leq \sqrt{\frac{\ln (2 / \delta)}{2 m}}$的概率不小于$1-\delta$ 70 | 71 | 整理得: 72 | $$ 73 | \hat{E}(h)-\sqrt{\frac{\ln (2 / \delta)}{2 m}} \leq E(h) \leq \hat{E}(h)+\sqrt{\frac{\ln (2 / \delta)}{2 m}} 74 | $$ 75 | 以至少$1-\delta$的概率成立 76 | 77 | ## 12.59 78 | $$ 79 | l(\varepsilon, D) \leq l_{l o o}(\overline{\varepsilon}, D)+\beta+(4 m \beta+M) \sqrt{ \frac{\ln (1 / \delta)}{2 m}} 80 | $$ 81 | [解析]:取$ \varepsilon=\beta+(4 m \beta+M) \sqrt\frac{\ln (1 / \delta)}{2 m}$时,可以得到: 82 | 83 | $l(\varepsilon, D)-l_{l o o}(\varepsilon, D) \leq \varepsilon$以至少$1-\frac{\delta} 2 $的概率成立,$K$折交叉验证,当$K=m$时,就成了留一法,这时候会有很不错的泛化能力,但是有前提条件,对于损失函数$l$满足$ \beta$均匀稳定性,且$ \beta$应该是$O(\frac{1}m)$这个量级,仅拿出一个样本,可以保证很小的$ \beta$,随着$K$的减小,训练的样本会减少,$\beta$会逐渐增大,当$\beta$量级小于$O(\frac{1}m)$时,交叉验证就会不合理了 84 | 85 | ## 附录 86 | 87 | 给定函数空间$F_{1}, F_{2}$,证明$Rademacher$复杂度: 88 | $$ 89 | R_{m}\left(F_{1}+F_{2}\right) \leq R_{m}\left(F_{1}\right)+R_{m}\left(F_{2}\right) 90 | $$ 91 | [推导]: 92 | $$ 93 | R_{m}\left(F_{1}+F_{2}\right)=E_{Z \in \mathbf{z} :|Z|=m}\left[\hat{R}_{Z}\left(F_{1}+F_{2}\right)\right] 94 | $$ 95 | 96 | $$ 97 | \hat{R}_{Z}\left(F_{1}+F_{2}\right)=E_{\sigma}\left[\sup _{f_{1} F_{1}, f_{2} \in F_{2}} \frac{1}{m} \sum_{i}^{m} \sigma_{i}\left(f_{1}\left(z_{i}\right)+f_{2}\left(z_{i}\right)\right)\right] 98 | $$ 99 | 100 | 当$f_{1}\left(z_{i}\right) f_{2}\left(z_{i}\right)<0$时, 101 | $$ 102 | \sigma_{i}\left(f_{1}\left(z_{i}\right)+f_{2}\left(z_{i}\right)\right)<\sigma_{i 1} f_{1}\left(z_{i}\right)+\sigma_{i 2} f_{2}\left(z_{i}\right) 103 | $$ 104 | 当$f_{1}\left(z_{i}\right) f_{2}\left(z_{i}\right) \geq 0$时, 105 | $$ 106 | \sigma_{i}\left(f_{1}\left(z_{i}\right)+f_{2}\left(z_{i}\right)\right)=\sigma_{i 1} f_{1}\left(z_{i}\right)+\sigma_{i 2} f_{2}\left(z_{i}\right) 107 | $$ 108 | 所以: 109 | $$ 110 | \hat{R}_{Z}\left(F_{1}+F_{2}\right) \leq \hat{R}_{Z}\left(F_{1}\right)+\hat{R}_{Z}\left(F_{2}\right) 111 | $$ 112 | 也即: 113 | $$ 114 | R_{m}\left(F_{1}+F_{2}\right) \leq R_{m}\left(F_{1}\right)+R_{m}\left(F_{2}\right) 115 | $$ 116 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 南瓜书PumpkinBook 2 | 3 | 周志华老师的《机器学习》(西瓜书)是机器学习领域的经典入门教材之一,周老师为了使尽可能多的读者通过西瓜书对机器学习有所了解, 所以在书中对部分公式的推导细节没有详述,但是这对那些想深究公式推导细节的读者来说可能“不太友好”,本书旨在对西瓜书里比较难理解的公式加以解析,以及对部分公式补充具体的推导细节,诚挚欢迎每一位西瓜书读者前来参与完善本书:一个人可以走的很快,但是一群人却可以走的更远。 4 | 5 | ## 使用说明 6 | 7 | 南瓜书仅仅是西瓜书的一些细微补充而已,里面的内容都是以西瓜书的内容为前置知识进行表述的,所以南瓜书的最佳使用方法是以西瓜书为主线,遇到自己推导不出来或者看不懂的公式时再来查阅南瓜书。若南瓜书里没有你想要查阅的公式,可以[点击这里](https://github.com/datawhalechina/pumpkin-book/issues/1)提交你希望补充推导或者解析的公式编号,我们看到后会尽快进行补充。 8 | 9 | ### 在线阅读地址 10 | 11 | 在线阅读地址:https://datawhalechina.github.io/pumpkin-book 12 | 13 | ## 目录 14 | 15 | - 第1章 [绪论](https://datawhalechina.github.io/pumpkin-book/#/chapter1/chapter1) 16 | - 第2章 [模型评估与选择](https://datawhalechina.github.io/pumpkin-book/#/chapter2/chapter2) 17 | - 第3章 [线性模型](https://datawhalechina.github.io/pumpkin-book/#/chapter3/chapter3) 18 | - 第4章 [决策树](https://datawhalechina.github.io/pumpkin-book/#/chapter4/chapter4) 19 | - 第5章 [神经网络](https://datawhalechina.github.io/pumpkin-book/#/chapter5/chapter5) 20 | - 第6章 [支持向量机](https://datawhalechina.github.io/pumpkin-book/#/chapter6/chapter6) 21 | - 第7章 [贝叶斯分类器](https://datawhalechina.github.io/pumpkin-book/#/chapter7/chapter7) 22 | - 第8章 [集成学习](https://datawhalechina.github.io/pumpkin-book/#/chapter8/chapter8) 23 | - 第9章 [聚类](https://datawhalechina.github.io/pumpkin-book/#/chapter9/chapter9) 24 | - 第10章 [降维与度量学习](https://datawhalechina.github.io/pumpkin-book/#/chapter10/chapter10) 25 | - 第11章 [特征选择与稀疏学习](https://datawhalechina.github.io/pumpkin-book/#/chapter11/chapter11) 26 | - 第12章 [计算学习理论](https://datawhalechina.github.io/pumpkin-book/#/chapter12/chapter12) 27 | - 第13章 [半监督学习](https://datawhalechina.github.io/pumpkin-book/#/chapter13/chapter13) 28 | - 第14章 [概率图模型](https://datawhalechina.github.io/pumpkin-book/#/chapter14/chapter14) 29 | - 第15章 规则学习(正在完成中) 30 | - 第16章 [强化学习](https://datawhalechina.github.io/pumpkin-book/#/chapter16/chapter16) 31 | 32 | ## 选用的西瓜书版本 33 | 34 | 35 | 36 | > 书名:机器学习
37 | > 作者:周志华
38 | > 出版社:清华大学出版社
39 | > 版次:2016年1月第1版
40 | > 勘误表:http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/MLbook2016.htm
41 | > 全书共16章,大致分为3个部分:第1部分(第1~3章)介绍机器学习的基础知识;第2部分(第4~10章)讨论一些经典而常用的机器学习方法(决策树、神经网络、支持向量机、贝叶斯分类器、集成学习、聚类、降维与度量学习);第3部分(第11~16章)为进阶知识,内容涉及特征选择与稀疏学习、计算学习理论、半监督学习、概率图模型、规则学习以及强化学习等。 42 | 43 | ## 协作规范 44 | 45 | ### 文档书写规范: 46 | 47 | 文档采用```Markdown```语法编写,数学公式采用```LaTeX```语法编写,数学符号规范参见西瓜书目录前一页《主要符号表》。
48 | 编辑器推荐:[马克飞象](https://maxiang.io) 49 | 50 | ### 目录结构规范: 51 | 52 | ```markdown 53 | pumpkin-book 54 | ├─docs 55 | | ├─chapter1 # 第1章 56 | | | ├─resources # 资源文件夹 57 | | | | └─images # 图片资源 58 | | | └─chapter1.md # 第1章公式全解 59 | | ├─chapter2 60 | ... 61 | ``` 62 | 63 | ### 公式全解文档规范: 64 | 65 | ```markdown 66 | ## 公式编号 67 | 68 | $$(公式的LaTeX表达式)$$ 69 | 70 | [推导]:(公式推导步骤) or [解析]:(公式解析说明) 71 | 72 | ## 附录(可选) 73 | 74 | (附录内容) 75 | ``` 76 | 77 | 例如: 78 | 79 | 80 | ### 主要贡献者(按首字母排名) 81 | 82 | [@awyd234](https://github.com/awyd234) 83 | [@feijuan](https://github.com/feijuan) 84 | [@Ggmatch](https://github.com/Ggmatch) 85 | [@Heitao5200](https://github.com/Heitao5200) 86 | [@huaqing89](https://github.com/huaqing89) 87 | [@juxiao](https://github.com/juxiao) 88 | [@jbb0523](https://blog.csdn.net/jbb0523) 89 | [@LongJH](https://github.com/LongJH) 90 | [@LilRachel](https://github.com/LilRachel) 91 | [@LeoLRH](https://github.com/LeoLRH) 92 | [@Majingmin](https://github.com/Majingmin) 93 | [@MrBigFan](https://github.com/MrBigFan) 94 | [@Nono17](https://github.com/Nono17) 95 | [@spareribs](https://github.com/spareribs) 96 | [@sunchaothu](https://github.com/sunchaothu) 97 | [@StevenLzq](https://github.com/StevenLzq) 98 | [@Sm1les](https://github.com/Sm1les) 99 | [@shanry](https://github.com/shanry) 100 | [@Ye980226](https://github.com/Ye980226) 101 | 102 | 103 | ## 关注我们 104 | 105 |
Datawhale,一个专注于AI领域的学习圈子。初衷是for the learner,和学习者一起成长。目前加入学习社群的人数已经数千人,组织了机器学习,深度学习,数据分析,数据挖掘,爬虫,编程,统计学,Mysql,数据竞赛等多个领域的内容学习,微信搜索公众号Datawhale可以加入我们。
106 | 107 | ## LICENSE 108 | 109 | [GNU General Public License v3.0](https://github.com/datawhalechina/pumpkin-book/blob/master/LICENSE) -------------------------------------------------------------------------------- /docs/chapter8/chapter8.md: -------------------------------------------------------------------------------- 1 | ## 8.3 2 | $$ 3 | \begin{aligned} P(H(\boldsymbol{x}) \neq f(\boldsymbol{x})) &=\sum_{k=0}^{\lfloor T / 2\rfloor} \left( \begin{array}{c}{T} \\ {k}\end{array}\right)(1-\epsilon)^{k} \epsilon^{T-k} \\ & \leqslant \exp \left(-\frac{1}{2} T(1-2 \epsilon)^{2}\right) \end{aligned} 4 | $$ 5 | [推导]:由基分类器相互独立,设X为T个基分类器分类正确的次数,因此$\mathrm{X} \sim \mathrm{B}(\mathrm{T}, 1-\mathrm{\epsilon})$ 6 | $$ 7 | \begin{aligned} P(H(x) \neq f(x))=& P(X \leq\lfloor T / 2\rfloor) \\ & \leqslant P(X \leq T / 2) 8 | \\ & =P\left[X-(1-\varepsilon) T \leqslant \frac{T}{2}-(1-\varepsilon) T\right] 9 | \\ & =P\left[X- 10 | (1-\varepsilon) T \leqslant -\frac{T}{2}\left(1-2\varepsilon\right)]\right] 11 | \end{aligned} 12 | $$ 13 | 根据Hoeffding不等式$P(X-(1-\epsilon)T\leqslant -kT) \leq \exp (-2k^2T)$ 14 | 令$k=\frac {(1-2\epsilon)}{2}$得 15 | $$ 16 | \begin{aligned} P(H(\boldsymbol{x}) \neq f(\boldsymbol{x})) &=\sum_{k=0}^{\lfloor T / 2\rfloor} \left( \begin{array}{c}{T} \\ {k}\end{array}\right)(1-\epsilon)^{k} \epsilon^{T-k} \\ & \leqslant \exp \left(-\frac{1}{2} T(1-2 \epsilon)^{2}\right) \end{aligned} 17 | $$ 18 | 19 | ## 8.5-8.8 20 | [推导]:由式(8.4)可知 21 | $$H(\boldsymbol{x})=\sum_{t=1}^{T} \alpha_{t} h_{t}(\boldsymbol{x})$$ 22 | 23 | 又由式(8.11)可知 24 | $$ 25 | \alpha_{t}=\frac{1}{2} \ln \left(\frac{1-\epsilon_{t}}{\epsilon_{t}}\right) 26 | $$ 27 | 该分类器的权重只与分类器的错误率负相关(即错误率越大,权重越低) 28 | 29 | (1)先考虑指数损失函数$e^{-f(x) H(x)}$的含义:$f$为真实函数,对于样本$x$来说,$f(\boldsymbol{x}) \in\{-1,+1\}$只能取和两个值,而$H(\boldsymbol{x})$是一个实数; 30 | 当$H(\boldsymbol{x})$的符号与$f(x)$一致时,$f(\boldsymbol{x}) H(\boldsymbol{x})>0$,因此$e^{-f(\boldsymbol{x}) H(\boldsymbol{x})}=e^{-|H(\boldsymbol{x})|}<1$,且$|H(\boldsymbol{x})|$越大指数损失函数$e^{-f(\boldsymbol{x}) H(\boldsymbol{x})}$越小(这很合理:此时$|H(\boldsymbol{x})|$越大意味着分类器本身对预测结果的信心越大,损失应该越小;若$|H(\boldsymbol{x})|$在零附近,虽然预测正确,但表示分类器本身对预测结果信心很小,损失应该较大); 31 | 当$H(\boldsymbol{x})$的符号与$f(\boldsymbol{x})$不一致时,$f(\boldsymbol{x}) H(\boldsymbol{x})<0$,因此$e^{-f(\boldsymbol{x}) H(\boldsymbol{x})}=e^{|H(\boldsymbol{x})|}>1$,且$| H(\boldsymbol{x}) |$越大指数损失函数越大(这很合理:此时$| H(\boldsymbol{x}) |$越大意味着分类器本身对预测结果的信心越大,但预测结果是错的,因此损失应该越大;若$| H(\boldsymbol{x}) |$在零附近,虽然预测错误,但表示分类器本身对预测结果信心很小,虽然错了,损失应该较小); 32 | (2)符号$\mathbb{E}_{\boldsymbol{x} \sim \mathcal{D}}[\cdot]$的含义:$\mathcal{D}$为概率分布,可简单理解为在数据集$D$中进行一次随机抽样,每个样本被取到的概率;$\mathbb{E}[\cdot]$为经典的期望,则综合起来$\mathbb{E}_{\boldsymbol{x} \sim \mathcal{D}}[\cdot]$表示在概率分布$\mathcal{D}$上的期望,可简单理解为对数据集$D$以概率$\mathcal{D}$进行加权后的期望。 33 | $$ 34 | \begin{aligned} 35 | \ell_{\mathrm{exp}}(H | \mathcal{D})=&\mathbb{E}_{\boldsymbol{x} \sim \mathcal{D}}\left[e^{-f(\boldsymbol{x}) H(\boldsymbol{x})}\right] 36 | \\ =&P(f(x)=1|x)*e^{-H(x)}+P(f(x)=-1|x)*e^{H(x)} 37 | \end{aligned} 38 | $$ 39 | 40 | 由于$P(f(x)=1|x)和P(f(x)=-1|x)$为常数 41 | 42 | 故式(8.6)可轻易推知 43 | 44 | $$ 45 | \frac{\partial \ell_{\exp }(H | \mathcal{D})}{\partial H(\boldsymbol{x})}=-e^{-H(\boldsymbol{x})} P(f(\boldsymbol{x})=1 | \boldsymbol{x})+e^{H(\boldsymbol{x})} P(f(\boldsymbol{x})=-1 | \boldsymbol{x}) 46 | $$ 47 | 48 | 令式(8.6)等于0可得 49 | 50 | 式(8.7) 51 | $$ 52 | H(\boldsymbol{x})=\frac{1}{2} \ln \frac{P(f(x)=1 | \boldsymbol{x})}{P(f(x)=-1 | \boldsymbol{x})} 53 | $$ 54 | 式(8.8)显然成立 55 | $$ 56 | \begin{aligned} 57 | \operatorname{sign}(H(\boldsymbol{x}))&=\operatorname{sign}\left(\frac{1}{2} \ln \frac{P(f(x)=1 | \boldsymbol{x})}{P(f(x)=-1 | \boldsymbol{x})}\right) 58 | \\ & =\left\{\begin{array}{ll}{1,} & {P(f(x)=1 | \boldsymbol{x})>P(f(x)=-1 | \boldsymbol{x})} \\ {-1,} & {P(f(x)=1 | \boldsymbol{x})0,使得$|f(\boldsymbol x1)-f(\boldsymbol x2)|≤L|\boldsymbol x1-\boldsymbol x2|$。通俗理解就是,对于函数上的每一对点,都存在一个实数L,使得它们连线斜率的绝对值不大于这个L,其中最小的L称为*Lipschitz*常数。 4 |    将公式变形可以更好的理解:$$\frac{\left \| \nabla f(\boldsymbol x{}')-\nabla f(\boldsymbol x) \right \|_{2}^{2}}{\left \| \boldsymbol x{}'-\boldsymbol x \right \|_{2}^{2}}\leqslant L  (\forall \boldsymbol x,\boldsymbol x{}'),$$ 5 |    进一步,如果$\boldsymbol x{}'\to \boldsymbol x$,即$$\lim_{\boldsymbol x{}'\to \boldsymbol x}\frac{\left \| \nabla f(\boldsymbol x{}')-\nabla f(\boldsymbol x)\right \|_{2}^{2}}{\left \| \boldsymbol x{}'-\boldsymbol x \right \|_{2}^{2}}$$ 6 |    “ *Lipschitz*连续”很常见,知乎有一个问答(https://www.zhihu.com/question/51809602) 对*Lipschitz*连续的解释很形象:以陆地为例, 连续就是说这块地上没有特别陡的坡;其中最陡的地方有多陡呢?这就是所谓的*Lipschitz*常数。 7 | 8 | ## 11.10 9 | 10 | $$ 11 | \hat{f}(x) \simeq f(x_{k})+\langle \nabla f(x_{k}),x-x_{k} \rangle + \frac{L}{2}\left \| x-x_{k} \right\|^{2} 12 | $$ 13 | 14 | [推导]: 15 | $$ 16 | \begin{aligned} 17 | \hat{f}(x) &\simeq f(x_{k})+\langle \nabla f(x_{k}),x-x_{k} \rangle + \frac{L}{2}\left \| x-x_{k} \right\|^{2} \\ 18 | &= f(x_{k})+\langle \nabla f(x_{k}),x-x_{k} \rangle + \langle\frac{L}{2}(x-x_{k}),x-x_{k}\rangle \\ 19 | &= f(x_{k})+\langle \nabla f(x_{k})+\frac{L}{2}(x-x_{k}),x-x_{k} \rangle \\ 20 | &= f(x_{k})+\frac{L}{2}\langle\frac{2}{L}\nabla f(x_{k})+(x-x_{k}),x-x_{k} \rangle \\ 21 | &= f(x_{k})+\frac{L}{2}\langle x-x_{k}+\frac{1}{L}\nabla f(x_{k})+\frac{1}{L}\nabla f(x_{k}),x-x_{k}+\frac{1}{L}\nabla f(x_{k})-\frac{1}{L}\nabla f(x_{k}) \rangle \\ 22 | &= f(x_{k})+\frac{L}{2}\left\| x-x_{k}+\frac{1}{L}\nabla f(x_{k}) \right\|_{2}^{2} -\frac{1}{2L}\left\|\nabla f(x_{k})\right\|_{2}^{2} \\ 23 | &= \frac{L}{2}\left\| x-(x_{k}-\frac{1}{L}\nabla f(x_{k})) \right\|_{2}^{2} + const \qquad (因为f(x_{k})和\nabla f(x_{k})是常数) 24 | \end{aligned} 25 | $$ 26 | 27 | ## 11.13 28 | $$\boldsymbol x_{\boldsymbol k+\boldsymbol 1}=\underset{\boldsymbol x}{argmin}\frac{L}{2}\left \| \boldsymbol x -\boldsymbol z\right \|_{2}^{2}+\lambda \left \| \boldsymbol x \right \|_{1}$$ 29 | [推导]:假设目标函数为$g(\boldsymbol x)$,则 30 | $$ 31 | \begin{aligned} 32 | g(\boldsymbol x) 33 | & =\frac{L}{2}\left \|\boldsymbol x \boldsymbol -\boldsymbol z\right \|_{2}^{2}+\lambda \left \| \boldsymbol x \right \|_{1}\\ 34 | & =\frac{L}{2}\sum_{i=1}^{d}\left \| x^{i} -z^{i}\right \|_{2}^{2}+\lambda \sum_{i=1}^{d}\left \| x^{i} \right \|_{1} \\ 35 | & =\sum_{i=1}^{d}(\frac{L}{2}(x^{i}-z^{i})^{2}+\lambda \left | x^{i}\right |)& 36 | \end{aligned} 37 | $$ 38 | 由上式可见, $g(\boldsymbol x)$可以拆成 d个独立的函 数,求解式(11.13)可以分别求解d个独立的目标函数。 39 | 针对目标函数$g(x^{i})=\frac{L}{2}(x^{i}-z^{i})^{2}+\lambda \left | x^{i}\right |$,通过求导求解极值: 40 | $$\frac{dg(x^{i})}{dx^{i}}=L(x^{i}-z^{i})+\lambda sgn(x^{i})$$ 41 | 其中$$sgn(x^{i})=\left\{\begin{matrix} 42 | 1, &x^{i}>0\\ 43 | -1,& x^{i}<0 44 | \end{matrix}\right.$$ 45 | 令导数为0,可得:$$x^{i}=z^{i}-\frac{\lambda }{L}sgn(x^{i})$$可分为三种情况: 46 | 1. 当$z^{i}>\frac{\lambda }{L}$时: 47 | (1)假设此时的根$x^{i}<0$,则$sgn(x^{i})=-1$,所以$x^{i}=z^{i}+\frac{\lambda }{L}>0$,与假设矛盾。 48 | (2)假设此时的根$x^{i}>0$,则$sgn(x^{i})=1$,所以$x^{i}=z^{i}-\frac{\lambda }{L}>0$,成立。 49 | 2. 当$z^{i}<-\frac{\lambda }{L}$时: 50 | (1)假设此时的根$x^{i}>0$,则$sgn(x^{i})=1$,所以$x^{i}=z^{i}-\frac{\lambda }{L}<0$,与假设矛盾。 51 | (2)假设此时的根$x^{i}<0$,则$sgn(x^{i})=-1$,所以$x^{i}=z^{i}+\frac{\lambda }{L}<0$,成立。 52 | 3. 当$\left |z^{i} \right |<\frac{\lambda }{L}$时: 53 | (1)假设此时的根$x^{i}>0$,则$sgn(x^{i})=1$,所以$x^{i}=z^{i}-\frac{\lambda }{L}<0$,与假设矛盾。 54 | (2)假设此时的根$x^{i}<0$,则$sgn(x^{i})=-1$,所以$x^{i}=z^{i}+\frac{\lambda }{L}>0$,与假设矛盾,此时$x^{i}=0$为函数的极小值。 55 | 综上所述可得函数闭式解如下: 56 | $$x_{k+1}^{i}=\left\{\begin{matrix} 57 | z^{i}-\frac{\lambda }{L}, &\frac{\lambda }{L}< z^{i}\\ 58 | 0, & \left |z^{i} \right |\leqslant \frac{\lambda }{L}\\ 59 | z^{i}+\frac{\lambda }{L}, & z^{i}<-\frac{\lambda }{L} 60 | \end{matrix}\right.$$ 61 | 62 | ## 11.18 63 | $$\begin{aligned} 64 | \underset{\boldsymbol B}{min}\left \|\boldsymbol X-\boldsymbol B\boldsymbol A \right \|_{F}^{2} 65 | & =\underset{b_{i}}{min}\left \| \boldsymbol X-\sum_{j=1}^{k}b_{j}\alpha ^{j} \right \|_{F}^{2}\\ 66 | & =\underset{b_{i}}{min}\left \| \left (\boldsymbol X-\sum_{j\neq i}b_{j}\alpha ^{j} \right )- b_{i}\alpha ^{i}\right \|_{F}^{2} \\ 67 | & =\underset{b_{i}}{min}\left \|\boldsymbol E_{\boldsymbol i}-b_{i}\alpha ^{i} \right \|_{F}^{2} & 68 | \end{aligned} 69 | $$ 70 | [推导]:此处只推导一下$BA=\sum_{j=1}^{k}\boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j}$,其中$\boldsymbol b_{\boldsymbol j}$表示**B**的第j列,$\boldsymbol \alpha ^{\boldsymbol j}$表示**A**的第j行。 71 | 然后,用$b_{j}^{i}$,$\alpha _{j}^{i}$分别表示**B**和**A**的第i行第j列的元素,首先计算**BA**: 72 | $$ 73 | \begin{aligned} 74 | \boldsymbol B\boldsymbol A 75 | & =\begin{bmatrix} 76 | b_{1}^{1} &b_{2}^{1} & \cdot & \cdot & \cdot & b_{k}^{1}\\ 77 | b_{1}^{2} &b_{2}^{2} & \cdot & \cdot & \cdot & b_{k}^{2}\\ 78 | \cdot & \cdot & \cdot & & & \cdot \\ 79 | \cdot & \cdot & & \cdot & &\cdot \\ 80 | \cdot & \cdot & & & \cdot & \cdot \\ 81 | b_{1}^{d}& b_{2}^{d} & \cdot & \cdot &\cdot & b_{k}^{d} 82 | \end{bmatrix}_{d\times k}\cdot 83 | \begin{bmatrix} 84 | \alpha_{1}^{1} &\alpha_{2}^{1} & \cdot & \cdot & \cdot & \alpha_{m}^{1}\\ 85 | \alpha_{1}^{2} &\alpha_{2}^{2} & \cdot & \cdot & \cdot & \alpha_{m}^{2}\\ 86 | \cdot & \cdot & \cdot & & & \cdot \\ 87 | \cdot & \cdot & & \cdot & &\cdot \\ 88 | \cdot & \cdot & & & \cdot & \cdot \\ 89 | \alpha_{1}^{k}& \alpha_{2}^{k} & \cdot & \cdot &\cdot & \alpha_{m}^{k} 90 | \end{bmatrix}_{k\times m} \\ 91 | & =\begin{bmatrix} 92 | \sum_{j=1}^{k}b_{j}^{1}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{1}\alpha _{2}^{j} & \cdot & \cdot & \cdot & \sum_{j=1}^{k}b_{j}^{1}\alpha _{m}^{j}\\ 93 | \sum_{j=1}^{k}b_{j}^{2}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{2}\alpha _{2}^{j} & \cdot & \cdot & \cdot & \sum_{j=1}^{k}b_{j}^{2}\alpha _{m}^{j}\\ 94 | \cdot & \cdot & \cdot & & & \cdot \\ 95 | \cdot & \cdot & & \cdot & &\cdot \\ 96 | \cdot & \cdot & & & \cdot & \cdot \\ 97 | \sum_{j=1}^{k}b_{j}^{d}\alpha _{1}^{j}& \sum_{j=1}^{k}b_{j}^{d}\alpha _{2}^{j} & \cdot & \cdot &\cdot & \sum_{j=1}^{k}b_{j}^{d}\alpha _{m}^{j} 98 | \end{bmatrix}_{d\times m} & 99 | \end{aligned} 100 | $$ 101 | 然后计算$\boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j}$: 102 | $$ 103 | \begin{aligned} 104 | \boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j} 105 | & =\begin{bmatrix} 106 | b_{1}^{j}\\ b_{w}^{j} 107 | \\ \cdot 108 | \\ \cdot 109 | \\ \cdot 110 | \\ b_{d}^{j} 111 | \end{bmatrix}\cdot 112 | \begin{bmatrix} 113 | \alpha _{1}^{j}& \alpha _{2}^{j} & \cdot & \cdot & \cdot & \alpha _{m}^{j} 114 | \end{bmatrix}\\ 115 | & =\begin{bmatrix} 116 | b_{j}^{1}\alpha _{1}^{j} &b_{j}^{1}\alpha _{2}^{j} & \cdot & \cdot & \cdot & b_{j}^{1}\alpha _{m}^{j}\\ 117 | b_{j}^{2}\alpha _{1}^{j} &b_{j}^{2}\alpha _{2}^{j} & \cdot & \cdot & \cdot & b_{j}^{2}\alpha _{m}^{j}\\ 118 | \cdot & \cdot & \cdot & & & \cdot \\ 119 | \cdot & \cdot & & \cdot & &\cdot \\ 120 | \cdot & \cdot & & & \cdot & \cdot \\ 121 | b_{j}^{d}\alpha _{1}^{j}& b_{j}^{d}\alpha _{2}^{j} & \cdot & \cdot &\cdot & b_{j}^{d}\alpha _{m}^{j} 122 | \end{bmatrix}_{d\times m} & 123 | \end{aligned} 124 | $$ 125 | 求和可得: 126 | $$ 127 | \begin{aligned} 128 | \sum_{j=1}^{k}\boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j} 129 | & = \sum_{j=1}^{k}\left (\begin{bmatrix} 130 | b_{1}^{j}\\ b_{w}^{j} 131 | \\ \cdot 132 | \\ \cdot 133 | \\ \cdot 134 | \\ b_{d}^{j} 135 | \end{bmatrix}\cdot 136 | \begin{bmatrix} 137 | \alpha _{1}^{j}& \alpha _{2}^{j} & \cdot & \cdot & \cdot & \alpha _{m}^{j} 138 | \end{bmatrix} \right )\\ 139 | & =\begin{bmatrix} 140 | \sum_{j=1}^{k}b_{j}^{1}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{1}\alpha _{2}^{j} & \cdot & \cdot & \cdot & \sum_{j=1}^{k}b_{j}^{1}\alpha _{m}^{j}\\ 141 | \sum_{j=1}^{k}b_{j}^{2}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{2}\alpha _{2}^{j} & \cdot & \cdot & \cdot & \sum_{j=1}^{k}b_{j}^{2}\alpha _{m}^{j}\\ 142 | \cdot & \cdot & \cdot & & & \cdot \\ 143 | \cdot & \cdot & & \cdot & &\cdot \\ 144 | \cdot & \cdot & & & \cdot & \cdot \\ 145 | \sum_{j=1}^{k}b_{j}^{d}\alpha _{1}^{j}& \sum_{j=1}^{k}b_{j}^{d}\alpha _{2}^{j} & \cdot & \cdot &\cdot & \sum_{j=1}^{k}b_{j}^{d}\alpha _{m}^{j} 146 | \end{bmatrix}_{d\times m} & 147 | \end{aligned} 148 | $$ 149 | -------------------------------------------------------------------------------- /docs/chapter6/chapter6.md: -------------------------------------------------------------------------------- 1 | ## 6.3 2 | $$ 3 | \left\{\begin{array}{ll}{\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b \geqslant+1,} & {y_{i}=+1} \\ {\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b \leqslant-1,} & {y_{i}=-1}\end{array}\right. 4 | $$ 5 | [推导]:假设这个超平面是$\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}+b^{\prime}=0$,对于$\left(\boldsymbol{x}_{i}, y_{i}\right) \in D$,有: 6 | $$ 7 | \left\{\begin{array}{ll}{\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime}>0,} & {y_{i}=+1} \\ {\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime}<0,} & {y_{i}=-1}\end{array}\right. 8 | $$ 9 | 根据几何间隔,将以上关系修正为: 10 | $$ 11 | \left\{\begin{array}{ll}{\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime} \geq+\zeta,} & {y_{i}=+1} \\ {\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime} \leq-\zeta,} & {y_{i}=-1}\end{array}\right. 12 | $$ 13 | 其中$\zeta$为某个大于零的常数,两边同除以$\zeta$,再次修正以上关系为: 14 | $$ 15 | \left\{\begin{array}{ll}{\left(\frac{1}{\zeta} \boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+\frac{b^{\prime}}{\zeta} \geq+1,} & {y_{i}=+1} \\ {\left(\frac{1}{\zeta} \boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+\frac{b^{\prime}}{\zeta} \leq-1,} & {y_{i}=-1}\end{array}\right. 16 | $$ 17 | 令:$\boldsymbol{w}=\frac{1}{\zeta} \boldsymbol{w}^{\prime}, b=\frac{b^{\prime}}{\zeta}$,则以上关系可写为: 18 | $$ 19 | \left\{\begin{array}{ll}{\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b \geq+1,} & {y_{i}=+1} \\ {\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b \leq-1,} & {y_{i}=-1}\end{array}\right. 20 | $$ 21 | 22 | ## 6.8 23 | $$ 24 | L(\boldsymbol{w}, b, \boldsymbol{\alpha})=\frac{1}{2}\|\boldsymbol{w}\|^{2}+\sum_{i=1}^{m} \alpha_{i}\left(1-y_{i}\left(\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b\right)\right) 25 | $$ 26 | [推导]: 27 | 待求目标: 28 | $$\begin{aligned} 29 | \min_{\boldsymbol{x}}\quad f(\boldsymbol{x})\\ 30 | s.t.\quad h(\boldsymbol{x})&=0\\ 31 | g(\boldsymbol{x}) &\leq 0 32 | \end{aligned}$$ 33 | 34 | 等式约束和不等式约束:$h(\boldsymbol{x})=0, g(\boldsymbol{x}) \leq 0$分别是由一个等式方程和一个不等式方程组成的方程组。 35 | 36 | 拉格朗日乘子:$\boldsymbol{\lambda}=\left(\lambda_{1}, \lambda_{2}, \ldots, \lambda_{m}\right)$ $\qquad\boldsymbol{\mu}=\left(\mu_{1}, \mu_{2}, \ldots, \mu_{n}\right)$ 37 | 38 | 拉格朗日函数:$L(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})=f(\boldsymbol{x})+\boldsymbol{\lambda} h(\boldsymbol{x})+\boldsymbol{\mu} g(\boldsymbol{x})$ 39 | 40 | ## 6.9-6.10 41 | $$\begin{aligned} 42 | w &= \sum_{i=1}^m\alpha_iy_i\boldsymbol{x}_i \\ 43 | 0 &=\sum_{i=1}^m\alpha_iy_i 44 | \end{aligned}​$$ 45 | [推导]:式(6.8)可作如下展开: 46 | $$\begin{aligned} 47 | L(\boldsymbol{w},b,\boldsymbol{\alpha}) &= \frac{1}{2}||\boldsymbol{w}||^2+\sum_{i=1}^m\alpha_i(1-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)) \\ 48 | & = \frac{1}{2}||\boldsymbol{w}||^2+\sum_{i=1}^m(\alpha_i-\alpha_iy_i \boldsymbol{w}^T\boldsymbol{x}_i-\alpha_iy_ib)\\ 49 | & =\frac{1}{2}\boldsymbol{w}^T\boldsymbol{w}+\sum_{i=1}^m\alpha_i -\sum_{i=1}^m\alpha_iy_i\boldsymbol{w}^T\boldsymbol{x}_i-\sum_{i=1}^m\alpha_iy_ib 50 | \end{aligned}​$$ 51 | 对$\boldsymbol{w}$和$b$分别求偏导数​并令其等于0: 52 | 53 | $$\frac {\partial L}{\partial \boldsymbol{w}}=\frac{1}{2}\times2\times\boldsymbol{w} + 0 - \sum_{i=1}^{m}\alpha_iy_i \boldsymbol{x}_i-0= 0 \Longrightarrow \boldsymbol{w}=\sum_{i=1}^{m}\alpha_iy_i \boldsymbol{x}_i$$ 54 | 55 | $$\frac {\partial L}{\partial b}=0+0-0-\sum_{i=1}^{m}\alpha_iy_i=0 \Longrightarrow \sum_{i=1}^{m}\alpha_iy_i=0$$ 56 | 57 | ## 6.11 58 | $$\begin{aligned} 59 | \max_{\boldsymbol{\alpha}} & \sum_{i=1}^m\alpha_i - \frac{1}{2}\sum_{i = 1}^m\sum_{j=1}^m\alpha_i \alpha_j y_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j \\ 60 | s.t. & \sum_{i=1}^m \alpha_i y_i =0 \\ 61 | & \alpha_i \geq 0 \quad i=1,2,\dots ,m 62 | \end{aligned}$$ 63 | [推导]:将式 (6.9)代入 (6.8) ,即可将$L(\boldsymbol{w},b,\boldsymbol{\alpha})$ 中的 $\boldsymbol{w}$ 和 $b$ 消去,再考虑式 (6.10) 的约束,就得到式 (6.6) 的对偶问题: 64 | $$\begin{aligned} 65 | \min_{\boldsymbol{w},b} L(\boldsymbol{w},b,\boldsymbol{\alpha}) &=\frac{1}{2}\boldsymbol{w}^T\boldsymbol{w}+\sum_{i=1}^m\alpha_i -\sum_{i=1}^m\alpha_iy_i\boldsymbol{w}^T\boldsymbol{x}_i-\sum_{i=1}^m\alpha_iy_ib \\ 66 | &=\frac {1}{2}\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i-\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_ 67 | i -b\sum _{i=1}^m\alpha_iy_i \\ 68 | & = -\frac {1}{2}\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i -b\sum _{i=1}^m\alpha_iy_i 69 | \end{aligned}$$ 70 | 又$\sum\limits_{i=1}^{m}\alpha_iy_i=0$,所以上式最后一项可化为0,于是得: 71 | $$\begin{aligned} 72 | \min_{\boldsymbol{w},b} L(\boldsymbol{w},b,\boldsymbol{\alpha}) &= -\frac {1}{2}\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i \\ 73 | &=-\frac {1}{2}(\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i)^T(\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i)+\sum _{i=1}^m\alpha_i \\ 74 | &=-\frac {1}{2}\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i \\ 75 | &=\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j 76 | \end{aligned}$$ 77 | 所以 78 | $$\max_{\boldsymbol{\alpha}}\min_{\boldsymbol{w},b} L(\boldsymbol{w},b,\boldsymbol{\alpha}) =\max_{\boldsymbol{\alpha}} \sum_{i=1}^m\alpha_i - \frac{1}{2}\sum_{i = 1}^m\sum_{j=1}^m\alpha_i \alpha_j y_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j $$ 79 | 80 | 81 | 82 | 83 | ## 6.39 84 | $$ C=\alpha_i +\mu_i $$ 85 | [推导]:对式(6.36)关于$\xi_i$求偏导并令其等于0可得: 86 | ​ 87 | $$\frac{\partial L}{\partial \xi_i}=0+C \times 1 - \alpha_i \times 1-\mu_i 88 | \times 1 =0\Longrightarrow C=\alpha_i +\mu_i$$ 89 | 90 | ## 6.40 91 | $$\begin{aligned} 92 | \max_{\boldsymbol{\alpha}}&\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j \\ 93 | s.t. &\sum_{i=1}^m \alpha_i y_i=0 \\ 94 | & 0 \leq\alpha_i \leq C \quad i=1,2,\dots ,m 95 | \end{aligned}$$ 96 | 将式6.37-6.39代入6.36可以得到6.35的对偶问题: 97 | $$\begin{aligned} 98 | \min_{\boldsymbol{w},b,\boldsymbol{\xi}}L(\boldsymbol{w},b,\boldsymbol{\alpha},\boldsymbol{\xi},\boldsymbol{\mu}) &= \frac{1}{2}||\boldsymbol{w}||^2+C\sum_{i=1}^m \xi_i+\sum_{i=1}^m \alpha_i(1-\xi_i-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b))-\sum_{i=1}^m\mu_i \xi_i \\ 99 | &=\frac{1}{2}||\boldsymbol{w}||^2+\sum_{i=1}^m\alpha_i(1-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b))+C\sum_{i=1}^m \xi_i-\sum_{i=1}^m \alpha_i \xi_i-\sum_{i=1}^m\mu_i \xi_i \\ 100 | & = -\frac {1}{2}\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i +\sum_{i=1}^m C\xi_i-\sum_{i=1}^m \alpha_i \xi_i-\sum_{i=1}^m\mu_i \xi_i \\ 101 | & = -\frac {1}{2}\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i +\sum_{i=1}^m (C-\alpha_i-\mu_i)\xi_i \\ 102 | &=\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j 103 | \end{aligned}$$ 104 | 所以 105 | $$\begin{aligned} 106 | \max_{\boldsymbol{\alpha},\boldsymbol{\mu}} \min_{\boldsymbol{w},b,\boldsymbol{\xi}}L(\boldsymbol{w},b,\boldsymbol{\alpha},\boldsymbol{\xi},\boldsymbol{\mu})&=\max_{\boldsymbol{\alpha},\boldsymbol{\mu}}\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j \\ 107 | &=\max_{\boldsymbol{\alpha}}\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j 108 | \end{aligned}$$ 109 | 又 110 | $$\begin{aligned} 111 | \alpha_i &\geq 0 \\ 112 | \mu_i &\geq 0 \\ 113 | C &= \alpha_i+\mu_i 114 | \end{aligned}$$ 115 | 消去$\mu_i$可得等价约束条件为: 116 | $$0 \leq\alpha_i \leq C \quad i=1,2,\dots ,m$$ 117 | 118 | ## 6.52 119 | $$ 120 | \left\{\begin{array}{l} 121 | {\alpha_{i}\left(f\left(\boldsymbol{x}_{i}\right)-y_{i}-\epsilon-\xi_{i}\right)=0} \\ {\hat{\alpha}_{i}\left(y_{i}-f\left(\boldsymbol{x}_{i}\right)-\epsilon-\hat{\xi}_{i}\right)=0} \\ {\alpha_{i} \hat{\alpha}_{i}=0, \xi_{i} \hat{\xi}_{i}=0} \\ 122 | {\left(C-\alpha_{i}\right) \xi_{i}=0,\left(C-\hat{\alpha}_{i}\right) \hat{\xi}_{i}=0} 123 | \end{array}\right. 124 | $$ 125 | [推导]: 126 | 将式(6.45)的约束条件全部恒等变形为小于等于0的形式可得: 127 | $$ 128 | \left\{\begin{array}{l} 129 | {f\left(\boldsymbol{x}_{i}\right)-y_{i}-\epsilon-\xi_{i} \leq 0 } \\ 130 | {y_{i}-f\left(\boldsymbol{x}_{i}\right)-\epsilon-\hat{\xi}_{i} \leq 0 } \\ 131 | {-\xi_{i} \leq 0} \\ 132 | {-\hat{\xi}_{i} \leq 0} 133 | \end{array}\right. 134 | $$ 135 | 由于以上四个约束条件的拉格朗日乘子分别为$\alpha_i,\hat{\alpha}_i,\mu_i,\hat{\mu}_i$,所以由西瓜书附录式(B.3)可知,以上四个约束条件可相应转化为以下KKT条件: 136 | $$ 137 | \left\{\begin{array}{l} 138 | {\alpha_i\left(f\left(\boldsymbol{x}_{i}\right)-y_{i}-\epsilon-\xi_{i} \right) = 0 } \\ 139 | {\hat{\alpha}_i\left(y_{i}-f\left(\boldsymbol{x}_{i}\right)-\epsilon-\hat{\xi}_{i} \right) = 0 } \\ 140 | {-\mu_i\xi_{i} = 0 \Rightarrow \mu_i\xi_{i} = 0 } \\ 141 | {-\hat{\mu}_i \hat{\xi}_{i} = 0 \Rightarrow \hat{\mu}_i \hat{\xi}_{i} = 0 } 142 | \end{array}\right. 143 | $$ 144 | 由式(6.49)和式(6.50)可知: 145 | $$ 146 | \begin{aligned} 147 | \mu_i=C-\alpha_i \\ 148 | \hat{\mu}_i=C-\hat{\alpha}_i 149 | \end{aligned} 150 | $$ 151 | 所以上述KKT条件可以进一步变形为: 152 | $$ 153 | \left\{\begin{array}{l} 154 | {\alpha_i\left(f\left(\boldsymbol{x}_{i}\right)-y_{i}-\epsilon-\xi_{i} \right) = 0 } \\ 155 | {\hat{\alpha}_i\left(y_{i}-f\left(\boldsymbol{x}_{i}\right)-\epsilon-\hat{\xi}_{i} \right) = 0 } \\ 156 | {(C-\alpha_i)\xi_{i} = 0 } \\ 157 | {(C-\hat{\alpha}_i) \hat{\xi}_{i} = 0 } 158 | \end{array}\right. 159 | $$ 160 | 又因为样本$(\boldsymbol{x}_i,y_i)$只可能处在间隔带的某一侧,那么约束条件$f\left(\boldsymbol{x}_{i}\right)-y_{i}-\epsilon-\xi_{i}=0$和$y_{i}-f\left(\boldsymbol{x}_{i}\right)-\epsilon-\hat{\xi}_{i}=0$不可能同时成立,所以$\alpha_i$和$\hat{\alpha}_i$中至少有一个为0,也即$\alpha_i\hat{\alpha}_i=0$。在此基础上再进一步分析可知,如果$\alpha_i=0$的话,那么根据约束$(C-\alpha_i)\xi_{i} = 0$可知此时$\xi_i=0$,同理,如果$\hat{\alpha}_i=0$的话,那么根据约束$(C-\hat{\alpha}_i)\hat{\xi}_{i} = 0$可知此时$\hat{\xi}_i=0$,所以$\xi_i$和$\hat{\xi}_i$中也是至少有一个为0,也即$\xi_{i} \hat{\xi}_{i}=0$。将$\alpha_i\hat{\alpha}_i=0,\xi_{i} \hat{\xi}_{i}=0$整合进上述KKT条件中即可得到式(6.52)。 161 | 162 | 163 | -------------------------------------------------------------------------------- /docs/chapter3/chapter3.md: -------------------------------------------------------------------------------- 1 | ## 3.7 2 | 3 | $$ w=\cfrac{\sum_{i=1}^{m}y_i(x_i-\bar{x})}{\sum_{i=1}^{m}x_i^2-\cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2} $$ 4 | 5 | [推导]:令式(3.5)等于0: 6 | $$ 0 = w\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i $$ 7 | $$ w\sum_{i=1}^{m}x_i^2 = \sum_{i=1}^{m}y_ix_i-\sum_{i=1}^{m}bx_i $$ 8 | 由于令式(3.6)等于0可得$ b=\cfrac{1}{m}\sum_{i=1}^{m}(y_i-wx_i) $,又$ \cfrac{1}{m}\sum_{i=1}^{m}y_i=\bar{y} $,$ \cfrac{1}{m}\sum_{i=1}^{m}x_i=\bar{x} $,则$ b=\bar{y}-w\bar{x} $,代入上式可得: 9 | $$ 10 | \begin{aligned} 11 | w\sum_{i=1}^{m}x_i^2 & = \sum_{i=1}^{m}y_ix_i-\sum_{i=1}^{m}(\bar{y}-w\bar{x})x_i \\ 12 | w\sum_{i=1}^{m}x_i^2 & = \sum_{i=1}^{m}y_ix_i-\bar{y}\sum_{i=1}^{m}x_i+w\bar{x}\sum_{i=1}^{m}x_i \\ 13 | w(\sum_{i=1}^{m}x_i^2-\bar{x}\sum_{i=1}^{m}x_i) & = \sum_{i=1}^{m}y_ix_i-\bar{y}\sum_{i=1}^{m}x_i \\ 14 | w & = \cfrac{\sum_{i=1}^{m}y_ix_i-\bar{y}\sum_{i=1}^{m}x_i}{\sum_{i=1}^{m}x_i^2-\bar{x}\sum_{i=1}^{m}x_i} 15 | \end{aligned} 16 | $$ 17 | 又$ \bar{y}\sum_{i=1}^{m}x_i=\cfrac{1}{m}\sum_{i=1}^{m}y_i\sum_{i=1}^{m}x_i=\bar{x}\sum_{i=1}^{m}y_i $,$ \bar{x}\sum_{i=1}^{m}x_i=\cfrac{1}{m}\sum_{i=1}^{m}x_i\sum_{i=1}^{m}x_i=\cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2 $,代入上式即可得式(3.7): 18 | $$ w=\cfrac{\sum_{i=1}^{m}y_i(x_i-\bar{x})}{\sum_{i=1}^{m}x_i^2-\cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2} $$ 19 | 20 | 【注】:式(3.7)还可以进一步化简为能用向量表达的形式,将$ \cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2=\bar{x}\sum_{i=1}^{m}x_i $代入分母可得: 21 | $$ 22 | \begin{aligned} 23 | w & = \cfrac{\sum_{i=1}^{m}y_i(x_i-\bar{x})}{\sum_{i=1}^{m}x_i^2-\bar{x}\sum_{i=1}^{m}x_i} \\ 24 | & = \cfrac{\sum_{i=1}^{m}(y_ix_i-y_i\bar{x})}{\sum_{i=1}^{m}(x_i^2-x_i\bar{x})} 25 | \end{aligned} 26 | $$ 27 | 又因为$ \bar{y}\sum_{i=1}^{m}x_i=\bar{x}\sum_{i=1}^{m}y_i=\sum_{i=1}^{m}\bar{y}x_i=\sum_{i=1}^{m}\bar{x}y_i=m\bar{x}\bar{y}=\sum_{i=1}^{m}\bar{x}\bar{y} $,$\sum_{i=1}^{m}x_i\bar{x}=\bar{x}\sum_{i=1}^{m}x_i=\bar{x}\cdot m \cdot\frac{1}{m}\cdot\sum_{i=1}^{m}x_i=m\bar{x}^2=\sum_{i=1}^{m}\bar{x}^2$,则上式可化为: 28 | $$ 29 | \begin{aligned} 30 | w & = \cfrac{\sum_{i=1}^{m}(y_ix_i-y_i\bar{x}-x_i\bar{y}+\bar{x}\bar{y})}{\sum_{i=1}^{m}(x_i^2-x_i\bar{x}-x_i\bar{x}+\bar{x}^2)} \\ 31 | & = \cfrac{\sum_{i=1}^{m}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{m}(x_i-\bar{x})^2} 32 | \end{aligned} 33 | $$ 34 | 若令$ \boldsymbol{x}=(x_1,x_2,...,x_m)^T $,$ \boldsymbol{x}_{d}=(x_1-\bar{x},x_2-\bar{x},...,x_m-\bar{x})^T $为去均值后的$ \boldsymbol{x} $,$ \boldsymbol{y}=(y_1,y_2,...,y_m)^T $,$ \boldsymbol{y}_{d}=(y_1-\bar{y},y_2-\bar{y},...,y_m-\bar{y})^T $为去均值后的$ \boldsymbol{y} $,其中$ \boldsymbol{x} $、$ \boldsymbol{x}_{d} $、$ \boldsymbol{y} $、$ \boldsymbol{y}_{d} $均为m行1列的列向量,代入上式可得: 35 | $$ w=\cfrac{\boldsymbol{x}_{d}^T\boldsymbol{y}_{d}}{\boldsymbol{x}_d^T\boldsymbol{x}_{d}}$$ 36 | ## 3.10 37 | 38 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}=2\mathbf{X}^T(\mathbf{X}\hat{\boldsymbol w}-\boldsymbol{y}) $$ 39 | 40 | [推导]:将$ E_{\hat{\boldsymbol w}}=(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol w})^T(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol w}) $展开可得: 41 | $$ E_{\hat{\boldsymbol w}}= \boldsymbol{y}^T\boldsymbol{y}-\boldsymbol{y}^T\mathbf{X}\hat{\boldsymbol w}-\hat{\boldsymbol w}^T\mathbf{X}^T\boldsymbol{y}+\hat{\boldsymbol w}^T\mathbf{X}^T\mathbf{X}\hat{\boldsymbol w} $$ 42 | 对$ \hat{\boldsymbol w} $求导可得: 43 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}= \cfrac{\partial \boldsymbol{y}^T\boldsymbol{y}}{\partial \hat{\boldsymbol w}}-\cfrac{\partial \boldsymbol{y}^T\mathbf{X}\hat{\boldsymbol w}}{\partial \hat{\boldsymbol w}}-\cfrac{\partial \hat{\boldsymbol w}^T\mathbf{X}^T\boldsymbol{y}}{\partial \hat{\boldsymbol w}}+\cfrac{\partial \hat{\boldsymbol w}^T\mathbf{X}^T\mathbf{X}\hat{\boldsymbol w}}{\partial \hat{\boldsymbol w}} $$ 44 | 由向量的求导公式可得: 45 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}= 0-\mathbf{X}^T\boldsymbol{y}-\mathbf{X}^T\boldsymbol{y}+(\mathbf{X}^T\mathbf{X}+\mathbf{X}^T\mathbf{X})\hat{\boldsymbol w} $$ 46 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}=2\mathbf{X}^T(\mathbf{X}\hat{\boldsymbol w}-\boldsymbol{y}) $$ 47 | 48 | ## 3.27 49 | 50 | $$ \ell(\boldsymbol{\beta})=\sum_{i=1}^{m}(-y_i\boldsymbol{\beta}^T\hat{\boldsymbol x}_i+\ln(1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i})) $$ 51 | 52 | [推导]:将式(3.26)代入式(3.25)可得: 53 | $$ \ell(\boldsymbol{\beta})=\sum_{i=1}^{m}\ln\left(y_ip_1(\hat{\boldsymbol x}_i;\boldsymbol{\beta})+(1-y_i)p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})\right) $$ 54 | 其中$ p_1(\hat{\boldsymbol x}_i;\boldsymbol{\beta})=\cfrac{e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}}{1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}},p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})=\cfrac{1}{1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}} $,代入上式可得: 55 | $$\begin{aligned} 56 | \ell(\boldsymbol{\beta})&=\sum_{i=1}^{m}\ln\left(\cfrac{y_ie^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}+1-y_i}{1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}}\right) \\ 57 | &=\sum_{i=1}^{m}\left(\ln(y_ie^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}+1-y_i)-\ln(1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i})\right) 58 | \end{aligned}$$ 59 | 由于$ y_i $=0或1,则: 60 | $$ \ell(\boldsymbol{\beta}) = 61 | \begin{cases} 62 | \sum_{i=1}^{m}(-\ln(1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i})), & y_i=0 \\ 63 | \sum_{i=1}^{m}(\boldsymbol{\beta}^T\hat{\boldsymbol x}_i-\ln(1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i})), & y_i=1 64 | \end{cases} $$ 65 | 两式综合可得: 66 | $$ \ell(\boldsymbol{\beta})=\sum_{i=1}^{m}\left(y_i\boldsymbol{\beta}^T\hat{\boldsymbol x}_i-\ln(1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i})\right) $$ 67 | 由于此式仍为极大似然估计的似然函数,所以最大化似然函数等价于最小化似然函数的相反数,也即在似然函数前添加负号即可得式(3.27)。 68 | 69 | 【注】:若式(3.26)中的似然项改写方式为$ p(y_i|\boldsymbol x_i;\boldsymbol w,b)=[p_1(\hat{\boldsymbol x}_i;\boldsymbol{\beta})]^{y_i}[p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})]^{1-y_i} $,再将其代入式(3.25)可得: 70 | $$\begin{aligned} 71 | \ell(\boldsymbol{\beta})&=\sum_{i=1}^{m}\ln\left([p_1(\hat{\boldsymbol x}_i;\boldsymbol{\beta})]^{y_i}[p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})]^{1-y_i}\right) \\ 72 | &=\sum_{i=1}^{m}\left[y_i\ln\left(p_1(\hat{\boldsymbol x}_i;\boldsymbol{\beta})\right)+(1-y_i)\ln\left(p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})\right)\right] \\ 73 | &=\sum_{i=1}^{m} \left \{ y_i\left[\ln\left(p_1(\hat{\boldsymbol x}_i;\boldsymbol{\beta})\right)-\ln\left(p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})\right)\right]+\ln\left(p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})\right)\right\} \\ 74 | &=\sum_{i=1}^{m}\left[y_i\ln\left(\cfrac{p_1(\hat{\boldsymbol x}_i;\boldsymbol{\beta})}{p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})}\right)+\ln\left(p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})\right)\right] \\ 75 | &=\sum_{i=1}^{m}\left[y_i\ln\left(e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}\right)+\ln\left(\cfrac{1}{1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}}\right)\right] \\ 76 | &=\sum_{i=1}^{m}\left(y_i\boldsymbol{\beta}^T\hat{\boldsymbol x}_i-\ln(1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i})\right) 77 | \end{aligned}$$ 78 | 显然,此种方式更易推导出式(3.27) 79 | 80 | ## 3.30 81 | 82 | $$\frac{\partial l(\beta)}{\partial \beta}=-\sum_{i=1}^{m}\hat{\boldsymbol x}_i(y_i-p_1(\hat{\boldsymbol x}_i;\beta))$$ 83 | 84 | [解析]:此式可以进行向量化,令$p_1(\hat{\boldsymbol x}_i;\beta)=\hat{y}_i$,代入上式得: 85 | $$\begin{aligned} 86 | \frac{\partial l(\beta)}{\partial \beta} &= -\sum_{i=1}^{m}\hat{\boldsymbol x}_i(y_i-\hat{y}_i) \\ 87 | & =\sum_{i=1}^{m}\hat{\boldsymbol x}_i(\hat{y}_i-y_i) \\ 88 | & ={\boldsymbol X^T}(\hat{\boldsymbol y}-\boldsymbol{y}) \\ 89 | & ={\boldsymbol X^T}(p_1(\boldsymbol X;\beta)-\boldsymbol{y}) \\ 90 | \end{aligned}$$ 91 | 92 | ## 3.32 93 | 94 | $$J=\cfrac{\boldsymbol w^T(\mu_0-\mu_1)(\mu_0-\mu_1)^T\boldsymbol w}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w}$$ 95 | 96 | [推导]: 97 | $$\begin{aligned} 98 | J &= \cfrac{\big|\big|\boldsymbol w^T\mu_0-\boldsymbol w^T\mu_1\big|\big|_2^2}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\ 99 | &= \cfrac{\big|\big|(\boldsymbol w^T\mu_0-\boldsymbol w^T\mu_1)^T\big|\big|_2^2}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\ 100 | &= \cfrac{\big|\big|(\mu_0-\mu_1)^T\boldsymbol w\big|\big|_2^2}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\ 101 | &= \cfrac{[(\mu_0-\mu_1)^T\boldsymbol w]^T(\mu_0-\mu_1)^T\boldsymbol w}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\ 102 | &= \cfrac{\boldsymbol w^T(\mu_0-\mu_1)(\mu_0-\mu_1)^T\boldsymbol w}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} 103 | \end{aligned}$$ 104 | 105 | ## 3.37 106 | 107 | $$\boldsymbol S_b\boldsymbol w=\lambda\boldsymbol S_w\boldsymbol w$$ 108 | 109 | [推导]:由3.36可列拉格朗日函数: 110 | $$l(\boldsymbol w)=-\boldsymbol w^T\boldsymbol S_b\boldsymbol w+\lambda(\boldsymbol w^T\boldsymbol S_w\boldsymbol w-1)$$ 111 | 对$\boldsymbol w$求偏导可得: 112 | $$\begin{aligned} 113 | \cfrac{\partial l(\boldsymbol w)}{\partial \boldsymbol w} &= -\cfrac{\partial(\boldsymbol w^T\boldsymbol S_b\boldsymbol w)}{\partial \boldsymbol w}+\lambda \cfrac{(\boldsymbol w^T\boldsymbol S_w\boldsymbol w-1)}{\partial \boldsymbol w} \\ 114 | &= -(\boldsymbol S_b+\boldsymbol S_b^T)\boldsymbol w+\lambda(\boldsymbol S_w+\boldsymbol S_w^T)\boldsymbol w 115 | \end{aligned}$$ 116 | 又$\boldsymbol S_b=\boldsymbol S_b^T,\boldsymbol S_w=\boldsymbol S_w^T$,则: 117 | $$\cfrac{\partial l(\boldsymbol w)}{\partial \boldsymbol w} = -2\boldsymbol S_b\boldsymbol w+2\lambda\boldsymbol S_w\boldsymbol w$$ 118 | 令导函数等于0即可得式3.37。 119 | 120 | ## 3.43 121 | 122 | $$\begin{aligned} 123 | \boldsymbol S_b &= \boldsymbol S_t - \boldsymbol S_w \\ 124 | &= \sum_{i=1}^N m_i(\boldsymbol\mu_i-\boldsymbol\mu)(\boldsymbol\mu_i-\boldsymbol\mu)^T 125 | \end{aligned}$$ 126 | [推导]:由式3.40、3.41、3.42可得: 127 | $$\begin{aligned} 128 | \boldsymbol S_b &= \boldsymbol S_t - \boldsymbol S_w \\ 129 | &= \sum_{i=1}^m(\boldsymbol x_i-\boldsymbol\mu)(\boldsymbol x_i-\boldsymbol\mu)^T-\sum_{i=1}^N\sum_{\boldsymbol x\in X_i}(\boldsymbol x-\boldsymbol\mu_i)(\boldsymbol x-\boldsymbol\mu_i)^T \\ 130 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left((\boldsymbol x-\boldsymbol\mu)(\boldsymbol x-\boldsymbol\mu)^T-(\boldsymbol x-\boldsymbol\mu_i)(\boldsymbol x-\boldsymbol\mu_i)^T\right)\right) \\ 131 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left((\boldsymbol x-\boldsymbol\mu)(\boldsymbol x^T-\boldsymbol\mu^T)-(\boldsymbol x-\boldsymbol\mu_i)(\boldsymbol x^T-\boldsymbol\mu_i^T)\right)\right) \\ 132 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left(\boldsymbol x\boldsymbol x^T - \boldsymbol x\boldsymbol\mu^T-\boldsymbol\mu\boldsymbol x^T+\boldsymbol\mu\boldsymbol\mu^T-\boldsymbol x\boldsymbol x^T+\boldsymbol x\boldsymbol\mu_i^T+\boldsymbol\mu_i\boldsymbol x^T-\boldsymbol\mu_i\boldsymbol\mu_i^T\right)\right) \\ 133 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left(- \boldsymbol x\boldsymbol\mu^T-\boldsymbol\mu\boldsymbol x^T+\boldsymbol\mu\boldsymbol\mu^T+\boldsymbol x\boldsymbol\mu_i^T+\boldsymbol\mu_i\boldsymbol x^T-\boldsymbol\mu_i\boldsymbol\mu_i^T\right)\right) \\ 134 | &= \sum_{i=1}^N\left(-\sum_{\boldsymbol x\in X_i}\boldsymbol x\boldsymbol\mu^T-\sum_{\boldsymbol x\in X_i}\boldsymbol\mu\boldsymbol x^T+\sum_{\boldsymbol x\in X_i}\boldsymbol\mu\boldsymbol\mu^T+\sum_{\boldsymbol x\in X_i}\boldsymbol x\boldsymbol\mu_i^T+\sum_{\boldsymbol x\in X_i}\boldsymbol\mu_i\boldsymbol x^T-\sum_{\boldsymbol x\in X_i}\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\ 135 | &= \sum_{i=1}^N\left(-m_i\boldsymbol\mu_i\boldsymbol\mu^T-m_i\boldsymbol\mu\boldsymbol\mu_i^T+m_i\boldsymbol\mu\boldsymbol\mu^T+m_i\boldsymbol\mu_i\boldsymbol\mu_i^T+m_i\boldsymbol\mu_i\boldsymbol\mu_i^T-m_i\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\ 136 | &= \sum_{i=1}^N\left(-m_i\boldsymbol\mu_i\boldsymbol\mu^T-m_i\boldsymbol\mu\boldsymbol\mu_i^T+m_i\boldsymbol\mu\boldsymbol\mu^T+m_i\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\ 137 | &= \sum_{i=1}^Nm_i\left(-\boldsymbol\mu_i\boldsymbol\mu^T-\boldsymbol\mu\boldsymbol\mu_i^T+\boldsymbol\mu\boldsymbol\mu^T+\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\ 138 | &= \sum_{i=1}^N m_i(\boldsymbol\mu_i-\boldsymbol\mu)(\boldsymbol\mu_i-\boldsymbol\mu)^T 139 | \end{aligned}$$ 140 | 141 | ## 3.44 142 | $$\max\limits_{\mathbf{W}}\cfrac{ 143 | tr(\mathbf{W}^T\boldsymbol S_b \mathbf{W})}{tr(\mathbf{W}^T\boldsymbol S_w \mathbf{W})}$$ 144 | [解析]:此式是式3.35的推广形式,证明如下: 145 | 设$\mathbf{W}=[\boldsymbol w_1,\boldsymbol w_2,...,\boldsymbol w_i,...,\boldsymbol w_{N-1}]$,其中$\boldsymbol w_i$为$d$行1列的列向量,则: 146 | $$\left\{ 147 | \begin{aligned} 148 | tr(\mathbf{W}^T\boldsymbol S_b \mathbf{W})&=\sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_b \boldsymbol w_i \\ 149 | tr(\mathbf{W}^T\boldsymbol S_w \mathbf{W})&=\sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_w \boldsymbol w_i 150 | \end{aligned} 151 | \right.$$ 152 | 所以式3.44可变形为: 153 | $$\max\limits_{\mathbf{W}}\cfrac{ 154 | \sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_b \boldsymbol w_i}{\sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_w \boldsymbol w_i}$$ 155 | 对比式3.35易知上式即为式3.35的推广形式。 156 | -------------------------------------------------------------------------------- /docs/chapter13/chapter13.md: -------------------------------------------------------------------------------- 1 | ## 13.1 2 | 3 | $$p(\boldsymbol{x})=\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)$$ 4 | [解析]: 该式即为 9.4.3 节的式(9.29),式(9.29)中的$k$个混合成分对应于此处的$N$个可能的类别 5 | 6 | ## 13.2 7 | $$ 8 | \begin{aligned} f(\boldsymbol{x}) &=\underset{j \in \mathcal{Y}}{\arg \max } p(y=j | \boldsymbol{x}) \\ &=\underset{j \in \mathcal{Y}}{\arg \max } \sum_{i=1}^{N} p(y=j, \Theta=i | \boldsymbol{x}) \\ &=\underset{j \in \mathcal{Y}}{\arg \max } \sum_{i=1}^{N} p(y=j | \Theta=i, \boldsymbol{x}) \cdot p(\Theta=i | \boldsymbol{x}) \end{aligned} 9 | $$ 10 | [解析]: 11 | 首先,该式的变量$\theta \in \{1,2,...,N\}$即为 9.4.3 节的式(9.30)中的 $\ z_j\in\{1,2,...k\}$ 12 | 从公式第 1 行到第 2 行是做了边际化(marginalization);具体来说第 2 行比第 1 行多了$\theta$为了消掉$\theta$对其进行求和(若是连续变量则为积分)$\sum_{i=1}^N$ 13 | [推导]:从公式第 2 行到第 3 行推导如下 14 | $$\begin{aligned} p(y = j,\theta = i \vert x) &= \cfrac {p(y=j, \theta=i,x)} {p(x)} \\ 15 | &=\cfrac{p(y=j ,\theta=i,x)}{p(\theta=i,x)}\cdot \cfrac{p(\theta=i,x)}{p(x)} \\ 16 | &=p(y=j\vert \theta=i,x)\cdot p(\theta=i\vert x)\end{aligned}$$ 17 | [解析]: 18 | 其中$p(y=j\vert x)$表示$x$的类别$y$为第$j$个类别标记的后验概率(注意条件是已知$x$); 19 | $p(y=j,\theta=i\vert x)$表示$x$的类别$y$为第$j$个类别标记且由第$i$个高斯混合成分生成的后验概率(注意条件是已知$x$ ); 20 | $p(y=j,\theta=i,x)$表示第$i$个高斯混合成分生成的$x$其类别$y$为第$j$个类别标记的概率(注意条件是已知$\theta$和$x$,这里修改了西瓜书式(13.3)下方对$p(y=j\vert\theta=i,x)$的表述; 21 | $p(\theta=i \vert x)$表示$x$由第$i$个高斯混合成分生成的后验概率(注意条件是已知$x$); 22 | 西瓜书第 296 页第 2 行提到“假设样本由高斯混合模型生成,且每个类别对应一个高斯混合成分”,也就是说,如果已知$x$是由哪个高斯混合成分生成的,也就知道了其类别。而$p(y=j,\theta=i\vert x)$表示已知$\theta$和$x$ 的条件概率(其实已知$\theta$就足够,不需$x$的信息),因此 23 | $$p(y=j\vert \theta=i,x)= 24 | \begin{cases} 25 | 1,&i=j \\ 26 | 0,&i\not=j 27 | \end{cases}$$ 28 | ## 13.3 29 | $$ 30 | p(\Theta=i | \boldsymbol{x})=\frac{\alpha_{i} \cdot p\left(\boldsymbol{x} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)}{\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)} 31 | $$ 32 | [解析]:该式即为 9.4.3 节的式(9.30),具体推导参见有关式(9.30)的解释。 33 | ## 13.4 34 | $$ 35 | \begin{aligned} L L\left(D_{l} \cup D_{u}\right)=& \sum_{\left(x_{j}, y_{j}\right) \in D_{l}} \ln \left(\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right) \cdot p\left(y_{j} | \Theta=i, \boldsymbol{x}_{j}\right)\right) \\ &+\sum_{x_{j} \in D_{u}} \ln \left(\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)\right) \end{aligned} 36 | $$ 37 | [解析]:由式(13.2)对概率$p(y=j\vert\theta =i,x)=$的分析,式中第 1 项中的$p(y_j\vert\theta =i,x_j)$ 为 38 | $$p(y_j\vert \theta=i,x_j)= 39 | \begin{cases} 40 | 1,&y_i=i \\ 41 | 0,&y_i\not=i 42 | \end{cases}$$ 43 | 该式第 1 项针对有标记样本$(x_i,y_i) \in D_i$来说,因为有标记样本的类别是确定的,因此在计算它的对数似然时,它只可能来自$N$个高斯混合成分中的一个(西瓜书第 296 页第 2 行提到“假设样本由高斯混合模型生成,且每个类别对应一个高斯混合成分”),所以计算第 1 项计算有标记样本似然时乘以了$p(y_j\vert\theta =i,x_j)$ ; 44 | 该式第 2 项针对未标记样本$x_j\in D_u$;来说的,因为未标记样本的类别不确定,即它可能来自$N$个高斯混合成分中的任何一个,所以第 1 项使用了式(13.1)。 45 | ## 13.5 46 | $$ 47 | \gamma_{j i}=\frac{\alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)}{\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)} 48 | $$ 49 | [解析]:该式与式(13.3)相同,即后验概率。 可通过有标记数据对模型参数$(\alpha_i,\mu_i,\Sigma_i)$进行初始化,具体来说: 50 | $$\alpha_i = \cfrac{l_i}{|D_l|},where |D_l| = \sum_{i=1}^N l_i$$ 51 | $$\mu_i = \cfrac{1}{l_i}\sum_{(x_j,y_j) \in D_l\wedge y_i=i}(x_j-\mu_j)(x_j-\mu_j)^T$$ 52 | $$ 53 | \Sigma_{i}=\frac{1}{l_{i}} \sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left( x_{j}- \mu_{i}\right)\left( x_{j}-\mu_{i}\right)^{\top} 54 | $$ 55 | 其中$l_i$表示第$i$类样本的有标记样本数目,$|D_l|$为有标记样本集样本总数,$\wedge$为“逻辑与”。 56 | ## 13.6 57 | $$ 58 | \boldsymbol{\mu}_{i}=\frac{1}{\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}}\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \boldsymbol{x}_{j}+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \boldsymbol{x}_{j}\right) 59 | $$ 60 | [推导]:类似于式(9.34)该式由$\cfrac{\partial LL(D_l \cup D_u) }{\partial \mu_i}=0$而得,将式(13.4)的两项分别记为: 61 | $$LL(D_l)=\sum_{(x_j,y_j \in D_l)}ln(\sum_{s=1}^{N}\alpha_s \cdot p(x_j \vert \mu_s,\Sigma_s) \cdot p(y_i|\theta = s,x_j)$$ 62 | $$LL(D_u)=\sum_{x_j \in D_u} ln(\sum_{s=1}^N \alpha_s \cdot p(x_j | \mu_s,\Sigma_s))$$ 63 | 对于式(13.4)中的第 1 项$LL(D_l)$,由于$p(y_j\vert \theta=i,x_j)$取值非1即0(详见13.2,13.4分析),因此 64 | $$LL(D_l)=\sum_{(x_j,y_j)\in D_l} ln(\alpha_{y_j} \cdot p(x_j|\mu_{y_j}, \Sigma_{y_j}))$$ 65 | 若求$LL(D_l)$对$\mu_i$的偏导,则$LL(D_l)$求和号中只有$y_j=i$ 的项能留下来,即 66 | 67 | $$\begin{aligned} 68 | \cfrac{\partial LL(D_l) }{\partial \mu_i} &= 69 | \sum_{(x_i,y_i)\in D_l \wedge y_j=i} \cfrac{\partial ln(\alpha_i \cdot p(x_j| \mu_i,\Sigma_i))}{\partial\mu_i}\\ 70 | &=\sum_{(x_i,y_i)\in D_l \wedge y_j=i}\cfrac{1}{p(x_j|\mu_i,\Sigma_i) }\cdot \cfrac{\partial p(x_j|\mu_i,\Sigma_i)}{\partial\mu_i}\\ 71 | &=\sum_{(x_i,y_i)\in D_l \wedge y_j=i}\cfrac{1}{p(x_j|\mu_i,\Sigma_i) }\cdot p(x_j|\mu_i,\Sigma_i) \cdot \Sigma_i^{-1}(x_j-\mu_i)\\ 72 | &=\sum_{x_j \in D_u } \Sigma_i^{-1}(x_j-\mu_i) 73 | \end{aligned}$$ 74 | 75 | 对于式(13.4)中的第 2 项$LL(D_u)$,求导结果与式(9.33)的推导过程一样 76 | $$\cfrac{\partial LL(D_l \cup D_u) }{\partial \mu_i}=\sum_{x_j \in {D_u}} \cfrac{\alpha_i}{\sum_{s=1}^N \alpha_s \cdotp(x_j|\mu_s,\Sigma_s)} \cdot p(x_j|\mu_i,\Sigma_i )\cdot \Sigma_i^{-1}(x_j-\mu_i)$$ 77 | $$=\sum_{x_j \in D_u }\gamma_{ji} \cdot \Sigma_i^{-1}(x_j-\mu_i)$$ 78 | 综合两项结果,则$\cfrac{\partial LL(D_l \cup D_u) }{\partial \mu_i}$为 79 | $$ 80 | \begin{aligned} \frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \mu_{i}} &=\sum_{\left(x_{j}, y_{j}\right) \in D_{t} \wedge y_{j}=i} \Sigma_{i}^{-1}\left(x_{j}-\mu_{i}\right)+\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot \Sigma_{i}^{-1}\left(x_{j}-\mu_{i}\right) \\ &=\Sigma_{i}^{-1}\left(\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(x_{j}-\mu_{i}\right)+\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot\left(x_{j}-\mu_{i}\right)\right) \\ &=\Sigma_{i}^{-1}\left(\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} x_{j}+\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot x_{j}-\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \mu_{i}-\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot \mu_{i}\right) \end{aligned} 81 | $$ 82 | 令$\frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \boldsymbol{\mu}_{i}}=0$,两边同时左乘$\Sigma_i$可将$\Sigma_i^{-1}$消掉,移项即得 83 | $$ 84 | \sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot \mu_{i}+\sum_{\left(x_{j}, y_{j}\right) \in D_{t} \wedge y_{j}=i} \mu_{i}=\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot x_{j}+\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} x_{j} 85 | $$ 86 | 上式中, 可以作为常量提到求和号外面,而$\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} 1=l_{i}$,即第 类样本的有标记 样本数目,因此 87 | $$ 88 | \left(\sum_{x_{j} \in D_{u}} \gamma_{j i}+\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \backslash y_{j}=i} 1\right) \mu_{i}=\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot x_{j}+\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} x_{j} 89 | $$ 90 | 即得式(13.6); 91 | ## 13.7 92 | $$ 93 | \begin{aligned} \boldsymbol{\Sigma}_{i}=& \frac{1}{\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}}\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}}\right.\\+& \sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}} ) \end{aligned} 94 | $$ 95 | [推导]:类似于13.6 由$\cfrac{\partial LL(D_l \cup D_u) }{\partial \Sigma_i}=0$得,化简过程同13.6过程类似 96 | 对于式(13.4)中的第 1 项$LL(D_l)$ ,类似于刚才式(13.6)的推导过程; 97 | $$ 98 | \begin{aligned} \frac{\partial L L\left(D_{l}\right)}{\partial \boldsymbol{\Sigma}_{i}} &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \frac{\partial \ln \left(\alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)\right)}{\partial \boldsymbol{\Sigma}_{i}} \\ &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \frac{1}{p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)} \cdot \frac{\partial p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)}{\partial \boldsymbol{\Sigma}_{i}} \\ 99 | &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \frac{1}{p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right) \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1}\\ 100 | &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\mathbf{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1} 101 | \end{aligned} 102 | $$ 103 | 对于式(13.4)中的第 2 项$LL(D_u)$ ,求导结果与式(9.35)的推导过程一样; 104 | $$ 105 | \frac{\partial L L\left(D_{u}\right)}{\partial \boldsymbol{\Sigma}_{i}}=\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1} 106 | $$ 107 | 综合两项结果,则$\cfrac{\partial LL(D_l \cup D_u) }{\partial \Sigma_i}$为 108 | $$\begin{aligned} \frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \boldsymbol{\mu}_{i}}=& \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1} \\ &+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1} \\ 109 | &=\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right)\right.\\ &+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) ) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1} 110 | \end{aligned} 111 | $$ 112 | 令$\frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \boldsymbol{\Sigma}_{i}}=0$,两边同时右乘$2\Sigma_i$可将 $\cfrac{1}{2}\Sigma_i^{-1}$消掉,移项即得 113 | $$ 114 | \begin{aligned} \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}+& \sum_{\left(\boldsymbol{x}_{j}, y_{j} \in D_{l} \wedge y_{j}=i\right.} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top} \\=& \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot \boldsymbol{I}+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \boldsymbol{I} \\ &=\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}\right) \boldsymbol{I} \end{aligned} 115 | $$ 116 | 两边同时左乘以$\Sigma_i$,上式变为 117 | $$ 118 | \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}=\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}\right) \boldsymbol{\Sigma}_{i} 119 | $$ 120 | 即得式(13.7); 121 | ## 13.8 122 | $$ 123 | \alpha_{i}=\frac{1}{m}\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}\right) 124 | $$ 125 | [推导]:类似于式(9.36),写出$LL(D_l \cup D_u)$的拉格朗日形式 126 | $$\begin{aligned} 127 | \mathcal{L}(D_l \cup D_u,\lambda) &= LL(D_l \cup D_u)+\lambda(\sum_{s=1}^N \alpha_s -1)\\ 128 | & =LL(D_l)+LL(D_u)+\lambda(\sum_{s=1}^N \alpha_s - 1)\\ 129 | \end{aligned}$$ 130 | 类似于式(9.37),对$\alpha_i$求偏导。对于LL(D_u),求导结果与式(9.37)的推导过程一样: 131 | $$\cfrac{\partial LL(D_u)}{\partial\alpha_i} = \sum_{x_j \in D_u} \cfrac{1}{\Sigma_{s=1}^N \alpha_s \cdot p(x_j|\mu_s,\Sigma_s)} \cdot p(x_j|\mu_i,\Sigma_i)$$ 132 | 对于$LL(D_l)$,类似于类似于(13.6)和(13.7)的推导过程 133 | $$\begin{aligned} 134 | \cfrac{\partial LL(D_l)}{\partial\alpha_i} &= \sum_{(x_i,y_i)\in D_l \wedge y_j=i} \cfrac{\partial ln(\alpha_i \cdot p(x_j| \mu_i,\Sigma_i))}{\partial\alpha_i}\\ 135 | &=\sum_{(x_i,y_i)\in D_l \wedge y_j=i}\cfrac{1}{ \alpha_i \cdot p(x_j|\mu_i,\Sigma_i) }\cdot \cfrac{\partial (\alpha_i \cdot p(x_j|\mu_i,\Sigma_i))}{\partial \alpha_i}\\ 136 | &=\sum_{(x_i,y_i)\in D_l \wedge y_j=i}\cfrac{1}{\alpha_i \cdot p(x_j|\mu_i,\Sigma_i) }\cdot p(x_j|\mu_i,\Sigma_i) \\ 137 | &=\cfrac{1}{\alpha_i} \cdot \sum_{(x_i,y_i)\in D_l \wedge y_j=i} 1 \\ 138 | &=\cfrac{l_i}{\alpha_i} 139 | \end{aligned}$$ 140 | 上式推导过程中,重点注意变量是$\alpha_i$ ,$p(x_j|\mu_i,\Sigma_i)$是常量;最后一行$\alpha_i$相对于求和变量为常量,因此作为公因子提到求和号外面; 为第$i$类样本的有标记样本数目。 141 | 综合两项结果,则$\cfrac{\partial LL(D_l \cup D_u) }{\partial \alpha_i}$为 142 | $$\cfrac{\partial LL(D_l \cup D_u) }{\partial \mu_i} = \cfrac{l_i}{\alpha_i} + \sum_{x_j \in D_u} \cfrac{p(x_j|\mu_i,\Sigma_i)}{\Sigma_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}+\lambda$$ 143 | 令$\cfrac{\partial LL(D_l \cup D_u) }{\partial \alpha_i}=0$并且两边同乘以$\alpha_i$,得 144 | $$ \alpha_i \cdot \cfrac{l_i}{\alpha_i} + \sum_{x_j \in D_u} \cfrac{\alpha_i \cdot p(x_j|\mu_i,\Sigma_i)}{\Sigma_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}+\lambda \cdot \alpha_i=0$$ 145 | 结合式(9.30)发现,求和号内即为后验概率$\gamma_{ji}$,即 146 | $$l_i+\sum_{x_i \in D_u} \gamma_{ji}+\lambda \alpha_i = 0$$ 147 | 对所有混合成分求和,得 148 | $$\sum_{i=1}^N l_i+\sum_{i=1}^N \sum_{x_i \in D_u} \gamma_{ji}+\sum_{i=1}^N \lambda \alpha_i = 0$$ 149 | 这里$\Sigma_{i=1}^N \alpha_i =1$ ,因此$\sum_{i=1}^N \lambda \alpha_i=\lambda\sum_{i=1}^N \alpha_i=\lambda$ 150 | 根据(9.30)中$\gamma_{ji}$表达式可知 151 | $$\sum_{i=1}^N \gamma_{ji} = \sum_{i =1}^{N} \cfrac{\alpha_i \cdot p(x_j|\mu_i,\Sigma_i)}{\Sigma_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}= \cfrac{\sum_{i =1}^{N}\alpha_i \cdot p(x_j|\mu_i,\Sigma_i)}{\sum_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}=1$$ 152 | 再结合加法满足交换律,所以 153 | $$\sum_{i=1}^N \sum_{x_i \in D_u} \gamma_{ji}=\sum_{x_i \in D_u} \sum_{i=1}^N \gamma_{ji} =\sum_{x_i \in D_u} 1=u$$ 154 | 以上分析过程中,$\sum_{x_j\in D_u}$ 形式与$\sum_{j=1}^u$等价,其中u为未标记样本集的样本个数; $\sum_{i=1}^Nl_i=l$其中$l$为有标记样本集的样本个数;将这些结果代入 155 | $$\sum_{i=1}^N l_i+\sum_{i=1}^N \sum_{x_i \in D_u} \gamma_{ji}+\sum_{i=1}^N \lambda \alpha_i = 0$$ 156 | 解出$l+u+\lambda = 0$ 且$l+u =m$ 其中$m$为样本总个数,移项即得$\lambda = -m$ 157 | 最后带入整理解得 158 | $$l_i + \Sigma_{X_j \in{D_u}} \gamma_{ji}-m \alpha_i = 0$$ 159 | 整理即得式(13.8); 160 | 161 | 162 | -------------------------------------------------------------------------------- /docs/chapter9/chapter9.md: -------------------------------------------------------------------------------- 1 | ## 9.5 2 | 3 | $$ 4 | JC=\frac{a}{a+b+c} 5 | $$ 6 | 7 | [解析]:给定两个集合$A$和$B$,则Jaccard系数定义为如下公式 8 | 9 | 10 | $$ 11 | JC=\frac{|A\bigcap B|}{|A\bigcup B|}=\frac{|A\bigcap B|}{|A|+|B|-|A\bigcap B|} 12 | $$ 13 | Jaccard系数可以用来描述两个集合的相似程度。 14 | 15 | 推论:假设全集$U$共有$n$个元素,且$A\subseteq U$,$B\subseteq U$,则每一个元素的位置共有四种情况: 16 | 17 | 1、元素同时在集合$A$和$B$中,这样的元素个数记为$M_{11}$; 18 | 19 | 2、元素出现在集合$A$中,但没有出现在集合$B$中,这样的元素个数记为$M_{10}$; 20 | 21 | 3、元素没有出现在集合$A$中,但出现在集合$B$中,这样的元素个数记为$M_{01}$; 22 | 23 | 4、元素既没有出现在集合$A$中,也没有出现在集合$B$中,这样的元素个数记为$M_{00}$。 24 | 25 | 根据Jaccard系数的定义,此时的Jaccard系数为如下公式 26 | $$ 27 | JC=\frac{M_{11}}{M_{11}+M_{10}+M_{01}} 28 | $$ 29 | 由于聚类属于无监督学习,事先并不知道聚类后样本所属类别的类别标记所代表的意义,即便参考模型的类别标记意义是已知的,我们也无法知道聚类后的类别标记与参考模型的类别标记是如何对应的,况且聚类后的类别总数与参考模型的类别总数还可能不一样,因此只用单个样本无法衡量聚类性能的好坏。 30 | 31 | 由于外部指标的基本思想就是以参考模型的类别划分为参照,因此如果某一个样本对中的两个样本在聚类结果中同属于一个类,在参考模型中也同属于一个类,或者这两个样本在聚类结果中不同属于一个类,在参考模型中也不同属于一个类,那么对于这两个样本来说这是一个好的聚类结果。 32 | 33 | 总的来说所有样本对中的两个样本共存在四种情况:
34 | 1、样本对中的两个样本在聚类结果中属于同一个类,在参考模型中也属于同一个类;
35 | 2、样本对中的两个样本在聚类结果中属于同一个类,在参考模型中不属于同一个类;
36 | 3、样本对中的两个样本在聚类结果中不属于同一个类,在参考模型中属于同一个类;
37 | 4、样本对中的两个样本在聚类结果中不属于同一个类,在参考模型中也不属于同一个类。 38 | 39 | 综上所述,即所有样本对存在着书中公式(9.1)-(9.4)的四种情况,现在假设集合$A$中存放着两个样本都同属于聚类结果的同一个类的样本对,即$A=SS\bigcup SD$,集合$B$中存放着两个样本都同属于参考模型的同一个类的样本对,即$B=SS\bigcup DS$,那么根据Jaccard系数的定义有: 40 | $$ 41 | JC=\frac{|A\bigcap B|}{|A\bigcup B|}=\frac{|SS|}{|SS\bigcup SD\bigcup DS|}=\frac{a}{a+b+c} 42 | $$ 43 | 也可直接将书中公式(9.1)-(9.4)的四种情况类比推论,即$M_{11}=a$,$M_{10}=b$,$M_{01}=c$,所以 44 | $$ 45 | JC=\frac{M_{11}}{M_{11}+M_{10}+M_{01}}=\frac{a}{a+b+c} 46 | $$ 47 | 48 | ## 9.6 49 | $$ 50 | FMI=\sqrt{\frac{a}{a+b}\cdot \frac{a}{a+c}} 51 | $$ 52 | 53 | [解析]:其中$\frac{a}{a+b}$和$\frac{a}{a+c}$为Wallace提出的两个非对称指标,$a$代表两个样本在聚类结果和参考模型中均属于同一类的样本对的个数,$a+b$代表两个样本在聚类结果中属于同一类的样本对的个数,$a+c$代表两个样本在参考模型中属于同一类的样本对的个数,这两个非对称指标均可理解为样本对中的两个样本在聚类结果和参考模型中均属于同一类的概率。由于指标的非对称性,这两个概率值往往不一样,因此Fowlkes和Mallows提出利用几何平均数将这两个非对称指标转化为一个对称指标,即Fowlkes and Mallows Index, FMI。 54 | 55 | ## 9.7 56 | $$ 57 | RI=\frac{2(a+d)}{m(m-1)} 58 | $$ 59 | [解析]:Rand Index定义如下: 60 | $$ 61 | RI=\frac{a+d}{a+b+c+d}=\frac{a+d}{m(m-1)/2}=\frac{2(a+d)}{m(m-1)} 62 | $$ 63 | 即可以理解为两个样本都属于聚类结果和参考模型中的同一类的样本对的个数与两个样本都分别不属于聚类结果和参考模型中的同一类的样本对的个数的总和在所有样本对中出现的频率,可以简单理解为聚类结果与参考模型的一致性。 64 | 65 | 参看 https://en.wikipedia.org/wiki/Rand_index 66 | 67 | ## 9.33 68 | 69 | $$ 70 | \sum_{j=1}^m \frac{\alpha_{i}\cdot p\left(\boldsymbol{x_{j}}|\boldsymbol\mu _{i},\boldsymbol\Sigma_{i}\right)}{\sum_{l=1}^k \alpha_{l}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{l},\boldsymbol\Sigma_{l})}(\boldsymbol{x_{j}-\mu_{i}})=0 71 | $$ 72 | 73 | [推导]:根据公式(9.28)可知: 74 | $$ 75 | p(\boldsymbol{x_{j}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i}})=\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})} 76 | $$ 77 | 78 | 79 | 又根据公式(9.32),由 80 | $$ 81 | \frac {\partial LL(D)}{\partial \boldsymbol\mu_{i}}=0 82 | $$ 83 | 可得 84 | $$\begin{aligned} 85 | \frac {\partial LL(D)}{\partial\boldsymbol\mu_{i}}&=\frac {\partial}{\partial \boldsymbol\mu_{i}}\sum_{j=1}^mln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\ 86 | &=\sum_{j=1}^m\frac{\partial}{\partial\boldsymbol\mu_{i}}ln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\ 87 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol{\mu_{i}}}(p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}}))}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})} \\ 88 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot \frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}\frac{\partial}{\partial \boldsymbol\mu_{i}}\left(-\frac{1}{2}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\ 89 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(\left(\boldsymbol\Sigma_{i}^{-1}+\left(\boldsymbol\Sigma_{i}^{-1}\right)^T\right)\cdot\left(\boldsymbol{x_{j}-\mu_{i}}\right)\cdot(-1)\right) \\ 90 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)-\left(\boldsymbol\Sigma_{i}^{-1}\right)^T\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\ 91 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)-\left(\boldsymbol\Sigma_{i}^T\right)^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\ 92 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\ 93 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-2\cdot\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\ 94 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot p\left(\boldsymbol{x_{j}}|\boldsymbol\mu _{i},\boldsymbol\Sigma_{i}\right)}{\sum_{l=1}^k \alpha_{l}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{l},\boldsymbol\Sigma_{l})}\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}}) \\ 95 | \end{aligned}$$ 96 | 令上式等于0可得: 97 | $$\frac {\partial LL(D)}{\partial \boldsymbol\mu_{i}}=\sum_{j=1}^m \frac{\alpha_{i}\cdot p\left(\boldsymbol{x_{j}}|\boldsymbol\mu _{i},\boldsymbol\Sigma_{i}\right)}{\sum_{l=1}^k \alpha_{l}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{l},\boldsymbol\Sigma_{l})}\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})=0 $$ 98 | 左右两边同时乘上$\boldsymbol\Sigma_{i}$可得: 99 | $$\sum_{j=1}^m \frac{\alpha_{i}\cdot p\left(\boldsymbol{x_{j}}|\boldsymbol\mu _{i},\boldsymbol\Sigma_{i}\right)}{\sum_{l=1}^k \alpha_{l}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{l},\boldsymbol\Sigma_{l})}(\boldsymbol{x_{j}-\mu_{i}})=0 $$ 100 | 101 | ## 9.35 102 | 103 | $$ 104 | \boldsymbol\Sigma_{i}=\frac{\sum_{j=1}^m\gamma_{ji}(\boldsymbol{x_{j}-\mu_{i}})(\boldsymbol{x_{j}-\mu_{i}})^T}{\sum_{j=1}^m}\gamma_{ji} 105 | $$ 106 | 107 | [推导]:根据公式(9.28)可知: 108 | $$ 109 | p(\boldsymbol{x_{j}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i}})=\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})} 110 | $$ 111 | 又根据公式(9.32),由 112 | $$ 113 | \frac {\partial LL(D)}{\partial \boldsymbol\Sigma_{i}}=0 114 | $$ 115 | 可得 116 | $$\begin{aligned} 117 | \frac {\partial LL(D)}{\partial\boldsymbol\Sigma_{i}}&=\frac {\partial}{\partial \boldsymbol\Sigma_{i}}\sum_{j=1}^mln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\ 118 | &=\sum_{j=1}^m\frac{\partial}{\partial\boldsymbol\Sigma_{i}}ln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\ 119 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j},|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})} \\ 120 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j},|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\\ 121 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}e^{ln\left(\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}\right)}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j},|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})} \\ 122 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}e^{-\frac{n}{2}ln\left(2\pi\right)-\frac{1}{2}ln\left(|\boldsymbol\Sigma_{i}|\right)-\frac{1}{2}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j},|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})} \\ 123 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j},|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(-\frac{n}{2}ln\left(2\pi\right)-\frac{1}{2}ln\left(|\boldsymbol\Sigma_{i}|\right)-\frac{1}{2}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\ 124 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j},|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\left(-\frac{1}{2}\left(\boldsymbol\Sigma_{i}^{-1}\right)^T-\frac{1}{2}\frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) 125 | \end{aligned}$$ 126 | 127 | 为求得 128 | $$ 129 | \frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right) 130 | $$ 131 | 132 | 首先分析对$\boldsymbol \Sigma_{i}$中单一元素的求导,用$r$代表矩阵$\boldsymbol\Sigma_{i}$的行索引,$c$代表矩阵$\boldsymbol\Sigma_{i}$的列索引,则 133 | $$\begin{aligned} 134 | \frac{\partial}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)&=\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\frac{\partial\boldsymbol\Sigma_{i}^{-1}}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right) \\ 135 | &=-\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right) 136 | \end{aligned}$$ 137 | 设$B=\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)$,则 138 | $$\begin{aligned} 139 | B^T&=\left(\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right)^T \\ 140 | &=\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\left(\boldsymbol\Sigma_{i}^{-1}\right)^T \\ 141 | &=\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1} 142 | \end{aligned}$$ 143 | 所以 144 | $$\begin{aligned} 145 | \frac{\partial}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)=-B^T\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}B\end{aligned}$$ 146 | 其中$B$为$n\times1$阶矩阵,$\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}$为$n$阶方阵,且$\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}$仅在$\left(r,c\right)$位置处的元素值为1,其它位置处的元素值均为$0$,所以 147 | $$\begin{aligned} 148 | \frac{\partial}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)=-B^T\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}B=-B_{r}\cdot B_{c}=-\left(B\cdot B^T\right)_{rc}=\left(-B\cdot B^T\right)_{rc}\end{aligned}$$ 149 | 即对$\boldsymbol\Sigma_{i}$中特定位置的元素的求导结果对应于$\left(-B\cdot B^T\right)$中相同位置的元素值,所以 150 | $$\begin{aligned} 151 | \frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)&=-B\cdot B^T\\ 152 | &=-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\left(\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right)^T\\ 153 | &=-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1} 154 | \end{aligned}$$ 155 | 156 | 因此最终结果为 157 | $$ 158 | \frac {\partial LL(D)}{\partial \boldsymbol\Sigma_{i}}=\sum_{j=1}^m \frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j},|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\left( -\frac{1}{2}\left(\boldsymbol\Sigma_{i}^{-1}-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\right)\right)=0 159 | $$ 160 | 161 | 整理可得 162 | $$ 163 | \boldsymbol\Sigma_{i}=\frac{\sum_{j=1}^m\gamma_{ji}(\boldsymbol{x_{j}-\mu_{i}})(\boldsymbol{x_{j}-\mu_{i}})^T}{\sum_{j=1}^m}\gamma_{ji} 164 | $$ 165 | 166 | ## 9.38 167 | 168 | $$ 169 | \alpha_{i}=\frac{1}{m}\sum_{j=1}^m\gamma_{ji} 170 | $$ 171 | 172 | [推导]:基于公式(9.37)进行恒等变形: 173 | $$ 174 | \sum_{j=1}^m\frac{p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\lambda=0 175 | $$ 176 | 177 | $$ 178 | \Rightarrow\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\alpha_{i}\lambda=0 179 | $$ 180 | 181 | 对所有混合成分进行求和: 182 | $$ 183 | \Rightarrow\sum_{i=1}^k\left(\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\alpha_{i}\lambda\right)=0 184 | $$ 185 | 186 | $$ 187 | \Rightarrow\sum_{i=1}^k\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\sum_{i=1}^k\alpha_{i}\lambda=0 188 | $$ 189 | 190 | $$ 191 | \Rightarrow\lambda=-\sum_{i=1}^k\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}=-m 192 | $$ 193 | 194 | 又 195 | $$ 196 | \sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\alpha_{i}\lambda=0 197 | $$ 198 | 199 | $$ 200 | \Rightarrow\sum_{j=1}^m\gamma_{ji}+\alpha_{i}\lambda=0 201 | $$ 202 | 203 | $$ 204 | \Rightarrow\alpha_{i}=-\frac{\sum_{j=1}^m\gamma_{ji}}{\lambda}=\frac{1}{m}\sum_{j=1}^m\gamma_{ji} 205 | $$ 206 | 207 | 208 | ## 附录 209 | 210 | 参考公式 211 | $$ 212 | \frac{\partial\boldsymbol x^TB\boldsymbol x}{\partial\boldsymbol x}=\left(B+B^T\right)\boldsymbol x 213 | $$ 214 | $$ 215 | \frac{\partial}{\partial A}ln|A|=\left(A^{-1}\right)^T 216 | $$ 217 | $$ 218 | \frac{\partial}{\partial x}\left(A^{-1}\right)=-A^{-1}\frac{\partial A}{\partial x}A^{-1} 219 | $$ 220 | 参考资料
221 | [1] Meilă, Marina. "Comparing clusterings—an information based distance." Journal of multivariate analysis 98.5 (2007): 873-895.
222 | [2] Halkidi, Maria, Yannis Batistakis, and Michalis Vazirgiannis. "On clustering validation techniques." Journal of intelligent information systems 17.2-3 (2001): 107-145.
223 | [3] Petersen, K. B. & Pedersen, M. S. *The Matrix Cookbook*.
224 | [4] Bishop, C. M. (2006). *Pattern Recognition and Machine Learning*. Springer. 225 | 226 | 227 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | Preamble 9 | 10 | The GNU General Public License is a free, copyleft license for 11 | software and other kinds of works. 12 | 13 | The licenses for most software and other practical works are designed 14 | to take away your freedom to share and change the works. By contrast, 15 | the GNU General Public License is intended to guarantee your freedom to 16 | share and change all versions of a program--to make sure it remains free 17 | software for all its users. We, the Free Software Foundation, use the 18 | GNU General Public License for most of our software; it applies also to 19 | any other work released this way by its authors. You can apply it to 20 | your programs, too. 21 | 22 | When we speak of free software, we are referring to freedom, not 23 | price. Our General Public Licenses are designed to make sure that you 24 | have the freedom to distribute copies of free software (and charge for 25 | them if you wish), that you receive source code or can get it if you 26 | want it, that you can change the software or use pieces of it in new 27 | free programs, and that you know you can do these things. 28 | 29 | To protect your rights, we need to prevent others from denying you 30 | these rights or asking you to surrender the rights. Therefore, you have 31 | certain responsibilities if you distribute copies of the software, or if 32 | you modify it: responsibilities to respect the freedom of others. 33 | 34 | For example, if you distribute copies of such a program, whether 35 | gratis or for a fee, you must pass on to the recipients the same 36 | freedoms that you received. You must make sure that they, too, receive 37 | or can get the source code. And you must show them these terms so they 38 | know their rights. 39 | 40 | Developers that use the GNU GPL protect your rights with two steps: 41 | (1) assert copyright on the software, and (2) offer you this License 42 | giving you legal permission to copy, distribute and/or modify it. 43 | 44 | For the developers' and authors' protection, the GPL clearly explains 45 | that there is no warranty for this free software. For both users' and 46 | authors' sake, the GPL requires that modified versions be marked as 47 | changed, so that their problems will not be attributed erroneously to 48 | authors of previous versions. 49 | 50 | Some devices are designed to deny users access to install or run 51 | modified versions of the software inside them, although the manufacturer 52 | can do so. This is fundamentally incompatible with the aim of 53 | protecting users' freedom to change the software. The systematic 54 | pattern of such abuse occurs in the area of products for individuals to 55 | use, which is precisely where it is most unacceptable. Therefore, we 56 | have designed this version of the GPL to prohibit the practice for those 57 | products. If such problems arise substantially in other domains, we 58 | stand ready to extend this provision to those domains in future versions 59 | of the GPL, as needed to protect the freedom of users. 60 | 61 | Finally, every program is threatened constantly by software patents. 62 | States should not allow patents to restrict development and use of 63 | software on general-purpose computers, but in those that do, we wish to 64 | avoid the special danger that patents applied to a free program could 65 | make it effectively proprietary. To prevent this, the GPL assures that 66 | patents cannot be used to render the program non-free. 67 | 68 | The precise terms and conditions for copying, distribution and 69 | modification follow. 70 | 71 | TERMS AND CONDITIONS 72 | 73 | 0. Definitions. 74 | 75 | "This License" refers to version 3 of the GNU General Public License. 76 | 77 | "Copyright" also means copyright-like laws that apply to other kinds of 78 | works, such as semiconductor masks. 79 | 80 | "The Program" refers to any copyrightable work licensed under this 81 | License. Each licensee is addressed as "you". "Licensees" and 82 | "recipients" may be individuals or organizations. 83 | 84 | To "modify" a work means to copy from or adapt all or part of the work 85 | in a fashion requiring copyright permission, other than the making of an 86 | exact copy. The resulting work is called a "modified version" of the 87 | earlier work or a work "based on" the earlier work. 88 | 89 | A "covered work" means either the unmodified Program or a work based 90 | on the Program. 91 | 92 | To "propagate" a work means to do anything with it that, without 93 | permission, would make you directly or secondarily liable for 94 | infringement under applicable copyright law, except executing it on a 95 | computer or modifying a private copy. Propagation includes copying, 96 | distribution (with or without modification), making available to the 97 | public, and in some countries other activities as well. 98 | 99 | To "convey" a work means any kind of propagation that enables other 100 | parties to make or receive copies. Mere interaction with a user through 101 | a computer network, with no transfer of a copy, is not conveying. 102 | 103 | An interactive user interface displays "Appropriate Legal Notices" 104 | to the extent that it includes a convenient and prominently visible 105 | feature that (1) displays an appropriate copyright notice, and (2) 106 | tells the user that there is no warranty for the work (except to the 107 | extent that warranties are provided), that licensees may convey the 108 | work under this License, and how to view a copy of this License. If 109 | the interface presents a list of user commands or options, such as a 110 | menu, a prominent item in the list meets this criterion. 111 | 112 | 1. Source Code. 113 | 114 | The "source code" for a work means the preferred form of the work 115 | for making modifications to it. "Object code" means any non-source 116 | form of a work. 117 | 118 | A "Standard Interface" means an interface that either is an official 119 | standard defined by a recognized standards body, or, in the case of 120 | interfaces specified for a particular programming language, one that 121 | is widely used among developers working in that language. 122 | 123 | The "System Libraries" of an executable work include anything, other 124 | than the work as a whole, that (a) is included in the normal form of 125 | packaging a Major Component, but which is not part of that Major 126 | Component, and (b) serves only to enable use of the work with that 127 | Major Component, or to implement a Standard Interface for which an 128 | implementation is available to the public in source code form. A 129 | "Major Component", in this context, means a major essential component 130 | (kernel, window system, and so on) of the specific operating system 131 | (if any) on which the executable work runs, or a compiler used to 132 | produce the work, or an object code interpreter used to run it. 133 | 134 | The "Corresponding Source" for a work in object code form means all 135 | the source code needed to generate, install, and (for an executable 136 | work) run the object code and to modify the work, including scripts to 137 | control those activities. However, it does not include the work's 138 | System Libraries, or general-purpose tools or generally available free 139 | programs which are used unmodified in performing those activities but 140 | which are not part of the work. For example, Corresponding Source 141 | includes interface definition files associated with source files for 142 | the work, and the source code for shared libraries and dynamically 143 | linked subprograms that the work is specifically designed to require, 144 | such as by intimate data communication or control flow between those 145 | subprograms and other parts of the work. 146 | 147 | The Corresponding Source need not include anything that users 148 | can regenerate automatically from other parts of the Corresponding 149 | Source. 150 | 151 | The Corresponding Source for a work in source code form is that 152 | same work. 153 | 154 | 2. Basic Permissions. 155 | 156 | All rights granted under this License are granted for the term of 157 | copyright on the Program, and are irrevocable provided the stated 158 | conditions are met. This License explicitly affirms your unlimited 159 | permission to run the unmodified Program. The output from running a 160 | covered work is covered by this License only if the output, given its 161 | content, constitutes a covered work. This License acknowledges your 162 | rights of fair use or other equivalent, as provided by copyright law. 163 | 164 | You may make, run and propagate covered works that you do not 165 | convey, without conditions so long as your license otherwise remains 166 | in force. You may convey covered works to others for the sole purpose 167 | of having them make modifications exclusively for you, or provide you 168 | with facilities for running those works, provided that you comply with 169 | the terms of this License in conveying all material for which you do 170 | not control copyright. Those thus making or running the covered works 171 | for you must do so exclusively on your behalf, under your direction 172 | and control, on terms that prohibit them from making any copies of 173 | your copyrighted material outside their relationship with you. 174 | 175 | Conveying under any other circumstances is permitted solely under 176 | the conditions stated below. Sublicensing is not allowed; section 10 177 | makes it unnecessary. 178 | 179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 180 | 181 | No covered work shall be deemed part of an effective technological 182 | measure under any applicable law fulfilling obligations under article 183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or 184 | similar laws prohibiting or restricting circumvention of such 185 | measures. 186 | 187 | When you convey a covered work, you waive any legal power to forbid 188 | circumvention of technological measures to the extent such circumvention 189 | is effected by exercising rights under this License with respect to 190 | the covered work, and you disclaim any intention to limit operation or 191 | modification of the work as a means of enforcing, against the work's 192 | users, your or third parties' legal rights to forbid circumvention of 193 | technological measures. 194 | 195 | 4. Conveying Verbatim Copies. 196 | 197 | You may convey verbatim copies of the Program's source code as you 198 | receive it, in any medium, provided that you conspicuously and 199 | appropriately publish on each copy an appropriate copyright notice; 200 | keep intact all notices stating that this License and any 201 | non-permissive terms added in accord with section 7 apply to the code; 202 | keep intact all notices of the absence of any warranty; and give all 203 | recipients a copy of this License along with the Program. 204 | 205 | You may charge any price or no price for each copy that you convey, 206 | and you may offer support or warranty protection for a fee. 207 | 208 | 5. Conveying Modified Source Versions. 209 | 210 | You may convey a work based on the Program, or the modifications to 211 | produce it from the Program, in the form of source code under the 212 | terms of section 4, provided that you also meet all of these conditions: 213 | 214 | a) The work must carry prominent notices stating that you modified 215 | it, and giving a relevant date. 216 | 217 | b) The work must carry prominent notices stating that it is 218 | released under this License and any conditions added under section 219 | 7. This requirement modifies the requirement in section 4 to 220 | "keep intact all notices". 221 | 222 | c) You must license the entire work, as a whole, under this 223 | License to anyone who comes into possession of a copy. This 224 | License will therefore apply, along with any applicable section 7 225 | additional terms, to the whole of the work, and all its parts, 226 | regardless of how they are packaged. This License gives no 227 | permission to license the work in any other way, but it does not 228 | invalidate such permission if you have separately received it. 229 | 230 | d) If the work has interactive user interfaces, each must display 231 | Appropriate Legal Notices; however, if the Program has interactive 232 | interfaces that do not display Appropriate Legal Notices, your 233 | work need not make them do so. 234 | 235 | A compilation of a covered work with other separate and independent 236 | works, which are not by their nature extensions of the covered work, 237 | and which are not combined with it such as to form a larger program, 238 | in or on a volume of a storage or distribution medium, is called an 239 | "aggregate" if the compilation and its resulting copyright are not 240 | used to limit the access or legal rights of the compilation's users 241 | beyond what the individual works permit. Inclusion of a covered work 242 | in an aggregate does not cause this License to apply to the other 243 | parts of the aggregate. 244 | 245 | 6. Conveying Non-Source Forms. 246 | 247 | You may convey a covered work in object code form under the terms 248 | of sections 4 and 5, provided that you also convey the 249 | machine-readable Corresponding Source under the terms of this License, 250 | in one of these ways: 251 | 252 | a) Convey the object code in, or embodied in, a physical product 253 | (including a physical distribution medium), accompanied by the 254 | Corresponding Source fixed on a durable physical medium 255 | customarily used for software interchange. 256 | 257 | b) Convey the object code in, or embodied in, a physical product 258 | (including a physical distribution medium), accompanied by a 259 | written offer, valid for at least three years and valid for as 260 | long as you offer spare parts or customer support for that product 261 | model, to give anyone who possesses the object code either (1) a 262 | copy of the Corresponding Source for all the software in the 263 | product that is covered by this License, on a durable physical 264 | medium customarily used for software interchange, for a price no 265 | more than your reasonable cost of physically performing this 266 | conveying of source, or (2) access to copy the 267 | Corresponding Source from a network server at no charge. 268 | 269 | c) Convey individual copies of the object code with a copy of the 270 | written offer to provide the Corresponding Source. This 271 | alternative is allowed only occasionally and noncommercially, and 272 | only if you received the object code with such an offer, in accord 273 | with subsection 6b. 274 | 275 | d) Convey the object code by offering access from a designated 276 | place (gratis or for a charge), and offer equivalent access to the 277 | Corresponding Source in the same way through the same place at no 278 | further charge. You need not require recipients to copy the 279 | Corresponding Source along with the object code. If the place to 280 | copy the object code is a network server, the Corresponding Source 281 | may be on a different server (operated by you or a third party) 282 | that supports equivalent copying facilities, provided you maintain 283 | clear directions next to the object code saying where to find the 284 | Corresponding Source. Regardless of what server hosts the 285 | Corresponding Source, you remain obligated to ensure that it is 286 | available for as long as needed to satisfy these requirements. 287 | 288 | e) Convey the object code using peer-to-peer transmission, provided 289 | you inform other peers where the object code and Corresponding 290 | Source of the work are being offered to the general public at no 291 | charge under subsection 6d. 292 | 293 | A separable portion of the object code, whose source code is excluded 294 | from the Corresponding Source as a System Library, need not be 295 | included in conveying the object code work. 296 | 297 | A "User Product" is either (1) a "consumer product", which means any 298 | tangible personal property which is normally used for personal, family, 299 | or household purposes, or (2) anything designed or sold for incorporation 300 | into a dwelling. In determining whether a product is a consumer product, 301 | doubtful cases shall be resolved in favor of coverage. For a particular 302 | product received by a particular user, "normally used" refers to a 303 | typical or common use of that class of product, regardless of the status 304 | of the particular user or of the way in which the particular user 305 | actually uses, or expects or is expected to use, the product. A product 306 | is a consumer product regardless of whether the product has substantial 307 | commercial, industrial or non-consumer uses, unless such uses represent 308 | the only significant mode of use of the product. 309 | 310 | "Installation Information" for a User Product means any methods, 311 | procedures, authorization keys, or other information required to install 312 | and execute modified versions of a covered work in that User Product from 313 | a modified version of its Corresponding Source. The information must 314 | suffice to ensure that the continued functioning of the modified object 315 | code is in no case prevented or interfered with solely because 316 | modification has been made. 317 | 318 | If you convey an object code work under this section in, or with, or 319 | specifically for use in, a User Product, and the conveying occurs as 320 | part of a transaction in which the right of possession and use of the 321 | User Product is transferred to the recipient in perpetuity or for a 322 | fixed term (regardless of how the transaction is characterized), the 323 | Corresponding Source conveyed under this section must be accompanied 324 | by the Installation Information. But this requirement does not apply 325 | if neither you nor any third party retains the ability to install 326 | modified object code on the User Product (for example, the work has 327 | been installed in ROM). 328 | 329 | The requirement to provide Installation Information does not include a 330 | requirement to continue to provide support service, warranty, or updates 331 | for a work that has been modified or installed by the recipient, or for 332 | the User Product in which it has been modified or installed. Access to a 333 | network may be denied when the modification itself materially and 334 | adversely affects the operation of the network or violates the rules and 335 | protocols for communication across the network. 336 | 337 | Corresponding Source conveyed, and Installation Information provided, 338 | in accord with this section must be in a format that is publicly 339 | documented (and with an implementation available to the public in 340 | source code form), and must require no special password or key for 341 | unpacking, reading or copying. 342 | 343 | 7. Additional Terms. 344 | 345 | "Additional permissions" are terms that supplement the terms of this 346 | License by making exceptions from one or more of its conditions. 347 | Additional permissions that are applicable to the entire Program shall 348 | be treated as though they were included in this License, to the extent 349 | that they are valid under applicable law. If additional permissions 350 | apply only to part of the Program, that part may be used separately 351 | under those permissions, but the entire Program remains governed by 352 | this License without regard to the additional permissions. 353 | 354 | When you convey a copy of a covered work, you may at your option 355 | remove any additional permissions from that copy, or from any part of 356 | it. (Additional permissions may be written to require their own 357 | removal in certain cases when you modify the work.) You may place 358 | additional permissions on material, added by you to a covered work, 359 | for which you have or can give appropriate copyright permission. 360 | 361 | Notwithstanding any other provision of this License, for material you 362 | add to a covered work, you may (if authorized by the copyright holders of 363 | that material) supplement the terms of this License with terms: 364 | 365 | a) Disclaiming warranty or limiting liability differently from the 366 | terms of sections 15 and 16 of this License; or 367 | 368 | b) Requiring preservation of specified reasonable legal notices or 369 | author attributions in that material or in the Appropriate Legal 370 | Notices displayed by works containing it; or 371 | 372 | c) Prohibiting misrepresentation of the origin of that material, or 373 | requiring that modified versions of such material be marked in 374 | reasonable ways as different from the original version; or 375 | 376 | d) Limiting the use for publicity purposes of names of licensors or 377 | authors of the material; or 378 | 379 | e) Declining to grant rights under trademark law for use of some 380 | trade names, trademarks, or service marks; or 381 | 382 | f) Requiring indemnification of licensors and authors of that 383 | material by anyone who conveys the material (or modified versions of 384 | it) with contractual assumptions of liability to the recipient, for 385 | any liability that these contractual assumptions directly impose on 386 | those licensors and authors. 387 | 388 | All other non-permissive additional terms are considered "further 389 | restrictions" within the meaning of section 10. If the Program as you 390 | received it, or any part of it, contains a notice stating that it is 391 | governed by this License along with a term that is a further 392 | restriction, you may remove that term. If a license document contains 393 | a further restriction but permits relicensing or conveying under this 394 | License, you may add to a covered work material governed by the terms 395 | of that license document, provided that the further restriction does 396 | not survive such relicensing or conveying. 397 | 398 | If you add terms to a covered work in accord with this section, you 399 | must place, in the relevant source files, a statement of the 400 | additional terms that apply to those files, or a notice indicating 401 | where to find the applicable terms. 402 | 403 | Additional terms, permissive or non-permissive, may be stated in the 404 | form of a separately written license, or stated as exceptions; 405 | the above requirements apply either way. 406 | 407 | 8. Termination. 408 | 409 | You may not propagate or modify a covered work except as expressly 410 | provided under this License. Any attempt otherwise to propagate or 411 | modify it is void, and will automatically terminate your rights under 412 | this License (including any patent licenses granted under the third 413 | paragraph of section 11). 414 | 415 | However, if you cease all violation of this License, then your 416 | license from a particular copyright holder is reinstated (a) 417 | provisionally, unless and until the copyright holder explicitly and 418 | finally terminates your license, and (b) permanently, if the copyright 419 | holder fails to notify you of the violation by some reasonable means 420 | prior to 60 days after the cessation. 421 | 422 | Moreover, your license from a particular copyright holder is 423 | reinstated permanently if the copyright holder notifies you of the 424 | violation by some reasonable means, this is the first time you have 425 | received notice of violation of this License (for any work) from that 426 | copyright holder, and you cure the violation prior to 30 days after 427 | your receipt of the notice. 428 | 429 | Termination of your rights under this section does not terminate the 430 | licenses of parties who have received copies or rights from you under 431 | this License. If your rights have been terminated and not permanently 432 | reinstated, you do not qualify to receive new licenses for the same 433 | material under section 10. 434 | 435 | 9. Acceptance Not Required for Having Copies. 436 | 437 | You are not required to accept this License in order to receive or 438 | run a copy of the Program. Ancillary propagation of a covered work 439 | occurring solely as a consequence of using peer-to-peer transmission 440 | to receive a copy likewise does not require acceptance. However, 441 | nothing other than this License grants you permission to propagate or 442 | modify any covered work. These actions infringe copyright if you do 443 | not accept this License. Therefore, by modifying or propagating a 444 | covered work, you indicate your acceptance of this License to do so. 445 | 446 | 10. Automatic Licensing of Downstream Recipients. 447 | 448 | Each time you convey a covered work, the recipient automatically 449 | receives a license from the original licensors, to run, modify and 450 | propagate that work, subject to this License. You are not responsible 451 | for enforcing compliance by third parties with this License. 452 | 453 | An "entity transaction" is a transaction transferring control of an 454 | organization, or substantially all assets of one, or subdividing an 455 | organization, or merging organizations. If propagation of a covered 456 | work results from an entity transaction, each party to that 457 | transaction who receives a copy of the work also receives whatever 458 | licenses to the work the party's predecessor in interest had or could 459 | give under the previous paragraph, plus a right to possession of the 460 | Corresponding Source of the work from the predecessor in interest, if 461 | the predecessor has it or can get it with reasonable efforts. 462 | 463 | You may not impose any further restrictions on the exercise of the 464 | rights granted or affirmed under this License. For example, you may 465 | not impose a license fee, royalty, or other charge for exercise of 466 | rights granted under this License, and you may not initiate litigation 467 | (including a cross-claim or counterclaim in a lawsuit) alleging that 468 | any patent claim is infringed by making, using, selling, offering for 469 | sale, or importing the Program or any portion of it. 470 | 471 | 11. Patents. 472 | 473 | A "contributor" is a copyright holder who authorizes use under this 474 | License of the Program or a work on which the Program is based. The 475 | work thus licensed is called the contributor's "contributor version". 476 | 477 | A contributor's "essential patent claims" are all patent claims 478 | owned or controlled by the contributor, whether already acquired or 479 | hereafter acquired, that would be infringed by some manner, permitted 480 | by this License, of making, using, or selling its contributor version, 481 | but do not include claims that would be infringed only as a 482 | consequence of further modification of the contributor version. For 483 | purposes of this definition, "control" includes the right to grant 484 | patent sublicenses in a manner consistent with the requirements of 485 | this License. 486 | 487 | Each contributor grants you a non-exclusive, worldwide, royalty-free 488 | patent license under the contributor's essential patent claims, to 489 | make, use, sell, offer for sale, import and otherwise run, modify and 490 | propagate the contents of its contributor version. 491 | 492 | In the following three paragraphs, a "patent license" is any express 493 | agreement or commitment, however denominated, not to enforce a patent 494 | (such as an express permission to practice a patent or covenant not to 495 | sue for patent infringement). To "grant" such a patent license to a 496 | party means to make such an agreement or commitment not to enforce a 497 | patent against the party. 498 | 499 | If you convey a covered work, knowingly relying on a patent license, 500 | and the Corresponding Source of the work is not available for anyone 501 | to copy, free of charge and under the terms of this License, through a 502 | publicly available network server or other readily accessible means, 503 | then you must either (1) cause the Corresponding Source to be so 504 | available, or (2) arrange to deprive yourself of the benefit of the 505 | patent license for this particular work, or (3) arrange, in a manner 506 | consistent with the requirements of this License, to extend the patent 507 | license to downstream recipients. "Knowingly relying" means you have 508 | actual knowledge that, but for the patent license, your conveying the 509 | covered work in a country, or your recipient's use of the covered work 510 | in a country, would infringe one or more identifiable patents in that 511 | country that you have reason to believe are valid. 512 | 513 | If, pursuant to or in connection with a single transaction or 514 | arrangement, you convey, or propagate by procuring conveyance of, a 515 | covered work, and grant a patent license to some of the parties 516 | receiving the covered work authorizing them to use, propagate, modify 517 | or convey a specific copy of the covered work, then the patent license 518 | you grant is automatically extended to all recipients of the covered 519 | work and works based on it. 520 | 521 | A patent license is "discriminatory" if it does not include within 522 | the scope of its coverage, prohibits the exercise of, or is 523 | conditioned on the non-exercise of one or more of the rights that are 524 | specifically granted under this License. You may not convey a covered 525 | work if you are a party to an arrangement with a third party that is 526 | in the business of distributing software, under which you make payment 527 | to the third party based on the extent of your activity of conveying 528 | the work, and under which the third party grants, to any of the 529 | parties who would receive the covered work from you, a discriminatory 530 | patent license (a) in connection with copies of the covered work 531 | conveyed by you (or copies made from those copies), or (b) primarily 532 | for and in connection with specific products or compilations that 533 | contain the covered work, unless you entered into that arrangement, 534 | or that patent license was granted, prior to 28 March 2007. 535 | 536 | Nothing in this License shall be construed as excluding or limiting 537 | any implied license or other defenses to infringement that may 538 | otherwise be available to you under applicable patent law. 539 | 540 | 12. No Surrender of Others' Freedom. 541 | 542 | If conditions are imposed on you (whether by court order, agreement or 543 | otherwise) that contradict the conditions of this License, they do not 544 | excuse you from the conditions of this License. If you cannot convey a 545 | covered work so as to satisfy simultaneously your obligations under this 546 | License and any other pertinent obligations, then as a consequence you may 547 | not convey it at all. For example, if you agree to terms that obligate you 548 | to collect a royalty for further conveying from those to whom you convey 549 | the Program, the only way you could satisfy both those terms and this 550 | License would be to refrain entirely from conveying the Program. 551 | 552 | 13. Use with the GNU Affero General Public License. 553 | 554 | Notwithstanding any other provision of this License, you have 555 | permission to link or combine any covered work with a work licensed 556 | under version 3 of the GNU Affero General Public License into a single 557 | combined work, and to convey the resulting work. The terms of this 558 | License will continue to apply to the part which is the covered work, 559 | but the special requirements of the GNU Affero General Public License, 560 | section 13, concerning interaction through a network will apply to the 561 | combination as such. 562 | 563 | 14. Revised Versions of this License. 564 | 565 | The Free Software Foundation may publish revised and/or new versions of 566 | the GNU General Public License from time to time. Such new versions will 567 | be similar in spirit to the present version, but may differ in detail to 568 | address new problems or concerns. 569 | 570 | Each version is given a distinguishing version number. If the 571 | Program specifies that a certain numbered version of the GNU General 572 | Public License "or any later version" applies to it, you have the 573 | option of following the terms and conditions either of that numbered 574 | version or of any later version published by the Free Software 575 | Foundation. If the Program does not specify a version number of the 576 | GNU General Public License, you may choose any version ever published 577 | by the Free Software Foundation. 578 | 579 | If the Program specifies that a proxy can decide which future 580 | versions of the GNU General Public License can be used, that proxy's 581 | public statement of acceptance of a version permanently authorizes you 582 | to choose that version for the Program. 583 | 584 | Later license versions may give you additional or different 585 | permissions. However, no additional obligations are imposed on any 586 | author or copyright holder as a result of your choosing to follow a 587 | later version. 588 | 589 | 15. Disclaimer of Warranty. 590 | 591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY 592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT 593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY 594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, 595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM 597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF 598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 599 | 600 | 16. Limitation of Liability. 601 | 602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS 604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY 605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE 606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF 607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD 608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), 609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF 610 | SUCH DAMAGES. 611 | 612 | 17. Interpretation of Sections 15 and 16. 613 | 614 | If the disclaimer of warranty and limitation of liability provided 615 | above cannot be given local legal effect according to their terms, 616 | reviewing courts shall apply local law that most closely approximates 617 | an absolute waiver of all civil liability in connection with the 618 | Program, unless a warranty or assumption of liability accompanies a 619 | copy of the Program in return for a fee. 620 | 621 | END OF TERMS AND CONDITIONS 622 | 623 | How to Apply These Terms to Your New Programs 624 | 625 | If you develop a new program, and you want it to be of the greatest 626 | possible use to the public, the best way to achieve this is to make it 627 | free software which everyone can redistribute and change under these terms. 628 | 629 | To do so, attach the following notices to the program. It is safest 630 | to attach them to the start of each source file to most effectively 631 | state the exclusion of warranty; and each file should have at least 632 | the "copyright" line and a pointer to where the full notice is found. 633 | 634 | 635 | Copyright (C) 636 | 637 | This program is free software: you can redistribute it and/or modify 638 | it under the terms of the GNU General Public License as published by 639 | the Free Software Foundation, either version 3 of the License, or 640 | (at your option) any later version. 641 | 642 | This program is distributed in the hope that it will be useful, 643 | but WITHOUT ANY WARRANTY; without even the implied warranty of 644 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 645 | GNU General Public License for more details. 646 | 647 | You should have received a copy of the GNU General Public License 648 | along with this program. If not, see . 649 | 650 | Also add information on how to contact you by electronic and paper mail. 651 | 652 | If the program does terminal interaction, make it output a short 653 | notice like this when it starts in an interactive mode: 654 | 655 | Copyright (C) 656 | This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 657 | This is free software, and you are welcome to redistribute it 658 | under certain conditions; type `show c' for details. 659 | 660 | The hypothetical commands `show w' and `show c' should show the appropriate 661 | parts of the General Public License. Of course, your program's commands 662 | might be different; for a GUI interface, you would use an "about box". 663 | 664 | You should also get your employer (if you work as a programmer) or school, 665 | if any, to sign a "copyright disclaimer" for the program, if necessary. 666 | For more information on this, and how to apply and follow the GNU GPL, see 667 | . 668 | 669 | The GNU General Public License does not permit incorporating your program 670 | into proprietary programs. If your program is a subroutine library, you 671 | may consider it more useful to permit linking proprietary applications with 672 | the library. If this is what you want to do, use the GNU Lesser General 673 | Public License instead of this License. But first, please read 674 | . 675 | --------------------------------------------------------------------------------