├── docs
│   ├── .nojekyll
│   ├── chapter2
│   │   ├── resources
│   │   │   └── images
│   │   │       └── lrank.png
│   │   └── chapter2.md
│   ├── _sidebar.md
│   ├── index.html
│   ├── chapter1
│   │   └── chapter1.md
│   ├── README.md
│   ├── chapter4
│   │   └── chapter4.md
│   ├── chapter7
│   │   └── chapter7.md
│   ├── chapter5
│   │   └── chapter5.md
│   ├── chapter16
│   │   └── chapter16.md
│   ├── chapter12
│   │   └── chapter12.md
│   ├── chapter8
│   │   └── chapter8.md
│   ├── chapter14
│   │   └── chapter14.md
│   ├── chapter10
│   │   └── chapter10.md
│   ├── chapter11
│   │   └── chapter11.md
│   ├── chapter6
│   │   └── chapter6.md
│   ├── chapter3
│   │   └── chapter3.md
│   ├── chapter13
│   │   └── chapter13.md
│   └── chapter9
│       └── chapter9.md
├── ISSUE_TEMPLATE.md
├── res
│   ├── xigua.jpg
│   ├── example.png
│   └── qrcode.jpeg
├── CONTRIBUTING.md
├── .gitignore
├── PULL_REQUEST_TEMPLATE.md
├── README.md
└── LICENSE
--------------------------------------------------------------------------------
/docs/.nojekyll:
--------------------------------------------------------------------------------
1 | 
--------------------------------------------------------------------------------
/ISSUE_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | 请在这里写上你具体哪一个章节哪一个公式不理解,如果能写清哪里不理解那就最好啦~
--------------------------------------------------------------------------------
/res/xigua.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juzhong180236/Pumpkinbook/HEAD/res/xigua.jpg
--------------------------------------------------------------------------------
/res/example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juzhong180236/Pumpkinbook/HEAD/res/example.png
--------------------------------------------------------------------------------
/res/qrcode.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juzhong180236/Pumpkinbook/HEAD/res/qrcode.jpeg
--------------------------------------------------------------------------------
/docs/chapter2/resources/images/lrank.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/juzhong180236/Pumpkinbook/HEAD/docs/chapter2/resources/images/lrank.png
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # 使用说明
2 | 
3 | 南瓜书仅仅是西瓜书的一些细微补充,内容都是以西瓜书的内容为前置知识进行表述的。所以,南瓜书的最佳使用方法是以西瓜书为主线,遇到自己推导不出来或者看不懂的公式时再来查阅南瓜书。
4 | 
5 | ## 西瓜书待推导或待解析公式征集
6 | 
7 | 若南瓜书里没有你想要查阅的公式,请[点击这里](https://github.com/datawhalechina/pumpkin-book/issues/1)提交你希望补充推导或者解析的公式编号,我们看到后会尽快进行补充。
--------------------------------------------------------------------------------
/docs/_sidebar.md:
--------------------------------------------------------------------------------
1 | - 目录
2 |   - [第1章 绪论](chapter1/chapter1.md)
3 |   - [第2章 模型评估](chapter2/chapter2.md)
4 |   - [第3章 线性模型](chapter3/chapter3.md)
5 |   - [第4章 决策树](chapter4/chapter4.md)
6 |   - [第5章 神经网络](chapter5/chapter5.md)
7 |   - [第6章 支持向量机](chapter6/chapter6.md)
8 |   - [第7章 贝叶斯分类器](chapter7/chapter7.md)
9 |   - [第8章 集成学习](chapter8/chapter8.md)
10 |   - [第9章 聚类](chapter9/chapter9.md)
11 |   - [第10章 降维与度量学习](chapter10/chapter10.md)
12 |   - [第11章 特征选择与稀疏学习](chapter11/chapter11.md)
13 |   - [第12章 计算学习理论](chapter12/chapter12.md)
14 |   - [第13章 半监督学习](chapter13/chapter13.md)
15 |   - [第14章 概率图模型](chapter14/chapter14.md)
16 |   - [第16章 强化学习](chapter16/chapter16.md)
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Logs
2 | logs
3 | *.log
4 | 
5 | # Runtime data
6 | pids
7 | *.pid
8 | *.seed
9 | 
10 | # Directory for instrumented libs generated by jscoverage/JSCover
11 | lib-cov
12 | 
13 | # Coverage directory used by tools like istanbul
14 | coverage
15 | 
16 | # Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files)
17 | .grunt
18 | 
19 | # Compiled binary addons (http://nodejs.org/api/addons.html)
20 | build/Release
21 | 
22 | # Dependency directory
23 | # Deployed apps should consider commenting this line out:
24 | # see https://npmjs.org/doc/faq.html#Should-I-check-my-node_modules-folder-into-git
25 | node_modules
26 | 
27 | _book/
28 | book.pdf
29 | book.epub
30 | book.mobi
31 | 
32 | .idea
--------------------------------------------------------------------------------
/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | # 提交PR后请联系 Datawhale 的同学
2 | 
3 | 提交时请检查目录结构和公式是否符合规范,无问题即可提交PR
4 | 
5 | ## 如果你想加入这个项目:
6 | 
7 | 请直接 fork 我们,直接提交Pull Request,我们会不定时检查。
8 | 
9 | ## How to Pull Request
10 | 
11 | First time contributing to datawhalechina/pumpkin-book?
12 | 
13 | If you know how to fix an issue, consider opening a pull request for it.
14 | 
15 | ### 目录结构规范:
16 | 
17 | ```markdown
18 | pumpkin-book
19 | ├─docs
20 | | ├─chapter1 # 第1章
21 | | | ├─resources # 资源文件夹
22 | | | | └─images # 图片资源
23 | | | └─chapter1.md # 第1章公式全解
24 | | ├─chapter2
25 | ...
26 | ```
27 | 
28 | ### 公式全解文档规范:
29 | 
30 | ```markdown
31 | ## 公式编号
32 | 
33 | $$(公式的LaTeX表达式)$$
34 | 
35 | [推导]:(公式推导步骤) or [解析]:(公式解析说明)
36 | 
37 | ## 附录(可选)
38 | 
39 | (附录内容)
40 | ```
41 | 
--------------------------------------------------------------------------------
/docs/index.html:
--------------------------------------------------------------------------------
> 书名:机器学习
### 主要贡献者(按首字母排名)

[@awyd234](https://github.com/awyd234)
[@feijuan](https://github.com/feijuan)
[@Ggmatch](https://github.com/Ggmatch)
[@Heitao5200](https://github.com/Heitao5200)
[@huaqing89](https://github.com/huaqing89)
[@juxiao](https://github.com/juxiao)
[@jbb0523](https://blog.csdn.net/jbb0523)
[@LongJH](https://github.com/LongJH)
[@LilRachel](https://github.com/LilRachel)
[@LeoLRH](https://github.com/LeoLRH)
[@Majingmin](https://github.com/Majingmin)
[@MrBigFan](https://github.com/MrBigFan)
[@Nono17](https://github.com/Nono17)
[@spareribs](https://github.com/spareribs)
[@sunchaothu](https://github.com/sunchaothu)
[@StevenLzq](https://github.com/StevenLzq)
[@Sm1les](https://github.com/Sm1les)
[@shanry](https://github.com/shanry)
[@Ye980226](https://github.com/Ye980226)

## 关注我们


--------------------------------------------------------------------------------
/docs/chapter11/chapter11.md:
--------------------------------------------------------------------------------
3 | *L-Lipschitz* 条件是指:存在常数 $L>0$,使得 $|f(\boldsymbol x_1)-f(\boldsymbol x_2)|\leq L\left\|\boldsymbol x_1-\boldsymbol x_2\right\|$。通俗理解就是,函数上任意两点连线斜率的绝对值都不大于某个固定的实数 $L$,其中最小的 $L$ 称为 *Lipschitz* 常数。
4 | 将(对 $\nabla f$ 的)该条件变形可以更好地理解:$$\frac{\left \| \nabla f(\boldsymbol x')-\nabla f(\boldsymbol x) \right \|_{2}}{\left \| \boldsymbol x'-\boldsymbol x \right \|_{2}}\leqslant L \quad (\forall \boldsymbol x,\boldsymbol x'),$$
5 | 进一步,当 $\boldsymbol x'\to \boldsymbol x$ 时,有 $$\lim_{\boldsymbol x'\to \boldsymbol x}\frac{\left \| \nabla f(\boldsymbol x')-\nabla f(\boldsymbol x)\right \|_{2}}{\left \| \boldsymbol x'-\boldsymbol x \right \|_{2}}\leqslant L,$$ 即 $f$ 的二阶导数(Hessian 矩阵)的模不超过 $L$,也就是说 $L$ 刻画了梯度变化快慢的上界。
6 | “ *Lipschitz*连续”很常见,知乎有一个问答(https://www.zhihu.com/question/51809602) 对*Lipschitz*连续的解释很形象:以陆地为例, 连续就是说这块地上没有特别陡的坡;其中最陡的地方有多陡呢?这就是所谓的*Lipschitz*常数。
7 |
8 | ## 11.10
9 |
10 | $$
11 | \hat{f}(x) \simeq f(x_{k})+\langle \nabla f(x_{k}),x-x_{k} \rangle + \frac{L}{2}\left \| x-x_{k} \right\|^{2}
12 | $$
13 |
14 | [推导]:
15 | $$
16 | \begin{aligned}
17 | \hat{f}(x) &\simeq f(x_{k})+\langle \nabla f(x_{k}),x-x_{k} \rangle + \frac{L}{2}\left \| x-x_{k} \right\|^{2} \\
18 | &= f(x_{k})+\langle \nabla f(x_{k}),x-x_{k} \rangle + \langle\frac{L}{2}(x-x_{k}),x-x_{k}\rangle \\
19 | &= f(x_{k})+\langle \nabla f(x_{k})+\frac{L}{2}(x-x_{k}),x-x_{k} \rangle \\
20 | &= f(x_{k})+\frac{L}{2}\langle\frac{2}{L}\nabla f(x_{k})+(x-x_{k}),x-x_{k} \rangle \\
21 | &= f(x_{k})+\frac{L}{2}\langle x-x_{k}+\frac{1}{L}\nabla f(x_{k})+\frac{1}{L}\nabla f(x_{k}),x-x_{k}+\frac{1}{L}\nabla f(x_{k})-\frac{1}{L}\nabla f(x_{k}) \rangle \\
22 | &= f(x_{k})+\frac{L}{2}\left\| x-x_{k}+\frac{1}{L}\nabla f(x_{k}) \right\|_{2}^{2} -\frac{1}{2L}\left\|\nabla f(x_{k})\right\|_{2}^{2} \\
23 | &= \frac{L}{2}\left\| x-(x_{k}-\frac{1}{L}\nabla f(x_{k})) \right\|_{2}^{2} + const \qquad (因为f(x_{k})和\nabla f(x_{k})是常数)
24 | \end{aligned}
25 | $$
26 |
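上述"配凑平方项"的恒等变形可以用数值例子快速检验。下面是一个最小示意(假设使用 numpy,数据与 $L$ 均为随意取的假设值,不代表原文的实现):

```python
import numpy as np

# 数值验证式(11.10)推导中的配方恒等式:
# f(xk) + <g, x-xk> + L/2*||x-xk||^2
#   = L/2*||x - (xk - g/L)||^2 + f(xk) - ||g||^2/(2L)
rng = np.random.default_rng(0)
L = 2.5                      # 假设的 Lipschitz 常数
xk = rng.normal(size=5)      # 当前点 x_k
g = rng.normal(size=5)       # 假设为梯度 ∇f(x_k)
fxk = 1.3                    # 假设为 f(x_k)
x = rng.normal(size=5)

lhs = fxk + g @ (x - xk) + L / 2 * np.sum((x - xk) ** 2)
rhs = L / 2 * np.sum((x - (xk - g / L)) ** 2) + fxk - np.sum(g ** 2) / (2 * L)
assert np.isclose(lhs, rhs)  # 恒等式成立
```

由此也可看出,最小化 $\hat{f}(x)$ 的解就是梯度下降步 $x_{k+1}=x_{k}-\frac{1}{L}\nabla f(x_{k})$。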
27 | ## 11.13
28 | $$\boldsymbol x_{k+1}=\underset{\boldsymbol x}{\arg\min}\ \frac{L}{2}\left \| \boldsymbol x -\boldsymbol z\right \|_{2}^{2}+\lambda \left \| \boldsymbol x \right \|_{1}$$
29 | [推导]:假设目标函数为$g(\boldsymbol x)$,则
30 | $$
31 | \begin{aligned}
32 | g(\boldsymbol x)
33 | & =\frac{L}{2}\left \|\boldsymbol x \boldsymbol -\boldsymbol z\right \|_{2}^{2}+\lambda \left \| \boldsymbol x \right \|_{1}\\
34 | & =\frac{L}{2}\sum_{i=1}^{d}\left ( x^{i} -z^{i}\right )^{2}+\lambda \sum_{i=1}^{d}\left | x^{i} \right | \\
35 | & =\sum_{i=1}^{d}(\frac{L}{2}(x^{i}-z^{i})^{2}+\lambda \left | x^{i}\right |)&
36 | \end{aligned}
37 | $$
38 | 由上式可见,$g(\boldsymbol x)$ 可以拆成 $d$ 个独立的函数,因此求解式(11.13)等价于分别求解 $d$ 个独立的目标函数。
39 | 针对目标函数$g(x^{i})=\frac{L}{2}(x^{i}-z^{i})^{2}+\lambda \left | x^{i}\right |$,通过求导求解极值:
40 | $$\frac{dg(x^{i})}{dx^{i}}=L(x^{i}-z^{i})+\lambda sgn(x^{i})$$
41 | 其中$$sgn(x^{i})=\left\{\begin{matrix}
42 | 1, &x^{i}>0\\
43 | -1,& x^{i}<0
44 | \end{matrix}\right.$$
45 | 令导数为0(注意当 $x^{i}=0$ 时 $\left | x^{i}\right |$ 不可导,需在下面单独讨论),可得:$$x^{i}=z^{i}-\frac{\lambda }{L}sgn(x^{i})$$可分为三种情况:
46 | 1. 当$z^{i}>\frac{\lambda }{L}$时:
47 | (1)假设此时的根$x^{i}<0$,则$sgn(x^{i})=-1$,所以$x^{i}=z^{i}+\frac{\lambda }{L}>0$,与假设矛盾。
48 | (2)假设此时的根$x^{i}>0$,则$sgn(x^{i})=1$,所以$x^{i}=z^{i}-\frac{\lambda }{L}>0$,成立。
49 | 2. 当$z^{i}<-\frac{\lambda }{L}$时:
50 | (1)假设此时的根$x^{i}>0$,则$sgn(x^{i})=1$,所以$x^{i}=z^{i}-\frac{\lambda }{L}<0$,与假设矛盾。
51 | (2)假设此时的根$x^{i}<0$,则$sgn(x^{i})=-1$,所以$x^{i}=z^{i}+\frac{\lambda }{L}<0$,成立。
52 | 3. 当$\left |z^{i} \right |<\frac{\lambda }{L}$时:
53 | (1)假设此时的根$x^{i}>0$,则$sgn(x^{i})=1$,所以$x^{i}=z^{i}-\frac{\lambda }{L}<0$,与假设矛盾。
54 | (2)假设此时的根$x^{i}<0$,则$sgn(x^{i})=-1$,所以$x^{i}=z^{i}+\frac{\lambda }{L}>0$,与假设矛盾,此时$x^{i}=0$为函数的极小值。
55 | 综上所述可得函数闭式解如下:
56 | $$x_{k+1}^{i}=\left\{\begin{matrix}
57 | z^{i}-\frac{\lambda }{L}, &\frac{\lambda }{L}< z^{i}\\
58 | 0, & \left |z^{i} \right |\leqslant \frac{\lambda }{L}\\
59 | z^{i}+\frac{\lambda }{L}, & z^{i}<-\frac{\lambda }{L}
60 | \end{matrix}\right.$$
61 |
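该闭式解即所谓的"软阈值"(soft thresholding)操作。下面给出一个最小示意的 numpy 实现(函数名 `prox_l1` 为本文假设的命名,不代表原书实现):

```python
import numpy as np

def prox_l1(z, lam, L):
    """按上面的闭式解逐分量计算 x_{k+1}:
    z^i >  lam/L  ->  z^i - lam/L
    |z^i| <= lam/L ->  0
    z^i < -lam/L  ->  z^i + lam/L
    三种情况合起来等价于 sign(z) * max(|z| - lam/L, 0)。"""
    t = lam / L
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

z = np.array([1.5, 0.2, -0.7, -0.05])
print(prox_l1(z, lam=0.5, L=1.0))   # [ 1.   0.  -0.2  0. ]
```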
62 | ## 11.18
63 | $$\begin{aligned}
64 | \underset{\boldsymbol B}{min}\left \|\boldsymbol X-\boldsymbol B\boldsymbol A \right \|_{F}^{2}
65 | & =\underset{b_{i}}{min}\left \| \boldsymbol X-\sum_{j=1}^{k}b_{j}\alpha ^{j} \right \|_{F}^{2}\\
66 | & =\underset{b_{i}}{min}\left \| \left (\boldsymbol X-\sum_{j\neq i}b_{j}\alpha ^{j} \right )- b_{i}\alpha ^{i}\right \|_{F}^{2} \\
67 | & =\underset{b_{i}}{min}\left \|\boldsymbol E_{\boldsymbol i}-b_{i}\alpha ^{i} \right \|_{F}^{2} &
68 | \end{aligned}
69 | $$
70 | [推导]:此处只推导一下$BA=\sum_{j=1}^{k}\boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j}$,其中$\boldsymbol b_{\boldsymbol j}$表示**B**的第j列,$\boldsymbol \alpha ^{\boldsymbol j}$表示**A**的第j行。
71 | 然后,用$b_{j}^{i}$,$\alpha _{j}^{i}$分别表示**B**和**A**的第i行第j列的元素,首先计算**BA**:
72 | $$
73 | \begin{aligned}
74 | \boldsymbol B\boldsymbol A
75 | & =\begin{bmatrix}
76 | b_{1}^{1} &b_{2}^{1} & \cdot & \cdot & \cdot & b_{k}^{1}\\
77 | b_{1}^{2} &b_{2}^{2} & \cdot & \cdot & \cdot & b_{k}^{2}\\
78 | \cdot & \cdot & \cdot & & & \cdot \\
79 | \cdot & \cdot & & \cdot & &\cdot \\
80 | \cdot & \cdot & & & \cdot & \cdot \\
81 | b_{1}^{d}& b_{2}^{d} & \cdot & \cdot &\cdot & b_{k}^{d}
82 | \end{bmatrix}_{d\times k}\cdot
83 | \begin{bmatrix}
84 | \alpha_{1}^{1} &\alpha_{2}^{1} & \cdot & \cdot & \cdot & \alpha_{m}^{1}\\
85 | \alpha_{1}^{2} &\alpha_{2}^{2} & \cdot & \cdot & \cdot & \alpha_{m}^{2}\\
86 | \cdot & \cdot & \cdot & & & \cdot \\
87 | \cdot & \cdot & & \cdot & &\cdot \\
88 | \cdot & \cdot & & & \cdot & \cdot \\
89 | \alpha_{1}^{k}& \alpha_{2}^{k} & \cdot & \cdot &\cdot & \alpha_{m}^{k}
90 | \end{bmatrix}_{k\times m} \\
91 | & =\begin{bmatrix}
92 | \sum_{j=1}^{k}b_{j}^{1}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{1}\alpha _{2}^{j} & \cdot & \cdot & \cdot & \sum_{j=1}^{k}b_{j}^{1}\alpha _{m}^{j}\\
93 | \sum_{j=1}^{k}b_{j}^{2}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{2}\alpha _{2}^{j} & \cdot & \cdot & \cdot & \sum_{j=1}^{k}b_{j}^{2}\alpha _{m}^{j}\\
94 | \cdot & \cdot & \cdot & & & \cdot \\
95 | \cdot & \cdot & & \cdot & &\cdot \\
96 | \cdot & \cdot & & & \cdot & \cdot \\
97 | \sum_{j=1}^{k}b_{j}^{d}\alpha _{1}^{j}& \sum_{j=1}^{k}b_{j}^{d}\alpha _{2}^{j} & \cdot & \cdot &\cdot & \sum_{j=1}^{k}b_{j}^{d}\alpha _{m}^{j}
98 | \end{bmatrix}_{d\times m} &
99 | \end{aligned}
100 | $$
101 | 然后计算$\boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j}$:
102 | $$
103 | \begin{aligned}
104 | \boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j}
105 | & =\begin{bmatrix}
106 | b_{j}^{1}\\ b_{j}^{2}
107 | \\ \cdot
108 | \\ \cdot
109 | \\ \cdot
110 | \\ b_{j}^{d}
111 | \end{bmatrix}\cdot
112 | \begin{bmatrix}
113 | \alpha _{1}^{j}& \alpha _{2}^{j} & \cdot & \cdot & \cdot & \alpha _{m}^{j}
114 | \end{bmatrix}\\
115 | & =\begin{bmatrix}
116 | b_{j}^{1}\alpha _{1}^{j} &b_{j}^{1}\alpha _{2}^{j} & \cdot & \cdot & \cdot & b_{j}^{1}\alpha _{m}^{j}\\
117 | b_{j}^{2}\alpha _{1}^{j} &b_{j}^{2}\alpha _{2}^{j} & \cdot & \cdot & \cdot & b_{j}^{2}\alpha _{m}^{j}\\
118 | \cdot & \cdot & \cdot & & & \cdot \\
119 | \cdot & \cdot & & \cdot & &\cdot \\
120 | \cdot & \cdot & & & \cdot & \cdot \\
121 | b_{j}^{d}\alpha _{1}^{j}& b_{j}^{d}\alpha _{2}^{j} & \cdot & \cdot &\cdot & b_{j}^{d}\alpha _{m}^{j}
122 | \end{bmatrix}_{d\times m} &
123 | \end{aligned}
124 | $$
125 | 求和可得:
126 | $$
127 | \begin{aligned}
128 | \sum_{j=1}^{k}\boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j}
129 | & = \sum_{j=1}^{k}\left (\begin{bmatrix}
130 | b_{j}^{1}\\ b_{j}^{2}
131 | \\ \cdot
132 | \\ \cdot
133 | \\ \cdot
134 | \\ b_{j}^{d}
135 | \end{bmatrix}\cdot
136 | \begin{bmatrix}
137 | \alpha _{1}^{j}& \alpha _{2}^{j} & \cdot & \cdot & \cdot & \alpha _{m}^{j}
138 | \end{bmatrix} \right )\\
139 | & =\begin{bmatrix}
140 | \sum_{j=1}^{k}b_{j}^{1}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{1}\alpha _{2}^{j} & \cdot & \cdot & \cdot & \sum_{j=1}^{k}b_{j}^{1}\alpha _{m}^{j}\\
141 | \sum_{j=1}^{k}b_{j}^{2}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{2}\alpha _{2}^{j} & \cdot & \cdot & \cdot & \sum_{j=1}^{k}b_{j}^{2}\alpha _{m}^{j}\\
142 | \cdot & \cdot & \cdot & & & \cdot \\
143 | \cdot & \cdot & & \cdot & &\cdot \\
144 | \cdot & \cdot & & & \cdot & \cdot \\
145 | \sum_{j=1}^{k}b_{j}^{d}\alpha _{1}^{j}& \sum_{j=1}^{k}b_{j}^{d}\alpha _{2}^{j} & \cdot & \cdot &\cdot & \sum_{j=1}^{k}b_{j}^{d}\alpha _{m}^{j}
146 | \end{bmatrix}_{d\times m} &
147 | \end{aligned}
148 | $$
149 |
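这一"矩阵乘法等于各列乘各行(秩一矩阵)之和"的结论,可以用 numpy 做一个最小数值检验(维度均为随意取的假设值):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, m = 4, 3, 5
B = rng.normal(size=(d, k))   # d×k
A = rng.normal(size=(k, m))   # k×m

# b_j(B 的第 j 列)与 α^j(A 的第 j 行)的外积之和
S = sum(np.outer(B[:, j], A[j, :]) for j in range(k))
assert np.allclose(B @ A, S)
```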
--------------------------------------------------------------------------------
/docs/chapter6/chapter6.md:
--------------------------------------------------------------------------------
1 | ## 6.3
2 | $$
3 | \left\{\begin{array}{ll}{\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b \geqslant+1,} & {y_{i}=+1} \\ {\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b \leqslant-1,} & {y_{i}=-1}\end{array}\right.
4 | $$
5 | [推导]:假设这个超平面是$\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}+b^{\prime}=0$,对于$\left(\boldsymbol{x}_{i}, y_{i}\right) \in D$,有:
6 | $$
7 | \left\{\begin{array}{ll}{\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime}>0,} & {y_{i}=+1} \\ {\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime}<0,} & {y_{i}=-1}\end{array}\right.
8 | $$
9 | 由于训练样本有限,几何间隔存在大于零的下确界,故可将以上关系修正为:
10 | $$
11 | \left\{\begin{array}{ll}{\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime} \geq+\zeta,} & {y_{i}=+1} \\ {\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime} \leq-\zeta,} & {y_{i}=-1}\end{array}\right.
12 | $$
13 | 其中$\zeta$为某个大于零的常数,两边同除以$\zeta$,再次修正以上关系为:
14 | $$
15 | \left\{\begin{array}{ll}{\left(\frac{1}{\zeta} \boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+\frac{b^{\prime}}{\zeta} \geq+1,} & {y_{i}=+1} \\ {\left(\frac{1}{\zeta} \boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+\frac{b^{\prime}}{\zeta} \leq-1,} & {y_{i}=-1}\end{array}\right.
16 | $$
17 | 令:$\boldsymbol{w}=\frac{1}{\zeta} \boldsymbol{w}^{\prime}, b=\frac{b^{\prime}}{\zeta}$,则以上关系可写为:
18 | $$
19 | \left\{\begin{array}{ll}{\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b \geq+1,} & {y_{i}=+1} \\ {\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b \leq-1,} & {y_{i}=-1}\end{array}\right.
20 | $$
21 |
22 | ## 6.8
23 | $$
24 | L(\boldsymbol{w}, b, \boldsymbol{\alpha})=\frac{1}{2}\|\boldsymbol{w}\|^{2}+\sum_{i=1}^{m} \alpha_{i}\left(1-y_{i}\left(\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b\right)\right)
25 | $$
26 | [推导]:
27 | 待求目标:
28 | $$\begin{aligned}
29 | \min_{\boldsymbol{x}}\quad f(\boldsymbol{x})\\
30 | s.t.\quad h(\boldsymbol{x})&=0\\
31 | g(\boldsymbol{x}) &\leq 0
32 | \end{aligned}$$
33 |
34 | 等式约束和不等式约束:$h(\boldsymbol{x})=0$ 与 $g(\boldsymbol{x}) \leq 0$ 分别代表由 $m$ 个等式约束和 $n$ 个不等式约束组成的约束组,与下面乘子的维数一一对应。
35 |
36 | 拉格朗日乘子:$\boldsymbol{\lambda}=\left(\lambda_{1}, \lambda_{2}, \ldots, \lambda_{m}\right)$ $\qquad\boldsymbol{\mu}=\left(\mu_{1}, \mu_{2}, \ldots, \mu_{n}\right)$
37 |
38 | 拉格朗日函数:$L(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})=f(\boldsymbol{x})+\boldsymbol{\lambda}^{\mathrm{T}} h(\boldsymbol{x})+\boldsymbol{\mu}^{\mathrm{T}} g(\boldsymbol{x})$。对式(6.6)而言,没有等式约束,不等式约束为 $1-y_i(\boldsymbol w^{\mathrm{T}}\boldsymbol x_i+b)\leq 0$,对应乘子 $\alpha_i\geq 0$,代入上式即得式(6.8)。
39 |
40 | ## 6.9-6.10
41 | $$\begin{aligned}
42 | w &= \sum_{i=1}^m\alpha_iy_i\boldsymbol{x}_i \\
43 | 0 &=\sum_{i=1}^m\alpha_iy_i
44 | \end{aligned}$$
45 | [推导]:式(6.8)可作如下展开:
46 | $$\begin{aligned}
47 | L(\boldsymbol{w},b,\boldsymbol{\alpha}) &= \frac{1}{2}||\boldsymbol{w}||^2+\sum_{i=1}^m\alpha_i(1-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)) \\
48 | & = \frac{1}{2}||\boldsymbol{w}||^2+\sum_{i=1}^m(\alpha_i-\alpha_iy_i \boldsymbol{w}^T\boldsymbol{x}_i-\alpha_iy_ib)\\
49 | & =\frac{1}{2}\boldsymbol{w}^T\boldsymbol{w}+\sum_{i=1}^m\alpha_i -\sum_{i=1}^m\alpha_iy_i\boldsymbol{w}^T\boldsymbol{x}_i-\sum_{i=1}^m\alpha_iy_ib
50 | \end{aligned}$$
51 | 对$\boldsymbol{w}$和$b$分别求偏导数并令其等于0:
52 |
53 | $$\frac {\partial L}{\partial \boldsymbol{w}}=\frac{1}{2}\times2\times\boldsymbol{w} + 0 - \sum_{i=1}^{m}\alpha_iy_i \boldsymbol{x}_i-0= 0 \Longrightarrow \boldsymbol{w}=\sum_{i=1}^{m}\alpha_iy_i \boldsymbol{x}_i$$
54 |
55 | $$\frac {\partial L}{\partial b}=0+0-0-\sum_{i=1}^{m}\alpha_iy_i=0 \Longrightarrow \sum_{i=1}^{m}\alpha_iy_i=0$$
56 |
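这里对 $\boldsymbol{w}$ 的求导结果可以用数值微分检验。下面是一个最小示意(基于 numpy,数据均为随机生成的假设值):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 6, 3
X = rng.normal(size=(m, d))              # 每行一个样本 x_i
y = rng.choice([-1.0, 1.0], size=m)
alpha = rng.uniform(size=m)
b = 0.7
w = rng.normal(size=d)

def Lagr(w):
    # 式(6.8):L = 1/2*||w||^2 + Σ_i α_i*(1 - y_i*(w^T x_i + b))
    return 0.5 * w @ w + np.sum(alpha * (1 - y * (X @ w + b)))

grad = w - (alpha * y) @ X               # 解析梯度:w - Σ_i α_i y_i x_i
eps = 1e-6
num = np.array([(Lagr(w + eps * e) - Lagr(w - eps * e)) / (2 * eps)
                for e in np.eye(d)])
assert np.allclose(grad, num, atol=1e-5)
```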
57 | ## 6.11
58 | $$\begin{aligned}
59 | \max_{\boldsymbol{\alpha}} & \sum_{i=1}^m\alpha_i - \frac{1}{2}\sum_{i = 1}^m\sum_{j=1}^m\alpha_i \alpha_j y_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j \\
60 | s.t. & \sum_{i=1}^m \alpha_i y_i =0 \\
61 | & \alpha_i \geq 0 \quad i=1,2,\dots ,m
62 | \end{aligned}$$
63 | [推导]:将式 (6.9)代入 (6.8) ,即可将$L(\boldsymbol{w},b,\boldsymbol{\alpha})$ 中的 $\boldsymbol{w}$ 和 $b$ 消去,再考虑式 (6.10) 的约束,就得到式 (6.6) 的对偶问题:
64 | $$\begin{aligned}
65 | \min_{\boldsymbol{w},b} L(\boldsymbol{w},b,\boldsymbol{\alpha}) &=\frac{1}{2}\boldsymbol{w}^T\boldsymbol{w}+\sum_{i=1}^m\alpha_i -\sum_{i=1}^m\alpha_iy_i\boldsymbol{w}^T\boldsymbol{x}_i-\sum_{i=1}^m\alpha_iy_ib \\
66 | &=\frac {1}{2}\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i-\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i -b\sum _{i=1}^m\alpha_iy_i \\
68 | & = -\frac {1}{2}\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i -b\sum _{i=1}^m\alpha_iy_i
69 | \end{aligned}$$
70 | 又$\sum\limits_{i=1}^{m}\alpha_iy_i=0$,所以上式最后一项可化为0,于是得:
71 | $$\begin{aligned}
72 | \min_{\boldsymbol{w},b} L(\boldsymbol{w},b,\boldsymbol{\alpha}) &= -\frac {1}{2}\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i \\
73 | &=-\frac {1}{2}(\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i)^T(\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i)+\sum _{i=1}^m\alpha_i \\
74 | &=-\frac {1}{2}\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i \\
75 | &=\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j
76 | \end{aligned}$$
77 | 所以
78 | $$\max_{\boldsymbol{\alpha}}\min_{\boldsymbol{w},b} L(\boldsymbol{w},b,\boldsymbol{\alpha}) =\max_{\boldsymbol{\alpha}} \sum_{i=1}^m\alpha_i - \frac{1}{2}\sum_{i = 1}^m\sum_{j=1}^m\alpha_i \alpha_j y_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j $$
79 |
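对偶目标函数可以向量化计算:记 $G_{ij}=y_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j$,则目标为 $\boldsymbol 1^T\boldsymbol\alpha-\frac{1}{2}\boldsymbol\alpha^T G\boldsymbol\alpha$。下面是一个最小示意(基于 numpy,`dual_objective` 为假设的函数名,数据为随机生成的示例):

```python
import numpy as np

def dual_objective(alpha, X, y):
    """式(6.11)的目标值:sum(alpha) - 1/2 * ΣΣ α_i α_j y_i y_j x_i^T x_j"""
    Z = y[:, None] * X            # 第 i 行为 y_i * x_i
    G = Z @ Z.T                   # G_ij = y_i y_j x_i^T x_j
    return alpha.sum() - 0.5 * alpha @ G @ alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
y = rng.choice([-1.0, 1.0], size=5)
alpha = rng.uniform(size=5)
print(dual_objective(alpha, X, y))
```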
80 |
81 |
82 |
83 | ## 6.39
84 | $$ C=\alpha_i +\mu_i $$
85 | [推导]:对式(6.36)关于$\xi_i$求偏导并令其等于0可得:
86 |
87 | $$\frac{\partial L}{\partial \xi_i}=0+C \times 1 - \alpha_i \times 1-\mu_i
88 | \times 1 =0\Longrightarrow C=\alpha_i +\mu_i$$
89 |
90 | ## 6.40
91 | $$\begin{aligned}
92 | \max_{\boldsymbol{\alpha}}&\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j \\
93 | s.t. &\sum_{i=1}^m \alpha_i y_i=0 \\
94 | & 0 \leq\alpha_i \leq C \quad i=1,2,\dots ,m
95 | \end{aligned}$$
96 | 将式(6.37)~(6.39)代入式(6.36),即可得到式(6.35)的对偶问题:
97 | $$\begin{aligned}
98 | \min_{\boldsymbol{w},b,\boldsymbol{\xi}}L(\boldsymbol{w},b,\boldsymbol{\alpha},\boldsymbol{\xi},\boldsymbol{\mu}) &= \frac{1}{2}||\boldsymbol{w}||^2+C\sum_{i=1}^m \xi_i+\sum_{i=1}^m \alpha_i(1-\xi_i-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b))-\sum_{i=1}^m\mu_i \xi_i \\
99 | &=\frac{1}{2}||\boldsymbol{w}||^2+\sum_{i=1}^m\alpha_i(1-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b))+C\sum_{i=1}^m \xi_i-\sum_{i=1}^m \alpha_i \xi_i-\sum_{i=1}^m\mu_i \xi_i \\
100 | & = -\frac {1}{2}\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i +\sum_{i=1}^m C\xi_i-\sum_{i=1}^m \alpha_i \xi_i-\sum_{i=1}^m\mu_i \xi_i \\
101 | & = -\frac {1}{2}\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i +\sum_{i=1}^m (C-\alpha_i-\mu_i)\xi_i \\
102 | &=\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j
103 | \end{aligned}$$
104 | 所以
105 | $$\begin{aligned}
106 | \max_{\boldsymbol{\alpha},\boldsymbol{\mu}} \min_{\boldsymbol{w},b,\boldsymbol{\xi}}L(\boldsymbol{w},b,\boldsymbol{\alpha},\boldsymbol{\xi},\boldsymbol{\mu})&=\max_{\boldsymbol{\alpha},\boldsymbol{\mu}}\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j \\
107 | &=\max_{\boldsymbol{\alpha}}\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j
108 | \end{aligned}$$
109 | 又
110 | $$\begin{aligned}
111 | \alpha_i &\geq 0 \\
112 | \mu_i &\geq 0 \\
113 | C &= \alpha_i+\mu_i
114 | \end{aligned}$$
115 | 消去$\mu_i$可得等价约束条件为:
116 | $$0 \leq\alpha_i \leq C \quad i=1,2,\dots ,m$$
117 |
118 | ## 6.52
119 | $$
120 | \left\{\begin{array}{l}
121 | {\alpha_{i}\left(f\left(\boldsymbol{x}_{i}\right)-y_{i}-\epsilon-\xi_{i}\right)=0} \\ {\hat{\alpha}_{i}\left(y_{i}-f\left(\boldsymbol{x}_{i}\right)-\epsilon-\hat{\xi}_{i}\right)=0} \\ {\alpha_{i} \hat{\alpha}_{i}=0, \xi_{i} \hat{\xi}_{i}=0} \\
122 | {\left(C-\alpha_{i}\right) \xi_{i}=0,\left(C-\hat{\alpha}_{i}\right) \hat{\xi}_{i}=0}
123 | \end{array}\right.
124 | $$
125 | [推导]:
126 | 将式(6.45)的约束条件全部恒等变形为小于等于0的形式可得:
127 | $$
128 | \left\{\begin{array}{l}
129 | {f\left(\boldsymbol{x}_{i}\right)-y_{i}-\epsilon-\xi_{i} \leq 0 } \\
130 | {y_{i}-f\left(\boldsymbol{x}_{i}\right)-\epsilon-\hat{\xi}_{i} \leq 0 } \\
131 | {-\xi_{i} \leq 0} \\
132 | {-\hat{\xi}_{i} \leq 0}
133 | \end{array}\right.
134 | $$
135 | 由于以上四个约束条件的拉格朗日乘子分别为$\alpha_i,\hat{\alpha}_i,\mu_i,\hat{\mu}_i$,所以由西瓜书附录式(B.3)可知,以上四个约束条件可相应转化为以下KKT条件:
136 | $$
137 | \left\{\begin{array}{l}
138 | {\alpha_i\left(f\left(\boldsymbol{x}_{i}\right)-y_{i}-\epsilon-\xi_{i} \right) = 0 } \\
139 | {\hat{\alpha}_i\left(y_{i}-f\left(\boldsymbol{x}_{i}\right)-\epsilon-\hat{\xi}_{i} \right) = 0 } \\
140 | {-\mu_i\xi_{i} = 0 \Rightarrow \mu_i\xi_{i} = 0 } \\
141 | {-\hat{\mu}_i \hat{\xi}_{i} = 0 \Rightarrow \hat{\mu}_i \hat{\xi}_{i} = 0 }
142 | \end{array}\right.
143 | $$
144 | 由式(6.49)和式(6.50)可知:
145 | $$
146 | \begin{aligned}
147 | \mu_i=C-\alpha_i \\
148 | \hat{\mu}_i=C-\hat{\alpha}_i
149 | \end{aligned}
150 | $$
151 | 所以上述KKT条件可以进一步变形为:
152 | $$
153 | \left\{\begin{array}{l}
154 | {\alpha_i\left(f\left(\boldsymbol{x}_{i}\right)-y_{i}-\epsilon-\xi_{i} \right) = 0 } \\
155 | {\hat{\alpha}_i\left(y_{i}-f\left(\boldsymbol{x}_{i}\right)-\epsilon-\hat{\xi}_{i} \right) = 0 } \\
156 | {(C-\alpha_i)\xi_{i} = 0 } \\
157 | {(C-\hat{\alpha}_i) \hat{\xi}_{i} = 0 }
158 | \end{array}\right.
159 | $$
160 | 又因为样本$(\boldsymbol{x}_i,y_i)$只可能处在间隔带的某一侧,那么约束条件$f\left(\boldsymbol{x}_{i}\right)-y_{i}-\epsilon-\xi_{i}=0$和$y_{i}-f\left(\boldsymbol{x}_{i}\right)-\epsilon-\hat{\xi}_{i}=0$不可能同时成立,所以$\alpha_i$和$\hat{\alpha}_i$中至少有一个为0,也即$\alpha_i\hat{\alpha}_i=0$。在此基础上再进一步分析可知,如果$\alpha_i=0$的话,那么根据约束$(C-\alpha_i)\xi_{i} = 0$可知此时$\xi_i=0$,同理,如果$\hat{\alpha}_i=0$的话,那么根据约束$(C-\hat{\alpha}_i)\hat{\xi}_{i} = 0$可知此时$\hat{\xi}_i=0$,所以$\xi_i$和$\hat{\xi}_i$中也是至少有一个为0,也即$\xi_{i} \hat{\xi}_{i}=0$。将$\alpha_i\hat{\alpha}_i=0,\xi_{i} \hat{\xi}_{i}=0$整合进上述KKT条件中即可得到式(6.52)。
161 |
162 |
163 |
--------------------------------------------------------------------------------
/docs/chapter3/chapter3.md:
--------------------------------------------------------------------------------
1 | ## 3.7
2 |
3 | $$ w=\cfrac{\sum_{i=1}^{m}y_i(x_i-\bar{x})}{\sum_{i=1}^{m}x_i^2-\cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2} $$
4 |
5 | [推导]:令式(3.5)等于0:
6 | $$ 0 = w\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i $$
7 | $$ w\sum_{i=1}^{m}x_i^2 = \sum_{i=1}^{m}y_ix_i-\sum_{i=1}^{m}bx_i $$
8 | 由于令式(3.6)等于0可得$ b=\cfrac{1}{m}\sum_{i=1}^{m}(y_i-wx_i) $,又$ \cfrac{1}{m}\sum_{i=1}^{m}y_i=\bar{y} $,$ \cfrac{1}{m}\sum_{i=1}^{m}x_i=\bar{x} $,则$ b=\bar{y}-w\bar{x} $,代入上式可得:
9 | $$
10 | \begin{aligned}
11 | w\sum_{i=1}^{m}x_i^2 & = \sum_{i=1}^{m}y_ix_i-\sum_{i=1}^{m}(\bar{y}-w\bar{x})x_i \\
12 | w\sum_{i=1}^{m}x_i^2 & = \sum_{i=1}^{m}y_ix_i-\bar{y}\sum_{i=1}^{m}x_i+w\bar{x}\sum_{i=1}^{m}x_i \\
13 | w(\sum_{i=1}^{m}x_i^2-\bar{x}\sum_{i=1}^{m}x_i) & = \sum_{i=1}^{m}y_ix_i-\bar{y}\sum_{i=1}^{m}x_i \\
14 | w & = \cfrac{\sum_{i=1}^{m}y_ix_i-\bar{y}\sum_{i=1}^{m}x_i}{\sum_{i=1}^{m}x_i^2-\bar{x}\sum_{i=1}^{m}x_i}
15 | \end{aligned}
16 | $$
17 | 又$ \bar{y}\sum_{i=1}^{m}x_i=\cfrac{1}{m}\sum_{i=1}^{m}y_i\sum_{i=1}^{m}x_i=\bar{x}\sum_{i=1}^{m}y_i $,$ \bar{x}\sum_{i=1}^{m}x_i=\cfrac{1}{m}\sum_{i=1}^{m}x_i\sum_{i=1}^{m}x_i=\cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2 $,代入上式即可得式(3.7):
18 | $$ w=\cfrac{\sum_{i=1}^{m}y_i(x_i-\bar{x})}{\sum_{i=1}^{m}x_i^2-\cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2} $$
19 |
20 | 【注】:式(3.7)还可以进一步化简为能用向量表达的形式,将$ \cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2=\bar{x}\sum_{i=1}^{m}x_i $代入分母可得:
21 | $$
22 | \begin{aligned}
23 | w & = \cfrac{\sum_{i=1}^{m}y_i(x_i-\bar{x})}{\sum_{i=1}^{m}x_i^2-\bar{x}\sum_{i=1}^{m}x_i} \\
24 | & = \cfrac{\sum_{i=1}^{m}(y_ix_i-y_i\bar{x})}{\sum_{i=1}^{m}(x_i^2-x_i\bar{x})}
25 | \end{aligned}
26 | $$
27 | 又因为$ \bar{y}\sum_{i=1}^{m}x_i=\bar{x}\sum_{i=1}^{m}y_i=\sum_{i=1}^{m}\bar{y}x_i=\sum_{i=1}^{m}\bar{x}y_i=m\bar{x}\bar{y}=\sum_{i=1}^{m}\bar{x}\bar{y} $,$\sum_{i=1}^{m}x_i\bar{x}=\bar{x}\sum_{i=1}^{m}x_i=\bar{x}\cdot m \cdot\frac{1}{m}\cdot\sum_{i=1}^{m}x_i=m\bar{x}^2=\sum_{i=1}^{m}\bar{x}^2$,则上式可化为:
28 | $$
29 | \begin{aligned}
30 | w & = \cfrac{\sum_{i=1}^{m}(y_ix_i-y_i\bar{x}-x_i\bar{y}+\bar{x}\bar{y})}{\sum_{i=1}^{m}(x_i^2-x_i\bar{x}-x_i\bar{x}+\bar{x}^2)} \\
31 | & = \cfrac{\sum_{i=1}^{m}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{m}(x_i-\bar{x})^2}
32 | \end{aligned}
33 | $$
34 | 若令$ \boldsymbol{x}=(x_1,x_2,...,x_m)^T $,$ \boldsymbol{x}_{d}=(x_1-\bar{x},x_2-\bar{x},...,x_m-\bar{x})^T $为去均值后的$ \boldsymbol{x} $,$ \boldsymbol{y}=(y_1,y_2,...,y_m)^T $,$ \boldsymbol{y}_{d}=(y_1-\bar{y},y_2-\bar{y},...,y_m-\bar{y})^T $为去均值后的$ \boldsymbol{y} $,其中$ \boldsymbol{x} $、$ \boldsymbol{x}_{d} $、$ \boldsymbol{y} $、$ \boldsymbol{y}_{d} $均为m行1列的列向量,代入上式可得:
35 | $$ w=\cfrac{\boldsymbol{x}_{d}^T\boldsymbol{y}_{d}}{\boldsymbol{x}_d^T\boldsymbol{x}_{d}}$$
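这一向量化形式可以直接和最小二乘拟合结果对比验证。下面是一个最小示意(基于 numpy,数据为人工构造的假设值):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=100)   # 真实参数约为 w=2, b=0.5

xd, yd = x - x.mean(), y - y.mean()                    # 去均值
w = (xd @ yd) / (xd @ xd)                              # 式(3.7)的向量化形式
b = y.mean() - w * x.mean()                            # 由式(3.6)等于0所得的 b

w_fit, b_fit = np.polyfit(x, y, deg=1)                 # numpy 的一次多项式拟合
assert np.allclose([w, b], [w_fit, b_fit])
```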
36 | ## 3.10
37 |
38 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}=2\mathbf{X}^T(\mathbf{X}\hat{\boldsymbol w}-\boldsymbol{y}) $$
39 |
40 | [推导]:将$ E_{\hat{\boldsymbol w}}=(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol w})^T(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol w}) $展开可得:
41 | $$ E_{\hat{\boldsymbol w}}= \boldsymbol{y}^T\boldsymbol{y}-\boldsymbol{y}^T\mathbf{X}\hat{\boldsymbol w}-\hat{\boldsymbol w}^T\mathbf{X}^T\boldsymbol{y}+\hat{\boldsymbol w}^T\mathbf{X}^T\mathbf{X}\hat{\boldsymbol w} $$
42 | 对$ \hat{\boldsymbol w} $求导可得:
43 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}= \cfrac{\partial \boldsymbol{y}^T\boldsymbol{y}}{\partial \hat{\boldsymbol w}}-\cfrac{\partial \boldsymbol{y}^T\mathbf{X}\hat{\boldsymbol w}}{\partial \hat{\boldsymbol w}}-\cfrac{\partial \hat{\boldsymbol w}^T\mathbf{X}^T\boldsymbol{y}}{\partial \hat{\boldsymbol w}}+\cfrac{\partial \hat{\boldsymbol w}^T\mathbf{X}^T\mathbf{X}\hat{\boldsymbol w}}{\partial \hat{\boldsymbol w}} $$
44 | 由向量的求导公式可得:
45 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}= 0-\mathbf{X}^T\boldsymbol{y}-\mathbf{X}^T\boldsymbol{y}+(\mathbf{X}^T\mathbf{X}+\mathbf{X}^T\mathbf{X})\hat{\boldsymbol w} $$
46 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}=2\mathbf{X}^T(\mathbf{X}\hat{\boldsymbol w}-\boldsymbol{y}) $$
47 |
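式(3.10)的梯度同样可以用数值微分检验。下面是一个最小示意(基于 numpy,数据为随机生成的假设值):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 8, 3
X = np.hstack([rng.normal(size=(m, d)), np.ones((m, 1))])  # 末列全1,对应 x̂
y = rng.normal(size=m)
w = rng.normal(size=d + 1)

E = lambda w: (y - X @ w) @ (y - X @ w)                    # E_ŵ
grad = 2 * X.T @ (X @ w - y)                               # 式(3.10)
eps = 1e-6
num = np.array([(E(w + eps * e) - E(w - eps * e)) / (2 * eps)
                for e in np.eye(d + 1)])
assert np.allclose(grad, num, atol=1e-4)
```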
48 | ## 3.27
49 |
50 | $$ \ell(\boldsymbol{\beta})=\sum_{i=1}^{m}(-y_i\boldsymbol{\beta}^T\hat{\boldsymbol x}_i+\ln(1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i})) $$
51 |
52 | [推导]:将式(3.26)代入式(3.25)可得:
53 | $$ \ell(\boldsymbol{\beta})=\sum_{i=1}^{m}\ln\left(y_ip_1(\hat{\boldsymbol x}_i;\boldsymbol{\beta})+(1-y_i)p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})\right) $$
54 | 其中$ p_1(\hat{\boldsymbol x}_i;\boldsymbol{\beta})=\cfrac{e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}}{1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}},p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})=\cfrac{1}{1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}} $,代入上式可得:
55 | $$\begin{aligned}
56 | \ell(\boldsymbol{\beta})&=\sum_{i=1}^{m}\ln\left(\cfrac{y_ie^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}+1-y_i}{1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}}\right) \\
57 | &=\sum_{i=1}^{m}\left(\ln(y_ie^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}+1-y_i)-\ln(1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i})\right)
58 | \end{aligned}$$
59 | 由于$ y_i $=0或1,则:
60 | $$ \ell(\boldsymbol{\beta}) =
61 | \begin{cases}
62 | \sum_{i=1}^{m}(-\ln(1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i})), & y_i=0 \\
63 | \sum_{i=1}^{m}(\boldsymbol{\beta}^T\hat{\boldsymbol x}_i-\ln(1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i})), & y_i=1
64 | \end{cases} $$
65 | 两式综合可得:
66 | $$ \ell(\boldsymbol{\beta})=\sum_{i=1}^{m}\left(y_i\boldsymbol{\beta}^T\hat{\boldsymbol x}_i-\ln(1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i})\right) $$
67 | 由于此式为极大似然估计的对数似然,最大化对数似然等价于最小化其相反数,也即在上式前添加负号即可得式(3.27)。
68 |
69 | 【注】:若式(3.26)中的似然项改写方式为$ p(y_i|\boldsymbol x_i;\boldsymbol w,b)=[p_1(\hat{\boldsymbol x}_i;\boldsymbol{\beta})]^{y_i}[p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})]^{1-y_i} $,再将其代入式(3.25)可得:
70 | $$\begin{aligned}
71 | \ell(\boldsymbol{\beta})&=\sum_{i=1}^{m}\ln\left([p_1(\hat{\boldsymbol x}_i;\boldsymbol{\beta})]^{y_i}[p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})]^{1-y_i}\right) \\
72 | &=\sum_{i=1}^{m}\left[y_i\ln\left(p_1(\hat{\boldsymbol x}_i;\boldsymbol{\beta})\right)+(1-y_i)\ln\left(p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})\right)\right] \\
73 | &=\sum_{i=1}^{m} \left \{ y_i\left[\ln\left(p_1(\hat{\boldsymbol x}_i;\boldsymbol{\beta})\right)-\ln\left(p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})\right)\right]+\ln\left(p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})\right)\right\} \\
74 | &=\sum_{i=1}^{m}\left[y_i\ln\left(\cfrac{p_1(\hat{\boldsymbol x}_i;\boldsymbol{\beta})}{p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})}\right)+\ln\left(p_0(\hat{\boldsymbol x}_i;\boldsymbol{\beta})\right)\right] \\
75 | &=\sum_{i=1}^{m}\left[y_i\ln\left(e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}\right)+\ln\left(\cfrac{1}{1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i}}\right)\right] \\
76 | &=\sum_{i=1}^{m}\left(y_i\boldsymbol{\beta}^T\hat{\boldsymbol x}_i-\ln(1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol x}_i})\right)
77 | \end{aligned}$$
78 | 显然,此种方式更易推导出式(3.27)。
79 |
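式(3.27)与"负对数似然"的等价性也可以数值验证。下面是一个最小示意(基于 numpy,数据为随机生成的假设值;为避免 $e^z$ 溢出,这里用 `np.logaddexp(0, z)` 计算 $\ln(1+e^{z})$):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 10, 3
Xh = np.hstack([rng.normal(size=(m, d)), np.ones((m, 1))])  # 每行一个 x̂_i
y = rng.integers(0, 2, size=m).astype(float)
beta = rng.normal(size=d + 1)

z = Xh @ beta
p1 = 1 / (1 + np.exp(-z))                                   # p1(x̂_i;β)
nll = -(y * np.log(p1) + (1 - y) * np.log(1 - p1)).sum()    # 负对数似然
l_327 = (-y * z + np.logaddexp(0, z)).sum()                 # 式(3.27)
assert np.isclose(nll, l_327)
```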
80 | ## 3.30
81 |
82 | $$\frac{\partial l(\beta)}{\partial \beta}=-\sum_{i=1}^{m}\hat{\boldsymbol x}_i(y_i-p_1(\hat{\boldsymbol x}_i;\beta))$$
83 |
84 | [解析]:此式可以进行向量化,令$p_1(\hat{\boldsymbol x}_i;\beta)=\hat{y}_i$,代入上式得:
85 | $$\begin{aligned}
86 | \frac{\partial l(\beta)}{\partial \beta} &= -\sum_{i=1}^{m}\hat{\boldsymbol x}_i(y_i-\hat{y}_i) \\
87 | & =\sum_{i=1}^{m}\hat{\boldsymbol x}_i(\hat{y}_i-y_i) \\
88 | & ={\boldsymbol X^T}(\hat{\boldsymbol y}-\boldsymbol{y}) \\
89 | & ={\boldsymbol X^T}(p_1(\boldsymbol X;\beta)-\boldsymbol{y}) \\
90 | \end{aligned}$$
91 |
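向量化前后的梯度一致性可以直接检验。下面是一个最小示意(基于 numpy,数据为随机生成的假设值):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 10, 4
Xh = rng.normal(size=(m, d))                     # 每行一个 x̂_i
y = rng.integers(0, 2, size=m).astype(float)
beta = rng.normal(size=d)

p1 = 1 / (1 + np.exp(-(Xh @ beta)))              # ŷ_i = p1(x̂_i;β)
grad_vec = Xh.T @ (p1 - y)                       # 向量化形式 X^T(p1 - y)
grad_loop = sum(Xh[i] * (p1[i] - y[i]) for i in range(m))
assert np.allclose(grad_vec, grad_loop)
```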
92 | ## 3.32
93 |
94 | $$J=\cfrac{\boldsymbol w^T(\mu_0-\mu_1)(\mu_0-\mu_1)^T\boldsymbol w}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w}$$
95 |
96 | [推导]:
97 | $$\begin{aligned}
98 | J &= \cfrac{\big|\big|\boldsymbol w^T\mu_0-\boldsymbol w^T\mu_1\big|\big|_2^2}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\
99 | &= \cfrac{\big|\big|(\boldsymbol w^T\mu_0-\boldsymbol w^T\mu_1)^T\big|\big|_2^2}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\
100 | &= \cfrac{\big|\big|(\mu_0-\mu_1)^T\boldsymbol w\big|\big|_2^2}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\
101 | &= \cfrac{[(\mu_0-\mu_1)^T\boldsymbol w]^T(\mu_0-\mu_1)^T\boldsymbol w}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\
102 | &= \cfrac{\boldsymbol w^T(\mu_0-\mu_1)(\mu_0-\mu_1)^T\boldsymbol w}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w}
103 | \end{aligned}$$
104 |
105 | ## 3.37
106 |
107 | $$\boldsymbol S_b\boldsymbol w=\lambda\boldsymbol S_w\boldsymbol w$$
108 |
109 | [推导]:由3.36可列拉格朗日函数:
110 | $$l(\boldsymbol w)=-\boldsymbol w^T\boldsymbol S_b\boldsymbol w+\lambda(\boldsymbol w^T\boldsymbol S_w\boldsymbol w-1)$$
111 | 对$\boldsymbol w$求偏导可得:
112 | $$\begin{aligned}
113 | \cfrac{\partial l(\boldsymbol w)}{\partial \boldsymbol w} &= -\cfrac{\partial(\boldsymbol w^T\boldsymbol S_b\boldsymbol w)}{\partial \boldsymbol w}+\lambda \cfrac{\partial(\boldsymbol w^T\boldsymbol S_w\boldsymbol w-1)}{\partial \boldsymbol w} \\
114 | &= -(\boldsymbol S_b+\boldsymbol S_b^T)\boldsymbol w+\lambda(\boldsymbol S_w+\boldsymbol S_w^T)\boldsymbol w
115 | \end{aligned}$$
116 | 又$\boldsymbol S_b=\boldsymbol S_b^T,\boldsymbol S_w=\boldsymbol S_w^T$,则:
117 | $$\cfrac{\partial l(\boldsymbol w)}{\partial \boldsymbol w} = -2\boldsymbol S_b\boldsymbol w+2\lambda\boldsymbol S_w\boldsymbol w$$
118 | 令导函数等于0即可得式3.37。
119 |
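式(3.37)是一个广义特征值问题,数值上可以用 `scipy.linalg.eigh` 直接求解。下面是一个最小示意(scipy 为本文假设的依赖,数据为随机构造的假设值):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
d = 4
mu0, mu1 = rng.normal(size=d), rng.normal(size=d)
A = rng.normal(size=(d, d))
Sw = A @ A.T + 1e-3 * np.eye(d)                 # 构造对称正定的 S_w
Sb = np.outer(mu0 - mu1, mu0 - mu1)             # S_b

lam, W = eigh(Sb, Sw)                           # 求解 S_b w = λ S_w w(特征值升序)
w = W[:, -1]                                    # 取最大 λ 对应的 w
assert np.allclose(Sb @ w, lam[-1] * Sw @ w)

# 与闭式解 w ∝ S_w^{-1}(μ0-μ1)(即式(3.39))方向一致
w_closed = np.linalg.solve(Sw, mu0 - mu1)
cos = abs(w @ w_closed) / (np.linalg.norm(w) * np.linalg.norm(w_closed))
assert np.isclose(cos, 1.0)
```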
120 | ## 3.43
121 |
122 | $$\begin{aligned}
123 | \boldsymbol S_b &= \boldsymbol S_t - \boldsymbol S_w \\
124 | &= \sum_{i=1}^N m_i(\boldsymbol\mu_i-\boldsymbol\mu)(\boldsymbol\mu_i-\boldsymbol\mu)^T
125 | \end{aligned}$$
126 | [推导]:由式3.40、3.41、3.42可得:
127 | $$\begin{aligned}
128 | \boldsymbol S_b &= \boldsymbol S_t - \boldsymbol S_w \\
129 | &= \sum_{i=1}^m(\boldsymbol x_i-\boldsymbol\mu)(\boldsymbol x_i-\boldsymbol\mu)^T-\sum_{i=1}^N\sum_{\boldsymbol x\in X_i}(\boldsymbol x-\boldsymbol\mu_i)(\boldsymbol x-\boldsymbol\mu_i)^T \\
130 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left((\boldsymbol x-\boldsymbol\mu)(\boldsymbol x-\boldsymbol\mu)^T-(\boldsymbol x-\boldsymbol\mu_i)(\boldsymbol x-\boldsymbol\mu_i)^T\right)\right) \\
131 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left((\boldsymbol x-\boldsymbol\mu)(\boldsymbol x^T-\boldsymbol\mu^T)-(\boldsymbol x-\boldsymbol\mu_i)(\boldsymbol x^T-\boldsymbol\mu_i^T)\right)\right) \\
132 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left(\boldsymbol x\boldsymbol x^T - \boldsymbol x\boldsymbol\mu^T-\boldsymbol\mu\boldsymbol x^T+\boldsymbol\mu\boldsymbol\mu^T-\boldsymbol x\boldsymbol x^T+\boldsymbol x\boldsymbol\mu_i^T+\boldsymbol\mu_i\boldsymbol x^T-\boldsymbol\mu_i\boldsymbol\mu_i^T\right)\right) \\
133 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left(- \boldsymbol x\boldsymbol\mu^T-\boldsymbol\mu\boldsymbol x^T+\boldsymbol\mu\boldsymbol\mu^T+\boldsymbol x\boldsymbol\mu_i^T+\boldsymbol\mu_i\boldsymbol x^T-\boldsymbol\mu_i\boldsymbol\mu_i^T\right)\right) \\
134 | &= \sum_{i=1}^N\left(-\sum_{\boldsymbol x\in X_i}\boldsymbol x\boldsymbol\mu^T-\sum_{\boldsymbol x\in X_i}\boldsymbol\mu\boldsymbol x^T+\sum_{\boldsymbol x\in X_i}\boldsymbol\mu\boldsymbol\mu^T+\sum_{\boldsymbol x\in X_i}\boldsymbol x\boldsymbol\mu_i^T+\sum_{\boldsymbol x\in X_i}\boldsymbol\mu_i\boldsymbol x^T-\sum_{\boldsymbol x\in X_i}\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
135 | &= \sum_{i=1}^N\left(-m_i\boldsymbol\mu_i\boldsymbol\mu^T-m_i\boldsymbol\mu\boldsymbol\mu_i^T+m_i\boldsymbol\mu\boldsymbol\mu^T+m_i\boldsymbol\mu_i\boldsymbol\mu_i^T+m_i\boldsymbol\mu_i\boldsymbol\mu_i^T-m_i\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
136 | &= \sum_{i=1}^N\left(-m_i\boldsymbol\mu_i\boldsymbol\mu^T-m_i\boldsymbol\mu\boldsymbol\mu_i^T+m_i\boldsymbol\mu\boldsymbol\mu^T+m_i\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
137 | &= \sum_{i=1}^Nm_i\left(-\boldsymbol\mu_i\boldsymbol\mu^T-\boldsymbol\mu\boldsymbol\mu_i^T+\boldsymbol\mu\boldsymbol\mu^T+\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
138 | &= \sum_{i=1}^N m_i(\boldsymbol\mu_i-\boldsymbol\mu)(\boldsymbol\mu_i-\boldsymbol\mu)^T
139 | \end{aligned}$$
140 |
141 | ## 3.44
142 | $$\max\limits_{\mathbf{W}}\cfrac{
143 | tr(\mathbf{W}^T\boldsymbol S_b \mathbf{W})}{tr(\mathbf{W}^T\boldsymbol S_w \mathbf{W})}$$
144 | [解析]:此式是式3.35的推广形式,证明如下:
145 | 设$\mathbf{W}=[\boldsymbol w_1,\boldsymbol w_2,...,\boldsymbol w_i,...,\boldsymbol w_{N-1}]$,其中$\boldsymbol w_i$为$d$行1列的列向量,则:
146 | $$\left\{
147 | \begin{aligned}
148 | tr(\mathbf{W}^T\boldsymbol S_b \mathbf{W})&=\sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_b \boldsymbol w_i \\
149 | tr(\mathbf{W}^T\boldsymbol S_w \mathbf{W})&=\sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_w \boldsymbol w_i
150 | \end{aligned}
151 | \right.$$
152 | 所以式3.44可变形为:
153 | $$\max\limits_{\mathbf{W}}\cfrac{
154 | \sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_b \boldsymbol w_i}{\sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_w \boldsymbol w_i}$$
155 | 对比式3.35易知上式即为式3.35的推广形式。
156 |
--------------------------------------------------------------------------------
/docs/chapter13/chapter13.md:
--------------------------------------------------------------------------------
1 | ## 13.1
2 |
3 | $$p(\boldsymbol{x})=\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)$$
4 | [解析]: 该式即为 9.4.3 节的式(9.29),式(9.29)中的$k$个混合成分对应于此处的$N$个可能的类别
5 |
6 | ## 13.2
7 | $$
8 | \begin{aligned} f(\boldsymbol{x}) &=\underset{j \in \mathcal{Y}}{\arg \max } p(y=j | \boldsymbol{x}) \\ &=\underset{j \in \mathcal{Y}}{\arg \max } \sum_{i=1}^{N} p(y=j, \Theta=i | \boldsymbol{x}) \\ &=\underset{j \in \mathcal{Y}}{\arg \max } \sum_{i=1}^{N} p(y=j | \Theta=i, \boldsymbol{x}) \cdot p(\Theta=i | \boldsymbol{x}) \end{aligned}
9 | $$
10 | [解析]:
11 | 首先,该式的变量 $\theta \in \{1,2,...,N\}$ 即为 9.4.3 节的式(9.30)中的 $z_j\in\{1,2,...,k\}$。
12 | 从公式第 1 行到第 2 行是做了边际化(marginalization);具体来说,第 2 行比第 1 行多了变量 $\theta$,为了消掉 $\theta$,对其所有取值求和,即 $\sum_{i=1}^N$(若是连续变量则为积分)。
13 | [推导]:从公式第 2 行到第 3 行推导如下
14 | $$\begin{aligned} p(y = j,\theta = i \vert x) &= \cfrac {p(y=j, \theta=i,x)} {p(x)} \\
15 | &=\cfrac{p(y=j ,\theta=i,x)}{p(\theta=i,x)}\cdot \cfrac{p(\theta=i,x)}{p(x)} \\
16 | &=p(y=j\vert \theta=i,x)\cdot p(\theta=i\vert x)\end{aligned}$$
17 | [解析]:
18 | 其中$p(y=j\vert x)$表示$x$的类别$y$为第$j$个类别标记的后验概率(注意条件是已知$x$);
19 | $p(y=j,\theta=i\vert x)$表示$x$的类别$y$为第$j$个类别标记且由第$i$个高斯混合成分生成的后验概率(注意条件是已知$x$ );
20 | $p(y=j\vert\theta=i,x)$表示已知 $x$ 由第 $i$ 个高斯混合成分生成时,其类别 $y$ 为第 $j$ 个类别标记的概率(注意条件是已知 $\theta$ 和 $x$;这里修改了西瓜书式(13.3)下方对 $p(y=j\vert\theta=i,x)$ 的表述);
21 | $p(\theta=i \vert x)$表示$x$由第$i$个高斯混合成分生成的后验概率(注意条件是已知$x$);
22 | 西瓜书第 296 页第 2 行提到“假设样本由高斯混合模型生成,且每个类别对应一个高斯混合成分”,也就是说,如果已知$x$是由哪个高斯混合成分生成的,也就知道了其类别。而$p(y=j\vert \theta=i,x)$表示已知$\theta$和$x$ 的条件概率(其实已知$\theta$就足够,不需$x$的信息),因此
23 | $$p(y=j\vert \theta=i,x)=
24 | \begin{cases}
25 | 1,&i=j \\
26 | 0,&i\not=j
27 | \end{cases}$$
28 | ## 13.3
29 | $$
30 | p(\Theta=i | \boldsymbol{x})=\frac{\alpha_{i} \cdot p\left(\boldsymbol{x} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)}{\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)}
31 | $$
32 | [解析]:该式即为 9.4.3 节的式(9.30),具体推导参见有关式(9.30)的解释。
33 | ## 13.4
34 | $$
35 | \begin{aligned} L L\left(D_{l} \cup D_{u}\right)=& \sum_{\left(x_{j}, y_{j}\right) \in D_{l}} \ln \left(\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right) \cdot p\left(y_{j} | \Theta=i, \boldsymbol{x}_{j}\right)\right) \\ &+\sum_{x_{j} \in D_{u}} \ln \left(\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)\right) \end{aligned}
36 | $$
37 | [解析]:由式(13.2)中对概率 $p(y=j\vert\theta =i,x)$ 的分析,式中第 1 项中的$p(y_j\vert\theta =i,x_j)$ 为
38 | $$p(y_j\vert \theta=i,x_j)=
39 | \begin{cases}
40 | 1,&y_j=i \\
41 | 0,&y_j\not=i
42 | \end{cases}$$
43 | 该式第 1 项是针对有标记样本$(x_j,y_j) \in D_l$来说的,因为有标记样本的类别是确定的,因此在计算它的对数似然时,它只可能来自$N$个高斯混合成分中的一个(西瓜书第 296 页第 2 行提到“假设样本由高斯混合模型生成,且每个类别对应一个高斯混合成分”),所以第 1 项在计算有标记样本的似然时乘以了$p(y_j\vert\theta =i,x_j)$;
44 | 该式第 2 项是针对未标记样本$x_j\in D_u$来说的,因为未标记样本的类别不确定,即它可能来自$N$个高斯混合成分中的任何一个,所以第 2 项使用了式(13.1)。
45 | ## 13.5
46 | $$
47 | \gamma_{j i}=\frac{\alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)}{\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)}
48 | $$
49 | [解析]:该式与式(13.3)相同,即后验概率。 可通过有标记数据对模型参数$(\alpha_i,\mu_i,\Sigma_i)$进行初始化,具体来说:
50 | $$\alpha_i = \cfrac{l_i}{|D_l|}$$,其中$|D_l| = \sum_{i=1}^N l_i$
51 | $$\mu_i = \cfrac{1}{l_i}\sum_{(x_j,y_j) \in D_l\wedge y_j=i}x_j$$
52 | $$
53 | \Sigma_{i}=\frac{1}{l_{i}} \sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left( x_{j}- \mu_{i}\right)\left( x_{j}-\mu_{i}\right)^{\top}
54 | $$
55 | 其中$l_i$表示第$i$类样本的有标记样本数目,$|D_l|$为有标记样本集样本总数,$\wedge$为“逻辑与”。
56 | ## 13.6
57 | $$
58 | \boldsymbol{\mu}_{i}=\frac{1}{\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}}\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \boldsymbol{x}_{j}+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \boldsymbol{x}_{j}\right)
59 | $$
60 | [推导]:类似于式(9.34),该式由$\cfrac{\partial LL(D_l \cup D_u) }{\partial \mu_i}=0$而得。将式(13.4)的两项分别记为:
61 | $$LL(D_l)=\sum_{(x_j,y_j) \in D_l}ln\Big(\sum_{s=1}^{N}\alpha_s \cdot p(x_j \vert \mu_s,\Sigma_s) \cdot p(y_j|\theta = s,x_j)\Big)$$
62 | $$LL(D_u)=\sum_{x_j \in D_u} ln(\sum_{s=1}^N \alpha_s \cdot p(x_j | \mu_s,\Sigma_s))$$
63 | 对于式(13.4)中的第 1 项$LL(D_l)$,由于$p(y_j\vert \theta=i,x_j)$取值非1即0(详见13.2,13.4分析),因此
64 | $$LL(D_l)=\sum_{(x_j,y_j)\in D_l} ln(\alpha_{y_j} \cdot p(x_j|\mu_{y_j}, \Sigma_{y_j}))$$
65 | 若求$LL(D_l)$对$\mu_i$的偏导,则$LL(D_l)$求和号中只有$y_j=i$ 的项能留下来,即
66 |
67 | $$\begin{aligned}
68 | \cfrac{\partial LL(D_l) }{\partial \mu_i} &=
69 | \sum_{(x_j,y_j)\in D_l \wedge y_j=i} \cfrac{\partial ln(\alpha_i \cdot p(x_j| \mu_i,\Sigma_i))}{\partial\mu_i}\\
70 | &=\sum_{(x_j,y_j)\in D_l \wedge y_j=i}\cfrac{1}{p(x_j|\mu_i,\Sigma_i) }\cdot \cfrac{\partial p(x_j|\mu_i,\Sigma_i)}{\partial\mu_i}\\
71 | &=\sum_{(x_j,y_j)\in D_l \wedge y_j=i}\cfrac{1}{p(x_j|\mu_i,\Sigma_i) }\cdot p(x_j|\mu_i,\Sigma_i) \cdot \Sigma_i^{-1}(x_j-\mu_i)\\
72 | &=\sum_{(x_j,y_j)\in D_l \wedge y_j=i} \Sigma_i^{-1}(x_j-\mu_i)
73 | \end{aligned}$$
74 |
75 | 对于式(13.4)中的第 2 项$LL(D_u)$,求导结果与式(9.33)的推导过程一样:
76 | $$\cfrac{\partial LL(D_u) }{\partial \mu_i}=\sum_{x_j \in {D_u}} \cfrac{\alpha_i}{\sum_{s=1}^N \alpha_s \cdot p(x_j|\mu_s,\Sigma_s)} \cdot p(x_j|\mu_i,\Sigma_i )\cdot \Sigma_i^{-1}(x_j-\mu_i)$$
77 | $$=\sum_{x_j \in D_u }\gamma_{ji} \cdot \Sigma_i^{-1}(x_j-\mu_i)$$
78 | 综合两项结果,则$\cfrac{\partial LL(D_l \cup D_u) }{\partial \mu_i}$为
79 | $$
80 | \begin{aligned} \frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \mu_{i}} &=\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \Sigma_{i}^{-1}\left(x_{j}-\mu_{i}\right)+\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot \Sigma_{i}^{-1}\left(x_{j}-\mu_{i}\right) \\ &=\Sigma_{i}^{-1}\left(\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(x_{j}-\mu_{i}\right)+\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot\left(x_{j}-\mu_{i}\right)\right) \\ &=\Sigma_{i}^{-1}\left(\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} x_{j}+\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot x_{j}-\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \mu_{i}-\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot \mu_{i}\right) \end{aligned}
81 | $$
82 | 令$\frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \boldsymbol{\mu}_{i}}=0$,两边同时左乘$\Sigma_i$可将$\Sigma_i^{-1}$消掉,移项即得
83 | $$
84 | \sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot \mu_{i}+\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \mu_{i}=\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot x_{j}+\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} x_{j}
85 | $$
86 | 上式中,$\mu_i$ 可以作为常量提到求和号外面,而$\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} 1=l_{i}$,即第 $i$ 类样本的有标记样本数目,因此
87 | $$
88 | \left(\sum_{x_{j} \in D_{u}} \gamma_{j i}+\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} 1\right) \mu_{i}=\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot x_{j}+\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} x_{j}
89 | $$
90 | 即得式(13.6);
91 | ## 13.7
92 | $$
93 | \begin{aligned} \boldsymbol{\Sigma}_{i}=& \frac{1}{\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}}\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}}\right.\\+& \sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}} ) \end{aligned}
94 | $$
95 | [推导]:类似于式(13.6),该式由$\cfrac{\partial LL(D_l \cup D_u) }{\partial \Sigma_i}=0$而得。
96 | 对于式(13.4)中的第 1 项$LL(D_l)$ ,类似于刚才式(13.6)的推导过程;
97 | $$
98 | \begin{aligned} \frac{\partial L L\left(D_{l}\right)}{\partial \boldsymbol{\Sigma}_{i}} &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \frac{\partial \ln \left(\alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)\right)}{\partial \boldsymbol{\Sigma}_{i}} \\ &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \frac{1}{p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)} \cdot \frac{\partial p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)}{\partial \boldsymbol{\Sigma}_{i}} \\
99 | &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \frac{1}{p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right) \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1}\\
100 | &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\mathbf{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1}
101 | \end{aligned}
102 | $$
103 | 对于式(13.4)中的第 2 项$LL(D_u)$ ,求导结果与式(9.35)的推导过程一样;
104 | $$
105 | \frac{\partial L L\left(D_{u}\right)}{\partial \boldsymbol{\Sigma}_{i}}=\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1}
106 | $$
107 | 综合两项结果,则$\cfrac{\partial LL(D_l \cup D_u) }{\partial \Sigma_i}$为
108 | $$\begin{aligned} \frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \boldsymbol{\Sigma}_{i}}=& \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1} \\ &+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1} \\
109 | &=\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right)\right.\\ &+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) ) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1}
110 | \end{aligned}
111 | $$
112 | 令$\frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \boldsymbol{\Sigma}_{i}}=0$,两边同时右乘$2\Sigma_i$可将 $\cfrac{1}{2}\Sigma_i^{-1}$消掉,移项即得
113 | $$
114 | \begin{aligned} \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}+& \sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top} \\=& \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot \boldsymbol{I}+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \boldsymbol{I} \\ &=\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}\right) \boldsymbol{I} \end{aligned}
115 | $$
116 | 两边同时左乘以$\Sigma_i$,上式变为
117 | $$
118 | \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}=\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}\right) \boldsymbol{\Sigma}_{i}
119 | $$
120 | 即得式(13.7);
121 | ## 13.8
122 | $$
123 | \alpha_{i}=\frac{1}{m}\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}\right)
124 | $$
125 | [推导]:类似于式(9.36),写出$LL(D_l \cup D_u)$的拉格朗日形式
126 | $$\begin{aligned}
127 | \mathcal{L}(D_l \cup D_u,\lambda) &= LL(D_l \cup D_u)+\lambda(\sum_{s=1}^N \alpha_s -1)\\
128 | & =LL(D_l)+LL(D_u)+\lambda(\sum_{s=1}^N \alpha_s - 1)\\
129 | \end{aligned}$$
130 | 类似于式(9.37),对$\alpha_i$求偏导。对于$LL(D_u)$,求导结果与式(9.37)的推导过程一样:
131 | $$\cfrac{\partial LL(D_u)}{\partial\alpha_i} = \sum_{x_j \in D_u} \cfrac{1}{\sum_{s=1}^N \alpha_s \cdot p(x_j|\mu_s,\Sigma_s)} \cdot p(x_j|\mu_i,\Sigma_i)$$
132 | 对于$LL(D_l)$,类似于式(13.6)和式(13.7)的推导过程:
133 | $$\begin{aligned}
134 | \cfrac{\partial LL(D_l)}{\partial\alpha_i} &= \sum_{(x_j,y_j)\in D_l \wedge y_j=i} \cfrac{\partial ln(\alpha_i \cdot p(x_j| \mu_i,\Sigma_i))}{\partial\alpha_i}\\
135 | &=\sum_{(x_j,y_j)\in D_l \wedge y_j=i}\cfrac{1}{ \alpha_i \cdot p(x_j|\mu_i,\Sigma_i) }\cdot \cfrac{\partial (\alpha_i \cdot p(x_j|\mu_i,\Sigma_i))}{\partial \alpha_i}\\
136 | &=\sum_{(x_j,y_j)\in D_l \wedge y_j=i}\cfrac{1}{\alpha_i \cdot p(x_j|\mu_i,\Sigma_i) }\cdot p(x_j|\mu_i,\Sigma_i) \\
137 | &=\cfrac{1}{\alpha_i} \cdot \sum_{(x_j,y_j)\in D_l \wedge y_j=i} 1 \\
138 | &=\cfrac{l_i}{\alpha_i}
139 | \end{aligned}$$
140 | 上式推导过程中,重点注意变量是$\alpha_i$,$p(x_j|\mu_i,\Sigma_i)$是常量;最后一行中$\cfrac{1}{\alpha_i}$相对于求和变量为常量,因此作为公因子提到求和号外面;$l_i$为第$i$类样本的有标记样本数目。
141 | 综合两项结果,再加上拉格朗日乘子项对$\alpha_i$的偏导$\lambda$,则$\cfrac{\partial \mathcal{L}(D_l \cup D_u,\lambda) }{\partial \alpha_i}$为
142 | $$\cfrac{\partial \mathcal{L}(D_l \cup D_u,\lambda) }{\partial \alpha_i} = \cfrac{l_i}{\alpha_i} + \sum_{x_j \in D_u} \cfrac{p(x_j|\mu_i,\Sigma_i)}{\sum_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}+\lambda$$
143 | 令$\cfrac{\partial \mathcal{L}(D_l \cup D_u,\lambda) }{\partial \alpha_i}=0$并且两边同乘以$\alpha_i$,得
144 | $$ \alpha_i \cdot \cfrac{l_i}{\alpha_i} + \sum_{x_j \in D_u} \cfrac{\alpha_i \cdot p(x_j|\mu_i,\Sigma_i)}{\sum_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}+\lambda \cdot \alpha_i=0$$
145 | 结合式(9.30)发现,求和号内即为后验概率$\gamma_{ji}$,即
146 | $$l_i+\sum_{x_j \in D_u} \gamma_{ji}+\lambda \alpha_i = 0$$
147 | 对所有混合成分求和,得
148 | $$\sum_{i=1}^N l_i+\sum_{i=1}^N \sum_{x_j \in D_u} \gamma_{ji}+\sum_{i=1}^N \lambda \alpha_i = 0$$
149 | 这里$\sum_{i=1}^N \alpha_i =1$,因此$\sum_{i=1}^N \lambda \alpha_i=\lambda\sum_{i=1}^N \alpha_i=\lambda$
150 | 根据(9.30)中$\gamma_{ji}$表达式可知
151 | $$\sum_{i=1}^N \gamma_{ji} = \sum_{i =1}^{N} \cfrac{\alpha_i \cdot p(x_j|\mu_i,\Sigma_i)}{\sum_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}= \cfrac{\sum_{i =1}^{N}\alpha_i \cdot p(x_j|\mu_i,\Sigma_i)}{\sum_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}=1$$
152 | 再结合加法满足交换律,所以
153 | $$\sum_{i=1}^N \sum_{x_j \in D_u} \gamma_{ji}=\sum_{x_j \in D_u} \sum_{i=1}^N \gamma_{ji} =\sum_{x_j \in D_u} 1=u$$
154 | 以上分析过程中,$\sum_{x_j\in D_u}$ 形式与$\sum_{j=1}^u$等价,其中$u$为未标记样本集的样本个数;$\sum_{i=1}^N l_i=l$,其中$l$为有标记样本集的样本个数。将这些结果代入
155 | $$\sum_{i=1}^N l_i+\sum_{i=1}^N \sum_{x_j \in D_u} \gamma_{ji}+\sum_{i=1}^N \lambda \alpha_i = 0$$
156 | 可得$l+u+\lambda = 0$;又$l+u =m$,其中$m$为样本总个数,因此$\lambda = -m$。
157 | 最后代入整理解得
158 | $$l_i + \sum_{x_j \in D_u} \gamma_{ji}-m \alpha_i = 0$$
159 | 整理即得式(13.8);
160 |
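式(13.5)~(13.8)合起来就是半监督高斯混合模型的一次 EM 迭代。下面给出一个最小示意实现(基于 numpy/scipy,均为本文假设的依赖;函数名 `em_step` 为假设命名;假设 $N$ 个混合成分与 $N$ 个类别一一对应,类别标记取 0 到 N-1;`Xl`、`yl`、`Xu`、`alpha`、`mu`、`Sigma` 分别为形状 (l,d)、(l,)、(u,d)、(N,)、(N,d)、(N,d,d) 的 numpy 数组):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(Xl, yl, Xu, alpha, mu, Sigma):
    """对参数 (alpha, mu, Sigma) 做一次 E 步 + M 步更新。"""
    N = len(alpha)
    m = len(Xl) + len(Xu)
    # E 步:式(13.5),未标记样本的后验概率 γ_{ji}
    dens = np.column_stack([
        alpha[i] * multivariate_normal.pdf(Xu, mean=mu[i], cov=Sigma[i])
        for i in range(N)])
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M 步:式(13.6)~(13.8)
    for i in range(N):
        Xi = Xl[yl == i]                          # 第 i 类的有标记样本
        li, gi = len(Xi), gamma[:, i].sum()
        mu[i] = (gamma[:, i] @ Xu + Xi.sum(axis=0)) / (gi + li)        # 式(13.6)
        du, dl = Xu - mu[i], Xi - mu[i]
        Sigma[i] = ((gamma[:, i] * du.T) @ du + dl.T @ dl) / (gi + li)  # 式(13.7)
        alpha[i] = (gi + li) / m                                        # 式(13.8)
    return alpha, mu, Sigma
```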
161 |
162 |
--------------------------------------------------------------------------------
/docs/chapter9/chapter9.md:
--------------------------------------------------------------------------------
1 | ## 9.5
2 |
3 | $$
4 | JC=\frac{a}{a+b+c}
5 | $$
6 |
7 | [解析]:给定两个集合$A$和$B$,则Jaccard系数定义为如下公式
8 |
9 |
10 | $$
11 | JC=\frac{|A\bigcap B|}{|A\bigcup B|}=\frac{|A\bigcap B|}{|A|+|B|-|A\bigcap B|}
12 | $$
13 | Jaccard系数可以用来描述两个集合的相似程度。
14 |
15 | 推论:假设全集$U$共有$n$个元素,且$A\subseteq U$,$B\subseteq U$,则每一个元素的位置共有四种情况:
16 |
17 | 1、元素同时在集合$A$和$B$中,这样的元素个数记为$M_{11}$;
18 |
19 | 2、元素出现在集合$A$中,但没有出现在集合$B$中,这样的元素个数记为$M_{10}$;
20 |
21 | 3、元素没有出现在集合$A$中,但出现在集合$B$中,这样的元素个数记为$M_{01}$;
22 |
23 | 4、元素既没有出现在集合$A$中,也没有出现在集合$B$中,这样的元素个数记为$M_{00}$。
24 |
25 | 根据Jaccard系数的定义,此时的Jaccard系数为如下公式
26 | $$
27 | JC=\frac{M_{11}}{M_{11}+M_{10}+M_{01}}
28 | $$
29 | 由于聚类属于无监督学习,事先并不知道聚类后样本所属类别的类别标记所代表的意义,即便参考模型的类别标记意义是已知的,我们也无法知道聚类后的类别标记与参考模型的类别标记是如何对应的,况且聚类后的类别总数与参考模型的类别总数还可能不一样,因此只用单个样本无法衡量聚类性能的好坏。
30 |
31 | 由于外部指标的基本思想就是以参考模型的类别划分为参照,因此如果某一个样本对中的两个样本在聚类结果中同属于一个类,在参考模型中也同属于一个类,或者这两个样本在聚类结果中不同属于一个类,在参考模型中也不同属于一个类,那么对于这两个样本来说这是一个好的聚类结果。
32 |
33 | 总的来说所有样本对中的两个样本共存在四种情况:
34 | 1、样本对中的两个样本在聚类结果中属于同一个类,在参考模型中也属于同一个类;
35 | 2、样本对中的两个样本在聚类结果中属于同一个类,在参考模型中不属于同一个类;
36 | 3、样本对中的两个样本在聚类结果中不属于同一个类,在参考模型中属于同一个类;
37 | 4、样本对中的两个样本在聚类结果中不属于同一个类,在参考模型中也不属于同一个类。
38 |
39 | 综上所述,即所有样本对存在着书中公式(9.1)-(9.4)的四种情况,现在假设集合$A$中存放着两个样本都同属于聚类结果的同一个类的样本对,即$A=SS\bigcup SD$,集合$B$中存放着两个样本都同属于参考模型的同一个类的样本对,即$B=SS\bigcup DS$,那么根据Jaccard系数的定义有:
40 | $$
41 | JC=\frac{|A\bigcap B|}{|A\bigcup B|}=\frac{|SS|}{|SS\bigcup SD\bigcup DS|}=\frac{a}{a+b+c}
42 | $$
43 | 也可直接将书中公式(9.1)-(9.4)的四种情况类比推论,即$M_{11}=a$,$M_{10}=b$,$M_{01}=c$,所以
44 | $$
45 | JC=\frac{M_{11}}{M_{11}+M_{10}+M_{01}}=\frac{a}{a+b+c}
46 | $$
47 |
48 | ## 9.6
49 | $$
50 | FMI=\sqrt{\frac{a}{a+b}\cdot \frac{a}{a+c}}
51 | $$
52 |
53 | [解析]:其中$\frac{a}{a+b}$和$\frac{a}{a+c}$为Wallace提出的两个非对称指标,$a$代表两个样本在聚类结果和参考模型中均属于同一类的样本对的个数,$a+b$代表两个样本在聚类结果中属于同一类的样本对的个数,$a+c$代表两个样本在参考模型中属于同一类的样本对的个数,这两个非对称指标均可理解为样本对中的两个样本在聚类结果和参考模型中均属于同一类的概率。由于指标的非对称性,这两个概率值往往不一样,因此Fowlkes和Mallows提出利用几何平均数将这两个非对称指标转化为一个对称指标,即Fowlkes and Mallows Index, FMI。
54 |
55 | ## 9.7
56 | $$
57 | RI=\frac{2(a+d)}{m(m-1)}
58 | $$
59 | [解析]:Rand Index定义如下:
60 | $$
61 | RI=\frac{a+d}{a+b+c+d}=\frac{a+d}{m(m-1)/2}=\frac{2(a+d)}{m(m-1)}
62 | $$
63 | 即可以理解为两个样本都属于聚类结果和参考模型中的同一类的样本对的个数与两个样本都分别不属于聚类结果和参考模型中的同一类的样本对的个数的总和在所有样本对中出现的频率,可以简单理解为聚类结果与参考模型的一致性。
64 |
65 | 参看 https://en.wikipedia.org/wiki/Rand_index
66 |
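以上三个外部指标都只依赖于按式(9.1)~(9.4)统计出的 $a,b,c,d$。下面给出一个最小示意实现(Python,函数名为本文假设的命名):

```python
from itertools import combinations

def external_indices(pred, ref):
    """pred 为聚类结果的簇标记,ref 为参考模型的类别标记,逐样本对统计 a,b,c,d。"""
    a = b = c = d = 0
    for i, j in combinations(range(len(ref)), 2):
        same_pred = pred[i] == pred[j]
        same_ref = ref[i] == ref[j]
        if same_pred and same_ref:
            a += 1                      # 式(9.1):SS
        elif same_pred:
            b += 1                      # 式(9.2):SD
        elif same_ref:
            c += 1                      # 式(9.3):DS
        else:
            d += 1                      # 式(9.4):DD
    m = len(ref)
    JC = a / (a + b + c)                            # 式(9.5)
    FMI = ((a / (a + b)) * (a / (a + c))) ** 0.5    # 式(9.6)
    RI = 2 * (a + d) / (m * (m - 1))                # 式(9.7)
    return JC, FMI, RI

print(external_indices([0, 0, 1, 1, 2], [0, 0, 1, 2, 2]))
```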
67 | ## 9.33
68 |
69 | $$
70 | \sum_{j=1}^m \frac{\alpha_{i}\cdot p\left(\boldsymbol{x_{j}}|\boldsymbol\mu _{i},\boldsymbol\Sigma_{i}\right)}{\sum_{l=1}^k \alpha_{l}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{l},\boldsymbol\Sigma_{l})}(\boldsymbol{x_{j}-\mu_{i}})=0
71 | $$
72 |
73 | [推导]:根据公式(9.28)可知:
74 | $$
75 | p(\boldsymbol{x_{j}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i}})=\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}
76 | $$
77 |
78 |
79 | 又根据公式(9.32),由
80 | $$
81 | \frac {\partial LL(D)}{\partial \boldsymbol\mu_{i}}=0
82 | $$
83 | 可得
84 | $$\begin{aligned}
85 | \frac {\partial LL(D)}{\partial\boldsymbol\mu_{i}}&=\frac {\partial}{\partial \boldsymbol\mu_{i}}\sum_{j=1}^mln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\
86 | &=\sum_{j=1}^m\frac{\partial}{\partial\boldsymbol\mu_{i}}ln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\
87 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol{\mu_{i}}}(p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}}))}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})} \\
88 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot \frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}\frac{\partial}{\partial \boldsymbol\mu_{i}}\left(-\frac{1}{2}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
89 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(\left(\boldsymbol\Sigma_{i}^{-1}+\left(\boldsymbol\Sigma_{i}^{-1}\right)^T\right)\cdot\left(\boldsymbol{x_{j}-\mu_{i}}\right)\cdot(-1)\right) \\
90 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)-\left(\boldsymbol\Sigma_{i}^{-1}\right)^T\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
91 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)-\left(\boldsymbol\Sigma_{i}^T\right)^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
92 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
93 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-2\cdot\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
94 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot p\left(\boldsymbol{x_{j}}|\boldsymbol\mu _{i},\boldsymbol\Sigma_{i}\right)}{\sum_{l=1}^k \alpha_{l}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{l},\boldsymbol\Sigma_{l})}\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}}) \\
95 | \end{aligned}$$
96 | Setting the above expression to zero gives
97 | $$\frac {\partial LL(D)}{\partial \boldsymbol\mu_{i}}=\sum_{j=1}^m \frac{\alpha_{i}\cdot p\left(\boldsymbol{x_{j}}|\boldsymbol\mu _{i},\boldsymbol\Sigma_{i}\right)}{\sum_{l=1}^k \alpha_{l}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{l},\boldsymbol\Sigma_{l})}\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})=0 $$
98 | Multiplying both sides on the left by $\boldsymbol\Sigma_{i}$ gives
99 | $$\sum_{j=1}^m \frac{\alpha_{i}\cdot p\left(\boldsymbol{x_{j}}|\boldsymbol\mu _{i},\boldsymbol\Sigma_{i}\right)}{\sum_{l=1}^k \alpha_{l}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{l},\boldsymbol\Sigma_{l})}(\boldsymbol{x_{j}-\mu_{i}})=0 $$
100 |
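The fraction multiplying $(\boldsymbol{x_{j}-\mu_{i}})$ is exactly the responsibility $\gamma_{ji}$ of equation (9.30), and solving for $\boldsymbol\mu_{i}$ gives the weighted-mean update of equation (9.34). A minimal NumPy sketch of these two steps (the toy data and all variable names are hypothetical):

```python
import numpy as np
from scipy.stats import multivariate_normal

# hypothetical toy data and current parameters of a k = 2 component GMM
X = np.array([[0.0, 0.1], [0.2, -0.1], [3.0, 3.1], [2.9, 2.8]])
alphas = np.array([0.5, 0.5])
mus = np.array([[0.0, 0.0], [3.0, 3.0]])
sigmas = np.array([np.eye(2), np.eye(2)])

# E-step: gamma[j, i] = alpha_i * p(x_j | mu_i, Sigma_i) / sum_l alpha_l * p(x_j | mu_l, Sigma_l)
dens = np.column_stack([alphas[i] * multivariate_normal.pdf(X, mus[i], sigmas[i])
                        for i in range(2)])
gamma = dens / dens.sum(axis=1, keepdims=True)

# M-step for the means, equation (9.34): responsibility-weighted average of the samples
new_mus = (gamma.T @ X) / gamma.sum(axis=0)[:, None]
print(new_mus)
```
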
101 | ## 9.35
102 |
103 | $$
104 | \boldsymbol\Sigma_{i}=\frac{\sum_{j=1}^m\gamma_{ji}(\boldsymbol{x_{j}-\mu_{i}})(\boldsymbol{x_{j}-\mu_{i}})^T}{\sum_{j=1}^m\gamma_{ji}}
105 | $$
106 |
107 | [Derivation]: From equation (9.28) we know
108 | $$
109 | p(\boldsymbol{x_{j}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i}})=\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}
110 | $$
111 | Further, by equation (9.32), from
112 | $$
113 | \frac {\partial LL(D)}{\partial \boldsymbol\Sigma_{i}}=0
114 | $$
115 | we can compute
116 | $$\begin{aligned}
117 | \frac {\partial LL(D)}{\partial\boldsymbol\Sigma_{i}}&=\frac {\partial}{\partial \boldsymbol\Sigma_{i}}\sum_{j=1}^mln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\
118 | &=\sum_{j=1}^m\frac{\partial}{\partial\boldsymbol\Sigma_{i}}ln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\
119 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})} \\
120 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\\
121 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}e^{ln\left(\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}\right)}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})} \\
122 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}e^{-\frac{n}{2}ln\left(2\pi\right)-\frac{1}{2}ln\left(|\boldsymbol\Sigma_{i}|\right)-\frac{1}{2}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})} \\
123 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(-\frac{n}{2}ln\left(2\pi\right)-\frac{1}{2}ln\left(|\boldsymbol\Sigma_{i}|\right)-\frac{1}{2}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
124 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\left(-\frac{1}{2}\left(\boldsymbol\Sigma_{i}^{-1}\right)^T-\frac{1}{2}\frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right)
125 | \end{aligned}$$
126 |
127 | To evaluate
128 | $$
129 | \frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)
130 | $$
131 |
132 | first consider the derivative with respect to a single element of $\boldsymbol\Sigma_{i}$. Let $r$ denote the row index and $c$ the column index of the matrix $\boldsymbol\Sigma_{i}$; then
133 | $$\begin{aligned}
134 | \frac{\partial}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)&=\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\frac{\partial\boldsymbol\Sigma_{i}^{-1}}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right) \\
135 | &=-\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)
136 | \end{aligned}$$
137 | Let $B=\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)$; then
138 | $$\begin{aligned}
139 | B^T&=\left(\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right)^T \\
140 | &=\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\left(\boldsymbol\Sigma_{i}^{-1}\right)^T \\
141 | &=\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}
142 | \end{aligned}$$
143 | Therefore
144 | $$\begin{aligned}
145 | \frac{\partial}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)=-B^T\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}B\end{aligned}$$
146 | Here $B$ is an $n\times1$ matrix, and $\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}$ is an $n\times n$ matrix whose entry at position $\left(r,c\right)$ is $1$ while all other entries are $0$, so
147 | $$\begin{aligned}
148 | \frac{\partial}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)=-B^T\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}B=-B_{r}\cdot B_{c}=-\left(B\cdot B^T\right)_{rc}=\left(-B\cdot B^T\right)_{rc}\end{aligned}$$
149 | That is, the derivative with respect to the element of $\boldsymbol\Sigma_{i}$ at any given position equals the entry of $\left(-B\cdot B^T\right)$ at the same position, so
150 | $$\begin{aligned}
151 | \frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)&=-B\cdot B^T\\
152 | &=-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\left(\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right)^T\\
153 | &=-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}
154 | \end{aligned}$$
155 |
156 | The final result is therefore
157 | $$
158 | \frac {\partial LL(D)}{\partial \boldsymbol\Sigma_{i}}=\sum_{j=1}^m \frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\left( -\frac{1}{2}\left(\boldsymbol\Sigma_{i}^{-1}-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\right)\right)=0
159 | $$
160 |
161 | Rearranging gives
162 | $$
163 | \boldsymbol\Sigma_{i}=\frac{\sum_{j=1}^m\gamma_{ji}(\boldsymbol{x_{j}-\mu_{i}})(\boldsymbol{x_{j}-\mu_{i}})^T}{\sum_{j=1}^m\gamma_{ji}}
164 | $$
165 |
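Given the responsibilities $\gamma_{ji}$ (see the E-step sketch after 9.33), this update is a responsibility-weighted outer product. A minimal NumPy sketch (function and variable names are hypothetical):

```python
import numpy as np

def update_sigmas(X, gamma, mus):
    """M-step for the covariances, equation (9.35).

    X: (m, n) samples; gamma: (m, k) responsibilities; mus: (k, n) updated means."""
    k = gamma.shape[1]
    sigmas = []
    for i in range(k):
        diff = X - mus[i]                               # deviations x_j - mu_i, shape (m, n)
        weighted = (gamma[:, i, None] * diff).T @ diff  # sum_j gamma_ji (x_j - mu_i)(x_j - mu_i)^T
        sigmas.append(weighted / gamma[:, i].sum())
    return np.array(sigmas)
```
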
166 | ## 9.38
167 |
168 | $$
169 | \alpha_{i}=\frac{1}{m}\sum_{j=1}^m\gamma_{ji}
170 | $$
171 |
172 | [Derivation]: Start from equation (9.37):
173 | $$
174 | \sum_{j=1}^m\frac{p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\lambda=0
175 | $$
176 | Multiplying both sides by $\alpha_{i}$:
177 | $$
178 | \Rightarrow\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\alpha_{i}\lambda=0
179 | $$
180 |
181 | Summing over all mixture components:
182 | $$
183 | \Rightarrow\sum_{i=1}^k\left(\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\alpha_{i}\lambda\right)=0
184 | $$
185 |
186 | $$
187 | \Rightarrow\sum_{i=1}^k\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\sum_{i=1}^k\alpha_{i}\lambda=0
188 | $$
189 |
190 | $$
191 | \Rightarrow\lambda=-\sum_{i=1}^k\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}=-m
192 | $$
193 |
194 | where the double sum equals $m$ because the inner sum over $i$ is exactly $1$ for every $j$. Moreover, recognizing the responsibility $\gamma_{ji}$ of equation (9.30) in
195 | $$
196 | \sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\alpha_{i}\lambda=0
197 | $$
198 |
199 | $$
200 | \Rightarrow\sum_{j=1}^m\gamma_{ji}+\alpha_{i}\lambda=0
201 | $$
202 |
203 | $$
204 | \Rightarrow\alpha_{i}=-\frac{\sum_{j=1}^m\gamma_{ji}}{\lambda}=\frac{1}{m}\sum_{j=1}^m\gamma_{ji}
205 | $$
206 |
207 |
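In code, the update of equation (9.38) is a single column mean of the responsibility matrix; a minimal sketch (the `gamma` values are hypothetical):

```python
import numpy as np

gamma = np.array([[0.9, 0.1],   # hypothetical responsibilities, shape (m, k)
                  [0.8, 0.2],
                  [0.1, 0.9],
                  [0.2, 0.8]])
m = gamma.shape[0]
alphas = gamma.sum(axis=0) / m  # equation (9.38); entries sum to 1 by construction
print(alphas)
```
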
208 | ## Appendix
209 |
210 | Reference formulas
211 | $$
212 | \frac{\partial\boldsymbol x^TB\boldsymbol x}{\partial\boldsymbol x}=\left(B+B^T\right)\boldsymbol x
213 | $$
214 | $$
215 | \frac{\partial}{\partial A}ln|A|=\left(A^{-1}\right)^T
216 | $$
217 | $$
218 | \frac{\partial}{\partial x}\left(A^{-1}\right)=-A^{-1}\frac{\partial A}{\partial x}A^{-1}
219 | $$
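These identities (see the Matrix Cookbook [3]) can be checked numerically; for example, a hedged sketch verifying the first one against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

f = lambda v: v @ B @ v  # scalar x^T B x
eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])  # central finite differences
analytic = (B + B.T) @ x
print(np.allclose(numeric, analytic, atol=1e-5))  # expect True
```
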
220 | References
221 | [1] Meilă, Marina. "Comparing clusterings—an information based distance." Journal of Multivariate Analysis 98.5 (2007): 873-895.
222 | [2] Halkidi, Maria, Yannis Batistakis, and Michalis Vazirgiannis. "On clustering validation techniques." Journal of Intelligent Information Systems 17.2-3 (2001): 107-145.
223 | [3] Petersen, K. B. & Pedersen, M. S. *The Matrix Cookbook*.
224 | [4] Bishop, C. M. (2006). *Pattern Recognition and Machine Learning*. Springer.
225 |
226 |
227 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE
2 | Version 3, 29 June 2007
3 |
4 | Copyright (C) 2007 Free Software Foundation, Inc.