├── docs
│   ├── .nojekyll
│   ├── chapter2
│   │   ├── resources
│   │   │   └── images
│   │   │       └── lrank.png
│   │   └── chapter2.md
│   ├── _sidebar.md
│   ├── chapter1
│   │   └── chapter1.md
│   ├── index.html
│   ├── README.md
│   ├── chapter4
│   │   └── chapter4.md
│   ├── chapter7
│   │   └── chapter7.md
│   ├── chapter8
│   │   └── chapter8.md
│   ├── chapter5
│   │   └── chapter5.md
│   ├── chapter16
│   │   └── chapter16.md
│   ├── chapter14
│   │   └── chapter14.md
│   ├── chapter10
│   │   └── chapter10.md
│   ├── chapter6
│   │   └── chapter6.md
│   ├── chapter11
│   │   └── chapter11.md
│   ├── chapter3
│   │   └── chapter3.md
│   ├── chapter9
│   │   └── chapter9.md
│   └── chapter13
│       └── chapter13.md
├── res
│   ├── example.png
│   ├── qrcode.jpeg
│   └── xigua.jpg
├── .gitignore
├── README.md
└── LICENSE

--------------------------------------------------------------------------------
/docs/.nojekyll:
--------------------------------------------------------------------------------
1 | 

--------------------------------------------------------------------------------
/res/example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awesome-interview/pumpkin-book/master/res/example.png

--------------------------------------------------------------------------------
/res/qrcode.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awesome-interview/pumpkin-book/master/res/qrcode.jpeg

--------------------------------------------------------------------------------
/res/xigua.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awesome-interview/pumpkin-book/master/res/xigua.jpg

--------------------------------------------------------------------------------
/docs/chapter2/resources/images/lrank.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awesome-interview/pumpkin-book/master/docs/chapter2/resources/images/lrank.png

--------------------------------------------------------------------------------
/docs/_sidebar.md:
--------------------------------------------------------------------------------
1 | - Table of Contents
2 |   - [Chapter 1 Introduction](chapter1/chapter1.md)
3 |   - [Chapter 2 Model Evaluation](chapter2/chapter2.md)
4 |   - [Chapter 3 Linear Models](chapter3/chapter3.md)
5 |   - [Chapter 4 Decision Trees](chapter4/chapter4.md)
6 |   - [Chapter 5 Neural Networks](chapter5/chapter5.md)
7 |   - [Chapter 6 Support Vector Machines](chapter6/chapter6.md)
8 |   - [Chapter 7 Bayes Classifiers](chapter7/chapter7.md)
9 |   - [Chapter 8 Ensemble Learning](chapter8/chapter8.md)
10 |   - [Chapter 9 Clustering](chapter9/chapter9.md)
11 |   - [Chapter 10 Dimensionality Reduction and Metric Learning](chapter10/chapter10.md)
12 |   - [Chapter 11 Feature Selection and Sparse Learning](chapter11/chapter11.md)
13 |   - [Chapter 13 Semi-Supervised Learning](chapter13/chapter13.md)
14 |   - [Chapter 14 Probabilistic Graphical Models](chapter14/chapter14.md)
15 |   - [Chapter 16 Reinforcement Learning](chapter16/chapter16.md)
16 | 

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Logs
2 | logs
3 | *.log
4 | 
5 | # Runtime data
6 | pids
7 | *.pid
8 | *.seed
9 | 
10 | # Directory for instrumented libs generated by jscoverage/JSCover
11 | lib-cov
12 | 
13 | # Coverage directory used by tools like istanbul
14 | coverage
15 | 
16 | # Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files)
17 | .grunt
18 | 
19 | # Compiled binary addons (http://nodejs.org/api/addons.html)
20 | build/Release
21 | 
22 | # Dependency directory
23 | # Deployed apps should consider commenting this line out:
24 | # see https://npmjs.org/doc/faq.html#Should-I-check-my-node_modules-folder-into-git
25 | node_modules
26 | 
27 | _book/
28 | book.pdf
29 | book.epub
30 | book.mobi
31 | 
32 | .idea
33 | 
--------------------------------------------------------------------------------
/docs/chapter1/chapter1.md:
--------------------------------------------------------------------------------
1 | ## 1.2
2 | $$\begin{aligned}
3 | \sum_{f}E_{ote}(\mathfrak{L}_a\vert X,f) &= \sum_f\sum_h\sum_{x\in\mathcal{X}-X}P(x)\mathbb{I}(h(x)\neq f(x))P(h\vert X,\mathfrak{L}_a) \\
4 | &=\sum_{x\in\mathcal{X}-X}P(x) \sum_hP(h\vert X,\mathfrak{L}_a)\sum_f\mathbb{I}(h(x)\neq f(x)) \\
5 | &=\sum_{x\in\mathcal{X}-X}P(x) \sum_hP(h\vert X,\mathfrak{L}_a)\cfrac{1}{2}2^{\vert \mathcal{X} \vert} \\
6 | &=\cfrac{1}{2}2^{\vert \mathcal{X} \vert}\sum_{x\in\mathcal{X}-X}P(x) \sum_hP(h\vert X,\mathfrak{L}_a) \\
7 | &=2^{\vert \mathcal{X} \vert-1}\sum_{x\in\mathcal{X}-X}P(x) \cdot 1\\
8 | \end{aligned}$$
9 | 
10 | [Explanation]: The first step to the second holds because $\sum_i^m\sum_j^n\sum_k^o a_ib_jc_k=\sum_i^m a_i \cdot \sum_j^n b_j \cdot \sum_k^o c_k$;
11 | the second step to the third: note that here $f$ is defined as **any function mapping samples to $\{0,1\}$, under a uniform distribution**, i.e. there is more than one $f$ and every $f$ occurs with equal probability. For example, when the sample space contains only two samples, $ \mathcal{X}=\{x_1,x_2\},\vert \mathcal{X} \vert=2$, all the possible ground-truth functions $f$ are:
12 | $$\begin{aligned}
13 | f_1:f_1(x_1)=0,f_1(x_2)=0;\\
14 | f_2:f_2(x_1)=0,f_2(x_2)=1;\\
15 | f_3:f_3(x_1)=1,f_3(x_2)=0;\\
16 | f_4:f_4(x_1)=1,f_4(x_2)=1;
17 | \end{aligned}$$
18 | $2^{\vert \mathcal{X} \vert}=2^2=4$ ground-truth functions in total. Hence, whatever the model $h(x)$ learned by algorithm $\mathfrak{L}_a$ predicts for a sample (0 or 1), exactly half of the $f$'s disagree with it, so $\sum_f\mathbb{I}(h(x)\neq f(x)) = \cfrac{1}{2}2^{\vert \mathcal{X} \vert} $;
19 | from the third step to the end, basic probability theory suffices.

--------------------------------------------------------------------------------
/docs/index.html:
--------------------------------------------------------------------------------
10 | > Title: Machine Learning (机器学习)
72 | # Main Contributors (in alphabetical order)
73 | [@awyd234](https://github.com/awyd234)
74 | [@Heitao5200](https://github.com/Heitao5200)
75 | [@juxiao](https://github.com/juxiao)
76 | [@LongJH](https://github.com/LongJH)
77 | [@LilRachel](https://github.com/LilRachel)
78 | [@Majingmin](https://github.com/Majingmin)
79 | [@spareribs](https://github.com/spareribs)
80 | [@sunchaothu](https://github.com/sunchaothu)
81 | [@StevenLzq](https://github.com/StevenLzq)
82 | [@Sm1les](https://github.com/Sm1les)
83 | [@Ye980226](https://github.com/Ye980226)
84 | 
85 | # Follow Us
86 | 
87 | 

--------------------------------------------------------------------------------
/docs/chapter5/chapter5.md:
--------------------------------------------------------------------------------
According to Chapter 2 of 《统计学习方法》 (*Statistical Learning Methods*), let $M$ be the set of misclassified points. For a misclassified point $\boldsymbol x_i \in M$ with true label $y_i$ and model prediction $\hat{y_i}$, either $\boldsymbol w^T \boldsymbol x_i \gt 0,\hat{y_i}=1,y_i=0$ or $\boldsymbol w^T \boldsymbol x_i \lt 0,\hat{y_i}=0,y_i=1$. Combining the two cases gives:
8 | $$(\hat{y_i}-y_i)\boldsymbol w^T \boldsymbol x_i>0$$
9 | Hence the loss function can be taken as:
10 | $$L(\boldsymbol w)=\sum_{\boldsymbol x_i \in M} (\hat{y_i}-y_i)\boldsymbol w^T \boldsymbol x_i$$
11 | The gradient of the loss is:
12 | $$\nabla_w L(\boldsymbol w)=\sum_{\boldsymbol x_i \in M} (\hat{y_i}-y_i)\boldsymbol x_i$$
13 | Randomly pick one misclassified point $(\boldsymbol x_i,y_i)$ and update $\boldsymbol w$:
14 | $$\boldsymbol w \leftarrow \boldsymbol w-\eta(\hat{y_i}-y_i)\boldsymbol x_i=\boldsymbol w+\eta(y_i-\hat{y_i})\boldsymbol x_i$$
15 | Clearly, Eq. (5.2) is exactly this update written for the $i$-th component $w_i$ of $\boldsymbol w$.
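To make the update concrete, here is a minimal NumPy sketch of the rule above, assuming labels in $\{0,1\}$ and the prediction $\hat{y}_i=\mathbb{I}(\boldsymbol w^T\boldsymbol x_i>0)$; the learning rate and toy data are illustrative only:

```python
import numpy as np

def perceptron_step(w, x, y, eta=0.1):
    """One stochastic update: w <- w + eta * (y - y_hat) * x, labels in {0, 1}."""
    y_hat = 1 if w @ x > 0 else 0       # prediction of the current model
    return w + eta * (y - y_hat) * x    # w is unchanged when x is correctly classified

# toy usage: one misclassified positive point pushes w towards x
w = np.zeros(3)
x, y = np.array([1.0, 2.0, -1.0]), 1
w = perceptron_step(w, x, y)            # w is now eta * x
```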
16 | ## 5.12
17 | $$\Delta \theta_j = -\eta g_j$$
18 | [Derivation]: Since
19 | $$\Delta \theta_j = -\eta \cfrac{\partial E_k}{\partial \theta_j}$$
20 | and
21 | $$
22 | \begin{aligned}
23 | \cfrac{\partial E_k}{\partial \theta_j} &= \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot\cfrac{\partial \hat{y}_j^k}{\partial \theta_j} \\
24 | &= (\hat{y}_j^k-y_j^k) \cdot f'(\beta_j-\theta_j) \cdot (-1) \\
25 | &= -(\hat{y}_j^k-y_j^k)f'(\beta_j-\theta_j) \\
26 | &= g_j
27 | \end{aligned}
28 | $$
29 | therefore
30 | $$\Delta \theta_j = -\eta \cfrac{\partial E_k}{\partial \theta_j}=-\eta g_j$$
31 | ## 5.13
32 | $$\Delta v_{ih} = \eta e_h x_i$$
33 | [Derivation]: Since
34 | $$\Delta v_{ih} = -\eta \cfrac{\partial E_k}{\partial v_{ih}}$$
35 | and
36 | $$
37 | \begin{aligned}
38 | \cfrac{\partial E_k}{\partial v_{ih}} &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot \cfrac{\partial b_h}{\partial \alpha_h} \cdot \cfrac{\partial \alpha_h}{\partial v_{ih}} \\
39 | &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot \cfrac{\partial b_h}{\partial \alpha_h} \cdot x_i \\
40 | &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot f'(\alpha_h-\gamma_h) \cdot x_i \\
41 | &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot w_{hj} \cdot f'(\alpha_h-\gamma_h) \cdot x_i \\
42 | &= \sum_{j=1}^{l} (-g_j) \cdot w_{hj} \cdot f'(\alpha_h-\gamma_h) \cdot x_i \\
43 | &= -f'(\alpha_h-\gamma_h) \cdot \sum_{j=1}^{l} g_j \cdot w_{hj} \cdot x_i\\
44 | &= -b_h(1-b_h) \cdot \sum_{j=1}^{l} g_j \cdot w_{hj} \cdot x_i \\
45 | &= -e_h \cdot x_i
46 | \end{aligned}
47 | $$
48 | therefore
49 | $$\Delta v_{ih} = -\eta \cdot \left(-e_h \cdot x_i\right)=\eta e_h x_i$$
50 | ## 5.14
51 | $$\Delta \gamma_h= -\eta e_h$$
52 | [Derivation]: Since
53 | $$\Delta \gamma_h = -\eta \cfrac{\partial E_k}{\partial \gamma_h}$$
54 | and
55 | $$
56 | \begin{aligned}
57 | \cfrac{\partial E_k}{\partial \gamma_h} &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot \cfrac{\partial b_h}{\partial \gamma_h} \\
58 | &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot f'(\alpha_h-\gamma_h) \cdot (-1) \\
59 | &= -\sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot w_{hj} \cdot f'(\alpha_h-\gamma_h)\\
60 | &=e_h
61 | \end{aligned}
62 | $$
63 | therefore
64 | $$\Delta \gamma_h= -\eta e_h$$
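Putting Eqs. (5.11)-(5.14) together, a minimal single-hidden-layer sketch in NumPy; sigmoid activations are assumed throughout (so $f'=f(1-f)$), the shapes and learning rate are illustrative, and variable names mirror the book's notation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_one_sample(x, y, V, gamma, W, theta, eta=0.1):
    """One BP update for a single sample (x, y).
    V: d x q input-to-hidden weights, gamma: q hidden thresholds,
    W: q x l hidden-to-output weights, theta: l output thresholds."""
    b = sigmoid(x @ V - gamma)               # b_h = f(alpha_h - gamma_h)
    y_hat = sigmoid(b @ W - theta)           # y_hat_j = f(beta_j - theta_j)
    g = y_hat * (1 - y_hat) * (y - y_hat)    # g_j, output-layer gradient
    e = b * (1 - b) * (W @ g)                # e_h = b_h (1 - b_h) * sum_j w_hj g_j
    W += eta * np.outer(b, g)                # Delta w_hj  =  eta * g_j * b_h  (Eq. 5.11)
    theta -= eta * g                         # Delta theta_j = -eta * g_j      (Eq. 5.12)
    V += eta * np.outer(x, e)                # Delta v_ih  =  eta * e_h * x_i  (Eq. 5.13)
    gamma -= eta * e                         # Delta gamma_h = -eta * e_h      (Eq. 5.14)
    return V, gamma, W, theta
```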
--------------------------------------------------------------------------------
/docs/chapter16/chapter16.md:
--------------------------------------------------------------------------------
1 | ## 16.2
2 | $$
3 | Q_{n}(k)=\frac{1}{n}\left((n-1)\times Q_{n-1}(k)+v_{n}\right)
4 | $$
5 |
6 | [Derivation]:
7 | $$
8 | Q_{n}(k)=\frac{1}{n}\sum_{i=1}^{n}v_{i}=\frac{1}{n}\left(\sum_{i=1}^{n-1}v_{i}+v_{n}\right)=\frac{1}{n}\left((n-1)Q_{n-1}(k)+v_{n}\right)
9 | $$
10 |
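In code, this incremental form avoids storing all past rewards; a small sketch, where the array `Q` and the pull count `n` of arm `k` are assumed to be maintained by the caller:

```python
def update_average(Q, k, v, n):
    """Eq. (16.2): Q_n(k) = ((n-1) * Q_{n-1}(k) + v_n) / n after the n-th pull of arm k.
    Written as Q += (v - Q) / n, which is algebraically identical but numerically stabler."""
    Q[k] += (v - Q[k]) / n
    return Q
```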
11 | ## 16.4
12 |
13 | $$
14 | P(k)=\frac{e^{\frac{Q(k)}{\tau }}}{\sum_{i=1}^{K}e^{\frac{Q(i)}{\tau}}}
15 | $$
16 |
17 | The smaller $\tau$ is, the more likely the arms with higher average reward are to be selected.
20 |
21 | [Explanation]: For a fixed $\tau$, $P(k)$ is monotonically increasing in $Q(k)$, so arms with higher average reward always have higher selection probability. Moreover, dividing by a smaller $\tau$ stretches the gaps between the exponents $Q(k)/\tau$, which can be seen from
22 | $$
23 | P(k)=\frac{e^{\frac{Q(k)}{\tau }}}{\sum_{i=1}^{K}e^{\frac{Q(i)}{\tau}}}=\frac{1}{\sum_{i=1}^{K}e^{\frac{Q(i)-Q(k)}{\tau}}}
24 | $$
25 | so as $\tau\to 0$ the probability mass concentrates on the arm with the largest $Q(k)$ (greedy behaviour), while as $\tau\to\infty$ the distribution tends to uniform.
25 |
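A small sketch of Boltzmann (softmax) arm selection; subtracting the maximum before exponentiating is a standard numerical-stability trick, and the toy `Q` values are illustrative:

```python
import numpy as np

def softmax_pick(Q, tau, rng=np.random.default_rng()):
    """Eq. (16.4): choose arm k with probability proportional to exp(Q(k) / tau)."""
    z = Q / tau
    p = np.exp(z - z.max())   # shifting by max(z) leaves the distribution unchanged
    p /= p.sum()
    return rng.choice(len(Q), p=p)

Q = np.array([1.0, 1.5, 0.5])
softmax_pick(Q, tau=0.1)      # almost always arm 1 (greedy-like)
softmax_pick(Q, tau=100.0)    # close to uniform
```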
26 | ## 16.7
27 |
28 | $$
29 | \begin{aligned}
30 | V_{T}^{\pi}(x)&=\mathbb{E}_{\pi}[\frac{1}{T}\sum_{t=1}^{T}r_{t}\mid x_{0}=x]\\
31 | &=\mathbb{E}_{\pi}[\frac{1}{T}r_{1}+\frac{T-1}{T}\frac{1}{T-1}\sum_{t=2}^{T}r_{t}\mid x_{0}=x]\\
32 | &=\sum_{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(\frac{1}{T}R_{x\rightarrow x{}'}^{a}+\frac{T-1}{T}\mathbb{E}_{\pi}[\frac{1}{T-1}\sum_{t=1}^{T-1}r_{t}\mid x_{0}=x{}'])\\
33 | &=\sum_{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(\frac{1}{T}R_{x\rightarrow x{}'}^{a}+\frac{T-1}{T}V_{T-1}^{\pi}(x{}'))
34 | \end{aligned}
35 | $$
36 |
37 | [Explanation]:
38 | 
39 | Since
40 | $$
41 | \pi(x,a)=P(action=a\mid state=x)
42 | $$
43 | denotes the probability of choosing action $a$ in state $x$,
44 | 
45 | and since the action events are mutually exclusive and their union is the whole action space, the law of total probability
46 | $$
47 | P(A)=\sum_{i=1}^{\infty}P(B_{i})P(A\mid B_{i})
48 | $$
49 | gives
50 | $$
51 | \begin{aligned}
52 | &=\mathbb{E}_{\pi}[\frac{1}{T}r_{1}+\frac{T-1}{T}\frac{1}{T-1}\sum_{t=2}^{T}r_{t}\mid x_{0}=x]\\
53 | &=\sum_{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(\frac{1}{T}R_{x\rightarrow x{}'}^{a}+\frac{T-1}{T}\mathbb{E}_{\pi}[\frac{1}{T-1}\sum_{t=1}^{T-1}r_{t}\mid x_{0}=x{}'])
54 | \end{aligned}
55 | $$
56 | where the expected first-step reward is
57 | $$
58 | \mathbb{E}_{\pi}[r_{1}\mid x_{0}=x]=\sum_{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}R_{x\rightarrow x{}'}^{a}
59 | $$
60 | The last equality uses the recursive form of the value function.
61 |
62 |
63 |
64 | ## 16.8
65 |
66 | $$
67 | V_{\gamma }^{\pi}(x)=\sum _{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(R_{x\rightarrow x{}'}^{a}+\gamma V_{\gamma }^{\pi}(x{}'))
68 | $$
69 |
70 | [Derivation]:
71 | $$
72 | \begin{aligned}
73 | V_{\gamma }^{\pi}(x)&=\mathbb{E}_{\pi}[\sum_{t=0}^{\infty }\gamma^{t}r_{t+1}\mid x_{0}=x]\\
74 | &=\mathbb{E}_{\pi}[r_{1}+\sum_{t=1}^{\infty}\gamma^{t}r_{t+1}\mid x_{0}=x]\\
75 | &=\mathbb{E}_{\pi}[r_{1}+\gamma\sum_{t=1}^{\infty}\gamma^{t-1}r_{t+1}\mid x_{0}=x]\\
76 | &=\sum _{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(R_{x\rightarrow x{}'}^{a}+\gamma \mathbb{E}_{\pi}[\sum_{t=0}^{\infty }\gamma^{t}r_{t+1}\mid x_{0}=x{}'])\\
77 | &=\sum _{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(R_{x\rightarrow x{}'}^{a}+\gamma V_{\gamma }^{\pi}(x{}'))
78 | \end{aligned}
79 | $$
80 |
81 | ## 16.16
82 |
83 | $$
84 | V^{\pi}(x)\leq V^{\pi{}'}(x)
85 | $$
86 |
87 | [Derivation]:
88 | $$
89 | \begin{aligned}
90 | V^{\pi}(x)&\leq Q^{\pi}(x,\pi{}'(x))\\
91 | &=\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{\pi{}'(x)}(R_{x\rightarrow x{}'}^{\pi{}'(x)}+\gamma V^{\pi}(x{}'))\\
92 | &\leq \sum_{x{}'\in X}P_{x\rightarrow x{}'}^{\pi{}'(x)}(R_{x\rightarrow x{}'}^{\pi{}'(x)}+\gamma Q^{\pi}(x{}',\pi{}'(x{}')))\\
93 | &=\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{\pi{}'(x)}(R_{x\rightarrow x{}'}^{\pi{}'(x)}+\gamma \sum_{x{}''\in X}P_{x{}'\rightarrow x{}''}^{\pi{}'(x{}')}(R_{x{}'\rightarrow x{}''}^{\pi{}'(x{}')}+\gamma V^{\pi}(x{}'')))\\
94 | &=\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{\pi{}'(x)}(R_{x\rightarrow x{}'}^{\pi{}'(x)}+\gamma V^{\pi{}'}(x{}'))\\
95 | &=V^{\pi{}'}(x)
96 | \end{aligned}
97 | $$
98 | where we used the policy improvement condition
99 | $$
100 | Q^{\pi}(x,\pi{}'(x))\geq V^{\pi}(x)
101 | $$
102 | and the state-action value function
103 | $$
104 | Q^{\pi}(x{}',\pi{}'(x{}'))=\sum_{x{}''\in X}P_{x{}'\rightarrow x{}''}^{\pi{}'(x{}')}(R_{x{}'\rightarrow x{}''}^{\pi{}'(x{}')}+\gamma V^{\pi}(x{}''))
105 | $$
106 | Repeating this expansion indefinitely yields the last two equalities. Hence, at convergence, the optimal value function of the current state satisfies
107 |
108 | $$
109 | V^{\ast}(x)=V^{\pi{}'}(x)\geq V^{\pi}(x)
110 | $$
111 |
112 |
113 |
114 | ## 16.31
115 |
116 | $$
117 | Q_{t+1}^{\pi}(x,a)=Q_{t}^{\pi}(x,a)+\alpha (R_{x\rightarrow x{}'}^{a}+\gamma Q_{t}^{\pi}(x{}',a{}')-Q_{t}^{\pi}(x,a))
118 | $$
119 |
120 | [Derivation]: Compare Eq. (16.31) with Eq. (16.29):
121 | $$
122 | Q_{t+1}^{\pi}(x,a)=Q_{t}^{\pi}(x,a)+\frac{1}{t+1}(r_{t+1}-Q_{t}^{\pi}(x,a))
123 | $$
124 | and with
125 | $$
126 | \frac{1}{t+1}=\alpha
127 | $$
128 | the correspondence shows that
129 | $$
130 | r_{t+1}=R_{x\rightarrow x{}'}^{a}+\gamma Q_{t}^{\pi}(x{}',a{}')
131 | $$
132 | which is exactly the sample estimate of the $\gamma$-discounted cumulative reward.
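As a sketch, the update reads as follows in code; integer-indexed states/actions and the step size `alpha` are illustrative assumptions:

```python
def sarsa_update(Q, x, a, r, x_next, a_next, alpha=0.1, gamma=0.9):
    """Eq. (16.31): Q(x,a) += alpha * (R + gamma * Q(x',a') - Q(x,a))."""
    td_target = r + gamma * Q[x_next, a_next]   # one-step estimate of the discounted return
    Q[x, a] += alpha * (td_target - Q[x, a])
    return Q
```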
133 |
134 |
135 |
136 |
--------------------------------------------------------------------------------
/docs/chapter14/chapter14.md:
--------------------------------------------------------------------------------
1 | ## 14.26
2 |
3 | $$p(x^t)T(x^{t-1}|x^t)=p(x^{t-1})T(x^t|x^{t-1})$$
4 |
5 | [Explanation]: Suppose the space in which variable $x$ lives has $n$ states ($s_1,s_2,..,s_n$). If an $n\times n$ transition matrix $T$ defined on this space satisfies certain conditions, the Markov process has a stationary distribution $\pi$ such that
6 | $$
7 | \begin{aligned}
8 | \pi T=\pi
9 | \end{aligned}
10 | \tag{1}
11 | $$
12 | where $\pi$ is an $n$-dimensional vector giving the probabilities of $s_1,s_2,..,s_n$. Conversely, if we wish to sample a sequence of variables $x_1,x_2,..,x_t$ that follows a given distribution $\pi$, which transition matrix $T(n\times n)$ should we use?
13 |
14 | In fact, it suffices that the transition matrix satisfy the Markov detailed balance condition
15 | $$
16 | \begin{aligned}
17 | \pi (i)T(i,j)=\pi (j)T(j,i)
18 | \end{aligned}
19 | \tag{2}
20 | $$
21 | which is Eq. (14.26); the notation here differs slightly from that of 西瓜书 for readability. Proof:
22 | $$
23 | \begin{aligned}
24 | (\pi T)(j) = \sum _i \pi (i)T(i,j) = \sum _i \pi (j)T(j,i) = \pi(j)
25 | \end{aligned}
26 | \tag{3}
27 | $$
28 | Suppose the sampled sequence is $x_1,x_2,..,x_{t-1},x_t$; the MH algorithm can then make the probability of moving from $x_{t-1}$ (say state $s_i$) to $x_t$ (say state $s_j$) satisfy Eq. (2).
29 |
30 | ## 14.28
31 |
32 | $$A(x^* | x^{t-1}) = \min\left ( 1,\frac{p(x^*)Q(x^{t-1} | x^*) }{p(x^{t-1})Q(x^* | x^{t-1})} \right )$$
33 |
34 | [Derivation]: This formula is really a trick of rejection sampling: based on Eq. (14.27), it suffices to take
35 | $$
36 | \begin{aligned}
37 | A(x^* | x^{t-1}) &= p(x^*)Q(x^{t-1} | x^*) \\
38 | A(x^{t-1} | x^*) &= p(x^{t-1})Q(x^* | x^{t-1})
39 | \end{aligned}
40 | \tag{4}
41 | $$
42 | for Eq. (14.26) to hold. In practice, however, both right-hand sides may be small (say 0.1 and 0.2), so the hard-won samples are used only a small fraction of the time. Raising the two acceptance rates to 0.5 and 1 still satisfies the detailed balance condition while greatly improving sample utilization, so Eq. (4) can be improved to
43 | $$
44 | \begin{aligned}
45 | A(x^* | x^{t-1}) &= \frac{p(x^*)Q(x^{t-1} | x^*)}{norm} \\
46 | A(x^{t-1} | x^*) &= \frac{p(x^{t-1})Q(x^* | x^{t-1}) }{norm}
47 | \end{aligned}
48 | \tag{5}
49 | $$
50 | where
51 | $$
52 | \begin{aligned}
53 | norm = \max\left (p(x^{t-1})Q(x^* | x^{t-1}),p(x^*)Q(x^{t-1} | x^*) \right )
54 | \end{aligned}
55 | \tag{6}
56 | $$
57 | which is Eq. (14.28) in the book.
58 |
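A minimal sketch of one MH step using Eq. (14.28); here `p` (the possibly unnormalized target density), `q(a, b)` (the proposal density $Q(a \mid b)$) and `sample` (a draw from the proposal) are caller-supplied assumptions:

```python
import numpy as np

def mh_step(p, q, sample, x_prev, rng=np.random.default_rng()):
    """One Metropolis-Hastings transition: propose x* ~ Q(. | x_prev), then accept
    with probability A = min(1, p(x*) q(x_prev|x*) / (p(x_prev) q(x*|x_prev)))."""
    x_star = sample(x_prev)                       # candidate from the proposal
    A = min(1.0, p(x_star) * q(x_prev, x_star) /
                 (p(x_prev) * q(x_star, x_prev)))
    return x_star if rng.random() < A else x_prev
```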
59 | ## 14.32
60 |
61 | $${\rm ln}p(x)=\mathcal{L}(q)+{\rm KL}(q \parallel p)$$
62 |
63 | [Derivation]: By the product rule $p(x,z)=p(z|x)\,p(x)$, we get $p(x)=\frac{p(x,z)}{p(z|x)}$.
64 |
65 | Taking ${\rm ln}$ of both sides gives ${\rm ln}\,p(x)={\rm ln}\frac{p(x,z)}{p(z|x)}$ (1)
66 |
67 | Since $q(z)$ is a probability density function, $1=\int q(z)dz$.
68 |
69 | Multiplying both sides of this identity by ${\rm ln}\,p(x)$, which does not depend on $z$ and can therefore be moved inside the integral, gives ${\rm ln}\,p(x)=\int q(z){\rm ln}\,p(x)dz$. Hence
70 | $$
71 | \begin{aligned}
72 | {\rm ln}p(x)&=\int q(z){\rm ln}p(x)dz \\
73 | &=\int q(z){\rm ln}\frac{p(x,z)}{p(z|x)}dz\qquad(\text{substituting Eq. (1)})\\
74 | &=\int q(z){\rm ln}\bigg\{\frac{p(x,z)}{q(z)}\cdot\frac{q(z)}{p(z|x)}\bigg\}dz \\
75 | &=\int q(z)\bigg({\rm ln}\frac{p(x,z)}{q(z)}-{\rm ln}\frac{p(z|x)}{q(z)}\bigg)dz \\
76 | &=\int q(z){\rm ln}\bigg\{\frac{p(x,z)}{q(z)}\bigg\}dz-\int q(z){\rm ln}\frac{p(z|x)}{q(z)}dz \\
77 | &=\mathcal{L}(q)+{\rm KL}(q \parallel p)\qquad(\text{by the definitions of }\mathcal{L}\text{ and }{\rm KL})
78 | \end{aligned}
79 | $$
80 |
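The identity is easy to check numerically for a discrete latent variable; in the sketch below, the joint table `p_xz` and the density `q` are arbitrary illustrative numbers:

```python
import numpy as np

p_xz = np.array([0.1, 0.2, 0.3])            # joint p(x, z) for one fixed x, z in {0, 1, 2}
p_x = p_xz.sum()                            # marginal p(x)
q = np.array([0.5, 0.25, 0.25])             # any density q(z) over the latent variable
elbo = np.sum(q * np.log(p_xz / q))         # L(q)
kl = np.sum(q * np.log(q / (p_xz / p_x)))   # KL(q || p(z|x)), with p(z|x) = p(x,z) / p(x)
print(np.log(p_x), elbo + kl)               # the two values coincide
```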
81 |
82 | ## 14.36
83 |
84 | $$
85 | \begin{aligned}
86 | \mathcal{L}(q)&=\int \prod_{i}q_{i}\bigg\{ {\rm ln}p({\rm \mathbf{x},\mathbf{z}})-\sum_{i}{\rm ln}q_{i}\bigg\}d{\rm\mathbf{z}} \\
87 | &=\int q_{j}\bigg\{\int {\rm ln}p({\rm \mathbf{x},\mathbf{z}})\prod_{i\ne j}q_{i}d{\rm\mathbf{z_{i}}}\bigg\}d{\rm\mathbf{z_{j}}}-\int q_{j}{\rm ln}q_{j}d{\rm\mathbf{z_{j}}}+{\rm const} \\
88 | &=\int q_{j}{\rm ln}\tilde{p}({\rm \mathbf{x},\mathbf{z_{j}}})d{\rm\mathbf{z_{j}}}-\int q_{j}{\rm ln}q_{j}d{\rm\mathbf{z_{j}}}+{\rm const}
89 | \end{aligned}
90 | $$
91 |
92 | [Derivation]:
93 | $$
94 | \mathcal{L}(q)=\int \prod_{i}q_{i}\bigg\{ {\rm ln}p({\rm \mathbf{x},\mathbf{z}})-\sum_{i}{\rm ln}q_{i}\bigg\}d{\rm\mathbf{z}}=\int\prod_{i}q_{i}{\rm ln}p({\rm \mathbf{x},\mathbf{z}})d{\rm\mathbf{z}}-\int\prod_{i}q_{i}\sum_{i}{\rm ln}q_{i}d{\rm\mathbf{z}}
95 | $$
96 | The expression is a difference of two integrals; consider the left one, $\int\prod_{i}q_{i}{\rm ln}p({\rm \mathbf{x},\mathbf{z}})d{\rm\mathbf{z}}$, first.
97 | $$
98 | \begin{aligned}
99 | \int\prod_{i}q_{i}{\rm ln}p({\rm \mathbf{x},\mathbf{z}})d{\rm\mathbf{z}} &= \int q_{j}\prod_{i\ne j}q_{i}{\rm ln}p({\rm \mathbf{x},\mathbf{z}})d{\rm\mathbf{z}} \\
100 | &= \int q_{j}\bigg\{\int{\rm ln}p({\rm \mathbf{x},\mathbf{z}})\prod_{i\ne j}q_{i}d{\rm\mathbf{z_{i}}}\bigg\}d{\rm\mathbf{z_{j}}}\qquad (\text{integrate over the }{\rm\mathbf{z_{i}}},\ i\ne j,\text{ first, then over }{\rm\mathbf{z_{j}}})
101 | \end{aligned}
102 | $$
103 | This is the left integral in Eq. (14.36) of the book.
104 |
105 | Now consider the right integral, $\int\prod_{i}q_{i}\sum_{i}{\rm ln}q_{i}d{\rm\mathbf{z}}$.
106 |
107 | As a preliminary, compute $\int\prod_{i}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z}}$:
108 | $$
109 | \begin{aligned}
110 | \int\prod_{i}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z}}&= \int q_{i^{\prime}}\prod_{i\ne i^{\prime}}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z}}\qquad (\text{pick one factor }q_{i^{\prime}},\ i^{\prime}\ne k) \\
111 | &=\int q_{i^{\prime}}\bigg\{\int\prod_{i\ne i^{\prime}}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z_{i}}}\bigg\}d{\rm\mathbf{z_{i^{\prime}}}}
112 | \end{aligned}
113 | $$
114 | The factor $\bigg\{\int\prod_{i\ne i^{\prime}}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z_{i}}}\bigg\}$ does not involve $q_{i^{\prime}}$, so it can be taken outside the integral; and since $\int q_{i^{\prime}}d{\rm\mathbf{z_{i^{\prime}}}}=1$,
115 | $$
116 | \begin{aligned}
117 | \int\prod_{i}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z}}&=\int\prod_{i\ne i^{\prime}}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z_{i}}} \\
118 | &= \int q_{k}{\rm ln}q_{k}d{\rm\mathbf{z_k}}\qquad (\text{every factor other than }q_k\text{ can be eliminated this way})
119 | \end{aligned}
120 | $$
121 | With this result, the right integral becomes
122 | $$
123 | \begin{aligned}
124 | \int\prod_{i}q_{i}\sum_{i}{\rm ln}q_{i}d{\rm\mathbf{z}}&= \int\prod_{i}q_{i}{\rm ln}q_{j}d{\rm\mathbf{z}} + \sum_{k\ne j}\int\prod_{i}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z}} \\
125 | &= \int q_{j}{\rm ln}q_{j}d{\rm\mathbf{z_j}} + \sum_{k\ne j}\int q_{k}{\rm ln}q_{k}d{\rm\mathbf{z_k}}\qquad (\text{by the result above}) \\
126 | &= \int q_{j}{\rm ln}q_{j}d{\rm\mathbf{z_j}} + {\rm const} \qquad (\text{only }q_{j}\text{ is of interest here; the remaining terms are }{\rm const})
127 | \end{aligned}
128 | $$
129 | This is the right integral in Eq. (14.36).
130 |
131 | ## 14.40
132 |
133 | $$
134 | \begin{aligned}
135 | q_j^*(\mathbf{z}_j) = \frac{ \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right ) }{\int \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right ) \mathrm{d}\mathbf{z}_j}
136 | \end{aligned}
137 | $$
138 |
139 | [Derivation]: Removing the logarithm in Eq. (14.39) (i.e. exponentiating) and integrating,
140 | $$
141 | \begin{aligned}
142 | \int q_j^*(\mathbf{z}_j)\mathrm{d}\mathbf{z}_j &=\int \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right )\cdot\exp(const) \, \mathrm{d}\mathbf{z}_j \\
143 | &=\exp(const) \int \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right ) \, \mathrm{d}\mathbf{z}_j \\
144 | &= 1
145 | \end{aligned}
146 | \tag{7}
147 | $$
148 | so
149 | $$
150 | \exp(const) = \dfrac{1}{\int \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right ) \, \mathrm{d}\mathbf{z}_j} \\
151 | \tag{8}
152 | $$
153 |
154 | $$
155 | \begin{aligned}
156 | q_j^*(\mathbf{z}_j) &= \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right )\cdot\exp(const) \\
157 | &= \frac{ \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right ) }{\int \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right ) \mathrm{d}\mathbf{z}_j}
158 | \end{aligned}
159 | \tag{9}
160 | $$
161 |
--------------------------------------------------------------------------------
/docs/chapter10/chapter10.md:
--------------------------------------------------------------------------------
1 | ## 10.4
2 | $$\sum^m_{i=1}dist^2_{ij}=tr(\boldsymbol B)+mb_{jj}$$
3 | [Derivation]:
4 | $$\begin{aligned}
5 | \sum^m_{i=1}dist^2_{ij}&= \sum^m_{i=1}b_{ii}+\sum^m_{i=1}b_{jj}-2\sum^m_{i=1}b_{ij}\\
6 | &=tr(\boldsymbol B)+mb_{jj}
7 | \end{aligned}$$
8 | where the cross term vanishes because the low-dimensional samples are assumed centred, i.e. $\sum^m_{i=1}b_{ij}=\big(\sum^m_{i=1}\boldsymbol z_i\big)^T\boldsymbol z_j=0$.
8 |
9 | ## 10.10
10 | $$b_{ij}=-\frac{1}{2}(dist^2_{ij}-dist^2_{i\cdot}-dist^2_{\cdot j}+dist^2_{\cdot\cdot})$$
11 | [Derivation]: From Eq. (10.3),
12 | $$b_{ij}=-\frac{1}{2}(dist^2_{ij}-b_{ii}-b_{jj})$$
13 | From Eqs. (10.6) and (10.9),
14 | $$\begin{aligned}
15 | tr(\boldsymbol B)&=\frac{1}{2m}\sum^m_{i=1}\sum^m_{j=1}dist^2_{ij}\\
16 | &=\frac{m}{2}dist^2_{\cdot\cdot}
17 | \end{aligned}$$
18 | From Eqs. (10.4) and (10.8),
19 | $$\begin{aligned}
20 | b_{jj}&=\frac{1}{m}\sum^m_{i=1}dist^2_{ij}-\frac{1}{m}tr(\boldsymbol B)\\
21 | &=dist^2_{\cdot j}-\frac{1}{2}dist^2_{\cdot\cdot}
22 | \end{aligned}$$
23 | From Eqs. (10.5) and (10.7),
24 | $$\begin{aligned}
25 | b_{ii}&=\frac{1}{m}\sum^m_{j=1}dist^2_{ij}-\frac{1}{m}tr(\boldsymbol B)\\
26 | &=dist^2_{i\cdot}-\frac{1}{2}dist^2_{\cdot\cdot}
27 | \end{aligned}$$
28 | Combining the above,
29 | $$\begin{aligned}
30 | b_{ij}&=-\frac{1}{2}(dist^2_{ij}-b_{ii}-b_{jj})\\
31 | &=-\frac{1}{2}(dist^2_{ij}-dist^2_{i\cdot}+\frac{1}{2}dist^2_{\cdot\cdot}-dist^2_{\cdot j}+\frac{1}{2}dist^2_{\cdot\cdot})\\
32 | &=-\frac{1}{2}(dist^2_{ij}-dist^2_{i\cdot}-dist^2_{\cdot j}+dist^2_{\cdot\cdot})
33 | \end{aligned}$$
34 |
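Eq. (10.10) is just double centering of the squared-distance matrix; a short NumPy sketch, assuming the distance matrix `D` is symmetric with zero diagonal:

```python
import numpy as np

def inner_product_matrix(D):
    """Eq. (10.10): b_ij = -(dist^2_ij - dist^2_i. - dist^2_.j + dist^2_..) / 2."""
    D2 = D ** 2
    row = D2.mean(axis=1, keepdims=True)   # dist^2_{i.}
    col = D2.mean(axis=0, keepdims=True)   # dist^2_{.j}
    tot = D2.mean()                        # dist^2_{..}
    return -0.5 * (D2 - row - col + tot)   # the inner-product matrix B
```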
35 | ## 10.14
36 | $$\begin{aligned}
37 | \sum^m_{i=1}\| \sum^{d'}_{j=1}z_{ij}\boldsymbol w_j-\boldsymbol x_i \|^2_2&=\sum^m_{i=1}\boldsymbol z^T_i\boldsymbol z_i-2\sum^m_{i=1}\boldsymbol z^T_i\boldsymbol W^T\boldsymbol x_i + const\\
38 | &\propto -tr(\boldsymbol W^T(\sum^m_{i=1}\boldsymbol x_i\boldsymbol x^T_i)\boldsymbol W)
39 | \end{aligned}$$
40 | [Derivation]: Given $\boldsymbol W^T \boldsymbol W=\boldsymbol I$ and $\boldsymbol z_i=\boldsymbol W^T \boldsymbol x_i$,
41 | $$\begin{aligned}
42 | \sum^m_{i=1}\| \sum^{d'}_{j=1}z_{ij}\boldsymbol w_j-\boldsymbol x_i \|^2_2&=\sum^m_{i=1}\| \boldsymbol W\boldsymbol z_i-\boldsymbol x_i \|^2_2\\
43 | &=\sum^m_{i=1}(\boldsymbol W\boldsymbol z_i)^T(\boldsymbol W\boldsymbol z_i)-2\sum^m_{i=1}(\boldsymbol W\boldsymbol z_i)^T\boldsymbol x_i+\sum^m_{i=1}\boldsymbol x^T_i\boldsymbol x_i\\
44 | &=\sum^m_{i=1}\boldsymbol z_i^T\boldsymbol z_i-2\sum^m_{i=1}\boldsymbol z_i^T\boldsymbol W^T\boldsymbol x_i+\sum^m_{i=1}\boldsymbol x^T_i\boldsymbol x_i\\
45 | &=\sum^m_{i=1}\boldsymbol z_i^T\boldsymbol z_i-2\sum^m_{i=1}\boldsymbol z_i^T\boldsymbol z_i+\sum^m_{i=1}\boldsymbol x^T_i\boldsymbol x_i\\
46 | &=-\sum^m_{i=1}\boldsymbol z_i^T\boldsymbol z_i+\sum^m_{i=1}\boldsymbol x^T_i\boldsymbol x_i\\
47 | &=-tr(\boldsymbol W^T(\sum^m_{i=1}\boldsymbol x_i\boldsymbol x^T_i)\boldsymbol W)+\sum^m_{i=1}\boldsymbol x^T_i\boldsymbol x_i\\
48 | &\propto -tr(\boldsymbol W^T(\sum^m_{i=1}\boldsymbol x_i\boldsymbol x^T_i)\boldsymbol W)
49 | \end{aligned}$$
50 | where $\sum^m_{i=1}\boldsymbol x^T_i\boldsymbol x_i$ is a constant.
51 |
52 | ## 10.17
53 | $$
54 | \boldsymbol X\boldsymbol X^T\boldsymbol w_i=\lambda _i\boldsymbol w_i
55 | $$
56 | [Derivation]: Given
57 | $$\begin{aligned}
58 | &\min\limits_{\boldsymbol W}-tr(\boldsymbol W^T\boldsymbol X\boldsymbol X^T\boldsymbol W)\\
59 | &s.t. \boldsymbol W^T\boldsymbol W=\boldsymbol I.
60 | \end{aligned}$$
61 | Applying Lagrange multipliers,
62 | $$\begin{aligned}
63 | J(\boldsymbol W)&=-tr(\boldsymbol W^T\boldsymbol X\boldsymbol X^T\boldsymbol W)+tr\left(\boldsymbol\lambda(\boldsymbol W^T\boldsymbol W-\boldsymbol I)\right)\\
64 | \cfrac{\partial J(\boldsymbol W)}{\partial \boldsymbol W} &=-2\boldsymbol X\boldsymbol X^T\boldsymbol W+2\boldsymbol W\boldsymbol\lambda
65 | \end{aligned}$$
66 | Setting $\cfrac{\partial J(\boldsymbol W)}{\partial \boldsymbol W}=\boldsymbol 0$ gives
67 | $$
68 | \boldsymbol X\boldsymbol X^T\boldsymbol W=\boldsymbol W\boldsymbol\lambda
69 | $$
70 | 
71 | where $\boldsymbol W=(\boldsymbol w_1,\boldsymbol w_2,\cdot\cdot\cdot,\boldsymbol w_{d'})$ and $\boldsymbol \lambda={\rm diag}(\lambda_1,\lambda_2,\cdot\cdot\cdot,\lambda_{d'})$; reading this equation column by column yields Eq. (10.17).
72 |
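Hence PCA reduces to an eigendecomposition; a minimal sketch, assuming columns of `X` are the centred samples (the book's $d\times m$ convention) and `d_prime` is the target dimension:

```python
import numpy as np

def pca(X, d_prime):
    """Eq. (10.17): the projection matrix W collects the leading eigenvectors of X X^T."""
    vals, vecs = np.linalg.eigh(X @ X.T)   # eigh: symmetric input, eigenvalues ascending
    return vecs[:, ::-1][:, :d_prime]      # keep the d' eigenvectors of largest eigenvalue
```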
73 | ## 10.28
74 | $$w_{ij}=\cfrac{\sum\limits_{k\in Q_i}C_{jk}^{-1}}{\sum\limits_{l,s\in Q_i}C_{ls}^{-1}}$$
75 | [Derivation]: Given
76 | $$\begin{aligned}
77 | \min\limits_{\boldsymbol W}&\sum^m_{i=1}\| \boldsymbol x_i-\sum_{j \in Q_i}w_{ij}\boldsymbol x_j \|^2_2\\
78 | s.t.&\sum_{j \in Q_i}w_{ij}=1
79 | \end{aligned}$$
80 | the objective can be rewritten as
81 | $$\begin{aligned}
82 | \sum^m_{i=1}\| \boldsymbol x_i-\sum_{j \in Q_i}w_{ij}\boldsymbol x_j \|^2_2 &=\sum^m_{i=1}\| \sum_{j \in Q_i}w_{ij}\boldsymbol x_i- \sum_{j \in Q_i}w_{ij}\boldsymbol x_j \|^2_2 \\
83 | &=\sum^m_{i=1}\| \sum_{j \in Q_i}w_{ij}(\boldsymbol x_i- \boldsymbol x_j) \|^2_2\\
84 | &=\sum^m_{i=1}\sum_{j\in Q_i}\sum_{s\in Q_i}w_{ij}w_{is}(\boldsymbol x_i-\boldsymbol x_j)^T(\boldsymbol x_i-\boldsymbol x_s)\\
85 | &=\sum^m_{i=1}\boldsymbol W^T_i\boldsymbol C_i\boldsymbol W_i
86 | \end{aligned}$$
87 | where $\boldsymbol W_i=(w_{i1},w_{i2},\cdot\cdot\cdot,w_{ik})^T$, $k$ is the size of $Q_i$, and $\boldsymbol C_i$ is the $k\times k$ matrix with entries $(\boldsymbol C_i)_{js}=(\boldsymbol x_i-\boldsymbol x_j)^T(\boldsymbol x_i-\boldsymbol x_s)$ for $j,s \in Q_i$. The constraint can be written as
88 | $$
89 | \sum_{j\in Q_i}w_{ij}=\boldsymbol W_i^T\boldsymbol 1_k=1
90 | $$
91 | where $\boldsymbol 1_k$ is the $k$-dimensional all-ones vector.
92 | Applying Lagrange multipliers,
93 | $$
94 | J(\boldsymbol W)=\sum^m_{i=1}\boldsymbol W^T_i\boldsymbol C_i\boldsymbol W_i+\lambda(\boldsymbol W_i^T\boldsymbol 1_k-1)
95 | $$
96 | $$\begin{aligned}
97 | \cfrac{\partial J(\boldsymbol W)}{\partial \boldsymbol W_i} &=2\boldsymbol C_i\boldsymbol W_i+\lambda\boldsymbol 1_k
98 | \end{aligned}$$
99 | Setting $\cfrac{\partial J(\boldsymbol W)}{\partial \boldsymbol W_i}=0$ gives
100 | $$\begin{aligned}
101 | \boldsymbol W_i&=-\cfrac{1}{2}\lambda\boldsymbol C_i^{-1}\boldsymbol 1_k\\
102 | \boldsymbol W_i&=\lambda\boldsymbol C_i^{-1}\boldsymbol 1_k\\
103 | \end{aligned}$$
104 | where in the second line the constant $-\frac{1}{2}\lambda$ has been renamed to $\lambda$, since it is just a constant. Normalizing with $\boldsymbol W^T_i\boldsymbol 1_k=1$ gives
105 | $$
106 | \boldsymbol W_i=\cfrac{\boldsymbol C^{-1}_i\boldsymbol 1_k}{\boldsymbol 1_k^T\boldsymbol C^{-1}_i\boldsymbol 1_k}
107 | $$
108 |
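A sketch of the resulting weight computation for one sample; rows of `neighbors` are the $\boldsymbol x_j$, $j\in Q_i$, and the small ridge term is a common practical safeguard (our addition, since $\boldsymbol C_i$ is singular when $k>d$), not part of Eq. (10.28):

```python
import numpy as np

def lle_weights(x_i, neighbors, reg=1e-3):
    """Eq. (10.28): W_i = C_i^{-1} 1_k / (1_k^T C_i^{-1} 1_k)."""
    G = x_i - neighbors                        # k x d matrix whose rows are (x_i - x_j)
    C = G @ G.T                                # (C_i)_{js} = (x_i - x_j)^T (x_i - x_s)
    C += reg * np.trace(C) * np.eye(len(C))    # regularization; see the note above
    w = np.linalg.solve(C, np.ones(len(C)))    # C_i^{-1} 1_k
    return w / w.sum()                         # normalize so the weights sum to 1
```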
109 | ## 10.31
110 | $$\begin{aligned}
111 | &\min\limits_{\boldsymbol Z}tr(\boldsymbol Z \boldsymbol M \boldsymbol Z^T)\\
112 | &s.t. \boldsymbol Z^T\boldsymbol Z=\boldsymbol I.
113 | \end{aligned}$$
114 | [Derivation]:
115 | $$\begin{aligned}
116 | \min\limits_{\boldsymbol Z}\sum^m_{i=1}\| \boldsymbol z_i-\sum_{j \in Q_i}w_{ij}\boldsymbol z_j \|^2_2&=\sum^m_{i=1}\|\boldsymbol Z\boldsymbol I_i-\boldsymbol Z\boldsymbol W_i\|^2_2\\
117 | &=\sum^m_{i=1}\|\boldsymbol Z(\boldsymbol I_i-\boldsymbol W_i)\|^2_2\\
118 | &=\sum^m_{i=1}(\boldsymbol Z(\boldsymbol I_i-\boldsymbol W_i))^T\boldsymbol Z(\boldsymbol I_i-\boldsymbol W_i)\\
119 | &=\sum^m_{i=1}(\boldsymbol I_i-\boldsymbol W_i)^T\boldsymbol Z^T\boldsymbol Z(\boldsymbol I_i-\boldsymbol W_i)\\
120 | &=tr((\boldsymbol I-\boldsymbol W)^T\boldsymbol Z^T\boldsymbol Z(\boldsymbol I-\boldsymbol W))\\
121 | &=tr(\boldsymbol Z(\boldsymbol I-\boldsymbol W)(\boldsymbol I-\boldsymbol W)^T\boldsymbol Z^T)\\
122 | &=tr(\boldsymbol Z\boldsymbol M\boldsymbol Z^T)
123 | \end{aligned}$$
124 | where $\boldsymbol I_i$ and $\boldsymbol W_i$ denote the $i$-th columns of $\boldsymbol I$ and $\boldsymbol W$, and $\boldsymbol M=(\boldsymbol I-\boldsymbol W)(\boldsymbol I-\boldsymbol W)^T$.
125 | [Explanation]: The constraint $\boldsymbol Z^T\boldsymbol Z=\boldsymbol I$ ensures the low-dimensional coordinates are standardized (an orthonormal coordinate system).
126 |
--------------------------------------------------------------------------------
/docs/chapter6/chapter6.md:
--------------------------------------------------------------------------------
1 | ## 6.3
2 | $$
3 | \left\{\begin{array}{ll}{\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b \geqslant+1,} & {y_{i}=+1} \\ {\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b \leqslant-1,} & {y_{i}=-1}\end{array}\right.
4 | $$
5 | [Derivation]: Suppose this hyperplane is $\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}+b^{\prime}=0$. Then for every $\left(\boldsymbol{x}_{i}, y_{i}\right) \in D$:
6 | $$
7 | \left\{\begin{array}{ll}{\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime}>0,} & {y_{i}=+1} \\ {\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime}<0,} & {y_{i}=-1}\end{array}\right.
8 | $$
9 | By the geometric margin, this can be strengthened to:
10 | $$
11 | \left\{\begin{array}{ll}{\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime} \geq+\zeta,} & {y_{i}=+1} \\ {\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime} \leq-\zeta,} & {y_{i}=-1}\end{array}\right.
12 | $$
13 | where $\zeta$ is some positive constant. Dividing both sides by $\zeta$:
14 | $$
15 | \left\{\begin{array}{ll}{\left(\frac{1}{\zeta} \boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+\frac{b^{\prime}}{\zeta} \geq+1,} & {y_{i}=+1} \\ {\left(\frac{1}{\zeta} \boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+\frac{b^{\prime}}{\zeta} \leq-1,} & {y_{i}=-1}\end{array}\right.
16 | $$
17 | Letting $\boldsymbol{w}=\frac{1}{\zeta} \boldsymbol{w}^{\prime}, b=\frac{b^{\prime}}{\zeta}$, the relations become:
18 | $$
19 | \left\{\begin{array}{ll}{\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b \geq+1,} & {y_{i}=+1} \\ {\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b \leq-1,} & {y_{i}=-1}\end{array}\right.
20 | $$
21 |
22 | ## 6.8
23 | $$
24 | L(\boldsymbol{w}, b, \boldsymbol{\alpha})=\frac{1}{2}\|\boldsymbol{w}\|^{2}+\sum_{i=1}^{m} \alpha_{i}\left(1-y_{i}\left(\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b\right)\right)
25 | $$
26 | [Derivation]:
27 | The generic constrained problem:
28 | $$\begin{aligned}
29 | \min_{\boldsymbol{x}}\quad f(\boldsymbol{x})\\
30 | s.t.\quad h(\boldsymbol{x})&=0\\
31 | g(\boldsymbol{x}) &\leq 0
32 | \end{aligned}$$
33 |
34 | Equality and inequality constraints: $h(\boldsymbol{x})=0$ and $g(\boldsymbol{x}) \leq 0$ stand for systems of equality constraints and inequality constraints, respectively.
35 |
36 | Lagrange multipliers: $\boldsymbol{\lambda}=\left(\lambda_{1}, \lambda_{2}, \ldots, \lambda_{m}\right)$ $\qquad\boldsymbol{\mu}=\left(\mu_{1}, \mu_{2}, \ldots, \mu_{n}\right)$
37 |
38 | Lagrangian: $L(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})=f(\boldsymbol{x})+\boldsymbol{\lambda} h(\boldsymbol{x})+\boldsymbol{\mu} g(\boldsymbol{x})$. For the primal problem Eq. (6.6) there are no equality constraints, the inequality constraints are $1-y_{i}(\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b)\leq 0$ with multipliers $\alpha_i\geq 0$, and specializing the Lagrangian accordingly yields Eq. (6.8).
39 |
40 | ## 6.9-6.10
41 | $$\begin{aligned}
42 | w &= \sum_{i=1}^m\alpha_iy_i\boldsymbol{x}_i \\
43 | 0 &=\sum_{i=1}^m\alpha_iy_i
44 | \end{aligned}$$
45 | [Derivation]: Eq. (6.8) can be expanded as follows:
46 | $$\begin{aligned}
47 | L(\boldsymbol{w},b,\boldsymbol{\alpha}) &= \frac{1}{2}||\boldsymbol{w}||^2+\sum_{i=1}^m\alpha_i(1-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)) \\
48 | & = \frac{1}{2}||\boldsymbol{w}||^2+\sum_{i=1}^m(\alpha_i-\alpha_iy_i \boldsymbol{w}^T\boldsymbol{x}_i-\alpha_iy_ib)\\
49 | & =\frac{1}{2}\boldsymbol{w}^T\boldsymbol{w}+\sum_{i=1}^m\alpha_i -\sum_{i=1}^m\alpha_iy_i\boldsymbol{w}^T\boldsymbol{x}_i-\sum_{i=1}^m\alpha_iy_ib
50 | \end{aligned}$$
51 | Taking the partial derivatives with respect to $\boldsymbol{w}$ and $b$ and setting them to 0:
52 |
53 | $$\frac {\partial L}{\partial \boldsymbol{w}}=\frac{1}{2}\times2\times\boldsymbol{w} + 0 - \sum_{i=1}^{m}\alpha_iy_i \boldsymbol{x}_i-0= 0 \Longrightarrow \boldsymbol{w}=\sum_{i=1}^{m}\alpha_iy_i \boldsymbol{x}_i$$
54 |
55 | $$\frac {\partial L}{\partial b}=0+0-0-\sum_{i=1}^{m}\alpha_iy_i=0 \Longrightarrow \sum_{i=1}^{m}\alpha_iy_i=0$$
56 |
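Eq. (6.9) is what lets the trained SVM be expressed through the samples; as a one-line sketch, with rows of `X` being the $\boldsymbol x_i$:

```python
import numpy as np

def recover_w(alpha, X, y):
    """Eq. (6.9): w = sum_i alpha_i * y_i * x_i; only support vectors (alpha_i > 0) contribute."""
    return X.T @ (alpha * y)
```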
57 | ## 6.11
58 | $$\begin{aligned}
59 | \max_{\boldsymbol{\alpha}} & \sum_{i=1}^m\alpha_i - \frac{1}{2}\sum_{i = 1}^m\sum_{j=1}^m\alpha_i \alpha_j y_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j \\
60 | s.t. & \sum_{i=1}^m \alpha_i y_i =0 \\
61 | & \alpha_i \geq 0 \quad i=1,2,\dots ,m
62 | \end{aligned}$$
63 | [Derivation]: Substituting Eq. (6.9) into Eq. (6.8) eliminates $\boldsymbol{w}$ and $b$ from $L(\boldsymbol{w},b,\boldsymbol{\alpha})$; adding the constraint Eq. (6.10) then yields the dual problem of Eq. (6.6):
64 | $$\begin{aligned}
65 | \min_{\boldsymbol{w},b} L(\boldsymbol{w},b,\boldsymbol{\alpha}) &=\frac{1}{2}\boldsymbol{w}^T\boldsymbol{w}+\sum_{i=1}^m\alpha_i -\sum_{i=1}^m\alpha_iy_i\boldsymbol{w}^T\boldsymbol{x}_i-\sum_{i=1}^m\alpha_iy_ib \\
66 | &=\frac {1}{2}\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i-\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i -b\sum _{i=1}^m\alpha_iy_i \\
68 | & = -\frac {1}{2}\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i -b\sum _{i=1}^m\alpha_iy_i
69 | \end{aligned}$$
70 | Since $\sum\limits_{i=1}^{m}\alpha_iy_i=0$, the last term vanishes, and:
71 | $$\begin{aligned}
72 | \min_{\boldsymbol{w},b} L(\boldsymbol{w},b,\boldsymbol{\alpha}) &= -\frac {1}{2}\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i \\
73 | &=-\frac {1}{2}(\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i)^T(\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i)+\sum _{i=1}^m\alpha_i \\
74 | &=-\frac {1}{2}\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i \\
75 | &=\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j
76 | \end{aligned}$$
77 | therefore
78 | $$\max_{\boldsymbol{\alpha}}\min_{\boldsymbol{w},b} L(\boldsymbol{w},b,\boldsymbol{\alpha}) =\max_{\boldsymbol{\alpha}} \sum_{i=1}^m\alpha_i - \frac{1}{2}\sum_{i = 1}^m\sum_{j=1}^m\alpha_i \alpha_j y_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j $$
79 |
80 |
81 |
82 |
83 | ## 6.39
84 | $$ C=\alpha_i +\mu_i $$
85 | [Derivation]: Taking the partial derivative of Eq. (6.36) with respect to $\xi_i$ and setting it to 0:
86 |
87 | $$\frac{\partial L}{\partial \xi_i}=0+C \times 1 - \alpha_i \times 1-\mu_i
88 | \times 1 =0\Longrightarrow C=\alpha_i +\mu_i$$
89 |
90 | ## 6.40
91 | $$\begin{aligned}
92 | \max_{\boldsymbol{\alpha}}&\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j \\
93 | s.t. &\sum_{i=1}^m \alpha_i y_i=0 \\
94 | & 0 \leq\alpha_i \leq C \quad i=1,2,\dots ,m
95 | \end{aligned}$$
96 | [Derivation]: Substituting Eqs. (6.37)-(6.39) into Eq. (6.36) yields the dual problem of Eq. (6.35):
97 | $$\begin{aligned}
98 | \min_{\boldsymbol{w},b,\boldsymbol{\xi}}L(\boldsymbol{w},b,\boldsymbol{\alpha},\boldsymbol{\xi},\boldsymbol{\mu}) &= \frac{1}{2}||\boldsymbol{w}||^2+C\sum_{i=1}^m \xi_i+\sum_{i=1}^m \alpha_i(1-\xi_i-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b))-\sum_{i=1}^m\mu_i \xi_i \\
99 | &=\frac{1}{2}||\boldsymbol{w}||^2+\sum_{i=1}^m\alpha_i(1-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b))+C\sum_{i=1}^m \xi_i-\sum_{i=1}^m \alpha_i \xi_i-\sum_{i=1}^m\mu_i \xi_i \\
100 | & = -\frac {1}{2}\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i +\sum_{i=1}^m C\xi_i-\sum_{i=1}^m \alpha_i \xi_i-\sum_{i=1}^m\mu_i \xi_i \\
101 | & = -\frac {1}{2}\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i +\sum_{i=1}^m (C-\alpha_i-\mu_i)\xi_i \\
102 | &=\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j
103 | \end{aligned}$$
104 | 所以
105 | $$\begin{aligned}
106 | \max_{\boldsymbol{\alpha},\boldsymbol{\mu}} \min_{\boldsymbol{w},b,\boldsymbol{\xi}}L(\boldsymbol{w},b,\boldsymbol{\alpha},\boldsymbol{\xi},\boldsymbol{\mu})&=\max_{\boldsymbol{\alpha},\boldsymbol{\mu}}\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j \\
107 | &=\max_{\boldsymbol{\alpha}}\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j
108 | \end{aligned}$$
109 | Moreover, since
110 | $$\begin{aligned}
111 | \alpha_i &\geq 0 \\
112 | \mu_i &\geq 0 \\
113 | C &= \alpha_i+\mu_i
114 | \end{aligned}$$
115 | eliminating $\mu_i$ gives the equivalent constraint:
116 | $$0 \leq\alpha_i \leq C \quad i=1,2,\dots ,m$$
117 |
118 |
119 |
--------------------------------------------------------------------------------
/docs/chapter11/chapter11.md:
--------------------------------------------------------------------------------
1 | ## 11.9
2 | $$\left \| \nabla f(\boldsymbol x{}')-\nabla f(\boldsymbol x) \right \|_{2}^{2} \leqslant L\left \| \boldsymbol x{}'-\boldsymbol x \right \|_{2}^{2} (\forall \boldsymbol x,\boldsymbol x{}'),$$
3 | [Explanation]: The *L-Lipschitz* condition: a function $f(\boldsymbol x)$ is *L*-Lipschitz if there exists a constant $L>0$ such that for all $\boldsymbol x_1,\boldsymbol x_2$ in its domain, $|f(\boldsymbol x_1)-f(\boldsymbol x_2)|\leq L\Vert\boldsymbol x_1-\boldsymbol x_2\Vert$. Intuitively, for every pair of points on the graph of $f$, the absolute slope of the connecting line is bounded by some real number $L$; the smallest such $L$ is called the *Lipschitz* constant.
4 | Rearranging Eq. (11.9) makes this easier to see:$$\frac{\left \| \nabla f(\boldsymbol x{}')-\nabla f(\boldsymbol x) \right \|_{2}^{2}}{\left \| \boldsymbol x{}'-\boldsymbol x \right \|_{2}^{2}}\leqslant L (\forall \boldsymbol x,\boldsymbol x{}'),$$
5 | and letting $\boldsymbol x{}'\to \boldsymbol x$, the bound $$\lim_{\boldsymbol x{}'\to \boldsymbol x}\frac{\left \| \nabla f(\boldsymbol x{}')-\nabla f(\boldsymbol x)\right \|_{2}^{2}}{\left \| \boldsymbol x{}'-\boldsymbol x \right \|_{2}^{2}}\leqslant L$$ says that the rate of change of the gradient of $f$ (loosely, its second derivative) is bounded by $L$.
6 | "*Lipschitz* continuity" is very common; a Zhihu answer (https://www.zhihu.com/question/51809602) gives a vivid picture of it: think of a piece of land. *Lipschitz* continuity says no slope on the land is arbitrarily steep, and how steep the steepest spot is, is precisely the *Lipschitz* constant.
7 |
8 | ## 11.10
9 |
10 | $$
11 | \hat{f}(x) \simeq f(x_{k})+\langle \nabla f(x_{k}),x-x_{k} \rangle + \frac{L}{2}\left \| x-x_{k} \right\|^{2}
12 | $$
13 |
14 | [Derivation]:
15 | $$
16 | \begin{aligned}
17 | \hat{f}(x) &\simeq f(x_{k})+\langle \nabla f(x_{k}),x-x_{k} \rangle + \frac{L}{2}\left \| x-x_{k} \right\|^{2} \\
18 | &= f(x_{k})+\langle \nabla f(x_{k}),x-x_{k} \rangle + \langle\frac{L}{2}(x-x_{k}),x-x_{k}\rangle \\
19 | &= f(x_{k})+\langle \nabla f(x_{k})+\frac{L}{2}(x-x_{k}),x-x_{k} \rangle \\
20 | &= f(x_{k})+\frac{L}{2}\langle\frac{2}{L}\nabla f(x_{k})+(x-x_{k}),x-x_{k} \rangle \\
21 | &= f(x_{k})+\frac{L}{2}\langle x-x_{k}+\frac{1}{L}\nabla f(x_{k})+\frac{1}{L}\nabla f(x_{k}),x-x_{k}+\frac{1}{L}\nabla f(x_{k})-\frac{1}{L}\nabla f(x_{k}) \rangle \\
22 | &= f(x_{k})+\frac{L}{2}\left\| x-x_{k}+\frac{1}{L}\nabla f(x_{k}) \right\|_{2}^{2} -\frac{1}{2L}\left\|\nabla f(x_{k})\right\|_{2}^{2} \\
23 | &= \frac{L}{2}\left\| x-(x_{k}-\frac{1}{L}\nabla f(x_{k})) \right\|_{2}^{2} + const \qquad (\text{since }f(x_{k})\text{ and }\nabla f(x_{k})\text{ are constants with respect to }x)
24 | \end{aligned}
25 | $$
26 |
27 | ## 11.13
28 | $$\boldsymbol x_{\boldsymbol k+\boldsymbol 1}=\underset{\boldsymbol x}{argmin}\frac{L}{2}\left \| \boldsymbol x -\boldsymbol z\right \|_{2}^{2}+\lambda \left \| \boldsymbol x \right \|_{1}$$
29 | [Derivation]: Denote the objective by $g(\boldsymbol x)$; then
30 | $$
31 | \begin{aligned}
32 | g(\boldsymbol x)
33 | & =\frac{L}{2}\left \|\boldsymbol x \boldsymbol -\boldsymbol z\right \|_{2}^{2}+\lambda \left \| \boldsymbol x \right \|_{1}\\
34 | & =\sum_{i=1}^{d}\frac{L}{2}( x^{i} -z^{i})^{2}+\sum_{i=1}^{d}\lambda \left | x^{i} \right | \\
35 | & =\sum_{i=1}^{d}(\frac{L}{2}(x^{i}-z^{i})^{2}+\lambda \left | x^{i}\right |)&
36 | \end{aligned}
37 | $$
38 | The objective $g(\boldsymbol x)$ thus splits into $d$ independent scalar functions, so Eq. (11.13) can be solved by treating the $d$ component objectives separately.
39 | For each component objective $g(x^{i})=\frac{L}{2}(x^{i}-z^{i})^{2}+\lambda \left | x^{i}\right |$, locate the extremum by differentiating:
40 | $$\frac{dg(x^{i})}{dx^{i}}=L(x^{i}-z^{i})+\lambda sgn(x^{i})$$
41 | where $$sgn(x^{i})=\left\{\begin{matrix}
42 | 1, &x^{i}>0\\
43 | -1,& x^{i}<0
44 | \end{matrix}\right.$$
45 | Setting the derivative to 0 gives $$x^{i}=z^{i}-\frac{\lambda }{L}sgn(x^{i})$$ and three cases arise:
46 | 1. If $z^{i}>\frac{\lambda }{L}$:
47 | (1) Suppose the root satisfies $x^{i}<0$; then $sgn(x^{i})=-1$, so $x^{i}=z^{i}+\frac{\lambda }{L}>0$, contradicting the assumption.
48 | (2) Suppose the root satisfies $x^{i}>0$; then $sgn(x^{i})=1$, so $x^{i}=z^{i}-\frac{\lambda }{L}>0$, which is consistent.
49 | 2. If $z^{i}<-\frac{\lambda }{L}$:
50 | (1) Suppose $x^{i}>0$; then $sgn(x^{i})=1$, so $x^{i}=z^{i}-\frac{\lambda }{L}<0$, a contradiction.
51 | (2) Suppose $x^{i}<0$; then $sgn(x^{i})=-1$, so $x^{i}=z^{i}+\frac{\lambda }{L}<0$, which is consistent.
52 | 3. If $\left |z^{i} \right |<\frac{\lambda }{L}$:
53 | (1) Suppose $x^{i}>0$; then $sgn(x^{i})=1$, so $x^{i}=z^{i}-\frac{\lambda }{L}<0$, a contradiction.
54 | (2) Suppose $x^{i}<0$; then $sgn(x^{i})=-1$, so $x^{i}=z^{i}+\frac{\lambda }{L}>0$, again a contradiction. Hence neither sign is possible, and the minimum is attained at the non-differentiable point $x^{i}=0$.
55 | In summary, the closed-form solution is:
56 | $$x_{k+1}^{i}=\left\{\begin{matrix}
57 | z^{i}-\frac{\lambda }{L}, &\frac{\lambda }{L}< z^{i}\\
58 | 0, & \left |z^{i} \right |\leqslant \frac{\lambda }{L}\\
59 | z^{i}+\frac{\lambda }{L}, & z^{i}<-\frac{\lambda }{L}
60 | \end{matrix}\right.$$
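This closed form is the well-known soft-thresholding operator; a componentwise NumPy sketch:

```python
import numpy as np

def soft_threshold(z, lam, L):
    """Closed-form solution of Eq. (11.13), applied to every component of z."""
    t = lam / L
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)   # shrink towards 0, zero inside [-t, t]
```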
61 |
62 | ## 11.18
63 | $$\begin{aligned}
64 | \underset{\boldsymbol B}{min}\left \|\boldsymbol X-\boldsymbol B\boldsymbol A \right \|_{F}^{2}
65 | & =\underset{b_{i}}{min}\left \| \boldsymbol X-\sum_{j=1}^{k}b_{j}\alpha ^{j} \right \|_{F}^{2}\\
66 | & =\underset{b_{i}}{min}\left \| \left (\boldsymbol X-\sum_{j\neq i}b_{j}\alpha ^{j} \right )- b_{i}\alpha ^{i}\right \|_{F}^{2} \\
67 | & =\underset{b_{i}}{min}\left \|\boldsymbol E_{\boldsymbol i}-b_{i}\alpha ^{i} \right \|_{F}^{2} &
68 | \end{aligned}
69 | $$
70 | [Derivation]: We only verify that $\boldsymbol B\boldsymbol A=\sum_{j=1}^{k}\boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j}$, where $\boldsymbol b_{\boldsymbol j}$ denotes the $j$-th column of $\boldsymbol B$ and $\boldsymbol \alpha ^{\boldsymbol j}$ the $j$-th row of $\boldsymbol A$.
71 | Let $b_{j}^{i}$ and $\alpha _{j}^{i}$ denote the entries of $\boldsymbol B$ and $\boldsymbol A$ in row $i$, column $j$. First compute $\boldsymbol B\boldsymbol A$:
72 | $$
73 | \begin{aligned}
74 | \boldsymbol B\boldsymbol A
75 | & =\begin{bmatrix}
76 | b_{1}^{1} &b_{2}^{1} & \cdot & \cdot & \cdot & b_{k}^{1}\\
77 | b_{1}^{2} &b_{2}^{2} & \cdot & \cdot & \cdot & b_{k}^{2}\\
78 | \cdot & \cdot & \cdot & & & \cdot \\
79 | \cdot & \cdot & & \cdot & &\cdot \\
80 | \cdot & \cdot & & & \cdot & \cdot \\
81 | b_{1}^{d}& b_{2}^{d} & \cdot & \cdot &\cdot & b_{k}^{d}
82 | \end{bmatrix}_{d\times k}\cdot
83 | \begin{bmatrix}
84 | \alpha_{1}^{1} &\alpha_{2}^{1} & \cdot & \cdot & \cdot & \alpha_{m}^{1}\\
85 | \alpha_{1}^{2} &\alpha_{2}^{2} & \cdot & \cdot & \cdot & \alpha_{m}^{2}\\
86 | \cdot & \cdot & \cdot & & & \cdot \\
87 | \cdot & \cdot & & \cdot & &\cdot \\
88 | \cdot & \cdot & & & \cdot & \cdot \\
89 | \alpha_{1}^{k}& \alpha_{2}^{k} & \cdot & \cdot &\cdot & \alpha_{m}^{k}
90 | \end{bmatrix}_{k\times m} \\
91 | & =\begin{bmatrix}
92 | \sum_{j=1}^{k}b_{j}^{1}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{1}\alpha _{2}^{j} & \cdot & \cdot & \cdot & \sum_{j=1}^{k}b_{j}^{1}\alpha _{m}^{j}\\
93 | \sum_{j=1}^{k}b_{j}^{2}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{2}\alpha _{2}^{j} & \cdot & \cdot & \cdot & \sum_{j=1}^{k}b_{j}^{2}\alpha _{m}^{j}\\
94 | \cdot & \cdot & \cdot & & & \cdot \\
95 | \cdot & \cdot & & \cdot & &\cdot \\
96 | \cdot & \cdot & & & \cdot & \cdot \\
97 | \sum_{j=1}^{k}b_{j}^{d}\alpha _{1}^{j}& \sum_{j=1}^{k}b_{j}^{d}\alpha _{2}^{j} & \cdot & \cdot &\cdot & \sum_{j=1}^{k}b_{j}^{d}\alpha _{m}^{j}
98 | \end{bmatrix}_{d\times m} &
99 | \end{aligned}
100 | $$
101 | 然后计算$\boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j}$:
102 | $$
103 | \begin{aligned}
104 | \boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j}
105 | & =\begin{bmatrix}
106 | b_{j}^{1}\\ b_{j}^{2}
107 | \\ \cdot
108 | \\ \cdot
109 | \\ \cdot
110 | \\ b_{j}^{d}
111 | \end{bmatrix}\cdot
112 | \begin{bmatrix}
113 | \alpha _{1}^{j}& \alpha _{2}^{j} & \cdot & \cdot & \cdot & \alpha _{m}^{j}
114 | \end{bmatrix}\\
115 | & =\begin{bmatrix}
116 | b_{j}^{1}\alpha _{1}^{j} &b_{j}^{1}\alpha _{2}^{j} & \cdot & \cdot & \cdot & b_{j}^{1}\alpha _{m}^{j}\\
117 | b_{j}^{2}\alpha _{1}^{j} &b_{j}^{2}\alpha _{2}^{j} & \cdot & \cdot & \cdot & b_{j}^{2}\alpha _{m}^{j}\\
118 | \cdot & \cdot & \cdot & & & \cdot \\
119 | \cdot & \cdot & & \cdot & &\cdot \\
120 | \cdot & \cdot & & & \cdot & \cdot \\
121 | b_{j}^{d}\alpha _{1}^{j}& b_{j}^{d}\alpha _{2}^{j} & \cdot & \cdot &\cdot & b_{j}^{d}\alpha _{m}^{j}
122 | \end{bmatrix}_{d\times m} &
123 | \end{aligned}
124 | $$
125 | Summing over $j$:
126 | $$
127 | \begin{aligned}
128 | \sum_{j=1}^{k}\boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j}
129 | & = \sum_{j=1}^{k}\left (\begin{bmatrix}
130 | b_{j}^{1}\\ b_{j}^{2}
131 | \\ \cdot
132 | \\ \cdot
133 | \\ \cdot
134 | \\ b_{j}^{d}
135 | \end{bmatrix}\cdot
136 | \begin{bmatrix}
137 | \alpha _{1}^{j}& \alpha _{2}^{j} & \cdot & \cdot & \cdot & \alpha _{m}^{j}
138 | \end{bmatrix} \right )\\
139 | & =\begin{bmatrix}
140 | \sum_{j=1}^{k}b_{j}^{1}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{1}\alpha _{2}^{j} & \cdot & \cdot & \cdot & \sum_{j=1}^{k}b_{j}^{1}\alpha _{m}^{j}\\
141 | \sum_{j=1}^{k}b_{j}^{2}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{2}\alpha _{2}^{j} & \cdot & \cdot & \cdot & \sum_{j=1}^{k}b_{j}^{2}\alpha _{m}^{j}\\
142 | \cdot & \cdot & \cdot & & & \cdot \\
143 | \cdot & \cdot & & \cdot & &\cdot \\
144 | \cdot & \cdot & & & \cdot & \cdot \\
145 | \sum_{j=1}^{k}b_{j}^{d}\alpha _{1}^{j}& \sum_{j=1}^{k}b_{j}^{d}\alpha _{2}^{j} & \cdot & \cdot &\cdot & \sum_{j=1}^{k}b_{j}^{d}\alpha _{m}^{j}
146 | \end{bmatrix}_{d\times m} &
147 | \end{aligned}
148 | $$
149 |
--------------------------------------------------------------------------------
/docs/chapter3/chapter3.md:
--------------------------------------------------------------------------------
1 | ## 3.7
2 |
3 | $$ w=\cfrac{\sum_{i=1}^{m}y_i(x_i-\bar{x})}{\sum_{i=1}^{m}x_i^2-\cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2} $$
4 |
5 | [Derivation]: Setting Eq. (3.5) equal to 0:
6 | $$ 0 = w\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i $$
7 | $$ w\sum_{i=1}^{m}x_i^2 = \sum_{i=1}^{m}y_ix_i-\sum_{i=1}^{m}bx_i $$
8 | Setting Eq. (3.6) equal to 0 gives $ b=\cfrac{1}{m}\sum_{i=1}^{m}(y_i-wx_i) $; since $ \cfrac{1}{m}\sum_{i=1}^{m}y_i=\bar{y} $ and $ \cfrac{1}{m}\sum_{i=1}^{m}x_i=\bar{x} $, we have $ b=\bar{y}-w\bar{x} $. Substituting into the equation above:
9 | $$
10 | \begin{aligned}
11 | w\sum_{i=1}^{m}x_i^2 & = \sum_{i=1}^{m}y_ix_i-\sum_{i=1}^{m}(\bar{y}-w\bar{x})x_i \\
12 | w\sum_{i=1}^{m}x_i^2 & = \sum_{i=1}^{m}y_ix_i-\bar{y}\sum_{i=1}^{m}x_i+w\bar{x}\sum_{i=1}^{m}x_i \\
13 | w(\sum_{i=1}^{m}x_i^2-\bar{x}\sum_{i=1}^{m}x_i) & = \sum_{i=1}^{m}y_ix_i-\bar{y}\sum_{i=1}^{m}x_i \\
14 | w & = \cfrac{\sum_{i=1}^{m}y_ix_i-\bar{y}\sum_{i=1}^{m}x_i}{\sum_{i=1}^{m}x_i^2-\bar{x}\sum_{i=1}^{m}x_i}
15 | \end{aligned}
16 | $$
17 | Since $ \bar{y}\sum_{i=1}^{m}x_i=\cfrac{1}{m}\sum_{i=1}^{m}y_i\sum_{i=1}^{m}x_i=\bar{x}\sum_{i=1}^{m}y_i $ and $ \bar{x}\sum_{i=1}^{m}x_i=\cfrac{1}{m}\sum_{i=1}^{m}x_i\sum_{i=1}^{m}x_i=\cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2 $, substituting these yields Eq. (3.7):
18 | $$ w=\cfrac{\sum_{i=1}^{m}y_i(x_i-\bar{x})}{\sum_{i=1}^{m}x_i^2-\cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2} $$
19 |
20 | [Note]: Eq. (3.7) can be simplified further into a vector form. Substituting $ \cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2=\bar{x}\sum_{i=1}^{m}x_i $ into the denominator:
21 | $$
22 | \begin{aligned}
23 | w & = \cfrac{\sum_{i=1}^{m}y_i(x_i-\bar{x})}{\sum_{i=1}^{m}x_i^2-\bar{x}\sum_{i=1}^{m}x_i} \\
24 | & = \cfrac{\sum_{i=1}^{m}(y_ix_i-y_i\bar{x})}{\sum_{i=1}^{m}(x_i^2-x_i\bar{x})}
25 | \end{aligned}
26 | $$
27 | Since $ \bar{y}\sum_{i=1}^{m}x_i=\bar{x}\sum_{i=1}^{m}y_i=\sum_{i=1}^{m}\bar{y}x_i=\sum_{i=1}^{m}\bar{x}y_i=m\bar{x}\bar{y}=\sum_{i=1}^{m}\bar{x}\bar{y} $, the expression becomes:
28 | $$
29 | \begin{aligned}
30 | w & = \cfrac{\sum_{i=1}^{m}(y_ix_i-y_i\bar{x}-x_i\bar{y}+\bar{x}\bar{y})}{\sum_{i=1}^{m}(x_i^2-x_i\bar{x}-x_i\bar{x}+\bar{x}^2)} \\
31 | & = \cfrac{\sum_{i=1}^{m}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{m}(x_i-\bar{x})^2}
32 | \end{aligned}
33 | $$
34 | If we let $ \boldsymbol{x}=(x_1,x_2,...,x_m) $, with $ \boldsymbol{x}_{d} $ denoting the mean-centred $ \boldsymbol{x} $, and $ \boldsymbol{y}=(y_1,y_2,...,y_m) $, with $ \boldsymbol{y}_{d} $ denoting the mean-centred $ \boldsymbol{y} $, where $ \boldsymbol{x} $, $ \boldsymbol{x}_{d} $, $ \boldsymbol{y} $, $ \boldsymbol{y}_{d} $ are all $m\times 1$ column vectors, then substituting gives:
35 | $$ w=\cfrac{\boldsymbol{y}_{d}^T\boldsymbol{x}_{d}}{\boldsymbol{x}_d^T\boldsymbol{x}_{d}}$$
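The vector form translates directly into code; a small sketch with illustrative data:

```python
import numpy as np

def slope(x, y):
    """Vectorized Eq. (3.7): w = y_d^T x_d / (x_d^T x_d) with mean-centred x_d, y_d."""
    xd, yd = x - x.mean(), y - y.mean()
    return (yd @ xd) / (xd @ xd)

x = np.array([0.0, 1.0, 2.0, 3.0])
slope(x, 3 * x + 2)    # recovers w = 3.0 for the line y = 3x + 2
```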
36 | ## 3.10
37 |
38 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}=2\mathbf{X}^T(\mathbf{X}\hat{\boldsymbol w}-\boldsymbol{y}) $$
39 |
40 | [Derivation]: Expanding $ E_{\hat{\boldsymbol w}}=(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol w})^T(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol w}) $ gives:
41 | $$ E_{\hat{\boldsymbol w}}= \boldsymbol{y}^T\boldsymbol{y}-\boldsymbol{y}^T\mathbf{X}\hat{\boldsymbol w}-\hat{\boldsymbol w}^T\mathbf{X}^T\boldsymbol{y}+\hat{\boldsymbol w}^T\mathbf{X}^T\mathbf{X}\hat{\boldsymbol w} $$
42 | Differentiating with respect to $ \hat{\boldsymbol w} $:
43 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}= \cfrac{\partial \boldsymbol{y}^T\boldsymbol{y}}{\partial \hat{\boldsymbol w}}-\cfrac{\partial \boldsymbol{y}^T\mathbf{X}\hat{\boldsymbol w}}{\partial \hat{\boldsymbol w}}-\cfrac{\partial \hat{\boldsymbol w}^T\mathbf{X}^T\boldsymbol{y}}{\partial \hat{\boldsymbol w}}+\cfrac{\partial \hat{\boldsymbol w}^T\mathbf{X}^T\mathbf{X}\hat{\boldsymbol w}}{\partial \hat{\boldsymbol w}} $$
44 | By the vector-derivative identities $\frac{\partial \boldsymbol{a}^T\boldsymbol{x}}{\partial \boldsymbol{x}}=\frac{\partial \boldsymbol{x}^T\boldsymbol{a}}{\partial \boldsymbol{x}}=\boldsymbol{a}$ and $\frac{\partial \boldsymbol{x}^T\mathbf{A}\boldsymbol{x}}{\partial \boldsymbol{x}}=(\mathbf{A}+\mathbf{A}^T)\boldsymbol{x}$:
45 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}= 0-\mathbf{X}^T\boldsymbol{y}-\mathbf{X}^T\boldsymbol{y}+(\mathbf{X}^T\mathbf{X}+\mathbf{X}^T\mathbf{X})\hat{\boldsymbol w} $$
46 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}=2\mathbf{X}^T(\mathbf{X}\hat{\boldsymbol w}-\boldsymbol{y}) $$
47 |
48 | ## 3.27
49 |
50 | $$ l(\beta)=\sum_{i=1}^{m}(-y_i\beta^T\hat{\boldsymbol x}_i+\ln(1+e^{\beta^T\hat{\boldsymbol x}_i})) $$
51 |
52 | [Derivation]: Substituting Eq. (3.26) into Eq. (3.25) gives:
53 | $$ l(\beta)=\sum_{i=1}^{m}\ln\left(y_ip_1(\hat{\boldsymbol x}_i;\beta)+(1-y_i)p_0(\hat{\boldsymbol x}_i;\beta)\right) $$
54 | where $ p_1(\hat{\boldsymbol x}_i;\beta)=\cfrac{e^{\beta^T\hat{\boldsymbol x}_i}}{1+e^{\beta^T\hat{\boldsymbol x}_i}},p_0(\hat{\boldsymbol x}_i;\beta)=\cfrac{1}{1+e^{\beta^T\hat{\boldsymbol x}_i}} $. Substituting these in:
55 | $$\begin{aligned}
56 | l(\beta)&=\sum_{i=1}^{m}\ln\left(\cfrac{y_ie^{\beta^T\hat{\boldsymbol x}_i}+1-y_i}{1+e^{\beta^T\hat{\boldsymbol x}_i}}\right) \\
57 | &=\sum_{i=1}^{m}\left(\ln(y_ie^{\beta^T\hat{\boldsymbol x}_i}+1-y_i)-\ln(1+e^{\beta^T\hat{\boldsymbol x}_i})\right)
58 | \end{aligned}$$
59 | Since $ y_i $ is either 0 or 1:
60 | $$ l(\beta) =
61 | \begin{cases}
62 | \sum_{i=1}^{m}(-\ln(1+e^{\beta^T\hat{\boldsymbol x}_i})), & y_i=0 \\
63 | \sum_{i=1}^{m}(\beta^T\hat{\boldsymbol x}_i-\ln(1+e^{\beta^T\hat{\boldsymbol x}_i})), & y_i=1
64 | \end{cases} $$
65 | Combining the two cases:
66 | $$ l(\beta)=\sum_{i=1}^{m}\left(y_i\beta^T\hat{\boldsymbol x}_i-\ln(1+e^{\beta^T\hat{\boldsymbol x}_i})\right) $$
67 | This is still the likelihood function of maximum likelihood estimation, so maximizing it is equivalent to minimizing its negative; prepending a minus sign yields Eq. (3.27).
68 |
69 | [Note]: If the likelihood term in Eq. (3.26) is instead written as $ p(y_i|\boldsymbol x_i;\boldsymbol w,b)=[p_1(\hat{\boldsymbol x}_i;\beta)]^{y_i}[p_0(\hat{\boldsymbol x}_i;\beta)]^{1-y_i} $ and substituted into Eq. (3.25), one obtains:
70 | $$ l(\beta)=\sum_{i=1}^{m}\left(y_i\ln(p_1(\hat{\boldsymbol x}_i;\beta))+(1-y_i)\ln(p_0(\hat{\boldsymbol x}_i;\beta))\right) $$
71 | from which Eq. (3.27) follows even more directly.
72 |
73 | ## 3.30
74 |
75 | $$\frac{\partial l(\beta)}{\partial \beta}=-\sum_{i=1}^{m}\hat{\boldsymbol x}_i(y_i-p_1(\hat{\boldsymbol x}_i;\beta))$$
76 |
77 | [Explanation]: This expression can be vectorized. Let $p_1(\hat{\boldsymbol x}_i;\beta)=\hat{y}_i$; substituting:
78 | $$\begin{aligned}
79 | \frac{\partial l(\beta)}{\partial \beta} &= -\sum_{i=1}^{m}\hat{\boldsymbol x}_i(y_i-\hat{y}_i) \\
80 | & =\sum_{i=1}^{m}\hat{\boldsymbol x}_i(\hat{y}_i-y_i) \\
81 | & ={\boldsymbol X^T}(\hat{\boldsymbol y}-\boldsymbol{y}) \\
82 | & ={\boldsymbol X^T}(p_1(\boldsymbol X;\beta)-\boldsymbol{y}) \\
83 | \end{aligned}$$
84 |
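The vectorized gradient is a one-liner; a sketch assuming rows of `X_hat` are the $\hat{\boldsymbol x}_i^T$:

```python
import numpy as np

def logistic_grad(beta, X_hat, y):
    """Vectorized Eq. (3.30): gradient = X^T (p1 - y), where p1_i = p_1(x_hat_i; beta)."""
    p1 = 1.0 / (1.0 + np.exp(-(X_hat @ beta)))   # sigmoid of X_hat @ beta
    return X_hat.T @ (p1 - y)
```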
85 | ## 3.32
86 |
87 | $$J=\cfrac{\boldsymbol w^T(\mu_0-\mu_1)(\mu_0-\mu_1)^T\boldsymbol w}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w}$$
88 |
89 | [Derivation]:
90 | $$\begin{aligned}
91 | J &= \cfrac{\big|\big|\boldsymbol w^T\mu_0-\boldsymbol w^T\mu_1\big|\big|_2^2}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\
92 | &= \cfrac{\big|\big|(\boldsymbol w^T\mu_0-\boldsymbol w^T\mu_1)^T\big|\big|_2^2}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\
93 | &= \cfrac{\big|\big|(\mu_0-\mu_1)^T\boldsymbol w\big|\big|_2^2}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\
94 | &= \cfrac{[(\mu_0-\mu_1)^T\boldsymbol w]^T(\mu_0-\mu_1)^T\boldsymbol w}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\
95 | &= \cfrac{\boldsymbol w^T(\mu_0-\mu_1)(\mu_0-\mu_1)^T\boldsymbol w}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w}
96 | \end{aligned}$$
97 |
98 | ## 3.37
99 |
100 | $$\boldsymbol S_b\boldsymbol w=\lambda\boldsymbol S_w\boldsymbol w$$
101 |
102 | [Derivation]: From Eq. (3.36), the Lagrangian is:
103 | $$l(\boldsymbol w)=-\boldsymbol w^T\boldsymbol S_b\boldsymbol w+\lambda(\boldsymbol w^T\boldsymbol S_w\boldsymbol w-1)$$
104 | Differentiating with respect to $\boldsymbol w$:
105 | $$\begin{aligned}
106 | \cfrac{\partial l(\boldsymbol w)}{\partial \boldsymbol w} &= -\cfrac{\partial(\boldsymbol w^T\boldsymbol S_b\boldsymbol w)}{\partial \boldsymbol w}+\lambda \cfrac{\partial(\boldsymbol w^T\boldsymbol S_w\boldsymbol w-1)}{\partial \boldsymbol w} \\
107 | &= -(\boldsymbol S_b+\boldsymbol S_b^T)\boldsymbol w+\lambda(\boldsymbol S_w+\boldsymbol S_w^T)\boldsymbol w
108 | \end{aligned}$$
109 | Since $\boldsymbol S_b=\boldsymbol S_b^T,\boldsymbol S_w=\boldsymbol S_w^T$:
110 | $$\cfrac{\partial l(\boldsymbol w)}{\partial \boldsymbol w} = -2\boldsymbol S_b\boldsymbol w+2\lambda\boldsymbol S_w\boldsymbol w$$
111 | Setting the derivative to 0 yields Eq. (3.37).
112 |
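In practice Eq. (3.37) is solved as a generalized eigenproblem; a sketch using SciPy, where the normalization at the end is our illustrative choice (the scale of $\boldsymbol w$ is irrelevant for LDA):

```python
import numpy as np
from scipy.linalg import eig

def lda_direction(S_b, S_w):
    """Eq. (3.37): solve S_b w = lambda S_w w and keep the eigenvector of the largest lambda."""
    vals, vecs = eig(S_b, S_w)                      # generalized eigenvalue problem
    w = np.real(vecs[:, np.argmax(np.real(vals))])
    return w / np.linalg.norm(w)
```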
113 | ## 3.43
114 |
115 | $$\begin{aligned}
116 | \boldsymbol S_b &= \boldsymbol S_t - \boldsymbol S_w \\
117 | &= \sum_{i=1}^N m_i(\boldsymbol\mu_i-\boldsymbol\mu)(\boldsymbol\mu_i-\boldsymbol\mu)^T
118 | \end{aligned}$$
119 | [Derivation]: From Eqs. (3.40), (3.41) and (3.42):
120 | $$\begin{aligned}
121 | \boldsymbol S_b &= \boldsymbol S_t - \boldsymbol S_w \\
122 | &= \sum_{i=1}^m(\boldsymbol x_i-\boldsymbol\mu)(\boldsymbol x_i-\boldsymbol\mu)^T-\sum_{i=1}^N\sum_{\boldsymbol x\in X_i}(\boldsymbol x-\boldsymbol\mu_i)(\boldsymbol x-\boldsymbol\mu_i)^T \\
123 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left((\boldsymbol x-\boldsymbol\mu)(\boldsymbol x-\boldsymbol\mu)^T-(\boldsymbol x-\boldsymbol\mu_i)(\boldsymbol x-\boldsymbol\mu_i)^T\right)\right) \\
124 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left((\boldsymbol x-\boldsymbol\mu)(\boldsymbol x^T-\boldsymbol\mu^T)-(\boldsymbol x-\boldsymbol\mu_i)(\boldsymbol x^T-\boldsymbol\mu_i^T)\right)\right) \\
125 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left(\boldsymbol x\boldsymbol x^T - \boldsymbol x\boldsymbol\mu^T-\boldsymbol\mu\boldsymbol x^T+\boldsymbol\mu\boldsymbol\mu^T-\boldsymbol x\boldsymbol x^T+\boldsymbol x\boldsymbol\mu_i^T+\boldsymbol\mu_i\boldsymbol x^T-\boldsymbol\mu_i\boldsymbol\mu_i^T\right)\right) \\
126 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left(- \boldsymbol x\boldsymbol\mu^T-\boldsymbol\mu\boldsymbol x^T+\boldsymbol\mu\boldsymbol\mu^T+\boldsymbol x\boldsymbol\mu_i^T+\boldsymbol\mu_i\boldsymbol x^T-\boldsymbol\mu_i\boldsymbol\mu_i^T\right)\right) \\
127 | &= \sum_{i=1}^N\left(-\sum_{\boldsymbol x\in X_i}\boldsymbol x\boldsymbol\mu^T-\sum_{\boldsymbol x\in X_i}\boldsymbol\mu\boldsymbol x^T+\sum_{\boldsymbol x\in X_i}\boldsymbol\mu\boldsymbol\mu^T+\sum_{\boldsymbol x\in X_i}\boldsymbol x\boldsymbol\mu_i^T+\sum_{\boldsymbol x\in X_i}\boldsymbol\mu_i\boldsymbol x^T-\sum_{\boldsymbol x\in X_i}\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
128 | &= \sum_{i=1}^N\left(-m_i\boldsymbol\mu_i\boldsymbol\mu^T-m_i\boldsymbol\mu\boldsymbol\mu_i^T+m_i\boldsymbol\mu\boldsymbol\mu^T+m_i\boldsymbol\mu_i\boldsymbol\mu_i^T+m_i\boldsymbol\mu_i\boldsymbol\mu_i^T-m_i\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
129 | &= \sum_{i=1}^N\left(-m_i\boldsymbol\mu_i\boldsymbol\mu^T-m_i\boldsymbol\mu\boldsymbol\mu_i^T+m_i\boldsymbol\mu\boldsymbol\mu^T+m_i\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
130 | &= \sum_{i=1}^Nm_i\left(-\boldsymbol\mu_i\boldsymbol\mu^T-\boldsymbol\mu\boldsymbol\mu_i^T+\boldsymbol\mu\boldsymbol\mu^T+\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
131 | &= \sum_{i=1}^N m_i(\boldsymbol\mu_i-\boldsymbol\mu)(\boldsymbol\mu_i-\boldsymbol\mu)^T
132 | \end{aligned}$$
133 |
134 | ## 3.44
135 | $$\max\limits_{\mathbf{W}}\cfrac{
136 | tr(\mathbf{W}^T\boldsymbol S_b \mathbf{W})}{tr(\mathbf{W}^T\boldsymbol S_w \mathbf{W})}$$
137 | [Explanation]: This is the generalized form of Eq. (3.35). Proof:
138 | Let $\mathbf{W}=[\boldsymbol w_1,\boldsymbol w_2,...,\boldsymbol w_i,...,\boldsymbol w_{N-1}]$, where each $\boldsymbol w_i$ is a $d\times 1$ column vector; then:
139 | $$\left\{
140 | \begin{aligned}
141 | tr(\mathbf{W}^T\boldsymbol S_b \mathbf{W})&=\sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_b \boldsymbol w_i \\
142 | tr(\mathbf{W}^T\boldsymbol S_w \mathbf{W})&=\sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_w \boldsymbol w_i
143 | \end{aligned}
144 | \right.$$
145 | So Eq. (3.44) can be rewritten as:
146 | $$\max\limits_{\mathbf{W}}\cfrac{
147 | \sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_b \boldsymbol w_i}{\sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_w \boldsymbol w_i}$$
148 | Comparing with Eq. (3.35), this is clearly its generalized form.
149 |
--------------------------------------------------------------------------------
/docs/chapter9/chapter9.md:
--------------------------------------------------------------------------------
1 | ## 9.33
2 |
3 | $$
4 | \sum_{j=1}^m \frac{\alpha_{i}\cdot p\left(\boldsymbol{x_{j}}|\boldsymbol\mu _{i},\boldsymbol\Sigma_{i}\right)}{\sum_{l=1}^k \alpha_{l}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{l},\boldsymbol\Sigma_{l})}(\boldsymbol{x_{j}-\mu_{i}})=0
5 | $$
6 |
7 | [Derivation]: From Eq. (9.28):
8 | $$
9 | p(\boldsymbol{x}_{j}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})=\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}
10 | $$
11 |
12 |
13 | Also, from Eq. (9.32), setting
14 | $$
15 | \frac {\partial LL(D)}{\partial \boldsymbol\mu_{i}}=0
16 | $$
17 | we obtain
18 | $$\begin{aligned}
19 | \frac {\partial LL(D)}{\partial\boldsymbol\mu_{i}}&=\frac {\partial}{\partial \boldsymbol\mu_{i}}\sum_{j=1}^mln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\
20 | &=\sum_{j=1}^m\frac{\partial}{\partial\boldsymbol\mu_{i}}ln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\
21 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol{\mu_{i}}}(p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}}))}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})} \\
22 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot \frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}\frac{\partial}{\partial \boldsymbol\mu_{i}}\left(-\frac{1}{2}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
23 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(\left(\boldsymbol\Sigma_{i}^{-1}+\left(\boldsymbol\Sigma_{i}^{-1}\right)^T\right)\cdot\left(\boldsymbol{x_{j}-\mu_{i}}\right)\cdot(-1)\right) \\
24 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)-\left(\boldsymbol\Sigma_{i}^{-1}\right)^T\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
25 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)-\left(\boldsymbol\Sigma_{i}^T\right)^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
26 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
27 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-2\cdot\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
28 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot p\left(\boldsymbol{x_{j}}|\boldsymbol\mu _{i},\boldsymbol\Sigma_{i}\right)}{\sum_{l=1}^k \alpha_{l}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{l},\boldsymbol\Sigma_{l})}\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}}) \\
29 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot p\left(\boldsymbol{x_{j}}|\boldsymbol\mu _{i},\boldsymbol\Sigma_{i}\right)}{\sum_{l=1}^k \alpha_{l}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{l},\boldsymbol\Sigma_{l})}(\boldsymbol{x_{j}-\mu_{i}})=0
30 | \end{aligned}$$
31 |
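Setting this derivative to zero and rearranging yields the mean update of Eq. (9.34), $\boldsymbol\mu_i=\frac{\sum_{j=1}^m\gamma_{ji}\boldsymbol x_j}{\sum_{j=1}^m\gamma_{ji}}$. A minimal sketch of the corresponding E-step and mean update (assuming numpy/scipy; `multivariate_normal.pdf` stands in for Eq. (9.28), and all names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, alphas, mus, Sigmas):
    """E-step, Eq. (9.30): gamma[j, i] = alpha_i * p(x_j | mu_i, Sigma_i) / sum_l (...)."""
    dens = np.column_stack([a * multivariate_normal.pdf(X, m, S)
                            for a, m, S in zip(alphas, mus, Sigmas)])
    return dens / dens.sum(axis=1, keepdims=True)

def update_means(X, gamma):
    """M-step for the means, Eq. (9.34): responsibility-weighted average of the samples."""
    return (gamma.T @ X) / gamma.sum(axis=0)[:, None]
```

The denominator $\sum_{j=1}^m\gamma_{ji}$ is the effective number of samples currently assigned to component $i$.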
32 | ## 9.35
33 |
34 | $$
35 | \boldsymbol\Sigma_{i}=\frac{\sum_{j=1}^m\gamma_{ji}(\boldsymbol{x_{j}-\mu_{i}})(\boldsymbol{x_{j}-\mu_{i}})^T}{\sum_{j=1}^m\gamma_{ji}}
36 | $$
37 |
38 | [Derivation]: From Eq. (9.28):
39 | $$
40 | p(\boldsymbol{x}_{j}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})=\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}
41 | $$
42 | Also, from Eq. (9.32), setting
43 | $$
44 | \frac {\partial LL(D)}{\partial \boldsymbol\Sigma_{i}}=0
45 | $$
46 | we obtain
47 | $$\begin{aligned}
48 | \frac {\partial LL(D)}{\partial\boldsymbol\Sigma_{i}}&=\frac {\partial}{\partial \boldsymbol\Sigma_{i}}\sum_{j=1}^mln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\
49 | &=\sum_{j=1}^m\frac{\partial}{\partial\boldsymbol\Sigma_{i}}ln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\
50 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})} \\
51 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\\
52 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}e^{ln\left(\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}\right)}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})} \\
53 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}e^{-\frac{n}{2}ln\left(2\pi\right)-\frac{1}{2}ln\left(|\boldsymbol\Sigma_{i}|\right)-\frac{1}{2}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})} \\
54 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(-\frac{n}{2}ln\left(2\pi\right)-\frac{1}{2}ln\left(|\boldsymbol\Sigma_{i}|\right)-\frac{1}{2}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
55 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\left(-\frac{1}{2}\left(\boldsymbol\Sigma_{i}^{-1}\right)^T-\frac{1}{2}\frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right)
56 | \end{aligned}$$
57 |
58 | To evaluate
59 | $$
60 | \frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)
61 | $$
62 |
63 | we first consider the derivative with respect to a single element of $\boldsymbol \Sigma_{i}$: let $r$ denote the row index and $c$ the column index of $\boldsymbol\Sigma_{i}$; then
64 | $$\begin{aligned}
65 | \frac{\partial}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)&=\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\frac{\partial\boldsymbol\Sigma_{i}^{-1}}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right) \\
66 | &=-\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)
67 | \end{aligned}$$
68 | Let $B=\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)$; then
69 | $$\begin{aligned}
70 | B^T&=\left(\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right)^T \\
71 | &=\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\left(\boldsymbol\Sigma_{i}^{-1}\right)^T \\
72 | &=\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}
73 | \end{aligned}$$
74 | Therefore,
75 | $$\begin{aligned}
76 | \frac{\partial}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)=-B^T\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}B\end{aligned}$$
77 | where $B$ is an $n\times1$ matrix and $\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}$ is an $n\times n$ matrix whose element at position $\left(r,c\right)$ is 1 and whose other elements are all $0$, so
78 | $$\begin{aligned}
79 | \frac{\partial}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)=-B^T\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}B=-B_{r}\cdot B_{c}=-\left(B\cdot B^T\right)_{rc}=\left(-B\cdot B^T\right)_{rc}\end{aligned}$$
80 | That is, the derivative with respect to the element at a given position of $\boldsymbol\Sigma_{i}$ equals the element of $\left(-B\cdot B^T\right)$ at the same position, so
81 | $$\begin{aligned}
82 | \frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)&=-B\cdot B^T\\
83 | &=-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\left(\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right)^T\\
84 | &=-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}
85 | \end{aligned}$$
86 |
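This element-wise argument can be spot-checked with finite differences, perturbing one entry of $\boldsymbol\Sigma_{i}$ at a time (a minimal sketch assuming numpy; `d` stands in for $\boldsymbol{x_{j}-\mu_{i}}$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)        # a well-conditioned positive-definite matrix
d = rng.normal(size=n)                 # stands in for x_j - mu_i

f = lambda S: d @ np.linalg.inv(S) @ d
B = np.linalg.inv(Sigma) @ d
analytic = -np.outer(B, B)             # -B B^T, as derived above

eps = 1e-6
numeric = np.zeros((n, n))
for r in range(n):
    for c in range(n):
        Sp = Sigma.copy()
        Sp[r, c] += eps                # perturb a single element (r, c)
        numeric[r, c] = (f(Sp) - f(Sigma)) / eps

assert np.allclose(numeric, analytic, atol=1e-4)
```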
87 | Hence the final result is
88 | $$
89 | \frac {\partial LL(D)}{\partial \boldsymbol\Sigma_{i}}=\sum_{j=1}^m \frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\left( -\frac{1}{2}\left(\boldsymbol\Sigma_{i}^{-1}-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\right)\right)=0
90 | $$
91 |
92 | Rearranging yields
93 | $$
94 | \boldsymbol\Sigma_{i}=\frac{\sum_{j=1}^m\gamma_{ji}(\boldsymbol{x_{j}-\mu_{i}})(\boldsymbol{x_{j}-\mu_{i}})^T}{\sum_{j=1}^m\gamma_{ji}}
95 | $$
96 |
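In code, the resulting update of Eq. (9.35) is a responsibility-weighted sample covariance (a minimal sketch assuming numpy; `gamma` is the responsibility matrix of Eq. (9.30) and `mus` the current component means, as in the sketch under 9.33):

```python
import numpy as np

def update_covariances(X, gamma, mus):
    """M-step for the covariances, Eq. (9.35)."""
    Sigmas = []
    for i in range(gamma.shape[1]):
        d = X - mus[i]                                   # rows are x_j - mu_i
        Sigmas.append((gamma[:, i, None] * d).T @ d / gamma[:, i].sum())
    return Sigmas
```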
97 | ## 9.38
98 |
99 | $$
100 | \alpha_{i}=\frac{1}{m}\sum_{j=1}^m\gamma_{ji}
101 | $$
102 |
103 | [Derivation]: Transform Eq. (9.37) by a sequence of identities:
104 | $$
105 | \sum_{j=1}^m\frac{p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\lambda=0
106 | $$
107 |
108 | $$
109 | \Rightarrow\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\alpha_{i}\lambda=0
110 | $$
111 |
112 | Summing over all mixture components:
113 | $$
114 | \Rightarrow\sum_{i=1}^k\left(\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\alpha_{i}\lambda\right)=0
115 | $$
116 |
117 | $$
118 | \Rightarrow\sum_{i=1}^k\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\sum_{i=1}^k\alpha_{i}\lambda=0
119 | $$
120 |
121 | $$
122 | \Rightarrow\lambda=-\sum_{i=1}^k\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}=-m
123 | $$
124 |
125 | Moreover, from
126 | $$
127 | \sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\alpha_{i}\lambda=0
128 | $$
129 |
130 | $$
131 | \Rightarrow\sum_{j=1}^m\gamma_{ji}+\alpha_{i}\lambda=0
132 | $$
133 |
134 | $$
135 | \Rightarrow\alpha_{i}=-\frac{\sum_{j=1}^m\gamma_{ji}}{\lambda}=\frac{1}{m}\sum_{j=1}^m\gamma_{ji}
136 | $$
137 |
138 |
139 |
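In code this update is just a column mean of the responsibility matrix (a minimal sketch; `gamma` as in Eq. (9.30)):

```python
import numpy as np

def update_alphas(gamma):
    """M-step for the mixing coefficients, Eq. (9.38)."""
    return gamma.mean(axis=0)          # alpha_i = (1/m) * sum_j gamma_ji
```

Together with the mean and covariance updates sketched above, this completes one EM iteration.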
140 | ## Appendix
141 | Reference formulas
142 | $$
143 | \frac{\partial\boldsymbol x^TB\boldsymbol x}{\partial\boldsymbol x}=\left(B+B^T\right)\boldsymbol x
144 | $$
145 | $$
146 | \frac{\partial}{\partial A}ln|A|=\left(A^{-1}\right)^T
147 | $$
148 | $$
149 | \frac{\partial}{\partial x}\left(A^{-1}\right)=-A^{-1}\frac{\partial A}{\partial x}A^{-1}
150 | $$
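The second identity, for example, can be verified numerically by finite differences (a minimal sketch assuming numpy; `slogdet` returns $ln|A|$ in a numerically stable way):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 3 * np.eye(3)
analytic = np.linalg.inv(A).T          # (A^{-1})^T

eps = 1e-6
numeric = np.zeros_like(A)
for r in range(3):
    for c in range(3):
        Ap = A.copy()
        Ap[r, c] += eps
        numeric[r, c] = (np.linalg.slogdet(Ap)[1] - np.linalg.slogdet(A)[1]) / eps

assert np.allclose(numeric, analytic, atol=1e-4)
```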
151 | References
152 | Petersen, K. B. & Pedersen, M. S. *The Matrix Cookbook*.
153 | Bishop, C. M. (2006). *Pattern Recognition and Machine Learning*. Springer.
154 |
155 |
156 |
--------------------------------------------------------------------------------
/docs/chapter13/chapter13.md:
--------------------------------------------------------------------------------
1 | ## 13.1
2 |
3 | $$p(\boldsymbol{x})=\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)$$
4 | [Explanation]: This is Eq. (9.29) from Section 9.4.3; the $k$ mixture components in Eq. (9.29) correspond to the $N$ possible classes here.
5 |
6 | ## 13.2
7 | $$
8 | \begin{aligned} f(\boldsymbol{x}) &=\underset{j \in \mathcal{Y}}{\arg \max } p(y=j | \boldsymbol{x}) \\ &=\underset{j \in \mathcal{Y}}{\arg \max } \sum_{i=1}^{N} p(y=j, \Theta=i | \boldsymbol{x}) \\ &=\underset{j \in \mathcal{Y}}{\arg \max } \sum_{i=1}^{N} p(y=j | \Theta=i, \boldsymbol{x}) \cdot p(\Theta=i | \boldsymbol{x}) \end{aligned}
9 | $$
10 | [Explanation]:
11 | First, the variable $\theta \in \{1,2,...,N\}$ here plays the role of $z_j\in\{1,2,...,k\}$ in Eq. (9.30) of Section 9.4.3.
12 | Going from line 1 to line 2 is marginalization: line 2 introduces $\theta$, which is then summed out by $\sum_{i=1}^N$ (an integral if the variable were continuous).
13 | [Derivation]: Line 2 to line 3 follows from
14 | $$\begin{aligned} p(y = j,\theta = i \vert x) &= \cfrac {p(y=j, \theta=i,x)} {p(x)} \\
15 | &=\cfrac{p(y=j ,\theta=i,x)}{p(\theta=i,x)}\cdot \cfrac{p(\theta=i,x)}{p(x)} \\
16 | &=p(y=j\vert \theta=i,x)\cdot p(\theta=i\vert x)\end{aligned}$$
17 | [Explanation]:
18 | $p(y=j\vert x)$ is the posterior probability that the class $y$ of $x$ is the $j$-th class label (note the conditioning on the known $x$);
19 | $p(y=j,\theta=i\vert x)$ is the posterior probability that the class $y$ of $x$ is the $j$-th class label and that $x$ was generated by the $i$-th Gaussian mixture component (again conditioned on the known $x$);
20 | $p(y=j\vert\theta=i,x)$ is the probability that an $x$ generated by the $i$-th Gaussian mixture component has the $j$-th class label (note the conditioning on the known $\theta$ and $x$; this corrects the description of $p(y=j\vert\theta=i,x)$ given below Eq. (13.3) in 西瓜书);
21 | $p(\theta=i \vert x)$ is the posterior probability that $x$ was generated by the $i$-th Gaussian mixture component (conditioned on the known $x$);
22 | Line 2 of page 296 of 西瓜书 states that "samples are assumed to be generated by a Gaussian mixture model, with each class corresponding to one mixture component"; that is, once we know which mixture component generated $x$, we also know its class. Since $p(y=j\vert \theta=i,x)$ is conditioned on $\theta$ and $x$ (knowing $\theta$ alone is in fact sufficient; $x$ adds nothing), we have
23 | $$p(y=j\vert \theta=i,x)=
24 | \begin{cases}
25 | 1,&i=j \\
26 | 0,&i\not=j
27 | \end{cases}$$
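Because each class corresponds to exactly one mixture component, $p(y=j\vert\theta=i,x)$ is the indicator above, and the classifier of Eq. (13.2) reduces to picking the component with the largest posterior $p(\Theta=i\vert\boldsymbol x)$. A minimal sketch (assuming numpy/scipy; names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def predict(X, alphas, mus, Sigmas):
    """f(x) = argmax_i p(Theta = i | x) when class i corresponds to component i."""
    post = np.column_stack([a * multivariate_normal.pdf(X, m, S)
                            for a, m, S in zip(alphas, mus, Sigmas)])
    return post.argmax(axis=1)   # the shared denominator of Eq. (13.3) cannot change the argmax
```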
28 | ## 13.3
29 | $$
30 | p(\Theta=i | \boldsymbol{x})=\frac{\alpha_{i} \cdot p\left(\boldsymbol{x} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)}{\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)}
31 | $$
32 | [Explanation]: This is Eq. (9.30) from Section 9.4.3; see the discussion of Eq. (9.30) for its derivation.
33 | ## 13.4
34 | $$
35 | \begin{aligned} L L\left(D_{l} \cup D_{u}\right)=& \sum_{\left(x_{j}, y_{j}\right) \in D_{l}} \ln \left(\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right) \cdot p\left(y_{j} | \Theta=i, \boldsymbol{x}_{j}\right)\right) \\ &+\sum_{x_{j} \in D_{u}} \ln \left(\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)\right) \end{aligned}
36 | $$
37 | [Explanation]: By the analysis of $p(y=j\vert\theta =i,x)$ under Eq. (13.2), the factor $p(y_j\vert\theta =i,x_j)$ in the first term is
38 | $$p(y_j\vert \theta=i,x_j)=
39 | \begin{cases}
40 | 1,&y_j=i \\
41 | 0,&y_j\not=i
42 | \end{cases}$$
43 | The first term concerns the labeled samples $(x_j,y_j) \in D_l$: a labeled sample's class is known, so when computing its log-likelihood it can only come from one of the $N$ Gaussian mixture components (line 2 of page 296 of 西瓜书: "samples are assumed to be generated by a Gaussian mixture model, with each class corresponding to one mixture component"), which is why the first term multiplies in $p(y_j\vert\theta =i,x_j)$;
44 | The second term concerns the unlabeled samples $x_j\in D_u$: an unlabeled sample's class is unknown, so it may come from any of the $N$ Gaussian mixture components, which is why the second term uses Eq. (13.1).
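Eq. (13.4) maps directly to code: each labeled sample contributes only its own class's mixture component, while each unlabeled sample contributes the full mixture (a minimal sketch assuming numpy/scipy; names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(Xl, yl, Xu, alphas, mus, Sigmas):
    """LL(D_l and D_u) of Eq. (13.4), using p(y_j | Theta=i, x_j) = 1 iff y_j = i."""
    ll = sum(np.log(alphas[y] * multivariate_normal.pdf(x, mus[y], Sigmas[y]))
             for x, y in zip(Xl, yl))
    dens = np.column_stack([a * multivariate_normal.pdf(Xu, m, S)
                            for a, m, S in zip(alphas, mus, Sigmas)])
    return ll + np.log(dens.sum(axis=1)).sum()
```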
45 | ## 13.5
46 | $$
47 | \gamma_{j i}=\frac{\alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)}{\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)}
48 | $$
49 | [Explanation]: This is the same as Eq. (13.3), i.e. the posterior probability. The model parameters $(\alpha_i,\mu_i,\Sigma_i)$ can be initialized from the labeled data, specifically:
50 | $$\alpha_i = \cfrac{l_i}{|D_l|},\quad \text{where } |D_l| = \sum_{i=1}^N l_i$$
51 | $$\mu_i = \cfrac{1}{l_i}\sum_{(x_j,y_j) \in D_l\wedge y_j=i}x_j$$
52 | $$
53 | \Sigma_{i}=\frac{1}{l_{i}} \sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left( x_{j}- \mu_{i}\right)\left( x_{j}-\mu_{i}\right)^{\top}
54 | $$
55 | where $l_i$ is the number of labeled samples of class $i$, $|D_l|$ is the total number of labeled samples, and $\wedge$ denotes logical AND.
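A minimal sketch of this initialization (assuming numpy; `Xl` holds the labeled samples row-wise and `yl` their class indices $0,...,N-1$):

```python
import numpy as np

def init_from_labeled(Xl, yl, N):
    """Initialize (alpha_i, mu_i, Sigma_i) from the labeled data, as above."""
    alphas, mus, Sigmas = [], [], []
    for i in range(N):
        Xi = Xl[yl == i]                     # labeled samples of class i
        alphas.append(len(Xi) / len(Xl))     # l_i / |D_l|
        mus.append(Xi.mean(axis=0))
        d = Xi - Xi.mean(axis=0)
        Sigmas.append(d.T @ d / len(Xi))
    return np.array(alphas), np.array(mus), Sigmas
```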
56 | ## 13.6
57 | $$
58 | \boldsymbol{\mu}_{i}=\frac{1}{\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}}\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \boldsymbol{x}_{j}+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \boldsymbol{x}_{j}\right)
59 | $$
60 | [Derivation]: Analogous to Eq. (9.34), this follows from $\cfrac{\partial LL(D_l \cup D_u) }{\partial \mu_i}=0$. Write the two terms of Eq. (13.4) as:
61 | $$LL(D_l)=\sum_{(x_j,y_j) \in D_l}ln\left(\sum_{s=1}^{N}\alpha_s \cdot p(x_j \vert \mu_s,\Sigma_s) \cdot p(y_j|\theta = s,x_j)\right)$$
62 | $$LL(D_u)=\sum_{x_j \in D_u} ln(\sum_{s=1}^N \alpha_s \cdot p(x_j | \mu_s,\Sigma_s))$$
63 | For the first term $LL(D_l)$ of Eq. (13.4), since $p(y_j\vert \theta=i,x_j)$ is either 1 or 0 (see the analyses of 13.2 and 13.4),
64 | $$LL(D_l)=\sum_{(x_j,y_j)\in D_l} ln(\alpha_{y_j} \cdot p(x_j|\mu_{y_j}, \Sigma_{y_j}))$$
65 | When taking the partial derivative of $LL(D_l)$ with respect to $\mu_i$, only the terms with $y_j=i$ survive in the sum, i.e.
66 |
67 | $$\begin{aligned}
68 | \cfrac{\partial LL(D_l) }{\partial \mu_i} &=
69 | \sum_{(x_j,y_j)\in D_l \wedge y_j=i} \cfrac{\partial ln(\alpha_i \cdot p(x_j| \mu_i,\Sigma_i))}{\partial\mu_i}\\
70 | &=\sum_{(x_j,y_j)\in D_l \wedge y_j=i}\cfrac{1}{p(x_j|\mu_i,\Sigma_i) }\cdot \cfrac{\partial p(x_j|\mu_i,\Sigma_i)}{\partial\mu_i}\\
71 | &=\sum_{(x_j,y_j)\in D_l \wedge y_j=i}\cfrac{1}{p(x_j|\mu_i,\Sigma_i) }\cdot p(x_j|\mu_i,\Sigma_i) \cdot \Sigma_i^{-1}(x_j-\mu_i)\\
72 | &=\sum_{(x_j,y_j)\in D_l \wedge y_j=i} \Sigma_i^{-1}(x_j-\mu_i)
73 | \end{aligned}$$
74 |
75 | For the second term $LL(D_u)$ of Eq. (13.4), the derivative is obtained exactly as in the derivation of Eq. (9.33):
76 | $$\cfrac{\partial LL(D_u) }{\partial \mu_i}=\sum_{x_j \in {D_u}} \cfrac{\alpha_i \cdot p(x_j|\mu_i,\Sigma_i)}{\sum_{s=1}^N \alpha_s \cdot p(x_j|\mu_s,\Sigma_s)} \cdot \Sigma_i^{-1}(x_j-\mu_i)$$
77 | $$=\sum_{x_j \in D_u }\gamma_{ji} \cdot \Sigma_i^{-1}(x_j-\mu_i)$$
78 | Combining the two results, $\cfrac{\partial LL(D_l \cup D_u) }{\partial \mu_i}$ is
79 | $$
80 | \begin{aligned} \frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \mu_{i}} &=\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \Sigma_{i}^{-1}\left(x_{j}-\mu_{i}\right)+\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot \Sigma_{i}^{-1}\left(x_{j}-\mu_{i}\right) \\ &=\Sigma_{i}^{-1}\left(\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(x_{j}-\mu_{i}\right)+\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot\left(x_{j}-\mu_{i}\right)\right) \\ &=\Sigma_{i}^{-1}\left(\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} x_{j}+\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot x_{j}-\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \mu_{i}-\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot \mu_{i}\right) \end{aligned}
81 | $$
82 | Setting $\frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \boldsymbol{\mu}_{i}}=0$ and left-multiplying both sides by $\Sigma_i$ to cancel $\Sigma_i^{-1}$, rearranging gives
83 | $$
84 | \sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot \mu_{i}+\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \mu_{i}=\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot x_{j}+\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} x_{j}
85 | $$
86 | In the above, $\mu_{i}$ is a constant with respect to the sums and can be factored out, and $\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} 1=l_{i}$, the number of labeled samples of class $i$; hence
87 | $$
88 | \left(\sum_{x_{j} \in D_{u}} \gamma_{j i}+\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} 1\right) \mu_{i}=\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot x_{j}+\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} x_{j}
89 | $$
90 | which gives Eq. (13.6).
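A minimal sketch of this update (assuming numpy; `gamma` holds the responsibilities of the unlabeled samples `Xu` computed via Eq. (13.5), and `Xl`, `yl` are as in the initialization sketch above):

```python
import numpy as np

def update_means_ssl(Xl, yl, Xu, gamma):
    """Mean update of Eq. (13.6): unlabeled samples weighted by gamma, labeled by 1."""
    mus = []
    for i in range(gamma.shape[1]):
        Xi = Xl[yl == i]                         # labeled samples of class i
        w = gamma[:, i].sum() + len(Xi)          # sum_j gamma_ji + l_i
        mus.append((gamma[:, i] @ Xu + Xi.sum(axis=0)) / w)
    return np.array(mus)
```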
91 | ## 13.7
92 | $$
93 | \begin{aligned} \boldsymbol{\Sigma}_{i}=& \frac{1}{\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}}\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}}\right.\\+& \sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}} ) \end{aligned}
94 | $$
95 | [Derivation]: As in 13.6, this follows from $\cfrac{\partial LL(D_l \cup D_u) }{\partial \Sigma_i}=0$, and the simplification proceeds analogously.
96 | For the first term $LL(D_l)$ of Eq. (13.4), as in the derivation of Eq. (13.6):
97 | $$
98 | \begin{aligned} \frac{\partial L L\left(D_{l}\right)}{\partial \boldsymbol{\Sigma}_{i}} &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \frac{\partial \ln \left(\alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)\right)}{\partial \boldsymbol{\Sigma}_{i}} \\ &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \frac{1}{p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)} \cdot \frac{\partial p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)}{\partial \boldsymbol{\Sigma}_{i}} \\
99 | &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \frac{1}{p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right) \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1}\\
100 | &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\mathbf{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1}
101 | \end{aligned}
102 | $$
103 | For the second term $LL(D_u)$ of Eq. (13.4), the derivative is obtained exactly as in the derivation of Eq. (9.35):
104 | $$
105 | \frac{\partial L L\left(D_{u}\right)}{\partial \boldsymbol{\Sigma}_{i}}=\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1}
106 | $$
107 | Combining the two results, $\cfrac{\partial LL(D_l \cup D_u) }{\partial \Sigma_i}$ is
108 | $$\begin{aligned} \frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \boldsymbol{\Sigma}_{i}}=& \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1} \\ &+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1} \\
109 | &=\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right)\right.\\ &+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) ) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1}
110 | \end{aligned}
111 | $$
112 | Setting $\frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \boldsymbol{\Sigma}_{i}}=0$ and right-multiplying both sides by $2\Sigma_i$ to cancel $\cfrac{1}{2}\Sigma_i^{-1}$, rearranging gives
113 | $$
114 | \begin{aligned} \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}+& \sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top} \\=& \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot \boldsymbol{I}+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \boldsymbol{I} \\ &=\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}\right) \boldsymbol{I} \end{aligned}
115 | $$
116 | Left-multiplying both sides by $\Sigma_i$, the above becomes
117 | $$
118 | \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}=\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}\right) \boldsymbol{\Sigma}_{i}
119 | $$
120 | which gives Eq. (13.7).
121 | ## 13.8
122 | $$
123 | \alpha_{i}=\frac{1}{m}\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}\right)
124 | $$
125 | [Derivation]: As with Eq. (9.36), write the Lagrangian form of $LL(D_l \cup D_u)$:
126 | $$\begin{aligned}
127 | \mathcal{L}(D_l \cup D_u,\lambda) &= LL(D_l \cup D_u)+\lambda(\sum_{s=1}^N \alpha_s -1)\\
128 | & =LL(D_l)+LL(D_u)+\lambda(\sum_{s=1}^N \alpha_s - 1)\\
129 | \end{aligned}$$
130 | As with Eq. (9.37), take the partial derivative with respect to $\alpha_i$. For $LL(D_u)$, the derivative is obtained exactly as in the derivation of Eq. (9.37):
131 | $$\cfrac{\partial LL(D_u)}{\partial\alpha_i} = \sum_{x_j \in D_u} \cfrac{1}{\sum_{s=1}^N \alpha_s \cdot p(x_j|\mu_s,\Sigma_s)} \cdot p(x_j|\mu_i,\Sigma_i)$$
132 | For $LL(D_l)$, proceeding as in the derivations of (13.6) and (13.7):
133 | $$\begin{aligned}
134 | \cfrac{\partial LL(D_l)}{\partial\alpha_i} &= \sum_{(x_j,y_j)\in D_l \wedge y_j=i} \cfrac{\partial ln(\alpha_i \cdot p(x_j| \mu_i,\Sigma_i))}{\partial\alpha_i}\\
135 | &=\sum_{(x_j,y_j)\in D_l \wedge y_j=i}\cfrac{1}{ \alpha_i \cdot p(x_j|\mu_i,\Sigma_i) }\cdot \cfrac{\partial (\alpha_i \cdot p(x_j|\mu_i,\Sigma_i))}{\partial \alpha_i}\\
136 | &=\sum_{(x_j,y_j)\in D_l \wedge y_j=i}\cfrac{1}{\alpha_i \cdot p(x_j|\mu_i,\Sigma_i) }\cdot p(x_j|\mu_i,\Sigma_i) \\
137 | &=\cfrac{1}{\alpha_i} \cdot \sum_{(x_j,y_j)\in D_l \wedge y_j=i} 1 \\
138 | &=\cfrac{l_i}{\alpha_i}
139 | \end{aligned}$$
140 | In this derivation, note that the variable is $\alpha_i$ while $p(x_j|\mu_i,\Sigma_i)$ is a constant; in the last line $\alpha_i$ is constant with respect to the summation variable and is factored out of the sum, and $l_i$ is the number of labeled samples of class $i$.
141 | Combining the two results, $\cfrac{\partial LL(D_l \cup D_u) }{\partial \alpha_i}$ is
142 | $$\cfrac{\partial LL(D_l \cup D_u) }{\partial \alpha_i} = \cfrac{l_i}{\alpha_i} + \sum_{x_j \in D_u} \cfrac{p(x_j|\mu_i,\Sigma_i)}{\sum_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}+\lambda$$
143 | Setting $\cfrac{\partial LL(D_l \cup D_u) }{\partial \alpha_i}=0$ and multiplying both sides by $\alpha_i$ gives
144 | $$ \alpha_i \cdot \cfrac{l_i}{\alpha_i} + \sum_{x_j \in D_u} \cfrac{\alpha_i \cdot p(x_j|\mu_i,\Sigma_i)}{\sum_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}+\lambda \cdot \alpha_i=0$$
145 | Comparing with Eq. (9.30), the summand is exactly the posterior probability $\gamma_{ji}$, i.e.
146 | $$l_i+\sum_{x_j \in D_u} \gamma_{ji}+\lambda \alpha_i = 0$$
147 | Summing over all mixture components gives
148 | $$\sum_{i=1}^N l_i+\sum_{i=1}^N \sum_{x_j \in D_u} \gamma_{ji}+\sum_{i=1}^N \lambda \alpha_i = 0$$
149 | Here $\sum_{i=1}^N \alpha_i =1$, so $\sum_{i=1}^N \lambda \alpha_i=\lambda\sum_{i=1}^N \alpha_i=\lambda$.
150 | From the expression for $\gamma_{ji}$ in Eq. (9.30),
151 | $$\sum_{i=1}^N \gamma_{ji} = \sum_{i =1}^{N} \cfrac{\alpha_i \cdot p(x_j|\mu_i,\Sigma_i)}{\sum_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}= \cfrac{\sum_{i =1}^{N}\alpha_i \cdot p(x_j|\mu_i,\Sigma_i)}{\sum_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}=1$$
152 | and since the order of summation can be exchanged,
153 | $$\sum_{i=1}^N \sum_{x_j \in D_u} \gamma_{ji}=\sum_{x_j \in D_u} \sum_{i=1}^N \gamma_{ji} =\sum_{x_j \in D_u} 1=u$$
154 | In the above, $\sum_{x_j\in D_u}$ is equivalent to $\sum_{j=1}^u$, where $u$ is the number of unlabeled samples, and $\sum_{i=1}^Nl_i=l$, where $l$ is the number of labeled samples. Substituting these results into
155 | $$\sum_{i=1}^N l_i+\sum_{i=1}^N \sum_{x_j \in D_u} \gamma_{ji}+\sum_{i=1}^N \lambda \alpha_i = 0$$
156 | yields $l+u+\lambda = 0$; since $l+u =m$, where $m$ is the total number of samples, rearranging gives $\lambda = -m$.
157 | Finally, substituting back and simplifying gives
158 | $$l_i + \sum_{x_j \in{D_u}} \gamma_{ji}-m \alpha_i = 0$$
159 | which rearranges to Eq. (13.8).
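The covariance update of Eq. (13.7) and this mixing-coefficient update of Eq. (13.8) can be sketched together, completing the M-step (same conventions as the mean-update sketch under Eq. (13.6)):

```python
import numpy as np

def update_cov_alpha_ssl(Xl, yl, Xu, gamma, mus):
    """Covariance and mixing-coefficient updates, Eqs. (13.7) and (13.8)."""
    m = len(Xl) + len(Xu)                        # total number of samples
    Sigmas, alphas = [], []
    for i in range(gamma.shape[1]):
        Xi = Xl[yl == i]
        w = gamma[:, i].sum() + len(Xi)          # sum_j gamma_ji + l_i
        du = Xu - mus[i]
        dl = Xi - mus[i]
        Sigmas.append(((gamma[:, i, None] * du).T @ du + dl.T @ dl) / w)
        alphas.append(w / m)                     # Eq. (13.8)
    return Sigmas, np.array(alphas)
```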
160 |
161 |
162 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE
2 | Version 3, 29 June 2007
3 |
4 | Copyright (C) 2007 Free Software Foundation, Inc.