├── docs
    ├── .nojekyll
    ├── chapter2
    │   ├── resources
    │   │   └── images
    │   │   │   └── lrank.png
    │   └── chapter2.md
    ├── _sidebar.md
    ├── chapter1
    │   └── chapter1.md
    ├── index.html
    ├── README.md
    ├── chapter4
    │   └── chapter4.md
    ├── chapter7
    │   └── chapter7.md
    ├── chapter8
    │   └── chapter8.md
    ├── chapter5
    │   └── chapter5.md
    ├── chapter16
    │   └── chapter16.md
    ├── chapter14
    │   └── chapter14.md
    ├── chapter10
    │   └── chapter10.md
    ├── chapter6
    │   └── chapter6.md
    ├── chapter11
    │   └── chapter11.md
    ├── chapter3
    │   └── chapter3.md
    ├── chapter9
    │   └── chapter9.md
    └── chapter13
    │   └── chapter13.md
├── res
    ├── example.png
    ├── qrcode.jpeg
    └── xigua.jpg
├── .gitignore
├── README.md
└── LICENSE


/docs/.nojekyll:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/res/example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awesome-interview/pumpkin-book/master/res/example.png


--------------------------------------------------------------------------------
/res/qrcode.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awesome-interview/pumpkin-book/master/res/qrcode.jpeg


--------------------------------------------------------------------------------
/res/xigua.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awesome-interview/pumpkin-book/master/res/xigua.jpg


--------------------------------------------------------------------------------
/docs/chapter2/resources/images/lrank.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awesome-interview/pumpkin-book/master/docs/chapter2/resources/images/lrank.png


--------------------------------------------------------------------------------
/docs/_sidebar.md:
--------------------------------------------------------------------------------
 1 | - 目录
 2 |   - [第1章 绪论](chapter1/chapter1.md)
 3 |   - [第2章 模型评估](chapter2/chapter2.md)
 4 |   - [第3章 线性模型](chapter3/chapter3.md)
 5 |   - [第4章 决策树](chapter4/chapter4.md)
 6 |   - [第5章 神经网络](chapter5/chapter5.md)
 7 |   - [第6章 支持向量机](chapter6/chapter6.md)
 8 |   - [第7章 贝叶斯分类器](chapter7/chapter7.md)
 9 |   - [第8章 集成学习](chapter8/chapter8.md)
10 |   - [第9章 聚类](chapter9/chapter9.md)
11 |   - [第10章 降维与度量学习](chapter10/chapter10.md)
12 |   - [第11章 特征选择与稀疏学习](chapter11/chapter11.md)
13 |   - [第13章 半监督学习](chapter13/chapter13.md)
14 |   - [第14章 概率图模型](chapter14/chapter14.md)
15 |   - [第16章 强化学习](chapter16/chapter16.md)
16 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | # Logs
 2 | logs
 3 | *.log
 4 | 
 5 | # Runtime data
 6 | pids
 7 | *.pid
 8 | *.seed
 9 | 
10 | # Directory for instrumented libs generated by jscoverage/JSCover
11 | lib-cov
12 | 
13 | # Coverage directory used by tools like istanbul
14 | coverage
15 | 
16 | # Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files)
17 | .grunt
18 | 
19 | # Compiled binary addons (http://nodejs.org/api/addons.html)
20 | build/Release
21 | 
22 | # Dependency directory
23 | # Deployed apps should consider commenting this line out:
24 | # see https://npmjs.org/doc/faq.html#Should-I-check-my-node_modules-folder-into-git
25 | node_modules
26 | 
27 | _book/
28 | book.pdf
29 | book.epub
30 | book.mobi
31 | 
32 | .idea
33 | 


--------------------------------------------------------------------------------
/docs/chapter1/chapter1.md:
--------------------------------------------------------------------------------
 1 | ## 1.2
 2 | $$\begin{aligned}
 3 | \sum_{f}E_{ote}(\mathfrak{L}_a\vert X,f) &= \sum_f\sum_h\sum_{x\in\mathcal{X}-X}P(x)\mathbb{I}(h(x)\neq f(x))P(h\vert X,\mathfrak{L}_a) \\
 4 | &=\sum_{x\in\mathcal{X}-X}P(x) \sum_hP(h\vert X,\mathfrak{L}_a)\sum_f\mathbb{I}(h(x)\neq f(x)) \\
 5 | &=\sum_{x\in\mathcal{X}-X}P(x) \sum_hP(h\vert X,\mathfrak{L}_a)\cfrac{1}{2}2^{\vert \mathcal{X} \vert} \\
 6 | &=\cfrac{1}{2}2^{\vert \mathcal{X} \vert}\sum_{x\in\mathcal{X}-X}P(x) \sum_hP(h\vert X,\mathfrak{L}_a) \\
 7 | &=2^{\vert \mathcal{X} \vert-1}\sum_{x\in\mathcal{X}-X}P(x) \cdot 1\\
 8 | \end{aligned}$$
 9 | 
10 | [解析]：第一步到第二步是因为$\sum_i^m\sum_j^n\sum_k^o a_ib_jc_k=\sum_i^m a_i \cdot \sum_j^n b_j \cdot \sum_k^o c_k$；
11 | 第二步到第三步：首先要知道此时$f$的定义为**任何能将样本映射到{0,1}的函数+均匀分布**，也即不止一个$f$且每个$f$出现的概率相等，例如样本空间只有两个样本时：$ \mathcal{X}=\{x_1,x_2\},\vert \mathcal{X} \vert=2$，那么所有的真实目标函数$f$为：
12 | $$\begin{aligned}
13 | f_1:f_1(x_1)=0,f_1(x_2)=0;\\
14 | f_2:f_2(x_1)=0,f_2(x_2)=1;\\
15 | f_3:f_3(x_1)=1,f_3(x_2)=0;\\
16 | f_4:f_4(x_1)=1,f_4(x_2)=1;
17 | \end{aligned}$$
18 | 一共$2^{\vert \mathcal{X} \vert}=2^2=4$个真实目标函数。所以此时通过算法$\mathfrak{L}_a$学习出来的模型$h(x)$对每个样本无论预测值为0还是1必然有一半的$f$与之预测值相等，所以$\sum_f\mathbb{I}(h(x)\neq f(x)) = \cfrac{1}{2}2^{\vert \mathcal{X} \vert} $；
19 | 第三步一直到最后有点概率论的基础应该都能看懂了。


--------------------------------------------------------------------------------
/docs/index.html:
--------------------------------------------------------------------------------
 1 | <!DOCTYPE html>
 2 | <html lang="en">
 3 | <head>
 4 |   <meta charset="UTF-8">
 5 |   <title>南瓜书PumpkinBook</title>
 6 |   <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />
 7 |   <meta name="description" content="Description">
 8 |   <meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
 9 |   <link rel="stylesheet" href="//unpkg.com/docsify/lib/themes/vue.css">
10 | </head>
11 | <body>
12 |   <div id="app"></div>
13 |   <script>
14 |     window.$docsify = {
15 |       name: '南瓜书PumpkinBook',
16 |       loadSidebar: true,
17 |       subMaxLevel: 2,
18 |       alias: {
19 |         '/.*/_sidebar.md': '/_sidebar.md'
20 |       }
21 |     }
22 |   </script>
23 |   <!-- CDN files for docsify-katex -->
24 |   <script src="//cdn.jsdelivr.net/npm/docsify-katex@latest/dist/docsify-katex.js"></script>
25 |   <!-- or <script src="//cdn.jsdelivr.net/gh/upupming/docsify-katex/dist/docsify-katex.js"></script> -->
26 |   <link
27 |     rel="stylesheet"
28 |     href="//cdn.jsdelivr.net/npm/katex@latest/dist/katex.min.css"
29 |   />
30 |   <!-- Put them above docsify.min.js -->
31 |   <script src="//cdn.jsdelivr.net/npm/docsify@latest/lib/docsify.min.js"></script>
32 | </body>
33 | </html>
34 | 


--------------------------------------------------------------------------------
/docs/README.md:
--------------------------------------------------------------------------------
 1 | # 南瓜书PumpkinBook
 2 | 周志华老师的《机器学习》（西瓜书）是机器学习领域的经典入门教材之一，周老师为了使尽可能多的读者通过西瓜书对机器学习有所了解, 所以在书中对部分公式的推导细节没有详述，但是这对那些想深究公式推导细节的读者来说可能“不太友好”，本书旨在对西瓜书里比较难理解的公式加以解析，以及对部分公式补充具体的推导细节，诚挚欢迎每一位西瓜书读者前来参与完善本书：一个人可以走的很快，但是一群人却可以走的更远。
 3 | 
 4 | # 使用说明
 5 | 南瓜书仅仅是西瓜书的一些细微补充而已，里面的内容都是以西瓜书的内容为前置知识进行表述的，所以南瓜书的最佳使用方法是以西瓜书为主线，遇到自己推导不出来或者看不懂的公式时再来查阅南瓜书。若南瓜书里没有你想要查阅的公式，可以[点击这里](https://github.com/datawhalechina/pumpkin-book/issues/1)提交你希望补充推导或者解析的公式编号，我们看到后会尽快进行补充。
 6 | 
 7 | # 选用的西瓜书版本
 8 | <img src="https://raw.githubusercontent.com/datawhalechina/pumpkin-book/master/res/xigua.jpg" width = "476.7" height = "555.3">
 9 | 
10 | > 书名：机器学习<br>
11 | > 作者：周志华<br>
12 | > 出版社：清华大学出版社<br>
13 | > 版次：2016年1月第1版<br>
14 | > 勘误表：http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/MLbook2016.htm
15 | 
16 | # 主要贡献者（按首字母排名）
17 | [@awyd234](https://github.com/awyd234)
18 | [@Heitao5200](https://github.com/Heitao5200)
19 | [@juxiao](https://github.com/juxiao)
20 | [@LongJH](https://github.com/LongJH)
21 | [@LilRachel](https://github.com/LilRachel)
22 | [@Majingmin](https://github.com/Majingmin)
23 | [@spareribs](https://github.com/spareribs)
24 | [@sunchaothu](https://github.com/sunchaothu)
25 | [@StevenLzq](https://github.com/StevenLzq)
26 | [@Sm1les](https://github.com/Sm1les)
27 | [@Ye980226](https://github.com/Ye980226)
28 | 
29 | # 关注我们
30 | 
31 | <div align=center><img src="https://raw.githubusercontent.com/datawhalechina/pumpkin-book/master/res/qrcode.jpeg" width = "250" height = "270"></div>
32 | 
33 | # LICENSE
34 | [GNU General Public License v3.0](https://github.com/datawhalechina/pumpkin-book/blob/master/LICENSE)


--------------------------------------------------------------------------------
/docs/chapter4/chapter4.md:
--------------------------------------------------------------------------------
 1 | ## 4.1
 2 | $$Ent(D) =-\sum_{k=1}^{|y|}p_klog_{2}{p_k}$$
 3 | [解析]：熵是度量样本集合纯度最常用的一种指标，代表一个系统中蕴含多少信息量，信息量越大表明一个系统不确定性就越大，就存在越多的可能性。
 4 | 
 5 | 假定当前样本集合 $D$ 中第 $k$ 类样本所占的比例为 $p_k(k =1,2,...,|y|)$ ，则 $D$ 的信息熵为：
 6 | 
 7 | $$
 8 | Ent(D) =-\sum_{k=1}^{|y|}p_klog_{2}{p_k}
 9 | $$
10 | 
11 | 其中，当样本 $D$ 中 $|y|$ 类样本均匀分布时，这时信息熵最大，其值为
12 | $$
13 | Ent(D) =-\sum_{k=1}^{|y|}\frac{1}{|y|}log_{2}{\frac{1}{|y|}} = \sum_{k=1}^{|y|}\frac{1}{|y|}log_{2}{|y|} = log_{2}{|y|}
14 | $$
15 | 此时样本D的纯度越小；
16 | 
17 | 相反，假设样本D中只有一类样本，此时信息熵最小，其值为
18 | $$
19 | Ent(D) =-\sum_{k=1}^{|y|}\frac{1}{|y|}log_{2}{\frac{1}{|y|}} = -1log_21-0log_20-...-0log_20 = 0
20 | $$
21 | 此时样本的纯度最大。
22 | 
23 | ## 4.2
24 | $$
25 | Gain(D,a) = Ent(D) - \sum_{v=1}^{V}\frac{|D^v|}{|D|}Ent({D^v})
26 | $$
27 | [解析]：假定在样本D中有某个**离散特征** $a$ 有 $V$ 个可能的取值 $(a^1,a^2,...,a^V)$，若使用特征 $a$ 来对样本集 $D$ 进行划分，则会产生 $V$ 个分支结点，其中第 $v$ 个分支结点包含了 $D$ 中所有在特征 $a$ 上取值为 $a^v$ 的样本，样本记为 $D^v$，由于根据离散特征a的每个值划分的 $V$ 个分支结点下的样本数量不一致，对于这 $V$ 个分支结点赋予权重 $\frac{|D^v|}{|D|}$，即样本数越多的分支结点的影响越大，特征 $a$ 对样本集 $D$ 进行划分所获得的“信息增益”为
28 | $$
29 | Gain(D,a) = Ent(D) - \sum_{v=1}^{V}\frac{|D^v|}{|D|}Ent({D^v})
30 | $$
31 | 信息增益越大，表示使用特征a来对样本集进行划分所获得的纯度提升越大。
32 | 
33 | **缺点**：由于在计算信息增益中倾向于特征值越多的特征进行优先划分，这样假设某个特征值的离散值个数与样本集 $D$ 个数相同（假设为样本编号），虽然用样本编号对样本进行划分，样本纯度提升最高，但是并不具有泛化能力。
34 | 
35 | ## 4.3
36 | $$
37 | Gain-ratio(D,a)=\frac{Gain(D,a)}{IV(a)}
38 | $$
39 | [解析]：基于信息增益的缺点，$C4.5$ 算法不直接使用信息增益，而是使用一种叫增益率的方法来选择最优特征进行划分，对于样本集 $D$ 中的离散特征 $a$ ，增益率为
40 | $$
41 | Gain-ratio(D,a)=\frac{Gain(D,a)}{IV(a)} 
42 | $$
43 | 其中，
44 | $$
45 | IV(a)=-\sum_{v=1}^{V}\frac{|D^v|}{|D|}log_2\frac{|D^v|}{|D|}
46 | $$
47 | IV(a) 是特征 a 的熵。
48 | 
49 | 增益率对特征值较少的特征有一定偏好，因此 $C4.5$ **算法选择特征的方法是先从候选特征中选出信息增益高于平均水平的特征，再从这些特征中选择增益率最高的**。
50 | 
51 | ## 4.5
52 | $$
53 | \begin{aligned}
54 | Gini(D) &=\sum_{k=1}^{|y|}\sum_{k\neq{k'}}{p_k}{p_{k'}}\\
55 | &=1-\sum_{k=1}^{|y|}p_k^2 
56 | \end{aligned}
57 | $$
58 | [推导]：假定当前样本集合 $D$ 中第 $k$ 类样本所占的比例为 $p_k(k =1,2,...,|y|)$，则 $D$ 的**基尼值**为
59 | $$
60 | \begin{aligned}
61 | Gini(p) &=\sum_{k=1}^{|y|}\sum_{k\neq{k'}}{p_k}{p_{k'}}\\
62 | &=\sum_{k=1}^{|y|}{p_k}{(1-p_k)} \\
63 | &=1-\sum_{k=1}^{|y|}p_k^2 
64 | \end{aligned}
65 | $$
66 | 
67 | ## 4.7 - 4.8
68 | 
69 | [解析]：样本集 $D$ 中的**连续特征** $a$，假设特征 $a$ 有 $n$ 个不同的取值，对其进行大小排序，记为 $\lbrace{a^1,a^2,...,a^n}\rbrace$，根据特征 $a$ 可得到 $n-1$ 个划分点 $t$，划分点 $t$ 的集合为
70 | $$
71 | T_a=\lbrace{\frac{a^i+a^{i+1}}{2}|1\leq{i}\leq{n-1}}\rbrace \tag {4.7}
72 | $$
73 | 对于取值集合 $ T_a$  中的每个 $t$  值计算将特征 $a$  离散为一个特征值只有两个值，分别是 $\lbrace{a} >t\rbrace$ 和 $\lbrace{a} \leq{t}\rbrace$  的特征，计算新特征的信息增益，找到信息增益最大的 $t$ 值即为该特征的最优划分点。
74 | $$
75 | \begin{aligned}
76 | Gain(D,a) &= \max\limits_{t \in T_a} \ Gain(D,a) \\
77 | &= \max\limits_{t \in T_a} \ Ent(D)-\sum_{\lambda \in \{-,+\}} \frac{\left | D_t^{\lambda } \right |}{\left |D  \right |}Ent(D_t^{\lambda }) \end{aligned} \tag{4.8}
78 | $$


--------------------------------------------------------------------------------
/docs/chapter2/chapter2.md:
--------------------------------------------------------------------------------
 1 | ## 2.20
 2 | 
 3 | $$ AUC=\cfrac{1}{2}\sum_{i=1}^{m-1}(x_{i+1} - x_i)\cdot(y_i + y_{i+1}) $$
 4 | 
 5 | [解析]：由于图2.4(b)中给出的ROC曲线为横平竖直的标准折线，所以乍一看这个式子的时候很不理解其中的$ \cfrac{1}{2} $和$ (y_i + y_{i+1}) $代表着什么，因为对于横平竖直的标准折线用$ AUC=\sum_{i=1}^{m-1}(x_{i+1} - x_i) \cdot y_i $就可以求出AUC了，但是图2.4(b)中的ROC曲线只是个特例罢了，因为此图是所有样例的预测值均不相同时的情形，也就是说每次分类阈值变化的时候只会划分新增**1个**样例为正例，所以下一个点的坐标为$ (x+\cfrac{1}{m^-},y) $或$ (x,y+\cfrac{1}{m^+}) $，然而当模型对某个正样例和某个反样例给出的预测值相同时，便会划分新增**两个**样例为正例，于是其中一个分类正确一个分类错误，那么下一个点的坐标为$ (x+\cfrac{1}{m^-},y+\cfrac{1}{m^+}) $（当没有预测值相同的样例时，若采取按固定梯度改变分类阈值，也会出现一下划分新增两个甚至多个正例的情形，但是此种阈值选取方案画出的ROC曲线AUC值更小，不建议使用），此时ROC曲线中便会出现斜线，而不再是只有横平竖直的折线，所以用**梯形面积公式**就能完美兼容这两种分类阈值选取方案，也即 **(上底+下底)\*高\*$ \cfrac{1}{2} $**
 6 | 
 7 | ## 2.21
 8 | 
 9 | $$ l_{rank}=\cfrac{1}{m^+m^-}\sum_{x^+ \in D^+}\sum_{x^- \in D^-}(\mathbb{I}(f(x^+)<f(x^-))+\cfrac{1}{2}\mathbb{I}(f(x^+)=f(x^-))) $$
10 | 
11 | [解析]：此公式正如书上所说，$ l_{rank} $为ROC曲线**之上**的面积，假设某ROC曲线如下图所示：
12 | 
13 | ![avatar](resources/images/lrank.png)
14 | 
15 | 观察ROC曲线易知：
16 | - 每增加一条绿色线段对应着有**1个**正样例（$ x^+_i $）被模型正确判别为正例，且该线段在Y轴的投影长度恒为$ \cfrac{1}{m^+} $；
17 | - 每增加一条红色线段对应着有**1个**反样例（$ x^-_i $）被模型错误判别为正例，且该线段在X轴的投影长度恒为$ \cfrac{1}{m^-} $；
18 | - 每增加一条蓝色线段对应着有a个正样例和b个反样例**同时**被判别为正例，且该线段在X轴上的投影长度=$ b * \cfrac{1}{m^-} $，在Y轴上的投影长度=$ a * \cfrac{1}{m^+} $；
19 | - 任何一条线段所对应的样例的预测值一定**小于**其左边和下边的线段所对应的样例的预测值，其中蓝色线段所对应的a+b个样例的预测值相等。
20 | 
21 | 公式里的$ \sum_{x^+ \in D^+} $可以看成一个遍历$ x^+_i $的循环：
22 | 
23 | for $ x^+_i $ in $ D^+ $:
24 | 
25 | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$ \cfrac{1}{m^+}\cdot\cfrac{1}{m^-}\cdot\sum_{x^- \in D^-}(\mathbb{I}(f(x^+_i)<f(x^-))+\cfrac{1}{2}\mathbb{I}(f(x^+_i)=f(x^-))) $ #记为式S
26 | 
27 | 由于每个$ x^+_i $都对应着一条绿色或蓝色线段，所以遍历$ x^+_i $可以看成是在遍历每条绿色和蓝色线段，并用式S来求出每条绿色线段与Y轴构成的面积（例如上图中的m1)或者蓝色线段与Y轴构成的面积（例如上图中的m2+m3）。
28 | 
29 | **对于每条绿色线段：** 将其式S展开可得：
30 | $$ \cfrac{1}{m^+}\cdot\cfrac{1}{m^-}\cdot\sum_{x^- \in D^-}\mathbb{I}(f(x^+_i)<f(x^-))+\cfrac{1}{m^+}\cdot\cfrac{1}{m^-}\cdot\sum_{x^- \in D^-}\cfrac{1}{2}\mathbb{I}(f(x^+_i)=f(x^-)) $$其中$ x^+_i $此时恒为该线段所对应的正样例，是一个定值。$ \sum_{x^- \in D^-}\cfrac{1}{2}\mathbb{I}(f(x^+_i)=f(x^-) $是在通过遍历所有反样例来统计和$ x^+_i $的预测值相等的反样例个数，由于没有反样例的预测值和$ x^+_i $的预测值相等，所以$ \sum_{x^- \in D^-}\cfrac{1}{2}\mathbb{I}(f(x^+_i)=f(x^-)) $此时恒为0，于是其式S可以化简为：$$ \cfrac{1}{m^+}\cdot\cfrac{1}{m^-}\cdot\sum_{x^- \in D^-}\mathbb{I}(f(x^+_i)<f(x^-)) $$其中$ \cfrac{1}{m^+} $为该线段在Y轴上的投影长度，$ \sum_{x^- \in D^-}\mathbb{I}(f(x^+_i)<f(x^-)) $同理是在通过遍历所有反样例来统计预测值大于$ x^+_i $的预测值的反样例个数，也即该线段左边和下边的红色线段个数+蓝色线段对应的反样例个数，所以$ \cfrac{1}{m^-}\cdot\sum_{x^- \in D^-}(\mathbb{I}(f(x^+)<f(x^-))) $便是该线段左边和下边的红色线段在X轴的投影长度+蓝色线段在X轴的投影长度，也就是该绿色线段在X轴的投影长度，观察ROC图像易知绿色线段与Y轴围成的面积=该线段在Y轴的投影长度 * 该线段在X轴的投影长度。
31 | 
32 | **对于每条蓝色线段：** 将其式S展开可得：
33 | $$ \cfrac{1}{m^+}\cdot\cfrac{1}{m^-}\cdot\sum_{x^- \in D^-}\mathbb{I}(f(x^+_i)<f(x^-))+\cfrac{1}{m^+}\cdot\cfrac{1}{m^-}\cdot\sum_{x^- \in D^-}\cfrac{1}{2}\mathbb{I}(f(x^+_i)=f(x^-)) $$
34 | 其中前半部分表示的是蓝色线段和Y轴围成的图形里面矩形部分的面积，后半部分表示的便是剩下的三角形的面积，矩形部分的面积公式同绿色线段的面积公式一样很好理解，而三角形部分的面积公式里面的$ \cfrac{1}{m^+} $为底边长，$ \cfrac{1}{m^-}\cdot\sum_{x^- \in D^-}\mathbb{I}(f(x^+_i)=f(x^-)) $为高。
35 | 
36 | 综上分析可知，式S既可以用来求绿色线段与Y轴构成的面积也能求蓝色线段与Y轴构成的面积，所以遍历完所有绿色和蓝色线段并将其与Y轴构成的面积累加起来即得$ l_{rank} $。


--------------------------------------------------------------------------------
/docs/chapter7/chapter7.md:
--------------------------------------------------------------------------------
 1 | ## 7.5
 2 | $$R(c|\boldsymbol x)=1−P(c|\boldsymbol x)$$
 3 | [推导]：由式7.1和式7.4可得：
 4 | $$R(c_i|\boldsymbol x)=1*P(c_1|\boldsymbol x)+1*P(c_2|\boldsymbol x)+...+0*P(c_i|\boldsymbol x)+...+1*P(c_N|\boldsymbol x)$$
 5 | 又$\sum_{j=1}^{N}P(c_j|\boldsymbol x)=1$，则：
 6 | $$R(c_i|\boldsymbol x)=1-P(c_i|\boldsymbol x)$$
 7 | 此即为式7.5
 8 | ## 7.8
 9 | $$P(c|\boldsymbol x)=\cfrac{P(c)P(\boldsymbol x|c)}{P(\boldsymbol x)}$$
10 | [解析]：最小化误差，也就是最大化P(c|x)，但由于P(c|x)属于后验概率无法直接计算，由贝叶斯公式可计算出:
11 | $$P(c|\boldsymbol x)=\cfrac{P(c)P(\boldsymbol x|c)}{P(\boldsymbol x)}$$
12 | $P(\boldsymbol x)$可以省略，因为我们比较的时候$P(\boldsymbol x)$一定是相同的，所以我们就是用历史数据计算出$P(c)$和$P(\boldsymbol x|c)$。
13 | 1. $P(c)$根据大数定律，当样本量到了一定程度且服从独立同分布，c的出现的频率就是c的概率。
14 | 2. $P(\boldsymbol x|c)$，因为$\boldsymbol x$在这里不对单一元素是个矩阵，涉及n个元素，不太好直接统计分类为c时，$\boldsymbol x$的概率，所以我们根据假设独立同分布，对每个$\boldsymbol x$的每个特征分别求概率
15 | $$P(\boldsymbol x|c)=P(x_1|c)*P(x_2|c)*P(x_3|c)...*P(x_n|c)$$
16 | 这个式子就可以很方便的通过历史数据去统计了,比如特征n，就是在分类为c时特征n出现的概率，在数据集中应该是用1显示。
17 | 但是当某一概率为0时会导致整个式子概率为0，所以采用拉普拉斯修正
18 | 
19 | 当样本属性独依赖时，也就是除了c多加一个依赖条件，式子变成了
20 | $$∏_{i=1}^n P(x_i|c,p_i)$$
21 | $p_i$是$x_i$所依赖的属性
22 | 
23 | 当样本属性相关性未知时,我们采用贝叶斯网的算法，对相关性进行评估，以找出一个最佳的分类模型。
24 | 
25 | 当遇到不完整的训练样本时，可通过使用EM算法对模型参数进行评估来解决。
26 | 
27 | ## 7.17-7.18
28 | $$P_{(\boldsymbol x_{i}|c)}\in[0,1]$$
29 | $$p_{(\boldsymbol x_{i}| c)}$$
30 | [解析]：式(7.17)所得$P_{(\boldsymbol x_{i}|c)}\in[0,1]$为条件概率，但式(7.18)所得$p_{(\boldsymbol x_{i}| c)}$为条件概率密度而非概率，其值并不在局限于区间[0,1]之内。
31 | 
32 | ## 7.23
33 | 
34 | $$P(c|\boldsymbol x)\propto\sum\limits_{i=1}^{d}P(c_{i}|\boldsymbol x_{i})\prod _{j=1}^{d}P(c_{i}|\boldsymbol x_{i})$$
35 | [推导]：$$P(c|\boldsymbol x)=\cfrac{P(\boldsymbol x|c)}{P(\boldsymbol x)}=\cfrac{P(c|x_{i})P(x_{1},…,x_{i-1},x_{i+1},…,x_{d}|c,x_{i})}{P(\boldsymbol x)}$$
36 | $$P(c|\boldsymbol x)\propto P(c|x_{i})P(x_{1},…,x_{i-1},x_{i+1},…,x_{d}|c,x_{i})$$
37 | $$P(c|\boldsymbol x)\propto\sum\limits_{i=1}^{d}P(c_{i}|\boldsymbol x_{i})\prod _{j=1}^{d}P(c_{i}|\boldsymbol x_{i})$$
38 | 此即为式7.23
39 | [解析]：式(7.24)和式(7.25)的使用到了$|D_{c,x_{i}}|$与$|D_{c,x_{i},x_{j}}|$，若$|D_{x_{i}}|$集合中样本数量过少，则$|D_{c,x_{i}}|$与$|D_{c,x_{i},x_{j}}|$将会更小，因此在式(7.23)中要求$|D_{x_{i}}|$集合中样本数量不少于$m^{'}$。
40 | 
41 | 
42 | 
43 | 
44 | ## 附录
45 | ### 勘误
46 | 7.3节例子计算有笔误，第152页的第9个等式应为$P_{(凹陷|\boldsymbol 是)}=\cfrac{5}{8}$。截止到目前（第28次印刷），西瓜书官方勘误修订仅在第8次印刷时修正了第3个等式$P_{(蜷缩|\boldsymbol 是)}$，但第9个等式仍未修正（若修正第84页的西瓜数据集3.0也可，但亦未修正）。
47 | 
48 | #### sklearn调包
49 | 
50 | ```python
51 |  import numpy as np
52 |  X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
53 |  Y = np.array([1, 1, 1, 2, 2, 2])
54 | from sklearn.naive_bayes import GaussianNB
55 |  clf = GaussianNB()
56 | clf.fit(X, Y)
57 | GaussianNB(priors=None, var_smoothing=1e-09)
58 | print(clf.predict([[-0.8, -1]]))
59 | ```
60 | #### 参数:	
61 | priors : array-like, shape (n_classes,)
62 | Prior probabilities of the classes. If specified the priors are not adjusted according to the data.
63 | 
64 | var_smoothing : float, optional (default=1e-9)
65 | Portion of the largest variance of all features that is added to variances for calculation stability.
66 | #### 贝叶斯应用
67 | 
68 | 1. 中文分词
69 | 分词后，得分的假设是基于两词之间是独立的，后词的出现与前词无关
70 | 2. 统计机器翻译
71 | 统计机器翻译因为其简单，无需手动添加规则，迅速成为了机器翻译的事实标准。
72 | 3. 贝叶斯图像识别
73 | 首先是视觉系统提取图形的边角特征，然后使用这些特征自底向上地激活高层的抽象概念，然后使用一个自顶向下的验证来比较到底哪个概念最佳地解释了观察到的图像。
74 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # 南瓜书PumpkinBook
 2 | 周志华老师的《机器学习》（西瓜书）是机器学习领域的经典入门教材之一，周老师为了使尽可能多的读者通过西瓜书对机器学习有所了解, 所以在书中对部分公式的推导细节没有详述，但是这对那些想深究公式推导细节的读者来说可能“不太友好”，本书旨在对西瓜书里比较难理解的公式加以解析，以及对部分公式补充具体的推导细节，诚挚欢迎每一位西瓜书读者前来参与完善本书：一个人可以走的很快，但是一群人却可以走的更远。
 3 | 
 4 | # 使用说明
 5 | 南瓜书仅仅是西瓜书的一些细微补充而已，里面的内容都是以西瓜书的内容为前置知识进行表述的，所以南瓜书的最佳使用方法是以西瓜书为主线，遇到自己推导不出来或者看不懂的公式时再来查阅南瓜书。若南瓜书里没有你想要查阅的公式，可以[点击这里](https://github.com/datawhalechina/pumpkin-book/issues/1)提交你希望补充推导或者解析的公式编号，我们看到后会尽快进行补充。
 6 | 
 7 | # 在线阅读地址
 8 | https://datawhalechina.github.io/pumpkin-book/
 9 | 
10 | # 目录
11 | 
12 | - 第1章 [绪论](https://datawhalechina.github.io/pumpkin-book/#/chapter1/chapter1)
13 | - 第2章 [模型评估与选择](https://datawhalechina.github.io/pumpkin-book/#/chapter2/chapter2)
14 | - 第3章 [线性模型](https://datawhalechina.github.io/pumpkin-book/#/chapter3/chapter3)
15 | - 第4章 [决策树](https://datawhalechina.github.io/pumpkin-book/#/chapter4/chapter4)
16 | - 第5章 [神经网络](https://datawhalechina.github.io/pumpkin-book/#/chapter5/chapter5)
17 | - 第6章 [支持向量机](https://datawhalechina.github.io/pumpkin-book/#/chapter6/chapter6)
18 | - 第7章 [贝叶斯分类器](https://datawhalechina.github.io/pumpkin-book/#/chapter7/chapter7)
19 | - 第8章 [集成学习](https://datawhalechina.github.io/pumpkin-book/#/chapter8/chapter8)
20 | - 第9章 [聚类](https://datawhalechina.github.io/pumpkin-book/#/chapter9/chapter9)
21 | - 第10章 [降维与度量学习](https://datawhalechina.github.io/pumpkin-book/#/chapter10/chapter10)
22 | - 第11章 [特征选择与稀疏学习](https://datawhalechina.github.io/pumpkin-book/#/chapter11/chapter11)
23 | - 第12章 计算学习理论
24 | - 第13章 [半监督学习](https://datawhalechina.github.io/pumpkin-book/#/chapter13/chapter13)
25 | - 第14章 [概率图模型](https://datawhalechina.github.io/pumpkin-book/#/chapter14/chapter14)
26 | - 第15章 规则学习
27 | - 第16章 [强化学习](https://datawhalechina.github.io/pumpkin-book/#/chapter16/chapter16)
28 | 
29 | # 选用的西瓜书版本
30 | <img src="https://raw.githubusercontent.com/datawhalechina/pumpkin-book/master/res/xigua.jpg" width = "476.7" height = "555.3">
31 | 
32 | > 书名：机器学习<br>
33 | > 作者：周志华<br>
34 | > 出版社：清华大学出版社<br>
35 | > 版次：2016年1月第1版<br>
36 | > 勘误表：http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/MLbook2016.htm
37 | 
38 | #  协作规范
39 | 
40 | ### 文档书写规范：
41 | 文档采用```Markdown```语法编写，数学公式采用```LaTeX```语法编写，数学符号规范参见西瓜书目录前一页《主要符号表》。<br>
42 | 编辑器推荐：[马克飞象](https://maxiang.io)
43 | 
44 | ### 目录结构规范：
45 | 
46 | ```
47 | pumpkin-book
48 | ├─docs
49 | |  ├─chapter1  # 第1章
50 | |  |  ├─resources  # 资源文件夹
51 | |  |  |  └─images  # 图片资源
52 | |  |  └─chapter1.md # 第1章公式全解
53 | |  ├─chapter2
54 | ...
55 | ```
56 | ### 公式全解文档规范：
57 | ```
58 | ## 公式编号
59 | 
60 | $$（公式的LaTeX表达式）$$
61 | 
62 | [推导]：（公式推导步骤） or [解析]：（公式解析说明）
63 | 
64 | ## 附录（可选）
65 | 
66 | （附录内容）
67 | ```
68 | 例如：
69 | <img src="https://raw.githubusercontent.com/datawhalechina/pumpkin-book/master/res/example.png">
70 | 
71 | 
72 | # 主要贡献者（按首字母排名）
73 | [@awyd234](https://github.com/awyd234)
74 | [@Heitao5200](https://github.com/Heitao5200)
75 | [@juxiao](https://github.com/juxiao)
76 | [@LongJH](https://github.com/LongJH)
77 | [@LilRachel](https://github.com/LilRachel)
78 | [@Majingmin](https://github.com/Majingmin)
79 | [@spareribs](https://github.com/spareribs)
80 | [@sunchaothu](https://github.com/sunchaothu)
81 | [@StevenLzq](https://github.com/StevenLzq)
82 | [@Sm1les](https://github.com/Sm1les)
83 | [@Ye980226](https://github.com/Ye980226)
84 | 
85 | # 关注我们
86 | 
87 | <div align=center><img src="https://raw.githubusercontent.com/datawhalechina/pumpkin-book/master/res/qrcode.jpeg" width = "250" height = "270"></div>
88 | 
89 | # LICENSE
90 | [GNU General Public License v3.0](https://github.com/datawhalechina/pumpkin-book/blob/master/LICENSE)
91 | 
92 | 


--------------------------------------------------------------------------------
/docs/chapter8/chapter8.md:
--------------------------------------------------------------------------------
 1 | ## 8.3
 2 | $$
 3 | \begin{aligned} P(H(\boldsymbol{x}) \neq f(\boldsymbol{x})) &=\sum_{k=0}^{\lfloor T / 2\rfloor} \left( \begin{array}{c}{T} \\ {k}\end{array}\right)(1-\epsilon)^{k} \epsilon^{T-k} \\ & \leqslant \exp \left(-\frac{1}{2} T(1-2 \epsilon)^{2}\right) \end{aligned}
 4 | $$
 5 | [推导]:由基分类器相互独立，设X为T个基分类器分类正确的次数，因此$\mathrm{X} \sim \mathrm{B}(\mathrm{T}, 1-\mathrm{\epsilon})$
 6 | $$
 7 | \begin{aligned} P(H(x) \neq f(x))=& P(X \leq\lfloor T / 2\rfloor) \\ & \leqslant P(X \leq T / 2)
 8 | \\ & =P\left[X-(1-\varepsilon) T \leqslant \frac{T}{2}-(1-\varepsilon) T\right] 
 9 | \\ & =P\left[X-
10 | (1-\varepsilon) T \leqslant -\frac{T}{2}\left(1-2\varepsilon\right)]\right]
11 |  \end{aligned}
12 | $$
13 | 根据Hoeffding不等式$P(X-(1-\epsilon)T\leqslant -kT) \leq \exp (-2k^2T)$
14 | 令$k=\frac {(1-2\epsilon)}{2}$得
15 | $$
16 | \begin{aligned} P(H(\boldsymbol{x}) \neq f(\boldsymbol{x})) &=\sum_{k=0}^{\lfloor T / 2\rfloor} \left( \begin{array}{c}{T} \\ {k}\end{array}\right)(1-\epsilon)^{k} \epsilon^{T-k} \\ & \leqslant \exp \left(-\frac{1}{2} T(1-2 \epsilon)^{2}\right) \end{aligned}
17 | $$
18 | 
19 | ## 8.5-8.8
20 | [解析]：由式(8.4)可知
21 | $$H(\boldsymbol{x})=\sum_{t=1}^{T} \alpha_{t} h_{t}(\boldsymbol{x})$$
22 | 
23 | 又由式(8.11)可知
24 | $$
25 | \alpha_{t}=\frac{1}{2} \ln \left(\frac{1-\epsilon_{t}}{\epsilon_{t}}\right)
26 | $$
27 | 该分类器的权重只与分类器的错误率负相关(即错误率越大，权重越低)
28 | 
29 | (1)先考虑指数损失函数$e^{-f(x) H(x)}$的含义：$f$为真实函数，对于样本$x$来说，$f(\boldsymbol{x}) \in\{-1,+1\}$只能取和两个值，而$H(\boldsymbol{x})$是一个实数；
30 | 当$H(\boldsymbol{x})$的符号与$f(x)$一致时，$f(\boldsymbol{x}) H(\boldsymbol{x})>0$，因此$e^{-f(\boldsymbol{x}) H(\boldsymbol{x})}=e^{-|H(\boldsymbol{x})|}<1$，且$|H(\boldsymbol{x})|$越大指数损失函数$e^{-f(\boldsymbol{x}) H(\boldsymbol{x})}$越小（这很合理：此时$|H(\boldsymbol{x})|$越大意味着分类器本身对预测结果的信心越大，损失应该越小；若$|H(\boldsymbol{x})|$在零附近，虽然预测正确，但表示分类器本身对预测结果信心很小，损失应该较大）；
31 | 当$H(\boldsymbol{x})$的符号与$f(\boldsymbol{x})$不一致时，$f(\boldsymbol{x}) H(\boldsymbol{x})<0$，因此$e^{-f(\boldsymbol{x}) H(\boldsymbol{x})}=e^{|H(\boldsymbol{x})|}>1$，且$| H(\boldsymbol{x}) |$越大指数损失函数越大（这很合理：此时$| H(\boldsymbol{x}) |$越大意味着分类器本身对预测结果的信心越大，但预测结果是错的，因此损失应该越大；若$| H(\boldsymbol{x}) |$在零附近，虽然预测错误，但表示分类器本身对预测结果信心很小，虽然错了，损失应该较小）；
32 | (2)符号$\mathbb{E}_{\boldsymbol{x} \sim \mathcal{D}}[\cdot]$的含义：$\mathcal{D}$为概率分布，可简单理解为在数据集$D$中进行一次随机抽样，每个样本被取到的概率；$\mathbb{E}[\cdot]$为经典的期望，则综合起来$\mathbb{E}_{\boldsymbol{x} \sim \mathcal{D}}[\cdot]$表示在概率分布$\mathcal{D}$上的期望，可简单理解为对数据集$D$以概率$\mathcal{D}$进行加权后的期望。
33 | f(x)是概率分布函数
34 | 
35 | $$
36 | \begin{aligned}
37 | \ell_{\mathrm{exp}}(H | \mathcal{D})=&\mathbb{E}_{\boldsymbol{x} \sim \mathcal{D}}\left[e^{-f(\boldsymbol{x}) H(\boldsymbol{x})}\right]
38 | \\ =&P(f(x)=1|x)*e^{-H(x)}+P(f(x)=-1|x)*e^{H(x)}
39 | \end{aligned}
40 | $$
41 | 
42 | 由于$P(f(x)=1|x)和P(f(x)=-1|x)$为常数
43 | 
44 | 故式(8.6)可轻易推知
45 | 
46 | $$
47 | \frac{\partial \ell_{\exp }(H | \mathcal{D})}{\partial H(\boldsymbol{x})}=-e^{-H(\boldsymbol{x})} P(f(\boldsymbol{x})=1 | \boldsymbol{x})+e^{H(\boldsymbol{x})} P(f(\boldsymbol{x})=-1 | \boldsymbol{x})
48 | $$
49 | 
50 | 令式(8.6)等于0可得
51 | 
52 | 式(8.7)
53 | $$
54 | H(\boldsymbol{x})=\frac{1}{2} \ln \frac{P(f(x)=1 | \boldsymbol{x})}{P(f(x)=-1 | \boldsymbol{x})}
55 | $$
56 | 式(8.8)显然成立
57 | $$
58 | \begin{aligned}
59 | \operatorname{sign}(H(\boldsymbol{x}))&=\operatorname{sign}\left(\frac{1}{2} \ln \frac{P(f(x)=1 | \boldsymbol{x})}{P(f(x)=-1 | \boldsymbol{x})}\right)
60 | \\ & =\left\{\begin{array}{ll}{1,} & {P(f(x)=1 | \boldsymbol{x})>P(f(x)=-1 | \boldsymbol{x})} \\ {-1,} & {P(f(x)=1 | \boldsymbol{x})<P(f(x)=-1 | \boldsymbol{x})}\end{array}\right.
61 | \\ & =\underset{y \in\{-1,1\}}{\arg \max } P(f(x)=y | \boldsymbol{x})
62 | \end{aligned}
63 | $$
64 | 


--------------------------------------------------------------------------------
/docs/chapter5/chapter5.md:
--------------------------------------------------------------------------------
 1 | ## 5.2
 2 | $$\Delta w_i = \eta(y-\hat{y})x_i$$
 3 | [推导]：此处感知机的模型为：
 4 | $$y=f(\sum_{i} w_i x_i - \theta)$$
 5 | 将$\theta$看成哑结点后，模型可化简为：
 6 | $$y=f(\sum_{i} w_i x_i)=f(\boldsymbol w^T \boldsymbol x)$$
 7 | 其中$f$为阶跃函数。<br>根据《统计学习方法》§2可知，假设误分类点集合为$M$，$\boldsymbol x_i \in M$为误分类点，$\boldsymbol x_i$的真实标签为$y_i$,模型的预测值为$\hat{y_i}$,对于误分类点$\boldsymbol x_i$来说，此时$\boldsymbol w^T \boldsymbol x_i \gt 0,\hat{y_i}=1,y_i=0$或$\boldsymbol w^T \boldsymbol x_i \lt 0,\hat{y_i}=0,y_i=1$,综合考虑两种情形可得：
 8 | $$(\hat{y_i}-y_i)\boldsymbol w \boldsymbol x_i>0$$
 9 | 所以可以推得损失函数为：
10 | $$L(\boldsymbol w)=\sum_{\boldsymbol x_i \in M} (\hat{y_i}-y_i)\boldsymbol w \boldsymbol x_i$$
11 | 损失函数的梯度为：
12 | $$\nabla_w L(\boldsymbol w)=\sum_{\boldsymbol x_i \in M} (\hat{y_i}-y_i)\boldsymbol x_i$$
13 | 随机选取一个误分类点$(\boldsymbol x_i,y_i)$，对$\boldsymbol w$进行更新：
14 | $$\boldsymbol w \leftarrow \boldsymbol w-\eta(\hat{y_i}-y_i)\boldsymbol x_i=\boldsymbol w+\eta(y_i-\hat{y_i})\boldsymbol x_i$$
15 | 显然式5.2为$\boldsymbol w$的第$i$个分量$w_i$的变化情况
16 | ## 5.12
17 | $$\Delta \theta_j = -\eta g_j$$
18 | [推导]：因为
19 | $$\Delta \theta_j = -\eta \cfrac{\partial E_k}{\partial \theta_j}$$
20 | 又
21 | $$
22 | \begin{aligned}	
23 | \cfrac{\partial E_k}{\partial \theta_j} &= \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot\cfrac{\partial \hat{y}_j^k}{\partial \theta_j} \\
24 | &= (\hat{y}_j^k-y_j^k) \cdot f’(\beta_j-\theta_j) \cdot (-1) \\
25 | &= -(\hat{y}_j^k-y_j^k)f’(\beta_j-\theta_j) \\
26 | &= g_j
27 | \end{aligned}
28 | $$
29 | 所以
30 | $$\Delta \theta_j = -\eta \cfrac{\partial E_k}{\partial \theta_j}=-\eta g_j$$
31 | ## 5.13
32 | $$\Delta v_{ih} = \eta e_h x_i$$
33 | [推导]：因为
34 | $$\Delta v_{ih} = -\eta \cfrac{\partial E_k}{\partial v_{ih}}$$
35 | 又
36 | $$
37 | \begin{aligned}	
38 | \cfrac{\partial E_k}{\partial v_{ih}} &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot \cfrac{\partial b_h}{\partial \alpha_h} \cdot \cfrac{\partial \alpha_h}{\partial v_{ih}} \\
39 | &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot \cfrac{\partial b_h}{\partial \alpha_h} \cdot x_i \\ 
40 | &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot f’(\alpha_h-\gamma_h) \cdot x_i \\
41 | &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot w_{hj} \cdot f’(\alpha_h-\gamma_h) \cdot x_i \\
42 | &= \sum_{j=1}^{l} (-g_j) \cdot w_{hj} \cdot f’(\alpha_h-\gamma_h) \cdot x_i \\
43 | &= -f’(\alpha_h-\gamma_h) \cdot \sum_{j=1}^{l} g_j \cdot w_{hj}  \cdot x_i\\
44 | &= -b_h(1-b_h) \cdot \sum_{j=1}^{l} g_j \cdot w_{hj}  \cdot x_i \\
45 | &= -e_h \cdot x_i
46 | \end{aligned}
47 | $$
48 | 所以
49 | $$\Delta v_{ih} = -\eta \cdot -e_h \cdot x_i=\eta e_h x_i$$
50 | ## 5.14
51 | $$\Delta \gamma_h= -\eta e_h$$
52 | [推导]：因为
53 | $$\Delta \gamma_h = -\eta \cfrac{\partial E_k}{\partial \gamma_h}$$
54 | 又
55 | $$
56 | \begin{aligned}	
57 | \cfrac{\partial E_k}{\partial \gamma_h} &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot \cfrac{\partial b_h}{\partial \gamma_h} \\
58 | &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot f’(\alpha_h-\gamma_h) \cdot (-1) \\
59 | &= -\sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot w_{hj} \cdot f’(\alpha_h-\gamma_h)\\
60 | &=e_h
61 | \end{aligned}
62 | $$
63 | 所以
64 | $$\Delta \gamma_h= -\eta e_h$$


--------------------------------------------------------------------------------
/docs/chapter16/chapter16.md:
--------------------------------------------------------------------------------
  1 | ## 16.2
  2 | $$
  3 | Q_{n}(k)=\frac{1}{n}\left((n-1)\times Q_{n-1}(k)+v_{n}\right)
  4 | $$
  5 | 
  6 | [推导]：
  7 | $$
  8 | Q_{n}(k)=\frac{1}{n}\sum_{i=1}^{n}v_{i}=\frac{1}{n}\left(\sum_{i=1}^{n-1}v_{i}+v_{n}\right)=\frac{1}{n}\left((n-1)Q_{n-1}(k)+v_{n}\right)
  9 | $$
 10 | 
 11 | ## 16.4
 12 | 
 13 | $$
 14 | P(k)=\frac{e^{\frac{Q(k)}{\tau }}}{\sum_{i=1}^{K}e^{\frac{Q(i)}{\tau}}}
 15 | $$
 16 | 
 17 | $$
 18 | \tau越小则平均奖赏高的摇臂被选取的概率越高
 19 | $$
 20 | 
 21 | [解析]：
 22 | $$
 23 | P(k)=\frac{e^{\frac{Q(k)}{\tau }}}{\sum_{i=1}^{K}e^{\frac{Q(i)}{\tau}}}\propto e^{\frac{Q(k)}{\tau }}\propto\frac{Q(k)}{\tau }\propto\frac{1}{\tau}
 24 | $$
 25 | 
 26 | ## 16.7
 27 | 
 28 | $$
 29 | \begin{aligned}
 30 | V_{T}^{\pi}(x)&=\mathbb{E}_{\pi}[\frac{1}{T}\sum_{t=1}^{T}r_{t}\mid x_{0}=x]\\
 31 | &=\mathbb{E}_{\pi}[\frac{1}{T}r_{1}+\frac{T-1}{T}\frac{1}{T-1}\sum_{t=2}^{T}r_{t}\mid x_{0}=x]\\
 32 | &=\sum_{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(\frac{1}{T}R_{x\rightarrow x{}'}^{a}+\frac{T-1}{T}\mathbb{E}_{\pi}[\frac{1}{T-1}\sum_{t=1}^{T-1}r_{t}\mid x_{0}=x{}'])\\
 33 | &=\sum_{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(\frac{1}{T}R_{x\rightarrow x{}'}^{a}+\frac{T-1}{T}V_{T-1}^{\pi}(x{}')])
 34 | \end{aligned}
 35 | $$
 36 | 
 37 | [解析]：
 38 | 
 39 | 因为
 40 | $$
 41 | \pi(x,a)=P(state=x\mid action=a)
 42 | $$
 43 | 表示在状态 **x**下选择动作 **a**的概率，
 44 | 
 45 | 又因为动作事件之间两两互斥且和为动作空间，由全概率展开公式
 46 | $$
 47 | P(A)=\sum_{i=1}^{\infty}P(B_{i})P(A\mid B_{i})
 48 | $$
 49 | 可得
 50 | $$
 51 | \begin{aligned}
 52 | &=\mathbb{E}_{\pi}[\frac{1}{T}r_{1}+\frac{T-1}{T}\frac{1}{T-1}\sum_{t=2}^{T}r_{t}\mid x_{0}=x]\\
 53 | &=\sum_{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(\frac{1}{T}R_{x\rightarrow x{}'}^{a}+\frac{T-1}{T}\mathbb{E}_{\pi}[\frac{1}{T-1}\sum_{t=1}^{T-1}r_{t}\mid x_{0}=x{}'])
 54 | \end{aligned}
 55 | $$
 56 | 其中
 57 | $$
 58 | r_{1}=\pi(x,a)P_{x\rightarrow x{}'}^{a}R_{x\rightarrow x{}'}^{a}
 59 | $$
 60 | 最后一个等式用到了递归形式。
 61 | 
 62 | 
 63 | 
 64 | ## 16.8
 65 | 
 66 | $$
 67 | V_{\gamma }^{\pi}(x)=\sum _{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(R_{x\rightarrow x{}'}^{a}+\gamma V_{\gamma }^{\pi}(x{}'))
 68 | $$
 69 | 
 70 | [推导]：
 71 | $$
 72 | \begin{aligned}
 73 | V_{\gamma }^{\pi}(x)&=\mathbb{E}_{\pi}[\sum_{t=0}^{\infty }\gamma^{t}r_{t+1}\mid x_{0}=x]\\
 74 | &=\mathbb{E}_{\pi}[r_{1}+\sum_{t=1}^{\infty}\gamma^{t}r_{t+1}\mid x_{0}=x]\\
 75 | &=\mathbb{E}_{\pi}[r_{1}+\gamma\sum_{t=1}^{\infty}\gamma^{t-1}r_{t+1}\mid x_{0}=x]\\
 76 | &=\sum _{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(R_{x\rightarrow x{}'}^{a}+\gamma \mathbb{E}_{\pi}[\sum_{t=0}^{\infty }\gamma^{t}r_{t+1}\mid x_{0}=x{}'])\\
 77 | &=\sum _{a\in A}\pi(x,a)\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{a}(R_{x\rightarrow x{}'}^{a}+\gamma V_{\gamma }^{\pi}(x{}'))
 78 | \end{aligned}
 79 | $$
 80 | 
 81 | ## 16.16
 82 | 
 83 | $$
 84 | V^{\pi}(x)\leq V^{\pi{}'}(x)
 85 | $$
 86 | 
 87 | [推导]：
 88 | $$
 89 | \begin{aligned}
 90 | V^{\pi}(x)&\leq Q^{\pi}(x,\pi{}'(x))\\
 91 | &=\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{\pi{}'(x)}(R_{x\rightarrow x{}'}^{\pi{}'(x)}+\gamma V^{\pi}(x{}'))\\
 92 | &\leq \sum_{x{}'\in X}P_{x\rightarrow x{}'}^{\pi{}'(x)}(R_{x\rightarrow x{}'}^{\pi{}'(x)}+\gamma Q^{\pi}(x{}',\pi{}'(x{}')))\\
 93 | &=\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{\pi{}'(x)}(R_{x\rightarrow x{}'}^{\pi{}'(x)}+\gamma \sum_{x{}'\in X}P_{x{}'\rightarrow x{}'}^{\pi{}'(x{}')}(R_{x{}'\rightarrow x{}'}^{\pi{}'(x{}')}+\gamma V^{\pi}(x{}')))\\
 94 | &=\sum_{x{}'\in X}P_{x\rightarrow x{}'}^{\pi{}'(x)}(R_{x\rightarrow x{}'}^{\pi{}'(x)}+\gamma V^{\pi{}'}(x{}'))\\
 95 | &=V^{\pi{}'}(x)
 96 | \end{aligned}
 97 | $$
 98 | 其中，使用了动作改变条件
 99 | $$
100 | Q^{\pi}(x,\pi{}'(x))\geq V^{\pi}(x)
101 | $$
102 | 以及状态-动作值函数
103 | $$
104 | Q^{\pi}(x{}',\pi{}'(x{}'))=\sum_{x{}'\in X}P_{x{}'\rightarrow x{}'}^{\pi{}'(x{}')}(R_{x{}'\rightarrow x{}'}^{\pi{}'(x{}')}+\gamma V^{\pi}(x{}'))
105 | $$
106 | 于是，当前状态的最优值函数为
107 | 
108 | $$
109 | V^{\ast}(x)=V^{\pi{}'}(x)\geq V^{\pi}(x)
110 | $$
111 | 
112 | 
113 | 
114 | ## 16.31
115 | 
116 | $$
117 | Q_{t+1}^{\pi}(x,a)=Q_{t}^{\pi}(x,a)+\alpha (R_{x\rightarrow x{}'}^{a}+\gamma Q_{t}^{\pi}(x{}',a{}')-Q_{t}^{\pi}(x,a))
118 | $$
119 | 
120 | [推导]：对比公式16.29
121 | $$
122 | Q_{t+1}^{\pi}(x,a)=Q_{t}^{\pi}(x,a)+\frac{1}{t+1}(r_{t+1}-Q_{t}^{\pi}(x,a))
123 | $$
124 | 以及由
125 | $$
126 | \frac{1}{t+1}=\alpha
127 | $$
128 | 可知
129 | $$
130 | r_{t+1}=R_{x\rightarrow x{}'}^{a}+\gamma Q_{t}^{\pi}(x{}',a{}')
131 | $$
132 | 而由γ折扣累积奖赏可估计得到。
133 | 
134 | 
135 | 
136 | 


--------------------------------------------------------------------------------
/docs/chapter14/chapter14.md:
--------------------------------------------------------------------------------
  1 | ## 14.26
  2 | 
  3 | $$p(x^t)T(x^{t-1}|x^t)=p(x^{t-1})T(x^t|x^{t-1})$$
  4 | 
  5 | [解析]：假设变量$x$所在的空间有$n$个状态($s_1,s_2,..,s_n$), 定义在该空间上的一个转移矩阵$T(n\times n)$如果满足一定的条件则该马尔可夫过程存在一个稳态分布$\pi$, 使得
  6 | $$
  7 | \begin{aligned}
  8 | \pi T=\pi
  9 | \end{aligned}
 10 | \tag{1}
 11 | $$
 12 | 其中, $\pi$是一个是一个$n$维向量，代表​$s_1,s_2,..,s_n$对应的概率. 反过来, 如果我们希望采样得到符合某个分布​$\pi$的一系列变量​$x_1,x_2,..,x_t$, 应当采用哪一个转移矩阵​$T(n\times n)​$呢？
 13 | 
 14 | 事实上，转移矩阵只需要满足马尔可夫细致平稳条件
 15 | $$
 16 | \begin{aligned}
 17 | \pi (i)T(i,j)=\pi (j)T(j,i)
 18 | \end{aligned}
 19 | \tag{2}
 20 | $$
 21 | 即公式$14.26​$，这里采用的符号与西瓜书略有区别以便于理解.  证明如下
 22 | $$
 23 | \begin{aligned}
 24 | \pi T(j) = \sum _i \pi (i)T(i,j) = \sum _i \pi (j)T(j,i) = \pi(j)
 25 | \end{aligned} 
 26 | \tag{3}
 27 | $$
 28 | 假设采样得到的序列为$x_1,x_2,..,x_{t-1},x_t$，则可以使用$MH$算法来使得$x_{t-1}$(假设为状态$s_i$)转移到$x_t$(假设为状态$s_j$)的概率满足式$(2)$.
 29 | 
 30 | ## 14.28
 31 | 
 32 | $$A(x^* | x^{t-1}) = \min\left ( 1,\frac{p(x^*)Q(x^{t-1} | x^*) }{p(x^{t-1})Q(x^* | x^{t-1})} \right )$$
 33 | 
 34 | [推导]：这个公式其实是拒绝采样的一个trick，因为基于式$14.27​$只需要
 35 | $$
 36 | \begin{aligned}
 37 |   A(x^* | x^{t-1}) &= p(x^*)Q(x^{t-1} | x^*)  \\
 38 |   A(x^{t-1} | x^*) &= p(x^{t-1})Q(x^* | x^{t-1})
 39 |  \end{aligned} 
 40 |  \tag{4}
 41 | $$
 42 | 即可满足式$14.26$，但是实际上等号右边的数值可能比较小，比如各为0.1和0.2，那么好不容易才到的样本只有百分之十几得到利用，所以不妨将接受率设为0.5和1，则细致平稳分布条件依然满足，样本利用率大大提高, 所以可以将$(4)$改进为
 43 | $$
 44 | \begin{aligned} 
 45 | A(x^* | x^{t-1}) &=  \frac{p(x^*)Q(x^{t-1} | x^*)}{norm}  \\  
 46 | A(x^{t-1} | x^*) &= \frac{p(x^{t-1})Q(x^* | x^{t-1}) }{norm}
 47 | \end{aligned}  
 48 | \tag{5}
 49 | $$
 50 | 其中
 51 | $$
 52 | \begin{aligned} 
 53 | norm = \max\left (p(x^{t-1})Q(x^* | x^{t-1}),p(x^*)Q(x^{t-1} | x^*) \right )
 54 | \end{aligned}  
 55 | \tag{6}
 56 | $$
 57 | 即教材的$14.28​$.
 58 | 
 59 | ## 14.32
 60 | 
 61 | $${\rm ln}p(x)=\mathcal{L}(q)+{\rm KL}(q \parallel p)$$ 
 62 | 
 63 | [推导]：根据条件概率公式$p(x,z)=p(z|x)*p(x)$，可以得到$p(x)=\frac{p(x,z)}{p(z|x)}$
 64 | 
 65 | 然后两边同时作用${\rm ln}$函数，可得${\rm ln}p(x)={\rm ln}\frac{p(x,z)}{p(z|x)}$    (1)
 66 | 
 67 | 因为$q(z)$是概率密度函数，所以$1=\int q(z)dz$
 68 | 
 69 | 等式两边同时乘以${\rm ln}p(x)$，因为${\rm ln}p(x)$是不关于变量$z$的函数，所以${\rm ln}p(x)$可以拿进积分里面，得到${\rm ln}p(x)=\int q(z){\rm ln}p(x)dz$
 70 | $$
 71 | \begin{aligned}
 72 | {\rm ln}p(x)&=\int q(z){\rm ln}p(x) \\
 73 |  &=\int q(z){\rm ln}\frac{p(x,z)}{p(z|x)}\qquad(带入公式(1))\\
 74 |  &=\int q(z){\rm ln}\bigg\{\frac{p(x,z)}{q(z)}\cdot\frac{q(z)}{p(z|x)}\bigg\} \\
 75 |  &=\int q(z)\bigg({\rm ln}\frac{p(x,z)}{q(z)}-{\rm ln}\frac{p(z|x)}{q(z)}\bigg) \\
 76 |   &=\int q(z){\rm ln}\bigg\{\frac{p(x,z)}{q(z)}\bigg\}-\int q(z){\rm ln}\frac{p(z|x)}{q(z)} \\
 77 |   &=\mathcal{L}(q)+{\rm KL}(q \parallel p)\qquad(根据\mathcal{L}和{\rm KL}的定义)
 78 | \end{aligned}
 79 | $$
 80 | 
 81 | 
 82 | ## 14.36
 83 | 
 84 | $$
 85 | \begin{aligned}
 86 | \mathcal{L}(q)&=\int \prod_{i}q_{i}\bigg\{ {\rm ln}p({\rm \mathbf{x},\mathbf{z}})-\sum_{i}{\rm ln}q_{i}\bigg\}d{\rm\mathbf{z}} \\
 87 | &=\int q_{j}\bigg\{\int p(x,z)\prod_{i\ne j}q_{i}d{\rm\mathbf{z_{i}}}\bigg\}d{\rm\mathbf{z_{j}}}-\int q_{j}{\rm ln}q_{j}d{\rm\mathbf{z_{j}}}+{\rm const} \\
 88 | &=\int q_{j}{\rm ln}\tilde{p}({\rm \mathbf{x},\mathbf{z_{j}}})d{\rm\mathbf{z_{j}}}-\int q_{j}{\rm ln}q_{j}d{\rm\mathbf{z_{j}}}+{\rm const}
 89 | \end{aligned}
 90 | $$
 91 | 
 92 | [推导]：
 93 | $$
 94 | \mathcal{L}(q)=\int \prod_{i}q_{i}\bigg\{ {\rm ln}p({\rm \mathbf{x},\mathbf{z}})-\sum_{i}{\rm ln}q_{i}\bigg\}d{\rm\mathbf{z}}=\int\prod_{i}q_{i}{\rm ln}p({\rm \mathbf{x},\mathbf{z}})d{\rm\mathbf{z}}-\int\prod_{i}q_{i}\sum_{i}{\rm ln}q_{i}d{\rm\mathbf{z}}
 95 | $$
 96 | 公式可以看做两个积分相减，我们先来看左边积分$\int\prod_{i}q_{i}{\rm ln}p({\rm \mathbf{x},\mathbf{z}})d{\rm\mathbf{z}}$的推导。
 97 | $$
 98 | \begin{aligned}
 99 | \int\prod_{i}q_{i}{\rm ln}p({\rm \mathbf{x},\mathbf{z}})d{\rm\mathbf{z}} &= \int q_{j}\prod_{i\ne j}q_{i}{\rm ln}p({\rm \mathbf{x},\mathbf{z}})d{\rm\mathbf{z}} \\
100 | &= \int q_{j}\bigg\{\int{\rm ln}p({\rm \mathbf{x},\mathbf{z}})\prod_{i\ne j}q_{i}d{\rm\mathbf{z_{i}}}\bigg\}d{\rm\mathbf{z_{j}}}\qquad (先对{\rm\mathbf{z_{j}}}求积分，再对{\rm\mathbf{z_{i}}}求积分)
101 | \end{aligned}
102 | $$
103 | 这个就是教材中的$14.36$左边的积分部分。
104 | 
105 | 我们现在看下右边积分的推导$\int\prod_{i}q_{i}\sum_{i}{\rm ln}q_{i}d{\rm\mathbf{z}}$的推导。
106 | 
107 | 在此之前我们看下$\int\prod_{i}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z}}$的计算
108 | $$
109 | \begin{aligned}
110 | \int\prod_{i}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z}}&= \int q_{i^{\prime}}\prod_{i\ne i^{\prime}}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z}}\qquad (选取一个变量q_{i^{\prime}}, i^{\prime}\ne k) \\
111 | &=\int q_{i^{\prime}}\bigg\{\int\prod_{i\ne i^{\prime}}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z_{i}}}\bigg\}d{\rm\mathbf{z_{i^{\prime}}}}
112 | \end{aligned}
113 | $$
114 | $\bigg\{\int\prod_{i\ne i^{\prime}}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z_{i}}}\bigg\}$部分与变量$q_{i^{\prime}}$无关，所以可以拿到积分外面。又因为$\int q_{i^{\prime}}d{\rm\mathbf{z_{i^{\prime}}}}=1$，所以
115 | $$
116 | \begin{aligned}
117 | \int\prod_{i}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z}}&=\int\prod_{i\ne i^{\prime}}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z_{i}}} \\
118 | &= \int q_{k}{\rm ln}q_{k}d{\rm\mathbf{z_k}}\qquad (所有k以外的变量都可以通过上面的方式消除)
119 | \end{aligned}
120 | $$
121 | 有了这个结论，我们再来看公式
122 | $$
123 | \begin{aligned}
124 | \int\prod_{i}q_{i}\sum_{i}{\rm ln}q_{i}d{\rm\mathbf{z}}&= \int\prod_{i}q_{i}{\rm ln}q_{j}d{\rm\mathbf{z}} + \sum_{k\ne j}\int\prod_{i}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z}} \\
125 | &= \int q_{j}{\rm ln}q_{j}d{\rm\mathbf{z_j}} + \sum_{z\ne j}\int q_{k}{\rm ln}q_{k}d{\rm\mathbf{z_k}}\qquad (根据上面结论) \\
126 | &= \int q_{j}{\rm ln}q_{j}d{\rm\mathbf{z_j}} + {\rm const} \qquad (这里我们关心的是q_{j}，其他变量可以视为{\rm const})
127 | \end{aligned}
128 | $$
129 | 这个就是$14.36$右边的积分部分。
130 | 
131 | ## 14.40
132 | 
133 | $$
134 | \begin{aligned} 
135 | q_j^*(\mathbf{z}_j) = \frac{ \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right ) }{\int \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right ) \mathrm{d}\mathbf{z}_j}
136 | \end{aligned}
137 | $$
138 | 
139 | [推导]：由$14.39$去对数并积分
140 | $$
141 | \begin{aligned} 
142 |  \int q_j^*(\mathbf{z}_j)\mathrm{d}\mathbf{z}_j &=\int \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right )\cdot\exp(const) \, \mathrm{d}\mathbf{z}_j \\
143 |  &=\exp(const) \int \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right ) \, \mathrm{d}\mathbf{z}_j \\
144 |  &= 1
145 |  \end{aligned}
146 |  \tag{7}
147 | $$
148 | 所以
149 | $$
150 | \exp(const)  = \dfrac{1}{\int \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right ) \, \mathrm{d}\mathbf{z}_j}  \\
151 | \tag{8}
152 | $$
153 | 
154 | $$
155 | \begin{aligned} 
156 |   q_j^*(\mathbf{z}_j) &= \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right )\cdot\exp(const) \, \mathrm{d}\mathbf{z}_j \\
157 |  &= \frac{ \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right ) }{\int \exp\left ( \mathbb{E}_{i\neq j}[\ln (p(\mathbf{x},\mathbf{z}))] \right ) \mathrm{d}\mathbf{z}_j}
158 |  \end{aligned}
159 |  \tag{9}
160 | $$
161 | 


--------------------------------------------------------------------------------
/docs/chapter10/chapter10.md:
--------------------------------------------------------------------------------
  1 | ## 10.4
  2 | $$\sum^m_{i=1}dist^2_{ij}=tr(\boldsymbol B)+mb_{jj}$$
  3 | [推导]：
  4 | $$\begin{aligned}
  5 | \sum^m_{i=1}dist^2_{ij}&= \sum^m_{i=1}b_{ii}+\sum^m_{i=1}b_{jj}-2\sum^m_{i=1}b_{ij}\\
  6 | &=tr(B)+mb_{jj}
  7 | \end{aligned}​$$
  8 | 
  9 | ## 10.10
 10 | $$b_{ij}=-\frac{1}{2}(dist^2_{ij}-dist^2_{i\cdot}-dist^2_{\cdot j}+dist^2_{\cdot\cdot})$$
 11 | [推导]：由公式（10.3）可得，
 12 | $$b_{ij}=-\frac{1}{2}(dist^2_{ij}-b_{ii}-b_{jj})$$
 13 | 由公式（10.6）和（10.9）可得，
 14 | $$\begin{aligned}
 15 | tr(\boldsymbol B)&=\frac{1}{2m}\sum^m_{i=1}\sum^m_{j=1}dist^2_{ij}\\
 16 | &=\frac{m}{2}dist^2_{\cdot\cdot}
 17 | \end{aligned}$$
 18 | 由公式（10.4）和（10.8）可得，
 19 | $$\begin{aligned}
 20 | b_{jj}&=\frac{1}{m}\sum^m_{i=1}dist^2_{ij}-\frac{1}{m}tr(\boldsymbol B)\\
 21 | &=dist^2_{\cdot j}-\frac{1}{2}dist^2_{\cdot\cdot}
 22 | \end{aligned}$$
 23 | 由公式（10.5）和（10.7）可得，
 24 | $$\begin{aligned}
 25 | b_{ii}&=\frac{1}{m}\sum^m_{j=1}dist^2_{ij}-\frac{1}{m}tr(\boldsymbol B)\\
 26 | &=dist^2_{i\cdot}-\frac{1}{2}dist^2_{\cdot\cdot}
 27 | \end{aligned}$$
 28 | 综合可得，
 29 | $$\begin{aligned}
 30 | b_{ij}&=-\frac{1}{2}(dist^2_{ij}-b_{ii}-b_{jj})\\
 31 | &=-\frac{1}{2}(dist^2_{ij}-dist^2_{i\cdot}+\frac{1}{2}dist^2_{\cdot\cdot}-dist^2_{\cdot j}+\frac{1}{2}dist^2_{\cdot\cdot})\\
 32 | &=-\frac{1}{2}(dist^2_{ij}-dist^2_{i\cdot}-dist^2_{\cdot j}+dist^2_{\cdot\cdot})
 33 | \end{aligned}$$
 34 | 
 35 | ## 10.14
 36 | $$\begin{aligned}
 37 | \sum^m_{i=1}\| \sum^{d'}_{j=1}z_{ij}\boldsymbol w_j-\boldsymbol x_i \|^2_2&=\sum^m_{i=1}\boldsymbol z^T_i\boldsymbol z_i-2\sum^m_{i=1}\boldsymbol z^T_i\boldsymbol W^T\boldsymbol x_i + const\\
 38 | &\propto -tr(\boldsymbol W^T(\sum^m_{i=1}\boldsymbol x_i\boldsymbol x^T_i)\boldsymbol W)
 39 | \end{aligned}$$
 40 | [推导]：已知$\boldsymbol W^T \boldsymbol W=\boldsymbol I$和$\boldsymbol z_i=\boldsymbol W^T \boldsymbol x_i$，
 41 | $$\begin{aligned}
 42 | \sum^m_{i=1}\| \sum^{d'}_{j=1}z_{ij}\boldsymbol w_j-\boldsymbol x_i \|^2_2&=\sum^m_{i=1}\| \boldsymbol W\boldsymbol z_i-\boldsymbol x_i \|^2_2\\
 43 | &=\sum^m_{i=1}(\boldsymbol W\boldsymbol z_i)^T(\boldsymbol W\boldsymbol z_i)-2\sum^m_{i=1}(\boldsymbol W\boldsymbol z_i)^T\boldsymbol x_i+\sum^m_{i=1}\boldsymbol x^T_i\boldsymbol x_i\\
 44 | &=\sum^m_{i=1}\boldsymbol z_i^T\boldsymbol z_i-2\sum^m_{i=1}\boldsymbol z_i^T\boldsymbol W^T\boldsymbol x_i+\sum^m_{i=1}\boldsymbol x^T_i\boldsymbol x_i\\
 45 | &=\sum^m_{i=1}\boldsymbol z_i^T\boldsymbol z_i-2\sum^m_{i=1}\boldsymbol z_i^T\boldsymbol z_i+\sum^m_{i=1}\boldsymbol x^T_i\boldsymbol x_i\\
 46 | &=-\sum^m_{i=1}\boldsymbol z_i^T\boldsymbol z_i+\sum^m_{i=1}\boldsymbol x^T_i\boldsymbol x_i\\
 47 | &=-tr(\boldsymbol W^T(\sum^m_{i=1}\boldsymbol x_i\boldsymbol x^T_i)\boldsymbol W)+\sum^m_{i=1}\boldsymbol x^T_i\boldsymbol x_i\\
 48 | &\propto -tr(\boldsymbol W^T(\sum^m_{i=1}\boldsymbol x_i\boldsymbol x^T_i)\boldsymbol W)
 49 | \end{aligned}$$
 50 | 其中，$\sum^m_{i=1}\boldsymbol x^T_i\boldsymbol x_i$是常数。
 51 | 
 52 | ## 10.17
 53 | $$
 54 | \boldsymbol X\boldsymbol X^T\boldsymbol w_i=\lambda _i\boldsymbol w_i
 55 | $$
 56 | [推导]：已知
 57 | $$\begin{aligned}
 58 | &\min\limits_{\boldsymbol W}-tr(\boldsymbol W^T\boldsymbol X\boldsymbol X^T\boldsymbol W)\\
 59 | &s.t. \boldsymbol W^T\boldsymbol W=\boldsymbol I. 
 60 | \end{aligned}$$
 61 | 运用拉格朗日乘子法可得，
 62 | $$\begin{aligned}
 63 | J(\boldsymbol W)&=-tr(\boldsymbol W^T\boldsymbol X\boldsymbol X^T\boldsymbol W+\boldsymbol\lambda'(\boldsymbol W^T\boldsymbol W-\boldsymbol I))\\
 64 | \cfrac{\partial J(\boldsymbol W)}{\partial \boldsymbol W} &=\boldsymbol X\boldsymbol X^T\boldsymbol W+\boldsymbol\lambda'\boldsymbol W
 65 | \end{aligned}$$
 66 | 令$\cfrac{\partial J(\boldsymbol W)}{\partial \boldsymbol W}=\boldsymbol 0$，故
 67 | $$\begin{aligned}
 68 | \boldsymbol X\boldsymbol X^T\boldsymbol W&=-\boldsymbol\lambda'\boldsymbol W\\
 69 | \boldsymbol X\boldsymbol X^T\boldsymbol W&=\boldsymbol\lambda\boldsymbol W\\
 70 | \end{aligned}$$
 71 | 其中，$\boldsymbol W=\{\boldsymbol w_1,\boldsymbol w_2,\cdot\cdot\cdot,\boldsymbol w_d\}$和$\boldsymbol \lambda=\boldsymbol{diag}(\lambda_1,\lambda_2,\cdot\cdot\cdot,\lambda_d)$。
 72 | 
 73 | ## 10.28
 74 | $$w_{ij}=\cfrac{\sum\limits_{k\in Q_i}C_{jk}^{-1}}{\sum\limits_{l,s\in Q_i}C_{ls}^{-1}}$$
 75 | [推导]：已知
 76 | $$\begin{aligned}
 77 | \min\limits_{\boldsymbol W}&\sum^m_{i=1}\| \boldsymbol x_i-\sum_{j \in Q_i}w_{ij}\boldsymbol x_j \|^2_2\\
 78 | s.t.&\sum_{j \in Q_i}w_{ij}=1
 79 | \end{aligned}$$
 80 | 转换为
 81 | $$\begin{aligned}
 82 | \sum^m_{i=1}\| \boldsymbol x_i-\sum_{j \in Q_i}w_{ij}\boldsymbol x_j \|^2_2 &=\sum^m_{i=1}\| \sum_{j \in Q_i}w_{ij}\boldsymbol x_i- \sum_{j \in Q_i}w_{ij}\boldsymbol x_j \|^2_2 \\
 83 | &=\sum^m_{i=1}\| \sum_{j \in Q_i}w_{ij}(\boldsymbol x_i- \boldsymbol x_j) \|^2_2\\
 84 | &=\sum^m_{i=1}\boldsymbol W^T_i(\boldsymbol x_i-\boldsymbol x_j)(\boldsymbol x_i-\boldsymbol x_j)^T\boldsymbol W_i\\
 85 | &=\sum^m_{i=1}\boldsymbol W^T_i\boldsymbol C_i\boldsymbol W_i
 86 | \end{aligned}$$
 87 | 其中，$\boldsymbol W_i=(w_{i1},w_{i2},\cdot\cdot\cdot,w_{ik})^T$，$k$是$Q_i$集合的长度，$\boldsymbol C_i=(\boldsymbol x_i-\boldsymbol x_j)(\boldsymbol x_i-\boldsymbol x_j)^T$，$j \in Q_i$。
 88 | $$
 89 | \sum_{j\in Q_i}w_{ij}=\boldsymbol W_i^T\boldsymbol 1_k=1
 90 | $$
 91 | 其中，$\boldsymbol 1_k$为k维全1向量。
 92 | 运用拉格朗日乘子法可得，
 93 | $$
 94 | J(\boldsymbol W)==\sum^m_{i=1}\boldsymbol W^T_i\boldsymbol C_i\boldsymbol W_i+\lambda(\boldsymbol W_i^T\boldsymbol 1_k-1)
 95 | $$
 96 | $$\begin{aligned}
 97 | \cfrac{\partial J(\boldsymbol W)}{\partial \boldsymbol W_i} &=2\boldsymbol C_i\boldsymbol W_i+\lambda'\boldsymbol 1_k
 98 | \end{aligned}$$
 99 | 令$\cfrac{\partial J(\boldsymbol W)}{\partial \boldsymbol W_i}=0$，故
100 | $$\begin{aligned}
101 | \boldsymbol W_i&=-\cfrac{1}{2}\lambda\boldsymbol C_i^{-1}\boldsymbol 1_k\\
102 | \boldsymbol W_i&=\lambda\boldsymbol C_i^{-1}\boldsymbol 1_k\\
103 | \end{aligned}$$
104 | 其中，$\lambda$为一个常数。利用$\boldsymbol W^T_i\boldsymbol 1_k=1$，对$\boldsymbol W_i$归一化，可得
105 | $$
106 | \boldsymbol W_i=\cfrac{\boldsymbol C^{-1}_i\boldsymbol 1_k}{\boldsymbol 1_k\boldsymbol C^{-1}_i\boldsymbol 1_k}
107 | $$
108 | 
109 | ## 10.31
110 | $$\begin{aligned}
111 | &\min\limits_{\boldsymbol Z}tr(\boldsymbol Z \boldsymbol M \boldsymbol Z^T)\\
112 | &s.t. \boldsymbol Z^T\boldsymbol Z=\boldsymbol I. 
113 | \end{aligned}$$
114 | [推导]：
115 | $$\begin{aligned}
116 | \min\limits_{\boldsymbol Z}\sum^m_{i=1}\| \boldsymbol z_i-\sum_{j \in Q_i}w_{ij}\boldsymbol z_j \|^2_2&=\sum^m_{i=1}\|\boldsymbol Z\boldsymbol I_i-\boldsymbol Z\boldsymbol W_i\|^2_2\\
117 | &=\sum^m_{i=1}\|\boldsymbol Z(\boldsymbol I_i-\boldsymbol W_i)\|^2_2\\
118 | &=\sum^m_{i=1}(\boldsymbol Z(\boldsymbol I_i-\boldsymbol W_i))^T\boldsymbol Z(\boldsymbol I_i-\boldsymbol W_i)\\
119 | &=\sum^m_{i=1}(\boldsymbol I_i-\boldsymbol W_i)^T\boldsymbol Z^T\boldsymbol Z(\boldsymbol I_i-\boldsymbol W_i)\\
120 | &=tr((\boldsymbol I-\boldsymbol W)^T\boldsymbol Z^T\boldsymbol Z(\boldsymbol I-\boldsymbol W))\\
121 | &=tr(\boldsymbol Z(\boldsymbol I-\boldsymbol W)(\boldsymbol I-\boldsymbol W)^T\boldsymbol Z^T)\\
122 | &=tr(\boldsymbol Z\boldsymbol M\boldsymbol Z^T)
123 | \end{aligned}$$
124 | 其中，$\boldsymbol M=(\boldsymbol I-\boldsymbol W)(\boldsymbol I-\boldsymbol W)^T$。
125 | [解析]：约束条件$\boldsymbol Z^T\boldsymbol Z=\boldsymbol I$是为了得到标准化（标准正交空间）的低维数据。
126 | 


--------------------------------------------------------------------------------
/docs/chapter6/chapter6.md:
--------------------------------------------------------------------------------
  1 | ## 6.3
  2 | $$
  3 | \left\{\begin{array}{ll}{\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b \geqslant+1,} & {y_{i}=+1} \\ {\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b \leqslant-1,} & {y_{i}=-1}\end{array}\right.
  4 | $$
  5 | [推导]：假设这个超平面是$\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}+b^{\prime}=0$，对于$\left(\boldsymbol{x}_{i}, y_{i}\right) \in D$，有：
  6 | $$
  7 | \left\{\begin{array}{ll}{\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime}>0,} & {y_{i}=+1} \\ {\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime}<0,} & {y_{i}=-1}\end{array}\right.
  8 | $$
  9 | 根据几何间隔，将以上关系修正为：
 10 | $$
 11 | \left\{\begin{array}{ll}{\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime} \geq+\zeta,} & {y_{i}=+1} \\ {\left(\boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+b^{\prime} \leq-\zeta,} & {y_{i}=-1}\end{array}\right.
 12 | $$
 13 | 其中$\zeta$为某个大于零的常数，两边同除以$\zeta$，再次修正以上关系为：
 14 | $$
 15 | \left\{\begin{array}{ll}{\left(\frac{1}{\zeta} \boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+\frac{b^{\prime}}{\zeta} \geq+1,} & {y_{i}=+1} \\ {\left(\frac{1}{\zeta} \boldsymbol{w}^{\prime}\right)^{\top} \boldsymbol{x}_{i}+\frac{b^{\prime}}{\zeta} \leq-1,} & {y_{i}=-1}\end{array}\right.
 16 | $$
 17 | 令：$\boldsymbol{w}=\frac{1}{\zeta} \boldsymbol{w}^{\prime}, b=\frac{b^{\prime}}{\zeta}$，则以上关系可写为：
 18 | $$
 19 | \left\{\begin{array}{ll}{\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b \geq+1,} & {y_{i}=+1} \\ {\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b \leq-1,} & {y_{i}=-1}\end{array}\right.
 20 | $$
 21 | 
 22 | ## 6.8
 23 | $$
 24 | L(\boldsymbol{w}, b, \boldsymbol{\alpha})=\frac{1}{2}\|\boldsymbol{w}\|^{2}+\sum_{i=1}^{m} \alpha_{i}\left(1-y_{i}\left(\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b\right)\right)
 25 | $$
 26 | [推导]：
 27 | 待求目标:
 28 | $$\begin{aligned}
 29 | \min_{\boldsymbol{x}}\quad f(\boldsymbol{x})\\
 30 | s.t.\quad h(\boldsymbol{x})&=0\\
 31 | g(\boldsymbol{x}) &\leq 0
 32 | \end{aligned}$$
 33 | 
 34 | 等式约束和不等式约束：$h(\boldsymbol{x})=0, g(\boldsymbol{x}) \leq 0$分别是由一个等式方程和一个不等式方程组成的方程组。
 35 | 
 36 | 拉格朗日乘子：$\boldsymbol{\lambda}=\left(\lambda_{1}, \lambda_{2}, \ldots, \lambda_{m}\right)$  $\qquad\boldsymbol{\mu}=\left(\mu_{1}, \mu_{2}, \ldots, \mu_{n}\right)$
 37 | 
 38 | 拉格朗日函数：$L(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})=f(\boldsymbol{x})+\boldsymbol{\lambda} h(\boldsymbol{x})+\boldsymbol{\mu} g(\boldsymbol{x})$
 39 | 
 40 | ## 6.9-6.10
 41 | $$\begin{aligned}
 42 | w &= \sum_{i=1}^m\alpha_iy_i\boldsymbol{x}_i \\
 43 | 0 &=\sum_{i=1}^m\alpha_iy_i
 44 | \end{aligned}​$$
 45 | [推导]：式（6.8）可作如下展开：
 46 | $$\begin{aligned}
 47 | L(\boldsymbol{w},b,\boldsymbol{\alpha}) &= \frac{1}{2}||\boldsymbol{w}||^2+\sum_{i=1}^m\alpha_i(1-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)) \\
 48 | & =  \frac{1}{2}||\boldsymbol{w}||^2+\sum_{i=1}^m(\alpha_i-\alpha_iy_i \boldsymbol{w}^T\boldsymbol{x}_i-\alpha_iy_ib)\\
 49 | & =\frac{1}{2}\boldsymbol{w}^T\boldsymbol{w}+\sum_{i=1}^m\alpha_i -\sum_{i=1}^m\alpha_iy_i\boldsymbol{w}^T\boldsymbol{x}_i-\sum_{i=1}^m\alpha_iy_ib
 50 | \end{aligned}​$$
 51 | 对$\boldsymbol{w}$和$b$分别求偏导数​并令其等于0：
 52 | 
 53 | $$\frac {\partial L}{\partial \boldsymbol{w}}=\frac{1}{2}\times2\times\boldsymbol{w} + 0 - \sum_{i=1}^{m}\alpha_iy_i \boldsymbol{x}_i-0= 0 \Longrightarrow \boldsymbol{w}=\sum_{i=1}^{m}\alpha_iy_i \boldsymbol{x}_i$$
 54 | 
 55 | $$\frac {\partial L}{\partial b}=0+0-0-\sum_{i=1}^{m}\alpha_iy_i=0  \Longrightarrow  \sum_{i=1}^{m}\alpha_iy_i=0$$		
 56 | 
 57 | ## 6.11
 58 | $$\begin{aligned}
 59 | \max_{\boldsymbol{\alpha}} & \sum_{i=1}^m\alpha_i - \frac{1}{2}\sum_{i = 1}^m\sum_{j=1}^m\alpha_i \alpha_j y_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j \\
 60 | s.t. & \sum_{i=1}^m \alpha_i y_i =0 \\ 
 61 | & \alpha_i \geq 0 \quad i=1,2,\dots ,m
 62 | \end{aligned}$$  
 63 | [推导]：将式 (6.9)代人 (6.8) ，即可将$L(\boldsymbol{w},b,\boldsymbol{\alpha})$ 中的 $\boldsymbol{w}$ 和 $b$ 消去,再考虑式 (6.10) 的约束,就得到式 (6.6) 的对偶问题：
 64 | $$\begin{aligned}
 65 | \min_{\boldsymbol{w},b} L(\boldsymbol{w},b,\boldsymbol{\alpha})  &=\frac{1}{2}\boldsymbol{w}^T\boldsymbol{w}+\sum_{i=1}^m\alpha_i -\sum_{i=1}^m\alpha_iy_i\boldsymbol{w}^T\boldsymbol{x}_i-\sum_{i=1}^m\alpha_iy_ib \\
 66 | &=\frac {1}{2}\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i-\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_
 67 | i -b\sum _{i=1}^m\alpha_iy_i \\
 68 | & = -\frac {1}{2}\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i -b\sum _{i=1}^m\alpha_iy_i
 69 | \end{aligned}$$
 70 | 又$\sum\limits_{i=1}^{m}\alpha_iy_i=0$，所以上式最后一项可化为0，于是得：
 71 | $$\begin{aligned}
 72 | \min_{\boldsymbol{w},b} L(\boldsymbol{w},b,\boldsymbol{\alpha}) &= -\frac {1}{2}\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i \\
 73 | &=-\frac {1}{2}(\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i)^T(\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i)+\sum _{i=1}^m\alpha_i \\
 74 | &=-\frac {1}{2}\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i \\
 75 | &=\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j
 76 | \end{aligned}$$
 77 | 所以
 78 | $$\max_{\boldsymbol{\alpha}}\min_{\boldsymbol{w},b} L(\boldsymbol{w},b,\boldsymbol{\alpha}) =\max_{\boldsymbol{\alpha}} \sum_{i=1}^m\alpha_i - \frac{1}{2}\sum_{i = 1}^m\sum_{j=1}^m\alpha_i \alpha_j y_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j $$
 79 | 
 80 | 
 81 | 
 82 | 
 83 | ## 6.39
 84 | $$ C=\alpha_i +\mu_i $$
 85 | [推导]：对式（6.36）关于$\xi_i$求偏导并令其等于0可得：
 86 | ​                                                     
 87 | $$\frac{\partial L}{\partial \xi_i}=0+C \times 1 - \alpha_i \times 1-\mu_i
 88 | \times 1 =0\Longrightarrow C=\alpha_i +\mu_i$$
 89 | 
 90 | ## 6.40
 91 | $$\begin{aligned}
 92 | \max_{\boldsymbol{\alpha}}&\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j \\
 93 |  s.t. &\sum_{i=1}^m \alpha_i y_i=0 \\ 
 94 |  &  0 \leq\alpha_i \leq C \quad i=1,2,\dots ,m
 95 |  \end{aligned}$$
 96 | 将式6.37-6.39代入6.36可以得到6.35的对偶问题：
 97 | $$\begin{aligned}
 98 |  \min_{\boldsymbol{w},b,\boldsymbol{\xi}}L(\boldsymbol{w},b,\boldsymbol{\alpha},\boldsymbol{\xi},\boldsymbol{\mu}) &= \frac{1}{2}||\boldsymbol{w}||^2+C\sum_{i=1}^m \xi_i+\sum_{i=1}^m \alpha_i(1-\xi_i-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b))-\sum_{i=1}^m\mu_i \xi_i  \\
 99 | &=\frac{1}{2}||\boldsymbol{w}||^2+\sum_{i=1}^m\alpha_i(1-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b))+C\sum_{i=1}^m \xi_i-\sum_{i=1}^m \alpha_i \xi_i-\sum_{i=1}^m\mu_i \xi_i \\
100 | & = -\frac {1}{2}\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i +\sum_{i=1}^m C\xi_i-\sum_{i=1}^m \alpha_i \xi_i-\sum_{i=1}^m\mu_i \xi_i \\
101 | &  = -\frac {1}{2}\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i +\sum_{i=1}^m (C-\alpha_i-\mu_i)\xi_i \\
102 | &=\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j
103 | \end{aligned}$$  
104 | 所以
105 | $$\begin{aligned}
106 | \max_{\boldsymbol{\alpha},\boldsymbol{\mu}} \min_{\boldsymbol{w},b,\boldsymbol{\xi}}L(\boldsymbol{w},b,\boldsymbol{\alpha},\boldsymbol{\xi},\boldsymbol{\mu})&=\max_{\boldsymbol{\alpha},\boldsymbol{\mu}}\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j \\
107 | &=\max_{\boldsymbol{\alpha}}\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j 
108 | \end{aligned}$$
109 | 又
110 | $$\begin{aligned}
111 | \alpha_i &\geq 0 \\
112 | \mu_i &\geq 0 \\
113 | C &= \alpha_i+\mu_i
114 | \end{aligned}$$
115 | 消去$\mu_i$可得等价约束条件为：
116 | $$0 \leq\alpha_i \leq C \quad i=1,2,\dots ,m$$
117 | 
118 | 
119 | 


--------------------------------------------------------------------------------
/docs/chapter11/chapter11.md:
--------------------------------------------------------------------------------
  1 | ## 11.9
  2 | $$\left \| \nabla f(\boldsymbol x{}')-\nabla  f(\boldsymbol x) \right \|_{2}^{2} \leqslant L\left \| \boldsymbol x{}'-\boldsymbol x \right \|_{2}^{2} 　(\forall \boldsymbol x,\boldsymbol x{}'),$$
  3 | [解析]：*L-Lipschitz*条件定义为：对于函数$f(\boldsymbol x)$,若其任意定义域中的**x1**,**x2**，都存在L>0，使得$|f(\boldsymbol x1)-f(\boldsymbol x2)|≤L|\boldsymbol x1-\boldsymbol x2|$。通俗理解就是，对于函数上的每一对点，都存在一个实数L，使得它们连线斜率的绝对值不大于这个L，其中最小的L称为*Lipschitz*常数。
  4 | 　　　将公式变形可以更好的理解：$$\frac{\left \| \nabla f(\boldsymbol x{}')-\nabla f(\boldsymbol x) \right \|_{2}^{2}}{\left \| \boldsymbol x{}'-\boldsymbol x \right \|_{2}^{2}}\leqslant L 　(\forall \boldsymbol x,\boldsymbol x{}'),$$
  5 | 　　　进一步，如果$\boldsymbol x{}'\to  \boldsymbol x$，即$$\lim_{\boldsymbol x{}'\to \boldsymbol x}\frac{\left \| \nabla f(\boldsymbol x{}')-\nabla f(\boldsymbol x)\right \|_{2}^{2}}{\left \| \boldsymbol x{}'-\boldsymbol x \right \|_{2}^{2}}$$
  6 | 　　　“ *Lipschitz*连续”很常见，知乎有一个问答(https://www.zhihu.com/question/51809602) 对*Lipschitz*连续的解释很形象：以陆地为例， 连续就是说这块地上没有特别陡的坡；其中最陡的地方有多陡呢？这就是所谓的*Lipschitz*常数。 
  7 |    
  8 | ## 11.10
  9 | 
 10 | $$
 11 | \hat{f}(x) \simeq f(x_{k})+\langle \nabla f(x_{k}),x-x_{k} \rangle + \frac{L}{2}\left \| x-x_{k} \right\|^{2}
 12 | $$
 13 | 
 14 | [推导]：
 15 | $$
 16 | \begin{aligned}
 17 | \hat{f}(x) &\simeq f(x_{k})+\langle \nabla f(x_{k}),x-x_{k} \rangle + \frac{L}{2}\left \| x-x_{k} \right\|^{2} \\
 18 | &= f(x_{k})+\langle \nabla f(x_{k}),x-x_{k} \rangle + \langle\frac{L}{2}(x-x_{k}),x-x_{k}\rangle \\
 19 | &= f(x_{k})+\langle \nabla f(x_{k})+\frac{L}{2}(x-x_{k}),x-x_{k} \rangle \\
 20 | &= f(x_{k})+\frac{L}{2}\langle\frac{2}{L}\nabla f(x_{k})+(x-x_{k}),x-x_{k} \rangle \\
 21 | &= f(x_{k})+\frac{L}{2}\langle x-x_{k}+\frac{1}{L}\nabla f(x_{k})+\frac{1}{L}\nabla f(x_{k}),x-x_{k}+\frac{1}{L}\nabla f(x_{k})-\frac{1}{L}\nabla f(x_{k}) \rangle \\
 22 | &= f(x_{k})+\frac{L}{2}\left\| x-x_{k}+\frac{1}{L}\nabla f(x_{k}) \right\|_{2}^{2} -\frac{1}{2L}\left\|\nabla f(x_{k})\right\|_{2}^{2} \\
 23 | &= \frac{L}{2}\left\| x-(x_{k}-\frac{1}{L}\nabla f(x_{k})) \right\|_{2}^{2} + const \qquad (因为f(x_{k})和\nabla f(x_{k})是常数)
 24 | \end{aligned}
 25 | $$
 26 | 
 27 | ## 11.13
 28 | $$\boldsymbol x_{\boldsymbol k+\boldsymbol 1}=\underset{\boldsymbol x}{argmin}\frac{L}{2}\left \| \boldsymbol x -\boldsymbol z\right \|_{2}^{2}+\lambda \left \| \boldsymbol x \right \|_{1}$$
 29 | [推导]：假设目标函数为$g(\boldsymbol x)$，则
 30 | $$
 31 | \begin{aligned}
 32 | g(\boldsymbol x)
 33 | & =\frac{L}{2}\left \|\boldsymbol  x \boldsymbol -\boldsymbol z\right \|_{2}^{2}+\lambda \left \| \boldsymbol x \right \|_{1}\\
 34 | & =\frac{L}{2}\sum_{i=1}^{d}\left \| x^{i} -z^{i}\right \|_{2}^{2}+\lambda \sum_{i=1}^{d}\left \| x^{i} \right \|_{1} \\
 35 | & =\sum_{i=1}^{d}(\frac{L}{2}(x^{i}-z^{i})^{2}+\lambda \left | x^{i}\right |)&
 36 | \end{aligned}
 37 | $$
 38 | 由上式可见， $g(\boldsymbol x)$可以拆成 d个独立的函 数，求解式(11.13)可以分别求解d个独立的目标函数。 
 39 | 针对目标函数$g(x^{i})=\frac{L}{2}(x^{i}-z^{i})^{2}+\lambda \left | x^{i}\right |$，通过求导求解极值：
 40 | $$\frac{dg(x^{i})}{dx^{i}}=L(x^{i}-z^{i})+\lambda sgn(x^{i})$$
 41 | 其中$$sgn(x^{i})=\left\{\begin{matrix}
 42 | 1, &x^{i}>0\\ 
 43 |  -1,& x^{i}<0
 44 | \end{matrix}\right.$$
 45 | 令导数为0，可得：$$x^{i}=z^{i}-\frac{\lambda }{L}sgn(x^{i})$$可分为三种情况：
 46 | 1. 当$z^{i}>\frac{\lambda }{L}$时：
 47 |     (1）假设此时的根$x^{i}<0$，则$sgn(x^{i})=-1$，所以$x^{i}=z^{i}+\frac{\lambda }{L}>0$，与假设矛盾。
 48 |     (2）假设此时的根$x^{i}>0$，则$sgn(x^{i})=1$，所以$x^{i}=z^{i}-\frac{\lambda }{L}>0$，成立。
 49 | 2. 当$z^{i}<-\frac{\lambda }{L}$时：
 50 |     (1）假设此时的根$x^{i}>0$，则$sgn(x^{i})=1$，所以$x^{i}=z^{i}-\frac{\lambda }{L}<0$，与假设矛盾。
 51 |     (2）假设此时的根$x^{i}<0$，则$sgn(x^{i})=-1$，所以$x^{i}=z^{i}+\frac{\lambda }{L}<0$，成立。
 52 | 3. 当$\left |z^{i}  \right |<\frac{\lambda }{L}$时：
 53 |     (1）假设此时的根$x^{i}>0$，则$sgn(x^{i})=1$，所以$x^{i}=z^{i}-\frac{\lambda }{L}<0$，与假设矛盾。
 54 |     (2）假设此时的根$x^{i}<0$，则$sgn(x^{i})=-1$，所以$x^{i}=z^{i}+\frac{\lambda }{L}>0$，与假设矛盾，此时$x^{i}=0$为函数的极小值。
 55 | 综上所述可得函数闭式解如下：
 56 | $$x_{k+1}^{i}=\left\{\begin{matrix}
 57 | z^{i}-\frac{\lambda }{L}, &\frac{\lambda }{L}< z^{i}\\ 
 58 | 0, & \left |z^{i}  \right |\leqslant \frac{\lambda }{L}\\ 
 59 | z^{i}+\frac{\lambda }{L}, & z^{i}<-\frac{\lambda }{L}
 60 | \end{matrix}\right.$$
 61 | 
 62 | ## 11.18
 63 | $$\begin{aligned}
 64 | \underset{\boldsymbol B}{min}\left \|\boldsymbol  X-\boldsymbol B\boldsymbol A \right \|_{F}^{2}
 65 | & =\underset{b_{i}}{min}\left \| \boldsymbol X-\sum_{j=1}^{k}b_{j}\alpha ^{j} \right \|_{F}^{2}\\
 66 | & =\underset{b_{i}}{min}\left \| \left (\boldsymbol X-\sum_{j\neq i}b_{j}\alpha ^{j} \right )- b_{i}\alpha ^{i}\right \|_{F}^{2} \\
 67 | & =\underset{b_{i}}{min}\left \|\boldsymbol  E_{\boldsymbol i}-b_{i}\alpha ^{i} \right \|_{F}^{2} &
 68 | \end{aligned}
 69 | $$
 70 | [推导]：此处只推导一下$BA=\sum_{j=1}^{k}\boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j}$，其中$\boldsymbol b_{\boldsymbol j}$表示**B**的第j列，$\boldsymbol \alpha ^{\boldsymbol j}$表示**A**的第j行。
 71 | 然后，用$b_{j}^{i}$，$\alpha _{j}^{i}$分别表示**B**和**A**的第i行第j列的元素，首先计算**BA**:
 72 | $$
 73 | \begin{aligned}
 74 | \boldsymbol B\boldsymbol A
 75 | & =\begin{bmatrix}
 76 | b_{1}^{1} &b_{2}^{1}  & \cdot  & \cdot  & \cdot  & b_{k}^{1}\\ 
 77 | b_{1}^{2} &b_{2}^{2}  & \cdot  & \cdot  & \cdot  & b_{k}^{2}\\ 
 78 | \cdot  & \cdot  & \cdot  &  &  & \cdot \\ 
 79 | \cdot  &  \cdot &  & \cdot  &  &\cdot  \\ 
 80 |  \cdot & \cdot  &  &  & \cdot  & \cdot \\ 
 81 |  b_{1}^{d}& b_{2}^{d}  & \cdot  & \cdot  &\cdot   &  b_{k}^{d}
 82 | \end{bmatrix}_{d\times k}\cdot 
 83 | \begin{bmatrix}
 84 | \alpha_{1}^{1} &\alpha_{2}^{1}  & \cdot  & \cdot  & \cdot  & \alpha_{m}^{1}\\ 
 85 | \alpha_{1}^{2} &\alpha_{2}^{2}  & \cdot  & \cdot  & \cdot  & \alpha_{m}^{2}\\ 
 86 | \cdot  & \cdot  & \cdot  &  &  & \cdot \\ 
 87 | \cdot  &  \cdot &  & \cdot  &  &\cdot  \\ 
 88 |  \cdot & \cdot  &  &  & \cdot  & \cdot \\ 
 89 |  \alpha_{1}^{k}& \alpha_{2}^{k}  & \cdot  & \cdot  &\cdot   &  \alpha_{m}^{k}
 90 | \end{bmatrix}_{k\times m} \\
 91 | & =\begin{bmatrix}
 92 | \sum_{j=1}^{k}b_{j}^{1}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{1}\alpha _{2}^{j} & \cdot  & \cdot  & \cdot  & \sum_{j=1}^{k}b_{j}^{1}\alpha _{m}^{j}\\ 
 93 | \sum_{j=1}^{k}b_{j}^{2}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{2}\alpha _{2}^{j}  & \cdot  & \cdot  & \cdot  & \sum_{j=1}^{k}b_{j}^{2}\alpha _{m}^{j}\\ 
 94 | \cdot  & \cdot  & \cdot  &  &  & \cdot \\ 
 95 | \cdot  &  \cdot &  & \cdot  &  &\cdot  \\ 
 96 |  \cdot & \cdot  &  &  & \cdot  & \cdot \\ 
 97 | \sum_{j=1}^{k}b_{j}^{d}\alpha _{1}^{j}& \sum_{j=1}^{k}b_{j}^{d}\alpha _{2}^{j}  & \cdot  & \cdot  &\cdot   &  \sum_{j=1}^{k}b_{j}^{d}\alpha _{m}^{j}
 98 | \end{bmatrix}_{d\times m} &
 99 | \end{aligned}
100 | $$
101 | 然后计算$\boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j}$:
102 | $$
103 | \begin{aligned}
104 | \boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j}
105 | & =\begin{bmatrix}
106 | b_{1}^{j}\\ b_{w}^{j}
107 | \\ \cdot 
108 | \\ \cdot 
109 | \\ \cdot 
110 | \\ b_{d}^{j}
111 | \end{bmatrix}\cdot 
112 | \begin{bmatrix}
113 |  \alpha _{1}^{j}& \alpha _{2}^{j} & \cdot  & \cdot  & \cdot  & \alpha _{m}^{j}
114 | \end{bmatrix}\\
115 | & =\begin{bmatrix}
116 | b_{j}^{1}\alpha _{1}^{j} &b_{j}^{1}\alpha _{2}^{j} & \cdot  & \cdot  & \cdot  & b_{j}^{1}\alpha _{m}^{j}\\ 
117 | b_{j}^{2}\alpha _{1}^{j} &b_{j}^{2}\alpha _{2}^{j}  & \cdot  & \cdot  & \cdot  & b_{j}^{2}\alpha _{m}^{j}\\ 
118 | \cdot  & \cdot  & \cdot  &  &  & \cdot \\ 
119 | \cdot  &  \cdot &  & \cdot  &  &\cdot  \\ 
120 |  \cdot & \cdot  &  &  & \cdot  & \cdot \\ 
121 | b_{j}^{d}\alpha _{1}^{j}& b_{j}^{d}\alpha _{2}^{j}  & \cdot  & \cdot  &\cdot   &  b_{j}^{d}\alpha _{m}^{j}
122 | \end{bmatrix}_{d\times m} &
123 | \end{aligned}
124 | $$
125 | 求和可得：
126 | $$
127 | \begin{aligned}
128 | \sum_{j=1}^{k}\boldsymbol b_{\boldsymbol j}\boldsymbol \alpha ^{\boldsymbol j} 
129 | & = \sum_{j=1}^{k}\left (\begin{bmatrix}
130 | b_{1}^{j}\\ b_{w}^{j}
131 | \\ \cdot 
132 | \\ \cdot 
133 | \\ \cdot 
134 | \\ b_{d}^{j}
135 | \end{bmatrix}\cdot 
136 | \begin{bmatrix}
137 |  \alpha _{1}^{j}& \alpha _{2}^{j} & \cdot  & \cdot  & \cdot  & \alpha _{m}^{j}
138 | \end{bmatrix} \right )\\
139 | & =\begin{bmatrix}
140 | \sum_{j=1}^{k}b_{j}^{1}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{1}\alpha _{2}^{j} & \cdot  & \cdot  & \cdot  & \sum_{j=1}^{k}b_{j}^{1}\alpha _{m}^{j}\\ 
141 | \sum_{j=1}^{k}b_{j}^{2}\alpha _{1}^{j} &\sum_{j=1}^{k}b_{j}^{2}\alpha _{2}^{j}  & \cdot  & \cdot  & \cdot  & \sum_{j=1}^{k}b_{j}^{2}\alpha _{m}^{j}\\ 
142 | \cdot  & \cdot  & \cdot  &  &  & \cdot \\ 
143 | \cdot  &  \cdot &  & \cdot  &  &\cdot  \\ 
144 |  \cdot & \cdot  &  &  & \cdot  & \cdot \\ 
145 | \sum_{j=1}^{k}b_{j}^{d}\alpha _{1}^{j}& \sum_{j=1}^{k}b_{j}^{d}\alpha _{2}^{j}  & \cdot  & \cdot  &\cdot   &  \sum_{j=1}^{k}b_{j}^{d}\alpha _{m}^{j}
146 | \end{bmatrix}_{d\times m} &
147 | \end{aligned}
148 | $$
149 | 


--------------------------------------------------------------------------------
/docs/chapter3/chapter3.md:
--------------------------------------------------------------------------------
  1 | ## 3.7
  2 | 
  3 | $$ w=\cfrac{\sum_{i=1}^{m}y_i(x_i-\bar{x})}{\sum_{i=1}^{m}x_i^2-\cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2} $$
  4 | 
  5 | [推导]：令式（3.5）等于0：
  6 | $$ 0 = w\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i $$
  7 | $$ w\sum_{i=1}^{m}x_i^2 = \sum_{i=1}^{m}y_ix_i-\sum_{i=1}^{m}bx_i $$
  8 | 由于令式（3.6）等于0可得$ b=\cfrac{1}{m}\sum_{i=1}^{m}(y_i-wx_i) $，又$ \cfrac{1}{m}\sum_{i=1}^{m}y_i=\bar{y} $，$ \cfrac{1}{m}\sum_{i=1}^{m}x_i=\bar{x} $，则$ b=\bar{y}-w\bar{x} $，代入上式可得：
  9 | $$ 
 10 | \begin{aligned}	 
 11 |     w\sum_{i=1}^{m}x_i^2 & = \sum_{i=1}^{m}y_ix_i-\sum_{i=1}^{m}(\bar{y}-w\bar{x})x_i \\
 12 |     w\sum_{i=1}^{m}x_i^2 & = \sum_{i=1}^{m}y_ix_i-\bar{y}\sum_{i=1}^{m}x_i+w\bar{x}\sum_{i=1}^{m}x_i \\
 13 |     w(\sum_{i=1}^{m}x_i^2-\bar{x}\sum_{i=1}^{m}x_i) & = \sum_{i=1}^{m}y_ix_i-\bar{y}\sum_{i=1}^{m}x_i \\
 14 |     w & = \cfrac{\sum_{i=1}^{m}y_ix_i-\bar{y}\sum_{i=1}^{m}x_i}{\sum_{i=1}^{m}x_i^2-\bar{x}\sum_{i=1}^{m}x_i}
 15 | \end{aligned}
 16 | $$
 17 | 又$ \bar{y}\sum_{i=1}^{m}x_i=\cfrac{1}{m}\sum_{i=1}^{m}y_i\sum_{i=1}^{m}x_i=\bar{x}\sum_{i=1}^{m}y_i $，$ \bar{x}\sum_{i=1}^{m}x_i=\cfrac{1}{m}\sum_{i=1}^{m}x_i\sum_{i=1}^{m}x_i=\cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2 $，代入上式即可得式（3.7）：
 18 | $$ w=\cfrac{\sum_{i=1}^{m}y_i(x_i-\bar{x})}{\sum_{i=1}^{m}x_i^2-\cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2} $$
 19 | 
 20 | 【注】：式（3.7）还可以进一步化简为能用向量表达的形式，将$ \cfrac{1}{m}(\sum_{i=1}^{m}x_i)^2=\bar{x}\sum_{i=1}^{m}x_i $代入分母可得：
 21 | $$ 
 22 | \begin{aligned}	  
 23 |      w & = \cfrac{\sum_{i=1}^{m}y_i(x_i-\bar{x})}{\sum_{i=1}^{m}x_i^2-\bar{x}\sum_{i=1}^{m}x_i} \\
 24 |      & = \cfrac{\sum_{i=1}^{m}(y_ix_i-y_i\bar{x})}{\sum_{i=1}^{m}(x_i^2-x_i\bar{x})}
 25 | \end{aligned}
 26 | $$
 27 | 又$ \bar{y}\sum_{i=1}^{m}x_i=\bar{x}\sum_{i=1}^{m}y_i=\sum_{i=1}^{m}\bar{y}x_i=\sum_{i=1}^{m}\bar{x}y_i=m\bar{x}\bar{y}=\sum_{i=1}^{m}\bar{x}\bar{y} $，则上式可化为：
 28 | $$ 
 29 | \begin{aligned}
 30 |     w & = \cfrac{\sum_{i=1}^{m}(y_ix_i-y_i\bar{x}-x_i\bar{y}+\bar{x}\bar{y})}{\sum_{i=1}^{m}(x_i^2-x_i\bar{x}-x_i\bar{x}+\bar{x}^2)} \\
 31 |     & = \cfrac{\sum_{i=1}^{m}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{m}(x_i-\bar{x})^2} 
 32 | \end{aligned}
 33 | $$
 34 | 若令$ \boldsymbol{x}=(x_1,x_2,...,x_m) $，$ \boldsymbol{x}_{d} $为去均值后的$ \boldsymbol{x} $，$ \boldsymbol{y}=(y_1,y_2,...,y_m) $，$ \boldsymbol{y}_{d} $为去均值后的$ \boldsymbol{y} $，其中$ \boldsymbol{x} $、$ \boldsymbol{x}_{d} $、$ \boldsymbol{y} $、$ \boldsymbol{y}_{d} $均为m行1列的列向量，代入上式可得：
 35 | $$ w=\cfrac{\boldsymbol{y}_{d}^T\boldsymbol{x}_{d}}{\boldsymbol{x}_d^T\boldsymbol{x}_{d}}$$
 36 | ## 3.10
 37 | 
 38 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}=2\mathbf{X}^T(\mathbf{X}\hat{\boldsymbol w}-\boldsymbol{y}) $$
 39 | 
 40 | [推导]：将$ E_{\hat{\boldsymbol w}}=(\boldsymbol{y}-\boldsymbol{X}\hat{\boldsymbol w})^T(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol w}) $展开可得：
 41 | $$ E_{\hat{\boldsymbol w}}= \boldsymbol{y}^T\boldsymbol{y}-\boldsymbol{y}^T\mathbf{X}\hat{\boldsymbol w}-\hat{\boldsymbol w}^T\mathbf{X}^T\boldsymbol{y}+\hat{\boldsymbol w}^T\mathbf{X}^T\mathbf{X}\hat{\boldsymbol w} $$
 42 | 对$ \hat{\boldsymbol w} $求导可得：
 43 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}= \cfrac{\partial \boldsymbol{y}^T\boldsymbol{y}}{\partial \hat{\boldsymbol w}}-\cfrac{\partial \boldsymbol{y}^T\mathbf{X}\hat{\boldsymbol w}}{\partial \hat{\boldsymbol w}}-\cfrac{\partial \hat{\boldsymbol w}^T\mathbf{X}^T\boldsymbol{y}}{\partial \hat{\boldsymbol w}}+\cfrac{\partial \hat{\boldsymbol w}^T\mathbf{X}^T\mathbf{X}\hat{\boldsymbol w}}{\partial \hat{\boldsymbol w}} $$
 44 | 由向量的求导公式可得：
 45 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}= 0-\mathbf{X}^T\boldsymbol{y}-\mathbf{X}^T\boldsymbol{y}+(\mathbf{X}^T\mathbf{X}+\mathbf{X}^T\mathbf{X})\hat{\boldsymbol w} $$
 46 | $$ \cfrac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}=2\mathbf{X}^T(\mathbf{X}\hat{\boldsymbol w}-\boldsymbol{y}) $$
 47 | 
 48 | ## 3.27
 49 | 
 50 | $$ l(\beta)=\sum_{i=1}^{m}(-y_i\beta^T\hat{\boldsymbol x}_i+\ln(1+e^{\beta^T\hat{\boldsymbol x}_i})) $$
 51 | 
 52 | [推导]：将式（3.26）代入式（3.25）可得：
 53 | $$ l(\beta)=\sum_{i=1}^{m}\ln\left(y_ip_1(\hat{\boldsymbol x}_i;\beta)+(1-y_i)p_0(\hat{\boldsymbol x}_i;\beta)\right) $$
 54 | 其中$ p_1(\hat{\boldsymbol x}_i;\beta)=\cfrac{e^{\beta^T\hat{\boldsymbol x}_i}}{1+e^{\beta^T\hat{\boldsymbol x}_i}},p_0(\hat{\boldsymbol x}_i;\beta)=\cfrac{1}{1+e^{\beta^T\hat{\boldsymbol x}_i}} $，代入上式可得：
 55 | $$\begin{aligned} 
 56 | l(\beta)&=\sum_{i=1}^{m}\ln\left(\cfrac{y_ie^{\beta^T\hat{\boldsymbol x}_i}+1-y_i}{1+e^{\beta^T\hat{\boldsymbol x}_i}}\right) \\
 57 | &=\sum_{i=1}^{m}\left(\ln(y_ie^{\beta^T\hat{\boldsymbol x}_i}+1-y_i)-\ln(1+e^{\beta^T\hat{\boldsymbol x}_i})\right) 
 58 | \end{aligned}$$
 59 | 由于$ y_i $=0或1，则：
 60 | $$ l(\beta) =
 61 | \begin{cases} 
 62 | \sum_{i=1}^{m}(-\ln(1+e^{\beta^T\hat{\boldsymbol x}_i})),  & y_i=0 \\
 63 | \sum_{i=1}^{m}(\beta^T\hat{\boldsymbol x}_i-\ln(1+e^{\beta^T\hat{\boldsymbol x}_i})), & y_i=1
 64 | \end{cases} $$
 65 | 两式综合可得：
 66 | $$ l(\beta)=\sum_{i=1}^{m}\left(y_i\beta^T\hat{\boldsymbol x}_i-\ln(1+e^{\beta^T\hat{\boldsymbol x}_i})\right) $$
 67 | 由于此式仍为极大似然估计的似然函数，所以最大化似然函数等价于最小化似然函数的相反数，也即在似然函数前添加负号即可得式（3.27）。
 68 | 
 69 | 【注】：若式（3.26）中的似然项改写方式为$ p(y_i|\boldsymbol x_i;\boldsymbol w,b)=[p_1(\hat{\boldsymbol x}_i;\beta)]^{y_i}[p_0(\hat{\boldsymbol x}_i;\beta)]^{1-y_i} $，再将其代入式（3.25）可得：
 70 | $$ l(\beta)=\sum_{i=1}^{m}\left(y_i\ln(p_1(\hat{\boldsymbol x}_i;\beta))+(1-y_i)\ln(p_0(\hat{\boldsymbol x}_i;\beta))\right) $$
 71 | 此式显然更易推导出式（3.27）
 72 | 
 73 | ## 3.30
 74 | 
 75 | $$\frac{\partial l(\beta)}{\partial \beta}=-\sum_{i=1}^{m}\hat{\boldsymbol x}_i(y_i-p_1(\hat{\boldsymbol x}_i;\beta))$$
 76 | 
 77 | [解析]：此式可以进行向量化，令$p_1(\hat{\boldsymbol x}_i;\beta)=\hat{y}_i$，代入上式得：
 78 | $$\begin{aligned}
 79 | 	\frac{\partial l(\beta)}{\partial \beta} &= -\sum_{i=1}^{m}\hat{\boldsymbol x}_i(y_i-\hat{y}_i) \\
 80 | 	& =\sum_{i=1}^{m}\hat{\boldsymbol x}_i(\hat{y}_i-y_i) \\
 81 | 	& ={\boldsymbol X^T}(\hat{\boldsymbol y}-\boldsymbol{y}) \\
 82 | 	& ={\boldsymbol X^T}(p_1(\boldsymbol X;\beta)-\boldsymbol{y}) \\
 83 | \end{aligned}$$
 84 | 
 85 | ## 3.32
 86 | 
 87 | $$J=\cfrac{\boldsymbol w^T(\mu_0-\mu_1)(\mu_0-\mu_1)^T\boldsymbol w}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w}$$
 88 | 
 89 | [推导]：
 90 | $$\begin{aligned}
 91 | 	J &= \cfrac{\big|\big|\boldsymbol w^T\mu_0-\boldsymbol w^T\mu_1\big|\big|_2^2}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\
 92 | 	&= \cfrac{\big|\big|(\boldsymbol w^T\mu_0-\boldsymbol w^T\mu_1)^T\big|\big|_2^2}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\
 93 | 	&= \cfrac{\big|\big|(\mu_0-\mu_1)^T\boldsymbol w\big|\big|_2^2}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\
 94 | 	&= \cfrac{[(\mu_0-\mu_1)^T\boldsymbol w]^T(\mu_0-\mu_1)^T\boldsymbol w}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w} \\
 95 | 	&= \cfrac{\boldsymbol w^T(\mu_0-\mu_1)(\mu_0-\mu_1)^T\boldsymbol w}{\boldsymbol w^T(\Sigma_0+\Sigma_1)\boldsymbol w}
 96 | \end{aligned}$$
 97 | 
 98 | ## 3.37
 99 | 
100 | $$\boldsymbol S_b\boldsymbol w=\lambda\boldsymbol S_w\boldsymbol w$$
101 | 
102 | [推导]：由3.36可列拉格朗日函数：
103 | $$l(\boldsymbol w)=-\boldsymbol w^T\boldsymbol S_b\boldsymbol w+\lambda(\boldsymbol w^T\boldsymbol S_w\boldsymbol w-1)$$
104 | 对$\boldsymbol w$求偏导可得：
105 | $$\begin{aligned}
106 | \cfrac{\partial l(\boldsymbol w)}{\partial \boldsymbol w} &= -\cfrac{\partial(\boldsymbol w^T\boldsymbol S_b\boldsymbol w)}{\partial \boldsymbol w}+\lambda \cfrac{(\boldsymbol w^T\boldsymbol S_w\boldsymbol w-1)}{\partial \boldsymbol w} \\
107 | 	&= -(\boldsymbol S_b+\boldsymbol S_b^T)\boldsymbol w+\lambda(\boldsymbol S_w+\boldsymbol S_w^T)\boldsymbol w
108 | \end{aligned}$$
109 | 又$\boldsymbol S_b=\boldsymbol S_b^T,\boldsymbol S_w=\boldsymbol S_w^T$，则：
110 | $$\cfrac{\partial l(\boldsymbol w)}{\partial \boldsymbol w} = -2\boldsymbol S_b\boldsymbol w+2\lambda\boldsymbol S_w\boldsymbol w$$
111 | 令导函数等于0即可得式3.37。
112 | 
113 | ## 3.43
114 | 
115 | $$\begin{aligned}
116 | \boldsymbol S_b &= \boldsymbol S_t - \boldsymbol S_w \\
117 | &= \sum_{i=1}^N m_i(\boldsymbol\mu_i-\boldsymbol\mu)(\boldsymbol\mu_i-\boldsymbol\mu)^T
118 | \end{aligned}$$
119 | [推导]：由式3.40、3.41、3.42可得：
120 | $$\begin{aligned}
121 | \boldsymbol S_b &= \boldsymbol S_t - \boldsymbol S_w \\
122 | &= \sum_{i=1}^m(\boldsymbol x_i-\boldsymbol\mu)(\boldsymbol x_i-\boldsymbol\mu)^T-\sum_{i=1}^N\sum_{\boldsymbol x\in X_i}(\boldsymbol x-\boldsymbol\mu_i)(\boldsymbol x-\boldsymbol\mu_i)^T \\
123 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left((\boldsymbol x-\boldsymbol\mu)(\boldsymbol x-\boldsymbol\mu)^T-(\boldsymbol x-\boldsymbol\mu_i)(\boldsymbol x-\boldsymbol\mu_i)^T\right)\right) \\
124 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left((\boldsymbol x-\boldsymbol\mu)(\boldsymbol x^T-\boldsymbol\mu^T)-(\boldsymbol x-\boldsymbol\mu_i)(\boldsymbol x^T-\boldsymbol\mu_i^T)\right)\right) \\
125 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left(\boldsymbol x\boldsymbol x^T - \boldsymbol x\boldsymbol\mu^T-\boldsymbol\mu\boldsymbol x^T+\boldsymbol\mu\boldsymbol\mu^T-\boldsymbol x\boldsymbol x^T+\boldsymbol x\boldsymbol\mu_i^T+\boldsymbol\mu_i\boldsymbol x^T-\boldsymbol\mu_i\boldsymbol\mu_i^T\right)\right) \\
126 | &= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left(- \boldsymbol x\boldsymbol\mu^T-\boldsymbol\mu\boldsymbol x^T+\boldsymbol\mu\boldsymbol\mu^T+\boldsymbol x\boldsymbol\mu_i^T+\boldsymbol\mu_i\boldsymbol x^T-\boldsymbol\mu_i\boldsymbol\mu_i^T\right)\right) \\
127 | &= \sum_{i=1}^N\left(-\sum_{\boldsymbol x\in X_i}\boldsymbol x\boldsymbol\mu^T-\sum_{\boldsymbol x\in X_i}\boldsymbol\mu\boldsymbol x^T+\sum_{\boldsymbol x\in X_i}\boldsymbol\mu\boldsymbol\mu^T+\sum_{\boldsymbol x\in X_i}\boldsymbol x\boldsymbol\mu_i^T+\sum_{\boldsymbol x\in X_i}\boldsymbol\mu_i\boldsymbol x^T-\sum_{\boldsymbol x\in X_i}\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
128 | &= \sum_{i=1}^N\left(-m_i\boldsymbol\mu_i\boldsymbol\mu^T-m_i\boldsymbol\mu\boldsymbol\mu_i^T+m_i\boldsymbol\mu\boldsymbol\mu^T+m_i\boldsymbol\mu_i\boldsymbol\mu_i^T+m_i\boldsymbol\mu_i\boldsymbol\mu_i^T-m_i\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
129 | &= \sum_{i=1}^N\left(-m_i\boldsymbol\mu_i\boldsymbol\mu^T-m_i\boldsymbol\mu\boldsymbol\mu_i^T+m_i\boldsymbol\mu\boldsymbol\mu^T+m_i\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
130 | &= \sum_{i=1}^Nm_i\left(-\boldsymbol\mu_i\boldsymbol\mu^T-\boldsymbol\mu\boldsymbol\mu_i^T+\boldsymbol\mu\boldsymbol\mu^T+\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
131 | &= \sum_{i=1}^N m_i(\boldsymbol\mu_i-\boldsymbol\mu)(\boldsymbol\mu_i-\boldsymbol\mu)^T
132 | \end{aligned}$$
133 | 
134 | ## 3.44
135 | $$\max\limits_{\mathbf{W}}\cfrac{
136 | tr(\mathbf{W}^T\boldsymbol S_b \mathbf{W})}{tr(\mathbf{W}^T\boldsymbol S_w \mathbf{W})}$$
137 | [解析]：此式是式3.35的推广形式，证明如下：
138 | 设$\mathbf{W}=[\boldsymbol w_1,\boldsymbol w_2,...,\boldsymbol w_i,...,\boldsymbol w_{N-1}]$，其中$\boldsymbol w_i$为$d$行1列的列向量，则：
139 | $$\left\{
140 | \begin{aligned}
141 | tr(\mathbf{W}^T\boldsymbol S_b \mathbf{W})&=\sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_b \boldsymbol w_i \\
142 | tr(\mathbf{W}^T\boldsymbol S_w \mathbf{W})&=\sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_w \boldsymbol w_i
143 | \end{aligned}
144 | \right.$$
145 | 所以式3.44可变形为：
146 | $$\max\limits_{\mathbf{W}}\cfrac{
147 | \sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_b \boldsymbol w_i}{\sum_{i=1}^{N-1}\boldsymbol w_i^T\boldsymbol S_w \boldsymbol w_i}$$
148 | 对比式3.35易知上式即为式3.35的推广形式。
149 | 


--------------------------------------------------------------------------------
/docs/chapter9/chapter9.md:
--------------------------------------------------------------------------------
  1 | ## 9.33
  2 | 
  3 | $$
  4 | \sum_{j=1}^m \frac{\alpha_{i}\cdot p\left(\boldsymbol{x_{j}}|\boldsymbol\mu _{i},\boldsymbol\Sigma_{i}\right)}{\sum_{l=1}^k \alpha_{l}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{l},\boldsymbol\Sigma_{l})}(\boldsymbol{x_{j}-\mu_{i}})=0
  5 | $$
  6 | 
  7 | [推导]：根据公式(9.28)可知：
  8 | $$
  9 | p(\boldsymbol{x_{j}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i}})=\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}
 10 | $$
 11 | 
 12 | 
 13 | 又根据公式(9.32)，由
 14 | $$
 15 | \frac {\partial LL(D)}{\partial \boldsymbol\mu_{i}}=0
 16 | $$
 17 | 可得
 18 | $$\begin{aligned}
 19 | \frac {\partial LL(D)}{\partial\boldsymbol\mu_{i}}&=\frac {\partial}{\partial \boldsymbol\mu_{i}}\sum_{j=1}^mln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\
 20 | &=\sum_{j=1}^m\frac{\partial}{\partial\boldsymbol\mu_{i}}ln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\
 21 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol{\mu_{i}}}(p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}}))}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})} \\
 22 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot \frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}\frac{\partial}{\partial \boldsymbol\mu_{i}}\left(-\frac{1}{2}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
 23 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(\left(\boldsymbol\Sigma_{i}^{-1}+\left(\boldsymbol\Sigma_{i}^{-1}\right)^T\right)\cdot\left(\boldsymbol{x_{j}-\mu_{i}}\right)\cdot(-1)\right) \\
 24 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)-\left(\boldsymbol\Sigma_{i}^{-1}\right)^T\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
 25 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)-\left(\boldsymbol\Sigma_{i}^T\right)^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
 26 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
 27 | &=\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}(-\frac{1}{2})\left(-2\cdot\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
 28 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot p\left(\boldsymbol{x_{j}}|\boldsymbol\mu _{i},\boldsymbol\Sigma_{i}\right)}{\sum_{l=1}^k \alpha_{l}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{l},\boldsymbol\Sigma_{l})}\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}}) \\
 29 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot p\left(\boldsymbol{x_{j}}|\boldsymbol\mu _{i},\boldsymbol\Sigma_{i}\right)}{\sum_{l=1}^k \alpha_{l}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{l},\boldsymbol\Sigma_{l})}(\boldsymbol{x_{j}-\mu_{i}})=0
 30 | \end{aligned}$$
 31 | 
 32 | ## 9.35
 33 | 
 34 | $$
 35 | \boldsymbol\Sigma_{i}=\frac{\sum_{j=1}^m\gamma_{ji}(\boldsymbol{x_{j}-\mu_{i}})(\boldsymbol{x_{j}-\mu_{i}})^T}{\sum_{j=1}^m}\gamma_{ji}
 36 | $$
 37 | 
 38 | [推导]：根据公式(9.28)可知：
 39 | $$
 40 | p(\boldsymbol{x_{j}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i}})=\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}
 41 | $$
 42 | 又根据公式(9.32)，由
 43 | $$
 44 | \frac {\partial LL(D)}{\partial \boldsymbol\Sigma_{i}}=0
 45 | $$
 46 | 可得
 47 | $$\begin{aligned}
 48 | \frac {\partial LL(D)}{\partial\boldsymbol\Sigma_{i}}&=\frac {\partial}{\partial \boldsymbol\Sigma_{i}}\sum_{j=1}^mln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\
 49 | &=\sum_{j=1}^m\frac{\partial}{\partial\boldsymbol\Sigma_{i}}ln\Bigg(\sum_{i=1}^k \alpha_{i}\cdot p(\boldsymbol{x_{j}}|\boldsymbol\mu_{i},\boldsymbol\Sigma_{i})\Bigg) \\
 50 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j},|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})} \\
 51 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j},|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\\
 52 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}e^{ln\left(\frac{1}{(2\pi)^\frac{n}{2}\left| \boldsymbol\Sigma_{i}\right |^\frac{1}{2}}e^{-\frac{1}{2}(\boldsymbol{x_{j}}-\boldsymbol\mu_{i})^T\boldsymbol\Sigma_{i}^{-1}(\boldsymbol{x_{j}-\mu_{i}})}\right)}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j},|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})} \\
 53 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot \frac{\partial}{\partial\boldsymbol\Sigma_{i}}e^{-\frac{n}{2}ln\left(2\pi\right)-\frac{1}{2}ln\left(|\boldsymbol\Sigma_{i}|\right)-\frac{1}{2}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)}}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j},|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})} \\
 54 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j},|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(-\frac{n}{2}ln\left(2\pi\right)-\frac{1}{2}ln\left(|\boldsymbol\Sigma_{i}|\right)-\frac{1}{2}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right) \\
 55 | &=\sum_{j=1}^m \frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j},|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\left(-\frac{1}{2}\left(\boldsymbol\Sigma_{i}^{-1}\right)^T-\frac{1}{2}\frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right)
 56 | \end{aligned}$$
 57 | 
 58 | 为求得
 59 | $$
 60 | \frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)
 61 | $$
 62 | 
 63 | 首先分析对$\boldsymbol \Sigma_{i}$中单一元素的求导，用$r$代表矩阵$\boldsymbol\Sigma_{i}$的行索引，$c$代表矩阵$\boldsymbol\Sigma_{i}$的列索引，则
 64 | $$\begin{aligned}
 65 | \frac{\partial}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)&=\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\frac{\partial\boldsymbol\Sigma_{i}^{-1}}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right) \\
 66 | &=-\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)
 67 | \end{aligned}$$
 68 | 设$B=\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)$，则
 69 | $$\begin{aligned}
 70 | B^T&=\left(\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right)^T \\
 71 | &=\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\left(\boldsymbol\Sigma_{i}^{-1}\right)^T \\
 72 | &=\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}
 73 | \end{aligned}$$
 74 | 所以
 75 | $$\begin{aligned}
 76 | \frac{\partial}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)=-B^T\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}B\end{aligned}$$
 77 | 其中$B$为$n\times1$阶矩阵，$\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}$为$n$阶方阵，且$\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}$仅在$\left(r,c\right)$位置处的元素值为1，其它位置处的元素值均为$0$，所以
 78 | $$\begin{aligned}
 79 | \frac{\partial}{\partial\Sigma_{i_{rc}}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)=-B^T\frac{\partial\boldsymbol\Sigma_{i}}{\partial\Sigma_{i_{rc}}}B=-B_{r}\cdot B_{c}=-\left(B\cdot B^T\right)_{rc}=\left(-B\cdot B^T\right)_{rc}\end{aligned}$$
 80 | 即对$\boldsymbol\Sigma_{i}$中特定位置的元素的求导结果对应于$\left(-B\cdot B^T\right)$中相同位置的元素值，所以
 81 | $$\begin{aligned}
 82 | \frac{\partial}{\partial\boldsymbol\Sigma_{i}}\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)&=-B\cdot B^T\\
 83 | &=-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\left(\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\right)^T\\
 84 | &=-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}
 85 | \end{aligned}$$
 86 | 
 87 | 因此最终结果为
 88 | $$
 89 | \frac {\partial LL(D)}{\partial \boldsymbol\Sigma_{i}}=\sum_{j=1}^m \frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol \mu_{i},\boldsymbol\Sigma_{i})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j},|\boldsymbol \mu_{l},\boldsymbol\Sigma_{l})}\left( -\frac{1}{2}\left(\boldsymbol\Sigma_{i}^{-1}-\boldsymbol\Sigma_{i}^{-1}\left(\boldsymbol{x_{j}-\mu_{i}}\right)\left(\boldsymbol{x_{j}-\mu_{i}}\right)^T\boldsymbol\Sigma_{i}^{-1}\right)\right)=0
 90 | $$
 91 | 
 92 | 整理可得
 93 | $$
 94 | \boldsymbol\Sigma_{i}=\frac{\sum_{j=1}^m\gamma_{ji}(\boldsymbol{x_{j}-\mu_{i}})(\boldsymbol{x_{j}-\mu_{i}})^T}{\sum_{j=1}^m}\gamma_{ji}
 95 | $$
 96 | 
 97 | ## 9.38
 98 | 
 99 | $$
100 | \alpha_{i}=\frac{1}{m}\sum_{j=1}^m\gamma_{ji}
101 | $$
102 | 
103 | [推导]：基于公式(9.37)进行恒等变形：
104 | $$
105 | \sum_{j=1}^m\frac{p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\lambda=0
106 | $$
107 | 
108 | $$
109 | \Rightarrow\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\alpha_{i}\lambda=0
110 | $$
111 | 
112 | 对所有混合成分进行求和：
113 | $$
114 | \Rightarrow\sum_{i=1}^k\left(\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\alpha_{i}\lambda\right)=0
115 | $$
116 | 
117 | $$
118 | \Rightarrow\sum_{i=1}^k\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\sum_{i=1}^k\alpha_{i}\lambda=0
119 | $$
120 | 
121 | $$
122 | \Rightarrow\lambda=-\sum_{i=1}^k\sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}=-m
123 | $$
124 | 
125 | 又
126 | $$
127 | \sum_{j=1}^m\frac{\alpha_{i}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{i},\Sigma_{i}})}{\sum_{l=1}^k\alpha_{l}\cdot p(\boldsymbol x_{j}|\boldsymbol{\mu_{l},\Sigma_{l}})}+\alpha_{i}\lambda=0
128 | $$
129 | 
130 | $$
131 | \Rightarrow\sum_{j=1}^m\gamma_{ji}+\alpha_{i}\lambda=0
132 | $$
133 | 
134 | $$
135 | \Rightarrow\alpha_{i}=-\frac{\sum_{j=1}^m\gamma_{ji}}{\lambda}=\frac{1}{m}\sum_{j=1}^m\gamma_{ji}
136 | $$
137 | 
138 | 
139 | 
140 | ## 附录
141 | 参考公式
142 | $$
143 | \frac{\partial\boldsymbol x^TB\boldsymbol x}{\partial\boldsymbol x}=\left(B+B^T\right)\boldsymbol x 
144 | $$
145 | $$
146 | \frac{\partial}{\partial A}ln|A|=\left(A^{-1}\right)^T
147 | $$
148 | $$
149 | \frac{\partial}{\partial x}\left(A^{-1}\right)=-A^{-1}\frac{\partial A}{\partial x}A^{-1}
150 | $$
151 | 参考资料<br>
152 | Petersen, K. B. & Pedersen, M. S. *The Matrix Cookbook*.<br>
153 | Bishop, C. M. (2006). *Pattern Recognition and Machine Learning*. Springer.
154 | 
155 | 
156 | 


--------------------------------------------------------------------------------
/docs/chapter13/chapter13.md:
--------------------------------------------------------------------------------
  1 | ## 13.1
  2 | 
  3 | $$p(\boldsymbol{x})=\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)$$
  4 | [解析]： 该式即为 9.4.3 节的式(9.29)，式(9.29)中的$k$个混合成分对应于此处的$N$个可能的类别
  5 | 
  6 | ## 13.2
  7 | $$
  8 | \begin{aligned} f(\boldsymbol{x}) &=\underset{j \in \mathcal{Y}}{\arg \max } p(y=j | \boldsymbol{x}) \\ &=\underset{j \in \mathcal{Y}}{\arg \max } \sum_{i=1}^{N} p(y=j, \Theta=i | \boldsymbol{x}) \\ &=\underset{j \in \mathcal{Y}}{\arg \max } \sum_{i=1}^{N} p(y=j | \Theta=i, \boldsymbol{x}) \cdot p(\Theta=i | \boldsymbol{x}) \end{aligned}
  9 | $$
 10 | [解析]：
 11 | 首先，该式的变量$\theta \in \{1,2,...,N\}$即为 9.4.3 节的式(9.30)中的 $\ z_j\in\{1,2,...k\}$
 12 | 从公式第 1 行到第 2 行是做了边际化（marginalization）；具体来说第 2 行比第 1 行多了$\theta$为了消掉$\theta$对其进行求和（若是连续变量则为积分）$\sum_{i=1}^N$
 13 | [推导]：从公式第 2 行到第 3 行推导如下
 14 | $$\begin{aligned} p(y = j,\theta = i \vert x) &= \cfrac {p(y=j, \theta=i,x)} {p(x)} \\
 15 | &=\cfrac{p(y=j ,\theta=i,x)}{p(\theta=i,x)}\cdot \cfrac{p(\theta=i,x)}{p(x)} \\
 16 | &=p(y=j\vert \theta=i,x)\cdot p(\theta=i\vert x)\end{aligned}$$
 17 | [解析]：
 18 | 其中$p(y=j\vert x)$表示$x$的类别$y$为第$j$个类别标记的后验概率（注意条件是已知$x$）； 
 19 | $p(y=j,\theta=i\vert x)$表示$x$的类别$y$为第$j$个类别标记且由第$i$个高斯混合成分生成的后验概率（注意条件是已知$x$ ）； 
 20 | $p(y=j,\theta=i,x)$表示第$i$个高斯混合成分生成的$x$其类别$y$为第$j$个类别标记的概率（注意条件是已知$\theta$和$x$，这里修改了西瓜书式(13.3)下方对$p(y=j\vert\theta=i,x)$的表述；
 21 | $p(\theta=i \vert x)$表示$x$由第$i$个高斯混合成分生成的后验概率（注意条件是已知$x$）；
 22 | 西瓜书第 296 页第 2 行提到“假设样本由高斯混合模型生成，且每个类别对应一个高斯混合成分”，也就是说，如果已知$x$是由哪个高斯混合成分生成的，也就知道了其类别。而$p(y=j,\theta=i\vert x)$表示已知$\theta$和$x$ 的条件概率（其实已知$\theta$就足够，不需$x$的信息），因此
 23 | $$p(y=j\vert \theta=i,x)=
 24 | \begin{cases}
 25 | 1,&i=j \\
 26 | 0,&i\not=j
 27 | \end{cases}$$
 28 | ## 13.3
 29 | $$
 30 | p(\Theta=i | \boldsymbol{x})=\frac{\alpha_{i} \cdot p\left(\boldsymbol{x} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)}{\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)}
 31 | $$
 32 | [解析]：该式即为 9.4.3 节的式(9.30)，具体推导参见有关式(9.30)的解释。
 33 | ## 13.4
 34 | $$
 35 | \begin{aligned} L L\left(D_{l} \cup D_{u}\right)=& \sum_{\left(x_{j}, y_{j}\right) \in D_{l}} \ln \left(\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right) \cdot p\left(y_{j} | \Theta=i, \boldsymbol{x}_{j}\right)\right) \\ &+\sum_{x_{j} \in D_{u}} \ln \left(\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)\right) \end{aligned}
 36 | $$
 37 | [解析]：由式(13.2)对概率$p(y=j\vert\theta =i,x)=$的分析，式中第 1 项中的$p(y_j\vert\theta =i,x_j)$ 为
 38 | $$p(y_j\vert \theta=i,x_j)=
 39 | \begin{cases}
 40 | 1,&y_i=i \\
 41 | 0,&y_i\not=i
 42 | \end{cases}$$
 43 | 该式第 1 项针对有标记样本$(x_i,y_i) \in D_i$来说，因为有标记样本的类别是确定的，因此在计算它的对数似然时，它只可能来自$N$个高斯混合成分中的一个（西瓜书第 296 页第 2 行提到“假设样本由高斯混合模型生成，且每个类别对应一个高斯混合成分”），所以计算第 1 项计算有标记样本似然时乘以了$p(y_j\vert\theta =i,x_j)$ ；
 44 | 该式第 2 项针对未标记样本$x_j\in D_u$；来说的，因为未标记样本的类别不确定，即它可能来自$N$个高斯混合成分中的任何一个，所以第 1 项使用了式(13.1)。
 45 | ## 13.5
 46 | $$
 47 | \gamma_{j i}=\frac{\alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)}{\sum_{i=1}^{N} \alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)}
 48 | $$
 49 | [解析]：该式与式(13.3)相同，即后验概率。 可通过有标记数据对模型参数$(\alpha_i,\mu_i,\Sigma_i)$进行初始化，具体来说：
 50 | $$\alpha_i = \cfrac{l_i}{|D_l|},where |D_l| = \sum_{i=1}^N l_i$$
 51 | $$\mu_i = \cfrac{1}{l_i}\sum_{(x_j,y_j) \in D_l\wedge y_i=i}(x_j-\mu_j)(x_j-\mu_j)^T$$
 52 | $$
 53 | \Sigma_{i}=\frac{1}{l_{i}} \sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left( x_{j}- \mu_{i}\right)\left( x_{j}-\mu_{i}\right)^{\top}
 54 | $$
 55 | 其中$l_i$表示第$i$类样本的有标记样本数目，$|D_l|$为有标记样本集样本总数，$\wedge$为“逻辑与”。
 56 | ## 13.6
 57 | $$
 58 | \boldsymbol{\mu}_{i}=\frac{1}{\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}}\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \boldsymbol{x}_{j}+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \boldsymbol{x}_{j}\right)
 59 | $$
 60 | [推导]：类似于式(9.34)该式由$\cfrac{\partial LL(D_l \cup D_u) }{\partial \mu_i}=0$而得，将式(13.4)的两项分别记为：
 61 | $$LL(D_l)=\sum_{(x_j,y_j \in D_l)}ln(\sum_{s=1}^{N}\alpha_s \cdot p(x_j \vert \mu_s,\Sigma_s) \cdot p(y_i|\theta = s,x_j)$$
 62 | $$LL(D_u)=\sum_{x_j \in D_u} ln(\sum_{s=1}^N \alpha_s \cdot p(x_j | \mu_s,\Sigma_s))$$
 63 | 对于式(13.4)中的第 1 项$LL(D_l)$，由于$p(y_j\vert \theta=i,x_j)$取值非1即0（详见13.2,13.4分析），因此
 64 | $$LL(D_l)=\sum_{(x_j,y_j)\in D_l} ln(\alpha_{y_j} \cdot p(x_j|\mu_{y_j}, \Sigma_{y_j}))$$
 65 | 若求$LL(D_l)$对$\mu_i$的偏导，则$LL(D_l)$求和号中只有$y_j=i$ 的项能留下来，即
 66 | 
 67 | $$\begin{aligned}
 68 | \cfrac{\partial LL(D_l) }{\partial \mu_i} &=
 69 | \sum_{(x_i,y_i)\in D_l \wedge y_j=i} \cfrac{\partial ln(\alpha_i \cdot p(x_j| \mu_i,\Sigma_i))}{\partial\mu_i}\\
 70 | &=\sum_{(x_i,y_i)\in D_l \wedge y_j=i}\cfrac{1}{p(x_j|\mu_i,\Sigma_i) }\cdot  \cfrac{\partial p(x_j|\mu_i,\Sigma_i)}{\partial\mu_i}\\
 71 | &=\sum_{(x_i,y_i)\in D_l \wedge y_j=i}\cfrac{1}{p(x_j|\mu_i,\Sigma_i) }\cdot  p(x_j|\mu_i,\Sigma_i)  \cdot \Sigma_i^{-1}(x_j-\mu_i)\\
 72 | &=\sum_{x_j \in D_u } \Sigma_i^{-1}(x_j-\mu_i)
 73 | \end{aligned}$$
 74 | 
 75 | 对于式(13.4)中的第 2 项$LL(D_u)$，求导结果与式(9.33)的推导过程一样
 76 | $$\cfrac{\partial LL(D_l \cup D_u) }{\partial \mu_i}=\sum_{x_j \in {D_u}} \cfrac{\alpha_i}{\sum_{s=1}^N \alpha_s \cdotp(x_j|\mu_s,\Sigma_s)} \cdot p(x_j|\mu_i,\Sigma_i )\cdot \Sigma_i^{-1}(x_j-\mu_i)$$
 77 | $$=\sum_{x_j \in D_u }\gamma_{ji} \cdot  \Sigma_i^{-1}(x_j-\mu_i)$$
 78 | 综合两项结果，则$\cfrac{\partial LL(D_l \cup D_u) }{\partial \mu_i}$为
 79 | $$
 80 | \begin{aligned} \frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \mu_{i}} &=\sum_{\left(x_{j}, y_{j}\right) \in D_{t} \wedge y_{j}=i} \Sigma_{i}^{-1}\left(x_{j}-\mu_{i}\right)+\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot \Sigma_{i}^{-1}\left(x_{j}-\mu_{i}\right) \\ &=\Sigma_{i}^{-1}\left(\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(x_{j}-\mu_{i}\right)+\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot\left(x_{j}-\mu_{i}\right)\right) \\ &=\Sigma_{i}^{-1}\left(\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} x_{j}+\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot x_{j}-\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \mu_{i}-\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot \mu_{i}\right) \end{aligned}
 81 | $$
 82 | 令$\frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \boldsymbol{\mu}_{i}}=0$，两边同时左乘$\Sigma_i$可将$\Sigma_i^{-1}$消掉，移项即得
 83 | $$
 84 | \sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot \mu_{i}+\sum_{\left(x_{j}, y_{j}\right) \in D_{t} \wedge y_{j}=i} \mu_{i}=\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot x_{j}+\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} x_{j}
 85 | $$
 86 | 上式中， 可以作为常量提到求和号外面，而$\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} 1=l_{i}$，即第 类样本的有标记 样本数目，因此
 87 | $$
 88 | \left(\sum_{x_{j} \in D_{u}} \gamma_{j i}+\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \backslash y_{j}=i} 1\right) \mu_{i}=\sum_{x_{j} \in D_{u}} \gamma_{j i} \cdot x_{j}+\sum_{\left(x_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} x_{j}
 89 | $$
 90 | 即得式(13.6)；
 91 | ## 13.7
 92 | $$
 93 | \begin{aligned} \boldsymbol{\Sigma}_{i}=& \frac{1}{\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}}\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}}\right.\\+& \sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\mathrm{T}} ) \end{aligned}
 94 | $$
 95 | [推导]：类似于13.6 由$\cfrac{\partial LL(D_l \cup D_u) }{\partial \Sigma_i}=0$得，化简过程同13.6过程类似
 96 | 对于式(13.4)中的第 1 项$LL(D_l)$ ，类似于刚才式(13.6)的推导过程;
 97 | $$
 98 | \begin{aligned} \frac{\partial L L\left(D_{l}\right)}{\partial \boldsymbol{\Sigma}_{i}} &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \frac{\partial \ln \left(\alpha_{i} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)\right)}{\partial \boldsymbol{\Sigma}_{i}} \\ &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \frac{1}{p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)} \cdot \frac{\partial p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right)}{\partial \boldsymbol{\Sigma}_{i}} \\
 99 | &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \frac{1}{p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \mathbf{\Sigma}_{i}\right)} \cdot p\left(\boldsymbol{x}_{j} | \boldsymbol{\mu}_{i}, \boldsymbol{\Sigma}_{i}\right) \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1}\\
100 | &=\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\mathbf{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1}
101 | \end{aligned}
102 | $$
103 | 对于式(13.4)中的第 2 项$LL(D_u)$ ，求导结果与式(9.35)的推导过程一样；
104 | $$
105 | \frac{\partial L L\left(D_{u}\right)}{\partial \boldsymbol{\Sigma}_{i}}=\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1}
106 | $$
107 | 综合两项结果，则$\cfrac{\partial LL(D_l \cup D_u) }{\partial \Sigma_i}$为
108 | $$\begin{aligned} \frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \boldsymbol{\mu}_{i}}=& \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1} \\ &+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1} \\
109 | &=\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right)\right.\\ &+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}-\boldsymbol{I}\right) ) \cdot \frac{1}{2} \boldsymbol{\Sigma}_{i}^{-1}
110 | \end{aligned}
111 | $$
112 | 令$\frac{\partial L L\left(D_{l} \cup D_{u}\right)}{\partial \boldsymbol{\Sigma}_{i}}=0$，两边同时右乘$2\Sigma_i$可将 $\cfrac{1}{2}\Sigma_i^{-1}$消掉，移项即得
113 | $$
114 | \begin{aligned} \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}+& \sum_{\left(\boldsymbol{x}_{j}, y_{j} \in D_{l} \wedge y_{j}=i\right.} \boldsymbol{\Sigma}_{i}^{-1}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top} \\=& \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot \boldsymbol{I}+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i} \boldsymbol{I} \\ &=\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}\right) \boldsymbol{I} \end{aligned}
115 | $$
116 | 两边同时左乘以$\Sigma_i$，上式变为
117 | $$
118 | \sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i} \cdot\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}+\sum_{\left(\boldsymbol{x}_{j}, y_{j}\right) \in D_{l} \wedge y_{j}=i}\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)\left(\boldsymbol{x}_{j}-\boldsymbol{\mu}_{i}\right)^{\top}=\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}\right) \boldsymbol{\Sigma}_{i}
119 | $$
120 | 即得式(13.7)；
121 | ## 13.8
122 | $$
123 | \alpha_{i}=\frac{1}{m}\left(\sum_{\boldsymbol{x}_{j} \in D_{u}} \gamma_{j i}+l_{i}\right)
124 | $$
125 | [推导]：类似于式(9.36)，写出$LL(D_l \cup D_u)$的拉格朗日形式
126 | $$\begin{aligned}
127 | \mathcal{L}(D_l \cup D_u,\lambda) &= LL(D_l \cup D_u)+\lambda(\sum_{s=1}^N \alpha_s -1)\\
128 | & =LL(D_l)+LL(D_u)+\lambda(\sum_{s=1}^N \alpha_s - 1)\\
129 | \end{aligned}$$
130 | 类似于式(9.37)，对$\alpha_i$求偏导。对于LL(D_u)，求导结果与式(9.37)的推导过程一样：
131 | $$\cfrac{\partial LL(D_u)}{\partial\alpha_i} = \sum_{x_j \in D_u} \cfrac{1}{\Sigma_{s=1}^N \alpha_s \cdot p(x_j|\mu_s,\Sigma_s)} \cdot p(x_j|\mu_i,\Sigma_i)$$
132 | 对于$LL(D_l)$，类似于类似于(13.6)和(13.7)的推导过程
133 | $$\begin{aligned}
134 | \cfrac{\partial LL(D_l)}{\partial\alpha_i} &= \sum_{(x_i,y_i)\in D_l \wedge y_j=i} \cfrac{\partial ln(\alpha_i \cdot p(x_j| \mu_i,\Sigma_i))}{\partial\alpha_i}\\
135 | &=\sum_{(x_i,y_i)\in D_l \wedge y_j=i}\cfrac{1}{ \alpha_i \cdot p(x_j|\mu_i,\Sigma_i) }\cdot  \cfrac{\partial (\alpha_i \cdot  p(x_j|\mu_i,\Sigma_i))}{\partial \alpha_i}\\
136 | &=\sum_{(x_i,y_i)\in D_l \wedge y_j=i}\cfrac{1}{\alpha_i \cdot p(x_j|\mu_i,\Sigma_i) }\cdot  p(x_j|\mu_i,\Sigma_i)  \\
137 | &=\cfrac{1}{\alpha_i} \cdot \sum_{(x_i,y_i)\in D_l \wedge y_j=i} 1 \\
138 | &=\cfrac{l_i}{\alpha_i}
139 | \end{aligned}$$
140 | 上式推导过程中，重点注意变量是$\alpha_i$ ，$p(x_j|\mu_i,\Sigma_i)$是常量；最后一行$\alpha_i$相对于求和变量为常量，因此作为公因子提到求和号外面； 为第$i$类样本的有标记样本数目。
141 | 综合两项结果，则$\cfrac{\partial LL(D_l \cup D_u) }{\partial \alpha_i}$为
142 | $$\cfrac{\partial LL(D_l \cup D_u) }{\partial \mu_i} = \cfrac{l_i}{\alpha_i} + \sum_{x_j \in D_u} \cfrac{p(x_j|\mu_i,\Sigma_i)}{\Sigma_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}+\lambda$$
143 | 令$\cfrac{\partial LL(D_l \cup D_u) }{\partial \alpha_i}=0$并且两边同乘以$\alpha_i$,得
144 | $$ \alpha_i \cdot \cfrac{l_i}{\alpha_i} + \sum_{x_j \in D_u} \cfrac{\alpha_i \cdot  p(x_j|\mu_i,\Sigma_i)}{\Sigma_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}+\lambda \cdot \alpha_i=0$$
145 | 结合式(9.30)发现，求和号内即为后验概率$\gamma_{ji}$,即
146 | $$l_i+\sum_{x_i \in D_u} \gamma_{ji}+\lambda \alpha_i = 0$$
147 | 对所有混合成分求和，得
148 | $$\sum_{i=1}^N l_i+\sum_{i=1}^N  \sum_{x_i \in D_u} \gamma_{ji}+\sum_{i=1}^N \lambda \alpha_i = 0$$
149 | 这里$\Sigma_{i=1}^N \alpha_i =1$ ，因此$\sum_{i=1}^N \lambda \alpha_i=\lambda\sum_{i=1}^N \alpha_i=\lambda$
150 | 根据（9.30）中$\gamma_{ji}$表达式可知
151 | $$\sum_{i=1}^N \gamma_{ji} =  \sum_{i =1}^{N} \cfrac{\alpha_i \cdot  p(x_j|\mu_i,\Sigma_i)}{\Sigma_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}=  \cfrac{\sum_{i =1}^{N}\alpha_i \cdot  p(x_j|\mu_i,\Sigma_i)}{\sum_{s=1}^N \alpha_s \cdot p(x_j| \mu_s, \Sigma_s)}=1$$
152 | 再结合加法满足交换律，所以
153 | $$\sum_{i=1}^N  \sum_{x_i \in D_u} \gamma_{ji}=\sum_{x_i \in D_u} \sum_{i=1}^N  \gamma_{ji} =\sum_{x_i \in D_u} 1=u$$
154 | 以上分析过程中，$\sum_{x_j\in D_u}$ 形式与$\sum_{j=1}^u$等价，其中u为未标记样本集的样本个数； $\sum_{i=1}^Nl_i=l$其中$l$为有标记样本集的样本个数；将这些结果代入
155 | $$\sum_{i=1}^N l_i+\sum_{i=1}^N  \sum_{x_i \in D_u} \gamma_{ji}+\sum_{i=1}^N \lambda \alpha_i = 0$$
156 | 解出$l+u+\lambda = 0$  且$l+u =m$ 其中$m$为样本总个数，移项即得$\lambda = -m$
157 | 最后带入整理解得
158 | $$l_i + \Sigma_{X_j \in{D_u}} \gamma_{ji}-m \alpha_i = 0$$
159 | 整理即得式(13.8)；
160 | 
161 | 
162 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 |                     GNU GENERAL PUBLIC LICENSE
  2 |                        Version 3, 29 June 2007
  3 | 
  4 |  Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
  5 |  Everyone is permitted to copy and distribute verbatim copies
  6 |  of this license document, but changing it is not allowed.
  7 | 
  8 |                             Preamble
  9 | 
 10 |   The GNU General Public License is a free, copyleft license for
 11 | software and other kinds of works.
 12 | 
 13 |   The licenses for most software and other practical works are designed
 14 | to take away your freedom to share and change the works.  By contrast,
 15 | the GNU General Public License is intended to guarantee your freedom to
 16 | share and change all versions of a program--to make sure it remains free
 17 | software for all its users.  We, the Free Software Foundation, use the
 18 | GNU General Public License for most of our software; it applies also to
 19 | any other work released this way by its authors.  You can apply it to
 20 | your programs, too.
 21 | 
 22 |   When we speak of free software, we are referring to freedom, not
 23 | price.  Our General Public Licenses are designed to make sure that you
 24 | have the freedom to distribute copies of free software (and charge for
 25 | them if you wish), that you receive source code or can get it if you
 26 | want it, that you can change the software or use pieces of it in new
 27 | free programs, and that you know you can do these things.
 28 | 
 29 |   To protect your rights, we need to prevent others from denying you
 30 | these rights or asking you to surrender the rights.  Therefore, you have
 31 | certain responsibilities if you distribute copies of the software, or if
 32 | you modify it: responsibilities to respect the freedom of others.
 33 | 
 34 |   For example, if you distribute copies of such a program, whether
 35 | gratis or for a fee, you must pass on to the recipients the same
 36 | freedoms that you received.  You must make sure that they, too, receive
 37 | or can get the source code.  And you must show them these terms so they
 38 | know their rights.
 39 | 
 40 |   Developers that use the GNU GPL protect your rights with two steps:
 41 | (1) assert copyright on the software, and (2) offer you this License
 42 | giving you legal permission to copy, distribute and/or modify it.
 43 | 
 44 |   For the developers' and authors' protection, the GPL clearly explains
 45 | that there is no warranty for this free software.  For both users' and
 46 | authors' sake, the GPL requires that modified versions be marked as
 47 | changed, so that their problems will not be attributed erroneously to
 48 | authors of previous versions.
 49 | 
 50 |   Some devices are designed to deny users access to install or run
 51 | modified versions of the software inside them, although the manufacturer
 52 | can do so.  This is fundamentally incompatible with the aim of
 53 | protecting users' freedom to change the software.  The systematic
 54 | pattern of such abuse occurs in the area of products for individuals to
 55 | use, which is precisely where it is most unacceptable.  Therefore, we
 56 | have designed this version of the GPL to prohibit the practice for those
 57 | products.  If such problems arise substantially in other domains, we
 58 | stand ready to extend this provision to those domains in future versions
 59 | of the GPL, as needed to protect the freedom of users.
 60 | 
 61 |   Finally, every program is threatened constantly by software patents.
 62 | States should not allow patents to restrict development and use of
 63 | software on general-purpose computers, but in those that do, we wish to
 64 | avoid the special danger that patents applied to a free program could
 65 | make it effectively proprietary.  To prevent this, the GPL assures that
 66 | patents cannot be used to render the program non-free.
 67 | 
 68 |   The precise terms and conditions for copying, distribution and
 69 | modification follow.
 70 | 
 71 |                        TERMS AND CONDITIONS
 72 | 
 73 |   0. Definitions.
 74 | 
 75 |   "This License" refers to version 3 of the GNU General Public License.
 76 | 
 77 |   "Copyright" also means copyright-like laws that apply to other kinds of
 78 | works, such as semiconductor masks.
 79 | 
 80 |   "The Program" refers to any copyrightable work licensed under this
 81 | License.  Each licensee is addressed as "you".  "Licensees" and
 82 | "recipients" may be individuals or organizations.
 83 | 
 84 |   To "modify" a work means to copy from or adapt all or part of the work
 85 | in a fashion requiring copyright permission, other than the making of an
 86 | exact copy.  The resulting work is called a "modified version" of the
 87 | earlier work or a work "based on" the earlier work.
 88 | 
 89 |   A "covered work" means either the unmodified Program or a work based
 90 | on the Program.
 91 | 
 92 |   To "propagate" a work means to do anything with it that, without
 93 | permission, would make you directly or secondarily liable for
 94 | infringement under applicable copyright law, except executing it on a
 95 | computer or modifying a private copy.  Propagation includes copying,
 96 | distribution (with or without modification), making available to the
 97 | public, and in some countries other activities as well.
 98 | 
 99 |   To "convey" a work means any kind of propagation that enables other
100 | parties to make or receive copies.  Mere interaction with a user through
101 | a computer network, with no transfer of a copy, is not conveying.
102 | 
103 |   An interactive user interface displays "Appropriate Legal Notices"
104 | to the extent that it includes a convenient and prominently visible
105 | feature that (1) displays an appropriate copyright notice, and (2)
106 | tells the user that there is no warranty for the work (except to the
107 | extent that warranties are provided), that licensees may convey the
108 | work under this License, and how to view a copy of this License.  If
109 | the interface presents a list of user commands or options, such as a
110 | menu, a prominent item in the list meets this criterion.
111 | 
112 |   1. Source Code.
113 | 
114 |   The "source code" for a work means the preferred form of the work
115 | for making modifications to it.  "Object code" means any non-source
116 | form of a work.
117 | 
118 |   A "Standard Interface" means an interface that either is an official
119 | standard defined by a recognized standards body, or, in the case of
120 | interfaces specified for a particular programming language, one that
121 | is widely used among developers working in that language.
122 | 
123 |   The "System Libraries" of an executable work include anything, other
124 | than the work as a whole, that (a) is included in the normal form of
125 | packaging a Major Component, but which is not part of that Major
126 | Component, and (b) serves only to enable use of the work with that
127 | Major Component, or to implement a Standard Interface for which an
128 | implementation is available to the public in source code form.  A
129 | "Major Component", in this context, means a major essential component
130 | (kernel, window system, and so on) of the specific operating system
131 | (if any) on which the executable work runs, or a compiler used to
132 | produce the work, or an object code interpreter used to run it.
133 | 
134 |   The "Corresponding Source" for a work in object code form means all
135 | the source code needed to generate, install, and (for an executable
136 | work) run the object code and to modify the work, including scripts to
137 | control those activities.  However, it does not include the work's
138 | System Libraries, or general-purpose tools or generally available free
139 | programs which are used unmodified in performing those activities but
140 | which are not part of the work.  For example, Corresponding Source
141 | includes interface definition files associated with source files for
142 | the work, and the source code for shared libraries and dynamically
143 | linked subprograms that the work is specifically designed to require,
144 | such as by intimate data communication or control flow between those
145 | subprograms and other parts of the work.
146 | 
147 |   The Corresponding Source need not include anything that users
148 | can regenerate automatically from other parts of the Corresponding
149 | Source.
150 | 
151 |   The Corresponding Source for a work in source code form is that
152 | same work.
153 | 
154 |   2. Basic Permissions.
155 | 
156 |   All rights granted under this License are granted for the term of
157 | copyright on the Program, and are irrevocable provided the stated
158 | conditions are met.  This License explicitly affirms your unlimited
159 | permission to run the unmodified Program.  The output from running a
160 | covered work is covered by this License only if the output, given its
161 | content, constitutes a covered work.  This License acknowledges your
162 | rights of fair use or other equivalent, as provided by copyright law.
163 | 
164 |   You may make, run and propagate covered works that you do not
165 | convey, without conditions so long as your license otherwise remains
166 | in force.  You may convey covered works to others for the sole purpose
167 | of having them make modifications exclusively for you, or provide you
168 | with facilities for running those works, provided that you comply with
169 | the terms of this License in conveying all material for which you do
170 | not control copyright.  Those thus making or running the covered works
171 | for you must do so exclusively on your behalf, under your direction
172 | and control, on terms that prohibit them from making any copies of
173 | your copyrighted material outside their relationship with you.
174 | 
175 |   Conveying under any other circumstances is permitted solely under
176 | the conditions stated below.  Sublicensing is not allowed; section 10
177 | makes it unnecessary.
178 | 
179 |   3. Protecting Users' Legal Rights From Anti-Circumvention Law.
180 | 
181 |   No covered work shall be deemed part of an effective technological
182 | measure under any applicable law fulfilling obligations under article
183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
184 | similar laws prohibiting or restricting circumvention of such
185 | measures.
186 | 
187 |   When you convey a covered work, you waive any legal power to forbid
188 | circumvention of technological measures to the extent such circumvention
189 | is effected by exercising rights under this License with respect to
190 | the covered work, and you disclaim any intention to limit operation or
191 | modification of the work as a means of enforcing, against the work's
192 | users, your or third parties' legal rights to forbid circumvention of
193 | technological measures.
194 | 
195 |   4. Conveying Verbatim Copies.
196 | 
197 |   You may convey verbatim copies of the Program's source code as you
198 | receive it, in any medium, provided that you conspicuously and
199 | appropriately publish on each copy an appropriate copyright notice;
200 | keep intact all notices stating that this License and any
201 | non-permissive terms added in accord with section 7 apply to the code;
202 | keep intact all notices of the absence of any warranty; and give all
203 | recipients a copy of this License along with the Program.
204 | 
205 |   You may charge any price or no price for each copy that you convey,
206 | and you may offer support or warranty protection for a fee.
207 | 
208 |   5. Conveying Modified Source Versions.
209 | 
210 |   You may convey a work based on the Program, or the modifications to
211 | produce it from the Program, in the form of source code under the
212 | terms of section 4, provided that you also meet all of these conditions:
213 | 
214 |     a) The work must carry prominent notices stating that you modified
215 |     it, and giving a relevant date.
216 | 
217 |     b) The work must carry prominent notices stating that it is
218 |     released under this License and any conditions added under section
219 |     7.  This requirement modifies the requirement in section 4 to
220 |     "keep intact all notices".
221 | 
222 |     c) You must license the entire work, as a whole, under this
223 |     License to anyone who comes into possession of a copy.  This
224 |     License will therefore apply, along with any applicable section 7
225 |     additional terms, to the whole of the work, and all its parts,
226 |     regardless of how they are packaged.  This License gives no
227 |     permission to license the work in any other way, but it does not
228 |     invalidate such permission if you have separately received it.
229 | 
230 |     d) If the work has interactive user interfaces, each must display
231 |     Appropriate Legal Notices; however, if the Program has interactive
232 |     interfaces that do not display Appropriate Legal Notices, your
233 |     work need not make them do so.
234 | 
235 |   A compilation of a covered work with other separate and independent
236 | works, which are not by their nature extensions of the covered work,
237 | and which are not combined with it such as to form a larger program,
238 | in or on a volume of a storage or distribution medium, is called an
239 | "aggregate" if the compilation and its resulting copyright are not
240 | used to limit the access or legal rights of the compilation's users
241 | beyond what the individual works permit.  Inclusion of a covered work
242 | in an aggregate does not cause this License to apply to the other
243 | parts of the aggregate.
244 | 
245 |   6. Conveying Non-Source Forms.
246 | 
247 |   You may convey a covered work in object code form under the terms
248 | of sections 4 and 5, provided that you also convey the
249 | machine-readable Corresponding Source under the terms of this License,
250 | in one of these ways:
251 | 
252 |     a) Convey the object code in, or embodied in, a physical product
253 |     (including a physical distribution medium), accompanied by the
254 |     Corresponding Source fixed on a durable physical medium
255 |     customarily used for software interchange.
256 | 
257 |     b) Convey the object code in, or embodied in, a physical product
258 |     (including a physical distribution medium), accompanied by a
259 |     written offer, valid for at least three years and valid for as
260 |     long as you offer spare parts or customer support for that product
261 |     model, to give anyone who possesses the object code either (1) a
262 |     copy of the Corresponding Source for all the software in the
263 |     product that is covered by this License, on a durable physical
264 |     medium customarily used for software interchange, for a price no
265 |     more than your reasonable cost of physically performing this
266 |     conveying of source, or (2) access to copy the
267 |     Corresponding Source from a network server at no charge.
268 | 
269 |     c) Convey individual copies of the object code with a copy of the
270 |     written offer to provide the Corresponding Source.  This
271 |     alternative is allowed only occasionally and noncommercially, and
272 |     only if you received the object code with such an offer, in accord
273 |     with subsection 6b.
274 | 
275 |     d) Convey the object code by offering access from a designated
276 |     place (gratis or for a charge), and offer equivalent access to the
277 |     Corresponding Source in the same way through the same place at no
278 |     further charge.  You need not require recipients to copy the
279 |     Corresponding Source along with the object code.  If the place to
280 |     copy the object code is a network server, the Corresponding Source
281 |     may be on a different server (operated by you or a third party)
282 |     that supports equivalent copying facilities, provided you maintain
283 |     clear directions next to the object code saying where to find the
284 |     Corresponding Source.  Regardless of what server hosts the
285 |     Corresponding Source, you remain obligated to ensure that it is
286 |     available for as long as needed to satisfy these requirements.
287 | 
288 |     e) Convey the object code using peer-to-peer transmission, provided
289 |     you inform other peers where the object code and Corresponding
290 |     Source of the work are being offered to the general public at no
291 |     charge under subsection 6d.
292 | 
293 |   A separable portion of the object code, whose source code is excluded
294 | from the Corresponding Source as a System Library, need not be
295 | included in conveying the object code work.
296 | 
297 |   A "User Product" is either (1) a "consumer product", which means any
298 | tangible personal property which is normally used for personal, family,
299 | or household purposes, or (2) anything designed or sold for incorporation
300 | into a dwelling.  In determining whether a product is a consumer product,
301 | doubtful cases shall be resolved in favor of coverage.  For a particular
302 | product received by a particular user, "normally used" refers to a
303 | typical or common use of that class of product, regardless of the status
304 | of the particular user or of the way in which the particular user
305 | actually uses, or expects or is expected to use, the product.  A product
306 | is a consumer product regardless of whether the product has substantial
307 | commercial, industrial or non-consumer uses, unless such uses represent
308 | the only significant mode of use of the product.
309 | 
310 |   "Installation Information" for a User Product means any methods,
311 | procedures, authorization keys, or other information required to install
312 | and execute modified versions of a covered work in that User Product from
313 | a modified version of its Corresponding Source.  The information must
314 | suffice to ensure that the continued functioning of the modified object
315 | code is in no case prevented or interfered with solely because
316 | modification has been made.
317 | 
318 |   If you convey an object code work under this section in, or with, or
319 | specifically for use in, a User Product, and the conveying occurs as
320 | part of a transaction in which the right of possession and use of the
321 | User Product is transferred to the recipient in perpetuity or for a
322 | fixed term (regardless of how the transaction is characterized), the
323 | Corresponding Source conveyed under this section must be accompanied
324 | by the Installation Information.  But this requirement does not apply
325 | if neither you nor any third party retains the ability to install
326 | modified object code on the User Product (for example, the work has
327 | been installed in ROM).
328 | 
329 |   The requirement to provide Installation Information does not include a
330 | requirement to continue to provide support service, warranty, or updates
331 | for a work that has been modified or installed by the recipient, or for
332 | the User Product in which it has been modified or installed.  Access to a
333 | network may be denied when the modification itself materially and
334 | adversely affects the operation of the network or violates the rules and
335 | protocols for communication across the network.
336 | 
337 |   Corresponding Source conveyed, and Installation Information provided,
338 | in accord with this section must be in a format that is publicly
339 | documented (and with an implementation available to the public in
340 | source code form), and must require no special password or key for
341 | unpacking, reading or copying.
342 | 
343 |   7. Additional Terms.
344 | 
345 |   "Additional permissions" are terms that supplement the terms of this
346 | License by making exceptions from one or more of its conditions.
347 | Additional permissions that are applicable to the entire Program shall
348 | be treated as though they were included in this License, to the extent
349 | that they are valid under applicable law.  If additional permissions
350 | apply only to part of the Program, that part may be used separately
351 | under those permissions, but the entire Program remains governed by
352 | this License without regard to the additional permissions.
353 | 
354 |   When you convey a copy of a covered work, you may at your option
355 | remove any additional permissions from that copy, or from any part of
356 | it.  (Additional permissions may be written to require their own
357 | removal in certain cases when you modify the work.)  You may place
358 | additional permissions on material, added by you to a covered work,
359 | for which you have or can give appropriate copyright permission.
360 | 
361 |   Notwithstanding any other provision of this License, for material you
362 | add to a covered work, you may (if authorized by the copyright holders of
363 | that material) supplement the terms of this License with terms:
364 | 
365 |     a) Disclaiming warranty or limiting liability differently from the
366 |     terms of sections 15 and 16 of this License; or
367 | 
368 |     b) Requiring preservation of specified reasonable legal notices or
369 |     author attributions in that material or in the Appropriate Legal
370 |     Notices displayed by works containing it; or
371 | 
372 |     c) Prohibiting misrepresentation of the origin of that material, or
373 |     requiring that modified versions of such material be marked in
374 |     reasonable ways as different from the original version; or
375 | 
376 |     d) Limiting the use for publicity purposes of names of licensors or
377 |     authors of the material; or
378 | 
379 |     e) Declining to grant rights under trademark law for use of some
380 |     trade names, trademarks, or service marks; or
381 | 
382 |     f) Requiring indemnification of licensors and authors of that
383 |     material by anyone who conveys the material (or modified versions of
384 |     it) with contractual assumptions of liability to the recipient, for
385 |     any liability that these contractual assumptions directly impose on
386 |     those licensors and authors.
387 | 
388 |   All other non-permissive additional terms are considered "further
389 | restrictions" within the meaning of section 10.  If the Program as you
390 | received it, or any part of it, contains a notice stating that it is
391 | governed by this License along with a term that is a further
392 | restriction, you may remove that term.  If a license document contains
393 | a further restriction but permits relicensing or conveying under this
394 | License, you may add to a covered work material governed by the terms
395 | of that license document, provided that the further restriction does
396 | not survive such relicensing or conveying.
397 | 
398 |   If you add terms to a covered work in accord with this section, you
399 | must place, in the relevant source files, a statement of the
400 | additional terms that apply to those files, or a notice indicating
401 | where to find the applicable terms.
402 | 
403 |   Additional terms, permissive or non-permissive, may be stated in the
404 | form of a separately written license, or stated as exceptions;
405 | the above requirements apply either way.
406 | 
407 |   8. Termination.
408 | 
409 |   You may not propagate or modify a covered work except as expressly
410 | provided under this License.  Any attempt otherwise to propagate or
411 | modify it is void, and will automatically terminate your rights under
412 | this License (including any patent licenses granted under the third
413 | paragraph of section 11).
414 | 
415 |   However, if you cease all violation of this License, then your
416 | license from a particular copyright holder is reinstated (a)
417 | provisionally, unless and until the copyright holder explicitly and
418 | finally terminates your license, and (b) permanently, if the copyright
419 | holder fails to notify you of the violation by some reasonable means
420 | prior to 60 days after the cessation.
421 | 
422 |   Moreover, your license from a particular copyright holder is
423 | reinstated permanently if the copyright holder notifies you of the
424 | violation by some reasonable means, this is the first time you have
425 | received notice of violation of this License (for any work) from that
426 | copyright holder, and you cure the violation prior to 30 days after
427 | your receipt of the notice.
428 | 
429 |   Termination of your rights under this section does not terminate the
430 | licenses of parties who have received copies or rights from you under
431 | this License.  If your rights have been terminated and not permanently
432 | reinstated, you do not qualify to receive new licenses for the same
433 | material under section 10.
434 | 
435 |   9. Acceptance Not Required for Having Copies.
436 | 
437 |   You are not required to accept this License in order to receive or
438 | run a copy of the Program.  Ancillary propagation of a covered work
439 | occurring solely as a consequence of using peer-to-peer transmission
440 | to receive a copy likewise does not require acceptance.  However,
441 | nothing other than this License grants you permission to propagate or
442 | modify any covered work.  These actions infringe copyright if you do
443 | not accept this License.  Therefore, by modifying or propagating a
444 | covered work, you indicate your acceptance of this License to do so.
445 | 
446 |   10. Automatic Licensing of Downstream Recipients.
447 | 
448 |   Each time you convey a covered work, the recipient automatically
449 | receives a license from the original licensors, to run, modify and
450 | propagate that work, subject to this License.  You are not responsible
451 | for enforcing compliance by third parties with this License.
452 | 
453 |   An "entity transaction" is a transaction transferring control of an
454 | organization, or substantially all assets of one, or subdividing an
455 | organization, or merging organizations.  If propagation of a covered
456 | work results from an entity transaction, each party to that
457 | transaction who receives a copy of the work also receives whatever
458 | licenses to the work the party's predecessor in interest had or could
459 | give under the previous paragraph, plus a right to possession of the
460 | Corresponding Source of the work from the predecessor in interest, if
461 | the predecessor has it or can get it with reasonable efforts.
462 | 
463 |   You may not impose any further restrictions on the exercise of the
464 | rights granted or affirmed under this License.  For example, you may
465 | not impose a license fee, royalty, or other charge for exercise of
466 | rights granted under this License, and you may not initiate litigation
467 | (including a cross-claim or counterclaim in a lawsuit) alleging that
468 | any patent claim is infringed by making, using, selling, offering for
469 | sale, or importing the Program or any portion of it.
470 | 
471 |   11. Patents.
472 | 
473 |   A "contributor" is a copyright holder who authorizes use under this
474 | License of the Program or a work on which the Program is based.  The
475 | work thus licensed is called the contributor's "contributor version".
476 | 
477 |   A contributor's "essential patent claims" are all patent claims
478 | owned or controlled by the contributor, whether already acquired or
479 | hereafter acquired, that would be infringed by some manner, permitted
480 | by this License, of making, using, or selling its contributor version,
481 | but do not include claims that would be infringed only as a
482 | consequence of further modification of the contributor version.  For
483 | purposes of this definition, "control" includes the right to grant
484 | patent sublicenses in a manner consistent with the requirements of
485 | this License.
486 | 
487 |   Each contributor grants you a non-exclusive, worldwide, royalty-free
488 | patent license under the contributor's essential patent claims, to
489 | make, use, sell, offer for sale, import and otherwise run, modify and
490 | propagate the contents of its contributor version.
491 | 
492 |   In the following three paragraphs, a "patent license" is any express
493 | agreement or commitment, however denominated, not to enforce a patent
494 | (such as an express permission to practice a patent or covenant not to
495 | sue for patent infringement).  To "grant" such a patent license to a
496 | party means to make such an agreement or commitment not to enforce a
497 | patent against the party.
498 | 
499 |   If you convey a covered work, knowingly relying on a patent license,
500 | and the Corresponding Source of the work is not available for anyone
501 | to copy, free of charge and under the terms of this License, through a
502 | publicly available network server or other readily accessible means,
503 | then you must either (1) cause the Corresponding Source to be so
504 | available, or (2) arrange to deprive yourself of the benefit of the
505 | patent license for this particular work, or (3) arrange, in a manner
506 | consistent with the requirements of this License, to extend the patent
507 | license to downstream recipients.  "Knowingly relying" means you have
508 | actual knowledge that, but for the patent license, your conveying the
509 | covered work in a country, or your recipient's use of the covered work
510 | in a country, would infringe one or more identifiable patents in that
511 | country that you have reason to believe are valid.
512 | 
513 |   If, pursuant to or in connection with a single transaction or
514 | arrangement, you convey, or propagate by procuring conveyance of, a
515 | covered work, and grant a patent license to some of the parties
516 | receiving the covered work authorizing them to use, propagate, modify
517 | or convey a specific copy of the covered work, then the patent license
518 | you grant is automatically extended to all recipients of the covered
519 | work and works based on it.
520 | 
521 |   A patent license is "discriminatory" if it does not include within
522 | the scope of its coverage, prohibits the exercise of, or is
523 | conditioned on the non-exercise of one or more of the rights that are
524 | specifically granted under this License.  You may not convey a covered
525 | work if you are a party to an arrangement with a third party that is
526 | in the business of distributing software, under which you make payment
527 | to the third party based on the extent of your activity of conveying
528 | the work, and under which the third party grants, to any of the
529 | parties who would receive the covered work from you, a discriminatory
530 | patent license (a) in connection with copies of the covered work
531 | conveyed by you (or copies made from those copies), or (b) primarily
532 | for and in connection with specific products or compilations that
533 | contain the covered work, unless you entered into that arrangement,
534 | or that patent license was granted, prior to 28 March 2007.
535 | 
536 |   Nothing in this License shall be construed as excluding or limiting
537 | any implied license or other defenses to infringement that may
538 | otherwise be available to you under applicable patent law.
539 | 
540 |   12. No Surrender of Others' Freedom.
541 | 
542 |   If conditions are imposed on you (whether by court order, agreement or
543 | otherwise) that contradict the conditions of this License, they do not
544 | excuse you from the conditions of this License.  If you cannot convey a
545 | covered work so as to satisfy simultaneously your obligations under this
546 | License and any other pertinent obligations, then as a consequence you may
547 | not convey it at all.  For example, if you agree to terms that obligate you
548 | to collect a royalty for further conveying from those to whom you convey
549 | the Program, the only way you could satisfy both those terms and this
550 | License would be to refrain entirely from conveying the Program.
551 | 
552 |   13. Use with the GNU Affero General Public License.
553 | 
554 |   Notwithstanding any other provision of this License, you have
555 | permission to link or combine any covered work with a work licensed
556 | under version 3 of the GNU Affero General Public License into a single
557 | combined work, and to convey the resulting work.  The terms of this
558 | License will continue to apply to the part which is the covered work,
559 | but the special requirements of the GNU Affero General Public License,
560 | section 13, concerning interaction through a network will apply to the
561 | combination as such.
562 | 
563 |   14. Revised Versions of this License.
564 | 
565 |   The Free Software Foundation may publish revised and/or new versions of
566 | the GNU General Public License from time to time.  Such new versions will
567 | be similar in spirit to the present version, but may differ in detail to
568 | address new problems or concerns.
569 | 
570 |   Each version is given a distinguishing version number.  If the
571 | Program specifies that a certain numbered version of the GNU General
572 | Public License "or any later version" applies to it, you have the
573 | option of following the terms and conditions either of that numbered
574 | version or of any later version published by the Free Software
575 | Foundation.  If the Program does not specify a version number of the
576 | GNU General Public License, you may choose any version ever published
577 | by the Free Software Foundation.
578 | 
579 |   If the Program specifies that a proxy can decide which future
580 | versions of the GNU General Public License can be used, that proxy's
581 | public statement of acceptance of a version permanently authorizes you
582 | to choose that version for the Program.
583 | 
584 |   Later license versions may give you additional or different
585 | permissions.  However, no additional obligations are imposed on any
586 | author or copyright holder as a result of your choosing to follow a
587 | later version.
588 | 
589 |   15. Disclaimer of Warranty.
590 | 
591 |   THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592 | APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596 | PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597 | IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599 | 
600 |   16. Limitation of Liability.
601 | 
602 |   IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610 | SUCH DAMAGES.
611 | 
612 |   17. Interpretation of Sections 15 and 16.
613 | 
614 |   If the disclaimer of warranty and limitation of liability provided
615 | above cannot be given local legal effect according to their terms,
616 | reviewing courts shall apply local law that most closely approximates
617 | an absolute waiver of all civil liability in connection with the
618 | Program, unless a warranty or assumption of liability accompanies a
619 | copy of the Program in return for a fee.
620 | 
621 |                      END OF TERMS AND CONDITIONS
622 | 
623 |             How to Apply These Terms to Your New Programs
624 | 
625 |   If you develop a new program, and you want it to be of the greatest
626 | possible use to the public, the best way to achieve this is to make it
627 | free software which everyone can redistribute and change under these terms.
628 | 
629 |   To do so, attach the following notices to the program.  It is safest
630 | to attach them to the start of each source file to most effectively
631 | state the exclusion of warranty; and each file should have at least
632 | the "copyright" line and a pointer to where the full notice is found.
633 | 
634 |     <one line to give the program's name and a brief idea of what it does.>
635 |     Copyright (C) <year>  <name of author>
636 | 
637 |     This program is free software: you can redistribute it and/or modify
638 |     it under the terms of the GNU General Public License as published by
639 |     the Free Software Foundation, either version 3 of the License, or
640 |     (at your option) any later version.
641 | 
642 |     This program is distributed in the hope that it will be useful,
643 |     but WITHOUT ANY WARRANTY; without even the implied warranty of
644 |     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
645 |     GNU General Public License for more details.
646 | 
647 |     You should have received a copy of the GNU General Public License
648 |     along with this program.  If not, see <https://www.gnu.org/licenses/>.
649 | 
650 | Also add information on how to contact you by electronic and paper mail.
651 | 
652 |   If the program does terminal interaction, make it output a short
653 | notice like this when it starts in an interactive mode:
654 | 
655 |     <program>  Copyright (C) <year>  <name of author>
656 |     This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
657 |     This is free software, and you are welcome to redistribute it
658 |     under certain conditions; type `show c' for details.
659 | 
660 | The hypothetical commands `show w' and `show c' should show the appropriate
661 | parts of the General Public License.  Of course, your program's commands
662 | might be different; for a GUI interface, you would use an "about box".
663 | 
664 |   You should also get your employer (if you work as a programmer) or school,
665 | if any, to sign a "copyright disclaimer" for the program, if necessary.
666 | For more information on this, and how to apply and follow the GNU GPL, see
667 | <https://www.gnu.org/licenses/>.
668 | 
669 |   The GNU General Public License does not permit incorporating your program
670 | into proprietary programs.  If your program is a subroutine library, you
671 | may consider it more useful to permit linking proprietary applications with
672 | the library.  If this is what you want to do, use the GNU Lesser General
673 | Public License instead of this License.  But first, please read
674 | <https://www.gnu.org/licenses/why-not-lgpl.html>.
675 | 


--------------------------------------------------------------------------------