├── .gitignore
├── README.md
├── codes
│   ├── MyAdaBoost.py
│   ├── MyDecisionTree.py
│   ├── MyEM.py
│   ├── MyHMM.py
│   ├── MyHMMTestData.txt
│   ├── MyHMMTrainData.txt
│   ├── MyKNN.py
│   ├── MyLogisticRegression.py
│   ├── MyMaxEnt.py
│   ├── MyNaiveBayes.py
│   ├── MyPerceptron.py
│   ├── MySVM.py
│   └── iris.csv
├── docImage
│   ├── 10_1_1.jpg
│   ├── 10_2_1.jpg
│   ├── 11_1_1.jpg
│   ├── 11_1_2.jpg
│   ├── 11_2_1.jpg
│   ├── 11_2_2.jpg
│   ├── 11_2_3.jpg
│   ├── 8_1_1.jpg
│   ├── 8_1_2.jpg
│   ├── 8_1_3.jpg
│   ├── 8_2_1.jpg
│   ├── 8_2_2.jpg
│   ├── 8_2_3.jpg
│   ├── 8_2_4.jpg
│   ├── 9_1_1.jpg
│   ├── 9_1_2.jpg
│   ├── 9_2_1.jpg
│   ├── 9_2_2.jpg
│   ├── 9_2_3.jpg
│   ├── Maximum_separation_hyperplane1.jpg
│   ├── Maximum_separation_hyperplane2.jpg
│   ├── Maximum_separation_hyperplane3.jpg
│   ├── Maximum_separation_hyperplane4.jpg
│   ├── Novikoff1.jpg
│   ├── Novikoff2.jpg
│   ├── Novikoff3.jpg
│   ├── Soft_interval_maximization_dual1.jpg
│   ├── Soft_interval_maximization_dual2.jpg
│   ├── Soft_interval_maximization_dual3.jpg
│   ├── bayes_naive_bayes1.jpg
│   ├── bayes_naive_bayes2.jpg
│   ├── bayesian_estimation.jpg
│   ├── hoeffding1.jpg
│   ├── hoeffding2.jpg
│   ├── iterative_method1.jpg
│   ├── iterative_method2.jpg
│   ├── iterative_method3.jpg
│   ├── lagrange_duality1.jpg
│   ├── lagrange_duality2.jpg
│   ├── lagrange_duality3.jpg
│   ├── maximum_entropy1.jpg
│   ├── maximum_entropy2.jpg
│   ├── maximum_likelihood_estimation.jpg
│   ├── mle_naive_bayes.jpg
│   ├── poster_prob1.jpg
│   └── poster_prob2.jpg
└── notes
    ├── chapter1.pdf
    ├── chapter10.pdf
    ├── chapter11.pdf
    ├── chapter2.pdf
    ├── chapter3.pdf
    ├── chapter4.pdf
    ├── chapter5.pdf
    ├── chapter6.pdf
    ├── chapter7.pdf
    ├── chapter8.pdf
    └── chapter9.pdf
/.gitignore:
--------------------------------------------------------------------------------
1 |
2 | .DS_Store
3 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Statistic-study-notes
2 | # 李航统计学习方法(第二版)的学习笔记,包括:
3 | ## 1、每章重点数学公式的手动推导
4 | 均为手写然后扫描成图片,字迹不工整还望谅解,之后有时间会用 LaTeX 重新整理
5 | 若点击数学公式章节后图片没有显示,需要科学上网才可以在线预览数学推导的图片
6 |
7 | - [1.第一章数学公式推导](#1第一章数学公式推导)
8 | - [1.1 极大似然估计推导](#11极大似然估计推导)
9 | - [1.2 贝叶斯估计推导](#12贝叶斯估计推导)
10 | - [1.3 利用Hoeffding推导泛化误差上界](#13利用Hoeffding推导泛化误差上界)
11 | - [2.第二章数学公式推导](#2第二章数学公式推导)
12 | - [2.1 算法的收敛性证明Novikoff](#21算法的收敛性证明Novikoff)
13 | - [3.第三章数学公式推导](#3第三章数学公式推导)
14 | - 3.1 无数学推导,偏重算法实现-KNN
15 | - [4.第四章数学公式推导](#4第四章数学公式推导)
16 | - [4.1 用极大似然法估计朴素贝叶斯参数](#41用极大似然法估计朴素贝叶斯参数)
17 | - [4.2 用贝叶斯估计法朴素贝叶斯参数](#42用贝叶斯估计法朴素贝叶斯参数)
18 | - [4.3 证明后验概率最大化即期望风险最小化](#43证明后验概率最大化即期望风险最小化)
19 | - [5.第五章数学公式推导](#5第五章数学公式推导)
20 | - 5.1 无数学推导,偏重算法实现-决策树
21 | - [6.第六章数学公式推导](#6第六章数学公式推导)
22 | - [6.1 最大熵模型的数学推导](#61最大熵模型的数学推导)
23 | - [6.2 拉格朗日对偶性问题的数学推导](#62拉格朗日对偶性问题的数学推导)
24 | - [6.3 改进的迭代尺度法数学推导](#63改进的迭代尺度法数学推导)
25 | - [7.第七章数学公式推导](#7第七章数学公式推导)
26 | - [7.1 软间隔最大化对偶问题](#71软间隔最大化对偶问题)
27 | - [7.2 证明最大间隔分离超平面存在唯一性](#72证明最大间隔分离超平面存在唯一性)
28 | - [8.第八章数学公式推导](#8第八章数学公式推导)
29 | - [8.1 证明AdaBoost是前向分步加法算法的特例](#81证明AdaBoost是前向分步加法算法的特例)
30 | - [8.2 证明AdaBoost的训练误差界](#82证明AdaBoost的训练误差界)
31 | - [9.第九章数学公式推导](#9第九章数学公式推导)
32 | - [9.1 EM算法的导出](#91EM算法的导出)
33 | - [9.2 用EM算法估计高斯混合模型](#92用EM算法估计高斯混合模型)
34 | - [10.第十章数学公式推导](#10第十章数学公式推导)
35 | - [10.1 前向算法两个公式的证明](#101前向算法两个公式的证明)
36 | - [10.2 维特比算法推导](#102维特比算法推导)
37 | - [11.第十一章数学公式推导](#11第十一章数学公式推导)
38 | - [11.1 条件随机场的矩阵形式推导](#111条件随机场的矩阵形式推导)
39 | - [11.2 牛顿法和拟牛顿法的推导](#112牛顿法和拟牛顿法的推导)
40 |
41 |
42 |
43 | ## 2、每章算法的Python自实现
44 | [数据集为iris.csv(带Header)](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/iris.csv)
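
各章脚本读取该数据集的方式基本一致(见各文件中的 processData 函数)。下面给出一个按这一思路读取 iris.csv 的最小示意代码,假设在仓库根目录下运行、路径为 codes/iris.csv,仅作演示,具体请以各脚本为准:

```python
import pandas as pd

# 读取带表头的 iris.csv,并按 Species 是否为 setosa 构造二分类标签
df = pd.read_csv('codes/iris.csv')
X = df[["Sepal.Length", "Sepal.Width", "Petal.Length"]].values
y = [1 if s == "setosa" else -1 for s in df["Species"]]
print(X.shape, y[:5])
```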
45 | ### 第2章 感知机模型(使用Iris数据集)
46 | 源代码[MyPerceptron.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyPerceptron.py)
47 | ### 第3章 KNN模型(线性扫描-使用Iris数据集;KD树实现有点问题,修改后再上传)
48 | 源代码[MyKNN.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyKNN.py)
49 | ### 第4章 朴素贝叶斯模型(使用Iris数据集)
50 | 源代码[MyNaiveBayes.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyNaiveBayes.py)
51 | ### 第5章 决策树模型(使用Iris数据集)
52 | 源代码[MyDecisionTree.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyDecisionTree.py)
53 | ### 第6章 逻辑斯提回归模型(使用Iris数据集,采用梯度下降方法)
54 | 源代码[MyLogisticRegression.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyLogisticRegression.py)
55 | ### 第6章 最大熵模型(使用Iris数据集)
56 | 源代码[MyMaxEnt.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyMaxEnt.py)
57 | ### 第7章 SVM(使用Iris数据集)
58 | 源代码[MySVM.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MySVM.py)
59 | ### 第8章 AdaBoost(使用Iris数据集)
60 | 源代码[MyAdaBoost.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyAdaBoost.py)
61 | ### 第9章 EM算法(使用自己随机生成的符合高斯分布的数据)
62 | 源代码[MyEM.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyEM.py)
63 | ### 第10章 HMM算法(使用人民日报语料库进行训练,对输入的文本进行分词,12.8前完成)
64 | 源代码[MyHMM.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyHMM.py)
65 |
66 | ## 3、学习笔记汇总
67 | 学习笔记均为自己学习过程中记录在笔记本上,然后拍照扫描成 pdf
68 | ### [第1章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter1.pdf)
69 | ### [第2章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter2.pdf)
70 | ### [第3章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter3.pdf)
71 | ### [第4章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter4.pdf)
72 | ### [第5章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter5.pdf)
73 | ### [第6章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter6.pdf)
74 | ### [第7章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter7.pdf)
75 | ### [第8章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter8.pdf)
76 | ### [第9章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter9.pdf)
77 | ### [第10章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter10.pdf)
78 | ### [第11章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter11.pdf)
79 |
80 |
81 | ## 4、每章节的课后习题实现
82 | 接下来每周都会定时更新课后习题的实现
83 |
84 | ## 1第一章数学公式推导
85 |
86 | ### 1.1极大似然估计推导
87 |
88 | 
89 |
90 | ### 1.2贝叶斯估计推导
91 |
92 | 
93 |
94 |
95 | ### 1.3利用Hoeffding推导泛化误差上界
96 |
97 | 
98 |
99 | 
100 |
101 | ## 2第二章数学公式推导
102 |
103 | ### 2.1算法的收敛性证明Novikoff
104 |
105 | 
106 | 
107 | 
108 |
109 | ## 3第三章数学公式推导
110 |
111 | ## 4第四章数学公式推导
112 |
113 | ### 4.1用极大似然法估计朴素贝叶斯参数
114 | 
115 |
116 | ### 4.2用贝叶斯估计法朴素贝叶斯参数
117 | 
118 | 
119 |
120 | ### 4.3证明后验概率最大化即期望风险最小化
121 | 
122 | 
123 |
124 | ## 5第五章数学公式推导
125 |
126 | ## 6第六章数学公式推导
127 |
128 | ### 6.1最大熵模型的数学推导
129 | 
130 | 
131 |
132 | ### 6.2拉格朗日对偶性问题的数学推导
133 | 
134 | 
135 | 
136 |
137 | ### 6.3改进的迭代尺度法数学推导
138 | 
139 | 
140 | 
141 |
142 | ## 7第七章数学公式推导
143 |
144 | ### 7.1软间隔最大化对偶问题
145 | 
146 | 
147 | 
148 |
149 | ### 7.2证明最大间隔分离超平面存在唯一性
150 | 
151 | 
152 | 
153 | 
154 |
155 | ## 8第八章数学公式推导
156 |
157 | ### 8.1证明AdaBoost是前向分步加法算法的特例
158 | 
159 | 
160 | 
161 |
162 | ### 8.2 证明AdaBoost的训练误差界
163 | 
164 | 
165 | 
166 | 
167 |
168 | ## 9第九章数学公式推导
169 | ### 9.1 EM算法的导出
170 | 
171 | 
172 |
173 | ### 9.2 用EM算法估计高斯混合模型
174 | 
175 | 
176 | 
177 |
178 | ## 10.第十章数学公式推导
179 |
180 | ### 10.1 前向算法两个公式的证明
181 | 
182 |
183 | ### 10.2 维特比算法推导
184 | 
185 |
186 | ## 11.第十一章数学公式推导
187 | ### 11.1 条件随机场的矩阵形式推导
188 | 
189 | 
190 | ### 11.2 牛顿法和拟牛顿法的推导
191 | 
192 | 
193 | 
194 |
--------------------------------------------------------------------------------
/codes/MyAdaBoost.py:
--------------------------------------------------------------------------------
1 | #coding=utf-8
2 | #Author:lrrlrr
3 | #Email:kingsundad@gmail.com
4 |
5 | import numpy as np
6 | import pandas as pd
7 |
8 | #根据文件路径读取Iris数据集数据
9 | # #return type: np.array
10 | def processData(filePath):
11 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个
12 | X = []
13 | y = []
14 | #默认读取csv的头部
15 | df = pd.read_csv(filePath)
16 | #利用数据集合的第一个维度特征分类
17 | #遍历pandas中df的每一行
18 | for index, row in df.iterrows():
19 | if row["Species"] == "setosa":
20 | y.append(1)
21 | else:
22 | y.append(-1)
23 | #进行二值化处理,使得样本数据为0/1
24 | X.append([int(float(row["Sepal.Length"])>5.0),int(float(row["Sepal.Width"])>3.5),int(float(row["Petal.Length"])>1.4)])
25 |
26 | return np.array(X), np.array(y)
27 |
28 | def createOneLayerBoostTree(X_train, y_train, D):
29 | #获得样本数目及特征数量
30 | m, n = np.shape(X_train)
31 | #该字典代表了一层提升树,用于存放当前层提升树的参数
32 | oneLowerNumayerBoostTree = {}
33 | #初始化分类误差率为1,即100%
34 | oneLowerNumayerBoostTree['errorRate'] = 1
35 | #对每一个特征进行遍历,寻找用于划分的最合适的特征
36 | for i in range(n):
37 | #因为特征已经经过二值化,只能为0和1,因此分切点为-0.5, 0.5, 1.5
38 | for division in [-0.5, 0.5, 1.5]:
39 | #划分规则有如下两种:
40 | #LowerNumSetOne:小于切分点的样本预测为1,其余预测为-1
41 | #UpperNumSetOne:大于切分点的样本预测为1,其余预测为-1
42 | for rule in ['LowerNumSetOne', 'UpperNumSetOne']:
43 | #按照第i个特征,以值division进行切割,进行当前设置得到的预测和分类错误率
44 | Gx, e = calculate_e_Gx(X_train, y_train, i, division, rule, D)
45 | #如果分类错误率e小于当前最小的e,那么将它作为最小的分类错误率保存
46 | if e < oneLowerNumayerBoostTree['errorRate']:
47 | # 分类错误率
48 | oneLowerNumayerBoostTree['errorRate'] = e
49 | # 最优划分点
50 | oneLowerNumayerBoostTree['division'] = division
51 | # 划分规则
52 | oneLowerNumayerBoostTree['rule'] = rule
53 | # 预测结果
54 | oneLowerNumayerBoostTree['Gx'] = Gx
55 | # 特征索引
56 | oneLowerNumayerBoostTree['feature'] = i
57 | return oneLowerNumayerBoostTree
58 |
59 | def createBoostTree(X_train, y_train, treeNum):
60 | #将数据和标签转化为数组形式
61 | X_train = np.array(X_train)
62 | y_train = np.array(y_train)
63 | #获得训练集数量以及特征个数
64 | m, n = np.shape(X_train)
65 | #初始化D为1/N
66 | D = [1 / m] * m
67 | #初始化提升树列表,每个位置为一层
68 | tree = []
69 | #循环创建提升树
70 | for i in range(treeNum):
71 | #得到当前层的提升树
72 | currentTree = createOneLayerBoostTree(X_train, y_train, D)
73 | # 这边由于用的是Iris数据集,数据量过小,所以currentTree['errorRate']即误差分类率可能为0
74 | # 因此在最后加上了0.0001来避免除数为0的错误
75 | alpha = 1/2 * np.log((1 - currentTree['errorRate']) / (currentTree['errorRate']+0.0001))
76 | #获得当前层的预测结果,用于下一步更新D
77 | Gx = currentTree['Gx']
78 | D = np.multiply(D, np.exp(-1 * alpha * np.multiply(y_train, Gx))); D = D / np.sum(D)  # 规范化因子为更新后权值之和(书中式8.5)
79 | currentTree['alpha'] = alpha
80 | tree.append(currentTree)
81 | return tree
82 |
83 | # 前提:数据进行二值处理
84 | def predict(x, division, rule, feature):
85 | if rule == 'LowerNumSetOne':
86 | LowerNum = 1
87 | UpperNum = -1
88 | else:
89 | LowerNum = -1
90 | UpperNum = 1
91 |
92 | if x[feature] < division:
93 | return LowerNum
94 | else:
95 | return UpperNum
96 |
97 | def test(X_test, y_test, tree):
98 | rightCount = 0
99 | for i in range(len(X_test)):
100 | result = 0
101 | for currentTree in tree:
102 | division = currentTree['division']
103 | rule = currentTree['rule']
104 | feature = currentTree['feature']
105 | alpha = currentTree['alpha']
106 | result += alpha * predict(X_test[i], division, rule, feature)
107 | #预测结果取sign值,如果大于0 sign为1,反之为-1
108 | if np.sign(result) == y_test[i]:
109 | rightCount += 1
110 | #返回准确率
111 | return rightCount / len(X_test)
112 |
113 | #计算分类错误率
114 | def calculate_e_Gx(X_train, y_train, n, division, rule, D):
115 | #初始化分类误差率为0
116 | e = 0
117 | x = X_train[:, n]
118 | y = y_train
119 | train = []
120 | if rule == 'LowerNumSetOne':
121 | LowerNum = 1
122 | UpperNum = -1
123 | else:
124 | LowerNum = -1
125 | UpperNum = 1
126 |
127 | #遍历样本的特征
128 | for i in range(X_train.shape[0]):
129 | if x[i] < division:
130 | #如果小于划分点,则预测为LowerNum
131 | #如果设置小于division为1,那么LowerNum就是1,
132 | #如果设置小于division为-1,LowerNum就是-1
133 | train.append(LowerNum)
134 | #如果预测错误,分类错误率要加上该分错的样本的权值
135 | if y[i] != LowerNum:
136 | e += D[i]
137 | elif x[i] >= division:
138 | train.append(UpperNum)
139 | if y[i] != UpperNum:
140 | e += D[i]
141 | return np.array(train), e
142 |
143 | if __name__ == '__main__':
144 | X, y = processData('iris.csv')
145 |
146 | X_train = X[0:149:50]
147 | y_train = y[0:149:50]
148 |
149 | # 自己在数据集后面加上了干扰的实例
150 | X_test = X[0:150:1]
151 | y_test = y[0:150:1]
152 |
153 | #创建提升树,最后一个参数代表的是公式的m,即多少个模型
154 | tree = createBoostTree(X_train, y_train, 5)
155 |
156 | #准确率测试
157 | rightRate = test(X_test, y_test, tree)
158 | print('分类正确率为:',rightRate * 100, '%')
--------------------------------------------------------------------------------
/codes/MyDecisionTree.py:
--------------------------------------------------------------------------------
1 | #coding=utf-8
2 | #Author:lrrlrr
3 | #Email:kingsundad@gmail.com
4 |
5 | import numpy as np
6 | import pandas as pd
7 | import math
8 | from collections import namedtuple
9 |
10 | # 定义节点
11 | # 孩子节点、分类特征的取值、节点内容、节点分类特征、标签
12 | class Node(namedtuple("Node","children type content feature label")):
13 | def __repr__(self):
14 | return str(tuple(self))
15 |
16 | #决策树
17 | class DecisionTree():
18 | def __init__(self,method="info_gain_ratio"):
19 | self.tree=None
20 | self.method=method
21 |
22 | #计算经验熵
23 | def _experienc_entropy(self,X):
24 |
25 | # 统计每个取值的出现频率
26 | x_types_prob=X.iloc[:,0].value_counts()/X.shape[0]
27 | # 计算经验熵
28 | x_experienc_entropy=sum((-p*math.log(p,2) for p in x_types_prob))
29 | return x_experienc_entropy
30 |
31 | #计算条件熵
32 | def _conditinal_entropy(self,X_train,y_train,feature):
33 | # feature特征下每个特征取值数量统计
34 | x_types_count= X_train[feature].value_counts()
35 | # 每个特征取值频率计算
36 | x_types_prob = x_types_count / X_train.shape[0]
37 | # 每个特征取值下类别y的经验熵
38 | x_experienc_entropy=[self._experienc_entropy(y_train[(X_train[feature]==i).values]) for i in x_types_count.index]
39 | # 特征feature对数据集的经验条件熵
40 | x_conditinal_entropy=(x_types_prob.mul(x_experienc_entropy)).sum()
41 | return x_conditinal_entropy
42 |
43 | #计算信息增益
44 | def _information_gain(self,X_train,y_train,feature):
45 | return self._experienc_entropy(y_train)-self._conditinal_entropy(X_train,y_train,feature)
46 |
47 | #计算信息增益比
48 | def _information_gain_ratio(self,X_train,y_train,features,feature):
49 | index=features.index(feature)
50 | return self._information_gain(X_train,y_train,feature)/self._experienc_entropy(X_train.iloc[:,index:index+1])
51 |
52 | #选择分类特征
53 | def _choose_feature(self,X_train,y_train,features):
54 | if self.method=="info_gain_ratio":
55 | info=[self._information_gain_ratio(X_train,y_train,features,feature) for feature in features]
56 | elif self.method=="info_gain":
57 | info=[self._information_gain(X_train,y_train,feature) for feature in features]
58 | else:
59 | raise TypeError
60 | optimal_feature=features[np.argmax(info)]
61 | return optimal_feature
62 |
63 | #递归构造决策树
64 | def _built_tree(self,X_train,y_train,features,type=None):
65 | # 若只剩一个可用特征或样本已完全属于同一类,则停止分叉,生成叶节点
66 | if len(features)==1 or len(np.unique(y_train))==1:
67 | label=list(y_train[0].value_counts().index)[0]
68 | return Node(children=None,type=type,content=(X_train,y_train),feature=None,label=label)
69 | else:
70 | # 选择分类特征值
71 | feature=self._choose_feature(X_train,y_train,features)
72 | features.remove(feature)
73 | # 构建节点,同时递归创建孩子节点
74 | features_iter=np.unique(X_train[feature])
75 | children=[]
76 | for item in features_iter:
77 | X_item=X_train[(X_train[feature]==item).values]
78 | y_item=y_train[(X_train[feature]==item).values]
79 | children.append(self._built_tree(X_item,y_item,features,type=item))
80 | return Node(children=children,type=type,content=None,feature=feature,label=None)
81 |
82 | #进行剪枝
83 | def _prune(self):
84 | pass
85 |
86 | def fit(self,X_train,y_train,features):
87 | self.tree=self._built_tree(X_train,y_train,features)
88 |
89 |
90 | def _search(self,X_new):
91 | tree=self.tree
92 | # 若还有孩子节点,则继续向下搜索,否则搜索停止,在当前节点获取标签
93 | while tree.children:
94 | for child in tree.children:
95 | if X_new[tree.feature].loc[0]==child.type:
96 | tree=child
97 | break
98 | return tree.label
99 |
100 | def predict(self,X_new):
101 | return self._search(X_new)
102 |
103 | def processData(filePath):
104 | print('开始读取数据')
105 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个
106 | X = []
107 | y = []
108 | #默认读取csv的头部
109 | df = pd.read_csv(filePath)
110 | #利用数据集合的第一个维度特征分类
111 | #遍历pandas中df的每一行
112 | for index, row in df.iterrows():
113 | if row["Species"] == "setosa" :
114 | y.append("是setosa花")
115 | else:
116 | y.append("不是setosa花")
117 | X.append([float(row["Sepal.Length"]),float(row["Sepal.Width"]),float(row["Petal.Length"]),float(row["Petal.Width"])])
118 | return np.array(X), np.array(y)
119 |
120 | def main():
121 | # 训练数据集
122 | features = ["萼片长", "萼片宽", "花瓣长", "花瓣宽"]
123 |
124 | X , y = processData('iris.csv')
125 |
126 | X_train = X[0:149:4]
127 | y_train = y[0:149:4]
128 |
129 | X_test = X[0:149:10]
130 | y_test = y[0:149:10]
131 |
132 |
133 | X_train = pd.DataFrame(X_train, columns=features)
134 | y_train = pd.DataFrame(y_train)
135 | # 训练,使用信息增益
136 | clf=DecisionTree(method="info_gain")
137 | clf.fit(X_train,y_train,features.copy())
138 | print('训练结束')
139 |
140 | X_new= pd.DataFrame(X_test, columns=features)
141 | y_predict=clf.predict(X_new)
142 | print(y_predict)
143 |
144 | if __name__=="__main__":
145 | main()
--------------------------------------------------------------------------------
/codes/MyEM.py:
--------------------------------------------------------------------------------
1 | #coding=utf-8
2 | #Author:lrrlrr
3 | #Email:kingsundad@gmail.com
4 |
5 |
6 | import numpy as np
7 | import random
8 | import math
9 |
10 | # 通过服从高斯分布的随机函数来伪造数据集
11 | # mean0: 高斯0的均值
12 | # sigma0: 高斯0的标准差
13 | # alpha0: 高斯0的系数
14 |
15 | # mean1: 高斯1的均值
16 | # sigma1: 高斯1的标准差
17 | # alpha1: 高斯1的系数
18 | # 混合了两个高斯分布的数据
19 |
20 | def processData(mean0, sigma0, mean1, sigma1, alpha0, alpha1):
21 | #定义数据集长度为1000
22 | length = 1000
23 |
24 | #初始化高斯分布,数据长度为length * alpha
25 | data0 = np.random.normal(mean0, sigma0, int(length * alpha0))
26 | data1 = np.random.normal(mean1, sigma1, int(length * alpha1))
27 |
28 | trainData = []
29 | trainData.extend(data0)
30 | trainData.extend(data1)
31 |
32 | #对总的数据集进行打乱
33 | random.shuffle(trainData)
34 | return trainData
35 |
36 | # 根据高斯密度函数计算值
37 | # 返回整个可观测数据集的高斯分布密度(向量形式)
38 | def calculateGauss(trainDataArr, mean, sigmod):
39 | result = (1 / (math.sqrt(2 * math.pi) * sigmod)) * np.exp(-1 * (trainDataArr - mean) * (trainDataArr - mean) / (2 * sigmod**2))
40 | return result
41 |
42 |
43 | def E(trainDataArr, alpha0, mean0, sigmod0, alpha1, mean1, sigmod1):
44 | gamma0 = alpha0 * calculateGauss(trainDataArr, mean0, sigmod0)
45 | gamma1 = alpha1 * calculateGauss(trainDataArr, mean1, sigmod1)
46 |
47 | sum = gamma0 + gamma1
48 | gamma0 = gamma0 / sum
49 | gamma1 = gamma1 / sum
50 | return gamma0, gamma1
51 |
52 | def M(mean0, mean1, gamma0, gamma1, trainDataArr):
53 | mean0_new = np.dot(gamma0, trainDataArr) / np.sum(gamma0)
54 | mean1_new = np.dot(gamma1, trainDataArr) / np.sum(gamma1)
55 |
56 | sigmod0_new = math.sqrt(np.dot(gamma0, (trainDataArr - mean0)**2) / np.sum(gamma0))
57 | sigmod1_new = math.sqrt(np.dot(gamma1, (trainDataArr - mean1)**2) / np.sum(gamma1))
58 |
59 | alpha0_new = np.sum(gamma0) / len(gamma0)
60 | alpha1_new = np.sum(gamma1) / len(gamma1)
61 |
62 | return mean0_new, mean1_new, sigmod0_new, sigmod1_new, alpha0_new, alpha1_new
63 |
64 |
65 | def EM(trainDataList, iter = 500):
66 | trainDataArr = np.array(trainDataList)
67 |
68 | alpha0 = 0.5
69 | mean0 = 0
70 | sigmod0 = 1
71 | alpha1 = 0.5
72 | mean1 = 1
73 | sigmod1 = 1
74 |
75 | count = 0
76 | while (count < iter):
77 | count = count+1
78 | # E步
79 | gamma0, gamma1 = E(trainDataArr, alpha0, mean0, sigmod0, alpha1, mean1, sigmod1)
80 | # M步
81 | mean0, mean1, sigmod0, sigmod1, alpha0, alpha1 = M(mean0, mean1, gamma0, gamma1, trainDataArr)
82 | return alpha0, mean0, sigmod0, alpha1, mean1, sigmod1
83 |
84 | if __name__ == '__main__':
85 | alpha0 = 0.1
86 | mean0 = -4.0
87 | sigmod0 = 0.6
88 |
89 | alpha1 = 0.9
90 | mean1 = 2.2
91 | sigmod1 = 0.1
92 |
93 | #初始化数据集
94 | trainDataList = processData(mean0, sigmod0, mean1, sigmod1, alpha0, alpha1)
95 |
96 | #开始EM算法,进行参数估计
97 | alpha0, mean0, sigmod0, alpha1, mean1, sigmod1 = EM(trainDataList)
98 |
99 | print('用EM计算之后的数据为:')
100 | print('alpha0:%.1f, mean0:%.1f, sigmod0:%.1f, alpha1:%.1f, mean1:%.1f, sigmod1:%.1f' % (
101 | alpha0, mean0, sigmod0, alpha1, mean1, sigmod1
102 | ))
103 |
104 |
105 |
--------------------------------------------------------------------------------
/codes/MyHMM.py:
--------------------------------------------------------------------------------
1 | #coding=utf-8
2 | #Author:lrrlrr
3 | #Email:kingsundad@gmail.com
4 |
5 | import numpy as np
6 |
7 | # 依据训练文本统计PI、A、B
8 | def trainHMM(fileName):
9 | # B:词语的开头
10 | # M:一个词语的中间词
11 | # E:一个词语的结尾
12 | # S:单字成词
13 | statuDict = {'B':0, 'M':1, 'E':2, 'S':3}
14 |
15 | # 每个字只有四种状态,所以下方的各类初始化中大小的参数均为4
16 | PI = np.zeros(4)
17 | # 初始化状态转移矩阵A,涉及到四种状态各自到四种状态的转移,因为大小为4x4
18 | A = np.zeros((4, 4))
19 | # 初始化观测概率矩阵,分别为四种状态到每个字的发射概率
20 | B = np.zeros((4, 65536))
21 | fr = open(fileName, encoding='utf-8')
22 |
23 | for line in fr.readlines():
24 | curLine = line.strip().split()
25 | wordLabel = []
26 | #对每一个单词进行遍历
27 | for i in range(len(curLine)):
28 | #如果长度为1,则直接将该字标记为S,即单个词
29 | if len(curLine[i]) == 1:
30 | label = 'S'
31 | else:
32 | label = 'B' + 'M' * (len(curLine[i]) - 2) + 'E'
33 | #如果是单行开头第一个字,PI中对应位置加1,
34 | if i == 0: PI[statuDict[label[0]]] += 1
35 | for j in range(len(label)):
36 | B[statuDict[label[j]]][ord(curLine[i][j])] += 1
37 | wordLabel.extend(label)
38 | for i in range(1, len(wordLabel)):
39 | A[statuDict[wordLabel[i - 1]]][statuDict[wordLabel[i]]] += 1
40 |
41 | sum = np.sum(PI)
42 |
43 | for i in range(len(PI)):
44 | if PI[i] == 0: PI[i] = -3.14e+100
45 | else: PI[i] = np.log(PI[i] / sum)
46 |
47 | for i in range(len(A)):
48 | sum = np.sum(A[i])
49 | for j in range(len(A[i])):
50 | if A[i][j] == 0: A[i][j] = -3.14e+100
51 | else: A[i][j] = np.log(A[i][j] / sum)
52 |
53 | for i in range(len(B)):
54 | sum = np.sum(B[i])
55 | for j in range(len(B[i])):
56 | if B[i][j] == 0: B[i][j] = -3.14e+100
57 | else:B[i][j] = np.log(B[i][j] / sum)
58 |
59 | return PI, A, B
60 |
61 | def processTrainData(fileName):
62 | textData = []
63 | fr = open(fileName, encoding='utf-8')
64 | for line in fr.readlines():
65 | #读到的每行最后都有一个\n,使用strip将最后的回车符去掉
66 | line = line.strip()
67 | textData.append(line)
68 |
69 | return textData
70 |
71 | def participleTestData(textData, PI, A, B):
72 | retArtical = []
73 | for line in textData:
74 | delta = [[0 for i in range(4)] for i in range(len(line))]
75 | for i in range(4):
76 | delta[0][i] = PI[i] + B[i][ord(line[0])]
77 | psi = [[0 for i in range(4)] for i in range(len(line))]
78 |
79 | for t in range(1, len(line)):
80 | for i in range(4):
81 | tmpDelta = [0] * 4
82 | for j in range(4):
83 | tmpDelta[j] = delta[t - 1][j] + A[j][i]
84 | maxDelta = max(tmpDelta)
85 | maxDeltaIndex = tmpDelta.index(maxDelta)
86 | delta[t][i] = maxDelta + B[i][ord(line[t])]
87 | psi[t][i] = maxDeltaIndex
88 |
89 | sequence = []
90 | i_opt = delta[len(line) - 1].index(max(delta[len(line) - 1]))
91 | sequence.append(i_opt)
92 |
93 | for t in range(len(line) - 1, 0, -1):
94 | i_opt = psi[t][i_opt]
95 | sequence.append(i_opt)
96 |
97 | sequence.reverse()
98 | curLine = ''
99 | for i in range(len(line)):
100 | curLine += line[i]
101 | if (sequence[i] == 3 or sequence[i] == 2) and i != (len(line) - 1):
102 | curLine += '|'
103 | retArtical.append(curLine)
104 | return retArtical
105 |
106 | if __name__ == '__main__':
107 |
108 | # 依据人民日报数据集计算HMM参数:PI、A、B
109 | PI, A, B = trainHMM('MyHMMTrainData.txt')
110 |
111 | # 读取测试文章
112 | textData = processTrainData('MyHMMTestData.txt')
113 |
114 | # 打印原文
115 | for line in textData:
116 | print(line)
117 |
118 | # 分词
119 | partiArtical = participleTestData(textData, PI, A, B)
120 |
121 | # 打印结果
122 | print('分词结果:')
123 | for line in partiArtical:
124 | print(line)
125 |
--------------------------------------------------------------------------------
/codes/MyHMMTestData.txt:
--------------------------------------------------------------------------------
1 | 我本科就读于北京交通大学软件学院,专业是软件工程,本科做的是开发工作,工程性质较为浓厚。2019年,我保研至北京航空航天大学,我的个性不适合做纯理论研究,因此希望研究生毕业以后从事算法工程师的岗位,研究生期间我需要认真学习算法相关知识,但同时也不能落下工程实现能力,仍然需要较强的开发能力与项目落地能力,尤其是基础算法与数据结构,需要日常进行刷题比如:leetcode。
--------------------------------------------------------------------------------
/codes/MyKNN.py:
--------------------------------------------------------------------------------
1 | #coding=utf-8
2 | #Author:lrrlrr
3 | #Email:kingsundad@gmail.com
4 |
5 | import numpy as np
6 | import pandas as pd
7 | from collections import Counter
8 | from concurrent import futures
9 | import heapq
10 |
11 | class KNN:
12 | def __init__(self,X_train,y_train,k=3):
13 | # 所需参数初始化
14 | self.k=k
15 | self.X_train=X_train
16 | self.y_train=y_train
17 |
18 | def predict_single(self,X_test):
19 | # 计算与前k个样本点欧氏距离,距离取负值是把原问题转化为取前k个最大的距离
20 | dist_list=[(-np.linalg.norm(X_test-self.X_train[i],ord=2),self.y_train[i],i)
21 | for i in range(self.k)]
22 |
23 | # 利用前k个距离构建堆
24 | heapq.heapify(dist_list)
25 |
26 | # 遍历计算与剩下样本点的欧式距离
27 | for i in range(self.k,self.X_train.shape[0]):
28 | dist_i=(-np.linalg.norm(X_test-self.X_train[i],ord=2),self.y_train[i],i)
29 | #进行下堆操作
30 | if dist_i[0]>dist_list[0][0]:
31 | heapq.heappushpop(dist_list,dist_i)
32 | # 若dist_i 比 dist_list的最小值小,堆保持不变,继续遍历
33 | else:
34 | continue
35 | y_list=[dist_list[i][1] for i in range(self.k)]
36 | #[-1,1,1,-1...]
37 | # 对上述k个点的分类进行统计
38 | y_count=Counter(y_list).most_common()
39 | #{1:n,-1:m}
40 | return y_count[0][0]
41 |
42 | # 用多进程(ProcessPoolExecutor)提高效率
43 | def predict_many(self,X_test):
44 | # 创建进程池
45 | with futures.ProcessPoolExecutor(max_workers=10) as executor:
46 | # 建立并行预测任务
47 | tasks=[executor.submit(self.predict_single,X_test[i]) for i in range(X_test.shape[0])]
48 | # 等待所有任务完成
49 | done_iter=futures.as_completed(tasks)
50 | # 提取结果
51 | res=[future.result() for future in done_iter]
52 | return res
53 |
54 | def cal_right_rate(self,res,y_test):
55 | right_count = 0
56 | wrong_count = 0
57 | for i in range(len(res)):
58 | if res[i] == y_test[i]:
59 | right_count += 1
60 | else:
61 | wrong_count += 1
62 | return right_count / (right_count+wrong_count)
63 |
64 | def processData(filePath):
65 | print('开始读取数据')
66 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个
67 | X = []
68 | y = []
69 | #默认读取csv的头部
70 | df = pd.read_csv(filePath)
71 | #利用数据集合的第一个维度特征分类
72 | #遍历pandas中df的每一行
73 | for index, row in df.iterrows():
74 | if(row["Sepal.Length"]>=5.5) :
75 | y.append(1)
76 | else:
77 | y.append(-1)
78 | X.append([float(row["Sepal.Width"]),float(row["Petal.Length"])])
79 | return np.array(X), np.array(y)
80 |
81 |
82 | def main():
83 | #获取数据
84 | X, y = processData('iris.csv')
85 | X_train = X[0:149:4]
86 | y_train = y[0:149:4]
87 |
88 | X_test = X[0:149:10]
89 | y_test = y[0:149:10]
90 |
91 | # 不同的k对分类结果的影响
92 | for k in range(1,6,2):
93 | #构建KNN实例
94 | clf=KNN(X_train,y_train,k=k)
95 | #对测试数据进行分类预测
96 | y_predict=clf.predict_many(X_test)
97 | print("k={},被分类为:{}".format(k,y_predict))
98 | print("正确率为: ", clf.cal_right_rate(y_predict,y_test))
99 |
100 | if __name__=="__main__":
101 | main()
--------------------------------------------------------------------------------
/codes/MyLogisticRegression.py:
--------------------------------------------------------------------------------
1 | #coding=utf-8
2 | #Author:lrrlrr
3 | #Email:kingsundad@gmail.com
4 |
5 | import numpy as np
6 | import time
7 | import pandas as pd
8 |
9 | #使用随机梯度下降
10 | class LogisticRegression:
11 | def __init__(self,learn_rate=0.1,max_iter=10000,tol=1e-3):
12 | # 学习速率
13 | self.learn_rate=learn_rate
14 | # 迭代次数
15 | self.max_iter=max_iter
16 | # 迭代停止阈值
17 | self.tol=tol
18 | # 权重
19 | self.w=None
20 |
21 | def preprocessing(self,X):
22 | row=X.shape[0]
23 | #在末尾加上一列,数值为1
24 | y=np.ones(row).reshape(row, 1)
25 | X_prepro =np.hstack((X,y))
26 | return X_prepro
27 |
28 | def sigmod(self,x):
29 | return 1/(1+np.exp(-x))
30 |
31 | def train(self,X_train,y_train):
32 | X=self.preprocessing(X_train)
33 | y=y_train.T
34 | #初始化权重w
35 | self.w=np.array([[0]*X.shape[1]],dtype=float)
36 | i=0
37 | k=0
38 | for loop in range(self.max_iter):
39 | # 计算梯度
40 | z=np.dot(X[i],self.w.T)
41 | grad=X[i]*(y[i]-self.sigmod(z))
42 | # 利用梯度的绝对值作为迭代中止的条件
43 | if (np.abs(grad)<=self.tol).all():
44 | break
45 | else:
46 | # 更新权重w 梯度上升——求极大值
47 | self.w+=self.learn_rate*grad
48 | k+=1
49 | i=(i+1)%X.shape[0]
50 | print("迭代次数:{}次".format(k))
51 | print("最终梯度:{}".format(grad))
52 | print("最终权重:{}".format(self.w[0]))
53 |
54 | def predict(self,x):
55 | p=self.sigmod(np.dot(self.preprocessing(x),self.w.T))
56 | print("Y=1的概率被估计为:{:.2%}".format(p[0][0]))
57 | p[np.where(p>0.5)]=1
58 | p[np.where(p<0.5)]=0
59 | return p
60 |
61 | def cal_right_rate(self,X,y):
62 | y_c=self.predict(X)
63 | right_count = 0
64 | wrong_count = 0
65 | for i in range(len(y)):
66 | if y_c[i] == y[i]:
67 | right_count += 1
68 | else:
69 | wrong_count += 1
70 | return right_count / (right_count + wrong_count)
73 |
74 | #根据文件路径读取Iris数据集数据
75 | #return type: np.array
76 | def processData(filePath):
77 | print('开始读取数据')
78 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个
79 | X = []
80 | y = []
81 | #默认读取csv的头部
82 | df = pd.read_csv(filePath)
83 | #利用数据集合的第一个维度特征分类
84 | #遍历pandas中df的每一行
85 | for index, row in df.iterrows():
86 | if row["Species"] == "setosa" :
87 | y.append(1)
88 | else:
89 | y.append(0)
90 | X.append([float(row["Sepal.Length"]),float(row["Sepal.Width"]),float(row["Petal.Length"])])
91 | return np.array(X), np.array(y)
92 |
93 | def main():
94 | star=time.time()
95 | # 训练数据集
96 | X, y = processData('iris.csv')
97 | X_train = X[0:149:30]
98 | y_train = y[0:149:30]
99 |
100 | #自己在数据集后面加上了干扰的实例
101 | X_test = X[0:151:1]
102 | y_test = y[0:151:1]
103 |
104 | # 构建实例,进行训练
105 | clf=LogisticRegression()
106 | clf.train(X_train,y_train)
107 |
108 | # 预测新数据
109 | y_predict=clf.predict(X_test)
110 | print("{}被分类为:{}".format(X_test[0],y_predict[0]))
111 |
112 | # 利用已有数据对训练模型进行评价
113 | correct_rate=clf.cal_right_rate(X_test,y_test)
114 | print("测试一共有{}组实例,正确率:{:.5%}".format(X_test.shape[0],correct_rate))
115 | end=time.time()
116 | print("用时:{:.5f}s".format(end-star))
117 |
118 | if __name__=="__main__":
119 | main()
--------------------------------------------------------------------------------
/codes/MyMaxEnt.py:
--------------------------------------------------------------------------------
1 | import time
2 | import numpy as np
3 | import pandas as pd
4 | from collections import defaultdict
5 |
6 |
7 | #根据文件路径读取Iris数据集数据
8 | #return type: list
9 | def processData(filePath):
10 | print('开始读取数据')
11 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个
12 | X = []
13 | y = []
14 | #默认读取csv的头部
15 | df = pd.read_csv(filePath)
16 | #利用数据集合的第一个维度特征分类
17 | #遍历pandas中df的每一行
18 | for index, row in df.iterrows():
19 | if(row["Sepal.Length"]>=5.5) :
20 | y.append(1)
21 | else:
22 | y.append(0)
23 | X.append([float(row["Sepal.Width"]),float(row["Petal.Length"])])
24 | return X, y
25 |
26 | #最大熵类
27 | class maxEnt:
28 | def __init__(self, trainDataList, trainLabelList, testDataList, testLabelList):
29 |
30 | # 训练数据集
31 | self.trainDataList = trainDataList
32 | # 训练标签集
33 | self.trainLabelList = trainLabelList
34 | # 测试数据集
35 | self.testDataList = testDataList
36 | # 测试标签集
37 | self.testLabelList = testLabelList
38 | # 特征数量
39 | self.featureNum = len(trainDataList[0])
40 | # 总训练集长度
41 | self.N = len(trainDataList)
42 | # 训练集中(xi,y)对数量
43 | self.n = 0
44 | # IIS算法中式6.34使用的常数M,这里取固定值
45 | self.M = 10000
46 | # 所有(x, y)对出现的次数
47 | self.fixy = self.calc_fixy()
48 | # Pw(y|x)中的w
49 | self.w = [0] * self.n
50 | # (x, y)->id和id->(x, y)的搜索字典
51 | self.xy2idDict, self.id2xyDict = self.createSearchDict()
52 | # Ep_xy期望值
53 | self.Ep_xy = self.calcEp_xy()
54 |
55 |
56 | # 计算特征函数f(x,y)关于模型P(y|x)与经验分布P~(x)的期望值Epxy
57 | def calcEpxy(self):
58 | # 初始化期望存放列表,对于每一个xy对都有一个期望
59 | Epxy = [0] * self.n
60 | # 对于每一个样本进行遍历
61 | for i in range(self.N):
62 | # 初始化公式中的P(y|x)列表
63 | Pwxy = [0] * 2
64 | # 计算P(y = 0 | X)
65 | # 注:程序中X表示是一个样本的全部特征,x表示单个特征,这里是全部特征的一个样本
66 | Pwxy[0] = self.calcPwy_x(self.trainDataList[i], 0)
67 | # 计算P(y = 1 | X)
68 | Pwxy[1] = self.calcPwy_x(self.trainDataList[i], 1)
69 |
70 | for feature in range(self.featureNum):
71 | for y in range(2):
72 | if (self.trainDataList[i][feature], y) in self.fixy[feature]:
73 | id = self.xy2idDict[feature][(self.trainDataList[i][feature], y)]
74 | Epxy[id] += (1 / self.N) * Pwxy[y]
75 | return Epxy
76 |
77 | # 计算特征函数f(x,y)关于经验分布P~(x,y)的期望值Ep_xy
78 | # :return: 计算得到的Ep_xy
79 | def calcEp_xy(self):
80 |
81 | # 初始化Ep_xy列表,长度为n
82 | Ep_xy = [0] * self.n
83 |
84 | # 遍历每一个特征
85 | for feature in range(self.featureNum):
86 | # 遍历每个特征中的(x, y)对
87 | for (x, y) in self.fixy[feature]:
88 | # 获得其id
89 | id = self.xy2idDict[feature][(x, y)]
90 | # 将计算得到的Ep_xy写入对应的位置中
91 | # fixy中存放所有(x, y)对在训练集中出现过的次数,除以训练集总长度N就是经验概率
92 | Ep_xy[id] = self.fixy[feature][(x, y)] / self.N
93 |
94 | # 返回期望
95 | return Ep_xy
96 |
97 |
98 | # 创建查询字典
99 | # xy2idDict:通过(x, y)对找到其id, 所有出现过的xy对都有一个id
100 | # id2xyDict:通过id找到对应的(x, y)对
101 | def createSearchDict(self):
102 | # 建立xy到id的搜索字典
103 | # 不同特征的xy存入不同特征内的字典
104 | xy2idDict = [{} for i in range(self.featureNum)]
105 | # 初始化id到xy对的字典。因为id与(x,y)的指向是唯一的,所以可以使用一个字典
106 | id2xyDict = {}
107 |
108 | # 设置索引,其实就是最终的id
109 | index = 0
110 | # 对特征进行遍历
111 | for feature in range(self.featureNum):
112 | # 对出现过的每一个(x, y)对进行遍历
113 | # fixy:内部存放特征数目个字典,对于遍历的每一个特征,单独读取对应字典内的(x, y)对
114 | for (x, y) in self.fixy[feature]:
115 | # 将该(x, y)对存入字典中,要注意存入时通过[feature]指定了存入哪个特征内部的字典
116 | # 同时将index作为该对的id号
117 | xy2idDict[feature][(x, y)] = index
118 | # 同时在id->xy字典中写入id号,val为(x, y)对
119 | id2xyDict[index] = (x, y)
120 | # id加一
121 | index += 1
122 |
123 | # 返回创建的两个字典
124 | return xy2idDict, id2xyDict
125 |
126 | # 计算(x, y)在训练集中出现过的次数
127 | def calc_fixy(self):
128 | # 建立特征数目个字典,属于不同特征的(x, y)对存入不同的字典中,保证不被混淆
129 | fixyDict = [defaultdict(int) for i in range(self.featureNum)]
130 | # 遍历训练集中所有样本
131 | for i in range(len(self.trainDataList)):
132 | # 遍历样本中所有特征
133 | for j in range(self.featureNum):
134 | # 将出现过的(x, y)对放入字典中并计数值加1
135 | fixyDict[j][(self.trainDataList[i][j], self.trainLabelList[i])] += 1
136 | # 对整个大字典进行计数,判断去重后还有多少(x, y)对,写入n
137 | for i in fixyDict:
138 | self.n += len(i)
139 | # 返回大字典
140 | return fixyDict
141 |
142 | # 计算得到的Pw(Y | X)
143 | def calcPwy_x(self, X, y):
144 | # 分子
145 | numerator = 0
146 | # 分母
147 | Z = 0
148 | # 对每个特征进行遍历
149 | for i in range(self.featureNum):
150 | # 如果该(xi,y)对在训练集中出现过
151 | if (X[i], y) in self.xy2idDict[i]:
152 | # 在xy->id字典中指定当前特征i,以及(x, y)对:(X[i], y),读取其id
153 | index = self.xy2idDict[i][(X[i], y)]
154 | # 分子是wi和fi(x,y)的连乘再求和,最后指数
155 | # 由于当(x, y)存在时fi(x,y)为1,因为xy对肯定存在,所以直接就是1
156 | # 对于分子来说,就是n个wi累加,最后再指数就可以了
157 | # 因为有n个w,所以通过id将w与xy绑定,前文的两个搜索字典中的id就是用在这里
158 | numerator += self.w[index]
159 | # 同时计算其他一种标签y时候的分子,下面的z并不是全部的分母,再加上上式的分子以后
160 | # 才是完整的分母,即z = z + numerator
161 | if (X[i], 1 - y) in self.xy2idDict[i]:
162 | # 原理与上式相同
163 | index = self.xy2idDict[i][(X[i], 1 - y)]
164 | Z += self.w[index]
165 | # 计算分子的指数
166 | numerator = np.exp(numerator)
167 | # 计算分母的z
168 | Z = np.exp(Z) + numerator
169 | # 返回Pw(y|x)
170 | return numerator / Z
171 |
172 | def maxEntropyTrain(self, iter=500):
173 | # 设置迭代次数寻找最优解
174 | for i in range(iter):
175 | # 单次迭代起始时间点
176 | iterStart = time.time()
177 |
178 | # 计算“6.2.3 最大熵模型的学习”中的第二个期望(83页最上方哪个)
179 | Epxy = self.calcEpxy()
180 |
181 | # 使用的是IIS,所以设置sigma列表
182 | sigmaList = [0] * self.n
183 | # 对于所有的n进行一次遍历
184 | for j in range(self.n):
185 | # 依据“6.3.1 改进的迭代尺度法” 式6.34计算
186 | sigmaList[j] = (1 / self.M) * np.log(self.Ep_xy[j] / Epxy[j])
187 |
188 | # 按照算法6.1步骤二中的(b)更新w
189 | self.w = [self.w[i] + sigmaList[i] for i in range(self.n)]
190 |
191 | # 单次迭代结束
192 | iterEnd = time.time()
193 |
194 | # 预测标签
195 | def predict(self, X):
196 | # 因为y只有0和1,所有建立两个长度的概率列表
197 | result = [0] * 2
198 | # 循环计算两个概率
199 | for i in range(2):
200 | # 计算样本x的标签为i的概率
201 | result[i] = self.calcPwy_x(X, i)
202 | # 返回标签
203 | # max(result):找到result中最大的那个概率值
204 | # result.index(max(result)):通过最大的那个概率值再找到其索引,索引是0就返回0,1就返回1
205 | return result.index(max(result))
206 |
207 | def test(self):
208 | # 错误值计数
209 | errorCnt = 0
210 | # 对测试集中所有样本进行遍历
211 | for i in range(len(self.testDataList)):
212 | # 预测该样本对应的标签
213 | result = self.predict(self.testDataList[i])
214 | # 如果错误,计数值加1
215 | if result != self.testLabelList[i]: errorCnt += 1
216 | # 返回准确率
217 | return 1 - errorCnt / len(self.testDataList)
218 |
219 |
220 | if __name__ == '__main__':
221 | start = time.time()
222 | X, y = processData('iris.csv')
223 |
224 | X_train = X[0:149:30]
225 | y_train = y[0:149:30]
226 |
227 | # 自己在数据集后面加上了干扰的实例
228 | X_test = X[0:151:1]
229 | y_test = y[0:151:1]
230 |
231 | # 初始化最大熵类
232 | maxEnt = maxEnt(X_train, y_train, X_test, y_test)
233 |
234 | # 开始训练
235 | maxEnt.maxEntropyTrain()
236 |
237 | # 开始测试
238 | right_rate = maxEnt.test()
239 | print('准确度为:', right_rate)
240 |
241 | # 打印时间
242 | print('花费的时间为:', time.time() - start)
243 |
--------------------------------------------------------------------------------
/codes/MyNaiveBayes.py:
--------------------------------------------------------------------------------
1 | #coding=utf-8
2 | #Author:lrrlrr
3 | #Email:kingsundad@gmail.com
4 |
5 | import numpy as np
6 | import pandas as pd
7 |
8 | class NaiveBayes():
9 | def __init__(self,lambda_):
10 | # 贝叶斯系数 取0时,即为极大似然估计
11 | self.lambda_=lambda_
12 | # y的(类型:数量)
13 | self.y_types_count=None
14 | # y的(类型:概率)
15 | self.y_types_proba=None
16 | # (xi 的编号,xi的取值,y的类型):概率
17 | self.x_types_proba=dict()
18 |
19 | def fit(self,X_train,y_train):
20 | # y的所有取值类型
21 | self.y_types=np.unique(y_train)
22 | # 转化成pandas df 数据格式
23 | X=pd.DataFrame(X_train)
24 | y=pd.DataFrame(y_train)
25 | # y的(类型:数量)统计
26 | self.y_types_count=y[0].value_counts()
27 | # y的(类型:概率)计算
28 | self.y_types_proba=(self.y_types_count+self.lambda_)/(y.shape[0]+len(self.y_types)*self.lambda_)
29 |
30 | # (xi 的编号,xi的取值,y的类型):概率的计算 - 遍历xi
31 | for idx in X.columns:
32 | # 选取每一个y的类型
33 | for j in self.y_types:
34 | # 选择所有y==j为真的数据点的第idx个特征的值,并对这些值进行(类型:数量)统计
35 | p_x_y=X[(y==j).values][idx].value_counts()
36 | # 计算(xi 的编号,xi的取值,y的类型):概率
37 | for i in p_x_y.index:
38 | self.x_types_proba[(idx,i,j)]=(p_x_y[i]+self.lambda_)/(self.y_types_count[j]+p_x_y.shape[0]*self.lambda_)
39 |
40 | def predict(self,X_new):
41 | res=[]
42 | # 遍历y的可能取值
43 | for y in self.y_types:
44 | # 计算y的先验概率P(Y=ck)
45 | p_y=self.y_types_proba[y]
46 | p_xy=1
47 | for idx,x in enumerate(X_new):
48 | # 计算P(X=(x1,x2...xd)/Y=ck)
49 | p_xy*=self.x_types_proba[(idx,x,y)]
50 | res.append(p_y*p_xy)
51 | for i in range(len(self.y_types)):
52 | print("[{}]对应概率:{:.2%}".format(self.y_types[i],res[i]))
53 | #返回最大后验概率对应的y值
54 | return self.y_types[np.argmax(res)]
55 |
56 | def processData(filePath):
57 | print('开始读取数据')
58 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个
59 | X = []
60 | y = []
61 | #默认读取csv的头部
62 | df = pd.read_csv(filePath)
63 | #利用数据集合的第一个维度特征分类
64 | #遍历pandas中df的每一行
65 | for index, row in df.iterrows():
66 | if(row["Sepal.Length"]>=5.5) :
67 | y.append(1)
68 | else:
69 | y.append(-1)
70 | X.append([float(row["Sepal.Width"]),str(row["Species"])])
71 | return np.array(X), np.array(y)
72 |
73 | def main():
74 | X, y = processData('iris.csv')
75 | X_train = X[0:149:4]
76 | y_train = y[0:149:4]
77 |
78 |
79 | clf=NaiveBayes(lambda_= 0.5)
80 | clf.fit(X_train,y_train)
81 |
82 | X_test=np.array([3.5,"setosa"])
83 | y_predict=clf.predict(X_test)
84 | print("{}被分类为:{}".format(X_test,y_predict))
85 |
86 | if __name__=="__main__":
87 | main()
--------------------------------------------------------------------------------
/codes/MyPerceptron.py:
--------------------------------------------------------------------------------
1 | #coding=utf-8
2 | #Author:lrrlrr
3 | #Email:kingsundad@gmail.com
4 |
5 | import numpy as np
6 | import pandas as pd
7 |
8 | #根据文件路径读取Iris数据集数据
9 | #return type: list
10 | def processData(filePath):
11 | print('开始读取数据')
12 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个
13 | X = []
14 | y = []
15 | #默认读取csv的头部
16 | df = pd.read_csv(filePath)
17 | #利用数据集合的第一个维度特征分类
18 | #遍历pandas中df的每一行
19 | for index, row in df.iterrows():
20 | if(row["Sepal.Length"]>=5.5) :
21 | y.append(1)
22 | else:
23 | y.append(-1)
24 | X.append([float(row["Sepal.Width"]),float(row["Petal.Length"])])
25 | return X, y
26 |
27 |
28 | #感知机类
29 | class MyPerceptron:
30 | def __init__(self):
31 | # 参数w
32 | self.w = None
33 | # 偏置b
34 | self.b = 0
35 | # 表示学习速率
36 | self.l_rate = 0.0001
37 | #表示迭代次数
38 | self.iter = 100
39 |
40 | #训练
41 | def train(self, X_train, y_train):
42 | print('开始训练')
43 | # 将数据转换成矩阵形式
44 | # 转换后的数据中每一个样本的向量都是横向的
45 | X_trainMat = np.mat(X_train)
46 | y_trainMat = np.mat(y_train).T
47 | # 获取数据矩阵的大小,为m*n
48 | m, n = np.shape(X_trainMat)
49 | #np.shape(X_trainMat)[1]为每个样本的特征维数
50 | self.w = np.zeros((1, np.shape(X_trainMat)[1]))
51 |
52 | # 进行iter次迭代计算
53 | for k in range(self.iter):
54 | ##利用随机梯度下降
55 | for i in range(m):
56 | # 获取当前样本的向量
57 | xi = X_trainMat[i]
58 | # 获取当前样本所对应的标签
59 | yi = y_trainMat[i]
60 | # 判断是否是误分类样本
61 | # 误分类样本特征为: -yi(w*xi+b)>=0,详细可参考书中2.2.2小节
62 | # 在书的公式中写的是>0,实际上如果=0,说明该点在超平面上,也是不正确的
63 | if -1 * yi * (self.w * xi.T + self.b) >= 0:
64 | # 对于误分类样本,进行梯度下降,更新w和b
65 | self.w = self.w + self.l_rate * yi * xi
66 | self.b = self.b + self.l_rate * yi
67 |
68 | #测试
69 | def predict(self,X_test, y_test):
70 | print('开始预测')
71 | X_testMat = np.mat(X_test)
72 | y_testMat = np.mat(y_test).T
73 |
74 | #获取测试数据集矩阵的大小
75 | m, n = np.shape(X_testMat)
76 | #正确分类样本数计数
77 | rightCount = 0
78 |
79 | for i in range(m):
80 | #获得单个样本向量
81 | xi = X_testMat[i]
82 | #获得该样本标记
83 | yi = y_testMat[i]
84 | #获得运算结果
85 | result = yi * (self.w * xi.T + self.b)
86 | #如果yi(w*xi+b)>=0,认为该样本被正确分类,正确样本数加一
87 | if result >= 0: rightCount += 1
88 | #正确率 = 正确分类样本数 / 样本总数
89 | rightRate = rightCount / m
90 | #返回正确率
91 | return rightRate
92 |
93 |
94 | def main():
95 | X,y = processData('iris.csv')
96 |
97 | # 构建感知机对象,对数据集训练并且预测
98 | perceptron=MyPerceptron()
99 | perceptron.train(X[0:100],y[0:100])
100 | rightRate = perceptron.predict(X[101:140],y[101:140])
101 | print('对测试集的分类的正确率为:',rightRate)
102 | #有二维输入,所以应该有2个w
103 | print('模型的参数w为:',perceptron.w)
104 | print('模型的参数b为',perceptron.b)
105 |
106 |
107 | if __name__ == '__main__':
108 | main()
--------------------------------------------------------------------------------
/codes/MySVM.py:
--------------------------------------------------------------------------------
1 | #coding=utf-8
2 | #Author:lrrlrr
3 | #Email:kingsundad@gmail.com
4 |
5 | import numpy as np
6 | import pandas as pd
7 | import math
8 | import random
9 |
10 | #根据文件路径读取Iris数据集数据
11 | # #return type: np.array
12 | def processData(filePath):
13 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个
14 | X = []
15 | y = []
16 | #默认读取csv的头部
17 | df = pd.read_csv(filePath)
18 | #利用数据集合的第一个维度特征分类
19 | #遍历pandas中df的每一行
20 | for index, row in df.iterrows():
21 | if row["Species"] == "setosa" :
22 | y.append(1)
23 | else:
24 | y.append(-1)
25 | X.append([float(row["Sepal.Length"]),float(row["Sepal.Width"]),float(row["Petal.Length"])])
26 | return np.array(X), np.array(y)
27 |
28 |
29 | # X_train:训练数据集
30 | # y_train: 训练测试集
31 | # sigma: 高斯核中分母的σ,在核函数中σ的值,高度依赖样本特征值范围,特征值范围较大时若不相应增大σ会导致所有计算得到的核函数均为0
32 | # C:软间隔中的惩罚参数,调和间隔与误分类点的系数
33 | # toler:松弛变量
34 | class SVM:
35 | def __init__(self, X_train, y_train, sigma = 10, C = 200, toler = 0.001):
36 |
37 | self.train_XMat = np.mat(X_train)
38 | # 训练标签集,为了方便后续运算提前做了转置,变为列向量
39 | self.train_yMat = np.mat(y_train).T
40 | # m:训练集数量 n:样本特征数目
41 | self.m, self.n = np.shape(self.train_XMat)
42 | self.sigma = sigma
43 | self.C = C
44 | self.toler = toler
45 |
46 | # 核函数(初始化时提前计算)
47 | self.k = self.calculateKernel()
48 | # SVM中的偏置b
49 | self.b = 0
50 | # α 长度为训练集数目
51 | self.alpha = [0] * self.train_XMat.shape[0]
52 | # SMO运算过程中的Ei
53 | self.E = [0 * self.train_yMat[i, 0] for i in range(self.train_yMat.shape[0])]
54 | self.supportVecIndex = []
55 |
56 |
57 | # 使用高斯核函数
58 | def calculateKernel(self):
59 | #初始化高斯核结果矩阵 大小 = 训练集长度m * 训练集长度m
60 | #k[i][j] = Xi * Xj
61 | k = [[0 for i in range(self.m)] for j in range(self.m)]
62 | for i in range(self.m):
63 | X = self.train_XMat[i, :]
64 | for j in range(i, self.m):
65 | Z = self.train_XMat[j, :]
66 | #先计算||X - Z||^2
67 | result = (X - Z) * (X - Z).T
68 | #分子除以分母后去指数,得到高斯核结果
69 | result = np.exp(-1 * result / (2 * self.sigma**2))
70 | #将Xi*Xj的结果存放入k[i][j]和k[j][i]中
71 | k[i][j] = result
72 | k[j][i] = result
73 | return k
74 |
75 | # 查看第i个α是否满足KKT条件
76 | def isSatisfyKKT(self, i):
77 | gxi =self.calculate_gxi(i)
78 | yi = self.train_yMat[i]
79 | if (math.fabs(self.alpha[i]) < self.toler) and (yi * gxi >= 1):
80 | return True
81 | elif (math.fabs(self.alpha[i] - self.C) < self.toler) and (yi * gxi <= 1):
82 | return True
83 | elif (self.alpha[i] > -self.toler) and (self.alpha[i] < (self.C + self.toler)) \
84 | and (math.fabs(yi * gxi - 1) < self.toler):
85 | return True
86 |
87 | return False
88 |
89 | def calculate_gxi(self, i):
90 | gxi = 0
91 | index = [i for i, alpha in enumerate(self.alpha) if alpha != 0]
92 | # 遍历每一个非零α,i为非零α的下标
93 | for j in index:
94 | #计算g(xi)
95 | gxi += self.alpha[j] * self.train_yMat[j] * self.k[j][i]
96 | # 求和结束后再单独加上偏置b
97 | gxi += self.b
98 |
99 | #返回
100 | return gxi
101 |
102 | def calculateEi(self, i):
103 | # 计算g(xi)
104 | gxi = self.calculate_gxi(i)
105 | # Ei = g(xi) - yi,直接将结果作为Ei返回
106 | return gxi - self.train_yMat[i]
107 |
108 |
109 | # E1: 第一个变量的E1
110 | # i: 第一个变量α的下标
111 | def getAlphaJ(self, E1, i):
112 | E2 = 0
113 | maxE1_E2 = -1
114 | maxIndex = -1
115 | nozeroE = [i for i, Ei in enumerate(self.E) if Ei != 0]
116 |
117 | for j in nozeroE:
118 | E2_tmp = self.calculateEi(j)
119 | if math.fabs(E1 - E2_tmp) > maxE1_E2:
120 | #更新
121 | maxE1_E2 = math.fabs(E1 - E2_tmp)
122 | E2 = E2_tmp
123 | maxIndex = j
124 | if maxIndex == -1:
125 | maxIndex = i
126 | while maxIndex == i:
127 | maxIndex = int(random.uniform(0, self.m))
128 | E2 = self.calculateEi(maxIndex)
129 | return E2, maxIndex
130 |
131 | def train(self, count = 100):
132 | countCur = 0; parameterChanged = 1
133 | while (countCur < count) and (parameterChanged > 0):
134 | countCur += 1
135 | parameterChanged = 0
136 |
137 | for i in range(self.m):
138 | #是否满足KKT条件,如果不满足则作为SMO中第一个变量从而进行优化
139 | if self.isSatisfyKKT(i) == False:
140 | #如果下标为i的α不满足KKT条件,则进行优化
141 | E1 = self.calculateEi(i)
142 | E2, j = self.getAlphaJ(E1, i)
143 |
144 | y1 = self.train_yMat[i]
145 | y2 = self.train_yMat[j]
146 |
147 | alphaOld_1 = self.alpha[i]
148 | alphaOld_2 = self.alpha[j]
149 |
150 | if y1 != y2:
151 | L = max(0, alphaOld_2 - alphaOld_1)
152 | H = min(self.C, self.C + alphaOld_2 - alphaOld_1)
153 | else:
154 | L = max(0, alphaOld_2 + alphaOld_1 - self.C)
155 | H = min(self.C, alphaOld_2 + alphaOld_1)
156 |
157 | if L == H:
158 | continue
159 |
160 | #计算α的新值
161 | k11 = self.k[i][i]
162 | k22 = self.k[j][j]
163 | k21 = self.k[j][i]
164 | k12 = self.k[i][j]
165 |
166 | alphaNew_2 = alphaOld_2 + y2 * (E1 - E2) / (k11 + k22 - 2 * k12)
167 |
168 | if alphaNew_2 < L: alphaNew_2 = L
169 | elif alphaNew_2 > H: alphaNew_2 = H
170 | #更新α1
171 | alphaNew_1 = alphaOld_1 + y1 * y2 * (alphaOld_2 - alphaNew_2)
172 |
173 | #计算b1和b2
174 | b1New = -1 * E1 - y1 * k11 * (alphaNew_1 - alphaOld_1) \
175 | - y2 * k21 * (alphaNew_2 - alphaOld_2) + self.b
176 | b2New = -1 * E2 - y1 * k12 * (alphaNew_1 - alphaOld_1) \
177 | - y2 * k22 * (alphaNew_2 - alphaOld_2) + self.b
178 |
179 | #依据α1和α2的值范围确定新b
180 | if (alphaNew_1 > 0) and (alphaNew_1 < self.C):
181 | bNew = b1New
182 | elif (alphaNew_2 > 0) and (alphaNew_2 < self.C):
183 | bNew = b2New
184 | else:
185 | bNew = (b1New + b2New) / 2
186 |
187 | #将更新后的各类值写入,进行更新
188 | self.alpha[i] = alphaNew_1
189 | self.alpha[j] = alphaNew_2
190 | self.b = bNew
191 |
192 | self.E[i] = self.calculateEi(i)
193 | self.E[j] = self.calculateEi(j)
194 |
195 | #如果α2的改变量过于小,就认为该参数未改变,不增加parameterChanged值
196 | #反之则自增1
197 | if math.fabs(alphaNew_2 - alphaOld_2) >= 0.00001:
198 | parameterChanged += 1
199 |
200 | #全部计算结束后,重新遍历一遍α,查找里面的支持向量
201 | for i in range(self.m):
202 | #如果α>0,说明是支持向量
203 | if self.alpha[i] > 0:
204 | #将支持向量的索引保存起来
205 | self.supportVecIndex.append(i)
206 |
207 | # 单独计算核函数
208 | def calculateSinglKernel(self, x1, x2):
209 | # 计算高斯核
210 | result = (x1 - x2) * (x1 - x2).T
211 | result = np.exp(-1 * result / (2 * self.sigma ** 2))
212 | return result
213 |
214 | # 对样本的标签进行预测
215 | def predict(self, x):
216 | result = 0
217 | for i in self.supportVecIndex:
218 | # 遍历所有支持向量,计算求和式
219 | tmp = self.calculateSinglKernel(self.train_XMat[i, :], np.mat(x))
220 | result += self.alpha[i] * self.train_yMat[i] * tmp
221 | # 偏置b
222 | result += self.b
223 |
224 | return np.sign(result)
225 |
226 |
227 |
228 | def test(self, X_test, y_test):
229 |
230 | rightCount = 0
231 |
232 | for i in range(len(X_test)):
233 | result = self.predict(X_test[i])
234 | if result == y_test[i]:
235 | rightCount += 1
236 | return rightCount / len(X_test)
237 |
238 |
239 | if __name__ == '__main__':
240 |
241 | X, y = processData('iris.csv')
242 |
243 | X_train = X[0:149:50]
244 | y_train = y[0:149:50]
245 |
246 | # 自己在数据集后面加上了干扰的实例
247 | X_test = X[0:150:1]
248 | y_test = y[0:150:1]
249 |
250 | # 初始化SVM类
251 | svm = SVM(X_train, y_train, 10, 200, 0.001)
252 |
253 | # 开始训练
254 | svm.train()
255 |
256 | # 开始测试
257 | rightRate = svm.test(X_test, y_test)
258 | print('准确率为百分之 %d' % (rightRate * 100))
--------------------------------------------------------------------------------
/codes/iris.csv:
--------------------------------------------------------------------------------
1 | "Number","Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
2 | "1",5.1,3.5,1.4,0.2,"setosa"
3 | "2",4.9,3,1.4,0.2,"setosa"
4 | "3",4.7,3.2,1.3,0.2,"setosa"
5 | "4",4.6,3.1,1.5,0.2,"setosa"
6 | "5",5,3.6,1.4,0.2,"setosa"
7 | "6",5.4,3.9,1.7,0.4,"setosa"
8 | "7",4.6,3.4,1.4,0.3,"setosa"
9 | "8",5,3.4,1.5,0.2,"setosa"
10 | "9",4.4,2.9,1.4,0.2,"setosa"
11 | "10",4.9,3.1,1.5,0.1,"setosa"
12 | "11",5.4,3.7,1.5,0.2,"setosa"
13 | "12",4.8,3.4,1.6,0.2,"setosa"
14 | "13",4.8,3,1.4,0.1,"setosa"
15 | "14",4.3,3,1.1,0.1,"setosa"
16 | "15",5.8,4,1.2,0.2,"setosa"
17 | "16",5.7,4.4,1.5,0.4,"setosa"
18 | "17",5.4,3.9,1.3,0.4,"setosa"
19 | "18",5.1,3.5,1.4,0.3,"setosa"
20 | "19",5.7,3.8,1.7,0.3,"setosa"
21 | "20",5.1,3.8,1.5,0.3,"setosa"
22 | "21",5.4,3.4,1.7,0.2,"setosa"
23 | "22",5.1,3.7,1.5,0.4,"setosa"
24 | "23",4.6,3.6,1,0.2,"setosa"
25 | "24",5.1,3.3,1.7,0.5,"setosa"
26 | "25",4.8,3.4,1.9,0.2,"setosa"
27 | "26",5,3,1.6,0.2,"setosa"
28 | "27",5,3.4,1.6,0.4,"setosa"
29 | "28",5.2,3.5,1.5,0.2,"setosa"
30 | "29",5.2,3.4,1.4,0.2,"setosa"
31 | "30",4.7,3.2,1.6,0.2,"setosa"
32 | "31",4.8,3.1,1.6,0.2,"setosa"
33 | "32",5.4,3.4,1.5,0.4,"setosa"
34 | "33",5.2,4.1,1.5,0.1,"setosa"
35 | "34",5.5,4.2,1.4,0.2,"setosa"
36 | "35",4.9,3.1,1.5,0.2,"setosa"
37 | "36",5,3.2,1.2,0.2,"setosa"
38 | "37",5.5,3.5,1.3,0.2,"setosa"
39 | "38",4.9,3.6,1.4,0.1,"setosa"
40 | "39",4.4,3,1.3,0.2,"setosa"
41 | "40",5.1,3.4,1.5,0.2,"setosa"
42 | "41",5,3.5,1.3,0.3,"setosa"
43 | "42",4.5,2.3,1.3,0.3,"setosa"
44 | "43",4.4,3.2,1.3,0.2,"setosa"
45 | "44",5,3.5,1.6,0.6,"setosa"
46 | "45",5.1,3.8,1.9,0.4,"setosa"
47 | "46",4.8,3,1.4,0.3,"setosa"
48 | "47",5.1,3.8,1.6,0.2,"setosa"
49 | "48",4.6,3.2,1.4,0.2,"setosa"
50 | "49",5.3,3.7,1.5,0.2,"setosa"
51 | "50",5,3.3,1.4,0.2,"setosa"
52 | "51",7,3.2,4.7,1.4,"versicolor"
53 | "52",6.4,3.2,4.5,1.5,"versicolor"
54 | "53",6.9,3.1,4.9,1.5,"versicolor"
55 | "54",5.5,2.3,4,1.3,"versicolor"
56 | "55",6.5,2.8,4.6,1.5,"versicolor"
57 | "56",5.7,2.8,4.5,1.3,"versicolor"
58 | "57",6.3,3.3,4.7,1.6,"versicolor"
59 | "58",4.9,2.4,3.3,1,"versicolor"
60 | "59",6.6,2.9,4.6,1.3,"versicolor"
61 | "60",5.2,2.7,3.9,1.4,"versicolor"
62 | "61",5,2,3.5,1,"versicolor"
63 | "62",5.9,3,4.2,1.5,"versicolor"
64 | "63",6,2.2,4,1,"versicolor"
65 | "64",6.1,2.9,4.7,1.4,"versicolor"
66 | "65",5.6,2.9,3.6,1.3,"versicolor"
67 | "66",6.7,3.1,4.4,1.4,"versicolor"
68 | "67",5.6,3,4.5,1.5,"versicolor"
69 | "68",5.8,2.7,4.1,1,"versicolor"
70 | "69",6.2,2.2,4.5,1.5,"versicolor"
71 | "70",5.6,2.5,3.9,1.1,"versicolor"
72 | "71",5.9,3.2,4.8,1.8,"versicolor"
73 | "72",6.1,2.8,4,1.3,"versicolor"
74 | "73",6.3,2.5,4.9,1.5,"versicolor"
75 | "74",6.1,2.8,4.7,1.2,"versicolor"
76 | "75",6.4,2.9,4.3,1.3,"versicolor"
77 | "76",6.6,3,4.4,1.4,"versicolor"
78 | "77",6.8,2.8,4.8,1.4,"versicolor"
79 | "78",6.7,3,5,1.7,"versicolor"
80 | "79",6,2.9,4.5,1.5,"versicolor"
81 | "80",5.7,2.6,3.5,1,"versicolor"
82 | "81",5.5,2.4,3.8,1.1,"versicolor"
83 | "82",5.5,2.4,3.7,1,"versicolor"
84 | "83",5.8,2.7,3.9,1.2,"versicolor"
85 | "84",6,2.7,5.1,1.6,"versicolor"
86 | "85",5.4,3,4.5,1.5,"versicolor"
87 | "86",6,3.4,4.5,1.6,"versicolor"
88 | "87",6.7,3.1,4.7,1.5,"versicolor"
89 | "88",6.3,2.3,4.4,1.3,"versicolor"
90 | "89",5.6,3,4.1,1.3,"versicolor"
91 | "90",5.5,2.5,4,1.3,"versicolor"
92 | "91",5.5,2.6,4.4,1.2,"versicolor"
93 | "92",6.1,3,4.6,1.4,"versicolor"
94 | "93",5.8,2.6,4,1.2,"versicolor"
95 | "94",5,2.3,3.3,1,"versicolor"
96 | "95",5.6,2.7,4.2,1.3,"versicolor"
97 | "96",5.7,3,4.2,1.2,"versicolor"
98 | "97",5.7,2.9,4.2,1.3,"versicolor"
99 | "98",6.2,2.9,4.3,1.3,"versicolor"
100 | "99",5.1,2.5,3,1.1,"versicolor"
101 | "100",5.7,2.8,4.1,1.3,"versicolor"
102 | "101",6.3,3.3,6,2.5,"virginica"
103 | "102",5.8,2.7,5.1,1.9,"virginica"
104 | "103",7.1,3,5.9,2.1,"virginica"
105 | "104",6.3,2.9,5.6,1.8,"virginica"
106 | "105",6.5,3,5.8,2.2,"virginica"
107 | "106",7.6,3,6.6,2.1,"virginica"
108 | "107",4.9,2.5,4.5,1.7,"virginica"
109 | "108",7.3,2.9,6.3,1.8,"virginica"
110 | "109",6.7,2.5,5.8,1.8,"virginica"
111 | "110",7.2,3.6,6.1,2.5,"virginica"
112 | "111",6.5,3.2,5.1,2,"virginica"
113 | "112",6.4,2.7,5.3,1.9,"virginica"
114 | "113",6.8,3,5.5,2.1,"virginica"
115 | "114",5.7,2.5,5,2,"virginica"
116 | "115",5.8,2.8,5.1,2.4,"virginica"
117 | "116",6.4,3.2,5.3,2.3,"virginica"
118 | "117",6.5,3,5.5,1.8,"virginica"
119 | "118",7.7,3.8,6.7,2.2,"virginica"
120 | "119",7.7,2.6,6.9,2.3,"virginica"
121 | "120",6,2.2,5,1.5,"virginica"
122 | "121",6.9,3.2,5.7,2.3,"virginica"
123 | "122",5.6,2.8,4.9,2,"virginica"
124 | "123",7.7,2.8,6.7,2,"virginica"
125 | "124",6.3,2.7,4.9,1.8,"virginica"
126 | "125",6.7,3.3,5.7,2.1,"virginica"
127 | "126",7.2,3.2,6,1.8,"virginica"
128 | "127",6.2,2.8,4.8,1.8,"virginica"
129 | "128",6.1,3,4.9,1.8,"virginica"
130 | "129",6.4,2.8,5.6,2.1,"virginica"
131 | "130",7.2,3,5.8,1.6,"virginica"
132 | "131",7.4,2.8,6.1,1.9,"virginica"
133 | "132",7.9,3.8,6.4,2,"virginica"
134 | "133",6.4,2.8,5.6,2.2,"virginica"
135 | "134",6.3,2.8,5.1,1.5,"virginica"
136 | "135",6.1,2.6,5.6,1.4,"virginica"
137 | "136",7.7,3,6.1,2.3,"virginica"
138 | "137",6.3,3.4,5.6,2.4,"virginica"
139 | "138",6.4,3.1,5.5,1.8,"virginica"
140 | "139",6,3,4.8,1.8,"virginica"
141 | "140",6.9,3.1,5.4,2.1,"virginica"
142 | "141",6.7,3.1,5.6,2.4,"virginica"
143 | "142",6.9,3.1,5.1,2.3,"virginica"
144 | "143",5.8,2.7,5.1,1.9,"virginica"
145 | "144",6.8,3.2,5.9,2.3,"virginica"
146 | "145",6.7,3.3,5.7,2.5,"virginica"
147 | "146",6.7,3,5.2,2.3,"virginica"
148 | "147",6.3,2.5,5,1.9,"virginica"
149 | "148",6.5,3,5.2,2,"virginica"
150 | "149",6.2,3.4,5.4,2.3,"virginica"
151 | "150",5.9,3,5.1,1.8,"virginica"
152 | "151",52.9,322,52.221,1212.8,"virginica"
153 |
154 |
--------------------------------------------------------------------------------
/docImage/10_1_1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/10_1_1.jpg
--------------------------------------------------------------------------------
/docImage/10_2_1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/10_2_1.jpg
--------------------------------------------------------------------------------
/docImage/11_1_1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/11_1_1.jpg
--------------------------------------------------------------------------------
/docImage/11_1_2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/11_1_2.jpg
--------------------------------------------------------------------------------
/docImage/11_2_1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/11_2_1.jpg
--------------------------------------------------------------------------------
/docImage/11_2_2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/11_2_2.jpg
--------------------------------------------------------------------------------
/docImage/11_2_3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/11_2_3.jpg
--------------------------------------------------------------------------------
/docImage/8_1_1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_1_1.jpg
--------------------------------------------------------------------------------
/docImage/8_1_2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_1_2.jpg
--------------------------------------------------------------------------------
/docImage/8_1_3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_1_3.jpg
--------------------------------------------------------------------------------
/docImage/8_2_1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_2_1.jpg
--------------------------------------------------------------------------------
/docImage/8_2_2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_2_2.jpg
--------------------------------------------------------------------------------
/docImage/8_2_3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_2_3.jpg
--------------------------------------------------------------------------------
/docImage/8_2_4.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_2_4.jpg
--------------------------------------------------------------------------------
/docImage/9_1_1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/9_1_1.jpg
--------------------------------------------------------------------------------
/docImage/9_1_2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/9_1_2.jpg
--------------------------------------------------------------------------------
/docImage/9_2_1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/9_2_1.jpg
--------------------------------------------------------------------------------
/docImage/9_2_2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/9_2_2.jpg
--------------------------------------------------------------------------------
/docImage/9_2_3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/9_2_3.jpg
--------------------------------------------------------------------------------
/docImage/Maximum_separation_hyperplane1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Maximum_separation_hyperplane1.jpg
--------------------------------------------------------------------------------
/docImage/Maximum_separation_hyperplane2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Maximum_separation_hyperplane2.jpg
--------------------------------------------------------------------------------
/docImage/Maximum_separation_hyperplane3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Maximum_separation_hyperplane3.jpg
--------------------------------------------------------------------------------
/docImage/Maximum_separation_hyperplane4.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Maximum_separation_hyperplane4.jpg
--------------------------------------------------------------------------------
/docImage/Novikoff1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Novikoff1.jpg
--------------------------------------------------------------------------------
/docImage/Novikoff2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Novikoff2.jpg
--------------------------------------------------------------------------------
/docImage/Novikoff3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Novikoff3.jpg
--------------------------------------------------------------------------------
/docImage/Soft_interval_maximization_dual1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Soft_interval_maximization_dual1.jpg
--------------------------------------------------------------------------------
/docImage/Soft_interval_maximization_dual2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Soft_interval_maximization_dual2.jpg
--------------------------------------------------------------------------------
/docImage/Soft_interval_maximization_dual3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Soft_interval_maximization_dual3.jpg
--------------------------------------------------------------------------------
/docImage/bayes_naive_bayes1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/bayes_naive_bayes1.jpg
--------------------------------------------------------------------------------
/docImage/bayes_naive_bayes2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/bayes_naive_bayes2.jpg
--------------------------------------------------------------------------------
/docImage/bayesian_estimation.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/bayesian_estimation.jpg
--------------------------------------------------------------------------------
/docImage/hoeffding1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/hoeffding1.jpg
--------------------------------------------------------------------------------
/docImage/hoeffding2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/hoeffding2.jpg
--------------------------------------------------------------------------------
/docImage/iterative_method1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/iterative_method1.jpg
--------------------------------------------------------------------------------
/docImage/iterative_method2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/iterative_method2.jpg
--------------------------------------------------------------------------------
/docImage/iterative_method3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/iterative_method3.jpg
--------------------------------------------------------------------------------
/docImage/lagrange_duality1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/lagrange_duality1.jpg
--------------------------------------------------------------------------------
/docImage/lagrange_duality2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/lagrange_duality2.jpg
--------------------------------------------------------------------------------
/docImage/lagrange_duality3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/lagrange_duality3.jpg
--------------------------------------------------------------------------------
/docImage/maximum_entropy1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/maximum_entropy1.jpg
--------------------------------------------------------------------------------
/docImage/maximum_entropy2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/maximum_entropy2.jpg
--------------------------------------------------------------------------------
/docImage/maximum_likelihood_estimation.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/maximum_likelihood_estimation.jpg
--------------------------------------------------------------------------------
/docImage/mle_naive_bayes.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/mle_naive_bayes.jpg
--------------------------------------------------------------------------------
/docImage/poster_prob1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/poster_prob1.jpg
--------------------------------------------------------------------------------
/docImage/poster_prob2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/poster_prob2.jpg
--------------------------------------------------------------------------------
/notes/chapter1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter1.pdf
--------------------------------------------------------------------------------
/notes/chapter10.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter10.pdf
--------------------------------------------------------------------------------
/notes/chapter11.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter11.pdf
--------------------------------------------------------------------------------
/notes/chapter2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter2.pdf
--------------------------------------------------------------------------------
/notes/chapter3.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter3.pdf
--------------------------------------------------------------------------------
/notes/chapter4.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter4.pdf
--------------------------------------------------------------------------------
/notes/chapter5.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter5.pdf
--------------------------------------------------------------------------------
/notes/chapter6.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter6.pdf
--------------------------------------------------------------------------------
/notes/chapter7.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter7.pdf
--------------------------------------------------------------------------------
/notes/chapter8.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter8.pdf
--------------------------------------------------------------------------------
/notes/chapter9.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter9.pdf
--------------------------------------------------------------------------------