├── .gitignore ├── README.md ├── codes ├── MyAdaBoost.py ├── MyDecisionTree.py ├── MyEM.py ├── MyHMM.py ├── MyHMMTestData.txt ├── MyHMMTrainData.txt ├── MyKNN.py ├── MyLogisticRegression.py ├── MyMaxEnt.py ├── MyNaiveBayes.py ├── MyPerceptron.py ├── MySVM.py └── iris.csv ├── docImage ├── 10_1_1.jpg ├── 10_2_1.jpg ├── 11_1_1.jpg ├── 11_1_2.jpg ├── 11_2_1.jpg ├── 11_2_2.jpg ├── 11_2_3.jpg ├── 8_1_1.jpg ├── 8_1_2.jpg ├── 8_1_3.jpg ├── 8_2_1.jpg ├── 8_2_2.jpg ├── 8_2_3.jpg ├── 8_2_4.jpg ├── 9_1_1.jpg ├── 9_1_2.jpg ├── 9_2_1.jpg ├── 9_2_2.jpg ├── 9_2_3.jpg ├── Maximum_separation_hyperplane1.jpg ├── Maximum_separation_hyperplane2.jpg ├── Maximum_separation_hyperplane3.jpg ├── Maximum_separation_hyperplane4.jpg ├── Novikoff1.jpg ├── Novikoff2.jpg ├── Novikoff3.jpg ├── Soft_interval_maximization_dual1.jpg ├── Soft_interval_maximization_dual2.jpg ├── Soft_interval_maximization_dual3.jpg ├── bayes_naive_bayes1.jpg ├── bayes_naive_bayes2.jpg ├── bayesian_estimation.jpg ├── hoeffding1.jpg ├── hoeffding2.jpg ├── iterative_method1.jpg ├── iterative_method2.jpg ├── iterative_method3.jpg ├── lagrange_duality1.jpg ├── lagrange_duality2.jpg ├── lagrange_duality3.jpg ├── maximum_entropy1.jpg ├── maximum_entropy2.jpg ├── maximum_likelihood_estimation.jpg ├── mle_naive_bayes.jpg ├── poster_prob1.jpg └── poster_prob2.jpg └── notes ├── chapter1.pdf ├── chapter10.pdf ├── chapter11.pdf ├── chapter2.pdf ├── chapter3.pdf ├── chapter4.pdf ├── chapter5.pdf ├── chapter6.pdf ├── chapter7.pdf ├── chapter8.pdf └── chapter9.pdf /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | .DS_Store 3 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Statistic-study-notes 2 | # 李航统计学习方法(第二版)的学习笔记,包括: 3 | ## 1、每章重点数学公式的手动推导 4 |
均为手写然后扫描成图片,字迹不工整还望谅解,之后有时间会用Latex修正 5 | 点击数学公式没有出现图片的情况 需要搭梯子才可以在线预览到数学推导的图片... 6 | 7 | - [1.第一章数学公式推导](#1第一章数学公式推导) 8 | - [1.1 极大似然估计推导](#11极大似然估计推导) 9 | - [1.2 贝叶斯估计推导](#12贝叶斯估计推导) 10 | - [1.3 利用Hoeffding推导泛化误差上界](#13利用Hoeffding推导泛化误差上界) 11 | - [2.第二章数学公式推导](#2第二章数学公式推导) 12 | - [2.1 算法的收敛性证明Novikoff](#21算法的收敛性证明Novikoff) 13 | - [3.第三章数学公式推导](#3第三章数学公式推导) 14 | - 3.1 无数学推导,偏重算法实现-KNN 15 | - [4.第四章数学公式推导](#4第四章数学公式推导) 16 | - [4.1 用极大似然法估计朴素贝叶斯参数](#41用极大似然法估计朴素贝叶斯参数) 17 | - [4.2 用贝叶斯估计法朴素贝叶斯参数](#42用贝叶斯估计法朴素贝叶斯参数) 18 | - [4.3 证明后验概率最大化即期望风险最小化](#43证明后验概率最大化即期望风险最小化) 19 | - [5.第五章数学公式推导](#5第五章数学公式推导) 20 | - 5.1 无数学推导,偏重算法实现-决策树 21 | - [6.第六章数学公式推导](#6第六章数学公式推导) 22 | - [6.1 最大熵模型的数学推导](#61最大熵模型的数学推导) 23 | - [6.2 拉格朗日对偶性问题的数学推导](#62拉格朗日对偶性问题的数学推导) 24 | - [6.3 改进的迭代尺度法数学推导](#63改进的迭代尺度法数学推导) 25 | - [7.第七章数学公式推导](#7第七章数学公式推导) 26 | - [7.1 软间隔最大化对偶问题](#71软间隔最大化对偶问题) 27 | - [7.2 证明最大间隔分离超平面存在唯一性](#72证明最大间隔分离超平面存在唯一性) 28 | - [8.第八章数学公式推导](#8第八章数学公式推导) 29 | - [8.1 证明AdaBoost是前向分步加法算法的特例](#81证明AdaBoost是前向分步加法算法的特例) 30 | - [8.2 证明AdaBoost的训练误差界](#82证明AdaBoost的训练误差界) 31 | - [9.第九章数学公式推导](#9第九章数学公式推导) 32 | - [9.1 EM算法的导出](#91EM算法的导出) 33 | - [9.2 用EM算法估计高斯模混合模型](#92用EM算法估计高斯模混合模型) 34 | - [10.第十章数学公式推导](#10第十章数学公式推导) 35 | - [10.1 前向算法两个公式的证明](#101前向算法两个公式的证明) 36 | - [10.2 维特比算法推导](#102维特比算法推导) 37 | - [11.第十一章数学公式推导](#11第十一章数学公式推导) 38 | - [11.1 条件随机场的矩阵形式推导](#111条件随机场的矩阵形式推导) 39 | - [11.2 牛顿法和拟牛顿法的推导](#112牛顿法和拟牛顿法的推导) 40 | 41 | 42 | 43 | ## 2、每章算法的Python自实现 44 | [数据集为iris.csv(带Header)](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/iris.csv) 45 | ### 第2章 感知机模型(使用Iris数据集) 46 | 源代码[MyPerceptron.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyPerceptron.py) 47 | ### 第3章 KNN模型(线性-使用Iris数据集 与 KD树-有点问题..修改后再上传) 48 | 源代码[MyPerceptron.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyKNN.py) 49 | ### 第4章 朴素贝叶斯模型(使用Iris数据集) 50 | 源代码[MyPerceptron.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyNaiveBayes.py) 51 | ### 第5章 决策树模型(使用Iris数据集) 52 | 源代码[MyPerceptron.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyDecisionTree.py) 53 | ### 第6章 逻辑斯提回归模型(使用Iris数据集,采用梯度下降方法) 54 | 源代码[MyPerceptron.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyLogisticRegression.py) 55 | ### 第6章 最大熵模型(使用Iris数据集) 56 | 源代码[MyMaxEnt.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyMaxEnt.py) 57 | ### 第7章 SVM(使用Iris数据集) 58 | 源代码[MySVM.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MySVM.py) 59 | ### 第8章 AdaBoost(使用Iris数据集) 60 | 源代码[MyAdaBoost.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyAdaBoost.py) 61 | ### 第9章 EM算法(使用自己随机生成的符合高斯分布的数据) 62 | 源代码[MyEM.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyEM.py) 63 | ### 第10章 HMM算法(使用人民日报语料库进行训练,对输入的文本进行分词,12.8前完成) 64 | 源代码[MyHMM.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyHMM.py) 65 | 66 | ## 3、学习笔记汇总 67 |
学习笔记均为自己学习过程中记录在笔记本上然后拍照扫描成pdf 68 | ### [第1章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter1.pdf) 69 | ### [第2章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter2.pdf) 70 | ### [第3章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter3.pdf) 71 | ### [第4章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter4.pdf) 72 | ### [第5章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter5.pdf) 73 | ### [第6章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter6.pdf) 74 | ### [第7章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter7.pdf) 75 | ### [第8章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter8.pdf) 76 | ### [第9章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter9.pdf) 77 | ### [第10章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter10.pdf) 78 | ### [第11章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter11.pdf) 79 | 80 | 81 | ## 4、每章节的课后习题实现 82 |
接下来每周都会定时更新课后习题的实现 83 | 84 | ## 1第一章数学公式推导 85 | 86 | ### 1.1极大似然估计推导 87 | 88 | ![](/docImage/maximum_likelihood_estimation.jpg) 89 | 90 | ### 1.2贝叶斯估计推导 91 | 92 | ![](/docImage/bayesian_estimation.jpg) 93 | 94 | 95 | ### 1.3利用Hoeffding推导泛化误差上界 96 | 97 | ![](/docImage/hoeffding1.jpg) 98 | 99 | ![](/docImage/hoeffding2.jpg) 100 | 101 | ## 2第二章数学公式推导 102 | 103 | ### 2.1算法的收敛性证明Novikoff 104 | 105 | ![](/docImage/Novikoff1.jpg) 106 | ![](/docImage/Novikoff2.jpg) 107 | ![](/docImage/Novikoff3.jpg) 108 | 109 | ## 3第三章数学公式推导 110 | 111 | ## 4第四章数学公式推导 112 | 113 | ### 4.1用极大似然法估计朴素贝叶斯参数 114 | ![](/docImage/mle_naive_bayes.jpg) 115 | 116 | ### 4.2用贝叶斯估计法朴素贝叶斯参数 117 | ![](/docImage/bayes_naive_bayes1.jpg) 118 | ![](/docImage/bayes_naive_bayes2.jpg) 119 | 120 | ### 4.3证明后验概率最大化即期望风险最小化 121 | ![](/docImage/poster_prob1.jpg) 122 | ![](/docImage/poster_prob2.jpg) 123 | 124 | ## 5第五章数学公式推导 125 | 126 | ## 6第六章数学公式推导 127 | 128 | ### 6.1最大熵模型的数学推导 129 | ![](/docImage/maximum_entropy1.jpg) 130 | ![](/docImage/maximum_entropy2.jpg) 131 | 132 | ### 6.2拉格朗日对偶性问题的数学推导 133 | ![](/docImage/lagrange_duality1.jpg) 134 | ![](/docImage/lagrange_duality2.jpg) 135 | ![](/docImage/lagrange_duality3.jpg) 136 | 137 | ### 6.3改进的迭代尺度法数学推导 138 | ![](/docImage/iterative_method1.jpg) 139 | ![](/docImage/iterative_method2.jpg) 140 | ![](/docImage/iterative_method3.jpg) 141 | 142 | ## 7第七章数学公式推导 143 | 144 | ### 7.1软间隔最大化对偶问题 145 | ![](/docImage/Soft_interval_maximization_dual1.jpg) 146 | ![](/docImage/Soft_interval_maximization_dual2.jpg) 147 | ![](/docImage/Soft_interval_maximization_dual3.jpg) 148 | 149 | ### 7.2证明最大间隔分离超平面存在唯一性 150 | ![](/docImage/Maximum_separation_hyperplane1.jpg) 151 | ![](/docImage/Maximum_separation_hyperplane2.jpg) 152 | ![](/docImage/Maximum_separation_hyperplane3.jpg) 153 | ![](/docImage/Maximum_separation_hyperplane4.jpg) 154 | 155 | ## 8第八章数学公式推导 156 | 157 | ### 8.1证明AdaBoost是前向分步加法算法的特例 158 | ![](/docImage/8_1_1.jpg) 159 | ![](/docImage/8_1_2.jpg) 160 | ![](/docImage/8_1_3.jpg) 161 | 162 | ### 8.2 证明AdaBoost的训练误差界 163 | ![](/docImage/8_2_1.jpg) 164 | ![](/docImage/8_2_2.jpg) 165 | ![](/docImage/8_2_3.jpg) 166 | ![](/docImage/8_2_4.jpg) 167 | 168 | ## 9第九章数学公式推导 169 | ### 9.1 EM算法的导出 170 | ![](/docImage/9_1_1.jpg) 171 | ![](/docImage/9_1_2.jpg) 172 | 173 | ### 9.2 用EM算法估计高斯模混合模型 174 | ![](/docImage/9_2_1.jpg) 175 | ![](/docImage/9_2_2.jpg) 176 | ![](/docImage/9_2_3.jpg) 177 | 178 | ## 10.第十章数学公式推导 179 | 180 | ### 10.1 前向算法两个公式的证明 181 | ![](/docImage/10_1_1.jpg) 182 | 183 | ### 10.2 维特比算法推导 184 | ![](/docImage/10_2_1.jpg) 185 | 186 | ## 11.第十一章数学公式推导 187 | ### 11.1 条件随机场的矩阵形式推导 188 | ![](/docImage/11_1_1.jpg) 189 | ![](/docImage/11_1_1.jpg) 190 | ### 11.2 牛顿法和拟牛顿法的推导 191 | ![](/docImage/11_2_1.jpg) 192 | ![](/docImage/11_2_2.jpg) 193 | ![](/docImage/11_2_3.jpg) 194 | 195 | 196 | 197 | 198 | 199 | 200 | 201 | 202 | 203 | 204 | 205 | 206 | 207 | -------------------------------------------------------------------------------- /codes/MyAdaBoost.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | import pandas as pd 7 | 8 | #根据文件路径读取Iris数据集数据 9 | # #return type: np.array 10 | def processData(filePath): 11 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 12 | X = [] 13 | y = [] 14 | #默认读取csv的头部 15 | df = pd.read_csv(filePath) 16 | #利用数据集合的第一个维度特征分类 17 | #遍历pandas中df的每一行 18 | for index, row in df.iterrows(): 19 | if row["Species"] == "setosa": 20 | y.append(1) 21 | else: 22 | 
y.append(-1) 23 | #进行二值化处理,使得样本数据为0/1 24 | X.append([int(float(row["Sepal.Length"])>5.0),int(float(row["Sepal.Width"])>3.5),int(float(row["Petal.Length"])>1.4)]) 25 | 26 | return np.array(X), np.array(y) 27 | 28 | def createOneLayerBoostTree(X_train, y_train, D): 29 | #获得样本数目及特征数量 30 | m, n = np.shape(X_train) 31 | #该字典代表了一层提升树,用于存放当前层提升树的参数 32 | oneLowerNumayerBoostTree = {} 33 | #初始化分类误差率为1,即100% 34 | oneLowerNumayerBoostTree['errorRate'] = 1 35 | #对每一个特征进行遍历,寻找用于划分的最合适的特征 36 | for i in range(n): 37 | #因为特征已经经过二值化,只能为0和1,因此分切点为-0.5, 0.5, 1.5 38 | for division in [-0.5, 0.5, 1.5]: 39 | #规则为如下所示: 40 | #LowerNumowerNumSetOne:LowerNumow is one:小于某值得是1 41 | #UpperNumSetOne:UpperNumigh is one:大于某值得是1 42 | for rule in ['LowerNumSetOne', 'UpperNumSetOne']: 43 | #按照第i个特征,以值division进行切割,进行当前设置得到的预测和分类错误率 44 | Gx, e = calculate_e_Gx(X_train, y_train, i, division, rule, D) 45 | #如果分类错误率e小于当前最小的e,那么将它作为最小的分类错误率保存 46 | if e < oneLowerNumayerBoostTree['errorRate']: 47 | # 分类错误率 48 | oneLowerNumayerBoostTree['errorRate'] = e 49 | # 最优划分点 50 | oneLowerNumayerBoostTree['division'] = division 51 | # 划分规则 52 | oneLowerNumayerBoostTree['rule'] = rule 53 | # 预测结果 54 | oneLowerNumayerBoostTree['Gx'] = Gx 55 | # 特征索引 56 | oneLowerNumayerBoostTree['feature'] = i 57 | return oneLowerNumayerBoostTree 58 | 59 | def createBoostTree(X_train, y_train, treeNum): 60 | #将数据和标签转化为数组形式 61 | X_train = np.array(X_train) 62 | y_train = np.array(y_train) 63 | #获得训练集数量以及特征个数 64 | m, n = np.shape(X_train) 65 | #初始化D为1/N 66 | D = [1 / m] * m 67 | #初始化提升树列表,每个位置为一层 68 | tree = [] 69 | #循环创建提升树 70 | for i in range(treeNum): 71 | #得到当前层的提升树 72 | currentTree = createOneLayerBoostTree(X_train, y_train, D) 73 | # 这边由于用的是Iris数据集,数据量过小,所以currentTree['errorRate']即误差分类率可能为0 74 | # 因此在最后加上了0.0001来避免除数为0的错误 75 | alpha = 1/2 * np.log((1 - currentTree['errorRate']) / (currentTree['errorRate']+0.0001)) 76 | #获得当前层的预测结果,用于下一步更新D 77 | Gx = currentTree['Gx'] 78 | D = np.multiply(D, np.exp(-1 * alpha * np.multiply(y_train, Gx))) / sum(D) 79 | currentTree['alpha'] = alpha 80 | tree.append(currentTree) 81 | return tree 82 | 83 | # 前提:数据进行二值处理 84 | def predict(x, division, rule, feature): 85 | if rule == 'LowerNumSetOne': 86 | LowerNum = 1 87 | UpperNum = -1 88 | else: 89 | LowerNum = -1 90 | UpperNum = 1 91 | 92 | if x[feature] < division: 93 | return LowerNum 94 | else: 95 | return UpperNum 96 | 97 | def test(X_test, y_test, tree): 98 | rightCount = 0 99 | for i in range(len(X_test)): 100 | result = 0 101 | for currentTree in tree: 102 | division = currentTree['division'] 103 | rule = currentTree['rule'] 104 | feature = currentTree['feature'] 105 | alpha = currentTree['alpha'] 106 | result += alpha * predict(X_test[i], division, rule, feature) 107 | #预测结果取sign值,如果大于0 sign为1,反之为0 108 | if np.sign(result) == y_test[i]: 109 | rightCount += 1 110 | #返回准确率 111 | return rightCount / len(X_test) 112 | 113 | #计算分类错误率 114 | def calculate_e_Gx(X_train, y_train, n, division, rule, D): 115 | #初始化分类误差率为0 116 | e = 0 117 | x = X_train[:, n] 118 | y = y_train 119 | train = [] 120 | if rule == 'LowerNumSetOne': 121 | LowerNum = 1 122 | UpperNum = -1 123 | else: 124 | LowerNum = -1 125 | UpperNum = 1 126 | 127 | #遍历样本的特征 128 | for i in range(X_train.shape[0]): 129 | if x[i] < division: 130 | #如果小于划分点,则预测为LowerNum 131 | #如果设置小于division为1,那么LowerNum就是1, 132 | #如果设置小于division为-1,LowerNum就是-1 133 | train.append(LowerNum) 134 | #如果预测错误,分类错误率要加上该分错的样本的权值 135 | if y[i] != LowerNum: 136 | e += D[i] 137 | elif x[i] >= division: 138 | train.append(UpperNum) 139 | if y[i] != 
UpperNum: 140 | e += D[i] 141 | return np.array(train), e 142 | 143 | if __name__ == '__main__': 144 | X, y = processData('iris.csv') 145 | 146 | X_train = X[0:149:50] 147 | y_train = y[0:149:50] 148 | 149 | # 自己在数据集后面加上了干扰的实例 150 | X_test = X[0:150:1] 151 | y_test = y[0:150:1] 152 | 153 | #创建提升树,最后一个参数代表的是公式的m,即多少个模型 154 | tree = createBoostTree(X_train, y_train, 5) 155 | 156 | #准确率测试 157 | rightRate = test(X_test, y_test, tree) 158 | print('分类正确率为:',rightRate * 100, '%') -------------------------------------------------------------------------------- /codes/MyDecisionTree.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | import pandas as pd 7 | import math 8 | from collections import namedtuple 9 | 10 | # 定义节点 11 | # 孩子节点、分类特征的取值、节点内容、节点分类特征、标签 12 | class Node(namedtuple("Node","children type content feature label")): 13 | def __repr__(self): 14 | return str(tuple(self)) 15 | 16 | #决策树 17 | class DecisionTree(): 18 | def __init__(self,method="info_gain_ratio"): 19 | self.tree=None 20 | self.method=method 21 | 22 | #计算经验熵 23 | def _experienc_entropy(self,X): 24 | 25 | # 统计每个取值的出现频率 26 | x_types_prob=X.iloc[:,0].value_counts()/X.shape[0] 27 | # 计算经验熵 28 | x_experienc_entropy=sum((-p*math.log(p,2) for p in x_types_prob)) 29 | return x_experienc_entropy 30 | 31 | #计算条件熵 32 | def _conditinal_entropy(self,X_train,y_train,feature): 33 | # feature特征下每个特征取值数量统计 34 | x_types_count= X_train[feature].value_counts() 35 | # 每个特征取值频率计算 36 | x_types_prob = x_types_count / X_train.shape[0] 37 | # 每个特征取值下类别y的经验熵 38 | x_experienc_entropy=[self._experienc_entropy(y_train[(X_train[feature]==i).values]) for i in x_types_count.index] 39 | # 特征feature对数据集的经验条件熵 40 | x_conditinal_entropy=(x_types_prob.mul(x_experienc_entropy)).sum() 41 | return x_conditinal_entropy 42 | 43 | #计算信息增益 44 | def _information_gain(self,X_train,y_train,feature): 45 | return self._experienc_entropy(y_train)-self._conditinal_entropy(X_train,y_train,feature) 46 | 47 | #计算信息增益比 48 | def _information_gain_ratio(self,X_train,y_train,features,feature): 49 | index=features.index(feature) 50 | return self._information_gain(X_train,y_train,feature)/self._experienc_entropy(X_train.iloc[:,index:index+1]) 51 | 52 | #选择分类特征 53 | def _choose_feature(self,X_train,y_train,features): 54 | if self.method=="info_gain_ratio": 55 | info=[self._information_gain_ratio(X_train,y_train,features,feature) for feature in features] 56 | elif self.method=="info_gain": 57 | info=[self._information_gain(X_train,y_train,feature) for feature in features] 58 | else: 59 | raise TypeError 60 | optimal_feature=features[np.argmax(info)] 61 | return optimal_feature 62 | 63 | #递归构造决策树 64 | def _built_tree(self,X_train,y_train,features,type=None): 65 | # 只有一个节点或已经完全分类,则决策树停止继续分叉 66 | if len(features)==1 or len(np.unique(y_train))==1: 67 | label=list(y_train[0].value_counts().index)[0] 68 | return Node(children=None,type=type,content=(X_train,y_train),feature=None,label=label) 69 | else: 70 | # 选择分类特征值 71 | feature=self._choose_feature(X_train,y_train,features) 72 | features.remove(feature) 73 | # 构建节点,同时递归创建孩子节点 74 | features_iter=np.unique(X_train[feature]) 75 | children=[] 76 | for item in features_iter: 77 | X_item=X_train[(X_train[feature]==item).values] 78 | y_item=y_train[(X_train[feature]==item).values] 79 | children.append(self._built_tree(X_item,y_item,features,type=item)) 80 | return 
Node(children=children,type=type,content=None,feature=feature,label=None) 81 | 82 | #进行剪枝 83 | def _prune(self): 84 | pass 85 | 86 | def fit(self,X_train,y_train,features): 87 | self.tree=self._built_tree(X_train,y_train,features) 88 | 89 | 90 | def _search(self,X_new): 91 | tree=self.tree 92 | # 若还有孩子节点,则继续向下搜索,否则搜索停止,在当前节点获取标签 93 | while tree.children: 94 | for child in tree.children: 95 | if X_new[tree.feature].loc[0]==child.type: 96 | tree=child 97 | break 98 | return tree.label 99 | 100 | def predict(self,X_new): 101 | return self._search(X_new) 102 | 103 | def processData(filePath): 104 | print('开始读取数据') 105 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 106 | X = [] 107 | y = [] 108 | #默认读取csv的头部 109 | df = pd.read_csv(filePath) 110 | #利用数据集合的第一个维度特征分类 111 | #遍历pandas中df的每一行 112 | for index, row in df.iterrows(): 113 | if row["Species"] == "setosa" : 114 | y.append("是setosa花") 115 | else: 116 | y.append("不是setosa花") 117 | X.append([float(row["Sepal.Length"]),float(row["Sepal.Width"]),float(row["Petal.Length"]),float(row["Petal.Width"])]) 118 | return np.array(X), np.array(y) 119 | 120 | def main(): 121 | # 训练数据集 122 | features = ["萼片长", "萼片宽", "花瓣长", "花瓣宽"] 123 | 124 | X , y = processData('iris.csv') 125 | 126 | X_train = X[0:149:4] 127 | y_train = y[0:149:4] 128 | 129 | X_test = X[0:149:10] 130 | y_test = y[0:149:10] 131 | 132 | 133 | X_train = pd.DataFrame(X_train, columns=features) 134 | y_train = pd.DataFrame(y_train) 135 | # 训练,使用信息增益 136 | clf=DecisionTree(method="info_gain") 137 | clf.fit(X_train,y_train,features.copy()) 138 | print('训练结束') 139 | 140 | X_new= pd.DataFrame(X_test, columns=features) 141 | y_predict=clf.predict(X_new) 142 | print(y_predict) 143 | 144 | if __name__=="__main__": 145 | main() -------------------------------------------------------------------------------- /codes/MyEM.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | 6 | import numpy as np 7 | import random 8 | import math 9 | 10 | # 通过服从高斯分布的随机函数来伪造数据集 11 | # mean0: 高斯0的均值、 12 | # sigma0: 高斯0的方差 13 | # alpha0: 高斯0的系数 14 | 15 | # mean1: 高斯1的均值 16 | # sigma1: 高斯1的方差 17 | # alpha1: 高斯1的系数 18 | # 混合了两个高斯分布的数据 19 | 20 | def processData(mean0, sigma0, mean1, sigma1, alpha0, alpha1): 21 | #定义数据集长度为1000 22 | length = 1000 23 | 24 | #初始化高斯分布,数据长度为length * alpha 25 | data0 = np.random.normal(mean0, sigma0, int(length * alpha0)) 26 | data1 = np.random.normal(mean1, sigma1, int(length * alpha1)) 27 | 28 | trainData = [] 29 | trainData.extend(data0) 30 | trainData.extend(data1) 31 | 32 | #对总的数据集进行打乱 33 | random.shuffle(trainData) 34 | return trainData 35 | 36 | # 根据高斯密度函数计算值 37 | # 返回整个可观测数据集的高斯分布密度(向量形式) 38 | def calculateGauss(trainDataArr, mean, sigmod): 39 | result = (1 / (math.sqrt(2 * math.pi) * sigmod**2)) * np.exp(-1 * (trainDataArr - mean) * (trainDataArr - mean) / (2 * sigmod**2)) 40 | return result 41 | 42 | 43 | def E(trainDataArr, alpha0, mean0, sigmod0, alpha1, mean1, sigmod1): 44 | gamma0 = alpha0 * calculateGauss(trainDataArr, mean0, sigmod0) 45 | gamma1 = alpha1 * calculateGauss(trainDataArr, mean1, sigmod1) 46 | 47 | sum = gamma0 + gamma1 48 | gamma0 = gamma0 / sum 49 | gamma1 = gamma1 / sum 50 | return gamma0, gamma1 51 | 52 | def M(meano, mean1, gamma0, gamma1, trainDataArr): 53 | mean0_new = np.dot(gamma0, trainDataArr) / np.sum(gamma0) 54 | mean1_new = np.dot(gamma1, trainDataArr) / np.sum(gamma1) 55 | 56 | sigmod0_new = math.sqrt(np.dot(gamma0, (trainDataArr - meano)**2) / 
np.sum(gamma0)) 57 | sigmod1_new = math.sqrt(np.dot(gamma1, (trainDataArr - mean1)**2) / np.sum(gamma1)) 58 | 59 | alpha0_new = np.sum(gamma0) / len(gamma0) 60 | alpha1_new = np.sum(gamma1) / len(gamma1) 61 | 62 | return mean0_new, mean1_new, sigmod0_new, sigmod1_new, alpha0_new, alpha1_new 63 | 64 | 65 | def EM(trainDataList, iter = 500): 66 | trainDataArr = np.array(trainDataList) 67 | 68 | alpha0 = 0.5 69 | mean0 = 0 70 | sigmod0 = 1 71 | alpha1 = 0.5 72 | mean1 = 1 73 | sigmod1 = 1 74 | 75 | count = 0 76 | while (count < iter): 77 | count = count+1 78 | # E步 79 | gamma0, gamma1 = E(trainDataArr, alpha0, mean0, sigmod0, alpha1, mean1, sigmod1) 80 | # M步 81 | mean0, mean1, sigmod0, sigmod1, alpha0, alpha1 = M(mean0, mean1, gamma0, gamma1, trainDataArr) 82 | return alpha0, mean0, sigmod0, alpha1, mean1, sigmod1 83 | 84 | if __name__ == '__main__': 85 | alpha0 = 0.1 86 | mean0 = -4.0 87 | sigmod0 = 0.6 88 | 89 | alpha1 = 0.9 90 | mean1 = 2.2 91 | sigmod1 = 0.1 92 | 93 | #初始化数据集 94 | trainDataList = processData(mean0, sigmod0, mean1, sigmod1, alpha0, alpha1) 95 | 96 | #开始EM算法,进行参数估计 97 | alpha0, mean0, sigmod0, alpha1, mean1, sigmod1 = EM(trainDataList) 98 | 99 | print('用EM计算之后的数据为:') 100 | print('alpha0:%.1f, mean0:%.1f, sigmod0:%.1f, alpha1:%.1f, mean1:%.1f, sigmod1:%.1f' % ( 101 | alpha0, mean0, sigmod0, alpha1, mean1, sigmod1 102 | )) 103 | 104 | 105 | -------------------------------------------------------------------------------- /codes/MyHMM.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | 7 | # 依据训练文本统计PI、A、B 8 | def trainHMM(fileName): 9 | # B:词语的开头 10 | # M:一个词语的中间词 11 | # E:一个词语的结果 12 | # S:非词语,单个词 13 | statuDict = {'B':0, 'M':1, 'E':2, 'S':3} 14 | 15 | # 每个字只有四种状态,所以下方的各类初始化中大小的参数均为4 16 | PI = np.zeros(4) 17 | # 初始化状态转移矩阵A,涉及到四种状态各自到四种状态的转移,因为大小为4x4 18 | A = np.zeros((4, 4)) 19 | # 初始化观测概率矩阵,分别为四种状态到每个字的发射概率 20 | B = np.zeros((4, 65536)) 21 | fr = open(fileName, encoding='utf-8') 22 | 23 | for line in fr.readlines(): 24 | curLine = line.strip().split() 25 | wordLabel = [] 26 | #对每一个单词进行遍历 27 | for i in range(len(curLine)): 28 | #如果长度为1,则直接将该字标记为S,即单个词 29 | if len(curLine[i]) == 1: 30 | label = 'S' 31 | else: 32 | label = 'B' + 'M' * (len(curLine[i]) - 2) + 'E' 33 | #如果是单行开头第一个字,PI中对应位置加1, 34 | if i == 0: PI[statuDict[label[0]]] += 1 35 | for j in range(len(label)): 36 | B[statuDict[label[j]]][ord(curLine[i][j])] += 1 37 | wordLabel.extend(label) 38 | for i in range(1, len(wordLabel)): 39 | A[statuDict[wordLabel[i - 1]]][statuDict[wordLabel[i]]] += 1 40 | 41 | sum = np.sum(PI) 42 | 43 | for i in range(len(PI)): 44 | if PI[i] == 0: PI[i] = -3.14e+100 45 | else: PI[i] = np.log(PI[i] / sum) 46 | 47 | for i in range(len(A)): 48 | sum = np.sum(A[i]) 49 | for j in range(len(A[i])): 50 | if A[i][j] == 0: A[i][j] = -3.14e+100 51 | else: A[i][j] = np.log(A[i][j] / sum) 52 | 53 | for i in range(len(B)): 54 | sum = np.sum(len(B[i])) 55 | for j in range(len(B[i])): 56 | if B[i][j] == 0: B[i][j] = -3.14e+100 57 | else:B[i][j] = np.log(B[i][j] / sum) 58 | 59 | return PI, A, B 60 | 61 | def processTrainData(fileName): 62 | textData = [] 63 | fr = open(fileName, encoding='utf-8') 64 | for line in fr.readlines(): 65 | #读到的每行最后都有一个\n,使用strip将最后的回车符去掉 66 | line = line.strip() 67 | textData.append(line) 68 | 69 | return textData 70 | 71 | def participleTestData(textData, PI, A, B): 72 | retArtical = [] 73 | for line in textData: 74 | delta = [[0 for i in range(4)] 
for i in range(len(line))] 75 | for i in range(4): 76 | delta[0][i] = PI[i] + B[i][ord(line[0])] 77 | psi = [[0 for i in range(4)] for i in range(len(line))] 78 | 79 | for t in range(1, len(line)): 80 | for i in range(4): 81 | tmpDelta = [0] * 4 82 | for j in range(4): 83 | tmpDelta[j] = delta[t - 1][j] + A[j][i] 84 | maxDelta = max(tmpDelta) 85 | maxDeltaIndex = tmpDelta.index(maxDelta) 86 | delta[t][i] = maxDelta + B[i][ord(line[t])] 87 | psi[t][i] = maxDeltaIndex 88 | 89 | sequence = [] 90 | i_opt = delta[len(line) - 1].index(max(delta[len(line) - 1])) 91 | sequence.append(i_opt) 92 | 93 | for t in range(len(line) - 1, 0, -1): 94 | i_opt = psi[t][i_opt] 95 | sequence.append(i_opt) 96 | 97 | sequence.reverse() 98 | curLine = '' 99 | for i in range(len(line)): 100 | curLine += line[i] 101 | if (sequence[i] == 3 or sequence[i] == 2) and i != (len(line) - 1): 102 | curLine += '|' 103 | retArtical.append(curLine) 104 | return retArtical 105 | 106 | if __name__ == '__main__': 107 | 108 | # 依据人民日报数据集计算HMM参数:PI、A、B 109 | PI, A, B = trainHMM('MyHMMTrainData.txt') 110 | 111 | # 读取测试文章 112 | textData = processTrainData('MyHMMTestData.txt') 113 | 114 | # 打印原文 115 | for line in textData: 116 | print(line) 117 | 118 | # 分词 119 | partiArtical = participleTestData(textData, PI, A, B) 120 | 121 | # 打印结果 122 | print('分词结果:') 123 | for line in partiArtical: 124 | print(line) 125 | -------------------------------------------------------------------------------- /codes/MyHMMTestData.txt: -------------------------------------------------------------------------------- 1 | 我本科就读于北京交通大学软件学院,专业是软件工程,本科做的是开发工作,工程性质较为浓厚。2019年,我保研至北京航空航天大学,我的个性不适合做纯理论研究,因此希望研究生毕业以后从事算法工程师的岗位,研究生期间我需要认真学习算法相关知识,但同时也不能落下工程实现能力,仍然需要较强的开发能力与项目落地能力,尤其是基础算法与数据结构,需要日常进行刷题比如:leetcode。 -------------------------------------------------------------------------------- /codes/MyKNN.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | import pandas as pd 7 | from collections import Counter 8 | from concurrent import futures 9 | import heapq 10 | 11 | class KNN: 12 | def __init__(self,X_train,y_train,k=3): 13 | # 所需参数初始化 14 | self.k=k 15 | self.X_train=X_train 16 | self.y_train=y_train 17 | 18 | def predict_single(self,X_test): 19 | # 计算与前k个样本点欧氏距离,距离取负值是把原问题转化为取前k个最大的距离 20 | dist_list=[(-np.linalg.norm(X_test-self.X_train[i],ord=2),self.y_train[i],i) 21 | for i in range(self.k)] 22 | 23 | # 利用前k个距离构建堆 24 | heapq.heapify(dist_list) 25 | 26 | # 遍历计算与剩下样本点的欧式距离 27 | for i in range(self.k,self.X_train.shape[0]): 28 | dist_i=(-np.linalg.norm(X_test-self.X_train[i],ord=2),self.y_train[i],i) 29 | #进行下堆操作 30 | if dist_i[0]>dist_list[0][0]: 31 | heapq.heappushpop(dist_list,dist_i) 32 | # 若dist_i 比 dis_list的最小值小,堆保持不变,继续遍历 33 | else: 34 | continue 35 | y_list=[dist_list[i][1] for i in range(self.k)] 36 | #[-1,1,1,-1...] 
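# 补充说明(新增注释,非原作者所写):执行到这里时,dist_list 这个小顶堆里保存的是
# k 个最近邻的 (负距离, 标签, 下标) 三元组——距离取负后,堆顶对应当前候选中距离最大
# 的样本,一旦遇到更近的样本就用 heappushpop 将其换出;y_list 即这 k 个最近邻的标签,
# 下面用 Counter 做多数表决得到预测类别。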
37 | # 对上述k个点的分类进行统计 38 | y_count=Counter(y_list).most_common() 39 | #{1:n,-1:m} 40 | return y_count[0][0] 41 | 42 | # 用多线程提高效率 43 | def predict_many(self,X_test): 44 | # 导入多线程 45 | with futures.ProcessPoolExecutor(max_workers=10) as executor: 46 | # 建立多线程任务 47 | tasks=[executor.submit(self.predict_single,X_test[i]) for i in range(X_test.shape[0])] 48 | # 驱动多线程运行 49 | done_iter=futures.as_completed(tasks) 50 | # 提取结果 51 | res=[future.result() for future in done_iter] 52 | return res 53 | 54 | def cal_right_rate(self,res,y_test): 55 | right_count = 0 56 | wrong_count = 0 57 | for i in range(len(res)): 58 | if res[i] == y_test[i]: 59 | right_count += 1 60 | else: 61 | wrong_count += 1 62 | return right_count / (right_count+wrong_count) 63 | 64 | def processData(filePath): 65 | print('开始读取数据') 66 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 67 | X = [] 68 | y = [] 69 | #默认读取csv的头部 70 | df = pd.read_csv(filePath) 71 | #利用数据集合的第一个维度特征分类 72 | #遍历pandas中df的每一行 73 | for index, row in df.iterrows(): 74 | if(row["Sepal.Length"]>=5.5) : 75 | y.append(1) 76 | else: 77 | y.append(-1) 78 | X.append([float(row["Sepal.Width"]),float(row["Petal.Length"])]) 79 | return np.array(X), np.array(y) 80 | 81 | 82 | def main(): 83 | #获取数据 84 | X, y = processData('iris.csv') 85 | X_train = X[0:149:4] 86 | y_train = y[0:149:4] 87 | 88 | X_test = X[0:149:10] 89 | y_test = y[0:149:10] 90 | 91 | # 不同的k对分类结果的影响 92 | for k in range(1,6,2): 93 | #构建KNN实例 94 | clf=KNN(X_train,y_train,k=k) 95 | #对测试数据进行分类预测 96 | y_predict=clf.predict_many(X_test) 97 | print("k={},被分类为:{}".format(k,y_predict)) 98 | print("正确率为: ", clf.cal_right_rate(y_predict,y_test)) 99 | 100 | if __name__=="__main__": 101 | main() -------------------------------------------------------------------------------- /codes/MyLogisticRegression.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | import time 7 | import pandas as pd 8 | 9 | #使用随机梯度下降 10 | class LogisticRegression: 11 | def __init__(self,learn_rate=0.1,max_iter=10000,tol=1e-3): 12 | # 学习速率 13 | self.learn_rate=learn_rate 14 | # 迭代次数 15 | self.max_iter=max_iter 16 | # 迭代停止阈值 17 | self.tol=tol 18 | # 权重 19 | self.w=None 20 | 21 | def preprocessing(self,X): 22 | row=X.shape[0] 23 | #在末尾加上一列,数值为1 24 | y=np.ones(row).reshape(row, 1) 25 | X_prepro =np.hstack((X,y)) 26 | return X_prepro 27 | 28 | def sigmod(self,x): 29 | return 1/(1+np.exp(-x)) 30 | 31 | def train(self,X_train,y_train): 32 | X=self.preprocessing(X_train) 33 | y=y_train.T 34 | #初始化权重w 35 | self.w=np.array([[0]*X.shape[1]],dtype=np.float) 36 | i=0 37 | k=0 38 | for loop in range(self.max_iter): 39 | # 计算梯度 40 | z=np.dot(X[i],self.w.T) 41 | grad=X[i]*(y[i]-self.sigmod(z)) 42 | # 利用梯度的绝对值作为迭代中止的条件 43 | if (np.abs(grad)<=self.tol).all(): 44 | break 45 | else: 46 | # 更新权重w 梯度上升——求极大值 47 | self.w+=self.learn_rate*grad 48 | k+=1 49 | i=(i+1)%X.shape[0] 50 | print("迭代次数:{}次".format(k)) 51 | print("最终梯度:{}".format(grad)) 52 | print("最终权重:{}".format(self.w[0])) 53 | 54 | def predict(self,x): 55 | p=self.sigmod(np.dot(self.preprocessing(x),self.w.T)) 56 | print("Y=1的概率被估计为:{:.2%}".format(p[0][0])) 57 | p[np.where(p>0.5)]=1 58 | p[np.where(p<0.5)]=0 59 | return p 60 | 61 | def cal_right_rate(self,X,y): 62 | y_c=self.predict(X) 63 | right_count = 0 64 | wrong_count = 0 65 | for i in range(len(y)): 66 | if y_c[i] == y[i]: 67 | right_count += 1 68 | else: 69 | wrong_count += 1 70 | return right_count / (right_count + 
wrong_count) 71 | error_rate=np.sum(np.abs(y_c-y.T))/y_c.shape[0] 72 | # return 1-error_rate 73 | 74 | #根据文件路径读取Iris数据集数据 75 | #return type: np.array 76 | def processData(filePath): 77 | print('开始读取数据') 78 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 79 | X = [] 80 | y = [] 81 | #默认读取csv的头部 82 | df = pd.read_csv(filePath) 83 | #利用数据集合的第一个维度特征分类 84 | #遍历pandas中df的每一行 85 | for index, row in df.iterrows(): 86 | if row["Species"] == "setosa" : 87 | y.append(1) 88 | else: 89 | y.append(0) 90 | X.append([float(row["Sepal.Length"]),float(row["Sepal.Width"]),float(row["Petal.Length"])]) 91 | return np.array(X), np.array(y) 92 | 93 | def main(): 94 | star=time.time() 95 | # 训练数据集 96 | X, y = processData('iris.csv') 97 | X_train = X[0:149:30] 98 | y_train = y[0:149:30] 99 | 100 | #自己在数据集后面加上了干扰的实例 101 | X_test = X[0:151:1] 102 | y_test = y[0:151:1] 103 | 104 | # 构建实例,进行训练 105 | clf=LogisticRegression() 106 | clf.train(X_train,y_train) 107 | 108 | # 预测新数据 109 | y_predict=clf.predict(X_test) 110 | print("{}被分类为:{}".format(X_test[0],y_predict[0])) 111 | 112 | # 利用已有数据对训练模型进行评价 113 | correct_rate=clf.cal_right_rate(X_test,y_test) 114 | print("测试一共有{}组实例,正确率:{:.5%}".format(X_test.shape[0],correct_rate)) 115 | end=time.time() 116 | print("用时:{:.5f}s".format(end-star)) 117 | 118 | if __name__=="__main__": 119 | main() -------------------------------------------------------------------------------- /codes/MyMaxEnt.py: -------------------------------------------------------------------------------- 1 | import time 2 | import numpy as np 3 | import pandas as pd 4 | from collections import defaultdict 5 | 6 | 7 | #根据文件路径读取Iris数据集数据 8 | #return type: list 9 | def processData(filePath): 10 | print('开始读取数据') 11 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 12 | X = [] 13 | y = [] 14 | #默认读取csv的头部 15 | df = pd.read_csv(filePath) 16 | #利用数据集合的第一个维度特征分类 17 | #遍历pandas中df的每一行 18 | for index, row in df.iterrows(): 19 | if(row["Sepal.Length"]>=5.5) : 20 | y.append(1) 21 | else: 22 | y.append(0) 23 | X.append([float(row["Sepal.Width"]),float(row["Petal.Length"])]) 24 | return X, y 25 | 26 | #最大熵类 27 | class maxEnt: 28 | def __init__(self, trainDataList, trainLabelList, testDataList, testLabelList): 29 | 30 | # 训练数据集 31 | self.trainDataList = trainDataList 32 | # 训练标签集 33 | self.trainLabelList = trainLabelList 34 | # 测试数据集 35 | self.testDataList = testDataList 36 | # 测试标签集 37 | self.testLabelList = testLabelList 38 | # 特征数量 39 | self.featureNum = len(trainDataList[0]) 40 | # 总训练集长度 41 | self.N = len(trainDataList) 42 | # 训练集中(xi,y)对数量 43 | self.n = 0 44 | # 训练集中(xi,y)对数量 45 | self.M = 10000 46 | # 所有(x, y)对出现的次数 47 | self.fixy = self.calc_fixy() 48 | # Pw(y|x)中的w 49 | self.w = [0] * self.n 50 | # (x, y)->id和id->(x, y)的搜索字典 51 | self.xy2idDict, self.id2xyDict = self.createSearchDict() 52 | # Ep_xy期望值 53 | self.Ep_xy = self.calcEp_xy() 54 | 55 | 56 | # 计算特征函数f(x, y) 57 | def calcEpxy(self): 58 | # 初始化期望存放列表,对于每一个xy对都有一个期望 59 | Epxy = [0] * self.n 60 | # 对于每一个样本进行遍历 61 | for i in range(self.N): 62 | # 初始化公式中的P(y|x)列表 63 | Pwxy = [0] * 2 64 | # 计算P(y = 0 } X) 65 | # 注:程序中X表示是一个样本的全部特征,x表示单个特征,这里是全部特征的一个样本 66 | Pwxy[0] = self.calcPwy_x(self.trainDataList[i], 0) 67 | # 计算P(y = 1 } X) 68 | Pwxy[1] = self.calcPwy_x(self.trainDataList[i], 1) 69 | 70 | for feature in range(self.featureNum): 71 | for y in range(2): 72 | if (self.trainDataList[i][feature], y) in self.fixy[feature]: 73 | id = self.xy2idDict[feature][(self.trainDataList[i][feature], y)] 74 | Epxy[id] += (1 / self.N) * Pwxy[y] 75 | return Epxy 76 | 77 | # 计算特征函数f(x, y) 78 | 
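# 补充说明(新增注释,非原作者所写):严格来说,下面的 calcEp_xy 计算的不是特征函数本身,
# 而是特征函数 f(x,y) 关于经验分布 P~(X,Y) 的期望 E_P~[f],即各 (x,y) 对在训练集中出现的频率;
# 它与上面 calcEpxy 得到的模型期望 E_P[f] 一起,用于 maxEntropyTrain 中按式 6.34 计算更新量
# sigmaList(即书中 IIS 的 δ_i)。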
# :return: 计算得到的Ep_xy 79 | def calcEp_xy(self): 80 | 81 | # 初始化Ep_xy列表,长度为n 82 | Ep_xy = [0] * self.n 83 | 84 | # 遍历每一个特征 85 | for feature in range(self.featureNum): 86 | # 遍历每个特征中的(x, y)对 87 | for (x, y) in self.fixy[feature]: 88 | # 获得其id 89 | id = self.xy2idDict[feature][(x, y)] 90 | # 将计算得到的Ep_xy写入对应的位置中 91 | # fixy中存放所有对在训练集中出现过的次数,处于训练集总长度N就是概率了 92 | Ep_xy[id] = self.fixy[feature][(x, y)] / self.N 93 | 94 | # 返回期望 95 | return Ep_xy 96 | 97 | 98 | # 创建查询字典 99 | # xy2idDict:通过(x, y)对找到其id, 所有出现过的xy对都有一个id 100 | # id2xyDict:通过id找到对应的(x, y)对 101 | def createSearchDict(self): 102 | # 设置xy搜多id字典 103 | # 不同特征的xy存入不同特征内的字典 104 | xy2idDict = [{} for i in range(self.featureNum)] 105 | # 初始化id到xy对的字典。因为id与(x,y)的指向是唯一的,所以可以使用一个字典 106 | id2xyDict = {} 107 | 108 | # 设置缩影,其实就是最后的id 109 | index = 0 110 | # 对特征进行遍历 111 | for feature in range(self.featureNum): 112 | # 对出现过的每一个(x, y)对进行遍历 113 | # fixy:内部存放特征数目个字典,对于遍历的每一个特征,单独读取对应字典内的(x, y)对 114 | for (x, y) in self.fixy[feature]: 115 | # 将该(x, y)对存入字典中,要注意存入时通过[feature]指定了存入哪个特征内部的字典 116 | # 同时将index作为该对的id号 117 | xy2idDict[feature][(x, y)] = index 118 | # 同时在id->xy字典中写入id号,val为(x, y)对 119 | id2xyDict[index] = (x, y) 120 | # id加一 121 | index += 1 122 | 123 | # 返回创建的两个字典 124 | return xy2idDict, id2xyDict 125 | 126 | # 计算(x, y)在训练集中出现过的次数 127 | def calc_fixy(self): 128 | # 建立特征数目个字典,属于不同特征的(x, y)对存入不同的字典中,保证不被混淆 129 | fixyDict = [defaultdict(int) for i in range(self.featureNum)] 130 | # 遍历训练集中所有样本 131 | for i in range(len(self.trainDataList)): 132 | # 遍历样本中所有特征 133 | for j in range(self.featureNum): 134 | # 将出现过的(x, y)对放入字典中并计数值加1 135 | fixyDict[j][(self.trainDataList[i][j], self.trainLabelList[i])] += 1 136 | # 对整个大字典进行计数,判断去重后还有多少(x, y)对,写入n 137 | for i in fixyDict: 138 | self.n += len(i) 139 | # 返回大字典 140 | return fixyDict 141 | 142 | # 计算得到的Pw(Y | X) 143 | def calcPwy_x(self, X, y): 144 | # 分子 145 | numerator = 0 146 | # 分母 147 | Z = 0 148 | # 对每个特征进行遍历 149 | for i in range(self.featureNum): 150 | # 如果该(xi,y)对在训练集中出现过 151 | if (X[i], y) in self.xy2idDict[i]: 152 | # 在xy->id字典中指定当前特征i,以及(x, y)对:(X[i], y),读取其id 153 | index = self.xy2idDict[i][(X[i], y)] 154 | # 分子是wi和fi(x,y)的连乘再求和,最后指数 155 | # 由于当(x, y)存在时fi(x,y)为1,因为xy对肯定存在,所以直接就是1 156 | # 对于分子来说,就是n个wi累加,最后再指数就可以了 157 | # 因为有n个w,所以通过id将w与xy绑定,前文的两个搜索字典中的id就是用在这里 158 | numerator += self.w[index] 159 | # 同时计算其他一种标签y时候的分子,下面的z并不是全部的分母,再加上上式的分子以后 160 | # 才是完整的分母,即z = z + numerator 161 | if (X[i], 1 - y) in self.xy2idDict[i]: 162 | # 原理与上式相同 163 | index = self.xy2idDict[i][(X[i], 1 - y)] 164 | Z += self.w[index] 165 | # 计算分子的指数 166 | numerator = np.exp(numerator) 167 | # 计算分母的z 168 | Z = np.exp(Z) + numerator 169 | # 返回Pw(y|x) 170 | return numerator / Z 171 | 172 | def maxEntropyTrain(self, iter=500): 173 | # 设置迭代次数寻找最优解 174 | for i in range(iter): 175 | # 单次迭代起始时间点 176 | iterStart = time.time() 177 | 178 | # 计算“6.2.3 最大熵模型的学习”中的第二个期望(83页最上方哪个) 179 | Epxy = self.calcEpxy() 180 | 181 | # 使用的是IIS,所以设置sigma列表 182 | sigmaList = [0] * self.n 183 | # 对于所有的n进行一次遍历 184 | for j in range(self.n): 185 | # 依据“6.3.1 改进的迭代尺度法” 式6.34计算 186 | sigmaList[j] = (1 / self.M) * np.log(self.Ep_xy[j] / Epxy[j]) 187 | 188 | # 按照算法6.1步骤二中的(b)更新w 189 | self.w = [self.w[i] + sigmaList[i] for i in range(self.n)] 190 | 191 | # 单次迭代结束 192 | iterEnd = time.time() 193 | 194 | # 预测标签 195 | def predict(self, X): 196 | # 因为y只有0和1,所有建立两个长度的概率列表 197 | result = [0] * 2 198 | # 循环计算两个概率 199 | for i in range(2): 200 | # 计算样本x的标签为i的概率 201 | result[i] = self.calcPwy_x(X, i) 202 | # 返回标签 203 | # max(result):找到result中最大的那个概率值 204 | # 
result.index(max(result)):通过最大的那个概率值再找到其索引,索引是0就返回0,1就返回1 205 | return result.index(max(result)) 206 | 207 | def test(self): 208 | # 错误值计数 209 | errorCnt = 0 210 | # 对测试集中所有样本进行遍历 211 | for i in range(len(self.testDataList)): 212 | # 预测该样本对应的标签 213 | result = self.predict(self.testDataList[i]) 214 | # 如果错误,计数值加1 215 | if result != self.testLabelList[i]: errorCnt += 1 216 | # 返回准确率 217 | return 1 - errorCnt / len(self.testDataList) 218 | 219 | 220 | if __name__ == '__main__': 221 | start = time.time() 222 | X, y = processData('iris.csv') 223 | 224 | X_train = X[0:149:30] 225 | y_train = y[0:149:30] 226 | 227 | # 自己在数据集后面加上了干扰的实例 228 | X_test = X[0:151:1] 229 | y_test = y[0:151:1] 230 | 231 | # 初始化最大熵类 232 | maxEnt = maxEnt(X_train, y_train, X_test, y_test) 233 | 234 | # 开始训练 235 | maxEnt.maxEntropyTrain() 236 | 237 | # 开始测试 238 | right_rate = maxEnt.test() 239 | print('准确度为:', right_rate) 240 | 241 | # 打印时间 242 | print('花费的时间为:', time.time() - start) 243 | -------------------------------------------------------------------------------- /codes/MyNaiveBayes.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | import pandas as pd 7 | 8 | class NaiveBayes(): 9 | def __init__(self,lambda_): 10 | # 贝叶斯系数 取0时,即为极大似然估计 11 | self.lambda_=lambda_ 12 | # y的(类型:数量) 13 | self.y_types_count=None 14 | # y的(类型:概率) 15 | self.y_types_proba=None 16 | # (xi 的编号,xi的取值,y的类型):概率 17 | self.x_types_proba=dict() 18 | 19 | def fit(self,X_train,y_train): 20 | # y的所有取值类型 21 | self.y_types=np.unique(y_train) 22 | # 转化成pandas df 数据格式 23 | X=pd.DataFrame(X_train) 24 | y=pd.DataFrame(y_train) 25 | # y的(类型:数量)统计 26 | self.y_types_count=y[0].value_counts() 27 | # y的(类型:概率)计算 28 | self.y_types_proba=(self.y_types_count+self.lambda_)/(y.shape[0]+len(self.y_types)*self.lambda_) 29 | 30 | # (xi 的编号,xi的取值,y的类型):概率的计算 - 遍历xi 31 | for idx in X.columns: 32 | # 选取每一个y的类型 33 | for j in self.y_types: 34 | # 选择所有y==j为真的数据点的第idx个特征的值,并对这些值进行(类型:数量)统计 35 | p_x_y=X[(y==j).values][idx].value_counts() 36 | # 计算(xi 的编号,xi的取值,y的类型):概率 37 | for i in p_x_y.index: 38 | self.x_types_proba[(idx,i,j)]=(p_x_y[i]+self.lambda_)/(self.y_types_count[j]+p_x_y.shape[0]*self.lambda_) 39 | 40 | def predict(self,X_new): 41 | res=[] 42 | # 遍历y的可能取值 43 | for y in self.y_types: 44 | # 计算y的先验概率P(Y=ck) 45 | p_y=self.y_types_proba[y] 46 | p_xy=1 47 | for idx,x in enumerate(X_new): 48 | # 计算P(X=(x1,x2...xd)/Y=ck) 49 | p_xy*=self.x_types_proba[(idx,x,y)] 50 | res.append(p_y*p_xy) 51 | for i in range(len(self.y_types)): 52 | print("[{}]对应概率:{:.2%}".format(self.y_types[i],res[i])) 53 | #返回最大后验概率对应的y值 54 | return self.y_types[np.argmax(res)] 55 | 56 | def processData(filePath): 57 | print('开始读取数据') 58 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 59 | X = [] 60 | y = [] 61 | #默认读取csv的头部 62 | df = pd.read_csv(filePath) 63 | #利用数据集合的第一个维度特征分类 64 | #遍历pandas中df的每一行 65 | for index, row in df.iterrows(): 66 | if(row["Sepal.Length"]>=5.5) : 67 | y.append(1) 68 | else: 69 | y.append(-1) 70 | X.append([float(row["Sepal.Width"]),str(row["Species"])]) 71 | return np.array(X), np.array(y) 72 | 73 | def main(): 74 | X, y = processData('iris.csv') 75 | X_train = X[0:149:4] 76 | y_train = y[0:149:4] 77 | 78 | 79 | clf=NaiveBayes(lambda_= 0.5) 80 | clf.fit(X_train,y_train) 81 | 82 | X_test=np.array([3.5,"setosa"]) 83 | y_predict=clf.predict(X_test) 84 | print("{}被分类为:{}".format(X_test,y_predict)) 85 | 86 | if __name__=="__main__": 87 | main() 
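# ----------------------------------------------------------------------
# 补充示例(新增内容,非原作者代码,仅为说明性草稿):
# fit() 中的平滑计算对应 README 4.2 节的贝叶斯估计公式
#     P_lambda(Y = c_k) = (N_k + lambda) / (N + K * lambda)
# lambda_ = 0 时退化为极大似然估计,lambda_ > 0 时可避免出现零概率。
# 下面的 _demo_smoothed_prior 用一组假设的类别计数直接验证该公式,
# _demo_smoothed_prior、demo_counts 等名称均为示例中的假设命名,与原数据集无关。
def _demo_smoothed_prior(lambda_=0.5):
    # 假设两个类别的样本数分别为 30 和 10(演示用数据)
    demo_counts = np.array([30, 10])
    N = demo_counts.sum()      # 样本总数 N
    K = len(demo_counts)       # 类别个数 K
    # 平滑后的先验概率,lambda_=0.5 时约为 [0.7439, 0.2561]
    return (demo_counts + lambda_) / (N + K * lambda_)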
-------------------------------------------------------------------------------- /codes/MyPerceptron.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | import pandas as pd 7 | 8 | #根据文件路径读取Iris数据集数据 9 | #return type: list 10 | def processData(filePath): 11 | print('开始读取数据') 12 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 13 | X = [] 14 | y = [] 15 | #默认读取csv的头部 16 | df = pd.read_csv(filePath) 17 | #利用数据集合的第一个维度特征分类 18 | #遍历pandas中df的每一行 19 | for index, row in df.iterrows(): 20 | if(row["Sepal.Length"]>=5.5) : 21 | y.append(1) 22 | else: 23 | y.append(-1) 24 | X.append([float(row["Sepal.Width"]),float(row["Petal.Length"])]) 25 | return X, y 26 | 27 | 28 | #感知机类 29 | class MyPerceptron: 30 | def __init__(self): 31 | # 参数w 32 | self.w = None 33 | # 偏置b 34 | self.b = 0 35 | # 表示学习速率 36 | self.l_rate = 0.0001 37 | #表示迭代次数 38 | self.iter = 100 39 | 40 | #训练 41 | def train(self, X_train, y_train): 42 | print('开始训练') 43 | # 将数据转换成矩阵形式 44 | # 转换后的数据中每一个样本的向量都是横向的 45 | X_trainMat = np.mat(X_train) 46 | y_trainMat = np.mat(y_train).T 47 | # 获取数据矩阵的大小,为m*n 48 | m, n = np.shape(X_trainMat) 49 | #np.shape(X_trainMat)[1]表示的维度=样本的长度 50 | self.w = np.zeros((1, np.shape(X_trainMat)[1])) 51 | 52 | # 进行iter次迭代计算 53 | for k in range(self.iter): 54 | ##利用随机梯度下降 55 | for i in range(m): 56 | # 获取当前样本的向量 57 | xi = X_trainMat[i] 58 | # 获取当前样本所对应的标签 59 | yi = y_trainMat[i] 60 | # 判断是否是误分类样本 61 | # 误分类样本特诊为: -yi(w*xi+b)>=0,详细可参考书中2.2.2小节 62 | # 在书的公式中写的是>0,实际上如果=0,说明改点在超平面上,也是不正确的 63 | if -1 * yi * (self.w * xi.T + self.b) >= 0: 64 | # 对于误分类样本,进行梯度下降,更新w和b 65 | self.w = self.w + self.l_rate * yi * xi 66 | self.b = self.b + self.l_rate * yi 67 | 68 | #测试 69 | def predict(self,X_test, y_test): 70 | print('开始预测') 71 | X_testMat = np.mat(X_test) 72 | y_testMat = np.mat(y_test).T 73 | 74 | #获取测试数据集矩阵的大小 75 | m, n = np.shape(X_testMat) 76 | #错误样本数计数 77 | rightCount = 0 78 | 79 | for i in range(m): 80 | #获得单个样本向量 81 | xi = X_testMat[i] 82 | #获得该样本标记 83 | yi = y_testMat[i] 84 | #获得运算结果 85 | result = yi * (self.w * xi.T + self.b) 86 | #如果-yi(w*xi+b)>=0,说明该样本被误分类,错误样本数加一 87 | if result >= 0: rightCount += 1 88 | #正确率 = 1 - (样本分类错误数 / 样本总数) 89 | rightRate = rightCount / m 90 | #返回正确率 91 | return rightRate 92 | 93 | 94 | def main(): 95 | X,y = processData('iris.csv') 96 | 97 | # 构建感知机对象,对数据集训练并且预测 98 | perceptron=MyPerceptron() 99 | perceptron.train(X[0:100],y[0:100]) 100 | rightRate = perceptron.predict(X[101:140],y[101:140]) 101 | print('对测试集的分类的正确率为:',rightRate) 102 | #有二维输入,所以应该有2个w 103 | print('模型的参数w为:',perceptron.w) 104 | print('模型的参数b为',perceptron.b) 105 | 106 | 107 | if __name__ == '__main__': 108 | main() -------------------------------------------------------------------------------- /codes/MySVM.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | import pandas as pd 7 | import math 8 | import random 9 | 10 | #根据文件路径读取Iris数据集数据 11 | # #return type: np.array 12 | def processData(filePath): 13 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 14 | X = [] 15 | y = [] 16 | #默认读取csv的头部 17 | df = pd.read_csv(filePath) 18 | #利用数据集合的第一个维度特征分类 19 | #遍历pandas中df的每一行 20 | for index, row in df.iterrows(): 21 | if row["Species"] == "setosa" : 22 | y.append(1) 23 | else: 24 | y.append(-1) 25 | 
X.append([float(row["Sepal.Length"]),float(row["Sepal.Width"]),float(row["Petal.Length"])]) 26 | return np.array(X), np.array(y) 27 | 28 | 29 | # X_train:训练数据集 30 | # y_train: 训练测试集 31 | # sigma: 高斯核中分母的σ,在核函数中σ的值,高度依赖样本特征值范围,特征值范围较大时若不相应增大σ会导致所有计算得到的核函数均为0 32 | # C:软间隔中的惩罚参数,调和间隔与误分类点的系数 33 | # toler:松弛变量 34 | class SVM: 35 | def __init__(self, X_train, y_train, sigma = 10, C = 200, toler = 0.001): 36 | 37 | self.train_XMat = np.mat(X_train) 38 | # 训练标签集,为了方便后续运算提前做了转置,变为列向量 39 | self.train_yMat = np.mat(y_train).T 40 | # m:训练集数量 n:样本特征数目 41 | self.m, self.n = np.shape(self.train_XMat) 42 | self.sigma = sigma 43 | self.C = C 44 | self.toler = toler 45 | 46 | # 核函数(初始化时提前计算) 47 | self.k = self.calculateKernel() 48 | # SVM中的偏置b 49 | self.b = 0 50 | # α 长度为训练集数目 51 | self.alpha = [0] * self.train_XMat.shape[0] 52 | # SMO运算过程中的Ei 53 | self.E = [0 * self.train_yMat[i, 0] for i in range(self.train_yMat.shape[0])] 54 | self.supportVecIndex = [] 55 | 56 | 57 | # 使用高斯核函数 58 | def calculateKernel(self): 59 | #初始化高斯核结果矩阵 大小 = 训练集长度m * 训练集长度m 60 | #k[i][j] = Xi * Xj 61 | k = [[0 for i in range(self.m)] for j in range(self.m)] 62 | for i in range(self.m): 63 | X = self.train_XMat[i, :] 64 | for j in range(i, self.m): 65 | Z = self.train_XMat[j, :] 66 | #先计算||X - Z||^2 67 | result = (X - Z) * (X - Z).T 68 | #分子除以分母后去指数,得到高斯核结果 69 | result = np.exp(-1 * result / (2 * self.sigma**2)) 70 | #将Xi*Xj的结果存放入k[i][j]和k[j][i]中 71 | k[i][j] = result 72 | k[j][i] = result 73 | return k 74 | 75 | # 查看第i个α是否满足KKT条件 76 | def isSatisfyKKT(self, i): 77 | gxi =self.calculate_gxi(i) 78 | yi = self.train_yMat[i] 79 | if (math.fabs(self.alpha[i]) < self.toler) and (yi * gxi >= 1): 80 | return True 81 | elif (math.fabs(self.alpha[i] - self.C) < self.toler) and (yi * gxi <= 1): 82 | return True 83 | elif (self.alpha[i] > -self.toler) and (self.alpha[i] < (self.C + self.toler)) \ 84 | and (math.fabs(yi * gxi - 1) < self.toler): 85 | return True 86 | 87 | return False 88 | 89 | def calculate_gxi(self, i): 90 | gxi = 0 91 | index = [i for i, alpha in enumerate(self.alpha) if alpha != 0] 92 | # 遍历每一个非零α,i为非零α的下标 93 | for j in index: 94 | #计算g(xi) 95 | gxi += self.alpha[j] * self.train_yMat[j] * self.k[j][i] 96 | # 求和结束后再单独加上偏置b 97 | gxi += self.b 98 | 99 | #返回 100 | return gxi 101 | 102 | def calculateEi(self, i): 103 | # 计算g(xi) 104 | gxi = self.calculate_gxi(i) 105 | # Ei = g(xi) - yi,直接将结果作为Ei返回 106 | return gxi - self.train_yMat[i] 107 | 108 | 109 | # E1: 第一个变量的E1 110 | # i: 第一个变量α的下标 111 | def getAlphaJ(self, E1, i): 112 | E2 = 0 113 | maxE1_E2 = -1 114 | maxIndex = -1 115 | nozeroE = [i for i, Ei in enumerate(self.E) if Ei != 0] 116 | 117 | for j in nozeroE: 118 | E2_tmp = self.calculateEi(j) 119 | if math.fabs(E1 - E2_tmp) > maxE1_E2: 120 | #更新 121 | maxE1_E2 = math.fabs(E1 - E2_tmp) 122 | E2 = E2_tmp 123 | maxIndex = j 124 | if maxIndex == -1: 125 | maxIndex = i 126 | while maxIndex == i: 127 | maxIndex = int(random.uniform(0, self.m)) 128 | E2 = self.calculateEi(maxIndex) 129 | return E2, maxIndex 130 | 131 | def train(self, count = 100): 132 | countCur = 0; parameterChanged = 1 133 | while (countCur < count) and (parameterChanged > 0): 134 | countCur += 1 135 | parameterChanged = 0 136 | 137 | for i in range(self.m): 138 | #是否满足KKT条件,如果不满足则作为SMO中第一个变量从而进行优化 139 | if self.isSatisfyKKT(i) == False: 140 | #如果下标为i的α不满足KKT条件,则进行优化 141 | E1 = self.calculateEi(i) 142 | E2, j = self.getAlphaJ(E1, i) 143 | 144 | y1 = self.train_yMat[i] 145 | y2 = self.train_yMat[j] 146 | 147 | alphaOld_1 = self.alpha[i] 148 | alphaOld_2 = 
self.alpha[j] 149 | 150 | if y1 != y2: 151 | L = max(0, alphaOld_2 - alphaOld_1) 152 | H = min(self.C, self.C + alphaOld_2 - alphaOld_1) 153 | else: 154 | L = max(0, alphaOld_2 + alphaOld_1 - self.C) 155 | H = min(self.C, alphaOld_2 + alphaOld_1) 156 | 157 | if L == H: 158 | continue 159 | 160 | #计算α的新值 161 | k11 = self.k[i][i] 162 | k22 = self.k[j][j] 163 | k21 = self.k[j][i] 164 | k12 = self.k[i][j] 165 | 166 | alphaNew_2 = alphaOld_2 + y2 * (E1 - E2) / (k11 + k22 - 2 * k12) 167 | 168 | if alphaNew_2 < L: alphaNew_2 = L 169 | elif alphaNew_2 > H: alphaNew_2 = H 170 | #更新α1 171 | alphaNew_1 = alphaOld_1 + y1 * y2 * (alphaOld_2 - alphaNew_2) 172 | 173 | #计算b1和b2 174 | b1New = -1 * E1 - y1 * k11 * (alphaNew_1 - alphaOld_1) \ 175 | - y2 * k21 * (alphaNew_2 - alphaOld_2) + self.b 176 | b2New = -1 * E2 - y1 * k12 * (alphaNew_1 - alphaOld_1) \ 177 | - y2 * k22 * (alphaNew_2 - alphaOld_2) + self.b 178 | 179 | #依据α1和α2的值范围确定新b 180 | if (alphaNew_1 > 0) and (alphaNew_1 < self.C): 181 | bNew = b1New 182 | elif (alphaNew_2 > 0) and (alphaNew_2 < self.C): 183 | bNew = b2New 184 | else: 185 | bNew = (b1New + b2New) / 2 186 | 187 | #将更新后的各类值写入,进行更新 188 | self.alpha[i] = alphaNew_1 189 | self.alpha[j] = alphaNew_2 190 | self.b = bNew 191 | 192 | self.E[i] = self.calculateEi(i) 193 | self.E[j] = self.calculateEi(j) 194 | 195 | #如果α2的改变量过于小,就认为该参数未改变,不增加parameterChanged值 196 | #反之则自增1 197 | if math.fabs(alphaNew_2 - alphaOld_2) >= 0.00001: 198 | parameterChanged += 1 199 | 200 | #全部计算结束后,重新遍历一遍α,查找里面的支持向量 201 | for i in range(self.m): 202 | #如果α>0,说明是支持向量 203 | if self.alpha[i] > 0: 204 | #将支持向量的索引保存起来 205 | self.supportVecIndex.append(i) 206 | 207 | # 单独计算核函数 208 | def calculateSinglKernel(self, x1, x2): 209 | # 计算高斯核 210 | result = (x1 - x2) * (x1 - x2).T 211 | result = np.exp(-1 * result / (2 * self.sigma ** 2)) 212 | return np.exp(result) 213 | 214 | # 对样本的标签进行预测 215 | def predict(self, x): 216 | result = 0 217 | for i in self.supportVecIndex: 218 | # 遍历所有支持向量,计算求和式 219 | tmp = self.calculateSinglKernel(self.train_XMat[i, :], np.mat(x)) 220 | result += self.alpha[i] * self.train_yMat[i] * tmp 221 | # 偏置b 222 | result += self.b 223 | 224 | return np.sign(result) 225 | 226 | 227 | 228 | def test(self, X_test, y_test): 229 | 230 | rightCount = 0 231 | 232 | for i in range(len(X_test)): 233 | result = self.predict(X_test[i]) 234 | if result == y_test[i]: 235 | rightCount += 1 236 | return rightCount / len(X_test) 237 | 238 | 239 | if __name__ == '__main__': 240 | 241 | X, y = processData('iris.csv') 242 | 243 | X_train = X[0:149:50] 244 | y_train = y[0:149:50] 245 | 246 | # 自己在数据集后面加上了干扰的实例 247 | X_test = X[0:150:1] 248 | y_test = y[0:150:1] 249 | 250 | # 初始化SVM类 251 | svm = SVM(X_train, y_train, 10, 200, 0.001) 252 | 253 | # 开始训练 254 | svm.train() 255 | 256 | # 开始测试 257 | rightRate = svm.test(X_test, y_test) 258 | print('准确率为百分之 %d' % (rightRate * 100)) -------------------------------------------------------------------------------- /codes/iris.csv: -------------------------------------------------------------------------------- 1 | "Number","Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species" 2 | "1",5.1,3.5,1.4,0.2,"setosa" 3 | "2",4.9,3,1.4,0.2,"setosa" 4 | "3",4.7,3.2,1.3,0.2,"setosa" 5 | "4",4.6,3.1,1.5,0.2,"setosa" 6 | "5",5,3.6,1.4,0.2,"setosa" 7 | "6",5.4,3.9,1.7,0.4,"setosa" 8 | "7",4.6,3.4,1.4,0.3,"setosa" 9 | "8",5,3.4,1.5,0.2,"setosa" 10 | "9",4.4,2.9,1.4,0.2,"setosa" 11 | "10",4.9,3.1,1.5,0.1,"setosa" 12 | "11",5.4,3.7,1.5,0.2,"setosa" 13 | "12",4.8,3.4,1.6,0.2,"setosa" 14 | 
"13",4.8,3,1.4,0.1,"setosa" 15 | "14",4.3,3,1.1,0.1,"setosa" 16 | "15",5.8,4,1.2,0.2,"setosa" 17 | "16",5.7,4.4,1.5,0.4,"setosa" 18 | "17",5.4,3.9,1.3,0.4,"setosa" 19 | "18",5.1,3.5,1.4,0.3,"setosa" 20 | "19",5.7,3.8,1.7,0.3,"setosa" 21 | "20",5.1,3.8,1.5,0.3,"setosa" 22 | "21",5.4,3.4,1.7,0.2,"setosa" 23 | "22",5.1,3.7,1.5,0.4,"setosa" 24 | "23",4.6,3.6,1,0.2,"setosa" 25 | "24",5.1,3.3,1.7,0.5,"setosa" 26 | "25",4.8,3.4,1.9,0.2,"setosa" 27 | "26",5,3,1.6,0.2,"setosa" 28 | "27",5,3.4,1.6,0.4,"setosa" 29 | "28",5.2,3.5,1.5,0.2,"setosa" 30 | "29",5.2,3.4,1.4,0.2,"setosa" 31 | "30",4.7,3.2,1.6,0.2,"setosa" 32 | "31",4.8,3.1,1.6,0.2,"setosa" 33 | "32",5.4,3.4,1.5,0.4,"setosa" 34 | "33",5.2,4.1,1.5,0.1,"setosa" 35 | "34",5.5,4.2,1.4,0.2,"setosa" 36 | "35",4.9,3.1,1.5,0.2,"setosa" 37 | "36",5,3.2,1.2,0.2,"setosa" 38 | "37",5.5,3.5,1.3,0.2,"setosa" 39 | "38",4.9,3.6,1.4,0.1,"setosa" 40 | "39",4.4,3,1.3,0.2,"setosa" 41 | "40",5.1,3.4,1.5,0.2,"setosa" 42 | "41",5,3.5,1.3,0.3,"setosa" 43 | "42",4.5,2.3,1.3,0.3,"setosa" 44 | "43",4.4,3.2,1.3,0.2,"setosa" 45 | "44",5,3.5,1.6,0.6,"setosa" 46 | "45",5.1,3.8,1.9,0.4,"setosa" 47 | "46",4.8,3,1.4,0.3,"setosa" 48 | "47",5.1,3.8,1.6,0.2,"setosa" 49 | "48",4.6,3.2,1.4,0.2,"setosa" 50 | "49",5.3,3.7,1.5,0.2,"setosa" 51 | "50",5,3.3,1.4,0.2,"setosa" 52 | "51",7,3.2,4.7,1.4,"versicolor" 53 | "52",6.4,3.2,4.5,1.5,"versicolor" 54 | "53",6.9,3.1,4.9,1.5,"versicolor" 55 | "54",5.5,2.3,4,1.3,"versicolor" 56 | "55",6.5,2.8,4.6,1.5,"versicolor" 57 | "56",5.7,2.8,4.5,1.3,"versicolor" 58 | "57",6.3,3.3,4.7,1.6,"versicolor" 59 | "58",4.9,2.4,3.3,1,"versicolor" 60 | "59",6.6,2.9,4.6,1.3,"versicolor" 61 | "60",5.2,2.7,3.9,1.4,"versicolor" 62 | "61",5,2,3.5,1,"versicolor" 63 | "62",5.9,3,4.2,1.5,"versicolor" 64 | "63",6,2.2,4,1,"versicolor" 65 | "64",6.1,2.9,4.7,1.4,"versicolor" 66 | "65",5.6,2.9,3.6,1.3,"versicolor" 67 | "66",6.7,3.1,4.4,1.4,"versicolor" 68 | "67",5.6,3,4.5,1.5,"versicolor" 69 | "68",5.8,2.7,4.1,1,"versicolor" 70 | "69",6.2,2.2,4.5,1.5,"versicolor" 71 | "70",5.6,2.5,3.9,1.1,"versicolor" 72 | "71",5.9,3.2,4.8,1.8,"versicolor" 73 | "72",6.1,2.8,4,1.3,"versicolor" 74 | "73",6.3,2.5,4.9,1.5,"versicolor" 75 | "74",6.1,2.8,4.7,1.2,"versicolor" 76 | "75",6.4,2.9,4.3,1.3,"versicolor" 77 | "76",6.6,3,4.4,1.4,"versicolor" 78 | "77",6.8,2.8,4.8,1.4,"versicolor" 79 | "78",6.7,3,5,1.7,"versicolor" 80 | "79",6,2.9,4.5,1.5,"versicolor" 81 | "80",5.7,2.6,3.5,1,"versicolor" 82 | "81",5.5,2.4,3.8,1.1,"versicolor" 83 | "82",5.5,2.4,3.7,1,"versicolor" 84 | "83",5.8,2.7,3.9,1.2,"versicolor" 85 | "84",6,2.7,5.1,1.6,"versicolor" 86 | "85",5.4,3,4.5,1.5,"versicolor" 87 | "86",6,3.4,4.5,1.6,"versicolor" 88 | "87",6.7,3.1,4.7,1.5,"versicolor" 89 | "88",6.3,2.3,4.4,1.3,"versicolor" 90 | "89",5.6,3,4.1,1.3,"versicolor" 91 | "90",5.5,2.5,4,1.3,"versicolor" 92 | "91",5.5,2.6,4.4,1.2,"versicolor" 93 | "92",6.1,3,4.6,1.4,"versicolor" 94 | "93",5.8,2.6,4,1.2,"versicolor" 95 | "94",5,2.3,3.3,1,"versicolor" 96 | "95",5.6,2.7,4.2,1.3,"versicolor" 97 | "96",5.7,3,4.2,1.2,"versicolor" 98 | "97",5.7,2.9,4.2,1.3,"versicolor" 99 | "98",6.2,2.9,4.3,1.3,"versicolor" 100 | "99",5.1,2.5,3,1.1,"versicolor" 101 | "100",5.7,2.8,4.1,1.3,"versicolor" 102 | "101",6.3,3.3,6,2.5,"virginica" 103 | "102",5.8,2.7,5.1,1.9,"virginica" 104 | "103",7.1,3,5.9,2.1,"virginica" 105 | "104",6.3,2.9,5.6,1.8,"virginica" 106 | "105",6.5,3,5.8,2.2,"virginica" 107 | "106",7.6,3,6.6,2.1,"virginica" 108 | "107",4.9,2.5,4.5,1.7,"virginica" 109 | "108",7.3,2.9,6.3,1.8,"virginica" 110 | "109",6.7,2.5,5.8,1.8,"virginica" 111 | 
"110",7.2,3.6,6.1,2.5,"virginica" 112 | "111",6.5,3.2,5.1,2,"virginica" 113 | "112",6.4,2.7,5.3,1.9,"virginica" 114 | "113",6.8,3,5.5,2.1,"virginica" 115 | "114",5.7,2.5,5,2,"virginica" 116 | "115",5.8,2.8,5.1,2.4,"virginica" 117 | "116",6.4,3.2,5.3,2.3,"virginica" 118 | "117",6.5,3,5.5,1.8,"virginica" 119 | "118",7.7,3.8,6.7,2.2,"virginica" 120 | "119",7.7,2.6,6.9,2.3,"virginica" 121 | "120",6,2.2,5,1.5,"virginica" 122 | "121",6.9,3.2,5.7,2.3,"virginica" 123 | "122",5.6,2.8,4.9,2,"virginica" 124 | "123",7.7,2.8,6.7,2,"virginica" 125 | "124",6.3,2.7,4.9,1.8,"virginica" 126 | "125",6.7,3.3,5.7,2.1,"virginica" 127 | "126",7.2,3.2,6,1.8,"virginica" 128 | "127",6.2,2.8,4.8,1.8,"virginica" 129 | "128",6.1,3,4.9,1.8,"virginica" 130 | "129",6.4,2.8,5.6,2.1,"virginica" 131 | "130",7.2,3,5.8,1.6,"virginica" 132 | "131",7.4,2.8,6.1,1.9,"virginica" 133 | "132",7.9,3.8,6.4,2,"virginica" 134 | "133",6.4,2.8,5.6,2.2,"virginica" 135 | "134",6.3,2.8,5.1,1.5,"virginica" 136 | "135",6.1,2.6,5.6,1.4,"virginica" 137 | "136",7.7,3,6.1,2.3,"virginica" 138 | "137",6.3,3.4,5.6,2.4,"virginica" 139 | "138",6.4,3.1,5.5,1.8,"virginica" 140 | "139",6,3,4.8,1.8,"virginica" 141 | "140",6.9,3.1,5.4,2.1,"virginica" 142 | "141",6.7,3.1,5.6,2.4,"virginica" 143 | "142",6.9,3.1,5.1,2.3,"virginica" 144 | "143",5.8,2.7,5.1,1.9,"virginica" 145 | "144",6.8,3.2,5.9,2.3,"virginica" 146 | "145",6.7,3.3,5.7,2.5,"virginica" 147 | "146",6.7,3,5.2,2.3,"virginica" 148 | "147",6.3,2.5,5,1.9,"virginica" 149 | "148",6.5,3,5.2,2,"virginica" 150 | "149",6.2,3.4,5.4,2.3,"virginica" 151 | "150",5.9,3,5.1,1.8,"virginica" 152 | "151",52.9,322,52.221,1212.8,"virginica" 153 | 154 | -------------------------------------------------------------------------------- /docImage/10_1_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/10_1_1.jpg -------------------------------------------------------------------------------- /docImage/10_2_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/10_2_1.jpg -------------------------------------------------------------------------------- /docImage/11_1_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/11_1_1.jpg -------------------------------------------------------------------------------- /docImage/11_1_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/11_1_2.jpg -------------------------------------------------------------------------------- /docImage/11_2_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/11_2_1.jpg -------------------------------------------------------------------------------- /docImage/11_2_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/11_2_2.jpg 
-------------------------------------------------------------------------------- /docImage/11_2_3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/11_2_3.jpg -------------------------------------------------------------------------------- /docImage/8_1_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_1_1.jpg -------------------------------------------------------------------------------- /docImage/8_1_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_1_2.jpg -------------------------------------------------------------------------------- /docImage/8_1_3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_1_3.jpg -------------------------------------------------------------------------------- /docImage/8_2_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_2_1.jpg -------------------------------------------------------------------------------- /docImage/8_2_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_2_2.jpg -------------------------------------------------------------------------------- /docImage/8_2_3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_2_3.jpg -------------------------------------------------------------------------------- /docImage/8_2_4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_2_4.jpg -------------------------------------------------------------------------------- /docImage/9_1_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/9_1_1.jpg -------------------------------------------------------------------------------- /docImage/9_1_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/9_1_2.jpg -------------------------------------------------------------------------------- /docImage/9_2_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/9_2_1.jpg -------------------------------------------------------------------------------- /docImage/9_2_2.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/9_2_2.jpg -------------------------------------------------------------------------------- /docImage/9_2_3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/9_2_3.jpg -------------------------------------------------------------------------------- /docImage/Maximum_separation_hyperplane1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Maximum_separation_hyperplane1.jpg -------------------------------------------------------------------------------- /docImage/Maximum_separation_hyperplane2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Maximum_separation_hyperplane2.jpg -------------------------------------------------------------------------------- /docImage/Maximum_separation_hyperplane3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Maximum_separation_hyperplane3.jpg -------------------------------------------------------------------------------- /docImage/Maximum_separation_hyperplane4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Maximum_separation_hyperplane4.jpg -------------------------------------------------------------------------------- /docImage/Novikoff1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Novikoff1.jpg -------------------------------------------------------------------------------- /docImage/Novikoff2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Novikoff2.jpg -------------------------------------------------------------------------------- /docImage/Novikoff3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Novikoff3.jpg -------------------------------------------------------------------------------- /docImage/Soft_interval_maximization_dual1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Soft_interval_maximization_dual1.jpg -------------------------------------------------------------------------------- /docImage/Soft_interval_maximization_dual2.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Soft_interval_maximization_dual2.jpg -------------------------------------------------------------------------------- /docImage/Soft_interval_maximization_dual3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Soft_interval_maximization_dual3.jpg -------------------------------------------------------------------------------- /docImage/bayes_naive_bayes1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/bayes_naive_bayes1.jpg -------------------------------------------------------------------------------- /docImage/bayes_naive_bayes2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/bayes_naive_bayes2.jpg -------------------------------------------------------------------------------- /docImage/bayesian_estimation.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/bayesian_estimation.jpg -------------------------------------------------------------------------------- /docImage/hoeffding1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/hoeffding1.jpg -------------------------------------------------------------------------------- /docImage/hoeffding2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/hoeffding2.jpg -------------------------------------------------------------------------------- /docImage/iterative_method1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/iterative_method1.jpg -------------------------------------------------------------------------------- /docImage/iterative_method2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/iterative_method2.jpg -------------------------------------------------------------------------------- /docImage/iterative_method3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/iterative_method3.jpg -------------------------------------------------------------------------------- /docImage/lagrange_duality1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/lagrange_duality1.jpg 
-------------------------------------------------------------------------------- /docImage/lagrange_duality2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/lagrange_duality2.jpg -------------------------------------------------------------------------------- /docImage/lagrange_duality3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/lagrange_duality3.jpg -------------------------------------------------------------------------------- /docImage/maximum_entropy1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/maximum_entropy1.jpg -------------------------------------------------------------------------------- /docImage/maximum_entropy2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/maximum_entropy2.jpg -------------------------------------------------------------------------------- /docImage/maximum_likelihood_estimation.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/maximum_likelihood_estimation.jpg -------------------------------------------------------------------------------- /docImage/mle_naive_bayes.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/mle_naive_bayes.jpg -------------------------------------------------------------------------------- /docImage/poster_prob1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/poster_prob1.jpg -------------------------------------------------------------------------------- /docImage/poster_prob2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/poster_prob2.jpg -------------------------------------------------------------------------------- /notes/chapter1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter1.pdf -------------------------------------------------------------------------------- /notes/chapter10.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter10.pdf -------------------------------------------------------------------------------- /notes/chapter11.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter11.pdf -------------------------------------------------------------------------------- /notes/chapter2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter2.pdf -------------------------------------------------------------------------------- /notes/chapter3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter3.pdf -------------------------------------------------------------------------------- /notes/chapter4.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter4.pdf -------------------------------------------------------------------------------- /notes/chapter5.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter5.pdf -------------------------------------------------------------------------------- /notes/chapter6.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter6.pdf -------------------------------------------------------------------------------- /notes/chapter7.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter7.pdf -------------------------------------------------------------------------------- /notes/chapter8.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter8.pdf -------------------------------------------------------------------------------- /notes/chapter9.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter9.pdf --------------------------------------------------------------------------------
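
The data file reproduced above (iris.csv) is the input consumed by the model implementations under codes/. As a minimal sketch that is not part of the repository: assuming the CSV has a header row, that its columns follow the layout visible in the dump (a quoted row index, four numeric measurements, and a species label), that the file sits at codes/iris.csv, and that pandas is installed, it could be loaded roughly as follows.

    # Hypothetical loader sketch (not from the repository); assumes pandas is
    # installed, the CSV has a header row, and the relative path codes/iris.csv.
    import pandas as pd

    # The first column in the dump is a quoted row index, so use it as the DataFrame index.
    df = pd.read_csv("codes/iris.csv", index_col=0)

    # Split the four numeric measurement columns from the species label column.
    X = df.iloc[:, :4].to_numpy()   # shape: (n_samples, 4)
    y = df.iloc[:, 4].to_numpy()    # species labels, e.g. "setosa"

    print(X.shape, y[:5])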