├── .gitignore ├── README.md ├── codes ├── MyAdaBoost.py ├── MyDecisionTree.py ├── MyEM.py ├── MyHMM.py ├── MyHMMTestData.txt ├── MyHMMTrainData.txt ├── MyKNN.py ├── MyLogisticRegression.py ├── MyMaxEnt.py ├── MyNaiveBayes.py ├── MyPerceptron.py ├── MySVM.py └── iris.csv ├── docImage ├── 10_1_1.jpg ├── 10_2_1.jpg ├── 11_1_1.jpg ├── 11_1_2.jpg ├── 11_2_1.jpg ├── 11_2_2.jpg ├── 11_2_3.jpg ├── 8_1_1.jpg ├── 8_1_2.jpg ├── 8_1_3.jpg ├── 8_2_1.jpg ├── 8_2_2.jpg ├── 8_2_3.jpg ├── 8_2_4.jpg ├── 9_1_1.jpg ├── 9_1_2.jpg ├── 9_2_1.jpg ├── 9_2_2.jpg ├── 9_2_3.jpg ├── Maximum_separation_hyperplane1.jpg ├── Maximum_separation_hyperplane2.jpg ├── Maximum_separation_hyperplane3.jpg ├── Maximum_separation_hyperplane4.jpg ├── Novikoff1.jpg ├── Novikoff2.jpg ├── Novikoff3.jpg ├── Soft_interval_maximization_dual1.jpg ├── Soft_interval_maximization_dual2.jpg ├── Soft_interval_maximization_dual3.jpg ├── bayes_naive_bayes1.jpg ├── bayes_naive_bayes2.jpg ├── bayesian_estimation.jpg ├── hoeffding1.jpg ├── hoeffding2.jpg ├── iterative_method1.jpg ├── iterative_method2.jpg ├── iterative_method3.jpg ├── lagrange_duality1.jpg ├── lagrange_duality2.jpg ├── lagrange_duality3.jpg ├── maximum_entropy1.jpg ├── maximum_entropy2.jpg ├── maximum_likelihood_estimation.jpg ├── mle_naive_bayes.jpg ├── poster_prob1.jpg └── poster_prob2.jpg └── notes ├── chapter1.pdf ├── chapter10.pdf ├── chapter11.pdf ├── chapter2.pdf ├── chapter3.pdf ├── chapter4.pdf ├── chapter5.pdf ├── chapter6.pdf ├── chapter7.pdf ├── chapter8.pdf └── chapter9.pdf /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | .DS_Store 3 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Statistic-study-notes 2 | # 李航统计学习方法(第二版)的学习笔记,包括: 3 | ## 1、每章重点数学公式的手动推导 4 |
均为手写然后扫描成图片,字迹不工整还望谅解,之后有时间会用Latex修正 5 | 点击数学公式没有出现图片的情况 需要搭梯子才可以在线预览到数学推导的图片... 6 | 7 | - [1.第一章数学公式推导](#1第一章数学公式推导) 8 | - [1.1 极大似然估计推导](#11极大似然估计推导) 9 | - [1.2 贝叶斯估计推导](#12贝叶斯估计推导) 10 | - [1.3 利用Hoeffding推导泛化误差上界](#13利用Hoeffding推导泛化误差上界) 11 | - [2.第二章数学公式推导](#2第二章数学公式推导) 12 | - [2.1 算法的收敛性证明Novikoff](#21算法的收敛性证明Novikoff) 13 | - [3.第三章数学公式推导](#3第三章数学公式推导) 14 | - 3.1 无数学推导,偏重算法实现-KNN 15 | - [4.第四章数学公式推导](#4第四章数学公式推导) 16 | - [4.1 用极大似然法估计朴素贝叶斯参数](#41用极大似然法估计朴素贝叶斯参数) 17 | - [4.2 用贝叶斯估计法朴素贝叶斯参数](#42用贝叶斯估计法朴素贝叶斯参数) 18 | - [4.3 证明后验概率最大化即期望风险最小化](#43证明后验概率最大化即期望风险最小化) 19 | - [5.第五章数学公式推导](#5第五章数学公式推导) 20 | - 5.1 无数学推导,偏重算法实现-决策树 21 | - [6.第六章数学公式推导](#6第六章数学公式推导) 22 | - [6.1 最大熵模型的数学推导](#61最大熵模型的数学推导) 23 | - [6.2 拉格朗日对偶性问题的数学推导](#62拉格朗日对偶性问题的数学推导) 24 | - [6.3 改进的迭代尺度法数学推导](#63改进的迭代尺度法数学推导) 25 | - [7.第七章数学公式推导](#7第七章数学公式推导) 26 | - [7.1 软间隔最大化对偶问题](#71软间隔最大化对偶问题) 27 | - [7.2 证明最大间隔分离超平面存在唯一性](#72证明最大间隔分离超平面存在唯一性) 28 | - [8.第八章数学公式推导](#8第八章数学公式推导) 29 | - [8.1 证明AdaBoost是前向分步加法算法的特例](#81证明AdaBoost是前向分步加法算法的特例) 30 | - [8.2 证明AdaBoost的训练误差界](#82证明AdaBoost的训练误差界) 31 | - [9.第九章数学公式推导](#9第九章数学公式推导) 32 | - [9.1 EM算法的导出](#91EM算法的导出) 33 | - [9.2 用EM算法估计高斯模混合模型](#92用EM算法估计高斯模混合模型) 34 | - [10.第十章数学公式推导](#10第十章数学公式推导) 35 | - [10.1 前向算法两个公式的证明](#101前向算法两个公式的证明) 36 | - [10.2 维特比算法推导](#102维特比算法推导) 37 | - [11.第十一章数学公式推导](#11第十一章数学公式推导) 38 | - [11.1 条件随机场的矩阵形式推导](#111条件随机场的矩阵形式推导) 39 | - [11.2 牛顿法和拟牛顿法的推导](#112牛顿法和拟牛顿法的推导) 40 | 41 | 42 | 43 | ## 2、每章算法的Python自实现 44 | [数据集为iris.csv(带Header)](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/iris.csv) 45 | ### 第2章 感知机模型(使用Iris数据集) 46 | 源代码[MyPerceptron.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyPerceptron.py) 47 | ### 第3章 KNN模型(线性-使用Iris数据集 与 KD树-有点问题..修改后再上传) 48 | 源代码[MyPerceptron.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyKNN.py) 49 | ### 第4章 朴素贝叶斯模型(使用Iris数据集) 50 | 源代码[MyPerceptron.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyNaiveBayes.py) 51 | ### 第5章 决策树模型(使用Iris数据集) 52 | 源代码[MyPerceptron.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyDecisionTree.py) 53 | ### 第6章 逻辑斯提回归模型(使用Iris数据集,采用梯度下降方法) 54 | 源代码[MyPerceptron.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyLogisticRegression.py) 55 | ### 第6章 最大熵模型(使用Iris数据集) 56 | 源代码[MyMaxEnt.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyMaxEnt.py) 57 | ### 第7章 SVM(使用Iris数据集) 58 | 源代码[MySVM.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MySVM.py) 59 | ### 第8章 AdaBoost(使用Iris数据集) 60 | 源代码[MyAdaBoost.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyAdaBoost.py) 61 | ### 第9章 EM算法(使用自己随机生成的符合高斯分布的数据) 62 | 源代码[MyEM.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyEM.py) 63 | ### 第10章 HMM算法(使用人民日报语料库进行训练,对输入的文本进行分词,12.8前完成) 64 | 源代码[MyHMM.py](https://github.com/kingsunfather/Statistic-study-notes/blob/master/codes/MyHMM.py) 65 | 66 | ## 3、学习笔记汇总 67 |
学习笔记均为自己学习过程中记录在笔记本上然后拍照扫描成pdf 68 | ### [第1章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter1.pdf) 69 | ### [第2章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter2.pdf) 70 | ### [第3章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter3.pdf) 71 | ### [第4章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter4.pdf) 72 | ### [第5章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter5.pdf) 73 | ### [第6章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter6.pdf) 74 | ### [第7章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter7.pdf) 75 | ### [第8章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter8.pdf) 76 | ### [第9章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter9.pdf) 77 | ### [第10章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter10.pdf) 78 | ### [第11章学习笔记](https://github.com/kingsunfather/Statistic-study-notes/blob/master/notes/chapter11.pdf) 79 | 80 | 81 | ## 4、每章节的课后习题实现 82 |
接下来每周都会定时更新课后习题的实现 83 | 84 | ## 1第一章数学公式推导 85 | 86 | ### 1.1极大似然估计推导 87 | 88 | ![](/docImage/maximum_likelihood_estimation.jpg) 89 | 90 | ### 1.2贝叶斯估计推导 91 | 92 | ![](/docImage/bayesian_estimation.jpg) 93 | 94 | 95 | ### 1.3利用Hoeffding推导泛化误差上界 96 | 97 | ![](/docImage/hoeffding1.jpg) 98 | 99 | ![](/docImage/hoeffding2.jpg) 100 | 101 | ## 2第二章数学公式推导 102 | 103 | ### 2.1算法的收敛性证明Novikoff 104 | 105 | ![](/docImage/Novikoff1.jpg) 106 | ![](/docImage/Novikoff2.jpg) 107 | ![](/docImage/Novikoff3.jpg) 108 | 109 | ## 3第三章数学公式推导 110 | 111 | ## 4第四章数学公式推导 112 | 113 | ### 4.1用极大似然法估计朴素贝叶斯参数 114 | ![](/docImage/mle_naive_bayes.jpg) 115 | 116 | ### 4.2用贝叶斯估计法朴素贝叶斯参数 117 | ![](/docImage/bayes_naive_bayes1.jpg) 118 | ![](/docImage/bayes_naive_bayes2.jpg) 119 | 120 | ### 4.3证明后验概率最大化即期望风险最小化 121 | ![](/docImage/poster_prob1.jpg) 122 | ![](/docImage/poster_prob2.jpg) 123 | 124 | ## 5第五章数学公式推导 125 | 126 | ## 6第六章数学公式推导 127 | 128 | ### 6.1最大熵模型的数学推导 129 | ![](/docImage/maximum_entropy1.jpg) 130 | ![](/docImage/maximum_entropy2.jpg) 131 | 132 | ### 6.2拉格朗日对偶性问题的数学推导 133 | ![](/docImage/lagrange_duality1.jpg) 134 | ![](/docImage/lagrange_duality2.jpg) 135 | ![](/docImage/lagrange_duality3.jpg) 136 | 137 | ### 6.3改进的迭代尺度法数学推导 138 | ![](/docImage/iterative_method1.jpg) 139 | ![](/docImage/iterative_method2.jpg) 140 | ![](/docImage/iterative_method3.jpg) 141 | 142 | ## 7第七章数学公式推导 143 | 144 | ### 7.1软间隔最大化对偶问题 145 | ![](/docImage/Soft_interval_maximization_dual1.jpg) 146 | ![](/docImage/Soft_interval_maximization_dual2.jpg) 147 | ![](/docImage/Soft_interval_maximization_dual3.jpg) 148 | 149 | ### 7.2证明最大间隔分离超平面存在唯一性 150 | ![](/docImage/Maximum_separation_hyperplane1.jpg) 151 | ![](/docImage/Maximum_separation_hyperplane2.jpg) 152 | ![](/docImage/Maximum_separation_hyperplane3.jpg) 153 | ![](/docImage/Maximum_separation_hyperplane4.jpg) 154 | 155 | ## 8第八章数学公式推导 156 | 157 | ### 8.1证明AdaBoost是前向分步加法算法的特例 158 | ![](/docImage/8_1_1.jpg) 159 | ![](/docImage/8_1_2.jpg) 160 | ![](/docImage/8_1_3.jpg) 161 | 162 | ### 8.2 证明AdaBoost的训练误差界 163 | ![](/docImage/8_2_1.jpg) 164 | ![](/docImage/8_2_2.jpg) 165 | ![](/docImage/8_2_3.jpg) 166 | ![](/docImage/8_2_4.jpg) 167 | 168 | ## 9第九章数学公式推导 169 | ### 9.1 EM算法的导出 170 | ![](/docImage/9_1_1.jpg) 171 | ![](/docImage/9_1_2.jpg) 172 | 173 | ### 9.2 用EM算法估计高斯模混合模型 174 | ![](/docImage/9_2_1.jpg) 175 | ![](/docImage/9_2_2.jpg) 176 | ![](/docImage/9_2_3.jpg) 177 | 178 | ## 10.第十章数学公式推导 179 | 180 | ### 10.1 前向算法两个公式的证明 181 | ![](/docImage/10_1_1.jpg) 182 | 183 | ### 10.2 维特比算法推导 184 | ![](/docImage/10_2_1.jpg) 185 | 186 | ## 11.第十一章数学公式推导 187 | ### 11.1 条件随机场的矩阵形式推导 188 | ![](/docImage/11_1_1.jpg) 189 | ![](/docImage/11_1_1.jpg) 190 | ### 11.2 牛顿法和拟牛顿法的推导 191 | ![](/docImage/11_2_1.jpg) 192 | ![](/docImage/11_2_2.jpg) 193 | ![](/docImage/11_2_3.jpg) 194 | 195 | 196 | 197 | 198 | 199 | 200 | 201 | 202 | 203 | 204 | 205 | 206 | 207 | -------------------------------------------------------------------------------- /codes/MyAdaBoost.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | import pandas as pd 7 | 8 | #根据文件路径读取Iris数据集数据 9 | # #return type: np.array 10 | def processData(filePath): 11 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 12 | X = [] 13 | y = [] 14 | #默认读取csv的头部 15 | df = pd.read_csv(filePath) 16 | #利用数据集合的第一个维度特征分类 17 | #遍历pandas中df的每一行 18 | for index, row in df.iterrows(): 19 | if row["Species"] == "setosa": 20 | y.append(1) 21 | else: 22 | 
y.append(-1) 23 | #进行二值化处理,使得样本数据为0/1 24 | X.append([int(float(row["Sepal.Length"])>5.0),int(float(row["Sepal.Width"])>3.5),int(float(row["Petal.Length"])>1.4)]) 25 | 26 | return np.array(X), np.array(y) 27 | 28 | def createOneLayerBoostTree(X_train, y_train, D): 29 | #获得样本数目及特征数量 30 | m, n = np.shape(X_train) 31 | #该字典代表了一层提升树,用于存放当前层提升树的参数 32 | oneLowerNumayerBoostTree = {} 33 | #初始化分类误差率为1,即100% 34 | oneLowerNumayerBoostTree['errorRate'] = 1 35 | #对每一个特征进行遍历,寻找用于划分的最合适的特征 36 | for i in range(n): 37 | #因为特征已经经过二值化,只能为0和1,因此分切点为-0.5, 0.5, 1.5 38 | for division in [-0.5, 0.5, 1.5]: 39 | #规则为如下所示: 40 | #LowerNumowerNumSetOne:LowerNumow is one:小于某值得是1 41 | #UpperNumSetOne:UpperNumigh is one:大于某值得是1 42 | for rule in ['LowerNumSetOne', 'UpperNumSetOne']: 43 | #按照第i个特征,以值division进行切割,进行当前设置得到的预测和分类错误率 44 | Gx, e = calculate_e_Gx(X_train, y_train, i, division, rule, D) 45 | #如果分类错误率e小于当前最小的e,那么将它作为最小的分类错误率保存 46 | if e < oneLowerNumayerBoostTree['errorRate']: 47 | # 分类错误率 48 | oneLowerNumayerBoostTree['errorRate'] = e 49 | # 最优划分点 50 | oneLowerNumayerBoostTree['division'] = division 51 | # 划分规则 52 | oneLowerNumayerBoostTree['rule'] = rule 53 | # 预测结果 54 | oneLowerNumayerBoostTree['Gx'] = Gx 55 | # 特征索引 56 | oneLowerNumayerBoostTree['feature'] = i 57 | return oneLowerNumayerBoostTree 58 | 59 | def createBoostTree(X_train, y_train, treeNum): 60 | #将数据和标签转化为数组形式 61 | X_train = np.array(X_train) 62 | y_train = np.array(y_train) 63 | #获得训练集数量以及特征个数 64 | m, n = np.shape(X_train) 65 | #初始化D为1/N 66 | D = [1 / m] * m 67 | #初始化提升树列表,每个位置为一层 68 | tree = [] 69 | #循环创建提升树 70 | for i in range(treeNum): 71 | #得到当前层的提升树 72 | currentTree = createOneLayerBoostTree(X_train, y_train, D) 73 | # 这边由于用的是Iris数据集,数据量过小,所以currentTree['errorRate']即误差分类率可能为0 74 | # 因此在最后加上了0.0001来避免除数为0的错误 75 | alpha = 1/2 * np.log((1 - currentTree['errorRate']) / (currentTree['errorRate']+0.0001)) 76 | #获得当前层的预测结果,用于下一步更新D 77 | Gx = currentTree['Gx'] 78 | D = np.multiply(D, np.exp(-1 * alpha * np.multiply(y_train, Gx))) / sum(D) 79 | currentTree['alpha'] = alpha 80 | tree.append(currentTree) 81 | return tree 82 | 83 | # 前提:数据进行二值处理 84 | def predict(x, division, rule, feature): 85 | if rule == 'LowerNumSetOne': 86 | LowerNum = 1 87 | UpperNum = -1 88 | else: 89 | LowerNum = -1 90 | UpperNum = 1 91 | 92 | if x[feature] < division: 93 | return LowerNum 94 | else: 95 | return UpperNum 96 | 97 | def test(X_test, y_test, tree): 98 | rightCount = 0 99 | for i in range(len(X_test)): 100 | result = 0 101 | for currentTree in tree: 102 | division = currentTree['division'] 103 | rule = currentTree['rule'] 104 | feature = currentTree['feature'] 105 | alpha = currentTree['alpha'] 106 | result += alpha * predict(X_test[i], division, rule, feature) 107 | #预测结果取sign值,如果大于0 sign为1,反之为0 108 | if np.sign(result) == y_test[i]: 109 | rightCount += 1 110 | #返回准确率 111 | return rightCount / len(X_test) 112 | 113 | #计算分类错误率 114 | def calculate_e_Gx(X_train, y_train, n, division, rule, D): 115 | #初始化分类误差率为0 116 | e = 0 117 | x = X_train[:, n] 118 | y = y_train 119 | train = [] 120 | if rule == 'LowerNumSetOne': 121 | LowerNum = 1 122 | UpperNum = -1 123 | else: 124 | LowerNum = -1 125 | UpperNum = 1 126 | 127 | #遍历样本的特征 128 | for i in range(X_train.shape[0]): 129 | if x[i] < division: 130 | #如果小于划分点,则预测为LowerNum 131 | #如果设置小于division为1,那么LowerNum就是1, 132 | #如果设置小于division为-1,LowerNum就是-1 133 | train.append(LowerNum) 134 | #如果预测错误,分类错误率要加上该分错的样本的权值 135 | if y[i] != LowerNum: 136 | e += D[i] 137 | elif x[i] >= division: 138 | train.append(UpperNum) 139 | if y[i] != 
UpperNum: 140 | e += D[i] 141 | return np.array(train), e 142 | 143 | if __name__ == '__main__': 144 | X, y = processData('iris.csv') 145 | 146 | X_train = X[0:149:50] 147 | y_train = y[0:149:50] 148 | 149 | # 自己在数据集后面加上了干扰的实例 150 | X_test = X[0:150:1] 151 | y_test = y[0:150:1] 152 | 153 | #创建提升树,最后一个参数代表的是公式的m,即多少个模型 154 | tree = createBoostTree(X_train, y_train, 5) 155 | 156 | #准确率测试 157 | rightRate = test(X_test, y_test, tree) 158 | print('分类正确率为:',rightRate * 100, '%') -------------------------------------------------------------------------------- /codes/MyDecisionTree.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | import pandas as pd 7 | import math 8 | from collections import namedtuple 9 | 10 | # 定义节点 11 | # 孩子节点、分类特征的取值、节点内容、节点分类特征、标签 12 | class Node(namedtuple("Node","children type content feature label")): 13 | def __repr__(self): 14 | return str(tuple(self)) 15 | 16 | #决策树 17 | class DecisionTree(): 18 | def __init__(self,method="info_gain_ratio"): 19 | self.tree=None 20 | self.method=method 21 | 22 | #计算经验熵 23 | def _experienc_entropy(self,X): 24 | 25 | # 统计每个取值的出现频率 26 | x_types_prob=X.iloc[:,0].value_counts()/X.shape[0] 27 | # 计算经验熵 28 | x_experienc_entropy=sum((-p*math.log(p,2) for p in x_types_prob)) 29 | return x_experienc_entropy 30 | 31 | #计算条件熵 32 | def _conditinal_entropy(self,X_train,y_train,feature): 33 | # feature特征下每个特征取值数量统计 34 | x_types_count= X_train[feature].value_counts() 35 | # 每个特征取值频率计算 36 | x_types_prob = x_types_count / X_train.shape[0] 37 | # 每个特征取值下类别y的经验熵 38 | x_experienc_entropy=[self._experienc_entropy(y_train[(X_train[feature]==i).values]) for i in x_types_count.index] 39 | # 特征feature对数据集的经验条件熵 40 | x_conditinal_entropy=(x_types_prob.mul(x_experienc_entropy)).sum() 41 | return x_conditinal_entropy 42 | 43 | #计算信息增益 44 | def _information_gain(self,X_train,y_train,feature): 45 | return self._experienc_entropy(y_train)-self._conditinal_entropy(X_train,y_train,feature) 46 | 47 | #计算信息增益比 48 | def _information_gain_ratio(self,X_train,y_train,features,feature): 49 | index=features.index(feature) 50 | return self._information_gain(X_train,y_train,feature)/self._experienc_entropy(X_train.iloc[:,index:index+1]) 51 | 52 | #选择分类特征 53 | def _choose_feature(self,X_train,y_train,features): 54 | if self.method=="info_gain_ratio": 55 | info=[self._information_gain_ratio(X_train,y_train,features,feature) for feature in features] 56 | elif self.method=="info_gain": 57 | info=[self._information_gain(X_train,y_train,feature) for feature in features] 58 | else: 59 | raise TypeError 60 | optimal_feature=features[np.argmax(info)] 61 | return optimal_feature 62 | 63 | #递归构造决策树 64 | def _built_tree(self,X_train,y_train,features,type=None): 65 | # 只有一个节点或已经完全分类,则决策树停止继续分叉 66 | if len(features)==1 or len(np.unique(y_train))==1: 67 | label=list(y_train[0].value_counts().index)[0] 68 | return Node(children=None,type=type,content=(X_train,y_train),feature=None,label=label) 69 | else: 70 | # 选择分类特征值 71 | feature=self._choose_feature(X_train,y_train,features) 72 | features.remove(feature) 73 | # 构建节点,同时递归创建孩子节点 74 | features_iter=np.unique(X_train[feature]) 75 | children=[] 76 | for item in features_iter: 77 | X_item=X_train[(X_train[feature]==item).values] 78 | y_item=y_train[(X_train[feature]==item).values] 79 | children.append(self._built_tree(X_item,y_item,features,type=item)) 80 | return 
Node(children=children,type=type,content=None,feature=feature,label=None) 81 | 82 | #进行剪枝 83 | def _prune(self): 84 | pass 85 | 86 | def fit(self,X_train,y_train,features): 87 | self.tree=self._built_tree(X_train,y_train,features) 88 | 89 | 90 | def _search(self,X_new): 91 | tree=self.tree 92 | # 若还有孩子节点,则继续向下搜索,否则搜索停止,在当前节点获取标签 93 | while tree.children: 94 | for child in tree.children: 95 | if X_new[tree.feature].loc[0]==child.type: 96 | tree=child 97 | break 98 | return tree.label 99 | 100 | def predict(self,X_new): 101 | return self._search(X_new) 102 | 103 | def processData(filePath): 104 | print('开始读取数据') 105 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 106 | X = [] 107 | y = [] 108 | #默认读取csv的头部 109 | df = pd.read_csv(filePath) 110 | #利用数据集合的第一个维度特征分类 111 | #遍历pandas中df的每一行 112 | for index, row in df.iterrows(): 113 | if row["Species"] == "setosa" : 114 | y.append("是setosa花") 115 | else: 116 | y.append("不是setosa花") 117 | X.append([float(row["Sepal.Length"]),float(row["Sepal.Width"]),float(row["Petal.Length"]),float(row["Petal.Width"])]) 118 | return np.array(X), np.array(y) 119 | 120 | def main(): 121 | # 训练数据集 122 | features = ["萼片长", "萼片宽", "花瓣长", "花瓣宽"] 123 | 124 | X , y = processData('iris.csv') 125 | 126 | X_train = X[0:149:4] 127 | y_train = y[0:149:4] 128 | 129 | X_test = X[0:149:10] 130 | y_test = y[0:149:10] 131 | 132 | 133 | X_train = pd.DataFrame(X_train, columns=features) 134 | y_train = pd.DataFrame(y_train) 135 | # 训练,使用信息增益 136 | clf=DecisionTree(method="info_gain") 137 | clf.fit(X_train,y_train,features.copy()) 138 | print('训练结束') 139 | 140 | X_new= pd.DataFrame(X_test, columns=features) 141 | y_predict=clf.predict(X_new) 142 | print(y_predict) 143 | 144 | if __name__=="__main__": 145 | main() -------------------------------------------------------------------------------- /codes/MyEM.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | 6 | import numpy as np 7 | import random 8 | import math 9 | 10 | # 通过服从高斯分布的随机函数来伪造数据集 11 | # mean0: 高斯0的均值、 12 | # sigma0: 高斯0的方差 13 | # alpha0: 高斯0的系数 14 | 15 | # mean1: 高斯1的均值 16 | # sigma1: 高斯1的方差 17 | # alpha1: 高斯1的系数 18 | # 混合了两个高斯分布的数据 19 | 20 | def processData(mean0, sigma0, mean1, sigma1, alpha0, alpha1): 21 | #定义数据集长度为1000 22 | length = 1000 23 | 24 | #初始化高斯分布,数据长度为length * alpha 25 | data0 = np.random.normal(mean0, sigma0, int(length * alpha0)) 26 | data1 = np.random.normal(mean1, sigma1, int(length * alpha1)) 27 | 28 | trainData = [] 29 | trainData.extend(data0) 30 | trainData.extend(data1) 31 | 32 | #对总的数据集进行打乱 33 | random.shuffle(trainData) 34 | return trainData 35 | 36 | # 根据高斯密度函数计算值 37 | # 返回整个可观测数据集的高斯分布密度(向量形式) 38 | def calculateGauss(trainDataArr, mean, sigmod): 39 | result = (1 / (math.sqrt(2 * math.pi) * sigmod**2)) * np.exp(-1 * (trainDataArr - mean) * (trainDataArr - mean) / (2 * sigmod**2)) 40 | return result 41 | 42 | 43 | def E(trainDataArr, alpha0, mean0, sigmod0, alpha1, mean1, sigmod1): 44 | gamma0 = alpha0 * calculateGauss(trainDataArr, mean0, sigmod0) 45 | gamma1 = alpha1 * calculateGauss(trainDataArr, mean1, sigmod1) 46 | 47 | sum = gamma0 + gamma1 48 | gamma0 = gamma0 / sum 49 | gamma1 = gamma1 / sum 50 | return gamma0, gamma1 51 | 52 | def M(meano, mean1, gamma0, gamma1, trainDataArr): 53 | mean0_new = np.dot(gamma0, trainDataArr) / np.sum(gamma0) 54 | mean1_new = np.dot(gamma1, trainDataArr) / np.sum(gamma1) 55 | 56 | sigmod0_new = math.sqrt(np.dot(gamma0, (trainDataArr - meano)**2) / 
np.sum(gamma0)) 57 | sigmod1_new = math.sqrt(np.dot(gamma1, (trainDataArr - mean1)**2) / np.sum(gamma1)) 58 | 59 | alpha0_new = np.sum(gamma0) / len(gamma0) 60 | alpha1_new = np.sum(gamma1) / len(gamma1) 61 | 62 | return mean0_new, mean1_new, sigmod0_new, sigmod1_new, alpha0_new, alpha1_new 63 | 64 | 65 | def EM(trainDataList, iter = 500): 66 | trainDataArr = np.array(trainDataList) 67 | 68 | alpha0 = 0.5 69 | mean0 = 0 70 | sigmod0 = 1 71 | alpha1 = 0.5 72 | mean1 = 1 73 | sigmod1 = 1 74 | 75 | count = 0 76 | while (count < iter): 77 | count = count+1 78 | # E步 79 | gamma0, gamma1 = E(trainDataArr, alpha0, mean0, sigmod0, alpha1, mean1, sigmod1) 80 | # M步 81 | mean0, mean1, sigmod0, sigmod1, alpha0, alpha1 = M(mean0, mean1, gamma0, gamma1, trainDataArr) 82 | return alpha0, mean0, sigmod0, alpha1, mean1, sigmod1 83 | 84 | if __name__ == '__main__': 85 | alpha0 = 0.1 86 | mean0 = -4.0 87 | sigmod0 = 0.6 88 | 89 | alpha1 = 0.9 90 | mean1 = 2.2 91 | sigmod1 = 0.1 92 | 93 | #初始化数据集 94 | trainDataList = processData(mean0, sigmod0, mean1, sigmod1, alpha0, alpha1) 95 | 96 | #开始EM算法,进行参数估计 97 | alpha0, mean0, sigmod0, alpha1, mean1, sigmod1 = EM(trainDataList) 98 | 99 | print('用EM计算之后的数据为:') 100 | print('alpha0:%.1f, mean0:%.1f, sigmod0:%.1f, alpha1:%.1f, mean1:%.1f, sigmod1:%.1f' % ( 101 | alpha0, mean0, sigmod0, alpha1, mean1, sigmod1 102 | )) 103 | 104 | 105 | -------------------------------------------------------------------------------- /codes/MyHMM.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | 7 | # 依据训练文本统计PI、A、B 8 | def trainHMM(fileName): 9 | # B:词语的开头 10 | # M:一个词语的中间词 11 | # E:一个词语的结果 12 | # S:非词语,单个词 13 | statuDict = {'B':0, 'M':1, 'E':2, 'S':3} 14 | 15 | # 每个字只有四种状态,所以下方的各类初始化中大小的参数均为4 16 | PI = np.zeros(4) 17 | # 初始化状态转移矩阵A,涉及到四种状态各自到四种状态的转移,因为大小为4x4 18 | A = np.zeros((4, 4)) 19 | # 初始化观测概率矩阵,分别为四种状态到每个字的发射概率 20 | B = np.zeros((4, 65536)) 21 | fr = open(fileName, encoding='utf-8') 22 | 23 | for line in fr.readlines(): 24 | curLine = line.strip().split() 25 | wordLabel = [] 26 | #对每一个单词进行遍历 27 | for i in range(len(curLine)): 28 | #如果长度为1,则直接将该字标记为S,即单个词 29 | if len(curLine[i]) == 1: 30 | label = 'S' 31 | else: 32 | label = 'B' + 'M' * (len(curLine[i]) - 2) + 'E' 33 | #如果是单行开头第一个字,PI中对应位置加1, 34 | if i == 0: PI[statuDict[label[0]]] += 1 35 | for j in range(len(label)): 36 | B[statuDict[label[j]]][ord(curLine[i][j])] += 1 37 | wordLabel.extend(label) 38 | for i in range(1, len(wordLabel)): 39 | A[statuDict[wordLabel[i - 1]]][statuDict[wordLabel[i]]] += 1 40 | 41 | sum = np.sum(PI) 42 | 43 | for i in range(len(PI)): 44 | if PI[i] == 0: PI[i] = -3.14e+100 45 | else: PI[i] = np.log(PI[i] / sum) 46 | 47 | for i in range(len(A)): 48 | sum = np.sum(A[i]) 49 | for j in range(len(A[i])): 50 | if A[i][j] == 0: A[i][j] = -3.14e+100 51 | else: A[i][j] = np.log(A[i][j] / sum) 52 | 53 | for i in range(len(B)): 54 | sum = np.sum(len(B[i])) 55 | for j in range(len(B[i])): 56 | if B[i][j] == 0: B[i][j] = -3.14e+100 57 | else:B[i][j] = np.log(B[i][j] / sum) 58 | 59 | return PI, A, B 60 | 61 | def processTrainData(fileName): 62 | textData = [] 63 | fr = open(fileName, encoding='utf-8') 64 | for line in fr.readlines(): 65 | #读到的每行最后都有一个\n,使用strip将最后的回车符去掉 66 | line = line.strip() 67 | textData.append(line) 68 | 69 | return textData 70 | 71 | def participleTestData(textData, PI, A, B): 72 | retArtical = [] 73 | for line in textData: 74 | delta = [[0 for i in range(4)] 
for i in range(len(line))] 75 | for i in range(4): 76 | delta[0][i] = PI[i] + B[i][ord(line[0])] 77 | psi = [[0 for i in range(4)] for i in range(len(line))] 78 | 79 | for t in range(1, len(line)): 80 | for i in range(4): 81 | tmpDelta = [0] * 4 82 | for j in range(4): 83 | tmpDelta[j] = delta[t - 1][j] + A[j][i] 84 | maxDelta = max(tmpDelta) 85 | maxDeltaIndex = tmpDelta.index(maxDelta) 86 | delta[t][i] = maxDelta + B[i][ord(line[t])] 87 | psi[t][i] = maxDeltaIndex 88 | 89 | sequence = [] 90 | i_opt = delta[len(line) - 1].index(max(delta[len(line) - 1])) 91 | sequence.append(i_opt) 92 | 93 | for t in range(len(line) - 1, 0, -1): 94 | i_opt = psi[t][i_opt] 95 | sequence.append(i_opt) 96 | 97 | sequence.reverse() 98 | curLine = '' 99 | for i in range(len(line)): 100 | curLine += line[i] 101 | if (sequence[i] == 3 or sequence[i] == 2) and i != (len(line) - 1): 102 | curLine += '|' 103 | retArtical.append(curLine) 104 | return retArtical 105 | 106 | if __name__ == '__main__': 107 | 108 | # 依据人民日报数据集计算HMM参数:PI、A、B 109 | PI, A, B = trainHMM('MyHMMTrainData.txt') 110 | 111 | # 读取测试文章 112 | textData = processTrainData('MyHMMTestData.txt') 113 | 114 | # 打印原文 115 | for line in textData: 116 | print(line) 117 | 118 | # 分词 119 | partiArtical = participleTestData(textData, PI, A, B) 120 | 121 | # 打印结果 122 | print('分词结果:') 123 | for line in partiArtical: 124 | print(line) 125 | -------------------------------------------------------------------------------- /codes/MyHMMTestData.txt: -------------------------------------------------------------------------------- 1 | 我本科就读于北京交通大学软件学院,专业是软件工程,本科做的是开发工作,工程性质较为浓厚。2019年,我保研至北京航空航天大学,我的个性不适合做纯理论研究,因此希望研究生毕业以后从事算法工程师的岗位,研究生期间我需要认真学习算法相关知识,但同时也不能落下工程实现能力,仍然需要较强的开发能力与项目落地能力,尤其是基础算法与数据结构,需要日常进行刷题比如:leetcode。 -------------------------------------------------------------------------------- /codes/MyKNN.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | import pandas as pd 7 | from collections import Counter 8 | from concurrent import futures 9 | import heapq 10 | 11 | class KNN: 12 | def __init__(self,X_train,y_train,k=3): 13 | # 所需参数初始化 14 | self.k=k 15 | self.X_train=X_train 16 | self.y_train=y_train 17 | 18 | def predict_single(self,X_test): 19 | # 计算与前k个样本点欧氏距离,距离取负值是把原问题转化为取前k个最大的距离 20 | dist_list=[(-np.linalg.norm(X_test-self.X_train[i],ord=2),self.y_train[i],i) 21 | for i in range(self.k)] 22 | 23 | # 利用前k个距离构建堆 24 | heapq.heapify(dist_list) 25 | 26 | # 遍历计算与剩下样本点的欧式距离 27 | for i in range(self.k,self.X_train.shape[0]): 28 | dist_i=(-np.linalg.norm(X_test-self.X_train[i],ord=2),self.y_train[i],i) 29 | #进行下堆操作 30 | if dist_i[0]>dist_list[0][0]: 31 | heapq.heappushpop(dist_list,dist_i) 32 | # 若dist_i 比 dis_list的最小值小,堆保持不变,继续遍历 33 | else: 34 | continue 35 | y_list=[dist_list[i][1] for i in range(self.k)] 36 | #[-1,1,1,-1...] 
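# 补充说明(新增注释,非原作者所写):执行到这里时,dist_list 这个小顶堆里保存的是
# k 个最近邻的 (负距离, 标签, 下标) 三元组——距离取负后,堆顶对应当前候选中距离最大
# 的样本,一旦遇到更近的样本就用 heappushpop 将其换出;y_list 即这 k 个最近邻的标签,
# 下面用 Counter 做多数表决得到预测类别。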
37 | # 对上述k个点的分类进行统计 38 | y_count=Counter(y_list).most_common() 39 | #{1:n,-1:m} 40 | return y_count[0][0] 41 | 42 | # 用多线程提高效率 43 | def predict_many(self,X_test): 44 | # 导入多线程 45 | with futures.ProcessPoolExecutor(max_workers=10) as executor: 46 | # 建立多线程任务 47 | tasks=[executor.submit(self.predict_single,X_test[i]) for i in range(X_test.shape[0])] 48 | # 驱动多线程运行 49 | done_iter=futures.as_completed(tasks) 50 | # 提取结果 51 | res=[future.result() for future in done_iter] 52 | return res 53 | 54 | def cal_right_rate(self,res,y_test): 55 | right_count = 0 56 | wrong_count = 0 57 | for i in range(len(res)): 58 | if res[i] == y_test[i]: 59 | right_count += 1 60 | else: 61 | wrong_count += 1 62 | return right_count / (right_count+wrong_count) 63 | 64 | def processData(filePath): 65 | print('开始读取数据') 66 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 67 | X = [] 68 | y = [] 69 | #默认读取csv的头部 70 | df = pd.read_csv(filePath) 71 | #利用数据集合的第一个维度特征分类 72 | #遍历pandas中df的每一行 73 | for index, row in df.iterrows(): 74 | if(row["Sepal.Length"]>=5.5) : 75 | y.append(1) 76 | else: 77 | y.append(-1) 78 | X.append([float(row["Sepal.Width"]),float(row["Petal.Length"])]) 79 | return np.array(X), np.array(y) 80 | 81 | 82 | def main(): 83 | #获取数据 84 | X, y = processData('iris.csv') 85 | X_train = X[0:149:4] 86 | y_train = y[0:149:4] 87 | 88 | X_test = X[0:149:10] 89 | y_test = y[0:149:10] 90 | 91 | # 不同的k对分类结果的影响 92 | for k in range(1,6,2): 93 | #构建KNN实例 94 | clf=KNN(X_train,y_train,k=k) 95 | #对测试数据进行分类预测 96 | y_predict=clf.predict_many(X_test) 97 | print("k={},被分类为:{}".format(k,y_predict)) 98 | print("正确率为: ", clf.cal_right_rate(y_predict,y_test)) 99 | 100 | if __name__=="__main__": 101 | main() -------------------------------------------------------------------------------- /codes/MyLogisticRegression.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | import time 7 | import pandas as pd 8 | 9 | #使用随机梯度下降 10 | class LogisticRegression: 11 | def __init__(self,learn_rate=0.1,max_iter=10000,tol=1e-3): 12 | # 学习速率 13 | self.learn_rate=learn_rate 14 | # 迭代次数 15 | self.max_iter=max_iter 16 | # 迭代停止阈值 17 | self.tol=tol 18 | # 权重 19 | self.w=None 20 | 21 | def preprocessing(self,X): 22 | row=X.shape[0] 23 | #在末尾加上一列,数值为1 24 | y=np.ones(row).reshape(row, 1) 25 | X_prepro =np.hstack((X,y)) 26 | return X_prepro 27 | 28 | def sigmod(self,x): 29 | return 1/(1+np.exp(-x)) 30 | 31 | def train(self,X_train,y_train): 32 | X=self.preprocessing(X_train) 33 | y=y_train.T 34 | #初始化权重w 35 | self.w=np.array([[0]*X.shape[1]],dtype=np.float) 36 | i=0 37 | k=0 38 | for loop in range(self.max_iter): 39 | # 计算梯度 40 | z=np.dot(X[i],self.w.T) 41 | grad=X[i]*(y[i]-self.sigmod(z)) 42 | # 利用梯度的绝对值作为迭代中止的条件 43 | if (np.abs(grad)<=self.tol).all(): 44 | break 45 | else: 46 | # 更新权重w 梯度上升——求极大值 47 | self.w+=self.learn_rate*grad 48 | k+=1 49 | i=(i+1)%X.shape[0] 50 | print("迭代次数:{}次".format(k)) 51 | print("最终梯度:{}".format(grad)) 52 | print("最终权重:{}".format(self.w[0])) 53 | 54 | def predict(self,x): 55 | p=self.sigmod(np.dot(self.preprocessing(x),self.w.T)) 56 | print("Y=1的概率被估计为:{:.2%}".format(p[0][0])) 57 | p[np.where(p>0.5)]=1 58 | p[np.where(p<0.5)]=0 59 | return p 60 | 61 | def cal_right_rate(self,X,y): 62 | y_c=self.predict(X) 63 | right_count = 0 64 | wrong_count = 0 65 | for i in range(len(y)): 66 | if y_c[i] == y[i]: 67 | right_count += 1 68 | else: 69 | wrong_count += 1 70 | return right_count / (right_count + 
wrong_count) 71 | error_rate=np.sum(np.abs(y_c-y.T))/y_c.shape[0] 72 | # return 1-error_rate 73 | 74 | #根据文件路径读取Iris数据集数据 75 | #return type: np.array 76 | def processData(filePath): 77 | print('开始读取数据') 78 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 79 | X = [] 80 | y = [] 81 | #默认读取csv的头部 82 | df = pd.read_csv(filePath) 83 | #利用数据集合的第一个维度特征分类 84 | #遍历pandas中df的每一行 85 | for index, row in df.iterrows(): 86 | if row["Species"] == "setosa" : 87 | y.append(1) 88 | else: 89 | y.append(0) 90 | X.append([float(row["Sepal.Length"]),float(row["Sepal.Width"]),float(row["Petal.Length"])]) 91 | return np.array(X), np.array(y) 92 | 93 | def main(): 94 | star=time.time() 95 | # 训练数据集 96 | X, y = processData('iris.csv') 97 | X_train = X[0:149:30] 98 | y_train = y[0:149:30] 99 | 100 | #自己在数据集后面加上了干扰的实例 101 | X_test = X[0:151:1] 102 | y_test = y[0:151:1] 103 | 104 | # 构建实例,进行训练 105 | clf=LogisticRegression() 106 | clf.train(X_train,y_train) 107 | 108 | # 预测新数据 109 | y_predict=clf.predict(X_test) 110 | print("{}被分类为:{}".format(X_test[0],y_predict[0])) 111 | 112 | # 利用已有数据对训练模型进行评价 113 | correct_rate=clf.cal_right_rate(X_test,y_test) 114 | print("测试一共有{}组实例,正确率:{:.5%}".format(X_test.shape[0],correct_rate)) 115 | end=time.time() 116 | print("用时:{:.5f}s".format(end-star)) 117 | 118 | if __name__=="__main__": 119 | main() -------------------------------------------------------------------------------- /codes/MyMaxEnt.py: -------------------------------------------------------------------------------- 1 | import time 2 | import numpy as np 3 | import pandas as pd 4 | from collections import defaultdict 5 | 6 | 7 | #根据文件路径读取Iris数据集数据 8 | #return type: list 9 | def processData(filePath): 10 | print('开始读取数据') 11 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 12 | X = [] 13 | y = [] 14 | #默认读取csv的头部 15 | df = pd.read_csv(filePath) 16 | #利用数据集合的第一个维度特征分类 17 | #遍历pandas中df的每一行 18 | for index, row in df.iterrows(): 19 | if(row["Sepal.Length"]>=5.5) : 20 | y.append(1) 21 | else: 22 | y.append(0) 23 | X.append([float(row["Sepal.Width"]),float(row["Petal.Length"])]) 24 | return X, y 25 | 26 | #最大熵类 27 | class maxEnt: 28 | def __init__(self, trainDataList, trainLabelList, testDataList, testLabelList): 29 | 30 | # 训练数据集 31 | self.trainDataList = trainDataList 32 | # 训练标签集 33 | self.trainLabelList = trainLabelList 34 | # 测试数据集 35 | self.testDataList = testDataList 36 | # 测试标签集 37 | self.testLabelList = testLabelList 38 | # 特征数量 39 | self.featureNum = len(trainDataList[0]) 40 | # 总训练集长度 41 | self.N = len(trainDataList) 42 | # 训练集中(xi,y)对数量 43 | self.n = 0 44 | # 训练集中(xi,y)对数量 45 | self.M = 10000 46 | # 所有(x, y)对出现的次数 47 | self.fixy = self.calc_fixy() 48 | # Pw(y|x)中的w 49 | self.w = [0] * self.n 50 | # (x, y)->id和id->(x, y)的搜索字典 51 | self.xy2idDict, self.id2xyDict = self.createSearchDict() 52 | # Ep_xy期望值 53 | self.Ep_xy = self.calcEp_xy() 54 | 55 | 56 | # 计算特征函数f(x, y) 57 | def calcEpxy(self): 58 | # 初始化期望存放列表,对于每一个xy对都有一个期望 59 | Epxy = [0] * self.n 60 | # 对于每一个样本进行遍历 61 | for i in range(self.N): 62 | # 初始化公式中的P(y|x)列表 63 | Pwxy = [0] * 2 64 | # 计算P(y = 0 } X) 65 | # 注:程序中X表示是一个样本的全部特征,x表示单个特征,这里是全部特征的一个样本 66 | Pwxy[0] = self.calcPwy_x(self.trainDataList[i], 0) 67 | # 计算P(y = 1 } X) 68 | Pwxy[1] = self.calcPwy_x(self.trainDataList[i], 1) 69 | 70 | for feature in range(self.featureNum): 71 | for y in range(2): 72 | if (self.trainDataList[i][feature], y) in self.fixy[feature]: 73 | id = self.xy2idDict[feature][(self.trainDataList[i][feature], y)] 74 | Epxy[id] += (1 / self.N) * Pwxy[y] 75 | return Epxy 76 | 77 | # 计算特征函数f(x, y) 78 | 
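# 补充说明(新增注释,非原作者所写):严格来说,下面的 calcEp_xy 计算的不是特征函数本身,
# 而是特征函数 f(x,y) 关于经验分布 P~(X,Y) 的期望 E_P~[f],即各 (x,y) 对在训练集中出现的频率;
# 它与上面 calcEpxy 得到的模型期望 E_P[f] 一起,用于 maxEntropyTrain 中按式 6.34 计算更新量
# sigmaList(即书中 IIS 的 δ_i)。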
# :return: 计算得到的Ep_xy 79 | def calcEp_xy(self): 80 | 81 | # 初始化Ep_xy列表,长度为n 82 | Ep_xy = [0] * self.n 83 | 84 | # 遍历每一个特征 85 | for feature in range(self.featureNum): 86 | # 遍历每个特征中的(x, y)对 87 | for (x, y) in self.fixy[feature]: 88 | # 获得其id 89 | id = self.xy2idDict[feature][(x, y)] 90 | # 将计算得到的Ep_xy写入对应的位置中 91 | # fixy中存放所有对在训练集中出现过的次数,处于训练集总长度N就是概率了 92 | Ep_xy[id] = self.fixy[feature][(x, y)] / self.N 93 | 94 | # 返回期望 95 | return Ep_xy 96 | 97 | 98 | # 创建查询字典 99 | # xy2idDict:通过(x, y)对找到其id, 所有出现过的xy对都有一个id 100 | # id2xyDict:通过id找到对应的(x, y)对 101 | def createSearchDict(self): 102 | # 设置xy搜多id字典 103 | # 不同特征的xy存入不同特征内的字典 104 | xy2idDict = [{} for i in range(self.featureNum)] 105 | # 初始化id到xy对的字典。因为id与(x,y)的指向是唯一的,所以可以使用一个字典 106 | id2xyDict = {} 107 | 108 | # 设置缩影,其实就是最后的id 109 | index = 0 110 | # 对特征进行遍历 111 | for feature in range(self.featureNum): 112 | # 对出现过的每一个(x, y)对进行遍历 113 | # fixy:内部存放特征数目个字典,对于遍历的每一个特征,单独读取对应字典内的(x, y)对 114 | for (x, y) in self.fixy[feature]: 115 | # 将该(x, y)对存入字典中,要注意存入时通过[feature]指定了存入哪个特征内部的字典 116 | # 同时将index作为该对的id号 117 | xy2idDict[feature][(x, y)] = index 118 | # 同时在id->xy字典中写入id号,val为(x, y)对 119 | id2xyDict[index] = (x, y) 120 | # id加一 121 | index += 1 122 | 123 | # 返回创建的两个字典 124 | return xy2idDict, id2xyDict 125 | 126 | # 计算(x, y)在训练集中出现过的次数 127 | def calc_fixy(self): 128 | # 建立特征数目个字典,属于不同特征的(x, y)对存入不同的字典中,保证不被混淆 129 | fixyDict = [defaultdict(int) for i in range(self.featureNum)] 130 | # 遍历训练集中所有样本 131 | for i in range(len(self.trainDataList)): 132 | # 遍历样本中所有特征 133 | for j in range(self.featureNum): 134 | # 将出现过的(x, y)对放入字典中并计数值加1 135 | fixyDict[j][(self.trainDataList[i][j], self.trainLabelList[i])] += 1 136 | # 对整个大字典进行计数,判断去重后还有多少(x, y)对,写入n 137 | for i in fixyDict: 138 | self.n += len(i) 139 | # 返回大字典 140 | return fixyDict 141 | 142 | # 计算得到的Pw(Y | X) 143 | def calcPwy_x(self, X, y): 144 | # 分子 145 | numerator = 0 146 | # 分母 147 | Z = 0 148 | # 对每个特征进行遍历 149 | for i in range(self.featureNum): 150 | # 如果该(xi,y)对在训练集中出现过 151 | if (X[i], y) in self.xy2idDict[i]: 152 | # 在xy->id字典中指定当前特征i,以及(x, y)对:(X[i], y),读取其id 153 | index = self.xy2idDict[i][(X[i], y)] 154 | # 分子是wi和fi(x,y)的连乘再求和,最后指数 155 | # 由于当(x, y)存在时fi(x,y)为1,因为xy对肯定存在,所以直接就是1 156 | # 对于分子来说,就是n个wi累加,最后再指数就可以了 157 | # 因为有n个w,所以通过id将w与xy绑定,前文的两个搜索字典中的id就是用在这里 158 | numerator += self.w[index] 159 | # 同时计算其他一种标签y时候的分子,下面的z并不是全部的分母,再加上上式的分子以后 160 | # 才是完整的分母,即z = z + numerator 161 | if (X[i], 1 - y) in self.xy2idDict[i]: 162 | # 原理与上式相同 163 | index = self.xy2idDict[i][(X[i], 1 - y)] 164 | Z += self.w[index] 165 | # 计算分子的指数 166 | numerator = np.exp(numerator) 167 | # 计算分母的z 168 | Z = np.exp(Z) + numerator 169 | # 返回Pw(y|x) 170 | return numerator / Z 171 | 172 | def maxEntropyTrain(self, iter=500): 173 | # 设置迭代次数寻找最优解 174 | for i in range(iter): 175 | # 单次迭代起始时间点 176 | iterStart = time.time() 177 | 178 | # 计算“6.2.3 最大熵模型的学习”中的第二个期望(83页最上方哪个) 179 | Epxy = self.calcEpxy() 180 | 181 | # 使用的是IIS,所以设置sigma列表 182 | sigmaList = [0] * self.n 183 | # 对于所有的n进行一次遍历 184 | for j in range(self.n): 185 | # 依据“6.3.1 改进的迭代尺度法” 式6.34计算 186 | sigmaList[j] = (1 / self.M) * np.log(self.Ep_xy[j] / Epxy[j]) 187 | 188 | # 按照算法6.1步骤二中的(b)更新w 189 | self.w = [self.w[i] + sigmaList[i] for i in range(self.n)] 190 | 191 | # 单次迭代结束 192 | iterEnd = time.time() 193 | 194 | # 预测标签 195 | def predict(self, X): 196 | # 因为y只有0和1,所有建立两个长度的概率列表 197 | result = [0] * 2 198 | # 循环计算两个概率 199 | for i in range(2): 200 | # 计算样本x的标签为i的概率 201 | result[i] = self.calcPwy_x(X, i) 202 | # 返回标签 203 | # max(result):找到result中最大的那个概率值 204 | # 
result.index(max(result)):通过最大的那个概率值再找到其索引,索引是0就返回0,1就返回1 205 | return result.index(max(result)) 206 | 207 | def test(self): 208 | # 错误值计数 209 | errorCnt = 0 210 | # 对测试集中所有样本进行遍历 211 | for i in range(len(self.testDataList)): 212 | # 预测该样本对应的标签 213 | result = self.predict(self.testDataList[i]) 214 | # 如果错误,计数值加1 215 | if result != self.testLabelList[i]: errorCnt += 1 216 | # 返回准确率 217 | return 1 - errorCnt / len(self.testDataList) 218 | 219 | 220 | if __name__ == '__main__': 221 | start = time.time() 222 | X, y = processData('iris.csv') 223 | 224 | X_train = X[0:149:30] 225 | y_train = y[0:149:30] 226 | 227 | # 自己在数据集后面加上了干扰的实例 228 | X_test = X[0:151:1] 229 | y_test = y[0:151:1] 230 | 231 | # 初始化最大熵类 232 | maxEnt = maxEnt(X_train, y_train, X_test, y_test) 233 | 234 | # 开始训练 235 | maxEnt.maxEntropyTrain() 236 | 237 | # 开始测试 238 | right_rate = maxEnt.test() 239 | print('准确度为:', right_rate) 240 | 241 | # 打印时间 242 | print('花费的时间为:', time.time() - start) 243 | -------------------------------------------------------------------------------- /codes/MyNaiveBayes.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | import pandas as pd 7 | 8 | class NaiveBayes(): 9 | def __init__(self,lambda_): 10 | # 贝叶斯系数 取0时,即为极大似然估计 11 | self.lambda_=lambda_ 12 | # y的(类型:数量) 13 | self.y_types_count=None 14 | # y的(类型:概率) 15 | self.y_types_proba=None 16 | # (xi 的编号,xi的取值,y的类型):概率 17 | self.x_types_proba=dict() 18 | 19 | def fit(self,X_train,y_train): 20 | # y的所有取值类型 21 | self.y_types=np.unique(y_train) 22 | # 转化成pandas df 数据格式 23 | X=pd.DataFrame(X_train) 24 | y=pd.DataFrame(y_train) 25 | # y的(类型:数量)统计 26 | self.y_types_count=y[0].value_counts() 27 | # y的(类型:概率)计算 28 | self.y_types_proba=(self.y_types_count+self.lambda_)/(y.shape[0]+len(self.y_types)*self.lambda_) 29 | 30 | # (xi 的编号,xi的取值,y的类型):概率的计算 - 遍历xi 31 | for idx in X.columns: 32 | # 选取每一个y的类型 33 | for j in self.y_types: 34 | # 选择所有y==j为真的数据点的第idx个特征的值,并对这些值进行(类型:数量)统计 35 | p_x_y=X[(y==j).values][idx].value_counts() 36 | # 计算(xi 的编号,xi的取值,y的类型):概率 37 | for i in p_x_y.index: 38 | self.x_types_proba[(idx,i,j)]=(p_x_y[i]+self.lambda_)/(self.y_types_count[j]+p_x_y.shape[0]*self.lambda_) 39 | 40 | def predict(self,X_new): 41 | res=[] 42 | # 遍历y的可能取值 43 | for y in self.y_types: 44 | # 计算y的先验概率P(Y=ck) 45 | p_y=self.y_types_proba[y] 46 | p_xy=1 47 | for idx,x in enumerate(X_new): 48 | # 计算P(X=(x1,x2...xd)/Y=ck) 49 | p_xy*=self.x_types_proba[(idx,x,y)] 50 | res.append(p_y*p_xy) 51 | for i in range(len(self.y_types)): 52 | print("[{}]对应概率:{:.2%}".format(self.y_types[i],res[i])) 53 | #返回最大后验概率对应的y值 54 | return self.y_types[np.argmax(res)] 55 | 56 | def processData(filePath): 57 | print('开始读取数据') 58 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 59 | X = [] 60 | y = [] 61 | #默认读取csv的头部 62 | df = pd.read_csv(filePath) 63 | #利用数据集合的第一个维度特征分类 64 | #遍历pandas中df的每一行 65 | for index, row in df.iterrows(): 66 | if(row["Sepal.Length"]>=5.5) : 67 | y.append(1) 68 | else: 69 | y.append(-1) 70 | X.append([float(row["Sepal.Width"]),str(row["Species"])]) 71 | return np.array(X), np.array(y) 72 | 73 | def main(): 74 | X, y = processData('iris.csv') 75 | X_train = X[0:149:4] 76 | y_train = y[0:149:4] 77 | 78 | 79 | clf=NaiveBayes(lambda_= 0.5) 80 | clf.fit(X_train,y_train) 81 | 82 | X_test=np.array([3.5,"setosa"]) 83 | y_predict=clf.predict(X_test) 84 | print("{}被分类为:{}".format(X_test,y_predict)) 85 | 86 | if __name__=="__main__": 87 | main() 
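# ----------------------------------------------------------------------
# 补充示例(新增内容,非原作者代码,仅为说明性草稿):
# fit() 中的平滑计算对应 README 4.2 节的贝叶斯估计公式
#     P_lambda(Y = c_k) = (N_k + lambda) / (N + K * lambda)
# lambda_ = 0 时退化为极大似然估计,lambda_ > 0 时可避免出现零概率。
# 下面的 _demo_smoothed_prior 用一组假设的类别计数直接验证该公式,
# _demo_smoothed_prior、demo_counts 等名称均为示例中的假设命名,与原数据集无关。
def _demo_smoothed_prior(lambda_=0.5):
    # 假设两个类别的样本数分别为 30 和 10(演示用数据)
    demo_counts = np.array([30, 10])
    N = demo_counts.sum()      # 样本总数 N
    K = len(demo_counts)       # 类别个数 K
    # 平滑后的先验概率,lambda_=0.5 时约为 [0.7439, 0.2561]
    return (demo_counts + lambda_) / (N + K * lambda_)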
-------------------------------------------------------------------------------- /codes/MyPerceptron.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | import pandas as pd 7 | 8 | #根据文件路径读取Iris数据集数据 9 | #return type: list 10 | def processData(filePath): 11 | print('开始读取数据') 12 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 13 | X = [] 14 | y = [] 15 | #默认读取csv的头部 16 | df = pd.read_csv(filePath) 17 | #利用数据集合的第一个维度特征分类 18 | #遍历pandas中df的每一行 19 | for index, row in df.iterrows(): 20 | if(row["Sepal.Length"]>=5.5) : 21 | y.append(1) 22 | else: 23 | y.append(-1) 24 | X.append([float(row["Sepal.Width"]),float(row["Petal.Length"])]) 25 | return X, y 26 | 27 | 28 | #感知机类 29 | class MyPerceptron: 30 | def __init__(self): 31 | # 参数w 32 | self.w = None 33 | # 偏置b 34 | self.b = 0 35 | # 表示学习速率 36 | self.l_rate = 0.0001 37 | #表示迭代次数 38 | self.iter = 100 39 | 40 | #训练 41 | def train(self, X_train, y_train): 42 | print('开始训练') 43 | # 将数据转换成矩阵形式 44 | # 转换后的数据中每一个样本的向量都是横向的 45 | X_trainMat = np.mat(X_train) 46 | y_trainMat = np.mat(y_train).T 47 | # 获取数据矩阵的大小,为m*n 48 | m, n = np.shape(X_trainMat) 49 | #np.shape(X_trainMat)[1]表示的维度=样本的长度 50 | self.w = np.zeros((1, np.shape(X_trainMat)[1])) 51 | 52 | # 进行iter次迭代计算 53 | for k in range(self.iter): 54 | ##利用随机梯度下降 55 | for i in range(m): 56 | # 获取当前样本的向量 57 | xi = X_trainMat[i] 58 | # 获取当前样本所对应的标签 59 | yi = y_trainMat[i] 60 | # 判断是否是误分类样本 61 | # 误分类样本特诊为: -yi(w*xi+b)>=0,详细可参考书中2.2.2小节 62 | # 在书的公式中写的是>0,实际上如果=0,说明改点在超平面上,也是不正确的 63 | if -1 * yi * (self.w * xi.T + self.b) >= 0: 64 | # 对于误分类样本,进行梯度下降,更新w和b 65 | self.w = self.w + self.l_rate * yi * xi 66 | self.b = self.b + self.l_rate * yi 67 | 68 | #测试 69 | def predict(self,X_test, y_test): 70 | print('开始预测') 71 | X_testMat = np.mat(X_test) 72 | y_testMat = np.mat(y_test).T 73 | 74 | #获取测试数据集矩阵的大小 75 | m, n = np.shape(X_testMat) 76 | #错误样本数计数 77 | rightCount = 0 78 | 79 | for i in range(m): 80 | #获得单个样本向量 81 | xi = X_testMat[i] 82 | #获得该样本标记 83 | yi = y_testMat[i] 84 | #获得运算结果 85 | result = yi * (self.w * xi.T + self.b) 86 | #如果-yi(w*xi+b)>=0,说明该样本被误分类,错误样本数加一 87 | if result >= 0: rightCount += 1 88 | #正确率 = 1 - (样本分类错误数 / 样本总数) 89 | rightRate = rightCount / m 90 | #返回正确率 91 | return rightRate 92 | 93 | 94 | def main(): 95 | X,y = processData('iris.csv') 96 | 97 | # 构建感知机对象,对数据集训练并且预测 98 | perceptron=MyPerceptron() 99 | perceptron.train(X[0:100],y[0:100]) 100 | rightRate = perceptron.predict(X[101:140],y[101:140]) 101 | print('对测试集的分类的正确率为:',rightRate) 102 | #有二维输入,所以应该有2个w 103 | print('模型的参数w为:',perceptron.w) 104 | print('模型的参数b为',perceptron.b) 105 | 106 | 107 | if __name__ == '__main__': 108 | main() -------------------------------------------------------------------------------- /codes/MySVM.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | #Author:lrrlrr 3 | #Email:kingsundad@gmail.com 4 | 5 | import numpy as np 6 | import pandas as pd 7 | import math 8 | import random 9 | 10 | #根据文件路径读取Iris数据集数据 11 | # #return type: np.array 12 | def processData(filePath): 13 | # 存放数据集的list,X表示的是输入可能有很多维度,y表示输出的分类只有一个 14 | X = [] 15 | y = [] 16 | #默认读取csv的头部 17 | df = pd.read_csv(filePath) 18 | #利用数据集合的第一个维度特征分类 19 | #遍历pandas中df的每一行 20 | for index, row in df.iterrows(): 21 | if row["Species"] == "setosa" : 22 | y.append(1) 23 | else: 24 | y.append(-1) 25 | 
X.append([float(row["Sepal.Length"]),float(row["Sepal.Width"]),float(row["Petal.Length"])]) 26 | return np.array(X), np.array(y) 27 | 28 | 29 | # X_train:训练数据集 30 | # y_train: 训练测试集 31 | # sigma: 高斯核中分母的σ,在核函数中σ的值,高度依赖样本特征值范围,特征值范围较大时若不相应增大σ会导致所有计算得到的核函数均为0 32 | # C:软间隔中的惩罚参数,调和间隔与误分类点的系数 33 | # toler:松弛变量 34 | class SVM: 35 | def __init__(self, X_train, y_train, sigma = 10, C = 200, toler = 0.001): 36 | 37 | self.train_XMat = np.mat(X_train) 38 | # 训练标签集,为了方便后续运算提前做了转置,变为列向量 39 | self.train_yMat = np.mat(y_train).T 40 | # m:训练集数量 n:样本特征数目 41 | self.m, self.n = np.shape(self.train_XMat) 42 | self.sigma = sigma 43 | self.C = C 44 | self.toler = toler 45 | 46 | # 核函数(初始化时提前计算) 47 | self.k = self.calculateKernel() 48 | # SVM中的偏置b 49 | self.b = 0 50 | # α 长度为训练集数目 51 | self.alpha = [0] * self.train_XMat.shape[0] 52 | # SMO运算过程中的Ei 53 | self.E = [0 * self.train_yMat[i, 0] for i in range(self.train_yMat.shape[0])] 54 | self.supportVecIndex = [] 55 | 56 | 57 | # 使用高斯核函数 58 | def calculateKernel(self): 59 | #初始化高斯核结果矩阵 大小 = 训练集长度m * 训练集长度m 60 | #k[i][j] = Xi * Xj 61 | k = [[0 for i in range(self.m)] for j in range(self.m)] 62 | for i in range(self.m): 63 | X = self.train_XMat[i, :] 64 | for j in range(i, self.m): 65 | Z = self.train_XMat[j, :] 66 | #先计算||X - Z||^2 67 | result = (X - Z) * (X - Z).T 68 | #分子除以分母后去指数,得到高斯核结果 69 | result = np.exp(-1 * result / (2 * self.sigma**2)) 70 | #将Xi*Xj的结果存放入k[i][j]和k[j][i]中 71 | k[i][j] = result 72 | k[j][i] = result 73 | return k 74 | 75 | # 查看第i个α是否满足KKT条件 76 | def isSatisfyKKT(self, i): 77 | gxi =self.calculate_gxi(i) 78 | yi = self.train_yMat[i] 79 | if (math.fabs(self.alpha[i]) < self.toler) and (yi * gxi >= 1): 80 | return True 81 | elif (math.fabs(self.alpha[i] - self.C) < self.toler) and (yi * gxi <= 1): 82 | return True 83 | elif (self.alpha[i] > -self.toler) and (self.alpha[i] < (self.C + self.toler)) \ 84 | and (math.fabs(yi * gxi - 1) < self.toler): 85 | return True 86 | 87 | return False 88 | 89 | def calculate_gxi(self, i): 90 | gxi = 0 91 | index = [i for i, alpha in enumerate(self.alpha) if alpha != 0] 92 | # 遍历每一个非零α,i为非零α的下标 93 | for j in index: 94 | #计算g(xi) 95 | gxi += self.alpha[j] * self.train_yMat[j] * self.k[j][i] 96 | # 求和结束后再单独加上偏置b 97 | gxi += self.b 98 | 99 | #返回 100 | return gxi 101 | 102 | def calculateEi(self, i): 103 | # 计算g(xi) 104 | gxi = self.calculate_gxi(i) 105 | # Ei = g(xi) - yi,直接将结果作为Ei返回 106 | return gxi - self.train_yMat[i] 107 | 108 | 109 | # E1: 第一个变量的E1 110 | # i: 第一个变量α的下标 111 | def getAlphaJ(self, E1, i): 112 | E2 = 0 113 | maxE1_E2 = -1 114 | maxIndex = -1 115 | nozeroE = [i for i, Ei in enumerate(self.E) if Ei != 0] 116 | 117 | for j in nozeroE: 118 | E2_tmp = self.calculateEi(j) 119 | if math.fabs(E1 - E2_tmp) > maxE1_E2: 120 | #更新 121 | maxE1_E2 = math.fabs(E1 - E2_tmp) 122 | E2 = E2_tmp 123 | maxIndex = j 124 | if maxIndex == -1: 125 | maxIndex = i 126 | while maxIndex == i: 127 | maxIndex = int(random.uniform(0, self.m)) 128 | E2 = self.calculateEi(maxIndex) 129 | return E2, maxIndex 130 | 131 | def train(self, count = 100): 132 | countCur = 0; parameterChanged = 1 133 | while (countCur < count) and (parameterChanged > 0): 134 | countCur += 1 135 | parameterChanged = 0 136 | 137 | for i in range(self.m): 138 | #是否满足KKT条件,如果不满足则作为SMO中第一个变量从而进行优化 139 | if self.isSatisfyKKT(i) == False: 140 | #如果下标为i的α不满足KKT条件,则进行优化 141 | E1 = self.calculateEi(i) 142 | E2, j = self.getAlphaJ(E1, i) 143 | 144 | y1 = self.train_yMat[i] 145 | y2 = self.train_yMat[j] 146 | 147 | alphaOld_1 = self.alpha[i] 148 | alphaOld_2 = 
self.alpha[j] 149 | 150 | if y1 != y2: 151 | L = max(0, alphaOld_2 - alphaOld_1) 152 | H = min(self.C, self.C + alphaOld_2 - alphaOld_1) 153 | else: 154 | L = max(0, alphaOld_2 + alphaOld_1 - self.C) 155 | H = min(self.C, alphaOld_2 + alphaOld_1) 156 | 157 | if L == H: 158 | continue 159 | 160 | #计算α的新值 161 | k11 = self.k[i][i] 162 | k22 = self.k[j][j] 163 | k21 = self.k[j][i] 164 | k12 = self.k[i][j] 165 | 166 | alphaNew_2 = alphaOld_2 + y2 * (E1 - E2) / (k11 + k22 - 2 * k12) 167 | 168 | if alphaNew_2 < L: alphaNew_2 = L 169 | elif alphaNew_2 > H: alphaNew_2 = H 170 | #更新α1 171 | alphaNew_1 = alphaOld_1 + y1 * y2 * (alphaOld_2 - alphaNew_2) 172 | 173 | #计算b1和b2 174 | b1New = -1 * E1 - y1 * k11 * (alphaNew_1 - alphaOld_1) \ 175 | - y2 * k21 * (alphaNew_2 - alphaOld_2) + self.b 176 | b2New = -1 * E2 - y1 * k12 * (alphaNew_1 - alphaOld_1) \ 177 | - y2 * k22 * (alphaNew_2 - alphaOld_2) + self.b 178 | 179 | #依据α1和α2的值范围确定新b 180 | if (alphaNew_1 > 0) and (alphaNew_1 < self.C): 181 | bNew = b1New 182 | elif (alphaNew_2 > 0) and (alphaNew_2 < self.C): 183 | bNew = b2New 184 | else: 185 | bNew = (b1New + b2New) / 2 186 | 187 | #将更新后的各类值写入,进行更新 188 | self.alpha[i] = alphaNew_1 189 | self.alpha[j] = alphaNew_2 190 | self.b = bNew 191 | 192 | self.E[i] = self.calculateEi(i) 193 | self.E[j] = self.calculateEi(j) 194 | 195 | #如果α2的改变量过于小,就认为该参数未改变,不增加parameterChanged值 196 | #反之则自增1 197 | if math.fabs(alphaNew_2 - alphaOld_2) >= 0.00001: 198 | parameterChanged += 1 199 | 200 | #全部计算结束后,重新遍历一遍α,查找里面的支持向量 201 | for i in range(self.m): 202 | #如果α>0,说明是支持向量 203 | if self.alpha[i] > 0: 204 | #将支持向量的索引保存起来 205 | self.supportVecIndex.append(i) 206 | 207 | # 单独计算核函数 208 | def calculateSinglKernel(self, x1, x2): 209 | # 计算高斯核 210 | result = (x1 - x2) * (x1 - x2).T 211 | result = np.exp(-1 * result / (2 * self.sigma ** 2)) 212 | return np.exp(result) 213 | 214 | # 对样本的标签进行预测 215 | def predict(self, x): 216 | result = 0 217 | for i in self.supportVecIndex: 218 | # 遍历所有支持向量,计算求和式 219 | tmp = self.calculateSinglKernel(self.train_XMat[i, :], np.mat(x)) 220 | result += self.alpha[i] * self.train_yMat[i] * tmp 221 | # 偏置b 222 | result += self.b 223 | 224 | return np.sign(result) 225 | 226 | 227 | 228 | def test(self, X_test, y_test): 229 | 230 | rightCount = 0 231 | 232 | for i in range(len(X_test)): 233 | result = self.predict(X_test[i]) 234 | if result == y_test[i]: 235 | rightCount += 1 236 | return rightCount / len(X_test) 237 | 238 | 239 | if __name__ == '__main__': 240 | 241 | X, y = processData('iris.csv') 242 | 243 | X_train = X[0:149:50] 244 | y_train = y[0:149:50] 245 | 246 | # 自己在数据集后面加上了干扰的实例 247 | X_test = X[0:150:1] 248 | y_test = y[0:150:1] 249 | 250 | # 初始化SVM类 251 | svm = SVM(X_train, y_train, 10, 200, 0.001) 252 | 253 | # 开始训练 254 | svm.train() 255 | 256 | # 开始测试 257 | rightRate = svm.test(X_test, y_test) 258 | print('准确率为百分之 %d' % (rightRate * 100)) -------------------------------------------------------------------------------- /codes/iris.csv: -------------------------------------------------------------------------------- 1 | "Number","Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species" 2 | "1",5.1,3.5,1.4,0.2,"setosa" 3 | "2",4.9,3,1.4,0.2,"setosa" 4 | "3",4.7,3.2,1.3,0.2,"setosa" 5 | "4",4.6,3.1,1.5,0.2,"setosa" 6 | "5",5,3.6,1.4,0.2,"setosa" 7 | "6",5.4,3.9,1.7,0.4,"setosa" 8 | "7",4.6,3.4,1.4,0.3,"setosa" 9 | "8",5,3.4,1.5,0.2,"setosa" 10 | "9",4.4,2.9,1.4,0.2,"setosa" 11 | "10",4.9,3.1,1.5,0.1,"setosa" 12 | "11",5.4,3.7,1.5,0.2,"setosa" 13 | "12",4.8,3.4,1.6,0.2,"setosa" 14 | 
"13",4.8,3,1.4,0.1,"setosa" 15 | "14",4.3,3,1.1,0.1,"setosa" 16 | "15",5.8,4,1.2,0.2,"setosa" 17 | "16",5.7,4.4,1.5,0.4,"setosa" 18 | "17",5.4,3.9,1.3,0.4,"setosa" 19 | "18",5.1,3.5,1.4,0.3,"setosa" 20 | "19",5.7,3.8,1.7,0.3,"setosa" 21 | "20",5.1,3.8,1.5,0.3,"setosa" 22 | "21",5.4,3.4,1.7,0.2,"setosa" 23 | "22",5.1,3.7,1.5,0.4,"setosa" 24 | "23",4.6,3.6,1,0.2,"setosa" 25 | "24",5.1,3.3,1.7,0.5,"setosa" 26 | "25",4.8,3.4,1.9,0.2,"setosa" 27 | "26",5,3,1.6,0.2,"setosa" 28 | "27",5,3.4,1.6,0.4,"setosa" 29 | "28",5.2,3.5,1.5,0.2,"setosa" 30 | "29",5.2,3.4,1.4,0.2,"setosa" 31 | "30",4.7,3.2,1.6,0.2,"setosa" 32 | "31",4.8,3.1,1.6,0.2,"setosa" 33 | "32",5.4,3.4,1.5,0.4,"setosa" 34 | "33",5.2,4.1,1.5,0.1,"setosa" 35 | "34",5.5,4.2,1.4,0.2,"setosa" 36 | "35",4.9,3.1,1.5,0.2,"setosa" 37 | "36",5,3.2,1.2,0.2,"setosa" 38 | "37",5.5,3.5,1.3,0.2,"setosa" 39 | "38",4.9,3.6,1.4,0.1,"setosa" 40 | "39",4.4,3,1.3,0.2,"setosa" 41 | "40",5.1,3.4,1.5,0.2,"setosa" 42 | "41",5,3.5,1.3,0.3,"setosa" 43 | "42",4.5,2.3,1.3,0.3,"setosa" 44 | "43",4.4,3.2,1.3,0.2,"setosa" 45 | "44",5,3.5,1.6,0.6,"setosa" 46 | "45",5.1,3.8,1.9,0.4,"setosa" 47 | "46",4.8,3,1.4,0.3,"setosa" 48 | "47",5.1,3.8,1.6,0.2,"setosa" 49 | "48",4.6,3.2,1.4,0.2,"setosa" 50 | "49",5.3,3.7,1.5,0.2,"setosa" 51 | "50",5,3.3,1.4,0.2,"setosa" 52 | "51",7,3.2,4.7,1.4,"versicolor" 53 | "52",6.4,3.2,4.5,1.5,"versicolor" 54 | "53",6.9,3.1,4.9,1.5,"versicolor" 55 | "54",5.5,2.3,4,1.3,"versicolor" 56 | "55",6.5,2.8,4.6,1.5,"versicolor" 57 | "56",5.7,2.8,4.5,1.3,"versicolor" 58 | "57",6.3,3.3,4.7,1.6,"versicolor" 59 | "58",4.9,2.4,3.3,1,"versicolor" 60 | "59",6.6,2.9,4.6,1.3,"versicolor" 61 | "60",5.2,2.7,3.9,1.4,"versicolor" 62 | "61",5,2,3.5,1,"versicolor" 63 | "62",5.9,3,4.2,1.5,"versicolor" 64 | "63",6,2.2,4,1,"versicolor" 65 | "64",6.1,2.9,4.7,1.4,"versicolor" 66 | "65",5.6,2.9,3.6,1.3,"versicolor" 67 | "66",6.7,3.1,4.4,1.4,"versicolor" 68 | "67",5.6,3,4.5,1.5,"versicolor" 69 | "68",5.8,2.7,4.1,1,"versicolor" 70 | "69",6.2,2.2,4.5,1.5,"versicolor" 71 | "70",5.6,2.5,3.9,1.1,"versicolor" 72 | "71",5.9,3.2,4.8,1.8,"versicolor" 73 | "72",6.1,2.8,4,1.3,"versicolor" 74 | "73",6.3,2.5,4.9,1.5,"versicolor" 75 | "74",6.1,2.8,4.7,1.2,"versicolor" 76 | "75",6.4,2.9,4.3,1.3,"versicolor" 77 | "76",6.6,3,4.4,1.4,"versicolor" 78 | "77",6.8,2.8,4.8,1.4,"versicolor" 79 | "78",6.7,3,5,1.7,"versicolor" 80 | "79",6,2.9,4.5,1.5,"versicolor" 81 | "80",5.7,2.6,3.5,1,"versicolor" 82 | "81",5.5,2.4,3.8,1.1,"versicolor" 83 | "82",5.5,2.4,3.7,1,"versicolor" 84 | "83",5.8,2.7,3.9,1.2,"versicolor" 85 | "84",6,2.7,5.1,1.6,"versicolor" 86 | "85",5.4,3,4.5,1.5,"versicolor" 87 | "86",6,3.4,4.5,1.6,"versicolor" 88 | "87",6.7,3.1,4.7,1.5,"versicolor" 89 | "88",6.3,2.3,4.4,1.3,"versicolor" 90 | "89",5.6,3,4.1,1.3,"versicolor" 91 | "90",5.5,2.5,4,1.3,"versicolor" 92 | "91",5.5,2.6,4.4,1.2,"versicolor" 93 | "92",6.1,3,4.6,1.4,"versicolor" 94 | "93",5.8,2.6,4,1.2,"versicolor" 95 | "94",5,2.3,3.3,1,"versicolor" 96 | "95",5.6,2.7,4.2,1.3,"versicolor" 97 | "96",5.7,3,4.2,1.2,"versicolor" 98 | "97",5.7,2.9,4.2,1.3,"versicolor" 99 | "98",6.2,2.9,4.3,1.3,"versicolor" 100 | "99",5.1,2.5,3,1.1,"versicolor" 101 | "100",5.7,2.8,4.1,1.3,"versicolor" 102 | "101",6.3,3.3,6,2.5,"virginica" 103 | "102",5.8,2.7,5.1,1.9,"virginica" 104 | "103",7.1,3,5.9,2.1,"virginica" 105 | "104",6.3,2.9,5.6,1.8,"virginica" 106 | "105",6.5,3,5.8,2.2,"virginica" 107 | "106",7.6,3,6.6,2.1,"virginica" 108 | "107",4.9,2.5,4.5,1.7,"virginica" 109 | "108",7.3,2.9,6.3,1.8,"virginica" 110 | "109",6.7,2.5,5.8,1.8,"virginica" 111 | 
"110",7.2,3.6,6.1,2.5,"virginica" 112 | "111",6.5,3.2,5.1,2,"virginica" 113 | "112",6.4,2.7,5.3,1.9,"virginica" 114 | "113",6.8,3,5.5,2.1,"virginica" 115 | "114",5.7,2.5,5,2,"virginica" 116 | "115",5.8,2.8,5.1,2.4,"virginica" 117 | "116",6.4,3.2,5.3,2.3,"virginica" 118 | "117",6.5,3,5.5,1.8,"virginica" 119 | "118",7.7,3.8,6.7,2.2,"virginica" 120 | "119",7.7,2.6,6.9,2.3,"virginica" 121 | "120",6,2.2,5,1.5,"virginica" 122 | "121",6.9,3.2,5.7,2.3,"virginica" 123 | "122",5.6,2.8,4.9,2,"virginica" 124 | "123",7.7,2.8,6.7,2,"virginica" 125 | "124",6.3,2.7,4.9,1.8,"virginica" 126 | "125",6.7,3.3,5.7,2.1,"virginica" 127 | "126",7.2,3.2,6,1.8,"virginica" 128 | "127",6.2,2.8,4.8,1.8,"virginica" 129 | "128",6.1,3,4.9,1.8,"virginica" 130 | "129",6.4,2.8,5.6,2.1,"virginica" 131 | "130",7.2,3,5.8,1.6,"virginica" 132 | "131",7.4,2.8,6.1,1.9,"virginica" 133 | "132",7.9,3.8,6.4,2,"virginica" 134 | "133",6.4,2.8,5.6,2.2,"virginica" 135 | "134",6.3,2.8,5.1,1.5,"virginica" 136 | "135",6.1,2.6,5.6,1.4,"virginica" 137 | "136",7.7,3,6.1,2.3,"virginica" 138 | "137",6.3,3.4,5.6,2.4,"virginica" 139 | "138",6.4,3.1,5.5,1.8,"virginica" 140 | "139",6,3,4.8,1.8,"virginica" 141 | "140",6.9,3.1,5.4,2.1,"virginica" 142 | "141",6.7,3.1,5.6,2.4,"virginica" 143 | "142",6.9,3.1,5.1,2.3,"virginica" 144 | "143",5.8,2.7,5.1,1.9,"virginica" 145 | "144",6.8,3.2,5.9,2.3,"virginica" 146 | "145",6.7,3.3,5.7,2.5,"virginica" 147 | "146",6.7,3,5.2,2.3,"virginica" 148 | "147",6.3,2.5,5,1.9,"virginica" 149 | "148",6.5,3,5.2,2,"virginica" 150 | "149",6.2,3.4,5.4,2.3,"virginica" 151 | "150",5.9,3,5.1,1.8,"virginica" 152 | "151",52.9,322,52.221,1212.8,"virginica" 153 | 154 | -------------------------------------------------------------------------------- /docImage/10_1_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/10_1_1.jpg -------------------------------------------------------------------------------- /docImage/10_2_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/10_2_1.jpg -------------------------------------------------------------------------------- /docImage/11_1_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/11_1_1.jpg -------------------------------------------------------------------------------- /docImage/11_1_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/11_1_2.jpg -------------------------------------------------------------------------------- /docImage/11_2_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/11_2_1.jpg -------------------------------------------------------------------------------- /docImage/11_2_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/11_2_2.jpg 
-------------------------------------------------------------------------------- /docImage/11_2_3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/11_2_3.jpg -------------------------------------------------------------------------------- /docImage/8_1_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_1_1.jpg -------------------------------------------------------------------------------- /docImage/8_1_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_1_2.jpg -------------------------------------------------------------------------------- /docImage/8_1_3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_1_3.jpg -------------------------------------------------------------------------------- /docImage/8_2_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_2_1.jpg -------------------------------------------------------------------------------- /docImage/8_2_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_2_2.jpg -------------------------------------------------------------------------------- /docImage/8_2_3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_2_3.jpg -------------------------------------------------------------------------------- /docImage/8_2_4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/8_2_4.jpg -------------------------------------------------------------------------------- /docImage/9_1_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/9_1_1.jpg -------------------------------------------------------------------------------- /docImage/9_1_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/9_1_2.jpg -------------------------------------------------------------------------------- /docImage/9_2_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/9_2_1.jpg -------------------------------------------------------------------------------- /docImage/9_2_2.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/9_2_2.jpg -------------------------------------------------------------------------------- /docImage/9_2_3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/9_2_3.jpg -------------------------------------------------------------------------------- /docImage/Maximum_separation_hyperplane1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Maximum_separation_hyperplane1.jpg -------------------------------------------------------------------------------- /docImage/Maximum_separation_hyperplane2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Maximum_separation_hyperplane2.jpg -------------------------------------------------------------------------------- /docImage/Maximum_separation_hyperplane3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Maximum_separation_hyperplane3.jpg -------------------------------------------------------------------------------- /docImage/Maximum_separation_hyperplane4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Maximum_separation_hyperplane4.jpg -------------------------------------------------------------------------------- /docImage/Novikoff1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Novikoff1.jpg -------------------------------------------------------------------------------- /docImage/Novikoff2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Novikoff2.jpg -------------------------------------------------------------------------------- /docImage/Novikoff3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Novikoff3.jpg -------------------------------------------------------------------------------- /docImage/Soft_interval_maximization_dual1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Soft_interval_maximization_dual1.jpg -------------------------------------------------------------------------------- /docImage/Soft_interval_maximization_dual2.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Soft_interval_maximization_dual2.jpg -------------------------------------------------------------------------------- /docImage/Soft_interval_maximization_dual3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/Soft_interval_maximization_dual3.jpg -------------------------------------------------------------------------------- /docImage/bayes_naive_bayes1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/bayes_naive_bayes1.jpg -------------------------------------------------------------------------------- /docImage/bayes_naive_bayes2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/bayes_naive_bayes2.jpg -------------------------------------------------------------------------------- /docImage/bayesian_estimation.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/bayesian_estimation.jpg -------------------------------------------------------------------------------- /docImage/hoeffding1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/hoeffding1.jpg -------------------------------------------------------------------------------- /docImage/hoeffding2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/hoeffding2.jpg -------------------------------------------------------------------------------- /docImage/iterative_method1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/iterative_method1.jpg -------------------------------------------------------------------------------- /docImage/iterative_method2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/iterative_method2.jpg -------------------------------------------------------------------------------- /docImage/iterative_method3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/iterative_method3.jpg -------------------------------------------------------------------------------- /docImage/lagrange_duality1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/lagrange_duality1.jpg 
-------------------------------------------------------------------------------- /docImage/lagrange_duality2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/lagrange_duality2.jpg -------------------------------------------------------------------------------- /docImage/lagrange_duality3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/lagrange_duality3.jpg -------------------------------------------------------------------------------- /docImage/maximum_entropy1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/maximum_entropy1.jpg -------------------------------------------------------------------------------- /docImage/maximum_entropy2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/maximum_entropy2.jpg -------------------------------------------------------------------------------- /docImage/maximum_likelihood_estimation.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/maximum_likelihood_estimation.jpg -------------------------------------------------------------------------------- /docImage/mle_naive_bayes.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/mle_naive_bayes.jpg -------------------------------------------------------------------------------- /docImage/poster_prob1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/poster_prob1.jpg -------------------------------------------------------------------------------- /docImage/poster_prob2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/docImage/poster_prob2.jpg -------------------------------------------------------------------------------- /notes/chapter1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter1.pdf -------------------------------------------------------------------------------- /notes/chapter10.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter10.pdf -------------------------------------------------------------------------------- /notes/chapter11.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter11.pdf -------------------------------------------------------------------------------- /notes/chapter2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter2.pdf -------------------------------------------------------------------------------- /notes/chapter3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter3.pdf -------------------------------------------------------------------------------- /notes/chapter4.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter4.pdf -------------------------------------------------------------------------------- /notes/chapter5.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter5.pdf -------------------------------------------------------------------------------- /notes/chapter6.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter6.pdf -------------------------------------------------------------------------------- /notes/chapter7.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter7.pdf -------------------------------------------------------------------------------- /notes/chapter8.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter8.pdf -------------------------------------------------------------------------------- /notes/chapter9.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kingsunfather/Statistic-study-notes/6c52019d311c714d763e9ab6c05ab23343b418d2/notes/chapter9.pdf --------------------------------------------------------------------------------
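
The data file reproduced above (iris.csv) is the input consumed by the model implementations under codes/. As a minimal sketch that is not part of the repository: assuming the CSV has a header row, that its columns follow the layout visible in the dump (a quoted row index, four numeric measurements, and a species label), that the file sits at codes/iris.csv, and that pandas is installed, it could be loaded roughly as follows.

    # Hypothetical loader sketch (not from the repository); assumes pandas is
    # installed, the CSV has a header row, and the relative path codes/iris.csv.
    import pandas as pd

    # The first column in the dump is a quoted row index, so use it as the DataFrame index.
    df = pd.read_csv("codes/iris.csv", index_col=0)

    # Split the four numeric measurement columns from the species label column.
    X = df.iloc[:, :4].to_numpy()   # shape: (n_samples, 4)
    y = df.iloc[:, 4].to_numpy()    # species labels, e.g. "setosa"

    print(X.shape, y[:5])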