├── 10.Apriori算法
│   ├── README.md
│   └── apriori.pyc
├── 3.决策树
│   ├── README.md
│   ├── 信息熵与信息增益.pptx
│   └── 决策树之ID3算法案例.py
├── 4.朴素贝叶斯
│   ├── README.md
│   └── sabayes.py
├── 5.逻辑回归
│   ├── LogRegres-gj.py
│   ├── LogRegres.py
│   ├── README.md
│   ├── WeChat Image_20191104085648.jpg
│   ├── colicLogRegres.py
│   ├── horseColicTest.txt
│   ├── horseColicTraining.txt
│   ├── testSet.txt
│   └── 新建 Microsoft Word 文档.docx
├── 6.支持向量机
│   ├── README.md
│   ├── sasvm.py
│   └── svm.py
├── 7.Adaboost
│   ├── AdaBoost.pptx
│   ├── Adaboost.py
│   ├── readme.md
│   └── 常见代码.pptx
├── 8.回归
│   ├── readme.md
│   └── saLR.py
├── 9.随机森林
│   ├── C4.5.py
│   ├── CART.py
│   ├── ID3,C4.5,CART理论部分.DOC
│   ├── README.md
│   ├── 随机森林.py
│   └── 随机森林理论部分.docx
├── README.md
├── 学习资料
│   ├── Pattern Recognition and Machine Learning.pdf
│   ├── The Elements of Statistical Learning(2nd).pdf
│   └── 统计学习方法(第1版).pdf
└── 谱聚类
    ├── README.md
    └── SpectralClustering.py

--------------------------------------------------------------------------------
/10.Apriori算法/README.md:
--------------------------------------------------------------------------------
# 关联分析介绍
商场的销售过程涉及很多机器学习的应用:商品的陈列、购物券的提供、用户忠诚度分析等等。通过对这些大量数据的分析,可以帮助商店了解用户的购物行为,进而对商品的定价、市场促销、存货管理等进行决策。
从大规模数据集中寻找物品间的隐含关系被称作关联分析(association analysis)或者关联规则学习(association rule learning)。这里的主要问题在于,寻找物品的不同组合是一项十分耗时的任务,所需的计算代价很高,蛮力搜索方法并不能解决这个问题,所以需要用更智能的方法在合理的时间范围内找到频繁项集。
这些关系可以有两种形式:频繁项集、关联规则。
- 频繁项集:经常出现在一块的物品的集合。
- 关联规则:暗示两种物品之间可能存在很强的关系。

一个具体的例子:

(图略:一家杂货店的交易记录清单)

频繁项集是指那些经常出现在一起的物品,例如上图的{葡萄酒,尿布,豆奶}。从上面的数据集中也可以找到"尿布->葡萄酒"的关联规则,这意味着有人买了尿布,那很有可能他也会购买葡萄酒。那如何定义和表示频繁项集和关联规则呢?这里引入支持度和可信度(置信度)。

## 支持度
一个项集的支持度被定义为数据集中包含该项集的记录所占的比例。上图中,{豆奶}的支持度为4/5,{豆奶,尿布}的支持度为3/5。支持度是针对项集来说的,因此可以定义一个最小支持度,只保留满足最小支持度的项集。

## 可信度(置信度)
可信度是针对如{尿布}->{葡萄酒}这样的关联规则来定义的,计算方式为 支持度{尿布,葡萄酒}/支持度{尿布}。其中{尿布,葡萄酒}的支持度为3/5,{尿布}的支持度为4/5,所以"尿布->葡萄酒"的可信度为3/4=0.75。这意味着在包含尿布的记录中,我们的规则对其中75%都适用。

有了可以量化的计算方式,我们却还不能立刻运算,这是因为如果直接对所有数据进行运算,运算量极其大,很难实现。这里说明一下:假设我们只有4种商品——商品0、商品1、商品2、商品3,那么如何得到可能被一起购买的商品的组合?
一家杂货店有四种商品0、1、2、3,这些商品的组合可能含一种、二种、三种或四种商品。我们的关注点是:用户购买了一种或者多种商品,而不关心具体购买商品的数量。
集合{0,1,2,3}中所有可能的项集组合如下图:

(图略:集合{0,1,2,3}的所有项集组合)

四种商品要遍历15次,随着物品数目的增加遍历次数会急剧增加。对于包含N种商品的数据集一共有2^N-1种项集组合。

上图显示了物品之间所有可能的组合。从上往下第一个集合是Ø,表示不包含任何物品的空集;物品集合之间的连线表明两个或者更多集合可以组合形成一个更大的集合。我们的目标是找到经常在一起购买的物品集合,这里使用集合的支持度来度量其出现的频率。一个集合的支持度是指有多少比例的交易记录包含该集合。例如,对于上图,要计算{0,3}的支持度,直接的想法是遍历每条记录,统计包含0和3的记录的数量,再用该数量除以总记录数,就可以得到支持度。而这只是针对单个集合{0,3}的计算,具体过程如下面的代码片段所示。
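下面用一小段示意性的 Python 代码演示这种扫描统计(其中的交易数据是假设的例子,`support` 函数也只是演示写法,并非仓库中 apriori 代码的原文):

```python
# 示意代码:遍历每条记录,统计项集{0,3}的支持度(交易数据为假设的例子)
def support(transactions, itemset):
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))  # 包含该项集的记录数
    return hits / len(transactions)                             # 除以总记录数即为支持度

transactions = [{0, 1, 3}, {1, 2}, {0, 3}, {0, 2, 3}, {1, 2, 3}]  # 假设的5条交易记录
print(support(transactions, {0, 3}))  # {0,3}出现在3条记录中,支持度为3/5=0.6
```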
要获得每种可能集合的支持度就需要多次重复上述过程。对于上图,虽然仅有4种物品,也需要遍历数据15次;随着物品数目的增加,遍历次数会急剧增加,对于包含 N 种物品的数据集共有 2^N−1 种项集组合。为了降低计算时间,研究人员发现了 Apriori 原理,可以帮助我们减少可能感兴趣的项集的数目。

# Apriori 的原理
## 定律1
如果一个集合是频繁项集,则它的所有子集都是频繁项集。
举例:假设集合{A,B}是频繁项集,即A、B同时出现在一条记录中的次数大于等于最小支持度min_support,则它的子集{A}、{B}出现的次数必定≥min_support,即它的子集都是频繁项集。
## 定律2
如果一个集合不是频繁项集,则它的所有超集都不是频繁项集。
举例:假设集合{A}不是频繁项集,即A出现的次数小于min_support,则它的任何超集如{A,B}出现的次数必定也小于min_support,因此其超集必定都不是频繁项集。

注意:
由二级频繁项集生成三级候选项集时,没有{牛奶,面包,啤酒},那是因为{面包,啤酒}不是二级频繁项集。
最后生成三级频繁项集后,没有更高一级的候选项集,算法结束,{牛奶,面包,尿布}是最大频繁项集。
如下图所示:

(图略:候选项集的生成与剪枝过程)

Apriori算法是经典的生成关联规则的频繁项集挖掘算法,其目标是找到最大的K项频繁集。那么什么是最大的K项频繁集呢?例如,当我们找到了符合支持度要求的频繁集{A,B}和{A,B,E}时,我们会选择项数更多的3项频繁集{A,B,E}。下面我们介绍Apriori算法选择频繁K项集的过程。

Apriori算法采用迭代的方法:先搜索出候选1项集以及对应的支持度,剪枝去掉低于最小支持度的候选1项集,得到频繁1项集;然后对剩下的频繁1项集进行连接,得到候选2项集,筛选去掉低于最小支持度的候选2项集,得到频繁2项集;如此迭代下去,直到无法找到频繁k+1项集为止,对应的频繁k项集的集合便是算法的输出结果。我们可以通过下面的例子来看具体的迭代过程。

数据集包含4条记录{'134','235','1235','25'},我们利用Apriori算法来寻找频繁k项集,最小支持度设置为50%。首先生成候选1项集,共包含五个数据{'1','2','3','4','5'},计算这5个数据的支持度,然后对低于最小支持度的数据进行剪枝。其中数据{'4'}的支持度为25%,低于最小支持度,进行剪枝处理,最终频繁1项集为{'1','2','3','5'}。根据频繁1项集连接得到候选2项集{'12','13','15','23','25','35'},其中数据{'12','15'}低于最小支持度,进行剪枝处理,得到频繁2项集{'13','23','25','35'}。如此迭代下去,最终能够得到频繁3项集{'235'};由于数据无法再进行连接,算法至此结束。

# Apriori算法流程

从Apriori算法原理中我们能够总结出如下算法流程,其中输入为数据集合D和最小支持度α,输出为最大的频繁k项集:
1. 扫描数据集,得到所有出现过的数据,作为候选1项集。
2. 挖掘频繁k项集:
   - 扫描计算候选k项集的支持度;
   - 剪枝去掉候选k项集中支持度低于最小支持度α的数据集,得到频繁k项集。如果频繁k项集为空,则返回频繁k-1项集的集合作为算法结果,算法结束;如果得到的频繁k项集只有一项,则直接返回频繁k项集的集合作为算法结果,算法结束;
   - 基于频繁k项集,连接生成候选k+1项集。
3. 令k=k+1,重复步骤2继续迭代。

# 从频繁集中挖掘相关规则
解决了频繁项集问题,下一步就可以解决相关规则问题。
要找到关联规则,我们首先从一个频繁项集开始。从杂货店的例子可以得到,如果有一个频繁项集{豆奶, 莴苣},那么就可能有一条关联规则"豆奶➞莴苣"。这意味着如果有人购买了豆奶,那么在统计上他购买莴苣的概率较大。注意这一条反过来并不总是成立,也就是说,可信度("豆奶➞莴苣")并不等于可信度("莴苣➞豆奶")。
前文也提到过,一条规则P➞H的可信度定义为support(P | H)/support(P),其中"|"表示P和H的并集。可见可信度的计算是基于项集的支持度的。
图4给出了从项集{0,1,2,3}产生的所有关联规则,其中阴影区域给出的是低可信度的规则。可以发现,如果{0,1,2}➞{3}是一条低可信度规则,那么所有其他以3作为后件(箭头右侧包含3)的规则均为低可信度规则。

(图4略:项集{0,1,2,3}产生的所有关联规则及低可信度区域)

# Apriori算法优缺点

## 优点
- 适合稀疏数据集。
- 算法原理简单,易实现。
- 适合事务数据库的关联规则挖掘。
## 缺点
- 可能产生庞大的候选集。
- 算法需多次遍历数据集,效率低,耗时。
--------------------------------------------------------------------------------
/10.Apriori算法/apriori.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TUFE-I307/Seminar-MachineLearning/c637c55a9e411451709908f512cab44cc79665c3/10.Apriori算法/apriori.pyc
--------------------------------------------------------------------------------
/3.决策树/README.md:
--------------------------------------------------------------------------------
本文采用决策树ID3算法将银行贷款用户进行分类。图片1为分类对象表格,代码中将此表格改写为数据集DataSet;PPT中说明了该算法相关的基本概念以及代码中函数的数学公式。代码还有很多需要精简改进的地方,望大家多多斧正。如果大家有不明白的地方可以来找我相互讨论,也可以去CSDN博客搜作者Jack-Cui,里面有很多机器学习模型的教学。
from:蔡承真
--------------------------------------------------------------------------------
/3.决策树/信息熵与信息增益.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TUFE-I307/Seminar-MachineLearning/c637c55a9e411451709908f512cab44cc79665c3/3.决策树/信息熵与信息增益.pptx
--------------------------------------------------------------------------------
/3.决策树/决策树之ID3算法案例.py:
--------------------------------------------------------------------------------
from matplotlib.font_manager import FontProperties
import matplotlib.pyplot as plt
from math import log
import operator

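# 数据集各列取值含义的补充说明(根据特征取值推断,与李航《统计学习方法》中的
# 贷款申请样例数据一致;此说明为补充注释,具体含义以README所述的"图片1"表格为准):
#   年龄:0-青年,1-中年,2-老年
#   有工作:0-否,1-是
#   有自己的房子:0-否,1-是
#   信贷情况:0-一般,1-好,2-非常好
#   最后一列:类别标签,表示是否同意放贷(yes/no)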
7 | 8 | def createDataSet(): 9 | dataSet = [[0, 0, 0, 0, 'no'], # 创建数据集 10 | [0, 0, 0, 1, 'no'], 11 | [0, 1, 0, 1, 'yes'], 12 | [0, 1, 1, 0, 'yes'], 13 | [0, 0, 0, 0, 'no'], 14 | [1, 0, 0, 0, 'no'], 15 | [1, 0, 0, 1, 'no'], 16 | [1, 1, 1, 1, 'yes'], 17 | [1, 0, 1, 2, 'yes'], 18 | [1, 0, 1, 2, 'yes'], 19 | [2, 0, 1, 2, 'yes'], 20 | [2, 1, 0, 1, 'yes'], 21 | [2, 1, 0, 2, 'yes'], 22 | [2, 0, 1, 1, 'yes'], 23 | [2, 0, 0, 0, 'no']] 24 | labels = ['年龄', '有工作', '有自己的房子', '信贷情况'] # 分类属性 25 | return dataSet, labels # 返回数据集和分类属性 26 | 27 | 28 | """ 29 | 函数说明:计算给定数据集的经验熵(香农熵) 30 | 31 | Parameters: 32 | dataSet - 数据集 33 | Returns: 34 | shannonEnt - 经验熵(香农熵) 35 | """ 36 | 37 | 38 | def calcShannonEnt(dataSet): 39 | namEntices = len(dataSet) # 返回数据集的行数 40 | labelCounts = {} # 保存每个标签(Label)出现次数的字典 41 | for featVec in dataSet: # 对每组特征向量进行统计 42 | currentLabel = featVec[-1] # 提取标签(Label)信息 43 | if currentLabel not in labelCounts.keys(): # 如果标签(Label)没有放入统计次数的字典,添加进去 44 | labelCounts[currentLabel] = 0 45 | labelCounts[currentLabel] += 1 # Label计数 46 | shannonEnt = 0.0 # 经验熵(香农熵) 47 | for key in labelCounts: # 计算香农熵 48 | prob = float(labelCounts[key]) / namEntices # 选择该标签(Label)的概率 49 | shannonEnt -= prob * log(prob, 2) # 利用公式计算 50 | return shannonEnt # 返回经验熵(香农熵) 51 | 52 | 53 | # if __name__ == '__main__': 54 | # dataSet, features = createDataSet() 55 | # print(dataSet) 56 | # print(calcShannonEnt(dataSet)) 57 | 58 | """ 59 | 函数说明:按照给定特征划分数据集 60 | Parameters: 61 | dataSet - 待划分的数据集 62 | axis - 划分数据集的特征 63 | value - 需要返回的特征的值 64 | """ 65 | 66 | 67 | def splitDataSet(dataSet, axis, value): 68 | retDataSet = [] # 创建返回的数据集列表 69 | for featVec in dataSet: # 遍历数据集 70 | if featVec[axis] == value: 71 | reducedFeatVec = featVec[:axis] # 去掉axis特征 72 | reducedFeatVec.extend(featVec[axis + 1:]) # 将符合条件的添加到返回的数据集 73 | retDataSet.append(reducedFeatVec) 74 | return retDataSet # 返回划分后的数据集 75 | 76 | 77 | """ 78 | 函数说明:选择最优特征 79 | 80 | Parameters: 81 | dataSet - 数据集 82 | Returns: 83 | bestFeature - 信息增益最大的(最优)特征的索引值 84 | """ 85 | 86 | 87 | def chooseBestFeatureToSplit(dataSet, is_input=False): 88 | numFeatures = len(dataSet[0]) - 1 # 特征数量 89 | baseEntropy = calcShannonEnt(dataSet) # 计算数据集的香农熵 90 | bestInfoGain = 0.0 # 信息增益 91 | bestFeature = -1 # 最优特征的索引值 92 | for i in range(numFeatures): # 遍历所有特征 93 | # 获取dataSet的第i个所有特征 94 | featList = [example[i] for example in dataSet] 95 | uniqueVals = set(featList) # 创建set集合{},元素不可重复 96 | newEntropy = 0.0 # 经验条件熵 97 | for value in uniqueVals: # 计算信息增益 98 | subDataSet = splitDataSet(dataSet, i, value) # subDataSet划分后的子集 99 | prob = len(subDataSet) / float(len(dataSet)) # 计算子集的概率 100 | newEntropy += prob * calcShannonEnt(subDataSet) # 根据公式计算经验条件熵 101 | infoGain = baseEntropy - newEntropy # 信息增益 102 | if is_input: 103 | print("第%d个特征的增益为%.3f" % (i, infoGain)) # 打印每个特征的信息增益 104 | if infoGain > bestInfoGain: # 计算信息增益 105 | bestInfoGain = infoGain # 更新信息增益,找到最大的信息增益 106 | bestFeature = i # 记录信息增益最大的特征的索引值 107 | return bestFeature # 返回信息增益最大的特征的索引值 108 | 109 | 110 | # if __name__ == '__main__': 111 | # dataSet, features = createDataSet() 112 | # print("最优特征索引值:" + str(chooseBestFeatureToSplit(dataSet, True))) 113 | 114 | """ 115 | 函数说明:统计classList中出现此处最多的元素(类标签) 116 | Parameters: 117 | classList - 类标签列表 118 | Returns: 119 | sortedClassCount[0][0] - 出现此处最多的元素(类标签) 120 | """ 121 | 122 | 123 | def majorityCnt(classList): 124 | classCount = {} 125 | for vote in classList: # 统计classList中每个元素出现的次数 126 | if vote not in classCount.keys(): 127 | classCount[vote] = 0 128 | classCount[vote] += 1 129 | 
sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True) 130 | # classCount.iteritems()将classCount字典分解为元组列表,operator.itemgetter(1)按照第二个元素的次序对元组进行排序,reverse = True是逆序,即按照从大到小的顺序排列 131 | return sortedClassCount[0][0] # 返回classList中出现次数最多的元素 132 | 133 | 134 | """ 135 | 函数说明:创建决策树 136 | Parameters: 137 | dataSet - 训练数据集 138 | labels - 分类属性标签 139 | featLabels - 存储选择的最优特征标签 140 | Returns: 141 | myTree - 决策树 142 | """ 143 | 144 | 145 | def createTree(dataSet, labels, featLabels): 146 | classList = [example[-1] for example in dataSet] # 取分类标签(是否放贷:yes or no) 147 | if classList.count(classList[0]) == len(classList): # 如果类别完全相同则停止继续划分 148 | return classList[0] 149 | if len(dataSet[0]) == 1 or len(labels) == 0: # 遍历完所有特征时返回出现次数最多的类标签 150 | return majorityCnt(classList) 151 | bestFeat = chooseBestFeatureToSplit(dataSet) # 选择最优特征 152 | bestFeatLabel = labels[bestFeat] # 最优特征的标签 153 | featLabels.append(bestFeatLabel) 154 | myTree = {bestFeatLabel: {}} # 根据最优特征的标签生成树 155 | del (labels[bestFeat]) # 删除已经使用特征标签 156 | featValues = [example[bestFeat] for example in dataSet] # 得到训练集中所有最优特征的属性值 157 | uniqueVals = set(featValues) # 去掉重复的属性值 158 | for value in uniqueVals: # 遍历特征,创建决策树。 159 | subLabels = labels[:] 160 | myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels, featLabels) 161 | return myTree 162 | 163 | 164 | """ 165 | 函数说明:获取决策树叶子结点的数目 166 | 167 | Parameters: 168 | myTree - 决策树 169 | Returns: 170 | numLeafs - 决策树的叶子结点的数目 171 | """ 172 | 173 | 174 | def getNumLeafs(myTree): 175 | numLeafs = 0 # 初始化叶子 176 | firstStr = next(iter(myTree)) # 获取决策树结点,python3中myTree.keys()返回的是dict_keys,不在是list,所以不能使用myTree.keys()[0]的方法获取结点属性,可以使用list(myTree.keys())[0] 177 | secondDict = myTree[firstStr] # 获取下一组字典 178 | for key in secondDict.keys(): 179 | if type(secondDict[key]).__name__ == 'dict': # 测试该结点是否为字典,如果不是字典,代表此结点为叶子结点 180 | numLeafs += getNumLeafs(secondDict[key]) 181 | else: 182 | numLeafs += 1 183 | return numLeafs 184 | 185 | 186 | """ 187 | 函数说明:获取决策树的层数 188 | 189 | Parameters: 190 | myTree - 决策树 191 | Returns: 192 | maxDepth - 决策树的层数 193 | """ 194 | 195 | 196 | def getTreeDepth(myTree): 197 | maxDepth = 0 # 初始化决策树深度 198 | firstStr = next(iter(myTree)) # python3中myTree.keys()返回的是dict_keys,不在是list,所以不能使用myTree.keys()[0]的方法获取结点属性,可以使用list(myTree.keys())[0] 199 | secondDict = myTree[firstStr] # 获取下一个字典 200 | for key in secondDict.keys(): 201 | if type(secondDict[key]).__name__ == 'dict': # 测试该结点是否为字典,如果不是字典,代表此结点为叶子结点 202 | thisDepth = 1 + getTreeDepth(secondDict[key]) 203 | else: 204 | thisDepth = 1 205 | if thisDepth > maxDepth: 206 | maxDepth = thisDepth # 更新层数 207 | return maxDepth 208 | 209 | 210 | """ 211 | 函数说明:绘制结点 212 | 213 | Parameters: 214 | nodeTxt - 结点名 215 | centerPt - 文本位置 216 | parentPt - 标注的箭头位置 217 | nodeType - 结点格式 218 | """ 219 | 220 | 221 | def plotNode(nodeTxt, centerPt, parentPt, nodeType): 222 | arrow_args = dict(arrowstyle="<-") # 定义箭头格式 223 | font = FontProperties(fname=r"c:\windows\fonts\simsun.ttc", size=14) # 设置中文字体 224 | createPlot.ax1.annotate(nodeTxt, xy=parentPt, xycoords='axes fraction', # 绘制结点 225 | xytext=centerPt, textcoords='axes fraction', 226 | va="center", ha="center", bbox=nodeType, arrowprops=arrow_args, 227 | FontProperties=font) 228 | 229 | 230 | """ 231 | 函数说明:标注有向边属性值 232 | Parameters: 233 | cntrPt、parentPt - 用于计算标注位置 234 | txtString - 标注的内容 235 | """ 236 | 237 | 238 | def plotMidText(cntrPt, parentPt, txtString): 239 | xMid = (parentPt[0] - cntrPt[0]) / 2.0 + cntrPt[0] # 计算标注位置 240 | yMid = 
(parentPt[1] - cntrPt[1]) / 2.0 + cntrPt[1] 241 | createPlot.ax1.text(xMid, yMid, txtString, va="center", ha="center", rotation=30) 242 | 243 | 244 | """ 245 | 函数说明:绘制决策树 246 | Parameters: 247 | myTree - 决策树(字典) 248 | parentPt - 标注的内容 249 | nodeTxt - 结点名 250 | """ 251 | 252 | 253 | def plotTree(myTree, parentPt, nodeTxt): 254 | decisionNode = dict(boxstyle="sawtooth", fc="0.8") # 设置结点格式 255 | leafNode = dict(boxstyle="round4", fc="0.8") # 设置叶结点格式 256 | numLeafs = getNumLeafs(myTree) # 获取决策树叶结点数目,决定了树的宽度 257 | depth = getTreeDepth(myTree) # 获取决策树层数 258 | firstStr = next(iter(myTree)) # 下个字典 259 | cntrPt = (plotTree.xOff + (1.0 + float(numLeafs)) / 2.0 / plotTree.totalW, plotTree.yOff) # 中心位置 260 | plotMidText(cntrPt, parentPt, nodeTxt) # 标注有向边属性值 261 | plotNode(firstStr, cntrPt, parentPt, decisionNode) # 绘制结点 262 | secondDict = myTree[firstStr] # 下一个字典,也就是继续绘制子结点 263 | plotTree.yOff = plotTree.yOff - 1.0 / plotTree.totalD # y偏移 264 | for key in secondDict.keys(): 265 | if type(secondDict[key]).__name__ == 'dict': # 测试该结点是否为字典,如果不是字典,代表此结点为叶子结点 266 | plotTree(secondDict[key], cntrPt, str(key)) # 不是叶结点,递归调用继续绘制 267 | else: # 如果是叶结点,绘制叶结点,并标注有向边属性值 268 | plotTree.xOff = plotTree.xOff + 1.0 / plotTree.totalW 269 | plotNode(secondDict[key], (plotTree.xOff, plotTree.yOff), cntrPt, leafNode) 270 | plotMidText((plotTree.xOff, plotTree.yOff), cntrPt, str(key)) 271 | plotTree.yOff = plotTree.yOff + 1.0 / plotTree.totalD 272 | 273 | 274 | """ 275 | 函数说明:创建绘制面板 276 | 277 | Parameters: 278 | inTree - 决策树(字典) 279 | Returns: 280 | 无 281 | """ 282 | 283 | 284 | def createPlot(inTree): 285 | fig = plt.figure(1, facecolor='white') # 创建fig 286 | fig.clf() # 清空fig 287 | axprops = dict(xticks=[], yticks=[]) 288 | createPlot.ax1 = plt.subplot(111, frameon=False, **axprops) # 去掉x、y轴 289 | plotTree.totalW = float(getNumLeafs(inTree)) # 获取决策树叶结点数目 290 | plotTree.totalD = float(getTreeDepth(inTree)) # 获取决策树层数 291 | plotTree.xOff = -0.5 / plotTree.totalW; 292 | plotTree.yOff = 1.0; # x偏移 293 | plotTree(inTree, (0.5, 1.0), '') # 绘制决策树 294 | plt.show() # 显示绘制结果 295 | 296 | 297 | if __name__ == '__main__': 298 | dataSet, labels = createDataSet() 299 | featLabels = [] 300 | myTree = createTree(dataSet, labels, featLabels) 301 | print(myTree) 302 | createPlot(myTree) 303 | 304 | -------------------------------------------------------------------------------- /4.朴素贝叶斯/README.md: -------------------------------------------------------------------------------- 1 | 一, 鸢尾花实验 2 | 1, 变量说明 3 | dataset 数据集 4 | rate 训练集占比 5 | train 训练集 6 | test 测试集 7 | 2, 函数作用 8 | randsplit(dataset,rate) 切分训练集和测试集 9 | gauss_clsssify(train,test) 构建贝叶斯分类器 10 | 11 | 二, 侮辱类文本分类 12 | 1,变量说明 13 | dataset 数据集 14 | classVec 标签列 15 | vocablist 词汇表 16 | inputset 切分好的一个文档 17 | returnVec 训练集向量 18 | trainMat 训练集向量矩阵 19 | p1v 侮辱类词条的条件概率数组 20 | p0v 非侮辱类词条的条件概率数组 21 | pA0 侮辱性文档占总文档的概率 22 | 2,函数作用 23 | creatVocabList(dataset) 构建词汇表 24 | setOfWords2Vec(vocablist,inputset) 获得训练集向量 25 | get_trainMat(dataset) 获得训练集向量矩阵 26 | trainNB(trainMat,classVec) 朴素贝叶斯分类器训练函数 27 | classifyNB(vec2classify,p1v,p0v,pA0) 测试朴素贝叶斯分类器 28 | testingNB(testVec) 朴素贝叶斯测试函数 29 | trainNB2(trainMat,classVec) 朴素贝叶斯分类器训练函数改进版 30 | classifyNB2(vec2classify,p1v,p0v,pA0) 测试朴素贝叶斯分类器改进版 31 | testingNB2(testVec) 朴素贝叶斯测试函数改进版 32 | -------------------------------------------------------------------------------- /4.朴素贝叶斯/sabayes.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Thu Oct 17 17:57:34 2019 4 | 5 | @author: 韩琳琳 
6 | """ 7 | 8 | ###################鸢尾花实验 9 | 10 | import numpy as np 11 | import pandas as pd 12 | import random 13 | import os 14 | os.chdir('C:\\Users\\SA\\Documents\\machine learning\\code\\Ch04') 15 | 16 | dataset=pd.read_csv('iris.txt',header=None) 17 | 18 | 19 | #切分训练级和测试集 20 | def randsplit(dataset,rate): 21 | l=list(dataset.index) 22 | random.shuffle(l) 23 | dataset.index=l 24 | n=dataset.shape[0] 25 | m=int(n*0.8) 26 | train=dataset.loc[range(m),:] #注意不能写成.iloc or [:m,:] 27 | test=dataset.loc[range(m,n),:] 28 | test.index=range(n-m) 29 | dataset.index=range(n) 30 | return train,test 31 | train,test=randsplit(dataset,0.8) 32 | 33 | #构建贝叶斯分类器 34 | def gauss_clsssify(train,test): 35 | labels=train.iloc[:,-1].value_counts().index 36 | mean=[] 37 | std=[] 38 | for la in labels: 39 | item=train.loc[train.iloc[:,-1]==la] # loc 列名不能为-1 40 | m=item.iloc[:,:-1].mean() 41 | s=np.sum((item.iloc[:,:-1]-m)**2)/item.shape[0] 42 | mean.append(m) 43 | std.append(s) 44 | mean=pd.DataFrame(mean) #mean=pd.DataFrame(mean,index=labels) 45 | #这样后面的 pla=p.index[np.argmax(p.values)] 46 | std=pd.DataFrame(std) 47 | result=[] 48 | 49 | for i in range(test.shape[0]): 50 | iest=test.iloc[i,:-1].tolist() #当前测试实例 51 | pr=np.exp(-(iest-mean)**2/(2*std))/np.sqrt(2*np.pi*std) #得到正态分布概率矩阵 52 | p=1 53 | for j in range(test.shape[1]-1): 54 | p*=pr[j] 55 | pla=labels[p.index[np.argmax(p.values)]] 56 | result.append(pla) 57 | test['pre']=result 58 | accuracy=(test.iloc[:,-2]==test.iloc[:,-1]).mean() 59 | print(f'模型的预测准确率为{accuracy}') 60 | 61 | 62 | for i in range(20): 63 | train ,test=randsplit(dataset,0.8) 64 | gauss_clsssify(train,test) 65 | 66 | 67 | 68 | import pandas as pd 69 | from sklearn.naive_bayes import GaussianNB 70 | from sklearn.model_selection import train_test_split 71 | from sklearn.metrics import accuracy_score 72 | 73 | from sklearn import datasets 74 | iris=datasets.load_iris() 75 | 76 | #切分数据集 77 | Xtrain,Xtest,ytrain,ytest=train_test_split(iris.data,iris.target,random_state=42)#随机数种子决定不同切分规则 78 | #建模 79 | clf=GaussianNB() 80 | clf.fit(Xtrain,ytrain) 81 | #在测试集上执行预测,proba导出的是每个样本属于某一类的概率 82 | clf.predict(Xtest) 83 | clf.predict_proba(Xtest) 84 | #测试准确率 85 | accuracy_score(ytest,clf.predict(Xtest)) 86 | 87 | #连续性用高斯贝叶斯,0-1用伯努利贝叶斯,分词用多项式朴素贝叶斯 88 | 89 | 90 | 91 | 92 | #################################朴素贝叶斯之言论过滤 93 | 94 | import numpy as np 95 | 96 | def loadDataSet(): 97 | dataset=[['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'], 98 | ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'], 99 | ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'], 100 | ['stop', 'posting', 'stupid', 'worthless', 'garbage'], 101 | ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'], 102 | ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']] 103 | classVec = [0,1,0,1,0,1] #1 侮辱性词表, 0 非侮辱性词表 104 | return dataset,classVec 105 | 106 | dataset,classVec = loadDataSet() 107 | 108 | #构建词汇表 109 | 110 | def creatVocabList(dataset): 111 | vocablist=set() #只有set和set才能取并集 112 | for doc in dataset: 113 | vocablist=vocablist|set(doc) #并集 114 | vocablist=list(vocablist) 115 | #vocablist=set(vocablist) #并集的结果已经去过重了 116 | return vocablist 117 | 118 | vocablist = creatVocabList(dataset) 119 | 120 | #获得训练集向量 121 | 122 | def setOfWords2Vec(vocablist,inputset): #输入词表和切分好的一个词条 123 | returnVec = [0]*len(vocablist) #与词表等长的零向量 124 | for word in inputset: 125 | if word in vocablist: 126 | returnVec[vocablist.index(word)] = returnVec[vocablist.index(word)]+1 127 | else: print ("the word: %s is not in 
my Vocabulary!" % word) 128 | return returnVec 129 | 130 | def get_trainMat(dataset): 131 | vocablist=creatVocabList(dataset) 132 | result=[] 133 | for inputset in dataset: 134 | vec=setOfWords2Vec(vocablist,inputset) 135 | result.append(vec) 136 | return result 137 | 138 | trainMat = get_trainMat(dataset) 139 | 140 | #朴素贝叶斯分类器训练函数 141 | 142 | def trainNB(trainMat,classVec): 143 | n = len(trainMat) #总文档数目 144 | m = len(trainMat[0]) #所有文档中非重复词条数 145 | pA0 = sum(classVec)/n #侮辱性文档占总文档的概率 146 | p0num = np.zeros(m) # 初始化 147 | p1num = np.zeros(m) 148 | p1demo = 0 149 | p0demo = 0 150 | for i in range(n): 151 | if classVec[i]==1: 152 | p1num += trainMat[i] # 侮辱性文档中词条的分布 153 | p1demo += sum(trainMat[i]) #侮辱性文档中词条总数 154 | else: 155 | p0num += trainMat[i] 156 | p0demo += sum(trainMat[i]) 157 | p1v = p1num/p1demo #全部侮辱类词条的条件概率数组 158 | p0v = p0num/p0demo 159 | return p1v,p0v,pA0 160 | 161 | p1v,p0v,pA0 = trainNB(trainMat,classVec) 162 | 163 | #测试朴素贝叶斯分类器 164 | 165 | from functools import reduce 166 | 167 | def classifyNB(vec2classify,p1v,p0v,pA0): # vec2classify 待分类的词条分布数组 168 | p1 = reduce(lambda x,y:x*y,vec2classify*p1v)*pA0 #reduce作用,对应数字相乘(已知词组属于侮辱类的条件概率*pA0 ) 169 | p0 = reduce(lambda x,y:x*y,vec2classify*p0v)*(1-pA0) 170 | print('p1:',p1) 171 | print('p0:',p0) 172 | if p1>p0: 173 | return 1 174 | else:return 0 175 | 176 | #朴素贝叶斯测试函数 177 | 178 | def testingNB(testVec): 179 | dataset,classVec = loadDataSet() 180 | vocablist = creatVocabList(dataset) 181 | trainMat = get_trainMat(dataset) 182 | p1v,p0v,pA0 = trainNB(trainMat,classVec) 183 | thisone = setOfWords2Vec(vocablist,testVec) 184 | if classifyNB(thisone,p1v,p0v,pA0) == 0: 185 | print(testVec,'属于非侮辱类') 186 | else: 187 | print(testVec,'属于侮辱类') 188 | 189 | testVec1 = ['love','my','dalmation'] 190 | testingNB(testVec1) 191 | testVec2 = ['garbage','dog'] 192 | testingNB(testVec2) 193 | 194 | ###################朴素贝叶斯改进之拉普拉斯平滑 195 | #问题1 : P(W0|1)P(W1|1)P(W2|1) 其中任何一个为0,乘积也为0 196 | #解决 :拉普拉斯平滑:将所有词的初始频数设为1,分母设为2 197 | #问题2 : P(W0|1)P(W1|1)P(W2|1) 每个都太小,数据下溢出 198 | #解决 : 对乘积结果取对数 199 | 200 | #朴素贝叶斯分类器训练函数 改进版 201 | 202 | def trainNB2(trainMat,classVec): 203 | n = len(trainMat) #总文档数目 204 | m = len(trainMat[0]) #所有文档中非重复词条数 205 | pA0 = sum(classVec)/n #侮辱性文档占总文档的概率 206 | p0num = np.ones(m) # 初始化 1 207 | p1num = np.ones(m) 208 | p1demo = 2 #分母设为2 209 | p0demo = 2 210 | for i in range(n): 211 | if classVec[i]==1: 212 | p1num += trainMat[i] # 侮辱性文档中词条的分布 213 | p1demo += sum(trainMat[i]) #侮辱性文档中词条总数 214 | else: 215 | p0num += trainMat[i] 216 | p0demo += sum(trainMat[i]) 217 | p1v = np.log(p1num/p1demo) #侮辱类的条件概率数组取对数 218 | p0v = np.log(p0num/p0demo) 219 | return p1v,p0v,pA0 220 | 221 | p1v,p0v,pA0 = trainNB2(trainMat,classVec) 222 | 223 | #测试朴素贝叶斯分类器 224 | 225 | from functools import reduce 226 | 227 | def classifyNB2(vec2classify,p1v,p0v,pA0): # vec2classify 待分类的词条分布数组 228 | p1 = sum(vec2classify*p1v)+np.log(pA0) # 原本的连乘取对数变成连加 229 | p0 = sum(vec2classify*p0v)+np.log(1-pA0) 230 | print('p1:',p1) 231 | print('p0:',p0) 232 | if p1>p0: 233 | return 1 234 | else:return 0 235 | 236 | #朴素贝叶斯测试函数 237 | 238 | def testingNB2(testVec): 239 | dataset,classVec = loadDataSet() 240 | vocablist = creatVocabList(dataset) 241 | trainMat = get_trainMat(dataset) 242 | p1v,p0v,pA0 = trainNB2(trainMat,classVec) 243 | thisone = setOfWords2Vec(vocablist,testVec) 244 | if classifyNB2(thisone,p1v,p0v,pA0) == 0: 245 | print(testVec,'属于非侮辱类') 246 | else: 247 | print(testVec,'属于侮辱类') 248 | 249 | 250 | 
-------------------------------------------------------------------------------- /5.逻辑回归/LogRegres-gj.py: -------------------------------------------------------------------------------- 1 | # -*- coding:UTF-8 -*- 2 | from matplotlib.font_manager import FontProperties 3 | import matplotlib.pyplot as plt 4 | import numpy as np 5 | import random 6 | 7 | 8 | """ 9 | 函数说明:加载数据 10 | 11 | Parameters: 12 | 无 13 | Returns: 14 | dataMat - 数据列表 15 | labelMat - 标签列表 16 | Author: 17 | Jack Cui 18 | Blog: 19 | http://blog.csdn.net/c406495762 20 | Zhihu: 21 | https://www.zhihu.com/people/Jack--Cui/ 22 | Modify: 23 | 2017-08-28 24 | """ 25 | def loadDataSet(): 26 | dataMat = [] #创建数据列表 27 | labelMat = [] #创建标签列表 28 | fr = open('testSet.txt') #打开文件 29 | for line in fr.readlines(): #逐行读取 30 | lineArr = line.strip().split() #去回车,放入列表 31 | dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])]) #添加数据 32 | labelMat.append(int(lineArr[2])) #添加标签 33 | fr.close() #关闭文件 34 | return dataMat, labelMat #返回 35 | 36 | """ 37 | 函数说明:sigmoid函数 38 | 39 | Parameters: 40 | inX - 数据 41 | Returns: 42 | sigmoid函数 43 | Author: 44 | Jack Cui 45 | Blog: 46 | http://blog.csdn.net/c406495762 47 | Zhihu: 48 | https://www.zhihu.com/people/Jack--Cui/ 49 | Modify: 50 | 2017-08-28 51 | """ 52 | def sigmoid(inX): 53 | return 1.0 / (1 + np.exp(-inX)) 54 | 55 | """ 56 | 函数说明:梯度上升算法 57 | 58 | Parameters: 59 | dataMatIn - 数据集 60 | classLabels - 数据标签 61 | Returns: 62 | weights.getA() - 求得的权重数组(最优参数) 63 | weights_array - 每次更新的回归系数 64 | Author: 65 | Jack Cui 66 | Blog: 67 | http://blog.csdn.net/c406495762 68 | Zhihu: 69 | https://www.zhihu.com/people/Jack--Cui/ 70 | Modify: 71 | 2017-08-28 72 | """ 73 | def gradAscent(dataMatIn, classLabels): 74 | dataMatrix = np.mat(dataMatIn) #转换成numpy的mat 75 | labelMat = np.mat(classLabels).transpose() #转换成numpy的mat,并进行转置 76 | m, n = np.shape(dataMatrix) #返回dataMatrix的大小。m为行数,n为列数。 77 | alpha = 0.01 #移动步长,也就是学习速率,控制更新的幅度。 78 | maxCycles = 500 #最大迭代次数 79 | weights = np.ones((n,1)) 80 | weights_array = np.array([]) 81 | for k in range(maxCycles): 82 | h = sigmoid(dataMatrix * weights) #梯度上升矢量化公式 83 | error = labelMat - h 84 | weights = weights + alpha * dataMatrix.transpose() * error 85 | weights_array = np.append(weights_array,weights) 86 | weights_array = weights_array.reshape(maxCycles,n) 87 | return weights.getA(),weights_array #将矩阵转换为数组,并返回 88 | 89 | """ 90 | 函数说明:改进的随机梯度上升算法 91 | 92 | Parameters: 93 | dataMatrix - 数据数组 94 | classLabels - 数据标签 95 | numIter - 迭代次数 96 | Returns: 97 | weights - 求得的回归系数数组(最优参数) 98 | weights_array - 每次更新的回归系数 99 | Author: 100 | Jack Cui 101 | Blog: 102 | http://blog.csdn.net/c406495762 103 | Zhihu: 104 | https://www.zhihu.com/people/Jack--Cui/ 105 | Modify: 106 | 2017-08-31 107 | """ 108 | def stocGradAscent1(dataMatrix, classLabels, numIter=150): 109 | m,n = np.shape(dataMatrix) #返回dataMatrix的大小。m为行数,n为列数。 110 | weights = np.ones(n) #参数初始化 111 | weights_array = np.array([]) #存储每次更新的回归系数 112 | for j in range(numIter): 113 | dataIndex = list(range(m)) 114 | for i in range(m): 115 | alpha = 4/(1.0+j+i)+0.01 #降低alpha的大小,每次减小1/(j+i)。 116 | randIndex = int(random.uniform(0,len(dataIndex))) #随机选取样本 117 | h = sigmoid(sum(dataMatrix[randIndex]*weights)) #选择随机选取的一个样本,计算h 118 | error = classLabels[randIndex] - h #计算误差 119 | weights = weights + alpha * error * dataMatrix[randIndex] #更新回归系数 120 | weights_array = np.append(weights_array,weights,axis=0) #添加回归系数到数组中 121 | del(dataIndex[randIndex]) #删除已经使用的样本 122 | weights_array = weights_array.reshape(numIter*m,n) #改变维度 123 | return 
weights,weights_array #返回 124 | 125 | """ 126 | 函数说明:绘制数据集 127 | 128 | Parameters: 129 | weights - 权重参数数组 130 | Returns: 131 | 无 132 | Author: 133 | Jack Cui 134 | Blog: 135 | http://blog.csdn.net/c406495762 136 | Zhihu: 137 | https://www.zhihu.com/people/Jack--Cui/ 138 | Modify: 139 | 2017-08-30 140 | """ 141 | def plotBestFit(weights): 142 | dataMat, labelMat = loadDataSet() #加载数据集 143 | dataArr = np.array(dataMat) #转换成numpy的array数组 144 | n = np.shape(dataMat)[0] #数据个数 145 | xcord1 = []; ycord1 = [] #正样本 146 | xcord2 = []; ycord2 = [] #负样本 147 | for i in range(n): #根据数据集标签进行分类 148 | if int(labelMat[i]) == 1: 149 | xcord1.append(dataArr[i,1]); ycord1.append(dataArr[i,2]) #1为正样本 150 | else: 151 | xcord2.append(dataArr[i,1]); ycord2.append(dataArr[i,2]) #0为负样本 152 | fig = plt.figure() 153 | ax = fig.add_subplot(111) #添加subplot 154 | ax.scatter(xcord1, ycord1, s = 20, c = 'red', marker = 's',alpha=.5)#绘制正样本 155 | ax.scatter(xcord2, ycord2, s = 20, c = 'green',alpha=.5) #绘制负样本 156 | x = np.arange(-3.0, 3.0, 0.1) 157 | y = (-weights[0] - weights[1] * x) / weights[2] 158 | ax.plot(x, y) 159 | plt.title('BestFit') #绘制title 160 | plt.xlabel('X1'); plt.ylabel('X2') #绘制label 161 | plt.show() 162 | 163 | """ 164 | 函数说明:绘制回归系数与迭代次数的关系 165 | 166 | Parameters: 167 | weights_array1 - 回归系数数组1 168 | weights_array2 - 回归系数数组2 169 | Returns: 170 | 无 171 | Author: 172 | Jack Cui 173 | Blog: 174 | http://blog.csdn.net/c406495762 175 | Zhihu: 176 | https://www.zhihu.com/people/Jack--Cui/ 177 | Modify: 178 | 2017-08-30 179 | """ 180 | def plotWeights(weights_array1,weights_array2): 181 | #设置汉字格式 182 | font = FontProperties(fname=r"c:\windows\fonts\simsun.ttc", size=14) 183 | #将fig画布分隔成1行1列,不共享x轴和y轴,fig画布的大小为(13,8) 184 | #当nrow=3,nclos=2时,代表fig画布被分为六个区域,axs[0][0]表示第一行第一列 185 | fig, axs = plt.subplots(nrows=3, ncols=2,sharex=False, sharey=False, figsize=(20,10)) 186 | x1 = np.arange(0, len(weights_array1), 1) 187 | #绘制w0与迭代次数的关系 188 | axs[0][0].plot(x1,weights_array1[:,0]) 189 | axs0_title_text = axs[0][0].set_title(u'改进的随机梯度上升算法:回归系数与迭代次数关系',FontProperties=font) 190 | axs0_ylabel_text = axs[0][0].set_ylabel(u'W0',FontProperties=font) 191 | plt.setp(axs0_title_text, size=20, weight='bold', color='black') 192 | plt.setp(axs0_ylabel_text, size=20, weight='bold', color='black') 193 | #绘制w1与迭代次数的关系 194 | axs[1][0].plot(x1,weights_array1[:,1]) 195 | axs1_ylabel_text = axs[1][0].set_ylabel(u'W1',FontProperties=font) 196 | plt.setp(axs1_ylabel_text, size=20, weight='bold', color='black') 197 | #绘制w2与迭代次数的关系 198 | axs[2][0].plot(x1,weights_array1[:,2]) 199 | axs2_xlabel_text = axs[2][0].set_xlabel(u'迭代次数',FontProperties=font) 200 | axs2_ylabel_text = axs[2][0].set_ylabel(u'W2',FontProperties=font) 201 | plt.setp(axs2_xlabel_text, size=20, weight='bold', color='black') 202 | plt.setp(axs2_ylabel_text, size=20, weight='bold', color='black') 203 | 204 | 205 | x2 = np.arange(0, len(weights_array2), 1) 206 | #绘制w0与迭代次数的关系 207 | axs[0][1].plot(x2,weights_array2[:,0]) 208 | axs0_title_text = axs[0][1].set_title(u'梯度上升算法:回归系数与迭代次数关系',FontProperties=font) 209 | axs0_ylabel_text = axs[0][1].set_ylabel(u'W0',FontProperties=font) 210 | plt.setp(axs0_title_text, size=20, weight='bold', color='black') 211 | plt.setp(axs0_ylabel_text, size=20, weight='bold', color='black') 212 | #绘制w1与迭代次数的关系 213 | axs[1][1].plot(x2,weights_array2[:,1]) 214 | axs1_ylabel_text = axs[1][1].set_ylabel(u'W1',FontProperties=font) 215 | plt.setp(axs1_ylabel_text, size=20, weight='bold', color='black') 216 | #绘制w2与迭代次数的关系 217 | 
axs[2][1].plot(x2,weights_array2[:,2]) 218 | axs2_xlabel_text = axs[2][1].set_xlabel(u'迭代次数',FontProperties=font) 219 | axs2_ylabel_text = axs[2][1].set_ylabel(u'W2',FontProperties=font) 220 | plt.setp(axs2_xlabel_text, size=20, weight='bold', color='black') 221 | plt.setp(axs2_ylabel_text, size=20, weight='bold', color='black') 222 | 223 | plt.show() 224 | 225 | if __name__ == '__main__': 226 | dataMat, labelMat = loadDataSet() 227 | weights1,weights_array1 = stocGradAscent1(np.array(dataMat), labelMat) 228 | plotBestFit(weights1) 229 | weights2,weights_array2 = gradAscent(dataMat, labelMat) 230 | plotWeights(weights_array1, weights_array2) -------------------------------------------------------------------------------- /5.逻辑回归/LogRegres.py: -------------------------------------------------------------------------------- 1 | # -*- coding:UTF-8 -*- 2 | import matplotlib.pyplot as plt 3 | import numpy as np 4 | """ 5 | 函数说明:梯度上升算法测试函数 6 | 7 | 求函数f(x) = -x^2 + 4x的极大值 8 | 9 | Parameters: 10 | 无 11 | Returns: 12 | 无 13 | Author: 14 | Jack Cui 15 | Blog: 16 | http://blog.csdn.net/c406495762 17 | Zhihu: 18 | https://www.zhihu.com/people/Jack--Cui/ 19 | Modify: 20 | 2017-08-28 21 | """ 22 | def Gradient_Ascent_test(): 23 | def f_prime(x_old): #f(x)的导数 24 | return -2 * x_old + 4 25 | x_old = -1 #初始值,给一个小于x_new的值 26 | x_new = 0 #梯度上升算法初始值,即从(0,0)开始 27 | alpha = 0.01 #步长,也就是学习速率,控制更新的幅度 28 | presision = 0.00000001 #精度,也就是更新阈值 29 | while abs(x_new - x_old) > presision: 30 | x_old = x_new 31 | x_new = x_old + alpha * f_prime(x_old) #上面提到的公式 32 | print(x_new) #打印最终求解的极值近似值 33 | 34 | """ 35 | 函数说明:加载数据 36 | 37 | Parameters: 38 | 无 39 | Returns: 40 | dataMat - 数据列表 41 | labelMat - 标签列表 42 | Author: 43 | Jack Cui 44 | Blog: 45 | http://blog.csdn.net/c406495762 46 | Zhihu: 47 | https://www.zhihu.com/people/Jack--Cui/ 48 | Modify: 49 | 2017-08-28 50 | """ 51 | def loadDataSet(): 52 | dataMat = [] #创建数据列表 53 | labelMat = [] #创建标签列表 54 | fr = open('testSet.txt') #打开文件 55 | for line in fr.readlines(): #逐行读取 56 | lineArr = line.strip().split() #去回车,放入列表 57 | dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])]) #添加数据 58 | labelMat.append(int(lineArr[2])) #添加标签 59 | fr.close() #关闭文件 60 | return dataMat, labelMat #返回 61 | 62 | """ 63 | 函数说明:sigmoid函数 64 | 65 | Parameters: 66 | inX - 数据 67 | Returns: 68 | sigmoid函数 69 | Author: 70 | Jack Cui 71 | Blog: 72 | http://blog.csdn.net/c406495762 73 | Zhihu: 74 | https://www.zhihu.com/people/Jack--Cui/ 75 | Modify: 76 | 2017-08-28 77 | """ 78 | def sigmoid(inX): 79 | return 1.0 / (1 + np.exp(-inX)) 80 | 81 | """ 82 | 函数说明:梯度上升算法 83 | 84 | Parameters: 85 | dataMatIn - 数据集 86 | classLabels - 数据标签 87 | Returns: 88 | weights.getA() - 求得的权重数组(最优参数) 89 | Author: 90 | Jack Cui 91 | Blog: 92 | http://blog.csdn.net/c406495762 93 | Zhihu: 94 | https://www.zhihu.com/people/Jack--Cui/ 95 | Modify: 96 | 2017-08-28 97 | """ 98 | def gradAscent(dataMatIn, classLabels): 99 | dataMatrix = np.mat(dataMatIn) #转换成numpy的mat 100 | labelMat = np.mat(classLabels).transpose() #转换成numpy的mat,并进行转置 101 | m, n = np.shape(dataMatrix) #返回dataMatrix的大小。m为行数,n为列数。 102 | alpha = 0.001 #移动步长,也就是学习速率,控制更新的幅度。 103 | maxCycles = 500 #最大迭代次数 104 | weights = np.ones((n,1)) 105 | for k in range(maxCycles): 106 | h = sigmoid(dataMatrix * weights) #梯度上升矢量化公式 107 | error = labelMat - h 108 | weights = weights + alpha * dataMatrix.transpose() * error 109 | return weights.getA() #将矩阵转换为数组,返回权重数组 110 | 111 | """ 112 | 函数说明:绘制数据集 113 | 114 | Parameters: 115 | 无 116 | Returns: 117 | 无 118 | Author: 119 | Jack Cui 
    Blog:
        http://blog.csdn.net/c406495762
    Zhihu:
        https://www.zhihu.com/people/Jack--Cui/
    Modify:
        2017-08-30
"""
def plotDataSet():
    dataMat, labelMat = loadDataSet()                             #加载数据集
    dataArr = np.array(dataMat)                                   #转换成numpy的array数组
    n = np.shape(dataMat)[0]                                      #数据个数
    xcord1 = []; ycord1 = []                                      #正样本
    xcord2 = []; ycord2 = []                                      #负样本
    for i in range(n):                                            #根据数据集标签进行分类
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i,1]); ycord1.append(dataArr[i,2])    #1为正样本
        else:
            xcord2.append(dataArr[i,1]); ycord2.append(dataArr[i,2])    #0为负样本
    fig = plt.figure()
    ax = fig.add_subplot(111)                                     #添加subplot
    ax.scatter(xcord1, ycord1, s = 20, c = 'red', marker = 's',alpha=.5)    #绘制正样本
    ax.scatter(xcord2, ycord2, s = 20, c = 'green',alpha=.5)      #绘制负样本
    plt.title('DataSet')                                          #绘制title
    plt.xlabel('X1'); plt.ylabel('X2')                            #绘制label
    plt.show()                                                    #显示

"""
函数说明:绘制数据集并画出最佳拟合直线

Parameters:
    weights - 权重参数数组
Returns:
    无
Author:
    Jack Cui
Blog:
    http://blog.csdn.net/c406495762
Zhihu:
    https://www.zhihu.com/people/Jack--Cui/
Modify:
    2017-08-30
"""
def plotBestFit(weights):
    dataMat, labelMat = loadDataSet()                             #加载数据集
    dataArr = np.array(dataMat)                                   #转换成numpy的array数组
    n = np.shape(dataMat)[0]                                      #数据个数
    xcord1 = []; ycord1 = []                                      #正样本
    xcord2 = []; ycord2 = []                                      #负样本
    for i in range(n):                                            #根据数据集标签进行分类
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i,1]); ycord1.append(dataArr[i,2])    #1为正样本
        else:
            xcord2.append(dataArr[i,1]); ycord2.append(dataArr[i,2])    #0为负样本
    fig = plt.figure()
    ax = fig.add_subplot(111)                                     #添加subplot
    ax.scatter(xcord1, ycord1, s = 20, c = 'red', marker = 's',alpha=.5)    #绘制正样本
    ax.scatter(xcord2, ycord2, s = 20, c = 'green',alpha=.5)      #绘制负样本
    x = np.arange(-3.0, 3.0, 0.1)
    y = (-weights[0] - weights[1] * x) / weights[2]               #最佳拟合直线(决策边界)
    ax.plot(x, y)
    plt.title('BestFit')                                          #绘制title
    plt.xlabel('X1'); plt.ylabel('X2')                            #绘制label
    plt.show()

if __name__ == '__main__':
    dataMat, labelMat = loadDataSet()
    weights = gradAscent(dataMat, labelMat)
    plotBestFit(weights)
--------------------------------------------------------------------------------
/5.逻辑回归/README.md:
--------------------------------------------------------------------------------
### Question 1 :步长的几何意义
步长(学习率)决定每次迭代沿梯度方向移动的幅度,在迭代轨迹图像中对应相邻两次迭代点之间每一段曲线的长度。

---

### Question 2 :为什么用sigmoid函数
- sigmoid函数是一个阈值函数:不管x取什么值,对应的函数值总满足 0 < f(x) < 1,因此可以直接当作概率来解释;
- sigmoid函数光滑、处处可导,并且正是对数几率(logit)函数的反函数,推导如下。

对数几率函数:

y(p) = ln( p / (1-p) )

sigmoid函数:

f(x) = 1 / (1 + e^(-x))

令 y(p)=x, p=f(x),则

ln( p / (1-p) ) = x  =>  p / (1-p) = e^x  =>  p = e^x / (1 + e^x) = 1 / (1 + e^(-x)) = f(x)

证毕。

---
如果有哪里写的不对,欢迎大家指正。——韩琳琳
--------------------------------------------------------------------------------
/5.逻辑回归/WeChat Image_20191104085648.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TUFE-I307/Seminar-MachineLearning/c637c55a9e411451709908f512cab44cc79665c3/5.逻辑回归/WeChat Image_20191104085648.jpg
--------------------------------------------------------------------------------
/5.逻辑回归/colicLogRegres.py:
--------------------------------------------------------------------------------
# -*- coding:UTF-8 -*-
from sklearn.linear_model import LogisticRegression

"""
函数说明:使用Sklearn构建Logistic回归分类器

Parameters:
    无
Returns:
    无
Author:
    Jack Cui
Blog:
    http://blog.csdn.net/c406495762
Zhihu: 16 | https://www.zhihu.com/people/Jack--Cui/ 17 | Modify: 18 | 2017-09-05 19 | """ 20 | def colicSklearn(): 21 | frTrain = open('horseColicTraining.txt') #打开训练集 22 | frTest = open('horseColicTest.txt') #打开测试集 23 | trainingSet = []; trainingLabels = [] 24 | testSet = []; testLabels = [] 25 | for line in frTrain.readlines(): 26 | currLine = line.strip().split('\t') 27 | lineArr = [] 28 | for i in range(len(currLine)-1): 29 | lineArr.append(float(currLine[i])) 30 | trainingSet.append(lineArr) 31 | trainingLabels.append(float(currLine[-1])) 32 | for line in frTest.readlines(): 33 | currLine = line.strip().split('\t') 34 | lineArr =[] 35 | for i in range(len(currLine)-1): 36 | lineArr.append(float(currLine[i])) 37 | testSet.append(lineArr) 38 | testLabels.append(float(currLine[-1])) 39 | classifier = LogisticRegression(solver = 'sag',max_iter = 5000).fit(trainingSet, trainingLabels) 40 | test_accurcy = classifier.score(testSet, testLabels) * 100 41 | print('正确率:%f%%' % test_accurcy) 42 | 43 | if __name__ == '__main__': 44 | colicSklearn() 45 | -------------------------------------------------------------------------------- /5.逻辑回归/horseColicTest.txt: -------------------------------------------------------------------------------- 1 | 2 1 38.50 54 20 0 1 2 2 3 4 1 2 2 5.90 0 2 42.00 6.30 0 0 1 2 | 2 1 37.60 48 36 0 0 1 1 0 3 0 0 0 0 0 0 44.00 6.30 1 5.00 1 3 | 1 1 37.7 44 28 0 4 3 2 5 4 4 1 1 0 3 5 45 70 3 2 1 4 | 1 1 37 56 24 3 1 4 2 4 4 3 1 1 0 0 0 35 61 3 2 0 5 | 2 1 38.00 42 12 3 0 3 1 1 0 1 0 0 0 0 2 37.00 5.80 0 0 1 6 | 1 1 0 60 40 3 0 1 1 0 4 0 3 2 0 0 5 42 72 0 0 1 7 | 2 1 38.40 80 60 3 2 2 1 3 2 1 2 2 0 1 1 54.00 6.90 0 0 1 8 | 2 1 37.80 48 12 2 1 2 1 3 0 1 2 0 0 2 0 48.00 7.30 1 0 1 9 | 2 1 37.90 45 36 3 3 3 2 2 3 1 2 1 0 3 0 33.00 5.70 3 0 1 10 | 2 1 39.00 84 12 3 1 5 1 2 4 2 1 2 7.00 0 4 62.00 5.90 2 2.20 0 11 | 2 1 38.20 60 24 3 1 3 2 3 3 2 3 3 0 4 4 53.00 7.50 2 1.40 1 12 | 1 1 0 140 0 0 0 4 2 5 4 4 1 1 0 0 5 30 69 0 0 0 13 | 1 1 37.90 120 60 3 3 3 1 5 4 4 2 2 7.50 4 5 52.00 6.60 3 1.80 0 14 | 2 1 38.00 72 36 1 1 3 1 3 0 2 2 1 0 3 5 38.00 6.80 2 2.00 1 15 | 2 9 38.00 92 28 1 1 2 1 1 3 2 3 0 7.20 0 0 37.00 6.10 1 1.10 1 16 | 1 1 38.30 66 30 2 3 1 1 2 4 3 3 2 8.50 4 5 37.00 6.00 0 0 1 17 | 2 1 37.50 48 24 3 1 1 1 2 1 0 1 1 0 3 2 43.00 6.00 1 2.80 1 18 | 1 1 37.50 88 20 2 3 3 1 4 3 3 0 0 0 0 0 35.00 6.40 1 0 0 19 | 2 9 0 150 60 4 4 4 2 5 4 4 0 0 0 0 0 0 0 0 0 0 20 | 1 1 39.7 100 30 0 0 6 2 4 4 3 1 0 0 4 5 65 75 0 0 0 21 | 1 1 38.30 80 0 3 3 4 2 5 4 3 2 1 0 4 4 45.00 7.50 2 4.60 1 22 | 2 1 37.50 40 32 3 1 3 1 3 2 3 2 1 0 0 5 32.00 6.40 1 1.10 1 23 | 1 1 38.40 84 30 3 1 5 2 4 3 3 2 3 6.50 4 4 47.00 7.50 3 0 0 24 | 1 1 38.10 84 44 4 0 4 2 5 3 1 1 3 5.00 0 4 60.00 6.80 0 5.70 0 25 | 2 1 38.70 52 0 1 1 1 1 1 3 1 0 0 0 1 3 4.00 74.00 0 0 1 26 | 2 1 38.10 44 40 2 1 3 1 3 3 1 0 0 0 1 3 35.00 6.80 0 0 1 27 | 2 1 38.4 52 20 2 1 3 1 1 3 2 2 1 0 3 5 41 63 1 1 1 28 | 1 1 38.20 60 0 1 0 3 1 2 1 1 1 1 0 4 4 43.00 6.20 2 3.90 1 29 | 2 1 37.70 40 18 1 1 1 0 3 2 1 1 1 0 3 3 36.00 3.50 0 0 1 30 | 1 1 39.1 60 10 0 1 1 0 2 3 0 0 0 0 4 4 0 0 0 0 1 31 | 2 1 37.80 48 16 1 1 1 1 0 1 1 2 1 0 4 3 43.00 7.50 0 0 1 32 | 1 1 39.00 120 0 4 3 5 2 2 4 3 2 3 8.00 0 0 65.00 8.20 3 4.60 1 33 | 1 1 38.20 76 0 2 3 2 1 5 3 3 1 2 6.00 1 5 35.00 6.50 2 0.90 1 34 | 2 1 38.30 88 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 | 1 1 38.00 80 30 3 3 3 1 0 0 0 0 0 6.00 0 0 48.00 8.30 0 4.30 1 36 | 1 1 0 0 0 3 1 1 1 2 3 3 1 3 6.00 4 4 0 0 2 0 0 37 | 1 1 37.60 40 0 1 1 1 1 1 1 1 0 0 0 1 1 0 0 2 2.10 1 38 | 2 1 37.50 44 0 1 1 1 1 3 3 
2 0 0 0 0 0 45.00 5.80 2 1.40 1 39 | 2 1 38.2 42 16 1 1 3 1 1 3 1 0 0 0 1 0 35 60 1 1 1 40 | 2 1 38 56 44 3 3 3 0 0 1 1 2 1 0 4 0 47 70 2 1 1 41 | 2 1 38.30 45 20 3 3 2 2 2 4 1 2 0 0 4 0 0 0 0 0 1 42 | 1 1 0 48 96 1 1 3 1 0 4 1 2 1 0 1 4 42.00 8.00 1 0 1 43 | 1 1 37.70 55 28 2 1 2 1 2 3 3 0 3 5.00 4 5 0 0 0 0 1 44 | 2 1 36.00 100 20 4 3 6 2 2 4 3 1 1 0 4 5 74.00 5.70 2 2.50 0 45 | 1 1 37.10 60 20 2 0 4 1 3 0 3 0 2 5.00 3 4 64.00 8.50 2 0 1 46 | 2 1 37.10 114 40 3 0 3 2 2 2 1 0 0 0 0 3 32.00 0 3 6.50 1 47 | 1 1 38.1 72 30 3 3 3 1 4 4 3 2 1 0 3 5 37 56 3 1 1 48 | 1 1 37.00 44 12 3 1 1 2 1 1 1 0 0 0 4 2 40.00 6.70 3 8.00 1 49 | 1 1 38.6 48 20 3 1 1 1 4 3 1 0 0 0 3 0 37 75 0 0 1 50 | 1 1 0 82 72 3 1 4 1 2 3 3 0 3 0 4 4 53 65 3 2 0 51 | 1 9 38.20 78 60 4 4 6 0 3 3 3 0 0 0 1 0 59.00 5.80 3 3.10 0 52 | 2 1 37.8 60 16 1 1 3 1 2 3 2 1 2 0 3 0 41 73 0 0 0 53 | 1 1 38.7 34 30 2 0 3 1 2 3 0 0 0 0 0 0 33 69 0 2 0 54 | 1 1 0 36 12 1 1 1 1 1 2 1 1 1 0 1 5 44.00 0 0 0 1 55 | 2 1 38.30 44 60 0 0 1 1 0 0 0 0 0 0 0 0 6.40 36.00 0 0 1 56 | 2 1 37.40 54 18 3 0 1 1 3 4 3 2 2 0 4 5 30.00 7.10 2 0 1 57 | 1 1 0 0 0 4 3 0 2 2 4 1 0 0 0 0 0 54 76 3 2 1 58 | 1 1 36.6 48 16 3 1 3 1 4 1 1 1 1 0 0 0 27 56 0 0 0 59 | 1 1 38.5 90 0 1 1 3 1 3 3 3 2 3 2 4 5 47 79 0 0 1 60 | 1 1 0 75 12 1 1 4 1 5 3 3 0 3 5.80 0 0 58.00 8.50 1 0 1 61 | 2 1 38.20 42 0 3 1 1 1 1 1 2 2 1 0 3 2 35.00 5.90 2 0 1 62 | 1 9 38.20 78 60 4 4 6 0 3 3 3 0 0 0 1 0 59.00 5.80 3 3.10 0 63 | 2 1 38.60 60 30 1 1 3 1 4 2 2 1 1 0 0 0 40.00 6.00 1 0 1 64 | 2 1 37.80 42 40 1 1 1 1 1 3 1 0 0 0 3 3 36.00 6.20 0 0 1 65 | 1 1 38 60 12 1 1 2 1 2 1 1 1 1 0 1 4 44 65 3 2 0 66 | 2 1 38.00 42 12 3 0 3 1 1 1 1 0 0 0 0 1 37.00 5.80 0 0 1 67 | 2 1 37.60 88 36 3 1 1 1 3 3 2 1 3 1.50 0 0 44.00 6.00 0 0 0 -------------------------------------------------------------------------------- /5.逻辑回归/horseColicTraining.txt: -------------------------------------------------------------------------------- 1 | 2.000000 1.000000 38.500000 66.000000 28.000000 3.000000 3.000000 0.000000 2.000000 5.000000 4.000000 4.000000 0.000000 0.000000 0.000000 3.000000 5.000000 45.000000 8.400000 0.000000 0.000000 0.000000 2 | 1.000000 1.000000 39.200000 88.000000 20.000000 0.000000 0.000000 4.000000 1.000000 3.000000 4.000000 2.000000 0.000000 0.000000 0.000000 4.000000 2.000000 50.000000 85.000000 2.000000 2.000000 0.000000 3 | 2.000000 1.000000 38.300000 40.000000 24.000000 1.000000 1.000000 3.000000 1.000000 3.000000 3.000000 1.000000 0.000000 0.000000 0.000000 1.000000 1.000000 33.000000 6.700000 0.000000 0.000000 1.000000 4 | 1.000000 9.000000 39.100000 164.000000 84.000000 4.000000 1.000000 6.000000 2.000000 2.000000 4.000000 4.000000 1.000000 2.000000 5.000000 3.000000 0.000000 48.000000 7.200000 3.000000 5.300000 0.000000 5 | 2.000000 1.000000 37.300000 104.000000 35.000000 0.000000 0.000000 6.000000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 74.000000 7.400000 0.000000 0.000000 0.000000 6 | 2.000000 1.000000 0.000000 0.000000 0.000000 2.000000 1.000000 3.000000 1.000000 2.000000 3.000000 2.000000 2.000000 1.000000 0.000000 3.000000 3.000000 0.000000 0.000000 0.000000 0.000000 1.000000 7 | 1.000000 1.000000 37.900000 48.000000 16.000000 1.000000 1.000000 1.000000 1.000000 3.000000 3.000000 3.000000 1.000000 1.000000 0.000000 3.000000 5.000000 37.000000 7.000000 0.000000 0.000000 1.000000 8 | 1.000000 1.000000 0.000000 60.000000 0.000000 3.000000 0.000000 0.000000 1.000000 0.000000 4.000000 2.000000 2.000000 1.000000 0.000000 3.000000 4.000000 44.000000 
8.300000 0.000000 0.000000 0.000000 9 | 2.000000 1.000000 0.000000 80.000000 36.000000 3.000000 4.000000 3.000000 1.000000 4.000000 4.000000 4.000000 2.000000 1.000000 0.000000 3.000000 5.000000 38.000000 6.200000 0.000000 0.000000 0.000000 10 | 2.000000 9.000000 38.300000 90.000000 0.000000 1.000000 0.000000 1.000000 1.000000 5.000000 3.000000 1.000000 2.000000 1.000000 0.000000 3.000000 0.000000 40.000000 6.200000 1.000000 2.200000 1.000000 11 | 1.000000 1.000000 38.100000 66.000000 12.000000 3.000000 3.000000 5.000000 1.000000 3.000000 3.000000 1.000000 2.000000 1.000000 3.000000 2.000000 5.000000 44.000000 6.000000 2.000000 3.600000 1.000000 12 | 2.000000 1.000000 39.100000 72.000000 52.000000 2.000000 0.000000 2.000000 1.000000 2.000000 1.000000 2.000000 1.000000 1.000000 0.000000 4.000000 4.000000 50.000000 7.800000 0.000000 0.000000 1.000000 13 | 1.000000 1.000000 37.200000 42.000000 12.000000 2.000000 1.000000 1.000000 1.000000 3.000000 3.000000 3.000000 3.000000 1.000000 0.000000 4.000000 5.000000 0.000000 7.000000 0.000000 0.000000 1.000000 14 | 2.000000 9.000000 38.000000 92.000000 28.000000 1.000000 1.000000 2.000000 1.000000 1.000000 3.000000 2.000000 3.000000 0.000000 7.200000 1.000000 1.000000 37.000000 6.100000 1.000000 0.000000 0.000000 15 | 1.000000 1.000000 38.200000 76.000000 28.000000 3.000000 1.000000 1.000000 1.000000 3.000000 4.000000 1.000000 2.000000 2.000000 0.000000 4.000000 4.000000 46.000000 81.000000 1.000000 2.000000 1.000000 16 | 1.000000 1.000000 37.600000 96.000000 48.000000 3.000000 1.000000 4.000000 1.000000 5.000000 3.000000 3.000000 2.000000 3.000000 4.500000 4.000000 0.000000 45.000000 6.800000 0.000000 0.000000 0.000000 17 | 1.000000 9.000000 0.000000 128.000000 36.000000 3.000000 3.000000 4.000000 2.000000 4.000000 4.000000 3.000000 3.000000 0.000000 0.000000 4.000000 5.000000 53.000000 7.800000 3.000000 4.700000 0.000000 18 | 2.000000 1.000000 37.500000 48.000000 24.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 19 | 1.000000 1.000000 37.600000 64.000000 21.000000 1.000000 1.000000 2.000000 1.000000 2.000000 3.000000 1.000000 1.000000 1.000000 0.000000 2.000000 5.000000 40.000000 7.000000 1.000000 0.000000 1.000000 20 | 2.000000 1.000000 39.400000 110.000000 35.000000 4.000000 3.000000 6.000000 0.000000 0.000000 3.000000 3.000000 0.000000 0.000000 0.000000 0.000000 0.000000 55.000000 8.700000 0.000000 0.000000 1.000000 21 | 1.000000 1.000000 39.900000 72.000000 60.000000 1.000000 1.000000 5.000000 2.000000 5.000000 4.000000 4.000000 3.000000 1.000000 0.000000 4.000000 4.000000 46.000000 6.100000 2.000000 0.000000 1.000000 22 | 2.000000 1.000000 38.400000 48.000000 16.000000 1.000000 0.000000 1.000000 1.000000 1.000000 3.000000 1.000000 2.000000 3.000000 5.500000 4.000000 3.000000 49.000000 6.800000 0.000000 0.000000 1.000000 23 | 1.000000 1.000000 38.600000 42.000000 34.000000 2.000000 1.000000 4.000000 0.000000 2.000000 3.000000 1.000000 0.000000 0.000000 0.000000 1.000000 0.000000 48.000000 7.200000 0.000000 0.000000 1.000000 24 | 1.000000 9.000000 38.300000 130.000000 60.000000 0.000000 3.000000 0.000000 1.000000 2.000000 4.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 50.000000 70.000000 0.000000 0.000000 1.000000 25 | 1.000000 1.000000 38.100000 60.000000 12.000000 3.000000 3.000000 3.000000 1.000000 0.000000 4.000000 3.000000 3.000000 2.000000 2.000000 0.000000 0.000000 51.000000 65.000000 0.000000 0.000000 1.000000 
26 | 2.000000 1.000000 37.800000 60.000000 42.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 27 | 1.000000 1.000000 38.300000 72.000000 30.000000 4.000000 3.000000 3.000000 2.000000 3.000000 3.000000 3.000000 2.000000 1.000000 0.000000 3.000000 5.000000 43.000000 7.000000 2.000000 3.900000 1.000000 28 | 1.000000 1.000000 37.800000 48.000000 12.000000 3.000000 1.000000 1.000000 1.000000 0.000000 3.000000 2.000000 1.000000 1.000000 0.000000 1.000000 3.000000 37.000000 5.500000 2.000000 1.300000 1.000000 29 | 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 30 | 2.000000 1.000000 37.700000 48.000000 0.000000 2.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 0.000000 0.000000 0.000000 45.000000 76.000000 0.000000 0.000000 1.000000 31 | 2.000000 1.000000 37.700000 96.000000 30.000000 3.000000 3.000000 4.000000 2.000000 5.000000 4.000000 4.000000 3.000000 2.000000 4.000000 4.000000 5.000000 66.000000 7.500000 0.000000 0.000000 0.000000 32 | 2.000000 1.000000 37.200000 108.000000 12.000000 3.000000 3.000000 4.000000 2.000000 2.000000 4.000000 2.000000 0.000000 3.000000 6.000000 3.000000 3.000000 52.000000 8.200000 3.000000 7.400000 0.000000 33 | 1.000000 1.000000 37.200000 60.000000 0.000000 2.000000 1.000000 1.000000 1.000000 3.000000 3.000000 3.000000 2.000000 1.000000 0.000000 4.000000 5.000000 43.000000 6.600000 0.000000 0.000000 1.000000 34 | 1.000000 1.000000 38.200000 64.000000 28.000000 1.000000 1.000000 1.000000 1.000000 3.000000 1.000000 0.000000 0.000000 0.000000 0.000000 4.000000 4.000000 49.000000 8.600000 2.000000 6.600000 1.000000 35 | 1.000000 1.000000 0.000000 100.000000 30.000000 3.000000 3.000000 4.000000 2.000000 5.000000 4.000000 4.000000 3.000000 3.000000 0.000000 4.000000 4.000000 52.000000 6.600000 0.000000 0.000000 1.000000 36 | 2.000000 1.000000 0.000000 104.000000 24.000000 4.000000 3.000000 3.000000 2.000000 4.000000 4.000000 3.000000 0.000000 3.000000 0.000000 0.000000 2.000000 73.000000 8.400000 0.000000 0.000000 0.000000 37 | 2.000000 1.000000 38.300000 112.000000 16.000000 0.000000 3.000000 5.000000 2.000000 0.000000 0.000000 1.000000 1.000000 2.000000 0.000000 0.000000 5.000000 51.000000 6.000000 2.000000 1.000000 0.000000 38 | 1.000000 1.000000 37.800000 72.000000 0.000000 0.000000 3.000000 0.000000 1.000000 5.000000 3.000000 1.000000 0.000000 1.000000 0.000000 1.000000 1.000000 56.000000 80.000000 1.000000 2.000000 1.000000 39 | 2.000000 1.000000 38.600000 52.000000 0.000000 1.000000 1.000000 1.000000 1.000000 3.000000 3.000000 2.000000 1.000000 1.000000 0.000000 1.000000 3.000000 32.000000 6.600000 1.000000 5.000000 1.000000 40 | 1.000000 9.000000 39.200000 146.000000 96.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 41 | 1.000000 1.000000 0.000000 88.000000 0.000000 3.000000 3.000000 6.000000 2.000000 5.000000 3.000000 3.000000 1.000000 3.000000 0.000000 4.000000 5.000000 63.000000 6.500000 3.000000 0.000000 0.000000 42 | 2.000000 9.000000 39.000000 150.000000 72.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 47.000000 8.500000 0.000000 0.100000 1.000000 43 | 2.000000 1.000000 38.000000 
60.000000 12.000000 3.000000 1.000000 3.000000 1.000000 3.000000 3.000000 1.000000 1.000000 1.000000 0.000000 2.000000 2.000000 47.000000 7.000000 0.000000 0.000000 1.000000 44 | 1.000000 1.000000 0.000000 120.000000 0.000000 3.000000 4.000000 4.000000 1.000000 4.000000 4.000000 4.000000 1.000000 1.000000 0.000000 0.000000 5.000000 52.000000 67.000000 2.000000 2.000000 0.000000 45 | 1.000000 1.000000 35.400000 140.000000 24.000000 3.000000 3.000000 4.000000 2.000000 4.000000 4.000000 0.000000 2.000000 1.000000 0.000000 0.000000 5.000000 57.000000 69.000000 3.000000 2.000000 0.000000 46 | 2.000000 1.000000 0.000000 120.000000 0.000000 4.000000 3.000000 4.000000 2.000000 5.000000 4.000000 4.000000 1.000000 1.000000 0.000000 4.000000 5.000000 60.000000 6.500000 3.000000 0.000000 0.000000 47 | 1.000000 1.000000 37.900000 60.000000 15.000000 3.000000 0.000000 4.000000 2.000000 5.000000 4.000000 4.000000 2.000000 2.000000 0.000000 4.000000 5.000000 65.000000 7.500000 0.000000 0.000000 1.000000 48 | 2.000000 1.000000 37.500000 48.000000 16.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 0.000000 1.000000 0.000000 37.000000 6.500000 0.000000 0.000000 1.000000 49 | 1.000000 1.000000 38.900000 80.000000 44.000000 3.000000 3.000000 3.000000 2.000000 2.000000 3.000000 3.000000 2.000000 2.000000 7.000000 3.000000 1.000000 54.000000 6.500000 3.000000 0.000000 0.000000 50 | 2.000000 1.000000 37.200000 84.000000 48.000000 3.000000 3.000000 5.000000 2.000000 4.000000 1.000000 2.000000 1.000000 2.000000 0.000000 2.000000 1.000000 73.000000 5.500000 2.000000 4.100000 0.000000 51 | 2.000000 1.000000 38.600000 46.000000 0.000000 1.000000 1.000000 2.000000 1.000000 1.000000 3.000000 2.000000 1.000000 1.000000 0.000000 0.000000 2.000000 49.000000 9.100000 1.000000 1.600000 1.000000 52 | 1.000000 1.000000 37.400000 84.000000 36.000000 1.000000 0.000000 3.000000 2.000000 3.000000 3.000000 2.000000 0.000000 0.000000 0.000000 4.000000 5.000000 0.000000 0.000000 3.000000 0.000000 0.000000 53 | 2.000000 1.000000 0.000000 0.000000 0.000000 1.000000 1.000000 3.000000 1.000000 1.000000 3.000000 1.000000 0.000000 0.000000 0.000000 2.000000 2.000000 43.000000 7.700000 0.000000 0.000000 1.000000 54 | 2.000000 1.000000 38.600000 40.000000 20.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 41.000000 6.400000 0.000000 0.000000 1.000000 55 | 2.000000 1.000000 40.300000 114.000000 36.000000 3.000000 3.000000 1.000000 2.000000 2.000000 3.000000 3.000000 2.000000 1.000000 7.000000 1.000000 5.000000 57.000000 8.100000 3.000000 4.500000 0.000000 56 | 1.000000 9.000000 38.600000 160.000000 20.000000 3.000000 0.000000 5.000000 1.000000 3.000000 3.000000 4.000000 3.000000 0.000000 0.000000 4.000000 0.000000 38.000000 0.000000 2.000000 0.000000 0.000000 57 | 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 24.000000 6.700000 0.000000 0.000000 1.000000 58 | 1.000000 1.000000 0.000000 64.000000 36.000000 2.000000 0.000000 2.000000 1.000000 5.000000 3.000000 3.000000 2.000000 2.000000 0.000000 0.000000 0.000000 42.000000 7.700000 0.000000 0.000000 0.000000 59 | 1.000000 1.000000 0.000000 0.000000 20.000000 4.000000 3.000000 3.000000 0.000000 5.000000 4.000000 3.000000 2.000000 0.000000 0.000000 4.000000 4.000000 53.000000 5.900000 3.000000 0.000000 0.000000 60 | 2.000000 1.000000 0.000000 96.000000 0.000000 3.000000 3.000000 
3.000000 2.000000 5.000000 4.000000 4.000000 1.000000 2.000000 0.000000 4.000000 5.000000 60.000000 0.000000 0.000000 0.000000 0.000000 61 | 2.000000 1.000000 37.800000 48.000000 32.000000 1.000000 1.000000 3.000000 1.000000 2.000000 1.000000 0.000000 1.000000 1.000000 0.000000 4.000000 5.000000 37.000000 6.700000 0.000000 0.000000 1.000000 62 | 2.000000 1.000000 38.500000 60.000000 0.000000 2.000000 2.000000 1.000000 1.000000 1.000000 2.000000 2.000000 2.000000 1.000000 0.000000 1.000000 1.000000 44.000000 7.700000 0.000000 0.000000 1.000000 63 | 1.000000 1.000000 37.800000 88.000000 22.000000 2.000000 1.000000 2.000000 1.000000 3.000000 0.000000 0.000000 2.000000 0.000000 0.000000 4.000000 0.000000 64.000000 8.000000 1.000000 6.000000 0.000000 64 | 2.000000 1.000000 38.200000 130.000000 16.000000 4.000000 3.000000 4.000000 2.000000 2.000000 4.000000 4.000000 1.000000 1.000000 0.000000 0.000000 0.000000 65.000000 82.000000 2.000000 2.000000 0.000000 65 | 1.000000 1.000000 39.000000 64.000000 36.000000 3.000000 1.000000 4.000000 2.000000 3.000000 3.000000 2.000000 1.000000 2.000000 7.000000 4.000000 5.000000 44.000000 7.500000 3.000000 5.000000 1.000000 66 | 1.000000 1.000000 0.000000 60.000000 36.000000 3.000000 1.000000 3.000000 1.000000 3.000000 3.000000 2.000000 1.000000 1.000000 0.000000 3.000000 4.000000 26.000000 72.000000 2.000000 1.000000 1.000000 67 | 2.000000 1.000000 37.900000 72.000000 0.000000 1.000000 1.000000 5.000000 2.000000 3.000000 3.000000 1.000000 1.000000 3.000000 2.000000 3.000000 4.000000 58.000000 74.000000 1.000000 2.000000 1.000000 68 | 2.000000 1.000000 38.400000 54.000000 24.000000 1.000000 1.000000 1.000000 1.000000 1.000000 3.000000 1.000000 2.000000 1.000000 0.000000 3.000000 2.000000 49.000000 7.200000 1.000000 0.000000 1.000000 69 | 2.000000 1.000000 0.000000 52.000000 16.000000 1.000000 0.000000 3.000000 1.000000 0.000000 0.000000 0.000000 2.000000 3.000000 5.500000 0.000000 0.000000 55.000000 7.200000 0.000000 0.000000 1.000000 70 | 2.000000 1.000000 38.000000 48.000000 12.000000 1.000000 1.000000 1.000000 1.000000 1.000000 3.000000 0.000000 1.000000 1.000000 0.000000 3.000000 2.000000 42.000000 6.300000 2.000000 4.100000 1.000000 71 | 2.000000 1.000000 37.000000 60.000000 20.000000 3.000000 0.000000 0.000000 1.000000 3.000000 0.000000 3.000000 2.000000 2.000000 4.500000 4.000000 4.000000 43.000000 7.600000 0.000000 0.000000 0.000000 72 | 1.000000 1.000000 37.800000 48.000000 28.000000 1.000000 1.000000 1.000000 1.000000 1.000000 2.000000 1.000000 2.000000 0.000000 0.000000 1.000000 1.000000 46.000000 5.900000 2.000000 7.000000 1.000000 73 | 1.000000 1.000000 37.700000 56.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 74 | 1.000000 1.000000 38.100000 52.000000 24.000000 1.000000 1.000000 5.000000 1.000000 4.000000 3.000000 1.000000 2.000000 3.000000 7.000000 1.000000 0.000000 54.000000 7.500000 2.000000 2.600000 0.000000 75 | 1.000000 9.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 37.000000 4.900000 0.000000 0.000000 0.000000 76 | 1.000000 9.000000 39.700000 100.000000 0.000000 3.000000 3.000000 5.000000 2.000000 2.000000 3.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 48.000000 57.000000 2.000000 2.000000 0.000000 77 | 1.000000 1.000000 37.600000 38.000000 20.000000 3.000000 3.000000 1.000000 1.000000 3.000000 3.000000 
2.000000 0.000000 0.000000 0.000000 3.000000 0.000000 37.000000 68.000000 0.000000 0.000000 1.000000 78 | 2.000000 1.000000 38.700000 52.000000 20.000000 2.000000 0.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 0.000000 1.000000 1.000000 33.000000 77.000000 0.000000 0.000000 1.000000 79 | 1.000000 1.000000 0.000000 0.000000 0.000000 3.000000 3.000000 3.000000 3.000000 5.000000 3.000000 3.000000 3.000000 2.000000 0.000000 4.000000 5.000000 46.000000 5.900000 0.000000 0.000000 0.000000 80 | 1.000000 1.000000 37.500000 96.000000 18.000000 1.000000 3.000000 6.000000 2.000000 3.000000 4.000000 2.000000 2.000000 3.000000 5.000000 0.000000 4.000000 69.000000 8.900000 3.000000 0.000000 1.000000 81 | 1.000000 1.000000 36.400000 98.000000 35.000000 3.000000 3.000000 4.000000 1.000000 4.000000 3.000000 2.000000 0.000000 0.000000 0.000000 4.000000 4.000000 47.000000 6.400000 3.000000 3.600000 0.000000 82 | 1.000000 1.000000 37.300000 40.000000 0.000000 0.000000 3.000000 1.000000 1.000000 2.000000 3.000000 2.000000 3.000000 1.000000 0.000000 3.000000 5.000000 36.000000 0.000000 3.000000 2.000000 1.000000 83 | 1.000000 9.000000 38.100000 100.000000 80.000000 3.000000 1.000000 2.000000 1.000000 3.000000 4.000000 1.000000 0.000000 0.000000 0.000000 1.000000 0.000000 36.000000 5.700000 0.000000 0.000000 1.000000 84 | 1.000000 1.000000 38.000000 0.000000 24.000000 3.000000 3.000000 6.000000 2.000000 5.000000 0.000000 4.000000 1.000000 1.000000 0.000000 0.000000 0.000000 68.000000 7.800000 0.000000 0.000000 0.000000 85 | 1.000000 1.000000 37.800000 60.000000 80.000000 1.000000 3.000000 2.000000 2.000000 2.000000 3.000000 3.000000 0.000000 2.000000 5.500000 4.000000 0.000000 40.000000 4.500000 2.000000 0.000000 1.000000 86 | 2.000000 1.000000 38.000000 54.000000 30.000000 2.000000 3.000000 3.000000 3.000000 3.000000 1.000000 2.000000 2.000000 2.000000 0.000000 0.000000 4.000000 45.000000 6.200000 0.000000 0.000000 1.000000 87 | 1.000000 1.000000 0.000000 88.000000 40.000000 3.000000 3.000000 4.000000 2.000000 5.000000 4.000000 3.000000 3.000000 0.000000 0.000000 4.000000 5.000000 50.000000 7.700000 3.000000 1.400000 0.000000 88 | 2.000000 1.000000 0.000000 40.000000 16.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 50.000000 7.000000 2.000000 3.900000 0.000000 89 | 2.000000 1.000000 39.000000 64.000000 40.000000 1.000000 1.000000 5.000000 1.000000 3.000000 3.000000 2.000000 2.000000 1.000000 0.000000 3.000000 3.000000 42.000000 7.500000 2.000000 2.300000 1.000000 90 | 2.000000 1.000000 38.300000 42.000000 10.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 38.000000 61.000000 0.000000 0.000000 1.000000 91 | 2.000000 1.000000 38.000000 52.000000 16.000000 0.000000 0.000000 0.000000 0.000000 2.000000 0.000000 0.000000 0.000000 3.000000 1.000000 1.000000 1.000000 53.000000 86.000000 0.000000 0.000000 1.000000 92 | 2.000000 1.000000 40.300000 114.000000 36.000000 3.000000 3.000000 1.000000 2.000000 2.000000 3.000000 3.000000 2.000000 1.000000 7.000000 1.000000 5.000000 57.000000 8.100000 3.000000 4.500000 0.000000 93 | 2.000000 1.000000 38.800000 50.000000 20.000000 3.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 2.000000 1.000000 0.000000 3.000000 1.000000 42.000000 6.200000 0.000000 0.000000 1.000000 94 | 2.000000 1.000000 0.000000 0.000000 0.000000 3.000000 3.000000 1.000000 1.000000 5.000000 3.000000 3.000000 1.000000 1.000000 0.000000 
4.000000 5.000000 38.000000 6.500000 0.000000 0.000000 0.000000 95 | 2.000000 1.000000 37.500000 48.000000 30.000000 4.000000 1.000000 3.000000 1.000000 0.000000 2.000000 1.000000 1.000000 1.000000 0.000000 1.000000 1.000000 48.000000 8.600000 0.000000 0.000000 1.000000 96 | 1.000000 1.000000 37.300000 48.000000 20.000000 0.000000 1.000000 2.000000 1.000000 3.000000 3.000000 3.000000 2.000000 1.000000 0.000000 3.000000 5.000000 41.000000 69.000000 0.000000 0.000000 1.000000 97 | 2.000000 1.000000 0.000000 84.000000 36.000000 0.000000 0.000000 3.000000 1.000000 0.000000 3.000000 1.000000 2.000000 1.000000 0.000000 3.000000 2.000000 44.000000 8.500000 0.000000 0.000000 1.000000 98 | 1.000000 1.000000 38.100000 88.000000 32.000000 3.000000 3.000000 4.000000 1.000000 2.000000 3.000000 3.000000 0.000000 3.000000 1.000000 4.000000 5.000000 55.000000 60.000000 0.000000 0.000000 0.000000 99 | 2.000000 1.000000 37.700000 44.000000 40.000000 2.000000 1.000000 3.000000 1.000000 1.000000 3.000000 2.000000 1.000000 1.000000 0.000000 1.000000 5.000000 41.000000 60.000000 0.000000 0.000000 1.000000 100 | 2.000000 1.000000 39.600000 108.000000 51.000000 3.000000 3.000000 6.000000 2.000000 2.000000 4.000000 3.000000 1.000000 2.000000 0.000000 3.000000 5.000000 59.000000 8.000000 2.000000 2.600000 1.000000 101 | 1.000000 1.000000 38.200000 40.000000 16.000000 3.000000 3.000000 1.000000 1.000000 1.000000 3.000000 0.000000 0.000000 0.000000 0.000000 1.000000 1.000000 34.000000 66.000000 0.000000 0.000000 1.000000 102 | 1.000000 1.000000 0.000000 60.000000 20.000000 4.000000 3.000000 4.000000 2.000000 5.000000 4.000000 0.000000 0.000000 1.000000 0.000000 4.000000 5.000000 0.000000 0.000000 0.000000 0.000000 0.000000 103 | 2.000000 1.000000 38.300000 40.000000 16.000000 3.000000 0.000000 1.000000 1.000000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 37.000000 57.000000 0.000000 0.000000 1.000000 104 | 1.000000 9.000000 38.000000 140.000000 68.000000 1.000000 1.000000 1.000000 1.000000 3.000000 3.000000 2.000000 0.000000 0.000000 0.000000 2.000000 1.000000 39.000000 5.300000 0.000000 0.000000 1.000000 105 | 1.000000 1.000000 37.800000 52.000000 24.000000 1.000000 3.000000 3.000000 1.000000 4.000000 4.000000 1.000000 2.000000 3.000000 5.700000 2.000000 5.000000 48.000000 6.600000 1.000000 3.700000 0.000000 106 | 1.000000 1.000000 0.000000 70.000000 36.000000 1.000000 0.000000 3.000000 2.000000 2.000000 3.000000 2.000000 2.000000 0.000000 0.000000 4.000000 5.000000 36.000000 7.300000 0.000000 0.000000 1.000000 107 | 1.000000 1.000000 38.300000 52.000000 96.000000 0.000000 3.000000 3.000000 1.000000 0.000000 0.000000 0.000000 1.000000 1.000000 0.000000 1.000000 0.000000 43.000000 6.100000 0.000000 0.000000 1.000000 108 | 2.000000 1.000000 37.300000 50.000000 32.000000 1.000000 1.000000 3.000000 1.000000 1.000000 3.000000 2.000000 0.000000 0.000000 0.000000 1.000000 0.000000 44.000000 7.000000 0.000000 0.000000 1.000000 109 | 1.000000 1.000000 38.700000 60.000000 32.000000 4.000000 3.000000 2.000000 2.000000 4.000000 4.000000 4.000000 0.000000 0.000000 0.000000 4.000000 5.000000 53.000000 64.000000 3.000000 2.000000 0.000000 110 | 1.000000 9.000000 38.400000 84.000000 40.000000 3.000000 3.000000 2.000000 1.000000 3.000000 3.000000 3.000000 1.000000 1.000000 0.000000 0.000000 0.000000 36.000000 6.600000 2.000000 2.800000 0.000000 111 | 1.000000 1.000000 0.000000 70.000000 16.000000 3.000000 4.000000 5.000000 2.000000 2.000000 3.000000 2.000000 2.000000 1.000000 0.000000 4.000000 5.000000 
60.000000 7.500000 0.000000 0.000000 0.000000 112 | 1.000000 1.000000 38.300000 40.000000 16.000000 3.000000 0.000000 0.000000 1.000000 1.000000 3.000000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 38.000000 58.000000 1.000000 2.000000 1.000000 113 | 1.000000 1.000000 0.000000 40.000000 0.000000 2.000000 1.000000 1.000000 1.000000 1.000000 3.000000 1.000000 1.000000 1.000000 0.000000 0.000000 5.000000 39.000000 56.000000 0.000000 0.000000 1.000000 114 | 1.000000 1.000000 36.800000 60.000000 28.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 10.000000 0.000000 115 | 1.000000 1.000000 38.400000 44.000000 24.000000 3.000000 0.000000 4.000000 0.000000 5.000000 4.000000 3.000000 2.000000 1.000000 0.000000 4.000000 5.000000 50.000000 77.000000 0.000000 0.000000 1.000000 116 | 2.000000 1.000000 0.000000 0.000000 40.000000 3.000000 1.000000 1.000000 1.000000 3.000000 3.000000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 45.000000 70.000000 0.000000 0.000000 1.000000 117 | 1.000000 1.000000 38.000000 44.000000 12.000000 1.000000 1.000000 1.000000 1.000000 3.000000 3.000000 3.000000 2.000000 1.000000 0.000000 4.000000 5.000000 42.000000 65.000000 0.000000 0.000000 1.000000 118 | 2.000000 1.000000 39.500000 0.000000 0.000000 3.000000 3.000000 4.000000 2.000000 3.000000 4.000000 3.000000 0.000000 3.000000 5.500000 4.000000 5.000000 0.000000 6.700000 1.000000 0.000000 0.000000 119 | 1.000000 1.000000 36.500000 78.000000 30.000000 1.000000 0.000000 1.000000 1.000000 5.000000 3.000000 1.000000 0.000000 1.000000 0.000000 0.000000 0.000000 34.000000 75.000000 2.000000 1.000000 1.000000 120 | 2.000000 1.000000 38.100000 56.000000 20.000000 2.000000 1.000000 2.000000 1.000000 1.000000 3.000000 1.000000 1.000000 1.000000 0.000000 0.000000 0.000000 46.000000 70.000000 0.000000 0.000000 1.000000 121 | 1.000000 1.000000 39.400000 54.000000 66.000000 1.000000 1.000000 2.000000 1.000000 2.000000 3.000000 2.000000 1.000000 1.000000 0.000000 3.000000 4.000000 39.000000 6.000000 2.000000 0.000000 1.000000 122 | 1.000000 1.000000 38.300000 80.000000 40.000000 0.000000 0.000000 6.000000 2.000000 4.000000 3.000000 1.000000 0.000000 2.000000 0.000000 1.000000 4.000000 67.000000 10.200000 2.000000 1.000000 0.000000 123 | 2.000000 1.000000 38.700000 40.000000 28.000000 2.000000 1.000000 1.000000 1.000000 3.000000 1.000000 1.000000 0.000000 0.000000 0.000000 1.000000 0.000000 39.000000 62.000000 1.000000 1.000000 1.000000 124 | 1.000000 1.000000 38.200000 64.000000 24.000000 1.000000 1.000000 3.000000 1.000000 4.000000 4.000000 3.000000 2.000000 1.000000 0.000000 4.000000 4.000000 45.000000 7.500000 1.000000 2.000000 0.000000 125 | 2.000000 1.000000 37.600000 48.000000 20.000000 3.000000 1.000000 4.000000 1.000000 1.000000 1.000000 3.000000 2.000000 1.000000 0.000000 1.000000 1.000000 37.000000 5.500000 0.000000 0.000000 0.000000 126 | 1.000000 1.000000 38.000000 42.000000 68.000000 4.000000 1.000000 1.000000 1.000000 3.000000 3.000000 2.000000 2.000000 2.000000 0.000000 4.000000 4.000000 41.000000 7.600000 0.000000 0.000000 1.000000 127 | 1.000000 1.000000 38.700000 0.000000 0.000000 3.000000 1.000000 3.000000 1.000000 5.000000 4.000000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 33.000000 6.500000 2.000000 0.000000 1.000000 128 | 1.000000 1.000000 37.400000 50.000000 32.000000 3.000000 3.000000 0.000000 1.000000 4.000000 4.000000 1.000000 2.000000 1.000000 0.000000 1.000000 0.000000 45.000000 7.900000 
2.000000 1.000000 1.000000 129 | 1.000000 1.000000 37.400000 84.000000 20.000000 0.000000 0.000000 3.000000 1.000000 2.000000 3.000000 3.000000 0.000000 0.000000 0.000000 0.000000 0.000000 31.000000 61.000000 0.000000 1.000000 0.000000 130 | 1.000000 1.000000 38.400000 49.000000 0.000000 0.000000 0.000000 1.000000 1.000000 0.000000 0.000000 1.000000 2.000000 1.000000 0.000000 0.000000 0.000000 44.000000 7.600000 0.000000 0.000000 1.000000 131 | 1.000000 1.000000 37.800000 30.000000 12.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 132 | 2.000000 1.000000 37.600000 88.000000 36.000000 3.000000 1.000000 1.000000 1.000000 3.000000 3.000000 2.000000 1.000000 3.000000 1.500000 0.000000 0.000000 44.000000 6.000000 0.000000 0.000000 0.000000 133 | 2.000000 1.000000 37.900000 40.000000 24.000000 1.000000 1.000000 1.000000 1.000000 2.000000 3.000000 1.000000 0.000000 0.000000 0.000000 0.000000 3.000000 40.000000 5.700000 0.000000 0.000000 1.000000 134 | 1.000000 1.000000 0.000000 100.000000 0.000000 3.000000 0.000000 4.000000 2.000000 5.000000 4.000000 0.000000 2.000000 0.000000 0.000000 2.000000 0.000000 59.000000 6.300000 0.000000 0.000000 0.000000 135 | 1.000000 9.000000 38.100000 136.000000 48.000000 3.000000 3.000000 3.000000 1.000000 5.000000 1.000000 3.000000 2.000000 2.000000 4.400000 2.000000 0.000000 33.000000 4.900000 2.000000 2.900000 0.000000 136 | 1.000000 1.000000 0.000000 0.000000 0.000000 3.000000 3.000000 3.000000 2.000000 5.000000 3.000000 3.000000 3.000000 2.000000 0.000000 4.000000 5.000000 46.000000 5.900000 0.000000 0.000000 0.000000 137 | 1.000000 1.000000 38.000000 48.000000 0.000000 1.000000 1.000000 1.000000 1.000000 1.000000 2.000000 4.000000 2.000000 2.000000 0.000000 4.000000 5.000000 0.000000 0.000000 0.000000 0.000000 1.000000 138 | 2.000000 1.000000 38.000000 56.000000 0.000000 1.000000 2.000000 3.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 0.000000 1.000000 1.000000 42.000000 71.000000 0.000000 0.000000 1.000000 139 | 2.000000 1.000000 38.000000 60.000000 32.000000 1.000000 1.000000 0.000000 1.000000 3.000000 3.000000 0.000000 1.000000 1.000000 0.000000 0.000000 0.000000 50.000000 7.000000 1.000000 1.000000 1.000000 140 | 1.000000 1.000000 38.100000 44.000000 9.000000 3.000000 1.000000 1.000000 1.000000 2.000000 2.000000 1.000000 1.000000 1.000000 0.000000 4.000000 5.000000 31.000000 7.300000 0.000000 0.000000 1.000000 141 | 2.000000 1.000000 36.000000 42.000000 30.000000 0.000000 0.000000 5.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 64.000000 6.800000 0.000000 0.000000 0.000000 142 | 1.000000 1.000000 0.000000 120.000000 0.000000 4.000000 3.000000 6.000000 2.000000 5.000000 4.000000 4.000000 0.000000 0.000000 0.000000 4.000000 5.000000 57.000000 4.500000 3.000000 3.900000 0.000000 143 | 1.000000 1.000000 37.800000 48.000000 28.000000 1.000000 1.000000 1.000000 2.000000 1.000000 2.000000 1.000000 2.000000 0.000000 0.000000 1.000000 1.000000 46.000000 5.900000 2.000000 7.000000 1.000000 144 | 1.000000 1.000000 37.100000 84.000000 40.000000 3.000000 3.000000 6.000000 1.000000 2.000000 4.000000 4.000000 3.000000 2.000000 2.000000 4.000000 5.000000 75.000000 81.000000 0.000000 0.000000 0.000000 145 | 2.000000 1.000000 0.000000 80.000000 32.000000 3.000000 3.000000 2.000000 1.000000 2.000000 3.000000 3.000000 2.000000 1.000000 0.000000 3.000000 0.000000 50.000000 80.000000 0.000000 0.000000 
1.000000 146 | 1.000000 1.000000 38.200000 48.000000 0.000000 1.000000 3.000000 3.000000 1.000000 3.000000 4.000000 4.000000 1.000000 3.000000 2.000000 4.000000 5.000000 42.000000 71.000000 0.000000 0.000000 1.000000 147 | 2.000000 1.000000 38.000000 44.000000 12.000000 2.000000 1.000000 3.000000 1.000000 3.000000 4.000000 3.000000 1.000000 2.000000 6.500000 1.000000 4.000000 33.000000 6.500000 0.000000 0.000000 0.000000 148 | 1.000000 1.000000 38.300000 132.000000 0.000000 0.000000 3.000000 6.000000 2.000000 2.000000 4.000000 2.000000 2.000000 3.000000 6.200000 4.000000 4.000000 57.000000 8.000000 0.000000 5.200000 1.000000 149 | 2.000000 1.000000 38.700000 48.000000 24.000000 0.000000 0.000000 0.000000 0.000000 1.000000 1.000000 0.000000 1.000000 1.000000 0.000000 1.000000 0.000000 34.000000 63.000000 0.000000 0.000000 1.000000 150 | 2.000000 1.000000 38.900000 44.000000 14.000000 3.000000 1.000000 1.000000 1.000000 2.000000 3.000000 2.000000 0.000000 0.000000 0.000000 0.000000 2.000000 33.000000 64.000000 0.000000 0.000000 1.000000 151 | 1.000000 1.000000 39.300000 0.000000 0.000000 4.000000 3.000000 6.000000 2.000000 4.000000 4.000000 2.000000 1.000000 3.000000 4.000000 4.000000 4.000000 75.000000 0.000000 3.000000 4.300000 0.000000 152 | 1.000000 1.000000 0.000000 100.000000 0.000000 3.000000 3.000000 4.000000 2.000000 0.000000 4.000000 4.000000 2.000000 1.000000 2.000000 0.000000 0.000000 68.000000 64.000000 3.000000 2.000000 1.000000 153 | 2.000000 1.000000 38.600000 48.000000 20.000000 3.000000 1.000000 1.000000 1.000000 1.000000 3.000000 2.000000 2.000000 1.000000 0.000000 3.000000 2.000000 50.000000 7.300000 1.000000 0.000000 1.000000 154 | 2.000000 1.000000 38.800000 48.000000 40.000000 1.000000 1.000000 3.000000 1.000000 3.000000 3.000000 4.000000 2.000000 0.000000 0.000000 0.000000 5.000000 41.000000 65.000000 0.000000 0.000000 1.000000 155 | 2.000000 1.000000 38.000000 48.000000 20.000000 3.000000 3.000000 4.000000 1.000000 1.000000 4.000000 2.000000 2.000000 0.000000 5.000000 0.000000 2.000000 49.000000 8.300000 1.000000 0.000000 1.000000 156 | 2.000000 1.000000 38.600000 52.000000 20.000000 1.000000 1.000000 1.000000 1.000000 3.000000 3.000000 2.000000 1.000000 1.000000 0.000000 1.000000 3.000000 36.000000 6.600000 1.000000 5.000000 1.000000 157 | 1.000000 1.000000 37.800000 60.000000 24.000000 1.000000 0.000000 3.000000 2.000000 0.000000 4.000000 4.000000 2.000000 3.000000 2.000000 0.000000 5.000000 52.000000 75.000000 0.000000 0.000000 0.000000 158 | 2.000000 1.000000 38.000000 42.000000 40.000000 3.000000 1.000000 1.000000 1.000000 3.000000 3.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 159 | 2.000000 1.000000 0.000000 0.000000 12.000000 1.000000 1.000000 2.000000 1.000000 2.000000 1.000000 2.000000 3.000000 1.000000 0.000000 1.000000 3.000000 44.000000 7.500000 2.000000 0.000000 1.000000 160 | 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 4.000000 0.000000 0.000000 1.000000 1.000000 0.000000 0.000000 5.000000 35.000000 58.000000 2.000000 1.000000 1.000000 161 | 1.000000 1.000000 38.300000 42.000000 24.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 40.000000 8.500000 0.000000 0.000000 0.000000 162 | 2.000000 1.000000 39.500000 60.000000 10.000000 3.000000 0.000000 0.000000 2.000000 3.000000 3.000000 2.000000 2.000000 1.000000 0.000000 3.000000 0.000000 38.000000 56.000000 1.000000 0.000000 1.000000 163 | 
1.000000 1.000000 38.000000 66.000000 20.000000 1.000000 3.000000 3.000000 1.000000 5.000000 3.000000 1.000000 1.000000 1.000000 0.000000 3.000000 0.000000 46.000000 46.000000 3.000000 2.000000 0.000000 164 | 1.000000 1.000000 38.700000 76.000000 0.000000 1.000000 1.000000 5.000000 2.000000 3.000000 3.000000 2.000000 2.000000 2.000000 0.000000 4.000000 4.000000 50.000000 8.000000 0.000000 0.000000 1.000000 165 | 1.000000 1.000000 39.400000 120.000000 48.000000 0.000000 0.000000 5.000000 1.000000 0.000000 3.000000 3.000000 1.000000 0.000000 0.000000 4.000000 0.000000 56.000000 64.000000 1.000000 2.000000 0.000000 166 | 1.000000 1.000000 38.300000 40.000000 18.000000 1.000000 1.000000 1.000000 1.000000 3.000000 1.000000 1.000000 0.000000 0.000000 0.000000 2.000000 1.000000 43.000000 5.900000 1.000000 0.000000 1.000000 167 | 2.000000 1.000000 0.000000 44.000000 24.000000 1.000000 1.000000 1.000000 1.000000 3.000000 3.000000 1.000000 2.000000 1.000000 0.000000 0.000000 1.000000 0.000000 6.300000 0.000000 0.000000 1.000000 168 | 1.000000 1.000000 38.400000 104.000000 40.000000 1.000000 1.000000 3.000000 1.000000 2.000000 4.000000 2.000000 2.000000 3.000000 6.500000 0.000000 4.000000 55.000000 8.500000 0.000000 0.000000 1.000000 169 | 1.000000 1.000000 0.000000 65.000000 24.000000 0.000000 0.000000 0.000000 2.000000 5.000000 0.000000 4.000000 3.000000 1.000000 0.000000 0.000000 5.000000 0.000000 0.000000 0.000000 0.000000 0.000000 170 | 2.000000 1.000000 37.500000 44.000000 20.000000 1.000000 1.000000 3.000000 1.000000 0.000000 1.000000 1.000000 0.000000 0.000000 0.000000 1.000000 0.000000 35.000000 7.200000 0.000000 0.000000 1.000000 171 | 2.000000 1.000000 39.000000 86.000000 16.000000 3.000000 3.000000 5.000000 0.000000 3.000000 3.000000 3.000000 0.000000 2.000000 0.000000 0.000000 0.000000 68.000000 5.800000 3.000000 6.000000 0.000000 172 | 1.000000 1.000000 38.500000 129.000000 48.000000 3.000000 3.000000 3.000000 1.000000 2.000000 4.000000 3.000000 1.000000 3.000000 2.000000 0.000000 0.000000 57.000000 66.000000 3.000000 2.000000 1.000000 173 | 1.000000 1.000000 0.000000 104.000000 0.000000 3.000000 3.000000 5.000000 2.000000 2.000000 4.000000 3.000000 0.000000 3.000000 0.000000 4.000000 4.000000 69.000000 8.600000 2.000000 3.400000 0.000000 174 | 2.000000 1.000000 0.000000 0.000000 0.000000 3.000000 4.000000 6.000000 0.000000 4.000000 0.000000 4.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 175 | 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 176 | 1.000000 1.000000 38.200000 60.000000 30.000000 1.000000 1.000000 3.000000 1.000000 3.000000 3.000000 1.000000 2.000000 1.000000 0.000000 3.000000 2.000000 48.000000 66.000000 0.000000 0.000000 1.000000 177 | 1.000000 1.000000 0.000000 68.000000 14.000000 0.000000 0.000000 4.000000 1.000000 4.000000 0.000000 0.000000 0.000000 1.000000 4.300000 0.000000 0.000000 0.000000 0.000000 2.000000 2.800000 0.000000 178 | 1.000000 1.000000 0.000000 60.000000 30.000000 3.000000 3.000000 4.000000 2.000000 5.000000 4.000000 4.000000 1.000000 1.000000 0.000000 4.000000 0.000000 45.000000 70.000000 3.000000 2.000000 1.000000 179 | 2.000000 1.000000 38.500000 100.000000 0.000000 3.000000 3.000000 5.000000 2.000000 4.000000 3.000000 4.000000 2.000000 1.000000 0.000000 4.000000 5.000000 0.000000 0.000000 0.000000 0.000000 0.000000 180 | 1.000000 1.000000 38.400000 
84.000000 30.000000 3.000000 1.000000 5.000000 2.000000 4.000000 3.000000 3.000000 2.000000 3.000000 6.500000 4.000000 4.000000 47.000000 7.500000 3.000000 0.000000 0.000000 181 | 2.000000 1.000000 37.800000 48.000000 14.000000 0.000000 0.000000 1.000000 1.000000 3.000000 0.000000 2.000000 1.000000 3.000000 5.300000 1.000000 0.000000 35.000000 7.500000 0.000000 0.000000 1.000000 182 | 1.000000 1.000000 38.000000 0.000000 24.000000 3.000000 3.000000 6.000000 2.000000 5.000000 0.000000 4.000000 1.000000 1.000000 0.000000 0.000000 0.000000 68.000000 7.800000 0.000000 0.000000 0.000000 183 | 2.000000 1.000000 37.800000 56.000000 16.000000 1.000000 1.000000 2.000000 1.000000 2.000000 1.000000 1.000000 2.000000 1.000000 0.000000 1.000000 0.000000 44.000000 68.000000 1.000000 1.000000 1.000000 184 | 2.000000 1.000000 38.200000 68.000000 32.000000 2.000000 2.000000 2.000000 1.000000 1.000000 1.000000 1.000000 3.000000 1.000000 0.000000 1.000000 1.000000 43.000000 65.000000 0.000000 0.000000 1.000000 185 | 1.000000 1.000000 38.500000 120.000000 60.000000 4.000000 3.000000 6.000000 2.000000 0.000000 3.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 54.000000 0.000000 0.000000 0.000000 1.000000 186 | 1.000000 1.000000 39.300000 64.000000 90.000000 2.000000 3.000000 1.000000 1.000000 0.000000 3.000000 1.000000 1.000000 2.000000 0.000000 0.000000 0.000000 39.000000 6.700000 0.000000 0.000000 1.000000 187 | 1.000000 1.000000 38.400000 80.000000 30.000000 4.000000 3.000000 1.000000 1.000000 3.000000 3.000000 3.000000 3.000000 3.000000 0.000000 4.000000 5.000000 32.000000 6.100000 3.000000 4.300000 1.000000 188 | 1.000000 1.000000 38.500000 60.000000 0.000000 1.000000 1.000000 0.000000 1.000000 0.000000 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 33.000000 53.000000 1.000000 0.000000 1.000000 189 | 1.000000 1.000000 38.300000 60.000000 16.000000 3.000000 1.000000 1.000000 1.000000 2.000000 1.000000 1.000000 2.000000 2.000000 3.000000 1.000000 4.000000 30.000000 6.000000 1.000000 3.000000 1.000000 190 | 1.000000 1.000000 37.100000 40.000000 8.000000 0.000000 1.000000 4.000000 1.000000 3.000000 3.000000 1.000000 1.000000 1.000000 0.000000 3.000000 3.000000 23.000000 6.700000 3.000000 0.000000 1.000000 191 | 2.000000 9.000000 0.000000 100.000000 44.000000 2.000000 1.000000 1.000000 1.000000 4.000000 1.000000 1.000000 0.000000 0.000000 0.000000 1.000000 0.000000 37.000000 4.700000 0.000000 0.000000 1.000000 192 | 1.000000 1.000000 38.200000 48.000000 18.000000 1.000000 1.000000 1.000000 1.000000 3.000000 3.000000 3.000000 1.000000 2.000000 0.000000 4.000000 0.000000 48.000000 74.000000 1.000000 2.000000 1.000000 193 | 1.000000 1.000000 0.000000 60.000000 48.000000 3.000000 3.000000 4.000000 2.000000 4.000000 3.000000 4.000000 0.000000 0.000000 0.000000 0.000000 0.000000 58.000000 7.600000 0.000000 0.000000 0.000000 194 | 2.000000 1.000000 37.900000 88.000000 24.000000 1.000000 1.000000 2.000000 1.000000 2.000000 2.000000 1.000000 0.000000 0.000000 0.000000 4.000000 1.000000 37.000000 56.000000 0.000000 0.000000 1.000000 195 | 2.000000 1.000000 38.000000 44.000000 12.000000 3.000000 1.000000 1.000000 0.000000 0.000000 1.000000 2.000000 0.000000 0.000000 0.000000 1.000000 0.000000 42.000000 64.000000 0.000000 0.000000 1.000000 196 | 2.000000 1.000000 38.500000 60.000000 20.000000 1.000000 1.000000 5.000000 2.000000 2.000000 2.000000 1.000000 2.000000 1.000000 0.000000 2.000000 3.000000 63.000000 7.500000 2.000000 2.300000 0.000000 197 | 2.000000 1.000000 38.500000 96.000000 
36.000000 3.000000 3.000000 0.000000 2.000000 2.000000 4.000000 2.000000 1.000000 2.000000 0.000000 4.000000 5.000000 70.000000 8.500000 0.000000 0.000000 0.000000 198 | 2.000000 1.000000 38.300000 60.000000 20.000000 1.000000 1.000000 1.000000 2.000000 1.000000 3.000000 1.000000 0.000000 0.000000 0.000000 3.000000 0.000000 34.000000 66.000000 0.000000 0.000000 1.000000 199 | 2.000000 1.000000 38.500000 60.000000 40.000000 3.000000 1.000000 2.000000 1.000000 2.000000 1.000000 2.000000 0.000000 0.000000 0.000000 3.000000 2.000000 49.000000 59.000000 0.000000 0.000000 1.000000 200 | 1.000000 1.000000 37.300000 48.000000 12.000000 1.000000 0.000000 3.000000 1.000000 3.000000 1.000000 3.000000 2.000000 1.000000 0.000000 3.000000 3.000000 40.000000 6.600000 2.000000 0.000000 1.000000 201 | 1.000000 1.000000 38.500000 86.000000 0.000000 1.000000 1.000000 3.000000 1.000000 4.000000 4.000000 3.000000 2.000000 1.000000 0.000000 3.000000 5.000000 45.000000 7.400000 1.000000 3.400000 0.000000 202 | 1.000000 1.000000 37.500000 48.000000 40.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 1.000000 0.000000 0.000000 5.000000 41.000000 55.000000 3.000000 2.000000 0.000000 203 | 2.000000 1.000000 37.200000 36.000000 9.000000 1.000000 1.000000 1.000000 1.000000 2.000000 3.000000 1.000000 2.000000 1.000000 0.000000 4.000000 1.000000 35.000000 5.700000 0.000000 0.000000 1.000000 204 | 1.000000 1.000000 39.200000 0.000000 23.000000 3.000000 1.000000 3.000000 1.000000 4.000000 4.000000 2.000000 2.000000 0.000000 0.000000 0.000000 0.000000 36.000000 6.600000 1.000000 3.000000 1.000000 205 | 2.000000 1.000000 38.500000 100.000000 0.000000 3.000000 3.000000 5.000000 2.000000 4.000000 3.000000 4.000000 2.000000 1.000000 0.000000 4.000000 5.000000 0.000000 0.000000 0.000000 0.000000 0.000000 206 | 1.000000 1.000000 38.500000 96.000000 30.000000 2.000000 3.000000 4.000000 2.000000 4.000000 4.000000 3.000000 2.000000 1.000000 0.000000 3.000000 5.000000 50.000000 65.000000 0.000000 0.000000 1.000000 207 | 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 45.000000 8.700000 0.000000 0.000000 0.000000 208 | 1.000000 1.000000 37.800000 88.000000 80.000000 3.000000 3.000000 5.000000 2.000000 0.000000 3.000000 3.000000 2.000000 3.000000 0.000000 4.000000 5.000000 64.000000 89.000000 0.000000 0.000000 0.000000 209 | 2.000000 1.000000 37.500000 44.000000 10.000000 3.000000 1.000000 1.000000 1.000000 3.000000 1.000000 2.000000 2.000000 0.000000 0.000000 3.000000 3.000000 43.000000 51.000000 1.000000 1.000000 1.000000 210 | 1.000000 1.000000 37.900000 68.000000 20.000000 0.000000 1.000000 2.000000 1.000000 2.000000 4.000000 2.000000 0.000000 0.000000 0.000000 1.000000 5.000000 45.000000 4.000000 3.000000 2.800000 0.000000 211 | 1.000000 1.000000 38.000000 86.000000 24.000000 4.000000 3.000000 4.000000 1.000000 2.000000 4.000000 4.000000 1.000000 1.000000 0.000000 4.000000 5.000000 45.000000 5.500000 1.000000 10.100000 0.000000 212 | 1.000000 9.000000 38.900000 120.000000 30.000000 1.000000 3.000000 2.000000 2.000000 3.000000 3.000000 3.000000 3.000000 1.000000 3.000000 0.000000 0.000000 47.000000 6.300000 1.000000 0.000000 1.000000 213 | 1.000000 1.000000 37.600000 45.000000 12.000000 3.000000 1.000000 3.000000 1.000000 0.000000 2.000000 2.000000 2.000000 1.000000 0.000000 1.000000 4.000000 39.000000 7.000000 2.000000 1.500000 1.000000 214 | 2.000000 1.000000 38.600000 56.000000 32.000000 
2.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 2.000000 0.000000 0.000000 2.000000 0.000000 40.000000 7.000000 2.000000 2.100000 1.000000 215 | 1.000000 1.000000 37.800000 40.000000 12.000000 1.000000 1.000000 1.000000 1.000000 1.000000 2.000000 1.000000 2.000000 1.000000 0.000000 1.000000 2.000000 38.000000 7.000000 0.000000 0.000000 1.000000 216 | 2.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 217 | 1.000000 1.000000 38.000000 76.000000 18.000000 0.000000 0.000000 0.000000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 71.000000 11.000000 0.000000 0.000000 1.000000 218 | 1.000000 1.000000 38.100000 40.000000 36.000000 1.000000 2.000000 2.000000 1.000000 2.000000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 219 | 1.000000 1.000000 0.000000 52.000000 28.000000 3.000000 3.000000 4.000000 1.000000 3.000000 4.000000 3.000000 2.000000 1.000000 0.000000 4.000000 4.000000 37.000000 8.100000 0.000000 0.000000 1.000000 220 | 1.000000 1.000000 39.200000 88.000000 58.000000 4.000000 4.000000 0.000000 2.000000 5.000000 4.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.000000 2.000000 0.000000 221 | 1.000000 1.000000 38.500000 92.000000 40.000000 4.000000 3.000000 0.000000 1.000000 2.000000 4.000000 3.000000 0.000000 0.000000 0.000000 4.000000 0.000000 46.000000 67.000000 2.000000 2.000000 1.000000 222 | 1.000000 1.000000 0.000000 112.000000 13.000000 4.000000 4.000000 4.000000 1.000000 2.000000 3.000000 1.000000 2.000000 1.000000 4.500000 4.000000 4.000000 60.000000 6.300000 3.000000 0.000000 1.000000 223 | 1.000000 1.000000 37.700000 66.000000 12.000000 1.000000 1.000000 3.000000 1.000000 3.000000 3.000000 2.000000 2.000000 0.000000 0.000000 4.000000 4.000000 31.500000 6.200000 2.000000 1.600000 1.000000 224 | 1.000000 1.000000 38.800000 50.000000 14.000000 1.000000 1.000000 1.000000 1.000000 3.000000 1.000000 1.000000 1.000000 1.000000 0.000000 3.000000 5.000000 38.000000 58.000000 0.000000 0.000000 1.000000 225 | 2.000000 1.000000 38.400000 54.000000 24.000000 1.000000 1.000000 1.000000 1.000000 1.000000 3.000000 1.000000 2.000000 1.000000 0.000000 3.000000 2.000000 49.000000 7.200000 1.000000 8.000000 1.000000 226 | 1.000000 1.000000 39.200000 120.000000 20.000000 4.000000 3.000000 5.000000 2.000000 2.000000 3.000000 3.000000 1.000000 3.000000 0.000000 0.000000 4.000000 60.000000 8.800000 3.000000 0.000000 0.000000 227 | 1.000000 9.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 45.000000 6.500000 2.000000 0.000000 1.000000 228 | 1.000000 1.000000 37.300000 90.000000 40.000000 3.000000 0.000000 6.000000 2.000000 5.000000 4.000000 3.000000 2.000000 2.000000 0.000000 1.000000 5.000000 65.000000 50.000000 3.000000 2.000000 0.000000 229 | 1.000000 9.000000 38.500000 120.000000 70.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 2.000000 0.000000 0.000000 1.000000 0.000000 35.000000 54.000000 1.000000 1.000000 1.000000 230 | 1.000000 1.000000 38.500000 104.000000 40.000000 3.000000 3.000000 0.000000 1.000000 4.000000 3.000000 4.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 231 | 2.000000 1.000000 39.500000 92.000000 28.000000 3.000000 3.000000 
6.000000 1.000000 5.000000 4.000000 1.000000 0.000000 3.000000 0.000000 4.000000 0.000000 72.000000 6.400000 0.000000 3.600000 0.000000 232 | 1.000000 1.000000 38.500000 30.000000 18.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 40.000000 7.700000 0.000000 0.000000 1.000000 233 | 1.000000 1.000000 38.300000 72.000000 30.000000 4.000000 3.000000 3.000000 2.000000 3.000000 3.000000 3.000000 2.000000 1.000000 0.000000 3.000000 5.000000 43.000000 7.000000 2.000000 3.900000 1.000000 234 | 2.000000 1.000000 37.500000 48.000000 30.000000 4.000000 1.000000 3.000000 1.000000 0.000000 2.000000 1.000000 1.000000 1.000000 0.000000 1.000000 1.000000 48.000000 8.600000 0.000000 0.000000 1.000000 235 | 1.000000 1.000000 38.100000 52.000000 24.000000 1.000000 1.000000 5.000000 1.000000 4.000000 3.000000 1.000000 2.000000 3.000000 7.000000 1.000000 0.000000 54.000000 7.500000 2.000000 2.600000 0.000000 236 | 2.000000 1.000000 38.200000 42.000000 26.000000 1.000000 1.000000 1.000000 1.000000 3.000000 1.000000 2.000000 0.000000 0.000000 0.000000 1.000000 0.000000 36.000000 6.900000 0.000000 0.000000 1.000000 237 | 2.000000 1.000000 37.900000 54.000000 42.000000 2.000000 1.000000 5.000000 1.000000 3.000000 1.000000 1.000000 0.000000 1.000000 0.000000 0.000000 2.000000 47.000000 54.000000 3.000000 1.000000 1.000000 238 | 2.000000 1.000000 36.100000 88.000000 0.000000 3.000000 3.000000 3.000000 1.000000 3.000000 3.000000 2.000000 2.000000 3.000000 0.000000 0.000000 4.000000 45.000000 7.000000 3.000000 4.800000 0.000000 239 | 1.000000 1.000000 38.100000 70.000000 22.000000 0.000000 1.000000 0.000000 1.000000 5.000000 3.000000 0.000000 0.000000 0.000000 0.000000 0.000000 5.000000 36.000000 65.000000 0.000000 0.000000 0.000000 240 | 1.000000 1.000000 38.000000 90.000000 30.000000 4.000000 3.000000 4.000000 2.000000 5.000000 4.000000 4.000000 0.000000 0.000000 0.000000 4.000000 5.000000 55.000000 6.100000 0.000000 0.000000 0.000000 241 | 1.000000 1.000000 38.200000 52.000000 16.000000 1.000000 1.000000 2.000000 1.000000 1.000000 2.000000 1.000000 1.000000 1.000000 0.000000 1.000000 0.000000 43.000000 8.100000 0.000000 0.000000 1.000000 242 | 1.000000 1.000000 0.000000 36.000000 32.000000 1.000000 1.000000 4.000000 1.000000 5.000000 3.000000 3.000000 2.000000 3.000000 4.000000 0.000000 4.000000 41.000000 5.900000 0.000000 0.000000 0.000000 243 | 1.000000 1.000000 38.400000 92.000000 20.000000 1.000000 0.000000 0.000000 2.000000 0.000000 3.000000 3.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 244 | 1.000000 9.000000 38.200000 124.000000 88.000000 1.000000 3.000000 2.000000 1.000000 2.000000 3.000000 4.000000 0.000000 0.000000 0.000000 0.000000 0.000000 47.000000 8.000000 1.000000 0.000000 1.000000 245 | 2.000000 1.000000 0.000000 96.000000 0.000000 3.000000 3.000000 3.000000 2.000000 5.000000 4.000000 4.000000 0.000000 1.000000 0.000000 4.000000 5.000000 60.000000 0.000000 0.000000 0.000000 0.000000 246 | 1.000000 1.000000 37.600000 68.000000 32.000000 3.000000 0.000000 3.000000 1.000000 4.000000 2.000000 4.000000 2.000000 2.000000 6.500000 1.000000 5.000000 47.000000 7.200000 1.000000 0.000000 1.000000 247 | 1.000000 1.000000 38.100000 88.000000 24.000000 3.000000 3.000000 4.000000 1.000000 5.000000 4.000000 3.000000 2.000000 1.000000 0.000000 3.000000 4.000000 41.000000 4.600000 0.000000 0.000000 0.000000 248 | 1.000000 1.000000 38.000000 108.000000 60.000000 2.000000 3.000000 4.000000 1.000000 
4.000000 3.000000 3.000000 2.000000 0.000000 0.000000 3.000000 4.000000 0.000000 0.000000 3.000000 0.000000 1.000000 249 | 2.000000 1.000000 38.200000 48.000000 0.000000 2.000000 0.000000 1.000000 2.000000 3.000000 3.000000 1.000000 2.000000 1.000000 0.000000 0.000000 2.000000 34.000000 6.600000 0.000000 0.000000 1.000000 250 | 1.000000 1.000000 39.300000 100.000000 51.000000 4.000000 4.000000 6.000000 1.000000 2.000000 4.000000 1.000000 1.000000 3.000000 2.000000 0.000000 4.000000 66.000000 13.000000 3.000000 2.000000 0.000000 251 | 2.000000 1.000000 36.600000 42.000000 18.000000 3.000000 3.000000 2.000000 1.000000 1.000000 4.000000 1.000000 1.000000 1.000000 0.000000 0.000000 5.000000 52.000000 7.100000 0.000000 0.000000 0.000000 252 | 1.000000 9.000000 38.800000 124.000000 36.000000 3.000000 1.000000 2.000000 1.000000 2.000000 3.000000 4.000000 1.000000 1.000000 0.000000 4.000000 4.000000 50.000000 7.600000 3.000000 0.000000 0.000000 253 | 2.000000 1.000000 0.000000 112.000000 24.000000 3.000000 3.000000 4.000000 2.000000 5.000000 4.000000 2.000000 0.000000 0.000000 0.000000 4.000000 0.000000 40.000000 5.300000 3.000000 2.600000 1.000000 254 | 1.000000 1.000000 0.000000 80.000000 0.000000 3.000000 3.000000 3.000000 1.000000 4.000000 4.000000 4.000000 0.000000 0.000000 0.000000 4.000000 5.000000 43.000000 70.000000 0.000000 0.000000 1.000000 255 | 1.000000 9.000000 38.800000 184.000000 84.000000 1.000000 0.000000 1.000000 1.000000 4.000000 1.000000 3.000000 0.000000 0.000000 0.000000 2.000000 0.000000 33.000000 3.300000 0.000000 0.000000 0.000000 256 | 1.000000 1.000000 37.500000 72.000000 0.000000 2.000000 1.000000 1.000000 1.000000 2.000000 1.000000 1.000000 1.000000 1.000000 0.000000 1.000000 0.000000 35.000000 65.000000 2.000000 2.000000 0.000000 257 | 1.000000 1.000000 38.700000 96.000000 28.000000 3.000000 3.000000 4.000000 1.000000 0.000000 4.000000 0.000000 0.000000 3.000000 7.500000 0.000000 0.000000 64.000000 9.000000 0.000000 0.000000 0.000000 258 | 2.000000 1.000000 37.500000 52.000000 12.000000 1.000000 1.000000 1.000000 1.000000 2.000000 3.000000 2.000000 2.000000 1.000000 0.000000 3.000000 5.000000 36.000000 61.000000 1.000000 1.000000 1.000000 259 | 1.000000 1.000000 40.800000 72.000000 42.000000 3.000000 3.000000 1.000000 1.000000 2.000000 3.000000 1.000000 2.000000 1.000000 0.000000 0.000000 0.000000 54.000000 7.400000 3.000000 0.000000 0.000000 260 | 2.000000 1.000000 38.000000 40.000000 25.000000 0.000000 1.000000 1.000000 1.000000 4.000000 3.000000 2.000000 1.000000 1.000000 0.000000 4.000000 0.000000 37.000000 69.000000 0.000000 0.000000 1.000000 261 | 2.000000 1.000000 38.400000 48.000000 16.000000 2.000000 1.000000 1.000000 1.000000 1.000000 0.000000 2.000000 2.000000 1.000000 0.000000 0.000000 2.000000 39.000000 6.500000 0.000000 0.000000 1.000000 262 | 2.000000 9.000000 38.600000 88.000000 28.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 35.000000 5.900000 0.000000 0.000000 1.000000 263 | 1.000000 1.000000 37.100000 75.000000 36.000000 0.000000 0.000000 3.000000 2.000000 4.000000 4.000000 2.000000 2.000000 3.000000 5.000000 4.000000 4.000000 48.000000 7.400000 3.000000 3.200000 0.000000 264 | 1.000000 1.000000 38.300000 44.000000 21.000000 3.000000 1.000000 2.000000 1.000000 3.000000 3.000000 3.000000 2.000000 1.000000 0.000000 1.000000 5.000000 44.000000 6.500000 2.000000 4.400000 1.000000 265 | 2.000000 1.000000 0.000000 56.000000 68.000000 3.000000 1.000000 1.000000 1.000000 3.000000 
3.000000 1.000000 2.000000 1.000000 0.000000 1.000000 0.000000 40.000000 6.000000 0.000000 0.000000 0.000000 266 | 2.000000 1.000000 38.600000 68.000000 20.000000 2.000000 1.000000 3.000000 1.000000 3.000000 3.000000 2.000000 1.000000 1.000000 0.000000 1.000000 5.000000 38.000000 6.500000 1.000000 0.000000 1.000000 267 | 2.000000 1.000000 38.300000 54.000000 18.000000 3.000000 1.000000 2.000000 1.000000 2.000000 3.000000 2.000000 0.000000 3.000000 5.400000 0.000000 4.000000 44.000000 7.200000 3.000000 0.000000 1.000000 268 | 1.000000 1.000000 38.200000 42.000000 20.000000 0.000000 0.000000 1.000000 1.000000 0.000000 3.000000 0.000000 0.000000 0.000000 0.000000 3.000000 0.000000 47.000000 60.000000 0.000000 0.000000 1.000000 269 | 1.000000 1.000000 39.300000 64.000000 90.000000 2.000000 3.000000 1.000000 1.000000 0.000000 3.000000 1.000000 1.000000 2.000000 6.500000 1.000000 5.000000 39.000000 6.700000 0.000000 0.000000 1.000000 270 | 1.000000 1.000000 37.500000 60.000000 50.000000 3.000000 3.000000 1.000000 1.000000 3.000000 3.000000 2.000000 2.000000 2.000000 3.500000 3.000000 4.000000 35.000000 6.500000 0.000000 0.000000 0.000000 271 | 1.000000 1.000000 37.700000 80.000000 0.000000 3.000000 3.000000 6.000000 1.000000 5.000000 4.000000 1.000000 2.000000 3.000000 0.000000 3.000000 1.000000 50.000000 55.000000 3.000000 2.000000 1.000000 272 | 1.000000 1.000000 0.000000 100.000000 30.000000 3.000000 3.000000 4.000000 2.000000 5.000000 4.000000 4.000000 3.000000 3.000000 0.000000 4.000000 4.000000 52.000000 6.600000 0.000000 0.000000 1.000000 273 | 1.000000 1.000000 37.700000 120.000000 28.000000 3.000000 3.000000 3.000000 1.000000 5.000000 3.000000 3.000000 1.000000 1.000000 0.000000 0.000000 0.000000 65.000000 7.000000 3.000000 0.000000 0.000000 274 | 1.000000 1.000000 0.000000 76.000000 0.000000 0.000000 3.000000 0.000000 0.000000 0.000000 4.000000 4.000000 0.000000 0.000000 0.000000 0.000000 5.000000 0.000000 0.000000 0.000000 0.000000 0.000000 275 | 1.000000 9.000000 38.800000 150.000000 50.000000 1.000000 3.000000 6.000000 2.000000 5.000000 3.000000 2.000000 1.000000 1.000000 0.000000 0.000000 0.000000 50.000000 6.200000 0.000000 0.000000 0.000000 276 | 1.000000 1.000000 38.000000 36.000000 16.000000 3.000000 1.000000 1.000000 1.000000 4.000000 2.000000 2.000000 3.000000 3.000000 2.000000 3.000000 0.000000 37.000000 75.000000 2.000000 1.000000 0.000000 277 | 2.000000 1.000000 36.900000 50.000000 40.000000 2.000000 3.000000 3.000000 1.000000 1.000000 3.000000 2.000000 3.000000 1.000000 7.000000 0.000000 0.000000 37.500000 6.500000 0.000000 0.000000 1.000000 278 | 2.000000 1.000000 37.800000 40.000000 16.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 0.000000 0.000000 0.000000 1.000000 1.000000 37.000000 6.800000 0.000000 0.000000 1.000000 279 | 2.000000 1.000000 38.200000 56.000000 40.000000 4.000000 3.000000 1.000000 1.000000 2.000000 4.000000 3.000000 2.000000 2.000000 7.500000 0.000000 0.000000 47.000000 7.200000 1.000000 2.500000 1.000000 280 | 1.000000 1.000000 38.600000 48.000000 12.000000 0.000000 0.000000 1.000000 0.000000 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 36.000000 67.000000 0.000000 0.000000 1.000000 281 | 2.000000 1.000000 40.000000 78.000000 0.000000 3.000000 3.000000 5.000000 1.000000 2.000000 3.000000 1.000000 1.000000 1.000000 0.000000 4.000000 1.000000 66.000000 6.500000 0.000000 0.000000 0.000000 282 | 1.000000 1.000000 0.000000 70.000000 16.000000 3.000000 4.000000 5.000000 2.000000 2.000000 3.000000 2.000000 
2.000000 1.000000 0.000000 4.000000 5.000000 60.000000 7.500000 0.000000 0.000000 0.000000 283 | 1.000000 1.000000 38.200000 72.000000 18.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 35.000000 6.400000 0.000000 0.000000 1.000000 284 | 2.000000 1.000000 38.500000 54.000000 0.000000 1.000000 1.000000 1.000000 1.000000 3.000000 1.000000 1.000000 2.000000 1.000000 0.000000 1.000000 0.000000 40.000000 6.800000 2.000000 7.000000 1.000000 285 | 1.000000 1.000000 38.500000 66.000000 24.000000 1.000000 1.000000 1.000000 1.000000 3.000000 3.000000 1.000000 2.000000 1.000000 0.000000 4.000000 5.000000 40.000000 6.700000 1.000000 0.000000 1.000000 286 | 2.000000 1.000000 37.800000 82.000000 12.000000 3.000000 1.000000 1.000000 2.000000 4.000000 0.000000 3.000000 1.000000 3.000000 0.000000 0.000000 0.000000 50.000000 7.000000 0.000000 0.000000 0.000000 287 | 2.000000 9.000000 39.500000 84.000000 30.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 28.000000 5.000000 0.000000 0.000000 1.000000 288 | 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 289 | 1.000000 1.000000 38.000000 50.000000 36.000000 0.000000 1.000000 1.000000 1.000000 3.000000 2.000000 2.000000 0.000000 0.000000 0.000000 3.000000 0.000000 39.000000 6.600000 1.000000 5.300000 1.000000 290 | 2.000000 1.000000 38.600000 45.000000 16.000000 2.000000 1.000000 2.000000 1.000000 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 1.000000 43.000000 58.000000 0.000000 0.000000 1.000000 291 | 1.000000 1.000000 38.900000 80.000000 44.000000 3.000000 3.000000 3.000000 1.000000 2.000000 3.000000 3.000000 2.000000 2.000000 7.000000 3.000000 1.000000 54.000000 6.500000 3.000000 0.000000 0.000000 292 | 1.000000 1.000000 37.000000 66.000000 20.000000 1.000000 3.000000 2.000000 1.000000 4.000000 3.000000 3.000000 1.000000 0.000000 0.000000 1.000000 5.000000 35.000000 6.900000 2.000000 0.000000 0.000000 293 | 1.000000 1.000000 0.000000 78.000000 24.000000 3.000000 3.000000 3.000000 1.000000 0.000000 3.000000 0.000000 2.000000 1.000000 0.000000 0.000000 4.000000 43.000000 62.000000 0.000000 2.000000 0.000000 294 | 2.000000 1.000000 38.500000 40.000000 16.000000 1.000000 1.000000 1.000000 1.000000 2.000000 1.000000 1.000000 0.000000 0.000000 0.000000 3.000000 2.000000 37.000000 67.000000 0.000000 0.000000 1.000000 295 | 1.000000 1.000000 0.000000 120.000000 70.000000 4.000000 0.000000 4.000000 2.000000 2.000000 4.000000 0.000000 0.000000 0.000000 0.000000 0.000000 5.000000 55.000000 65.000000 0.000000 0.000000 0.000000 296 | 2.000000 1.000000 37.200000 72.000000 24.000000 3.000000 2.000000 4.000000 2.000000 4.000000 3.000000 3.000000 3.000000 1.000000 0.000000 4.000000 4.000000 44.000000 0.000000 3.000000 3.300000 0.000000 297 | 1.000000 1.000000 37.500000 72.000000 30.000000 4.000000 3.000000 4.000000 1.000000 4.000000 4.000000 3.000000 2.000000 1.000000 0.000000 3.000000 5.000000 60.000000 6.800000 0.000000 0.000000 0.000000 298 | 1.000000 1.000000 36.500000 100.000000 24.000000 3.000000 3.000000 3.000000 1.000000 3.000000 3.000000 3.000000 3.000000 1.000000 0.000000 4.000000 4.000000 50.000000 6.000000 3.000000 3.400000 1.000000 299 | 1.000000 1.000000 37.200000 40.000000 20.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 
0.000000 4.000000 1.000000 36.000000 62.000000 1.000000 1.000000 0.000000 -------------------------------------------------------------------------------- /5.逻辑回归/testSet.txt: -------------------------------------------------------------------------------- 1 | -0.017612 14.053064 0 2 | -1.395634 4.662541 1 3 | -0.752157 6.538620 0 4 | -1.322371 7.152853 0 5 | 0.423363 11.054677 0 6 | 0.406704 7.067335 1 7 | 0.667394 12.741452 0 8 | -2.460150 6.866805 1 9 | 0.569411 9.548755 0 10 | -0.026632 10.427743 0 11 | 0.850433 6.920334 1 12 | 1.347183 13.175500 0 13 | 1.176813 3.167020 1 14 | -1.781871 9.097953 0 15 | -0.566606 5.749003 1 16 | 0.931635 1.589505 1 17 | -0.024205 6.151823 1 18 | -0.036453 2.690988 1 19 | -0.196949 0.444165 1 20 | 1.014459 5.754399 1 21 | 1.985298 3.230619 1 22 | -1.693453 -0.557540 1 23 | -0.576525 11.778922 0 24 | -0.346811 -1.678730 1 25 | -2.124484 2.672471 1 26 | 1.217916 9.597015 0 27 | -0.733928 9.098687 0 28 | -3.642001 -1.618087 1 29 | 0.315985 3.523953 1 30 | 1.416614 9.619232 0 31 | -0.386323 3.989286 1 32 | 0.556921 8.294984 1 33 | 1.224863 11.587360 0 34 | -1.347803 -2.406051 1 35 | 1.196604 4.951851 1 36 | 0.275221 9.543647 0 37 | 0.470575 9.332488 0 38 | -1.889567 9.542662 0 39 | -1.527893 12.150579 0 40 | -1.185247 11.309318 0 41 | -0.445678 3.297303 1 42 | 1.042222 6.105155 1 43 | -0.618787 10.320986 0 44 | 1.152083 0.548467 1 45 | 0.828534 2.676045 1 46 | -1.237728 10.549033 0 47 | -0.683565 -2.166125 1 48 | 0.229456 5.921938 1 49 | -0.959885 11.555336 0 50 | 0.492911 10.993324 0 51 | 0.184992 8.721488 0 52 | -0.355715 10.325976 0 53 | -0.397822 8.058397 0 54 | 0.824839 13.730343 0 55 | 1.507278 5.027866 1 56 | 0.099671 6.835839 1 57 | -0.344008 10.717485 0 58 | 1.785928 7.718645 1 59 | -0.918801 11.560217 0 60 | -0.364009 4.747300 1 61 | -0.841722 4.119083 1 62 | 0.490426 1.960539 1 63 | -0.007194 9.075792 0 64 | 0.356107 12.447863 0 65 | 0.342578 12.281162 0 66 | -0.810823 -1.466018 1 67 | 2.530777 6.476801 1 68 | 1.296683 11.607559 0 69 | 0.475487 12.040035 0 70 | -0.783277 11.009725 0 71 | 0.074798 11.023650 0 72 | -1.337472 0.468339 1 73 | -0.102781 13.763651 0 74 | -0.147324 2.874846 1 75 | 0.518389 9.887035 0 76 | 1.015399 7.571882 0 77 | -1.658086 -0.027255 1 78 | 1.319944 2.171228 1 79 | 2.056216 5.019981 1 80 | -0.851633 4.375691 1 81 | -1.510047 6.061992 0 82 | -1.076637 -3.181888 1 83 | 1.821096 10.283990 0 84 | 3.010150 8.401766 1 85 | -1.099458 1.688274 1 86 | -0.834872 -1.733869 1 87 | -0.846637 3.849075 1 88 | 1.400102 12.628781 0 89 | 1.752842 5.468166 1 90 | 0.078557 0.059736 1 91 | 0.089392 -0.715300 1 92 | 1.825662 12.693808 0 93 | 0.197445 9.744638 0 94 | 0.126117 0.922311 1 95 | -0.679797 1.220530 1 96 | 0.677983 2.556666 1 97 | 0.761349 10.693862 0 98 | -2.168791 0.143632 1 99 | 1.388610 9.341997 0 100 | 0.317029 14.739025 0 101 | -------------------------------------------------------------------------------- /5.逻辑回归/新建 Microsoft Word 文档.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TUFE-I307/Seminar-MachineLearning/c637c55a9e411451709908f512cab44cc79665c3/5.逻辑回归/新建 Microsoft Word 文档.docx -------------------------------------------------------------------------------- /6.支持向量机/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /6.支持向量机/sasvm.py: -------------------------------------------------------------------------------- 1 
| 2 | """ 3 | Created on Mon Oct 21 18:09:28 2019 4 | 5 | @author: 韩琳琳 6 | """ 7 | 8 | from numpy import * 9 | from time import sleep 10 | 11 | def loadDataSet(fileName): 12 | dataMat = []; labelMat = [] 13 | fr = open(fileName) 14 | for line in fr.readlines(): 15 | lineArr = line.strip().split('\t') 16 | dataMat.append([float(lineArr[0]), float(lineArr[1])]) 17 | labelMat.append(float(lineArr[2])) 18 | return dataMat,labelMat 19 | 20 | def selectJrand(i,m): 21 | j=i #we want to select any J not equal to i 22 | while (j==i): 23 | j = int(random.uniform(0,m)) 24 | return j 25 | 26 | def clipAlpha(aj,H,L): 27 | if aj > H: 28 | aj = H 29 | if L > aj: 30 | aj = L 31 | return aj 32 | 33 | def smoSimple(dataMatIn, classLabels, C, toler, maxIter): 34 | dataMatrix = mat(dataMatIn); labelMat = mat(classLabels).transpose() 35 | b = 0; m,n = shape(dataMatrix) 36 | alphas = mat(zeros((m,1))) 37 | iter = 0 38 | while (iter < maxIter): 39 | alphaPairsChanged = 0 40 | for i in range(m): 41 | fXi = float(multiply(alphas,labelMat).T*(dataMatrix*dataMatrix[i,:].T)) + b 42 | Ei = fXi - float(labelMat[i])#if checks if an example violates KKT conditions 43 | if ((labelMat[i]*Ei < -toler) and (alphas[i] < C)) or ((labelMat[i]*Ei > toler) and (alphas[i] > 0)): 44 | j = selectJrand(i,m) 45 | fXj = float(multiply(alphas,labelMat).T*(dataMatrix*dataMatrix[j,:].T)) + b 46 | Ej = fXj - float(labelMat[j]) 47 | alphaIold = alphas[i].copy(); alphaJold = alphas[j].copy(); 48 | if (labelMat[i] != labelMat[j]): 49 | L = max(0, alphas[j] - alphas[i]) 50 | H = min(C, C + alphas[j] - alphas[i]) 51 | else: 52 | L = max(0, alphas[j] + alphas[i] - C) 53 | H = min(C, alphas[j] + alphas[i]) 54 | if L==H: print "L==H"; continue 55 | eta = 2.0 * dataMatrix[i,:]*dataMatrix[j,:].T - dataMatrix[i,:]*dataMatrix[i,:].T - dataMatrix[j,:]*dataMatrix[j,:].T 56 | if eta >= 0: print "eta>=0"; continue 57 | alphas[j] -= labelMat[j]*(Ei - Ej)/eta 58 | alphas[j] = clipAlpha(alphas[j],H,L) 59 | if (abs(alphas[j] - alphaJold) < 0.00001): print "j not moving enough"; continue 60 | alphas[i] += labelMat[j]*labelMat[i]*(alphaJold - alphas[j])#update i by the same amount as j 61 | #the update is in the oppostie direction 62 | b1 = b - Ei- labelMat[i]*(alphas[i]-alphaIold)*dataMatrix[i,:]*dataMatrix[i,:].T - labelMat[j]*(alphas[j]-alphaJold)*dataMatrix[i,:]*dataMatrix[j,:].T 63 | b2 = b - Ej- labelMat[i]*(alphas[i]-alphaIold)*dataMatrix[i,:]*dataMatrix[j,:].T - labelMat[j]*(alphas[j]-alphaJold)*dataMatrix[j,:]*dataMatrix[j,:].T 64 | if (0 < alphas[i]) and (C > alphas[i]): b = b1 65 | elif (0 < alphas[j]) and (C > alphas[j]): b = b2 66 | else: b = (b1 + b2)/2.0 67 | alphaPairsChanged += 1 68 | print "iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged) 69 | if (alphaPairsChanged == 0): iter += 1 70 | else: iter = 0 71 | print "iteration number: %d" % iter 72 | return b,alphas 73 | 74 | def kernelTrans(X, A, kTup): #calc the kernel or transform data to a higher dimensional space 75 | m,n = shape(X) 76 | K = mat(zeros((m,1))) 77 | if kTup[0]=='lin': K = X * A.T #linear kernel 78 | elif kTup[0]=='rbf': 79 | for j in range(m): 80 | deltaRow = X[j,:] - A 81 | K[j] = deltaRow*deltaRow.T 82 | K = exp(K/(-1*kTup[1]**2)) #divide in NumPy is element-wise not matrix like Matlab 83 | else: raise NameError('Houston We Have a Problem -- \ 84 | That Kernel is not recognized') 85 | return K 86 | 87 | class optStruct: 88 | def __init__(self,dataMatIn, classLabels, C, toler, kTup): # Initialize the structure with the parameters 89 | self.X = dataMatIn 90 | 
def kernelTrans(X, A, kTup):  # calc the kernel or transform data to a higher dimensional space
    m, n = shape(X)
    K = mat(zeros((m, 1)))
    if kTup[0] == 'lin': K = X * A.T                # linear kernel: plain inner products
    elif kTup[0] == 'rbf':
        for j in range(m):
            deltaRow = X[j, :] - A
            K[j] = deltaRow * deltaRow.T
        K = exp(K / (-1 * kTup[1] ** 2))            # divide in NumPy is element-wise, not matrix-wise as in Matlab
    else: raise NameError('Houston We Have a Problem -- That Kernel is not recognized')
    return K
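
# A small sanity check for kernelTrans; the matrix values below are made up.
# For kTup=('rbf', sigma) each entry is exp(-||x_j - A||^2 / sigma^2) -- note
# this code divides by sigma^2, not the more common 2*sigma^2.
def demoKernelTrans():
    X = mat([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    linK = kernelTrans(X, X[0, :], ('lin', 0))    # column of inner products X * x_0^T
    rbfK = kernelTrans(X, X[0, :], ('rbf', 1.3))  # rbfK[0] == 1 since ||x_0 - x_0|| = 0
    print(linK.T)
    print(rbfK.T)
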
changed %d" % (iter,i,alphaPairsChanged) 165 | iter += 1 166 | else:#go over non-bound (railed) alphas 167 | nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0] 168 | for i in nonBoundIs: 169 | alphaPairsChanged += innerL(i,oS) 170 | print "non-bound, iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged) 171 | iter += 1 172 | if entireSet: entireSet = False #toggle entire set loop 173 | elif (alphaPairsChanged == 0): entireSet = True 174 | print "iteration number: %d" % iter 175 | return oS.b,oS.alphas 176 | 177 | def calcWs(alphas,dataArr,classLabels): 178 | X = mat(dataArr); labelMat = mat(classLabels).transpose() 179 | m,n = shape(X) 180 | w = zeros((n,1)) 181 | for i in range(m): 182 | w += multiply(alphas[i]*labelMat[i],X[i,:].T) 183 | return w 184 | 185 | def testRbf(k1=1.3): 186 | dataArr,labelArr = loadDataSet('testSetRBF.txt') 187 | b,alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, ('rbf', k1)) #C=200 important 188 | datMat=mat(dataArr); labelMat = mat(labelArr).transpose() 189 | svInd=nonzero(alphas.A>0)[0] 190 | sVs=datMat[svInd] #get matrix of only support vectors 191 | labelSV = labelMat[svInd]; 192 | print "there are %d Support Vectors" % shape(sVs)[0] 193 | m,n = shape(datMat) 194 | errorCount = 0 195 | for i in range(m): 196 | kernelEval = kernelTrans(sVs,datMat[i,:],('rbf', k1)) 197 | predict=kernelEval.T * multiply(labelSV,alphas[svInd]) + b 198 | if sign(predict)!=sign(labelArr[i]): errorCount += 1 199 | print "the training error rate is: %f" % (float(errorCount)/m) 200 | dataArr,labelArr = loadDataSet('testSetRBF2.txt') 201 | errorCount = 0 202 | datMat=mat(dataArr); labelMat = mat(labelArr).transpose() 203 | m,n = shape(datMat) 204 | for i in range(m): 205 | kernelEval = kernelTrans(sVs,datMat[i,:],('rbf', k1)) 206 | predict=kernelEval.T * multiply(labelSV,alphas[svInd]) + b 207 | if sign(predict)!=sign(labelArr[i]): errorCount += 1 208 | print "the test error rate is: %f" % (float(errorCount)/m) 209 | 210 | def img2vector(filename): 211 | returnVect = zeros((1,1024)) 212 | fr = open(filename) 213 | for i in range(32): 214 | lineStr = fr.readline() 215 | for j in range(32): 216 | returnVect[0,32*i+j] = int(lineStr[j]) 217 | return returnVect 218 | 219 | def loadImages(dirName): 220 | from os import listdir 221 | hwLabels = [] 222 | trainingFileList = listdir(dirName) #load the training set 223 | m = len(trainingFileList) 224 | trainingMat = zeros((m,1024)) 225 | for i in range(m): 226 | fileNameStr = trainingFileList[i] 227 | fileStr = fileNameStr.split('.')[0] #take off .txt 228 | classNumStr = int(fileStr.split('_')[0]) 229 | if classNumStr == 9: hwLabels.append(-1) 230 | else: hwLabels.append(1) 231 | trainingMat[i,:] = img2vector('%s/%s' % (dirName, fileNameStr)) 232 | return trainingMat, hwLabels 233 | 234 | def testDigits(kTup=('rbf', 10)): 235 | dataArr,labelArr = loadImages('trainingDigits') 236 | b,alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, kTup) 237 | datMat=mat(dataArr); labelMat = mat(labelArr).transpose() 238 | svInd=nonzero(alphas.A>0)[0] 239 | sVs=datMat[svInd] 240 | labelSV = labelMat[svInd]; 241 | print "there are %d Support Vectors" % shape(sVs)[0] 242 | m,n = shape(datMat) 243 | errorCount = 0 244 | for i in range(m): 245 | kernelEval = kernelTrans(sVs,datMat[i,:],kTup) 246 | predict=kernelEval.T * multiply(labelSV,alphas[svInd]) + b 247 | if sign(predict)!=sign(labelArr[i]): errorCount += 1 248 | print "the training error rate is: %f" % (float(errorCount)/m) 249 | dataArr,labelArr = loadImages('testDigits') 250 | 
errorCount = 0 251 | datMat=mat(dataArr); labelMat = mat(labelArr).transpose() 252 | m,n = shape(datMat) 253 | for i in range(m): 254 | kernelEval = kernelTrans(sVs,datMat[i,:],kTup) 255 | predict=kernelEval.T * multiply(labelSV,alphas[svInd]) + b 256 | if sign(predict)!=sign(labelArr[i]): errorCount += 1 257 | print "the test error rate is: %f" % (float(errorCount)/m) 258 | 259 | 260 | '''#######******************************** 261 | Non-Kernel VErsions below 262 | '''#######******************************** 263 | 264 | class optStructK: 265 | def __init__(self,dataMatIn, classLabels, C, toler): # Initialize the structure with the parameters 266 | self.X = dataMatIn 267 | self.labelMat = classLabels 268 | self.C = C 269 | self.tol = toler 270 | self.m = shape(dataMatIn)[0] 271 | self.alphas = mat(zeros((self.m,1))) 272 | self.b = 0 273 | self.eCache = mat(zeros((self.m,2))) #first column is valid flag 274 | 275 | def calcEkK(oS, k): 276 | fXk = float(multiply(oS.alphas,oS.labelMat).T*(oS.X*oS.X[k,:].T)) + oS.b 277 | Ek = fXk - float(oS.labelMat[k]) 278 | return Ek 279 | 280 | def selectJK(i, oS, Ei): #this is the second choice -heurstic, and calcs Ej 281 | maxK = -1; maxDeltaE = 0; Ej = 0 282 | oS.eCache[i] = [1,Ei] #set valid #choose the alpha that gives the maximum delta E 283 | validEcacheList = nonzero(oS.eCache[:,0].A)[0] 284 | if (len(validEcacheList)) > 1: 285 | for k in validEcacheList: #loop through valid Ecache values and find the one that maximizes delta E 286 | if k == i: continue #don't calc for i, waste of time 287 | Ek = calcEk(oS, k) 288 | deltaE = abs(Ei - Ek) 289 | if (deltaE > maxDeltaE): 290 | maxK = k; maxDeltaE = deltaE; Ej = Ek 291 | return maxK, Ej 292 | else: #in this case (first time around) we don't have any valid eCache values 293 | j = selectJrand(i, oS.m) 294 | Ej = calcEk(oS, j) 295 | return j, Ej 296 | 297 | def updateEkK(oS, k):#after any alpha has changed update the new value in the cache 298 | Ek = calcEk(oS, k) 299 | oS.eCache[k] = [1,Ek] 300 | 301 | def innerLK(i, oS): 302 | Ei = calcEk(oS, i) 303 | if ((oS.labelMat[i]*Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or ((oS.labelMat[i]*Ei > oS.tol) and (oS.alphas[i] > 0)): 304 | j,Ej = selectJ(i, oS, Ei) #this has been changed from selectJrand 305 | alphaIold = oS.alphas[i].copy(); alphaJold = oS.alphas[j].copy(); 306 | if (oS.labelMat[i] != oS.labelMat[j]): 307 | L = max(0, oS.alphas[j] - oS.alphas[i]) 308 | H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i]) 309 | else: 310 | L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C) 311 | H = min(oS.C, oS.alphas[j] + oS.alphas[i]) 312 | if L==H: print "L==H"; return 0 313 | eta = 2.0 * oS.X[i,:]*oS.X[j,:].T - oS.X[i,:]*oS.X[i,:].T - oS.X[j,:]*oS.X[j,:].T 314 | if eta >= 0: print "eta>=0"; return 0 315 | oS.alphas[j] -= oS.labelMat[j]*(Ei - Ej)/eta 316 | oS.alphas[j] = clipAlpha(oS.alphas[j],H,L) 317 | updateEk(oS, j) #added this for the Ecache 318 | if (abs(oS.alphas[j] - alphaJold) < 0.00001): print "j not moving enough"; return 0 319 | oS.alphas[i] += oS.labelMat[j]*oS.labelMat[i]*(alphaJold - oS.alphas[j])#update i by the same amount as j 320 | updateEk(oS, i) #added this for the Ecache #the update is in the oppostie direction 321 | b1 = oS.b - Ei- oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.X[i,:]*oS.X[i,:].T - oS.labelMat[j]*(oS.alphas[j]-alphaJold)*oS.X[i,:]*oS.X[j,:].T 322 | b2 = oS.b - Ej- oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.X[i,:]*oS.X[j,:].T - oS.labelMat[j]*(oS.alphas[j]-alphaJold)*oS.X[j,:]*oS.X[j,:].T 323 | if (0 < oS.alphas[i]) and (oS.C > 
oS.alphas[i]): oS.b = b1 324 | elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]): oS.b = b2 325 | else: oS.b = (b1 + b2)/2.0 326 | return 1 327 | else: return 0 328 | 329 | def smoPK(dataMatIn, classLabels, C, toler, maxIter): #full Platt SMO 330 | oS = optStruct(mat(dataMatIn),mat(classLabels).transpose(),C,toler) 331 | iter = 0 332 | entireSet = True; alphaPairsChanged = 0 333 | while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)): 334 | alphaPairsChanged = 0 335 | if entireSet: #go over all 336 | for i in range(oS.m): 337 | alphaPairsChanged += innerL(i,oS) 338 | print "fullSet, iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged) 339 | iter += 1 340 | else:#go over non-bound (railed) alphas 341 | nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0] 342 | for i in nonBoundIs: 343 | alphaPairsChanged += innerL(i,oS) 344 | print "non-bound, iter: %d i:%d, pairs changed %d" % (iter,i,alphaPairsChanged) 345 | iter += 1 346 | if entireSet: entireSet = False #toggle entire set loop 347 | elif (alphaPairsChanged == 0): entireSet = True 348 | print "iteration number: %d" % iter 349 | return oS.b,oS.alphas 350 | -------------------------------------------------------------------------------- /6.支持向量机/svm.py: -------------------------------------------------------------------------------- 1 | 2 | import matplotlib.pyplot as plt 3 | import numpy as np 4 | import random 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | def loadDataSet(fileName): 16 | dataMat = []; labelMat = [] 17 | fr = open(fileName) 18 | for line in fr.readlines(): #逐行读取,滤除空格等 19 | lineArr = line.strip().split('\t') 20 | dataMat.append([float(lineArr[0]), float(lineArr[1])]) #添加数据 21 | labelMat.append(float(lineArr[2])) #添加标签 22 | return dataMat,labelMat 23 | 24 | 25 | """ 26 | 函数说明:随机选择alpha 27 | Parameters: 28 | i - alpha_i的索引值 29 | m - alpha参数个数 30 | Returns: 31 | j - alpha_j的索引值 32 | 33 | """ 34 | def selectJrand(i, m): 35 | j = i #选择一个不等于i的j 36 | while (j == i): 37 | j = int(random.uniform(0, m)) 38 | return j 39 | 40 | """ 41 | 函数说明:修剪alpha 42 | Parameters: 43 | aj - alpha_j值 44 | H - alpha上限 45 | L - alpha下限 46 | Returns: 47 | aj - alpah值 48 | 49 | """ 50 | def clipAlpha(aj,H,L): 51 | if aj > H: 52 | aj = H 53 | if L > aj: 54 | aj = L 55 | return aj 56 | 57 | """ 58 | 函数说明:数据可视化 59 | Parameters: 60 | dataMat - 数据矩阵 61 | labelMat - 数据标签 62 | """ 63 | def showDataSet(dataMat, labelMat): 64 | data_plus = [] #正样本 65 | data_minus = [] #负样本 66 | for i in range(len(dataMat)): 67 | if labelMat[i] > 0: 68 | data_plus.append(dataMat[i]) 69 | else: 70 | data_minus.append(dataMat[i]) 71 | data_plus_np = np.array(data_plus) #转换为numpy矩阵 72 | data_minus_np = np.array(data_minus) #转换为numpy矩阵 73 | plt.scatter(np.transpose(data_plus_np)[0], np.transpose(data_plus_np)[1]) #正样本散点图 74 | plt.scatter(np.transpose(data_minus_np)[0], np.transpose(data_minus_np)[1]) #负样本散点图 75 | plt.show() 76 | 77 | 78 | """ 79 | 函数说明:简化版SMO算法 80 | Parameters: 81 | dataMatIn - 数据矩阵 82 | classLabels - 数据标签 83 | C - 松弛变量 84 | toler - 容错率 85 | maxIter - 最大迭代次数 86 | 87 | """ 88 | def smoSimple(dataMatIn, classLabels, C, toler, maxIter): 89 | #转换为numpy的mat存储 90 | dataMatrix = np.mat(dataMatIn); labelMat = np.mat(classLabels).transpose() 91 | #初始化b参数,统计dataMatrix的维度 92 | b = 0; m,n = np.shape(dataMatrix) 93 | #初始化alpha参数,设为0 94 | alphas = np.mat(np.zeros((m,1))) 95 | #初始化迭代次数 96 | iter_num = 0 97 | #最多迭代matIter次 98 | while (iter_num < maxIter): 99 | alphaPairsChanged = 0 100 | for i in range(m): 101 | #步骤1:计算误差Ei 102 | fXi = 
float(np.multiply(alphas,labelMat).T*(dataMatrix*dataMatrix[i,:].T)) + b 103 | Ei = fXi - float(labelMat[i]) 104 | #优化alpha,设定一定的容错率。 105 | if ((labelMat[i]*Ei < -toler) and (alphas[i] < C)) or ((labelMat[i]*Ei > toler) and (alphas[i] > 0)): 106 | #随机选择另一个与alpha_i成对优化的alpha_j 107 | j = selectJrand(i,m) 108 | #步骤1:计算误差Ej 109 | fXj = float(np.multiply(alphas,labelMat).T*(dataMatrix*dataMatrix[j,:].T)) + b 110 | Ej = fXj - float(labelMat[j]) 111 | #保存更新前的aplpha值,使用深拷贝 112 | alphaIold = alphas[i].copy(); alphaJold = alphas[j].copy(); 113 | #步骤2:计算上下界L和H 114 | if (labelMat[i] != labelMat[j]): 115 | L = max(0, alphas[j] - alphas[i]) 116 | H = min(C, C + alphas[j] - alphas[i]) 117 | else: 118 | L = max(0, alphas[j] + alphas[i] - C) 119 | H = min(C, alphas[j] + alphas[i]) 120 | if L==H: print("L==H"); continue 121 | #步骤3:计算eta 122 | eta = 2.0 * dataMatrix[i,:]*dataMatrix[j,:].T - dataMatrix[i,:]*dataMatrix[i,:].T - dataMatrix[j,:]*dataMatrix[j,:].T 123 | if eta >= 0: print("eta>=0"); continue 124 | #步骤4:更新alpha_j 125 | alphas[j] -= labelMat[j]*(Ei - Ej)/eta 126 | #步骤5:修剪alpha_j 127 | alphas[j] = clipAlpha(alphas[j],H,L) 128 | if (abs(alphas[j] - alphaJold) < 0.00001): print("alpha_j变化太小"); continue 129 | #步骤6:更新alpha_i 130 | alphas[i] += labelMat[j]*labelMat[i]*(alphaJold - alphas[j]) 131 | #步骤7:更新b_1和b_2 132 | b1 = b - Ei- labelMat[i]*(alphas[i]-alphaIold)*dataMatrix[i,:]*dataMatrix[i,:].T - labelMat[j]*(alphas[j]-alphaJold)*dataMatrix[i,:]*dataMatrix[j,:].T 133 | b2 = b - Ej- labelMat[i]*(alphas[i]-alphaIold)*dataMatrix[i,:]*dataMatrix[j,:].T - labelMat[j]*(alphas[j]-alphaJold)*dataMatrix[j,:]*dataMatrix[j,:].T 134 | #步骤8:根据b_1和b_2更新b 135 | if (0 < alphas[i]) and (C > alphas[i]): b = b1 136 | elif (0 < alphas[j]) and (C > alphas[j]): b = b2 137 | else: b = (b1 + b2)/2.0 138 | #统计优化次数 139 | alphaPairsChanged += 1 140 | #打印统计信息 141 | print("第%d次迭代 样本:%d, alpha优化次数:%d" % (iter_num,i,alphaPairsChanged)) 142 | #更新迭代次数 143 | if (alphaPairsChanged == 0): iter_num += 1 144 | else: iter_num = 0 145 | print("迭代次数: %d" % iter_num) 146 | return b,alphas 147 | 148 | """ 149 | 函数说明:分类结果可视化 150 | Parameters: 151 | dataMat - 数据矩阵 152 | w - 直线法向量 153 | b - 直线解决 154 | 155 | """ 156 | def showClassifer(dataMat, w, b): 157 | #绘制样本点 158 | data_plus = [] #正样本 159 | data_minus = [] #负样本 160 | for i in range(len(dataMat)): 161 | if labelMat[i] > 0: 162 | data_plus.append(dataMat[i]) 163 | else: 164 | data_minus.append(dataMat[i]) 165 | data_plus_np = np.array(data_plus) #转换为numpy矩阵 166 | data_minus_np = np.array(data_minus) #转换为numpy矩阵 167 | plt.scatter(np.transpose(data_plus_np)[0], np.transpose(data_plus_np)[1], s=30, alpha=0.7) #正样本散点图 168 | plt.scatter(np.transpose(data_minus_np)[0], np.transpose(data_minus_np)[1], s=30, alpha=0.7) #负样本散点图 169 | #绘制直线 170 | x1 = max(dataMat)[0] 171 | x2 = min(dataMat)[0] 172 | a1, a2 = w 173 | b = float(b) 174 | a1 = float(a1[0]) 175 | a2 = float(a2[0]) 176 | y1, y2 = (-b- a1*x1)/a2, (-b - a1*x2)/a2 177 | plt.plot([x1, x2], [y1, y2]) 178 | #找出支持向量点 179 | for i, alpha in enumerate(alphas): 180 | if abs(alpha) > 0: 181 | x, y = dataMat[i] 182 | plt.scatter([x], [y], s=150, c='none', alpha=0.7, linewidth=1.5, edgecolor='red') 183 | plt.show() 184 | 185 | 186 | """ 187 | 函数说明:计算w 188 | Parameters: 189 | dataMat - 数据矩阵 190 | labelMat - 数据标签 191 | alphas - alphas值 192 | 193 | """ 194 | def get_w(dataMat, labelMat, alphas): 195 | alphas, dataMat, labelMat = np.array(alphas), np.array(dataMat), np.array(labelMat) 196 | w = np.dot((np.tile(labelMat.reshape(1, -1).T, (1, 2)) * dataMat).T, 
alphas) 197 | return w.tolist() 198 | 199 | 200 | if __name__ == '__main__': 201 | dataMat, labelMat = loadDataSet('testSet.txt') 202 | b,alphas = smoSimple(dataMat, labelMat, 0.6, 0.001, 40) 203 | w = get_w(dataMat, labelMat, alphas) 204 | showClassifer(dataMat, w, b) 205 | 206 | 207 | class optStruct: 208 | """ 209 | 数据结构,维护所有需要操作的值 210 | Parameters: 211 | dataMatIn - 数据矩阵 212 | classLabels - 数据标签 213 | C - 松弛变量 214 | toler - 容错率 215 | """ 216 | def __init__(self, dataMatIn, classLabels, C, toler): 217 | self.X = dataMatIn #数据矩阵 218 | self.labelMat = classLabels #数据标签 219 | self.C = C #松弛变量 220 | self.tol = toler #容错率 221 | self.m = np.shape(dataMatIn)[0] #数据矩阵行数 222 | self.alphas = np.mat(np.zeros((self.m,1))) #根据矩阵行数初始化alpha参数为0 223 | self.b = 0 #初始化b参数为0 224 | self.eCache = np.mat(np.zeros((self.m,2))) #根据矩阵行数初始化虎误差缓存,第一列为是否有效的标志位,第二列为实际的误差E的值。 225 | 226 | def loadDataSet(fileName): 227 | """ 228 | 读取数据 229 | Parameters: 230 | fileName - 文件名 231 | Returns: 232 | dataMat - 数据矩阵 233 | labelMat - 数据标签 234 | """ 235 | dataMat = []; labelMat = [] 236 | fr = open(fileName) 237 | for line in fr.readlines(): #逐行读取,滤除空格等 238 | lineArr = line.strip().split('\t') 239 | dataMat.append([float(lineArr[0]), float(lineArr[1])]) #添加数据 240 | labelMat.append(float(lineArr[2])) #添加标签 241 | return dataMat,labelMat 242 | 243 | def calcEk(oS, k): 244 | """ 245 | 计算误差 246 | Parameters: 247 | oS - 数据结构 248 | k - 标号为k的数据 249 | Returns: 250 | Ek - 标号为k的数据误差 251 | """ 252 | fXk = float(np.multiply(oS.alphas,oS.labelMat).T*(oS.X*oS.X[k,:].T) + oS.b) 253 | Ek = fXk - float(oS.labelMat[k]) 254 | return Ek 255 | 256 | def selectJrand(i, m): 257 | """ 258 | 函数说明:随机选择alpha_j的索引值 259 | Parameters: 260 | i - alpha_i的索引值 261 | m - alpha参数个数 262 | Returns: 263 | j - alpha_j的索引值 264 | """ 265 | j = i #选择一个不等于i的j 266 | while (j == i): 267 | j = int(random.uniform(0, m)) 268 | return j 269 | 270 | def selectJ(i, oS, Ei): 271 | """ 272 | 内循环启发方式2 273 | Parameters: 274 | i - 标号为i的数据的索引值 275 | oS - 数据结构 276 | Ei - 标号为i的数据误差 277 | Returns: 278 | j, maxK - 标号为j或maxK的数据的索引值 279 | Ej - 标号为j的数据误差 280 | """ 281 | maxK = -1; maxDeltaE = 0; Ej = 0 #初始化 282 | oS.eCache[i] = [1,Ei] #根据Ei更新误差缓存 283 | validEcacheList = np.nonzero(oS.eCache[:,0].A)[0] #返回误差不为0的数据的索引值 284 | if (len(validEcacheList)) > 1: #有不为0的误差 285 | for k in validEcacheList: #遍历,找到最大的Ek 286 | if k == i: continue #不计算i,浪费时间 287 | Ek = calcEk(oS, k) #计算Ek 288 | deltaE = abs(Ei - Ek) #计算|Ei-Ek| 289 | if (deltaE > maxDeltaE): #找到maxDeltaE 290 | maxK = k; maxDeltaE = deltaE; Ej = Ek 291 | return maxK, Ej #返回maxK,Ej 292 | else: #没有不为0的误差 293 | j = selectJrand(i, oS.m) #随机选择alpha_j的索引值 294 | Ej = calcEk(oS, j) #计算Ej 295 | return j, Ej #j,Ej 296 | 297 | def updateEk(oS, k): 298 | """ 299 | 计算Ek,并更新误差缓存 300 | Parameters: 301 | oS - 数据结构 302 | k - 标号为k的数据的索引值 303 | Returns: 304 | 无 305 | """ 306 | Ek = calcEk(oS, k) #计算Ek 307 | oS.eCache[k] = [1,Ek] #更新误差缓存 308 | 309 | 310 | def clipAlpha(aj,H,L): 311 | """ 312 | 修剪alpha_j 313 | Parameters: 314 | aj - alpha_j的值 315 | H - alpha上限 316 | L - alpha下限 317 | Returns: 318 | aj - 修剪后的alpah_j的值 319 | """ 320 | if aj > H: 321 | aj = H 322 | if L > aj: 323 | aj = L 324 | return aj 325 | 326 | def innerL(i, oS): 327 | """ 328 | 优化的SMO算法 329 | Parameters: 330 | i - 标号为i的数据的索引值 331 | oS - 数据结构 332 | Returns: 333 | 1 - 有任意一对alpha值发生变化 334 | 0 - 没有任意一对alpha值发生变化或变化太小 335 | """ 336 | #步骤1:计算误差Ei 337 | Ei = calcEk(oS, i) 338 | #优化alpha,设定一定的容错率。 339 | if ((oS.labelMat[i] * Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or ((oS.labelMat[i] * Ei > oS.tol) 
and (oS.alphas[i] > 0)): 340 | #使用内循环启发方式2选择alpha_j,并计算Ej 341 | j,Ej = selectJ(i, oS, Ei) 342 | #保存更新前的aplpha值,使用深拷贝 343 | alphaIold = oS.alphas[i].copy(); alphaJold = oS.alphas[j].copy(); 344 | #步骤2:计算上下界L和H 345 | if (oS.labelMat[i] != oS.labelMat[j]): 346 | L = max(0, oS.alphas[j] - oS.alphas[i]) 347 | H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i]) 348 | else: 349 | L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C) 350 | H = min(oS.C, oS.alphas[j] + oS.alphas[i]) 351 | if L == H: 352 | print("L==H") 353 | return 0 354 | #步骤3:计算eta 355 | eta = 2.0 * oS.X[i,:] * oS.X[j,:].T - oS.X[i,:] * oS.X[i,:].T - oS.X[j,:] * oS.X[j,:].T 356 | if eta >= 0: 357 | print("eta>=0") 358 | return 0 359 | #步骤4:更新alpha_j 360 | oS.alphas[j] -= oS.labelMat[j] * (Ei - Ej)/eta 361 | #步骤5:修剪alpha_j 362 | oS.alphas[j] = clipAlpha(oS.alphas[j],H,L) 363 | #更新Ej至误差缓存 364 | updateEk(oS, j) 365 | if (abs(oS.alphas[j] - alphaJold) < 0.00001): 366 | print("alpha_j变化太小") 367 | return 0 368 | #步骤6:更新alpha_i 369 | oS.alphas[i] += oS.labelMat[j]*oS.labelMat[i]*(alphaJold - oS.alphas[j]) 370 | #更新Ei至误差缓存 371 | updateEk(oS, i) 372 | #步骤7:更新b_1和b_2 373 | b1 = oS.b - Ei- oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.X[i,:]*oS.X[i,:].T - oS.labelMat[j]*(oS.alphas[j]-alphaJold)*oS.X[i,:]*oS.X[j,:].T 374 | b2 = oS.b - Ej- oS.labelMat[i]*(oS.alphas[i]-alphaIold)*oS.X[i,:]*oS.X[j,:].T - oS.labelMat[j]*(oS.alphas[j]-alphaJold)*oS.X[j,:]*oS.X[j,:].T 375 | #步骤8:根据b_1和b_2更新b 376 | if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]): oS.b = b1 377 | elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]): oS.b = b2 378 | else: oS.b = (b1 + b2)/2.0 379 | return 1 380 | else: 381 | return 0 382 | 383 | def smoP(dataMatIn, classLabels, C, toler, maxIter): 384 | """ 385 | 完整的线性SMO算法 386 | Parameters: 387 | dataMatIn - 数据矩阵 388 | classLabels - 数据标签 389 | C - 松弛变量 390 | toler - 容错率 391 | maxIter - 最大迭代次数 392 | Returns: 393 | oS.b - SMO算法计算的b 394 | oS.alphas - SMO算法计算的alphas 395 | """ 396 | oS = optStruct(np.mat(dataMatIn), np.mat(classLabels).transpose(), C, toler) #初始化数据结构 397 | iter = 0 #初始化当前迭代次数 398 | entireSet = True; alphaPairsChanged = 0 399 | while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)): #遍历整个数据集都alpha也没有更新或者超过最大迭代次数,则退出循环 400 | alphaPairsChanged = 0 401 | if entireSet: #遍历整个数据集 402 | for i in range(oS.m): 403 | alphaPairsChanged += innerL(i,oS) #使用优化的SMO算法 404 | print("全样本遍历:第%d次迭代 样本:%d, alpha优化次数:%d" % (iter,i,alphaPairsChanged)) 405 | iter += 1 406 | else: #遍历非边界值 407 | nonBoundIs = np.nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0] #遍历不在边界0和C的alpha 408 | for i in nonBoundIs: 409 | alphaPairsChanged += innerL(i,oS) 410 | print("非边界遍历:第%d次迭代 样本:%d, alpha优化次数:%d" % (iter,i,alphaPairsChanged)) 411 | iter += 1 412 | if entireSet: #遍历一次后改为非边界遍历 413 | entireSet = False 414 | elif (alphaPairsChanged == 0): #如果alpha没有更新,计算全样本遍历 415 | entireSet = True 416 | print("迭代次数: %d" % iter) 417 | return oS.b,oS.alphas #返回SMO算法计算的b和alphas 418 | 419 | 420 | def showClassifer(dataMat, classLabels, w, b): 421 | """ 422 | 分类结果可视化 423 | Parameters: 424 | dataMat - 数据矩阵 425 | w - 直线法向量 426 | b - 直线解决 427 | Returns: 428 | 无 429 | """ 430 | #绘制样本点 431 | data_plus = [] #正样本 432 | data_minus = [] #负样本 433 | for i in range(len(dataMat)): 434 | if classLabels[i] > 0: 435 | data_plus.append(dataMat[i]) 436 | else: 437 | data_minus.append(dataMat[i]) 438 | data_plus_np = np.array(data_plus) #转换为numpy矩阵 439 | data_minus_np = np.array(data_minus) #转换为numpy矩阵 440 | plt.scatter(np.transpose(data_plus_np)[0], np.transpose(data_plus_np)[1], s=30, alpha=0.7) 
#正样本散点图 441 | plt.scatter(np.transpose(data_minus_np)[0], np.transpose(data_minus_np)[1], s=30, alpha=0.7) #负样本散点图 442 | #绘制直线 443 | x1 = max(dataMat)[0] 444 | x2 = min(dataMat)[0] 445 | a1, a2 = w 446 | b = float(b) 447 | a1 = float(a1[0]) 448 | a2 = float(a2[0]) 449 | y1, y2 = (-b- a1*x1)/a2, (-b - a1*x2)/a2 450 | plt.plot([x1, x2], [y1, y2]) 451 | #找出支持向量点 452 | for i, alpha in enumerate(alphas): 453 | if alpha > 0: 454 | x, y = dataMat[i] 455 | plt.scatter([x], [y], s=150, c='none', alpha=0.7, linewidth=1.5, edgecolor='red') 456 | plt.show() 457 | 458 | 459 | def calcWs(alphas,dataArr,classLabels): 460 | """ 461 | 计算w 462 | Parameters: 463 | dataArr - 数据矩阵 464 | classLabels - 数据标签 465 | alphas - alphas值 466 | Returns: 467 | w - 计算得到的w 468 | """ 469 | X = np.mat(dataArr); labelMat = np.mat(classLabels).transpose() 470 | m,n = np.shape(X) 471 | w = np.zeros((n,1)) 472 | for i in range(m): 473 | w += np.multiply(alphas[i]*labelMat[i],X[i,:].T) 474 | return w 475 | 476 | if __name__ == '__main__': 477 | dataArr, classLabels = loadDataSet('testSet.txt') 478 | b, alphas = smoP(dataArr, classLabels, 0.6, 0.001, 40) 479 | w = calcWs(alphas,dataArr, classLabels) 480 | showClassifer(dataArr, classLabels, w, b) 481 | -------------------------------------------------------------------------------- /7.Adaboost/AdaBoost.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TUFE-I307/Seminar-MachineLearning/c637c55a9e411451709908f512cab44cc79665c3/7.Adaboost/AdaBoost.pptx -------------------------------------------------------------------------------- /7.Adaboost/Adaboost.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Sun Nov 10 20:27:59 2019 4 | 5 | @author: user 6 | """ 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | 10 | def loadSimpData(): 11 | datMat = np.matrix([[ 1. , 2.1], 12 | [ 1.5 , 1.6], 13 | [ 1.3, 1. ], 14 | [ 1. , 1. ], 15 | [ 2. , 1. 
]]) 16 | classLabels = [1.0, 1.0, -1.0, -1.0, 1.0] 17 | return datMat,classLabels 18 | 19 | #数据可视化 20 | def showDataSet(dataMat, labelMat): 21 | data_plus = [] #正样本 22 | data_minus = [] #负样本 23 | for i in range(len(dataMat)): 24 | if labelMat[i] > 0: 25 | data_plus.append(dataMat[i]) 26 | else: 27 | data_minus.append(dataMat[i]) 28 | data_plus_np = np.array(data_plus) #转换为numpy矩阵 29 | data_minus_np = np.array(data_minus) #转换为numpy矩阵 30 | plt.scatter(np.transpose(data_plus_np)[0], np.transpose(data_plus_np)[1]) #正样本散点图 31 | plt.scatter(np.transpose(data_minus_np)[0], np.transpose(data_minus_np)[1]) #负样本散点图 32 | plt.show() 33 | 34 | if __name__ == '__main__': 35 | dataArr,classLabels = loadSimpData() 36 | showDataSet(dataArr,classLabels) 37 | 38 | 39 | def stumpClassify(dataMatrix, dimen, threshVal, threshIneq): 40 | retArray = np.ones((np.shape(dataMatrix)[0], 1)) # 初始化retArray为1 41 | if threshIneq == 'lt': 42 | retArray[dataMatrix[:, dimen] <= threshVal] = -1.0 # 如果小于阈值,则赋值为-1 43 | else: 44 | retArray[dataMatrix[:, dimen] > threshVal] = -1.0 # 如果大于阈值,则赋值为-1 45 | return retArray 46 | 47 | #找到数据集上最佳的单层决策树 48 | def buildStump(dataArr, classLabels, D): 49 | dataMatrix = np.mat(dataArr); 50 | labelMat = np.mat(classLabels).T 51 | m, n = np.shape(dataMatrix) 52 | numSteps = 10.0; 53 | bestStump = {}; 54 | bestClasEst = np.mat(np.zeros((m, 1))) 55 | minError = float('inf') # 最小误差初始化为正无穷大 56 | for i in range(n): # 遍历所有特征 57 | rangeMin = dataMatrix[:, i].min(); 58 | rangeMax = dataMatrix[:, i].max() # 找到特征中最小的值和最大值 59 | stepSize = (rangeMax - rangeMin) / numSteps # 计算步长 60 | for j in range(-1, int(numSteps) + 1): 61 | for inequal in ['lt', 'gt']: # 大于和小于的情况,均遍历。lt:less than,gt:greater than 62 | threshVal = (rangeMin + float(j) * stepSize) # 计算阈值 63 | predictedVals = stumpClassify(dataMatrix, i, threshVal, inequal) # 计算分类结果 64 | errArr = np.mat(np.ones((m, 1))) # 初始化误差矩阵 65 | errArr[predictedVals == labelMat] = 0 # 分类正确的,赋值为0 66 | weightedError = D.T * errArr # 计算误差 67 | print("split: dim %d, thresh %.2f, thresh ineqal: %s, the weighted error is %.3f" % ( 68 | i, threshVal, inequal, weightedError)) 69 | if weightedError < minError: # 找到误差最小的分类方式 70 | minError = weightedError 71 | bestClasEst = predictedVals.copy() 72 | bestStump['dim'] = i 73 | bestStump['thresh'] = threshVal 74 | bestStump['ineq'] = inequal 75 | return bestStump, minError, bestClasEst 76 | 77 | 78 | if __name__ == '__main__': 79 | dataArr, classLabels = loadSimpData() 80 | D = np.mat(np.ones((5, 1)) / 5) 81 | bestStump, minError, bestClasEst = buildStump(dataArr, classLabels, D) 82 | print('bestStump:\n', bestStump) 83 | print('minError:\n', minError) 84 | print('bestClasEst:\n', bestClasEst) 85 | 86 | 87 | 88 | #使用AdaBoost算法提升弱分类器性能 89 | def adaBoostTrainDS(dataArr, classLabels, numIt = 40): 90 | weakClassArr = [] 91 | m = np.shape(dataArr)[0] 92 | D = np.mat(np.ones((m, 1)) / m) #初始化权重 93 | aggClassEst = np.mat(np.zeros((m,1))) 94 | for i in range(numIt): 95 | bestStump, error, classEst = buildStump(dataArr, classLabels, D) #构建单层决策树 96 | alpha = float(0.5 * np.log((1.0 - error) / max(error, 1e-16))) #计算弱学习算法权重alpha,使error不等于0,因为分母不能为0 97 | bestStump['alpha'] = alpha #存储弱学习算法权重 98 | weakClassArr.append(bestStump) #存储单层决策树 99 | expon = np.multiply(-1 * alpha * np.mat(classLabels).T, classEst) #计算e的指数项 100 | D = np.multiply(D, np.exp(expon)) 101 | D = D / D.sum() #根据样本权重公式,更新样本权重 102 | #计算AdaBoost误差,当误差为0的时候,退出循环 103 | aggClassEst += alpha * classEst #计算类别估计累计值 104 | aggErrors = np.multiply(np.sign(aggClassEst) != 
np.mat(classLabels).T, np.ones((m,1))) #计算误差 105 | errorRate = aggErrors.sum() / m 106 | if errorRate == 0.0: break #误差为0,退出循环 107 | return weakClassArr, aggClassEst 108 | 109 | 110 | classifierArray=adaBoostTrainDS(dataArr, classLabels, numIt = 40) 111 | print(classifierArray) 112 | 113 | #测试算法 114 | def adaClassify(datToClass,classifierArr): 115 | dataMatrix = np.mat(datToClass) 116 | m = np.shape(dataMatrix)[0] 117 | aggClassEst = np.mat(np.zeros((m,1))) 118 | for i in range(len(classifierArr)): #遍历所有分类器,进行分类 119 | classEst = stumpClassify(dataMatrix, classifierArr[i]['dim'], classifierArr[i]['thresh'], classifierArr[i]['ineq']) 120 | aggClassEst += classifierArr[i]['alpha'] * classEst 121 | # print(aggClassEst) 122 | return np.sign(aggClassEst) 123 | 124 | print(adaClassify([[0,0],[5,5]],classifierArray[0])) -------------------------------------------------------------------------------- /7.Adaboost/readme.md: -------------------------------------------------------------------------------- 1 | # 理论部分 2 | boosting各分类器之间是串行训练的:每一轮集中关注被已有分类器错分的那些数据来获得新的分类器,并且对每个分类器分配不同的权重, 3 | 每个权重代表的是该分类器对这一轮分类的成功度。其中最经典、最流行的boosting集成方法是AdaBoost(adaptive boosting)。 4 | 主要过程为: 5 | 1. 给数据样本赋初始权重; 6 | 2. 在当前权重下训练数据样本,得到分类器; 7 | 3. 计算每个分类器的错误率,据此给分类器分配权重; 8 | 4. 将分类器分错的样本的权重值增加,分对的样本的权重减小; 9 | 5. 用新的样本权重训练数据,得到一个新的分类器,再从第3步重新进行; 10 | 6. 直到步骤3中分类器的错误率为零,或者达到迭代次数就终止; 11 | 7. 将所有弱分类器加权求和得到分类结果。 12 | 13 | # 代码部分 14 | Adaboost代码,主要包括四部分: 15 | 一、构建单层决策树,仅基于单个特征来做决策。 16 | 本文选择了五个简单的数据,但简单的分类器一下子不能将这五个数据分隔开,所以需要用到本章节讲到的集成分类器,来提高简单分类器的性能。 17 | 二、单层决策树生成函数 18 | 1.选择合适的分类方式 19 | 2.分类训练函数。第一个循环用于遍历所有的特征X1和X2;第二个循环是在选定的一个特征上,按步长改变阈值对这一特征进行划分;第三个循环是对阈值两侧的划分方向各取一种情况比较优劣:一种是左边为-1、右边为+1,另一种是左边为+1、右边为-1,最后看两种分类方式哪个误差更小。 20 | 三、基于单层决策树的Adaboost训练过程 21 | 给出了完整的计算过程。 22 | 四、Adaboost分类函数 23 | 将弱分类器的训练过程从程序中抽取出来,然后应用到某个具体实例中去。 24 | dataMatrix:数据矩阵; dimen:第n个特征; threshVal:阈值; threshIneq:标志不等号 25 | numIt:迭代次数;D:样本权重;alpha:分类器权重 -------------------------------------------------------------------------------- /7.Adaboost/常见代码.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TUFE-I307/Seminar-MachineLearning/c637c55a9e411451709908f512cab44cc79665c3/7.Adaboost/常见代码.pptx -------------------------------------------------------------------------------- /8.回归/readme.md: -------------------------------------------------------------------------------- 1 | 本节分享的内容是线性回归,局部加权线性回归,**岭回归和lasso**. 2 | 3 | ## 正则化: 4 | 有些情况下无法按照典型回归的方法去训练模型。比如,训练样本数量少,甚至少于样本维数,这样将导致数据矩阵无法求逆;又比如样本特征中存在大量相似的特征,导致很多参数所代表的意义重复。总的来说,就是仅靠训练样本进行无偏估计已经不够用了。这个时候,我们就应用结构风险最小化的模型选择策略,在经验风险最小化的基础上加入正则化因子。 5 | 通过对损失函数(即优化目标)加入惩罚项,使得训练求解参数过程中会考虑到系数的大小,通过设置缩减系数(惩罚系数),会使得影响较小的特征的系数向0衰减,只保留重要的特征。常用的缩减系数方法有lasso(L1正则化),岭回归(L2正则化)。 6 | > 1-范数: 即向量元素绝对值之和。 7 | 2-范数:Euclid范数(欧几里得范数,常用于计算向量长度),即向量元素绝对值的平方和再开方。 8 | 9 | ## lasso: 10 | 当正则化因子选择为模型参数的一范数的时候,那就是lasso回归了。lasso回归相比于岭回归,会比较极端。它不仅可以解决过拟合问题,而且可以在参数缩减过程中,将一些重复的没必要的参数直接缩减为零,也就是完全减掉了。这可以达到**提取有用特征**的作用。但是lasso回归的计算过程复杂,毕竟一范数在零点不可导。 11 | ## 岭回归: 12 | 当正则化因子选择为模型参数的二范数的时候,整个回归的方法就叫做岭回归。为什么叫“岭”回归呢?这是因为按照这种方法求取参数的解析解的时候,最后的表达式是在原来的基础上在求逆矩阵内部加上一个对角矩阵,就好像一条“岭”一样。加上这条岭以后,原来不可求逆的数据矩阵就可以求逆了。 13 | ### 那么岭回归是如何解决过拟合的问题呢?
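先给出岭回归的优化目标与解析解作为参照(示意公式,记号与后文 saLR.py 中的公式一致):最小化 $ \|y-Xw\|_2^2+\lambda\|w\|_2^2 $,其解析解为 $ \hat{w}=(X^TX+\lambda I)^{-1}X^Ty $。λ 越大,对系数大小的惩罚越重,求得的系数也就越小。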
14 | 岭回归用于控制模型系数的大小来**防止过度拟合**。岭回归通过在损失函数中加入模型参数的L2正则项以平衡数据的拟合和系数的大小。 L2范数是指向量各元素的平方和然后求平方根。我们让L2范数的规则项||W||2最小,可以使得W的每个元素都很小,都接近于0,但与L1范数不同,它不会让它等于0,而是接近于0,而越小的参数说明模型越简单,越简单的模型则越不容易产生过拟合现象。 15 | 对角矩阵其实是由一个参数lamda和单位对角矩阵相乘组成。lamda越大,说明偏差就越大,原始数据对回归求取参数的作用就越小,当lamda取到一个合适的值,就能在一定意义上解决过拟合的问题:原先过拟合的特别大或者特别小的参数会被约束到正常甚至很小的值,但不会为零。 16 | -------------------------------------------------------------------------------- /8.回归/saLR.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | import numpy as np 5 | import pandas as pd 6 | import random 7 | import matplotlib as mpl 8 | import matplotlib.pyplot as plt 9 | plt.rcParams['font.sans-serif']=['simhei'] #显示中文 10 | plt.rcParams['axes.unicode_minus']=False #用来正常显示负号 11 | 12 | import os 13 | os.chdir('C:\\Users\\SA\\Documents\\machine learning\\code\\Ch08') 14 | 15 | ex0 = pd.read_csv('ex0.txt',sep='\t',header=None) 16 | 17 | ex0.head() 18 | 19 | 20 | # In[4]: 21 | 22 | 23 | ex0.shape 24 | 25 | 26 | # In[5]: 27 | 28 | 29 | ex0.describe() 30 | 31 | 32 | # # 获取特征矩阵和标签列 33 | 34 | # In[6]: 35 | 36 | 37 | def get_Mat(dataSet): 38 | xMat = np.mat(dataSet.iloc[:,:-1].values) 39 | yMat = np.mat(dataSet.iloc[:,-1].values).T 40 | return xMat,yMat 41 | 42 | 43 | # In[7]: 44 | 45 | 46 | xMat,yMat = get_Mat(ex0) 47 | 48 | 49 | # In[8]: 50 | 51 | 52 | xMat[:10] 53 | 54 | 55 | # In[9]: 56 | 57 | 58 | yMat[:10] 59 | 60 | 61 | # # 数据可视化 62 | 63 | # In[10]: 64 | 65 | 66 | def plotShow(dataSet): 67 | xMat,yMat=get_Mat(dataSet) 68 | plt.scatter(xMat.A[:,1],yMat.A,c='b',s=5) 69 | plt.show() 70 | 71 | 72 | # In[11]: 73 | 74 | 75 | len(xMat.A[:,1]) 76 | 77 | 78 | # In[12]: 79 | 80 | 81 | plotShow(ex0) 82 | 83 | def standRegres(dataSet): 84 | xMat,yMat =get_Mat(dataSet) 85 | xTx = xMat.T*xMat 86 | if np.linalg.det(xTx)==0: #计算矩阵的行列式 87 | print('矩阵为奇异矩阵,无法求逆') 88 | return 89 | ws=xTx.I*(xMat.T*yMat) #xTx.I 求逆 90 | return ws 91 | 92 | 93 | # In[19]: 94 | 95 | 96 | ws = standRegres(ex0) 97 | 98 | 99 | # In[21]: 100 | 101 | 102 | def plotReg(dataSet): 103 | xMat,yMat=get_Mat(dataSet) 104 | plt.scatter(xMat.A[:,1],yMat.A,c='b',s=5) 105 | ws = standRegres(dataSet) 106 | yHat = xMat*ws 107 | plt.plot(xMat[:,1],yHat,c='r') 108 | plt.show() 109 | 110 | 111 | # In[22]: 112 | 113 | 114 | plotReg(ex0) 115 | 116 | 117 | # In[23]: 118 | 119 | 120 | xMat,yMat =get_Mat(ex0) 121 | ws =standRegres(ex0) 122 | yHat = xMat*ws 123 | np.corrcoef(yHat.T,yMat.T) 124 | 125 | 126 | # In[24]: 127 | 128 | 129 | v = np.vstack((yHat.T,yMat.T)) 130 | np.corrcoef(v) 131 | 132 | 133 | # # 二,局部加权线性回归 134 | # ## $ SSE=(y-Xw)^TM(y-Xw) $ 135 | # ## $ M(i,i)=exp\{\frac{|x^i-x|^2}{-2k^2}\} $ 136 | # 137 | 138 | # In[25]: 139 | 140 | 141 | #高斯核函数的图像 142 | xMat,yMat = get_Mat(ex0) 143 | x=0.5 144 | xi = np.arange(0,1.0,0.01) 145 | k1,k2,k3=0.5,0.1,0.01 146 | m1 = np.exp((xi-x)**2/(-2*k1**2)) 147 | m2 = np.exp((xi-x)**2/(-2*k2**2)) 148 | m3 = np.exp((xi-x)**2/(-2*k3**2)) 149 | #创建画布 150 | fig = plt.figure(figsize=(6,8),dpi=120) 151 | #子画布1,原始数据集 152 | fig1 = fig.add_subplot(411) 153 | plt.scatter(xMat.A[:,1],yMat.A,c='b',s=5) 154 | #子画布2,k=0.5 155 | fig2 = fig.add_subplot(412) 156 | plt.plot(xi,m1,color='r') 157 | plt.legend(['k = 0.5']) 158 | #子画布3,k=0.1 159 | fig3 = fig.add_subplot(413) 160 | plt.plot(xi,m2,color='g') 161 | plt.legend(['k = 0.1']) 162 | #子画布4,k=0.01 163 | fig4 = fig.add_subplot(414) 164 | plt.plot(xi,m3,color='orange') 165 | plt.legend(['k = 0.01']) 166 | plt.show() 167 | 168 | 169 | # ## $ 
M(i,i)=exp\{\frac{|x^i-x|^2}{-2k^2}\} $ 170 | # ## $ \hat{w}=(X^TMX)^{-1}X^TMy $ 171 | # ## $ \hat{y}=X\cdot\hat{w} $ 172 | 173 | # In[26]: 174 | 175 | 176 | def LWLR(testMat,xMat,yMat,k=1.0): 177 | n=testMat.shape[0] 178 | m=xMat.shape[0] 179 | weights =np.mat(np.eye(m)) #生成mxm的对角矩阵 180 | yHat = np.zeros(n) #生成n个0的数组 181 | for i in range(n): 182 | for j in range(m): 183 | diffMat = testMat[i]-xMat[j] # 测试集和训练集距离越近,权重越大 184 | weights[j,j]=np.exp(diffMat*diffMat.T/(-2*k**2)) 185 | xTx = xMat.T*(weights*xMat) 186 | if np.linalg.det(xTx)==0: 187 | print('矩阵为奇异矩阵,不能求逆') 188 | return 189 | ws = xTx.I*(xMat.T*(weights*yMat)) 190 | yHat[i]= testMat[i]*ws 191 | return ws,yHat 192 | 193 | 194 | # In[27]: 195 | 196 | 197 | np.argsort([2,1,3]) 198 | 199 | 200 | # In[28]: 201 | 202 | 203 | xMat,yMat = get_Mat(ex0) 204 | srtInd = xMat[:,1].argsort(0) 205 | xSort=xMat[srtInd][:,0] 206 | xMat[srtInd].shape 207 | 208 | 209 | # In[29]: 210 | 211 | 212 | xSort 213 | 214 | 215 | # In[30]: 216 | 217 | 218 | #计算不同k取值下的y估计值yHat 219 | ws1,yHat1 = LWLR(xMat,xMat,yMat,k=1.0) 220 | ws2,yHat2 = LWLR(xMat,xMat,yMat,k=0.01) 221 | ws3,yHat3 = LWLR(xMat,xMat,yMat,k=0.003) 222 | 223 | 224 | # In[31]: 225 | 226 | 227 | #创建画布 228 | fig = plt.figure(figsize=(6,8),dpi=120) 229 | #子图1绘制k=1.0的曲线 230 | fig1=fig.add_subplot(311) #将画布分割成3行1列,图像画在从左到右从上到下的第1块 231 | plt.scatter(xMat[:,1].A,yMat.A,c='b',s=2) 232 | plt.plot(xSort[:,1],yHat1[srtInd],linewidth=1,color='r') 233 | #plt.plot(xMat[:,1],yHat1,c='r') 234 | plt.title('局部加权回归曲线,k=1.0',size=10,color='r') 235 | #子图2绘制k=0.01的曲线 236 | fig2=fig.add_subplot(312) 237 | plt.scatter(xMat[:,1].A,yMat.A,c='b',s=2) 238 | plt.plot(xSort[:,1],yHat2[srtInd],linewidth=1,color='r') 239 | plt.title('局部加权回归曲线,k=0.01',size=10,color='r') 240 | #子图3绘制k=0.003的曲线 241 | fig3=fig.add_subplot(313) 242 | plt.scatter(xMat[:,1].A,yMat.A,c='b',s=2) 243 | plt.plot(xSort[:,1],yHat3[srtInd],linewidth=1,color='r') 244 | plt.title('局部加权回归曲线,k=0.003',size=10,color='r') 245 | #调整子图的间距 246 | plt.tight_layout(pad=1.2) 247 | plt.show() 248 | 249 | 250 | # In[32]: 251 | 252 | 253 | fig = plt.figure(figsize=(6,3),dpi=100) 254 | plt.scatter(xMat[:,1].A,yMat.A,c='b',s=2) 255 | plt.plot(xMat[:,1],yHat2,linewidth=1,color='r') 256 | plt.title('局部加权回归曲线,k=0.01',size=10,color='r') 257 | plt.show() 258 | 259 | #四种模型相关系数比较 260 | np.corrcoef(yHat.T,yMat.T) #最小二乘法 261 | 262 | 263 | # In[36]: 264 | 265 | 266 | np.corrcoef(yHat1,yMat.T) #k=1.0模型 267 | 268 | 269 | # In[37]: 270 | 271 | 272 | np.corrcoef(yHat2,yMat.T) #k=0.01模型 273 | 274 | 275 | # In[38]: 276 | 277 | 278 | np.corrcoef(yHat3,yMat.T) #k=0.003模型,过拟合 279 | 280 | 281 | # # 三,预测鲍鱼的年龄 282 | 283 | # In[39]: 284 | 285 | 286 | abalone = pd.read_csv('abalone.txt',sep='\t',header=None) 287 | abalone.columns=['性别','长度','直径','高度','整体重量','肉重量','内脏重量','壳重','年龄'] 288 | 289 | 290 | # In[40]: 291 | 292 | 293 | abalone.head() 294 | 295 | 296 | # In[41]: 297 | 298 | 299 | abalone.shape 300 | 301 | 302 | # In[42]: 303 | 304 | 305 | abalone.info() 306 | 307 | 308 | # In[43]: 309 | 310 | 311 | abalone.describe() 312 | 313 | 314 | # ## 数据可视化 315 | 316 | # In[44]: 317 | 318 | 319 | mpl.cm.rainbow(np.linspace(0, 1, 10)) 320 | 321 | 322 | # In[45]: 323 | 324 | 325 | def dataPlot(dataSet): 326 | m,n=dataSet.shape 327 | fig = plt.figure(figsize=(8,20),dpi=100) 328 | colormap = mpl.cm.rainbow(np.linspace(0, 1, n)) 329 | for i in range(n): 330 | fig_ = fig.add_subplot(n,1,i+1) 331 | plt.scatter(range(m),dataSet.iloc[:,i].values,s=2,c=colormap[i]) 332 | plt.title(dataSet.columns[i]) 333 | 
plt.tight_layout(pad=1.2) 334 | 335 | 336 | # In[46]: 337 | 338 | 339 | dataPlot(abalone) 340 | 341 | 342 | # In[47]: 343 | 344 | 345 | #剔除高度特征中≥0.4的异常值 346 | aba = abalone.loc[abalone['高度']<0.4,:] 347 | dataPlot(aba) 348 | 349 | 350 | # In[48]: 351 | 352 | 353 | def randSplit(dataSet,rate): 354 | l = list(dataSet.index) 355 | random.shuffle(l) 356 | dataSet.index = l 357 | m = dataSet.shape[0] 358 | n = int(m*rate) 359 | train = dataSet.loc[range(n),:] 360 | test = dataSet.loc[range(n,m),:] 361 | test.index=range(test.shape[0]) 362 | dataSet.index =range(dataSet.shape[0]) 363 | return train,test 364 | 365 | 366 | # In[49]: 367 | 368 | 369 | train,test = randSplit(aba,0.8) 370 | 371 | 372 | # In[50]: 373 | 374 | 375 | train.head() 376 | 377 | 378 | # In[51]: 379 | 380 | 381 | train.shape 382 | 383 | 384 | # In[52]: 385 | 386 | 387 | test.shape 388 | 389 | 390 | # In[53]: 391 | 392 | 393 | dataPlot(train) 394 | 395 | 396 | # In[54]: 397 | 398 | 399 | dataPlot(test) 400 | 401 | 402 | # ## 计算误差平方和ESS 403 | 404 | # In[55]: 405 | 406 | 407 | def essCal(dataSet, regres): 408 | xMat,yMat = get_Mat(dataSet) 409 | ws = regres(dataSet) 410 | yHat = xMat*ws 411 | ess = ((yMat.A.flatten() - yHat.A.flatten())**2).sum() 412 | return ess 413 | 414 | 415 | # In[56]: 416 | 417 | 418 | essCal(ex0, standRegres) 419 | 420 | 421 | # ## 计算 $ R^2 $ 422 | 423 | # In[57]: 424 | 425 | 426 | def rSquare(dataSet,regres): 427 | xMat,yMat=get_Mat(dataSet) 428 | ess = essCal(dataSet,regres) 429 | tss = ((yMat.A-yMat.mean())**2).sum() 430 | r2 = 1 - ess / tss 431 | return r2 432 | 433 | 434 | # In[58]: 435 | 436 | 437 | rSquare(ex0, standRegres) 438 | 439 | 440 | # ## 构建加权线性模型 441 | 442 | # In[59]: 443 | 444 | 445 | def essPlot(train,test): 446 | X0,Y0 = get_Mat(train) 447 | X1,Y1 =get_Mat(test) 448 | train_ess = [] 449 | test_ess = [] 450 | for k in np.arange(0.2,10,0.5): 451 | ws1,yHat1 = LWLR(X0[:99],X0[:99],Y0[:99],k) 452 | ess1 = ((Y0[:99].A.T - yHat1)**2).sum() 453 | train_ess.append(ess1) 454 | 455 | ws2,yHat2 = LWLR(X1[:99],X0[:99],Y0[:99],k) 456 | ess2 = ((Y1[:99].A.T - yHat2)**2).sum() 457 | test_ess.append(ess2) 458 | 459 | plt.plot(np.arange(0.2,10,0.5),train_ess,color='b') 460 | plt.plot(np.arange(0.2,10,0.5),test_ess,color='r') 461 | plt.xlabel('不同k取值') 462 | plt.ylabel('ESS') 463 | plt.legend(['train_ess','test_ess']) 464 | 465 | 466 | # In[60]: 467 | 468 | 469 | essPlot(train,test) 470 | 471 | 472 | # In[61]: 473 | 474 | 475 | train,test = randSplit(aba,0.8) 476 | trainX,trainY = get_Mat(train) 477 | testX,testY = get_Mat(test) 478 | ws0,yHat0 = LWLR(testX,trainX,trainY,k=2) 479 | 480 | 481 | # In[62]: 482 | 483 | 484 | y=testY.A.flatten() 485 | plt.scatter(y,yHat0,c='b',s=5); 486 | 487 | 488 | # In[63]: 489 | 490 | 491 | def LWLR_pre(dataSet): 492 | train,test = randSplit(dataSet,0.8) 493 | trainX,trainY = get_Mat(train) 494 | testX,testY = get_Mat(test) 495 | ws,yHat = LWLR(testX,trainX,trainY,k=2) 496 | ess = ((testY.A.T - yHat)**2).sum() 497 | tss = ((testY.A-testY.mean())**2).sum() 498 | r2 = 1 - ess / tss 499 | return ess,r2 500 | 501 | 502 | # In[ ]: 503 | 504 | 505 | LWLR_pre(aba) 506 | 507 | 508 | # # 四,岭回归 509 | 510 | # ## $ \hat{w} = (X^TX+\lambda I )^{-1}X^Ty $ 511 | 512 | # In[ ]: 513 | 514 | 515 | np.eye(5) 516 | 517 | 518 | # In[118]: 519 | 520 | 521 | def ridgeRegres(dataSet, lam=0.2): 522 | xMat,yMat=get_Mat(dataSet) # xMat 行数<列数 523 | xTx = xMat.T * xMat 524 | denom = xTx + np.eye(xMat.shape[1])*lam 525 | ws = denom.I * (xMat.T * yMat) 526 | return ws 527 | 528 | 529 | # In[119]: 530 | 
531 | 532 | #回归系数比较 533 | standRegres(aba) #线性回归 534 | 535 | 536 | # In[ ]: 537 | 538 | 539 | ridgeRegres(aba) #岭回归 540 | 541 | 542 | # In[ ]: 543 | 544 | 545 | #相关系数R2比较 546 | rSquare(aba,standRegres) #线性回归 547 | 548 | 549 | # In[ ]: 550 | 551 | 552 | rSquare(aba,ridgeRegres) #岭回归 553 | 554 | 555 | # In[ ]: 556 | 557 | 558 | def ridgeTest(dataSet,k=30): #取30个不同的λ值 559 | xMat,yMat=get_Mat(dataSet) 560 | m,n=xMat.shape 561 | wMat = np.zeros((k,n)) 562 | #特征标准化 563 | yMean = yMat.mean(0) # 0 means 按列计算 564 | xMeans = xMat.mean(0) 565 | xVar = xMat.var(0) 566 | yMat = yMat-yMean 567 | xMat = (xMat-xMeans)/xVar 568 | for i in range(k): 569 | xTx = xMat.T*xMat 570 | lam = np.exp(i-10) 571 | denom = xTx+np.eye(n)*lam #lam增速快 572 | ws=denom.I*(xMat.T*yMat) 573 | wMat[i,:]=ws.T 574 | return wMat 575 | 576 | 577 | # In[ ]: 578 | 579 | 580 | k = np.arange(0,30,1) 581 | lam = np.exp(k-10) 582 | plt.plot(lam); 583 | 584 | 585 | # In[ ]: 586 | 587 | 588 | #回归系数矩阵 589 | wMat = ridgeTest(aba,k=30) 590 | 591 | 592 | # In[ ]: 593 | 594 | 595 | wMat.shape 596 | 597 | 598 | # In[ ]: 599 | 600 | 601 | #绘制岭迹图 602 | plt.plot(np.arange(-10,20,1),wMat) 603 | plt.xlabel('log(λ)') 604 | plt.ylabel('回归系数'); 605 | 606 | 607 | # # 六,lasso 608 | 609 | # In[64]: 610 | 611 | 612 | #lasso是在linear_model下 613 | from sklearn.linear_model import Lasso 614 | 615 | 616 | # In[65]: 617 | 618 | 619 | las = Lasso(alpha = 0.05) #alpha为惩罚系数,值越大惩罚力度越大 620 | las.fit(aba.iloc[:, :-1], aba.iloc[:, -1]) 621 | 622 | 623 | # In[67]: 624 | 625 | 626 | las.coef_ 627 | 628 | 629 | # In[68]: 630 | 631 | 632 | def regularize(xMat,yMat): 633 | inxMat = xMat.copy() #数据拷贝 634 | inyMat = yMat.copy() 635 | yMean = yMat.mean(0) #行与行操作,求均值 636 | inyMat = inyMat - yMean #数据减去均值 637 | xMeans = inxMat.mean(0) #行与行操作,求均值 638 | xVar = inxMat.var(0) #行与行操作,求方差 639 | inxMat = (inxMat - xMeans) / xVar #数据减去均值除以方差实现标准化 640 | return inxMat, inyMat 641 | 642 | 643 | # In[69]: 644 | 645 | 646 | def rssError(yMat, yHat): 647 | ess = ((yMat.A-yHat.A)**2).sum() 648 | return ess 649 | 650 | 651 | # ## 向前逐步回归 652 | 653 | # In[70]: 654 | 655 | 656 | def stageWise(dataSet, eps = 0.01, numIt = 100): 657 | xMat0,yMat0 = get_Mat(dataSet) 658 | xMat,yMat = regularize(xMat0, yMat0) #数据标准化 659 | m, n = xMat.shape 660 | wsMat = np.zeros((numIt, n)) #初始化numIt次迭代的回归系数矩阵 661 | ws = np.zeros((n, 1)) #初始化回归系数矩阵 662 | wsTest = ws.copy() 663 | wsMax = ws.copy() 664 | for i in range(numIt): #迭代numIt次 665 | # print(ws.T) #打印当前回归系数矩阵 666 | lowestError = np.inf #正无穷 667 | for j in range(n): #遍历每个特征的回归系数 668 | for sign in [-1, 1]: 669 | wsTest = ws.copy() 670 | wsTest[j] += eps * sign #微调回归系数 671 | yHat = xMat * wsTest #计算预测值 672 | ess = rssError(yMat, yHat) #计算平方误差 673 | if ess < lowestError: #如果误差更小,则更新当前的最佳回归系数 674 | lowestError = ess 675 | wsMax = wsTest 676 | ws = wsMax.copy() 677 | wsMat[i,:] = ws.T #记录numIt次迭代的回归系数矩阵 678 | return wsMat 679 | 680 | 681 | # In[74]: 682 | 683 | 684 | stageWise(aba, eps = 0.01, numIt = 200) 685 | 686 | 687 | # In[78]: 688 | 689 | 690 | wsMat= stageWise(aba, eps = 0.001, numIt = 5000) 691 | wsMat 692 | 693 | 694 | # In[75]: 695 | 696 | 697 | def standRegres0(dataSet): 698 | xMat0,yMat0 =get_Mat(dataSet) 699 | xMat,yMat = regularize(xMat0, yMat0) #增加标准化这一步 700 | xTx = xMat.T*xMat 701 | if np.linalg.det(xTx)==0: 702 | print('矩阵为奇异矩阵,无法求逆') 703 | return 704 | ws=xTx.I*(xMat.T*yMat) 705 | yHat = xMat*ws 706 | return ws 707 | 708 | 709 | # In[76]: 710 | 711 | 712 | standRegres0(aba).T 713 | 714 | 715 | # In[79]: 716 | 717 | 718 | wsMat[-1] 719 | 720 | 721 | 
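# In[ ]:


# 补充一个对照示例(示意代码:Ridge 为 sklearn.linear_model 的现成接口,alpha 为惩罚系数,
# 这里取 0.5 仅作演示、并非调参结果;数据沿用上文的 aba)。
# 可与前面手写的 ridgeRegres/ridgeTest 结果对照(两者在标准化与截距处理上不同,数值不会完全一致)。
from sklearn.linear_model import Ridge

ridge = Ridge(alpha=0.5)
ridge.fit(aba.iloc[:, :-1], aba.iloc[:, -1])
ridge.coef_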
# In[80]: 722 | 723 | 724 | plt.plot(wsMat) 725 | plt.xlabel('迭代次数') 726 | plt.ylabel('回归系数'); 727 | 728 | -------------------------------------------------------------------------------- /9.随机森林/C4.5.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Thu Nov 21 12:15:19 2019 4 | 5 | @author: 刘萌萌 6 | """ 7 | 8 | 9 | from matplotlib.font_manager import FontProperties 10 | import matplotlib.pyplot as plt 11 | from math import log 12 | import operator 13 | 14 | 15 | 16 | def createDataSet(): 17 | dataSet = [[0, 0, 0, 0, 'no'], # 创建数据集 18 | [0, 0, 0, 1, 'no'], 19 | [0, 1, 0, 1, 'yes'], 20 | [0, 1, 1, 0, 'yes'], 21 | [0, 0, 0, 0, 'no'], 22 | [1, 0, 0, 0, 'no'], 23 | [1, 0, 0, 1, 'no'], 24 | [1, 1, 1, 1, 'yes'], 25 | [1, 0, 1, 2, 'yes'], 26 | [1, 0, 1, 2, 'yes'], 27 | [2, 0, 1, 2, 'yes'], 28 | [2, 1, 0, 1, 'yes'], 29 | [2, 1, 0, 2, 'yes'], 30 | [2, 0, 1, 1, 'yes'], 31 | [2, 0, 0, 0, 'no']] 32 | labels = ['年龄', '有工作', '有自己的房子', '信贷情况'] # 分类属性 33 | return dataSet, labels # 返回数据集和分类属性 34 | 35 | 36 | """ 37 | 函数说明:计算给定数据集的经验熵(香农熵) 38 | Parameters: 39 | dataSet - 数据集 40 | Returns: 41 | shannonEnt - 经验熵(香农熵) 42 | """ 43 | 44 | 45 | def calcShannonEnt(dataSet): 46 | namEntices = len(dataSet) # 返回数据集的行数 47 | labelCounts = {} # 保存每个标签(Label)出现次数的字典 48 | for featVec in dataSet: # 对每组特征向量进行统计 49 | currentLabel = featVec[-1] # 提取标签(Label)信息 50 | if currentLabel not in labelCounts.keys(): # 如果标签(Label)没有放入统计次数的字典,添加进去 51 | labelCounts[currentLabel] = 0 52 | labelCounts[currentLabel] += 1 # Label计数 53 | shannonEnt = 0.0 # 经验熵(香农熵) 54 | for key in labelCounts: # 计算香农熵 55 | prob = float(labelCounts[key]) / namEntices # 选择该标签(Label)的概率 56 | shannonEnt -= prob * log(prob, 2) # 利用公式计算 57 | return shannonEnt # 返回经验熵(香农熵) 58 | 59 | 60 | # if __name__ == '__main__': 61 | # dataSet, features = createDataSet() 62 | # print(dataSet) 63 | # print(calcShannonEnt(dataSet)) 64 | 65 | """ 66 | 函数说明:按照给定特征划分数据集 67 | Parameters: 68 | dataSet - 待划分的数据集 69 | axis - 划分数据集的特征 70 | value - 需要返回的特征的值 71 | """ 72 | 73 | 74 | def splitDataSet(dataSet, axis, value): 75 | retDataSet = [] # 创建返回的数据集列表 76 | for featVec in dataSet: # 遍历数据集 77 | if featVec[axis] == value: 78 | reducedFeatVec = featVec[:axis] # 去掉axis特征 79 | reducedFeatVec.extend(featVec[axis + 1:]) # 将符合条件的添加到返回的数据集 80 | retDataSet.append(reducedFeatVec) 81 | return retDataSet # 返回划分后的数据集 82 | 83 | 84 | """ 85 | 函数说明:选择最优特征 86 | Parameters: 87 | dataSet - 数据集 88 | Returns: 89 | bestFeature - 信息增益最大的(最优)特征的索引值 90 | """ 91 | 92 | #C4.5算法过程和ID3算法一样,只是选择特征的方法由信息增益改成信息增益比 93 | def chooseBestFeatureToSplit(dataSet,is_input=False): 94 | numFeatures = len(dataSet[0]) - 1 # 特征数量 95 | baseEntropy = calcShannonEnt(dataSet) # 计算数据集的香农熵 96 | bestInfoGainratio = 0.0 # 信息增益 97 | bestFeature = -1 # 最优特征的索引值 98 | 99 | for i in range(numFeatures): # 遍历所有特征 100 | # 获取dataSet的第i个所有特征 101 | featList = [example[i] for example in dataSet] 102 | uniqueVals = set(featList) # 创建set集合{},元素不可重复 103 | newEntropy = 0.0 # 经验条件熵 104 | splitinfo = 0.0 105 | 106 | for value in uniqueVals: # 计算信息增益 107 | subDataSet = splitDataSet(dataSet, i, value) # subDataSet划分后的子集 108 | prob = len(subDataSet) / float(len(dataSet)) # 计算子集的概率 109 | newEntropy += prob * calcShannonEnt(subDataSet) # 根据公式计算经验条件熵 110 | splitinfo += -prob * log(prob, 2) 111 | infoGain = baseEntropy - newEntropy # 信息增益,g(D,A)=H(D)-H(D|A) 112 | 113 | if splitinfo == 0: # fix the overflow bug 114 | continue 115 | info_gain_ratio = infoGain / splitinfo 116 | # 
最大信息增益比 117 | if info_gain_ratio > bestInfoGainratio: 118 | bestInfoGainratio = info_gain_ratio 119 | bestFeature = i 120 | return bestFeature 121 | 122 | 123 | 124 | # if __name__ == '__main__': 125 | # dataSet, features = createDataSet() 126 | # print("最优特征索引值:" + str(chooseBestFeatureToSplit(dataSet, True))) 127 | 128 | """ 129 | 函数说明:统计classList中出现此处最多的元素(类标签) 130 | Parameters: 131 | classList - 类标签列表 132 | Returns: 133 | sortedClassCount[0][0] - 出现此处最多的元素(类标签) 134 | """ 135 | 136 | 137 | def majorityCnt(classList): 138 | classCount = {} 139 | for vote in classList: # 统计classList中每个元素出现的次数 140 | if vote not in classCount.keys(): 141 | classCount[vote] = 0 142 | classCount[vote] += 1 143 | sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True) 144 | # classCount.iteritems()将classCount字典分解为元组列表,operator.itemgetter(1)按照第二个元素的次序对元组进行排序,reverse = True是逆序,即按照从大到小的顺序排列 145 | return sortedClassCount[0][0] # 返回classList中出现次数最多的元素 146 | 147 | 148 | """ 149 | 函数说明:创建决策树 150 | Parameters: 151 | dataSet - 训练数据集 152 | labels - 分类属性标签 153 | featLabels - 存储选择的最优特征标签 154 | Returns: 155 | myTree - 决策树 156 | """ 157 | 158 | 159 | def createTree(dataSet, labels, featLabels): 160 | classList = [example[-1] for example in dataSet] # 取分类标签(是否放贷:yes or no) 161 | if classList.count(classList[0]) == len(classList): # 如果类别完全相同则停止继续划分 162 | return classList[0] 163 | if len(dataSet[0]) == 1 or len(labels) == 0: # 遍历完所有特征时返回出现次数最多的类标签 164 | return majorityCnt(classList) 165 | bestFeat = chooseBestFeatureToSplit(dataSet) # 选择最优特征 166 | bestFeatLabel = labels[bestFeat] # 最优特征的标签 167 | featLabels.append(bestFeatLabel) 168 | myTree = {bestFeatLabel: {}} # 根据最优特征的标签生成树 169 | del (labels[bestFeat]) # 删除已经使用特征标签 170 | featValues = [example[bestFeat] for example in dataSet] # 得到训练集中所有最优特征的属性值 171 | uniqueVals = set(featValues) # 去掉重复的属性值 172 | for value in uniqueVals: # 遍历特征,创建决策树。 173 | subLabels = labels[:] 174 | myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels, featLabels) 175 | return myTree 176 | 177 | 178 | """ 179 | 函数说明:获取决策树叶子结点的数目 180 | Parameters: 181 | myTree - 决策树 182 | Returns: 183 | numLeafs - 决策树的叶子结点的数目 184 | """ 185 | 186 | 187 | def getNumLeafs(myTree): 188 | numLeafs = 0 # 初始化叶子 189 | firstStr = next(iter(myTree)) # 获取决策树结点,python3中myTree.keys()返回的是dict_keys,不在是list,所以不能使用myTree.keys()[0]的方法获取结点属性,可以使用list(myTree.keys())[0] 190 | secondDict = myTree[firstStr] # 获取下一组字典 191 | for key in secondDict.keys(): 192 | if type(secondDict[key]).__name__ == 'dict': # 测试该结点是否为字典,如果不是字典,代表此结点为叶子结点 193 | numLeafs += getNumLeafs(secondDict[key]) 194 | else: 195 | numLeafs += 1 196 | return numLeafs 197 | 198 | 199 | """ 200 | 函数说明:获取决策树的层数 201 | Parameters: 202 | myTree - 决策树 203 | Returns: 204 | maxDepth - 决策树的层数 205 | """ 206 | 207 | 208 | def getTreeDepth(myTree): 209 | maxDepth = 0 # 初始化决策树深度 210 | firstStr = next(iter(myTree)) # python3中myTree.keys()返回的是dict_keys,不在是list,所以不能使用myTree.keys()[0]的方法获取结点属性,可以使用list(myTree.keys())[0] 211 | secondDict = myTree[firstStr] # 获取下一个字典 212 | for key in secondDict.keys(): 213 | if type(secondDict[key]).__name__ == 'dict': # 测试该结点是否为字典,如果不是字典,代表此结点为叶子结点 214 | thisDepth = 1 + getTreeDepth(secondDict[key]) 215 | else: 216 | thisDepth = 1 217 | if thisDepth > maxDepth: 218 | maxDepth = thisDepth # 更新层数 219 | return maxDepth 220 | 221 | 222 | """ 223 | 函数说明:绘制结点 224 | Parameters: 225 | nodeTxt - 结点名 226 | centerPt - 文本位置 227 | parentPt - 标注的箭头位置 228 | nodeType - 结点格式 229 | """ 230 | 231 | 232 | def 
plotNode(nodeTxt, centerPt, parentPt, nodeType): 233 | arrow_args = dict(arrowstyle="<-") # 定义箭头格式 234 | font = FontProperties(fname=r"c:\windows\fonts\simsun.ttc", size=14) # 设置中文字体 235 | createPlot.ax1.annotate(nodeTxt, xy=parentPt, xycoords='axes fraction', # 绘制结点 236 | xytext=centerPt, textcoords='axes fraction', 237 | va="center", ha="center", bbox=nodeType, arrowprops=arrow_args, 238 | FontProperties=font) 239 | 240 | 241 | """ 242 | 函数说明:标注有向边属性值 243 | Parameters: 244 | cntrPt、parentPt - 用于计算标注位置 245 | txtString - 标注的内容 246 | """ 247 | 248 | 249 | def plotMidText(cntrPt, parentPt, txtString): 250 | xMid = (parentPt[0] - cntrPt[0]) / 2.0 + cntrPt[0] # 计算标注位置 251 | yMid = (parentPt[1] - cntrPt[1]) / 2.0 + cntrPt[1] 252 | createPlot.ax1.text(xMid, yMid, txtString, va="center", ha="center", rotation=30) 253 | 254 | 255 | """ 256 | 函数说明:绘制决策树 257 | Parameters: 258 | myTree - 决策树(字典) 259 | parentPt - 标注的内容 260 | nodeTxt - 结点名 261 | """ 262 | 263 | 264 | def plotTree(myTree, parentPt, nodeTxt): 265 | decisionNode = dict(boxstyle="sawtooth", fc="0.8") # 设置结点格式 266 | leafNode = dict(boxstyle="round4", fc="0.8") # 设置叶结点格式 267 | numLeafs = getNumLeafs(myTree) # 获取决策树叶结点数目,决定了树的宽度 268 | depth = getTreeDepth(myTree) # 获取决策树层数 269 | firstStr = next(iter(myTree)) # 下个字典 270 | cntrPt = (plotTree.xOff + (1.0 + float(numLeafs)) / 2.0 / plotTree.totalW, plotTree.yOff) # 中心位置 271 | plotMidText(cntrPt, parentPt, nodeTxt) # 标注有向边属性值 272 | plotNode(firstStr, cntrPt, parentPt, decisionNode) # 绘制结点 273 | secondDict = myTree[firstStr] # 下一个字典,也就是继续绘制子结点 274 | plotTree.yOff = plotTree.yOff - 1.0 / plotTree.totalD # y偏移 275 | for key in secondDict.keys(): 276 | if type(secondDict[key]).__name__ == 'dict': # 测试该结点是否为字典,如果不是字典,代表此结点为叶子结点 277 | plotTree(secondDict[key], cntrPt, str(key)) # 不是叶结点,递归调用继续绘制 278 | else: # 如果是叶结点,绘制叶结点,并标注有向边属性值 279 | plotTree.xOff = plotTree.xOff + 1.0 / plotTree.totalW 280 | plotNode(secondDict[key], (plotTree.xOff, plotTree.yOff), cntrPt, leafNode) 281 | plotMidText((plotTree.xOff, plotTree.yOff), cntrPt, str(key)) 282 | plotTree.yOff = plotTree.yOff + 1.0 / plotTree.totalD 283 | 284 | 285 | """ 286 | 函数说明:创建绘制面板 287 | Parameters: 288 | inTree - 决策树(字典) 289 | Returns: 290 | 无 291 | """ 292 | 293 | 294 | def createPlot(inTree): 295 | fig = plt.figure(1, facecolor='white') # 创建fig 296 | fig.clf() # 清空fig 297 | axprops = dict(xticks=[], yticks=[]) 298 | createPlot.ax1 = plt.subplot(111, frameon=False, **axprops) # 去掉x、y轴 299 | plotTree.totalW = float(getNumLeafs(inTree)) # 获取决策树叶结点数目 300 | plotTree.totalD = float(getTreeDepth(inTree)) # 获取决策树层数 301 | plotTree.xOff = -0.5 / plotTree.totalW; 302 | plotTree.yOff = 1.0; # x偏移 303 | plotTree(inTree, (0.5, 1.0), '') # 绘制决策树 304 | plt.show() # 显示绘制结果 305 | 306 | 307 | if __name__ == '__main__': 308 | dataSet, labels = createDataSet() 309 | featLabels = [] 310 | myTree = createTree(dataSet, labels, featLabels) 311 | print(myTree) 312 | createPlot(myTree) 313 | 314 | 315 | -------------------------------------------------------------------------------- /9.随机森林/CART.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | """ 4 | 5 | CART分类树,是一颗二叉树,以某个特征以及该特征对应的一个值为节点,故相对ID3算法,最大的不同就是特征可以使用多次 6 | 7 | """ 8 | 9 | from collections import Counter, defaultdict 10 | 11 | 12 | 13 | import numpy as np 14 | 15 | 16 | 17 | 18 | 19 | class node: 20 | 21 | def __init__(self, fea=-1, val=None, res=None, right=None, left=None): 22 | 23 | self.fea = fea # 特征 24 | 25 | self.val = val # 特征对应的值 26 | 27 | self.res = res # 
叶节点标记 28 | 29 | self.right = right 30 | 31 | self.left = left 32 | 33 | 34 | 35 | 36 | 37 | class CART_CLF: 38 | 39 | def __init__(self, epsilon=1e-3, min_sample=1): 40 | 41 | self.epsilon = epsilon 42 | 43 | self.min_sample = min_sample # 叶节点含有的最少样本数 44 | 45 | self.tree = None 46 | 47 | 48 | 49 | def getGini(self, y_data): 50 | 51 | # 计算基尼指数 52 | 53 | c = Counter(y_data) 54 | 55 | return 1 - sum([(val / y_data.shape[0]) ** 2 for val in c.values()]) 56 | 57 | 58 | 59 | def getFeaGini(self, set1, set2): 60 | 61 | # 计算某个特征及相应的某个特征值组成的切分节点的基尼指数 62 | 63 | num = set1.shape[0] + set2.shape[0] 64 | 65 | return set1.shape[0] / num * self.getGini(set1) + set2.shape[0] / num * self.getGini(set2) 66 | 67 | 68 | 69 | def bestSplit(self, splits_set, X_data, y_data): 70 | 71 | # 返回所有切分点的基尼指数,以字典形式存储。键为split,是一个元组,第一个元素为最优切分特征,第二个为该特征对应的最优切分值 72 | 73 | pre_gini = self.getGini(y_data) 74 | 75 | subdata_inds = defaultdict(list) # 切分点以及相应的样本点的索引 76 | 77 | for split in splits_set: 78 | 79 | for ind, sample in enumerate(X_data): 80 | 81 | if sample[split[0]] == split[1]: 82 | 83 | subdata_inds[split].append(ind) 84 | 85 | min_gini = 1 86 | 87 | best_split = None 88 | 89 | best_set = None 90 | 91 | for split, data_ind in subdata_inds.items(): 92 | 93 | set1 = y_data[data_ind] # 满足切分点的条件,则为左子树 94 | 95 | set2_inds = list(set(range(y_data.shape[0])) - set(data_ind)) 96 | 97 | set2 = y_data[set2_inds] 98 | 99 | if set1.shape[0] < 1 or set2.shape[0] < 1: 100 | 101 | continue 102 | 103 | now_gini = self.getFeaGini(set1, set2) 104 | 105 | if now_gini < min_gini: 106 | 107 | min_gini = now_gini 108 | 109 | best_split = split 110 | 111 | best_set = (data_ind, set2_inds) 112 | 113 | if abs(pre_gini - min_gini) < self.epsilon: # 若切分后基尼指数下降未超过阈值则停止切分 114 | 115 | best_split = None 116 | 117 | return best_split, best_set, min_gini 118 | 119 | 120 | 121 | def buildTree(self, splits_set, X_data, y_data): 122 | 123 | if y_data.shape[0] < self.min_sample: # 数据集小于阈值直接设为叶节点 124 | 125 | return node(res=Counter(y_data).most_common(1)[0][0]) 126 | 127 | best_split, best_set, min_gini = self.bestSplit(splits_set, X_data, y_data) 128 | 129 | if best_split is None: # 基尼指数下降小于阈值,则终止切分,设为叶节点 130 | 131 | return node(res=Counter(y_data).most_common(1)[0][0]) 132 | 133 | else: 134 | 135 | splits_set.remove(best_split) 136 | 137 | left = self.buildTree(splits_set, X_data[best_set[0]], y_data[best_set[0]]) 138 | 139 | right = self.buildTree(splits_set, X_data[best_set[1]], y_data[best_set[1]]) 140 | 141 | return node(fea=best_split[0], val=best_split[1], right=right, left=left) 142 | 143 | 144 | 145 | def fit(self, X_data, y_data): 146 | 147 | # 训练模型,CART分类树与ID3最大的不同是,CART建立的是二叉树,每个节点是特征及其对应的某个值组成的元组 148 | 149 | # 特征可以多次使用 150 | 151 | splits_set = [] 152 | 153 | for fea in range(X_data.shape[1]): 154 | 155 | unique_vals = np.unique(X_data[:, fea]) 156 | 157 | if unique_vals.shape[0] < 2: 158 | 159 | continue 160 | 161 | elif unique_vals.shape[0] == 2: # 若特征取值只有2个,则只有一个切分点,非此即彼 162 | 163 | splits_set.append((fea, unique_vals[0])) 164 | 165 | else: 166 | 167 | for val in unique_vals: 168 | 169 | splits_set.append((fea, val)) 170 | 171 | self.tree = self.buildTree(splits_set, X_data, y_data) 172 | 173 | return 174 | 175 | 176 | 177 | def predict(self, x): 178 | 179 | def helper(x, tree): 180 | 181 | if tree.res is not None: # 表明到达叶节点 182 | 183 | return tree.res 184 | 185 | else: 186 | 187 | if x[tree.fea] == tree.val: # "是" 返回左子树 188 | 189 | branch = tree.left 190 | 191 | else: 192 | 193 | branch = tree.right 194 | 195 | return helper(x, branch) 
196 | 197 | 198 | 199 | return helper(x, self.tree) 200 | 201 | 202 | 203 | def disp_tree(self): 204 | 205 | # 打印树 206 | 207 | self.disp_helper(self.tree) 208 | 209 | return 210 | 211 | 212 | 213 | def disp_helper(self, current_node): 214 | 215 | # 前序遍历 216 | 217 | print(current_node.fea, current_node.val, current_node.res) 218 | 219 | if current_node.res is not None: 220 | 221 | return 222 | 223 | self.disp_helper(current_node.left) 224 | 225 | self.disp_helper(current_node.right) 226 | 227 | return 228 | 229 | 230 | 231 | 232 | 233 | if __name__ == '__main__': 234 | 235 | from sklearn.datasets import load_iris 236 | 237 | 238 | 239 | X_data = load_iris().data 240 | 241 | y_data = load_iris().target 242 | 243 | 244 | 245 | from machine_learning_algorithm.cross_validation import validate 246 | 247 | 248 | 249 | g = validate(X_data, y_data, ratio=0.2) 250 | 251 | for item in g: 252 | 253 | X_data_train, y_data_train, X_data_test, y_data_test = item 254 | 255 | clf = CART_CLF() 256 | 257 | clf.fit(X_data_train, y_data_train) 258 | 259 | score = 0 260 | 261 | for X, y in zip(X_data_test,y_data_test): 262 | 263 | if clf.predict(X) == y: 264 | 265 | score += 1 266 | 267 | print(score / len(y_data_test)) 268 | 269 | 270 | 271 | #原文链接:https://blog.csdn.net/slx_share/article/details/79992846 -------------------------------------------------------------------------------- /9.随机森林/ID3,C4.5,CART理论部分.DOC: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TUFE-I307/Seminar-MachineLearning/c637c55a9e411451709908f512cab44cc79665c3/9.随机森林/ID3,C4.5,CART理论部分.DOC -------------------------------------------------------------------------------- /9.随机森林/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /9.随机森林/随机森林.py: -------------------------------------------------------------------------------- 1 | from numpy import inf 2 | from numpy import zeros 3 | import numpy as np 4 | from sklearn.model_selection import train_test_split 5 | 6 | 7 | # 生成数据集。数据集包括标签,全包含在返回值的dataset上 8 | def get_Datasets(): 9 | from sklearn.datasets import make_classification 10 | dataSet, classLabels = make_classification(n_samples=200, n_features=100, n_classes=2) 11 | return np.concatenate((dataSet, classLabels.reshape((-1, 1))), axis=1) 12 | print(dataSet.shape, classLables.shape) 13 | 14 | ####################步骤一############################## 15 | # 构建n个子集 bootstrap 16 | def get_subsamples(dataSet, n): 17 | subDataSet = [] 18 | for i in range(n): 19 | index = [] # 每次都重新选择k个 索引 20 | for k in range(len(dataSet)): # 长度是k 21 | index.append(np.random.randint(len(dataSet))) # (0,len(dataSet)) 内的一个整数 22 | subDataSet.append(dataSet[index, :]) 23 | return subDataSet 24 | ############################步骤二、三################################ 25 | # 根据某个特征及值对数据进行分类 26 | def binSplitDataSet(dataSet, feature, value): 27 | mat0 = dataSet[np.nonzero(dataSet[:, feature] > value)[0], :] 28 | mat1 = dataSet[np.nonzero(dataSet[:, feature] < value)[0], :] 29 | return mat0, mat1 30 | 31 | 32 | # 计算方差,回归时使用 33 | def regErr(dataSet): 34 | return np.var(dataSet[:, -1]) * np.shape(dataSet)[0] 35 | 36 | 37 | # 计算平均值,回归时使用 38 | def regLeaf(dataSet): 39 | return np.mean(dataSet[:, -1]) 40 | 41 | #将分类结果最多的分类作为最终结果 42 | def MostNumber(dataSet): 43 | # number=set(dataSet[:,-1]) 44 | #np.nonzero函数是numpy中用于得到数组array中非零元素的位置(数组索引)的函数。 45 | len0 = len(np.nonzero(dataSet[:, -1] == 
41 | # Return the majority class as the final result
42 | def MostNumber(dataSet):
43 |     # number=set(dataSet[:,-1])
44 |     # np.nonzero returns the indices of the non-zero elements of an array
45 |     len0 = len(np.nonzero(dataSet[:, -1] == 0)[0])
46 |     len1 = len(np.nonzero(dataSet[:, -1] == 1)[0])
47 |     if len0 > len1:
48 |         return 0
49 |     else:
50 |         return 1
51 | 
52 | 
53 | # Gini index: the probability that a randomly chosen sample in the subset is misclassified,
54 | # i.e. the probability of being chosen times the probability of being mislabeled
55 | def gini(dataSet):
56 |     corr = 0.0
57 |     for i in set(dataSet[:, -1]):  # i is one of the values taken by the label
58 |         corr += (len(np.nonzero(dataSet[:, -1] == i)[0]) / len(dataSet)) ** 2
59 |     return 1 - corr
60 | 
61 | # At each node, pick the best feature/value pair to split on
62 | def select_best_feature(dataSet, m, alpha="huigui"):
63 |     f = dataSet.shape[1] - 1  # number of feature columns (the last column holds the label)
64 |     index = []
65 |     bestS = inf
66 |     bestfeature = 0
67 |     bestValue = 0
68 |     if alpha == "huigui":  # "huigui" = regression, anything else = classification
69 |         S = regErr(dataSet)
70 |     else:
71 |         S = gini(dataSet)
72 | 
73 |     for i in range(m):
74 |         index.append(np.random.randint(f))  # randomly pick m of the f features, then search for a good split among these m
75 | 
76 |     for feature in index:
77 |         for splitVal in set(dataSet[:, feature]):  # set() builds the unique values, i.e. every candidate threshold for this feature
78 |             mat0, mat1 = binSplitDataSet(dataSet, feature, splitVal)
79 |             if alpha == "huigui":
80 |                 newS = regErr(mat0) + regErr(mat1)  # total variance over the two branches
81 |             else:
82 |                 newS = gini(mat0) + gini(mat1)  # Gini index over the two branches
83 |             if bestS > newS:
84 |                 bestfeature = feature
85 |                 bestValue = splitVal
86 |                 bestS = newS
87 |     if (S - bestS) < 0.001 and alpha == "huigui":  # regression: if the variance reduction is tiny, return the branch mean
88 |         return None, regLeaf(dataSet)
89 |     elif (S - bestS) < 0.001:
90 |         return None, MostNumber(dataSet)  # classification: if the impurity reduction is tiny, return the branch's majority class
91 |     # mat0,mat1=binSplitDataSet(dataSet,feature,splitVal)
92 |     return bestfeature, bestValue
93 | 
94 | # Build one decision tree; by default consider 20 features per split, with maximum depth 10
95 | def createTree(dataSet, alpha="huigui", m=20, max_level=10):
96 |     bestfeature, bestValue = select_best_feature(dataSet, m, alpha=alpha)
97 |     if bestfeature is None:
98 |         return bestValue
99 |     retTree = {}
100 |     max_level -= 1
101 |     if max_level < 0:  # depth control
102 |         return regLeaf(dataSet)
103 |     retTree['bestFeature'] = bestfeature
104 |     retTree['bestVal'] = bestValue
105 |     lSet, rSet = binSplitDataSet(dataSet, bestfeature, bestValue)
106 |     # lSet holds the rows sent left by bestfeature, rSet the rows sent right
107 |     retTree['right'] = createTree(rSet, alpha, m, max_level)
108 |     retTree['left'] = createTree(lSet, alpha, m, max_level)  # every tree is binary: each split is two-way
109 |     return retTree
110 | ######################### Step 4 #################################
111 | # Random forest
112 | def RondomForest(dataSet, n, alpha="huigui"):
113 |     Trees = []  # start with an empty collection of trees
114 |         X_train, X_test, y_train, y_test = train_test_split(dataSet[:, :-1], dataSet[:, -1], test_size=0.33)  # a fresh random subsample for each tree (bagging)
115 |         X_train = np.concatenate((X_train, y_train.reshape((-1, 1))), axis=1)
116 |         Trees.append(createTree(X_train, alpha=alpha))
117 |     return Trees  # a list of trained trees
118 | 
119 | 
120 | ###################################################################
121 | 
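# Illustrative sketch (not from the source script): on a toy dataset whose label
# depends only on feature 0, the nested-dict tree built above is easy to read.
# Wrapped in a function so it only runs when called explicitly.
def _demo_tree():
    X = np.array([[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.4, 2.0], [0.45, 3.0],
                  [0.6, 5.0], [0.7, 1.0], [0.8, 4.0], [0.9, 2.0], [0.95, 3.0]])
    y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]).reshape(-1, 1)
    tree = createTree(np.concatenate((X, y), axis=1), alpha="fenlei", m=2, max_level=3)
    print(tree)  # e.g. {'bestFeature': 0, 'bestVal': 0.45, 'right': 0, 'left': 1}; details vary run to run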
122 | # Predict a single sample: walk one trained tree, for regression or classification
123 | def treeForecast(trees, data, alpha="huigui"):
124 |     if alpha == "huigui":
125 |         if not isinstance(trees, dict):  # isinstance() checks whether an object has a given type; a leaf is a bare number
126 |             return float(trees)
127 | 
128 |         if data[trees['bestFeature']] > trees['bestVal']:  # feature above the threshold: take the left branch
129 |             if not isinstance(trees['left'], dict):  # the left branch is already a leaf: return its value
130 |                 return trees['left']
131 |             else:
132 |                 return treeForecast(trees['left'], data, alpha)  # still a dict: keep choosing branches with this subtree's feature and threshold
133 |         else:
134 |             if not isinstance(trees['right'], dict):
135 |                 return trees['right']
136 |             else:
137 |                 return treeForecast(trees['right'], data, alpha)
138 |     else:
139 |         if not isinstance(trees, dict):  # classification works the same way
140 |             return int(trees)
141 | 
142 |         if data[trees['bestFeature']] > trees['bestVal']:
143 |             if not isinstance(trees['left'], dict):
144 |                 return trees['left']
145 |             else:
146 |                 return treeForecast(trees['left'], data, alpha)
147 |         else:
148 |             if not isinstance(trees['right'], dict):
149 |                 return trees['right']
150 |             else:
151 |                 return treeForecast(trees['right'], data, alpha)
152 | 
153 | # Predict a whole dataset: loop treeForecast over its rows
154 | def createForeCast(trees, test_dataSet, alpha="huigui"):
155 |     cm = len(test_dataSet)
156 |     yhat = np.mat(zeros((cm, 1)))
157 |     for i in range(cm):
158 |         yhat[i, 0] = treeForecast(trees, test_dataSet[i, :], alpha)
159 |     return yhat
160 | 
161 | 
162 | # Random forest prediction
163 | def predictTree(Trees, test_dataSet, alpha="huigui"):  # Trees is the trained forest
164 |     cm = len(test_dataSet)
165 |     yhat = np.mat(zeros((cm, 1)))
166 |     for trees in Trees:
167 |         yhat += createForeCast(trees, test_dataSet, alpha)  # accumulate each tree's predictions
168 |     if alpha == "huigui":
169 |         yhat /= len(Trees)  # regression: each tree returns a value, so average the sum
170 |     else:
171 |         for i in range(len(yhat)):  # classification: the accumulated sum is a vote count per sample,
172 |             # so predict 1 whenever more than half of the trees voted 1
173 |             if yhat[i, 0] > len(Trees) / 2:
174 |                 yhat[i, 0] = 1
175 |             else:
176 |                 yhat[i, 0] = 0
177 |     return yhat
178 | 
179 | ##### Run the functions above ######
180 | if __name__ == '__main__':
181 |     dataSet = get_Datasets()
182 |     print(dataSet[:, -1].T)  # print the labels for comparison with the predictions below; .T is just the transpose
183 |     RomdomTrees = RondomForest(dataSet, 4, alpha="fenlei")  # train the trees that make up the forest ("fenlei" = classification)
184 |     print("---------------------RomdomTrees------------------------")
185 |     test_dataSet = dataSet  # reuse the data (features plus labels) as the test set
186 |     yhat = predictTree(RomdomTrees, test_dataSet, alpha="fenlei")  # query the trained trees and combine their votes
187 |     print(yhat.T)
188 |     print(dataSet[:, -1].T - yhat.T)  # zero entries mark correct predictions
189 | 
190 | # Source: https://blog.csdn.net/qq_40514570/article/details/92720952
--------------------------------------------------------------------------------
/9.随机森林/随机森林理论部分.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TUFE-I307/Seminar-MachineLearning/c637c55a9e411451709908f512cab44cc79665c3/9.随机森林/随机森林理论部分.docx
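A quick way to sanity-check 随机森林.py above is to run it next to scikit-learn's reference implementation on the same generated data. A minimal sketch follows (illustrative only: the import line assumes the script has been saved under an importable name such as random_forest.py, and both models are scored on their own training data, so the numbers are optimistic):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from random_forest import get_Datasets, RondomForest, predictTree  # hypothetical module name for the script above

dataSet = get_Datasets()
X, y = dataSet[:, :-1], dataSet[:, -1]

# The hand-rolled forest: 4 trees, classification mode ("fenlei")
ours = predictTree(RondomForest(dataSet, 4, alpha="fenlei"), dataSet, alpha="fenlei")
print("hand-rolled forest, training accuracy:", float((np.asarray(ours).ravel() == y).mean()))

# scikit-learn's forest on the same features/labels
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
print("scikit-learn forest, training accuracy:", clf.score(X, y))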
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Machine Learning Seminar
2 | 
3 | [![Advisor](https://img.shields.io/badge/%E6%8C%87%E5%AF%BC%E6%95%99%E5%B8%88-%E5%87%A4%E4%B8%BD%E6%B4%B2-blue)](http://tongji.tjufe.edu.cn/info/1069/1217.htm)
4 | ## Schedule
5 | Topic | Presenter | Date
6 | :----: | :----: | :----:
7 | [3. Decision Tree](https://github.com/QinY-Stat/Seminar-MachineLearning/tree/master/3.%E5%86%B3%E7%AD%96%E6%A0%91) | [蔡承真](https://github.com/ccz-123) | ~~2019-10-16~~
8 | [4. Naive Bayes](https://github.com/TUFE-I307/Seminar-MachineLearning/tree/master/4.%E6%9C%B4%E7%B4%A0%E8%B4%9D%E5%8F%B6%E6%96%AF) | [韩琳琳](https://github.com/SA5233) | ~~2019-10-16~~
9 | [5. Logistic Regression](https://github.com/TUFE-I307/Seminar-MachineLearning/tree/master/5.%E9%80%BB%E8%BE%91%E5%9B%9E%E5%BD%92) | [贾晓磊](https://github.com/dexterlee1993) | ~~2019-10-31~~
10 | [6. Support Vector Machine](https://github.com/TUFE-I307/Seminar-MachineLearning/tree/master/6.%E6%94%AF%E6%8C%81%E5%90%91%E9%87%8F%E6%9C%BA) | [郭嘉琪](https://github.com/ordinary-precious)、[韩琳琳](https://github.com/SA5233) | ~~2019-10-23~~
11 | [7. Adaboost](https://github.com/TUFE-I307/Seminar-MachineLearning/tree/master/7.Adaboost) | [李嘉](https://github.com/lijia2019310)、[范莹莹](https://github.com/Nicefyy) | ~~2019-11-07~~
12 | [8. Regression](https://github.com/QinY-Stat/Seminar-MachineLearning/tree/master/8.%E5%9B%9E%E5%BD%92) | [韩琳琳](https://github.com/SA5233) | ~~2019-11-15~~
13 | [9. Random Forest](https://github.com/TUFE-I307/Seminar-MachineLearning/tree/master/9.%E9%9A%8F%E6%9C%BA%E6%A3%AE%E6%9E%97) | [刘萌萌](https://github.com/Mengmengliu6)、[王禹童](https://github.com/wangyutong-97) | ~~2019-11-21~~
14 | 10. Apriori | [郭嘉琪](https://github.com/ordinary-precious) | **2019-11-28**
15 | 11. EM | [蔡承真](https://github.com/ccz-123) | 2019-12-05
16 | [[Extra] Spectral Clustering](https://github.com/TUFE-I307/Seminar-MachineLearning/tree/master/%E8%B0%B1%E8%81%9A%E7%B1%BB) | [覃悦](https://github.com/QinY-Stat) | ~~2019-10-24~~
17 | 
18 | ## Learning Resources
19 | Classic study materials in the field of machine learning.
20 | All of them can be found online, but newcomers often feel lost, and every book has its own emphasis,
21 | so the materials I found most helpful during my own study are collected here, in the hope that they help the students who come after.
22 | **Everyone is welcome to keep adding materials they find valuable: those before us plant the trees, those after us enjoy the shade.**
23 | 
24 | Type | Title | Author | Notes
25 | :----: | :----: | :----: | :----: |
26 | Book | [《统计学习方法》](https://github.com/QinY-Stat/Seminar-MachineLearning/blob/master/%E5%AD%A6%E4%B9%A0%E8%B5%84%E6%96%99/%E7%BB%9F%E8%AE%A1%E5%AD%A6%E4%B9%A0%E6%96%B9%E6%B3%95(%E7%AC%AC1%E7%89%88).pdf) | 李航 | Classic machine learning text in Chinese
27 | Book | 《机器学习》| 周志华 | The "watermelon book"; classic machine learning text in Chinese
28 | Book | [《Pattern Recognition and Machine Learning》](https://github.com/QinY-Stat/Seminar-MachineLearning/blob/master/%E5%AD%A6%E4%B9%A0%E8%B5%84%E6%96%99/Pattern%20Recognition%20and%20Machine%20Learning.pdf) | Christopher M. Bishop | PRML, the machine learning bible; for advanced readers
29 | Book | [《The Elements of Statistical Learning》](https://github.com/QinY-Stat/Seminar-MachineLearning/blob/master/%E5%AD%A6%E4%B9%A0%E8%B5%84%E6%96%99/The%20Elements%20of%20Statistical%20Learning(2nd).pdf) | Trevor Hastie, et al. | For advanced readers
30 | Mooc | [深度学习工程师](https://mooc.study.163.com/smartSpec/detail/1001319001.htm) | Andrew NG | Covers neural-network basics plus CNNs and RNNs
31 | Mooc | [人工智能实践:Tensorflow笔记](https://www.icourse163.org/course/PKU-1002536002) | 曹健 | An excellent course for learning TensorFlow
--------------------------------------------------------------------------------
/学习资料/Pattern Recognition and Machine Learning.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TUFE-I307/Seminar-MachineLearning/c637c55a9e411451709908f512cab44cc79665c3/学习资料/Pattern Recognition and Machine Learning.pdf
--------------------------------------------------------------------------------
/学习资料/The Elements of Statistical Learning(2nd).pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TUFE-I307/Seminar-MachineLearning/c637c55a9e411451709908f512cab44cc79665c3/学习资料/The Elements of Statistical Learning(2nd).pdf
--------------------------------------------------------------------------------
/学习资料/统计学习方法(第1版).pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TUFE-I307/Seminar-MachineLearning/c637c55a9e411451709908f512cab44cc79665c3/学习资料/统计学习方法(第1版).pdf
--------------------------------------------------------------------------------
/谱聚类/README.md:
--------------------------------------------------------------------------------
1 | # Spectral Clustering
2 | + In complex and social networks, spectral clustering is an unsupervised community-detection method: given the number of communities k in advance, the eigenvectors corresponding to the k smallest eigenvalues of the network's Laplacian matrix are used as input to the K-MEANS algorithm for clustering.
3 | + This note covers only the steps of the algorithm and a derivation of its validity when the number of communities is 2.
4 | ## 1. Algorithm Steps
5 | **Algorithm : Spectral Clustering**
6 | **Input : network G=(V,E), number of communities k**
7 | **Output : community partition**
8 | 1. Compute the adjacency matrix A and the degree matrix D of G;
9 | 2. Compute the Laplacian matrix L=D-A of G;
10 | 3. Compute the normalized Laplacian ![](http://latex.codecogs.com/gif.latex?L{}'=D^{-1/2}LD^{-1/2});
11 | 4. Compute the eigenvalues and eigenvectors of ![](http://latex.codecogs.com/gif.latex?L{}') and sort them by eigenvalue;
12 | 5. Take the eigenvectors of the k smallest non-zero eigenvalues of ![](http://latex.codecogs.com/gif.latex?L{}'), written X=[v1, v2, ..., vk];
13 | 6. Feed X into the K-MEANS algorithm;
14 | 7. The k clusters returned by K-MEANS are the spectral clustering result.
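The steps above map almost line for line onto code. A minimal sketch follows (illustrative only: it assumes `networkx`, `numpy` and `scikit-learn` are installed, and it keeps the first k eigenvectors including the trivial constant one; `SpectralClustering.py` in this folder is the version used in the seminar):

```python
import networkx as nx
import numpy as np
from sklearn.cluster import KMeans

def spectral_partition(G, k):
    A = nx.to_numpy_array(G)                    # step 1: adjacency matrix
    d = A.sum(axis=1)                           # step 1: node degrees
    L = np.diag(d) - A                          # step 2: L = D - A
    Dn = np.diag(1.0 / np.sqrt(d))              # assumes no isolated nodes
    Ln = Dn @ L @ Dn                            # step 3: normalized Laplacian
    vals, vecs = np.linalg.eigh(Ln)             # step 4: eigh returns ascending eigenvalues
    X = vecs[:, :k]                             # step 5: first k eigenvectors
    return KMeans(n_clusters=k).fit_predict(X)  # steps 6-7: cluster the rows of X
```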
15 | ## 2. Why the Algorithm Works (k=2)
16 | The steps above show that, when the number of communities is 2, spectral clustering amounts to classifying the nodes by the signs of the entries of the eigenvector associated with the second-smallest eigenvalue of the normalized Laplacian ![](http://latex.codecogs.com/gif.latex?L{}') (**also known as the Fiedler vector**).
17 | Two questions naturally arise:
18 | **1. Why use the Fiedler vector?**
19 | **2. Why is splitting the nodes by the signs of its entries a valid partition scheme?**
20 | First, a word on graph-partition criteria: several common criteria are listed [here](https://wenku.baidu.com/view/549bfe7a66ec102de2bd960590c69ec3d4bbdb46.html); below we carry out the derivation for the ratio-cut criterion only.
21 | Suppose the network (graph) G=(V,E) is divided into k disjoint subgraphs ![](http://latex.codecogs.com/gif.latex?C_{1}), ![](http://latex.codecogs.com/gif.latex?C_{2}), ..., ![](http://latex.codecogs.com/gif.latex?C_{k}); these subgraphs are called the **communities** of the network. Edges joining nodes in different communities are called **bridges**, and the number of bridges between ![](http://latex.codecogs.com/gif.latex?C_{i}) and ![](http://latex.codecogs.com/gif.latex?C_{j}) is written
22 | ![](http://latex.codecogs.com/gif.latex?W(C_{i},C_{j})=\sum_{i\in{C_{i}},j\in{C_{j}}}a_{ij})
23 | so the total number of bridges in G is
24 | ![](http://latex.codecogs.com/gif.latex?bridge(C_{1},C_{2},...,C_{k})=\frac{1}{2}\sum_{i=1}^{k}W(C_{i},\bar{C_{i}}))
25 | where ![](http://latex.codecogs.com/gif.latex?\bar{C_{i}}) denotes the complement of ![](http://latex.codecogs.com/gif.latex?C_{i}), i.e.
26 | ![](http://latex.codecogs.com/gif.latex?C_{i}\cup\bar{C_{i}}=G)
27 | Since only the case k=2 is considered here, ![](http://latex.codecogs.com/gif.latex?bridge(C_{1},C_{2})=W(C_{1},C_{2})).
28 | From the Laplacian L=D-A of G we have, for ![](http://latex.codecogs.com/gif.latex?\forall\textbf{x}\in\textbf{R}^{n}),
29 | ![](http://latex.codecogs.com/gif.latex?\textbf{x}^{T}\textbf{L}\textbf{x}=\textbf{x}^{T}(\textbf{D}-\textbf{A})\textbf{x}=\textbf{x}^{T}\textbf{D}\textbf{x}-\textbf{x}^{T}\textbf{A}\textbf{x}=\sum_{i=1}^{n}d_{i}x_{i}^{2}-\sum_{i,j=1}^{n}a_{ij}x_{i}x_{j})
30 | ![](http://latex.codecogs.com/gif.latex?=\frac{1}{2}\left[\sum_{i=1}^{n}d_{i}x_{i}^{2}-2\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_{i}x_{j}+\sum_{i=1}^{n}d_{i}x_{i}^{2}\right])
31 | ![](http://latex.codecogs.com/gif.latex?=\frac{1}{2}\left[\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_{i}^{2}-2\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_{i}x_{j}+\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_{j}^{2}\right])
32 | ![](http://latex.codecogs.com/gif.latex?=\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}(x_{i}-x_{j})^{2})
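This identity is easy to verify numerically. A minimal check (illustrative; assumes `numpy`):

```python
import numpy as np

# Tiny graph: a triangle {0,1,2} plus a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
x = np.array([0.3, -1.2, 0.7, 2.0])

lhs = x @ L @ x
rhs = 0.5 * sum(A[i, j] * (x[i] - x[j]) ** 2
                for i in range(4) for j in range(4))
print(np.isclose(lhs, rhs))  # True
```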
33 | Now assume G really does admit an optimal bipartition. If the bipartition is encoded as a vector, nodes belonging to the same community should receive entries of equal value and sign. Under the ratio-cut criterion, take it to be a column vector whose entries are
34 | if ![](http://latex.codecogs.com/gif.latex?i\in)![](http://latex.codecogs.com/gif.latex?C_{1}), then ![](http://latex.codecogs.com/gif.latex?f_{i}=\sqrt{\frac{\left|C_{2}\right|}{\left|C_{1}\right|}}); if ![](http://latex.codecogs.com/gif.latex?i\in)![](http://latex.codecogs.com/gif.latex?C_{2}), then ![](http://latex.codecogs.com/gif.latex?f_{i}=-\sqrt{\frac{\left|C_{1}\right|}{\left|C_{2}\right|}}).
35 | Substituting into the identity above gives
36 | ![](http://latex.codecogs.com/gif.latex?f^{T}\textbf{L}f=\frac{1}{2}\sum_{i,j=1}^{n}a_{ij}(f_{i}-f_{j})^{2})
37 | ![](http://latex.codecogs.com/gif.latex?=\frac{1}{2}\sum_{C_{1},C_{2}}a_{ij}(f_{i}-f_{j})^{2}+\frac{1}{2}\sum_{C_{2},C_{1}}a_{ij}(f_{i}-f_{j})^{2})
38 | ![](http://latex.codecogs.com/gif.latex?=\frac{1}{2}\sum_{C_{1},C_{2}}a_{ij}\left(\sqrt{\frac{\left|C_{2}\right|}{\left|C_{1}\right|}}+\sqrt{\frac{\left|C_{1}\right|}{\left|C_{2}\right|}}\right)^{2}+\frac{1}{2}\sum_{C_{2},C_{1}}a_{ij}\left(-\sqrt{\frac{\left|C_{1}\right|}{\left|C_{2}\right|}}-\sqrt{\frac{\left|C_{2}\right|}{\left|C_{1}\right|}}\right)^{2})
39 | ![](http://latex.codecogs.com/gif.latex?=\sum_{C_{1},C_{2}}a_{ij}\left(\frac{\left|C_{2}\right|}{\left|C_{1}\right|}+\frac{\left|C_{1}\right|}{\left|C_{2}\right|}+2\right))
40 | ![](http://latex.codecogs.com/gif.latex?=\sum_{C_{1},C_{2}}a_{ij}\left(\frac{\left|C_{2}\right|+\left|C_{1}\right|}{\left|C_{1}\right|}+\frac{\left|C_{1}\right|+\left|C_{2}\right|}{\left|C_{2}\right|}\right))
41 | ![](http://latex.codecogs.com/gif.latex?=\left(\left|C_{1}\right|+\left|C_{2}\right|\right)\left(\frac{1}{\left|C_{1}\right|}+\frac{1}{\left|C_{2}\right|}\right)\sum_{C_{1},C_{2}}a_{ij})
42 | ![](http://latex.codecogs.com/gif.latex?=n\left(\frac{1}{\left|C_{1}\right|}+\frac{1}{\left|C_{2}\right|}\right)bridge(C_{1},C_{2}))
43 | where ![](http://latex.codecogs.com/gif.latex?\sum_{C_{1},C_{2}}a_{ij}) means that ![](http://latex.codecogs.com/gif.latex?i) ranges over ![](http://latex.codecogs.com/gif.latex?C_{1}) and ![](http://latex.codecogs.com/gif.latex?j) over ![](http://latex.codecogs.com/gif.latex?C_{2}). Note that ![](http://latex.codecogs.com/gif.latex?f) has two properties:
44 | **Property 1:** ![](http://latex.codecogs.com/gif.latex?\sum_{i=1}^{n}f_{i}=\sum_{C_{1}}f_{i}+\sum_{C_{2}}f_{i}=\left|C_{1}\right|\cdot\sqrt{\frac{\left|C_{2}\right|}{\left|C_{1}\right|}}-\left|C_{2}\right|\cdot\sqrt{\frac{\left|C_{1}\right|}{\left|C_{2}\right|}}=0)
45 | **Property 2:** ![](http://latex.codecogs.com/gif.latex?\left\|f\right\|_{2}^{2}=\sum_{i=1}^{n}f_{i}^{2}=\sum_{C_{1}}\frac{\left|C_{2}\right|}{\left|C_{1}\right|}+\sum_{C_{2}}\frac{\left|C_{1}\right|}{\left|C_{2}\right|}=\left|C_{2}\right|+\left|C_{1}\right|=n)
46 | that is, ![](http://latex.codecogs.com/gif.latex?f^{T}\cdot\textbf{1}=0) and ![](http://latex.codecogs.com/gif.latex?\left\|f\right\|_{2}^{2}=n).
47 | By the definition of **community structure in complex networks** ("nodes are densely connected within a community and sparsely connected between communities"), a good partition should satisfy:
48 | 1) the optimal community structure should make ![](http://latex.codecogs.com/gif.latex?bridge(C_{1},C_{2})) as small as possible;
49 | 2) to avoid tiny communities containing only a single node, the sizes of the two communities (![](http://latex.codecogs.com/gif.latex?\left|C_{1}\right|) and ![](http://latex.codecogs.com/gif.latex?\left|C_{2}\right|)) should not differ too much, i.e. ![](http://latex.codecogs.com/gif.latex?\frac{1}{\left|C_{1}\right|}+\frac{1}{\left|C_{2}\right|}) should be as small as possible.
50 | In summary, finding the optimal bipartition means finding the vector ![](http://latex.codecogs.com/gif.latex?f=\arg\min{f^{T}\textbf{L}f}) subject to ![](http://latex.codecogs.com/gif.latex?f^{T}\cdot\textbf{1}=0) and ![](http://latex.codecogs.com/gif.latex?\left\|f\right\|_{2}^{2}=n).
51 | Since the Laplacian ![](http://latex.codecogs.com/gif.latex?\textbf{L}) of a connected graph G is positive semidefinite with ![](http://latex.codecogs.com/gif.latex?r(\textbf{L})=n-1) and ![](http://latex.codecogs.com/gif.latex?\textbf{L1}=0), while the all-ones vector ![](http://latex.codecogs.com/gif.latex?\textbf{1}=(1,1,...,1)) is excluded by the constraint ![](http://latex.codecogs.com/gif.latex?f^{T}\cdot\textbf{1}=0), the minimizer ![](http://latex.codecogs.com/gif.latex?f) is the eigenvector associated with the second-smallest eigenvalue of ![](http://latex.codecogs.com/gif.latex?\textbf{L}): the **Fiedler vector**.
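This conclusion can also be seen numerically. In the sketch below (illustrative; assumes `numpy`), two triangles joined by a single bridge are separated exactly by the signs of the Fiedler vector, here computed from the unnormalized Laplacian used in the derivation:

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the single bridge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
fiedler = vecs[:, 1]            # eigenvector of the second-smallest eigenvalue
print(np.sign(fiedler))         # one sign on {0,1,2}, the opposite on {3,4,5}
```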
52 | ## Reference
53 | [von Luxburg U. A tutorial on spectral clustering. Statistics and Computing, 2007, 17(4): 395-416.](https://arxiv.org/abs/0711.0189)
--------------------------------------------------------------------------------
/谱聚类/SpectralClustering.py:
--------------------------------------------------------------------------------
1 | import networkx as nx
2 | import numpy as np
3 | from sklearn.cluster import KMeans
4 | import scipy.linalg as linalg
5 | 
6 | 
7 | def partition(G, k, normalized=False):
8 |     A = nx.to_numpy_array(G)
9 |     D = degree_matrix(G)
10 |     L = D - A
11 |     Dn = np.power(np.linalg.matrix_power(D, -1), 0.5)  # D^(-1/2) for the diagonal matrix D
12 |     L = np.dot(np.dot(Dn, L), Dn)  # normalized Laplacian L' = D^(-1/2) L D^(-1/2)
13 |     if normalized:
14 |         pass  # unused flag: L is already normalized above
15 |     eigvals, eigvecs = linalg.eigh(L)  # L is symmetric, so eigh gives real eigenvalues in ascending order
16 |     n = len(eigvals)
17 | 
18 |     dict_eigvals = dict(zip(eigvals, range(0, n)))
19 |     k_eigvals = np.sort(eigvals)[0:k]
20 |     eigval_indexs = [dict_eigvals[v] for v in k_eigvals]  # column positions of the k smallest eigenvalues
21 |     k_eigvecs = eigvecs[:, eigval_indexs]
22 |     result = KMeans(n_clusters=k).fit_predict(k_eigvecs)
23 |     return result
24 | 
25 | 
26 | def degree_matrix(G):
27 |     n = G.number_of_nodes()
28 |     V = [node for node in G.nodes()]
29 |     D = np.zeros((n, n))
30 |     for i in range(n):
31 |         node = V[i]
32 |         d_node = G.degree(node)
33 |         D[i][i] = d_node
34 |     return np.array(D)
35 | 
36 | 
37 | if __name__ == '__main__':
38 |     G = nx.Graph()
39 |     node_list = [i for i in range(1, 17)]
40 |     edge_list = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (3, 6), (4, 5),
41 |                  (7, 8), (7, 9), (7, 10), (7, 11), (8, 9), (8, 10), (9, 10),
42 |                  (12, 13), (12, 14), (12, 15), (12, 16), (13, 14), (13, 15), (13, 16), (14, 15), (14, 16), (15, 16),
43 |                  (5, 7), (1, 8), (1, 12), (2, 13), (7, 13), (8, 12)]
44 |     G.add_nodes_from(node_list)
45 |     G.add_edges_from(edge_list)
46 | 
47 |     k = 3
48 |     res = partition(G, k)
--------------------------------------------------------------------------------
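A usage sketch for SpectralClustering.py (illustrative only: the import assumes the interpreter is started inside the 谱聚类 folder, and the cluster labels from the two implementations agree only up to a relabeling):

import networkx as nx
from sklearn.cluster import SpectralClustering
from SpectralClustering import partition  # the script above; its file name shadows the sklearn class name, hence the separate class import

G = nx.karate_club_graph()  # a standard small test graph with two well-known communities
print(partition(G, 2))

sk = SpectralClustering(n_clusters=2, affinity='precomputed')
print(sk.fit_predict(nx.to_numpy_array(G)))  # compare with the labels above, up to permutation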