├── README.md
├── kNN求绝对和相对密度.py
├── 任务打包+价格修改+测试.py
├── 决策树-深度为6.png
├── 决策树分类.py
├── 原始信息.xlsx
├── 多元线性回归.py
├── 测试最佳价格方案.py
├── 用于决策树与回归的密度数据.xlsx
├── 第三问编程数据.xlsx
└── 第四问编程数据.xlsx


/README.md:
--------------------------------------------------------------------------------
# CUMCMCoding 2017

## I. How to Run

1. Environment:
    + macOS Sierra 10.12.5
    + Python 3.6.2
    + IPython 6.2.0
    + Anaconda Navigator 1.6.4, spyder 3.2.3
1. Third-party libraries (install them with `pip install` first):
    + matplotlib (2.0.2)
    + scikit-learn (0.19.0)
    + graphviz (0.8)
    + pydot (1.2.3)
    + xlrd (1.1.0)
    + xlwt (1.3.0)
1. The programs read their data through relative paths, so change into this directory before running the code from the command line.
1. graphviz may need to be installed with brew, or its path problem solved some other way.

## II. Notes on the .py Source Files
1. General notes:
    + Each program contains a timing module and prints its run time when it finishes.
    + For long-running programs, the approximate run time is given in a comment at the top of the code. It depends on the environment.
1. ***kNN求绝对和相对密度.py***
    + Reads **原始信息.xlsx** and writes the result to an `.xls` file in this directory; that file contains two sheets.
    + In `main()`, `oldOrNew = 0` processes the completed tasks and `oldOrNew = 1` processes the new tasks, so set it before running.
1. ***决策树分类.py***
    + Reads **用于决策树与回归的密度数据.xlsx** and writes an intermediate file and **DecideTree.pdf** to this directory; the PDF is the visualization of the decision tree.
    + The tree can be grown to the default maximum depth (no limit, i.e. no pruning) or capped at a chosen depth; see the corresponding comment in `main()`.
1. ***多元线性回归.py***
    + Reads **用于决策树与回归的密度数据.xlsx**.
    + Prints the regression coefficients, plus the mean squared error (MSE) and root mean squared error (RMSE) of 10-fold cross-validation.
1. ***测试最佳价格方案.py***
    + Reads **用于决策树与回归的密度数据.xlsx**.
    + There are two ways to test: supply a file for the program to read and test, or compute prices from the results of the earlier questions and test those. The second is the default; it enumerates coefficients to find the best ones.
    + To test with file data only, the file just needs the same format as the input data file; then switch to the commented-out code in `main()`.
1. ***任务打包+价格修改+测试.py***
    + Reads **第三问编程数据.xlsx** or **第四问编程数据.xlsx**.
    + In `main()`, `oldOrNew = 0` processes the completed tasks and `oldOrNew = 1` processes the new tasks; a different xlsx file is loaded for each, so set it before running.
    + With `oldOrNew = 1`, to see the test result without task bundling, comment out `kNNUpdatePrice(bigTable, k, p/10)` in `main()`. The program still prints several results, but they no longer depend on p, so they are all the same.

## III. Notes on the xlsx Files
1. ***原始信息.xlsx*** is a condensed version of Attachments 1~3; it contains three corresponding sheets.
1. ***用于决策树与回归的密度数据.xlsx*** is compiled from the data produced by running **kNN求绝对和相对密度.py**.
1. ***用于决策树与回归的密度数据.xlsx*** is used to build the decision tree and fit the multiple linear regression; it is compiled from the file in item 2.
1. ***第三问编程数据.xlsx*** and ***第四问编程数据.xlsx*** are likewise compiled from **原始信息.xlsx** and **用于决策树与回归的密度数据.xlsx**.

## IV. Log
1. 9.17: first finished on the evening the contest ended and converted to `.html` with [MaHua](https://github.com/jserme/mahua); the submitted version was the `.html` document. The repository was later deleted out of concern about plagiarism checks.
2. 10.24: the provincial results are out; regrettably a provincial second prize. Reorganized back into a `.md` file and committed.

## V. References
>In no particular order
1. [python操作Excel读写--使用xlrd - lhj588 - 博客园](http://www.cnblogs.com/lhj588/archive/2012/01/06/2314181.html)
1. [Python--matplotlib绘图可视化知识点整理 - 潘凌昀的兴趣技术杂货铺 - CSDN博客](http://blog.csdn.net/panda1234lee/article/details/52311593)
1. [python中NumPy和Pandas工具包中的函数使用笔记(方便自己查找) - baoyan2015的博客 - CSDN博客](http://blog.csdn.net/baoyan2015/article/details/53503073)
1. [python中的sum函数.sum(axis=1) - yyxayz - 博客园](http://www.cnblogs.com/yyxayz/p/4033736.html)
1. [python实现根据两点经纬度计算实际距离 - TH_NUM的博客 - CSDN博客](http://blog.csdn.net/TH_NUM/article/details/51841052)
1. [Python 字典 列表 嵌套 复杂排序大全 - 木木_Ray的专栏 - CSDN博客](http://blog.csdn.net/ray_up/article/details/42084863)
1. [用Python实现K-近邻算法 - Python - 伯乐在线](http://python.jobbole.com/83794/)
1. [Python中numpy模块的tile()方法简单说明 - wy的点滴 - CSDN博客](http://blog.csdn.net/wy250229163/article/details/52453201)
1. [Python数据分析与挖掘实战--读书笔记 - 简书](http://www.jianshu.com/p/597dfcc3b448)
1. ["RuntimeError: Make sure the Graphviz executables are on your system's path" after installing Graphviz 2.38 | Stackoverflow Help | Query Starter](https://www.questarter.com/q/-quot-runtimeerror-make-sure-the-graphviz-executables-are-on-your-system-39-s-path-quot-after-installing-graphviz-2-38-27_35064304.html)
1. [Numpy and Scipy Documentation — Numpy and Scipy documentation](https://docs.scipy.org/doc/)
1. [1.10. Decision Trees — scikit-learn 0.19.0 documentation](http://scikit-learn.org/stable/modules/tree.html#tree-classification)
1. [sklearn.linear_model.LogisticRegression — scikit-learn 0.19.0 documentation](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)
1. [用scikit-learn和pandas学习线性回归 - 刘建平Pinard - 博客园](http://www.cnblogs.com/pinard/p/6016029.html)
1. [Sklearn-train_test_split随机划分训练集和测试集 - Cherzhoucheer的博客 - CSDN博客](http://blog.csdn.net/cherdw/article/details/54881167)
1. [使用graphviz画关系图](http://freewind.in/posts/1745-use-graphviz-to-draw-relationship/)
1. [详细记录python的range()函数用法 - xxd - 博客园](http://www.cnblogs.com/buro79xxd/archive/2011/05/23/2054493.html)
1. [Python如何克隆或复制列表(list)? - 共享笔记](https://gxnotes.com/article/8850.html)
1. [python - Can't catch mocked exception because it doesn't inherit BaseException - Stack Overflow](https://stackoverflow.com/questions/31713054/cant-catch-mocked-exception-because-it-doesnt-inherit-baseexception)
1. [Python补充05 字符串格式化 (%操作符) - Vamei - 博客园](http://www.cnblogs.com/vamei/archive/2013/03/12/2954938.html)
1. [MaHua 在线markdown编辑器](http://mahua.jser.me/)
1. [Python--matplotlib绘图可视化知识点整理 - 止战 - 博客园](http://www.cnblogs.com/zhizhan/p/5615947.html)

--------------------------------------------------------------------------------
/kNN求绝对和相对密度.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import numpy as np
import xlrd
import xlwt
import time
from math import *

'''
Approximate run time:
    completed tasks: 132.96701199999995s
    new tasks: 223.91732699999997s
'''

'''Compute the real distance in km between two points from their latitude and longitude'''
def CulDisFromLL(pointA, pointB):
    radlat1 = radians(pointA[0])
    radlat2 = radians(pointB[0])
    a = radlat1-radlat2
    b = radians(pointA[1]) - radians(pointB[1])
    s = 2 * np.arcsin(sqrt(pow(np.sin(a/2),2) + cos(radlat1) * cos(radlat2) * pow(sin(b/2),2)))
    earth_radius=6378.137
    s = s * earth_radius
    if(s < 0):
        return -s
    else:
        return s

'''
Read the coordinates from the file
oldOrNew: 0 = completed tasks (sheet '已完成任务'), 1 = new tasks (sheet '新任务')
'''
def ReadFileLocation(fileName, oldOrNew):
    data = xlrd.open_workbook(fileName)
    # task sheet
    if(oldOrNew == 0):
        tableA = data.sheet_by_name('已完成任务')
    else:
        tableA = data.sheet_by_name('新任务')
    WeiList = tableA.col_values(1) # latitude
    JingList = tableA.col_values(2) # longitude
    nrows = tableA.nrows
    APointList = []
    for i in range(1, nrows): # row 0 is the header
        tempX = float(WeiList[i])
        tempY = float(JingList[i])
        APointList.append([tempX, tempY])

    # member information sheet ('会员信息')
    tableB = data.sheet_by_name('会员信息')
    WeiList = tableB.col_values(1) # latitude
    JingList = tableB.col_values(2) # longitude
    nrows = tableB.nrows
    BPointList = []
    for i in range(1, nrows):
        tempX = float(WeiList[i])
        tempY = float(JingList[i])
        BPointList.append([tempX, tempY])
    return np.array(APointList), np.array(BPointList)

'''
Read an attribute column from the file
weightB selects the extra attribute in the member information sheet,
i.e. the attribute used as the frequency measure in kNN:
    3 - task quota
    4 - task start time
    5 - reputation
'''
def ReadFileWeight(fileName, weightB):
    data = xlrd.open_workbook(fileName)
    # member information sheet ('会员信息')
    tableB = data.sheet_by_name('会员信息')
    listWeight = tableB.col_values(weightB)[1:] # selected attribute, header dropped
    return np.array(listWeight)

'''kNN whose default frequency measure is distance'''
def AbskNNAlgorithm(X, y, k):
    kNN = []
    ySize = y.shape[0]
    XSize = X.shape[0]
    # run kNN for every element of X (the tasks)
    for i in range(XSize):
        distances = []
        for j in range(ySize):
            distances.append(float(CulDisFromLL(X[i], y[j])))
        distances.sort() # ascending
        Ksum = 0
        # index 0 is skipped: it is the point's own zero distance when X and y are the same set
        for l in range(1, k+1):
            Ksum = Ksum + distances[l]
        if(Ksum == 0): # avoid division by zero
            kNN.append(float('inf'))
            continue
        kNN.append(k/Ksum)
    return kNN

'''Absolute density with a frequency measure (weight)'''
def AbskNNAlgorithmWithWeight(X, y, listWeight, k):
    kNN = []
    ySize = y.shape[0]
    XSize = X.shape[0]
    # run kNN for every element of X (the tasks)
    for i in range(XSize):
        distances = []
        for j in range(ySize):
            distances.append([float(CulDisFromLL(X[i], y[j])), listWeight[j]])
        distances.sort(key=lambda x: x[0]) # ascending by distance
        Ksum = 0
        for l in range(k):
            Ksum = Ksum + distances[l][1]
        kNN.append(k/Ksum)
    return kNN

'''Numerator of the relative density, i.e. kNN over the densities'''
def RelakNN(dataSet, k):
    kNN = []
    dataSetSize = dataSet.shape[0]
    for i in range(dataSetSize):
        distances = []
        for j in range(dataSetSize):
            distances.append([float(CulDisFromLL(dataSet[i][0], dataSet[j][0])), dataSet[j][1]])
        distances.sort(key=lambda x: x[0])
        Ksum = 0
        for l in range(1, k+1): # index 0 is the point itself
            Ksum = Ksum + distances[l][1]
        kNN.append(Ksum/k)
    return kNN

'''
Write the data to an xls file
oldOrNew: 0 = completed tasks, 1 = new tasks
'''
def WriteDataInXls(abskNN, relkNN, oldOrNew):
    workbook = xlwt.Workbook()
    # sheet 1: absolute densities (task, member, quota, time, reputation)
    sheet1 = workbook.add_sheet("绝对密度")
    sheet1.write(0, 0, "绝对任务密度")
    sheet1.write(0, 1, "绝对会员密度")
    sheet1.write(0, 2, "绝对限额密度")
    sheet1.write(0, 3, "绝对时间密度")
    sheet1.write(0, 4, "绝对信誉度密度")

    # the kNN list is a matrix of 5 rows by many columns
    for j in range(len(abskNN)): # 5 iterations
        for i in range(len(abskNN[j])):
            sheet1.write(i+1, j, abskNN[j][i])

    # sheet 2: relative densities, same column order
    sheet2 = workbook.add_sheet("相对密度")
    sheet2.write(0, 0, "相对任务密度")
    sheet2.write(0, 1, "相对会员密度")
    sheet2.write(0, 2, "相对限额密度")
    sheet2.write(0, 3, "相对时间密度")
    sheet2.write(0, 4, "相对信誉度密度")
    for j in range(len(relkNN)):
        for i in range(len(relkNN[j])):
            sheet2.write(i+1, j, relkNN[j][i])

    if(oldOrNew == 0):
        workbook.save("./kNN求密度数据(已完成任务).xls")
    else:
        workbook.save("./kNN求密度数据(新任务).xls")

def main():
    strFilePath = './原始信息.xlsx'
    k = 7 # k must not be 0!
    oldOrNew = 1 # 0 = completed tasks, 1 = new tasks

    a, b = ReadFileLocation(strFilePath, oldOrNew)

    # absolute densities
    kNNabs = [] # 0 task, 1 member, 2 quota, 3 time, 4 reputation
    kNNabs.append(AbskNNAlgorithm(a, a, k))
    kNNabs.append(AbskNNAlgorithm(a, b, k))
    for i in range(3, 6):
        c = ReadFileWeight(strFilePath, i)
        kNNabs.append(AbskNNAlgorithmWithWeight(a, b, c, k))

    # kNNabs is a matrix of 5 rows by many columns

    # relative densities
    kNNrels = [] # 0 task, 1 member, 2 quota, 3 time, 4 reputation
    for i in range(len(kNNabs)):
        kNNrel = []
        tempDataSet = []
        for j in range(len(kNNabs[i])):
            tempDataSet.append([a[j], kNNabs[i][j]])
        # each row pairs a coordinate array with a scalar, so an object array is needed
        tempkNNRel = RelakNN(np.array(tempDataSet, dtype=object), k)
        for j in range(len(kNNabs[i])):
            if(kNNabs[i][j]==float('inf')):
                kNNrel.append(0.0)
                continue
            kNNrel.append(tempkNNRel[j]/kNNabs[i][j])
        kNNrels.append(kNNrel)

    WriteDataInXls(kNNabs, kNNrels, oldOrNew)

# time.clock() was removed in Python 3.8; perf_counter() is available from 3.3 on
start = time.perf_counter()
main()
elapsed = (time.perf_counter()-start)
print("run time: "+str(elapsed)+" s")

--------------------------------------------------------------------------------
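The absolute-density step of kNN求绝对和相对密度.py (haversine distance, then k divided by the summed distance to the k nearest points) can be sketched with the standard library alone. This is a minimal sketch, not the repository's code: `haversine_km` and `abs_density` are hypothetical names, and it assumes the caller passes `others` without the query point itself, whereas the script always skips the first sorted distance.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6378.137  # same radius constant as the script above


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = radians(a[0]), radians(a[1])
    lat2, lon2 = radians(b[0]), radians(b[1])
    h = sin((lat1 - lat2) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon1 - lon2) / 2) ** 2
    # clamp against rounding slightly above 1 before taking the arcsine
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, h)))


def abs_density(point, others, k):
    """k divided by the summed distance to the k nearest points in `others`.

    Assumption: `others` does not contain `point` itself.
    """
    nearest = sorted(haversine_km(point, o) for o in others)[:k]
    total = sum(nearest)
    # coincident points give a zero sum; report infinite density like the script does
    return float('inf') if total == 0 else k / total
```

A call such as `abs_density((23.1, 113.3), member_points, 7)` would then give the member density around one task, where `member_points` is a hypothetical list of (latitude, longitude) pairs.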
/任务打包+价格修改+测试.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import numpy as np
from sklearn import tree
import xlrd
import time
from math import *

'''
Approximate run time:
    oldOrNew = 0, completed tasks: 67.26664800000003 s
    oldOrNew = 1, new tasks: 210.70983800000002 s
'''

'''Compute the real distance in km between two points from their latitude and longitude'''
def CulDisFromLL(pointA, pointB):
    radlat1 = radians(pointA[0])
    radlat2 = radians(pointB[0])
    a = radlat1 - radlat2
    b = radians(pointA[1]) - radians(pointB[1])
    s = 2 * np.arcsin(sqrt(pow(np.sin(a/2),2) + cos(radlat1) * cos(radlat2) * pow(sin(b/2),2)))
    earth_radius=6378.137
    s = s * earth_radius
    if(s < 0):
        return -s
    else:
        return s

'''Read the data file'''
def ReadFileLocation(fileName, oldOrNew):
    data = xlrd.open_workbook(fileName)
    sheet1 = data.sheets()[0]
    nrows = sheet1.nrows

    bigTable = []
    for i in range(1, nrows):
        tempDenPri = sheet1.row_values(i)[1:7] # densities and price
        if(oldOrNew == 1): # new tasks need their price computed first
            tempDenPri[5] = 4.4*(tempDenPri[2]-0.589) + 3.5*(tempDenPri[3]-0.884) + 67.25
        tempLoc = sheet1.row_values(i)[8:10] # latitude, longitude
        bigTable.append([tempDenPri, tempLoc])
    # [[[densities and price],[coordinates]], [[densities and price],[coordinates]], ...]

    return bigTable

'''
kNN whose default frequency measure is distance
Format of bigTable: [[[densities and price],[coordinates]], [[densities and price],[coordinates]], ...]
'''
def kNNUpdatePrice(bigTable, k, p):
    nrows = len(bigTable)
    for i in range(nrows):
        X = bigTable[i] # format of X: [[densities and price],[coordinates]]
        if(X[0][0] < p):
            distances = []
            # inner loops use their own index so they do not shadow the outer i
            for j in range(nrows):
                distances.append([float(CulDisFromLL(X[1], bigTable[j][1])), bigTable[j][0][5]])
            distances.sort(key=lambda x: x[0]) # ascending by distance
            tempPrice = 0
            for l in range(1, k+1): # index 0 is the task itself (distance 0)
                tempPrice += distances[l][1]
            newPrice = (X[0][5] + tempPrice) / (k + 1)
            X[0][5] = newPrice # update the price

def DecisionTree():
    filePath = './用于决策树与回归的密度数据.xlsx'
    data = xlrd.open_workbook(filePath)
    table = data.sheet_by_name('相对密度')
    comList = table.col_values(7)[1:] # completion status
    nrows = table.nrows
    datasRel = []
    for i in range(1, nrows):
        datasRel.append(table.row_values(i)[1:7]) # relative densities and price

    reldata = np.array(datasRel)
    target = np.array(comList)

    clfrel = tree.DecisionTreeClassifier()
    clfrel.fit(np.array(reldata), np.array(target))
    return clfrel

def main():
    strFilePath = ''
    rangeList = []
    k = 7 # k must not be 0!
    oldOrNew = 1 # 0 = completed tasks, 1 = new tasks
    clf = DecisionTree()

    if(oldOrNew == 0):
        strFilePath = './第三问编程数据.xlsx'
        rangeList = list(range(8, 31)) # actual values are [0.8, 3.0]
    else:
        strFilePath = './第四问编程数据.xlsx'
        rangeList = list(range(1, 21)) # actual values are [0.1, 2.0]

    for p in rangeList:
        bigTable = ReadFileLocation(strFilePath, oldOrNew)
        nrows = len(bigTable)
        kNNUpdatePrice(bigTable, k, p/10)
        testList = []
        for i in range(nrows):
            testList.append(bigTable[i][0])
        res = clf.predict(np.array(testList))
        count = 0
        for i in range(len(res)):
            if(res[i]==1):
                count += 1
        print("p = %f, predicted completed tasks = %d, completion rate = %f" % (p/10, count, count/nrows))

# time.clock() was removed in Python 3.8; perf_counter() is available from 3.3 on
start = time.perf_counter()
main()
elapsed = (time.perf_counter()-start)
print("run time: "+str(elapsed)+" s")

--------------------------------------------------------------------------------
/决策树-深度为6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LittleSec/2017CUMCMCoding/ce88d2a575dc9a21c07f6175edf8a5098d4bbb24/决策树-深度为6.png

--------------------------------------------------------------------------------
/决策树分类.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from sklearn import tree
import numpy as np
import xlrd
import graphviz
import time

'''Read the relative densities, prices and completion status'''
def ReadFile(filePath):
    # table 1: the completed tasks
    data = xlrd.open_workbook(filePath)
    table = data.sheet_by_name('相对密度')

    comList = table.col_values(7)[1:] # completion status
    nrows = table.nrows
    datasRel = []
    for i in range(1, nrows):
        datasRel.append(table.row_values(i)[1:7]) # relative densities and price
    return np.array(datasRel), np.array(comList)

def main():
    strFilePath = './用于决策树与回归的密度数据.xlsx'

    reldata, target = ReadFile(strFilePath)
    clfrel = tree.DecisionTreeClassifier(max_depth=6) # the maximum depth can be set here
    clfrel.fit(np.array(reldata), np.array(target))

    # class names: '未完成' = not completed, '完成' = completed
    dot_data_rel = tree.export_graphviz(clfrel, out_file=None, class_names = ['未完成','完成'], special_characters=True)
    graph_rel = graphviz.Source(dot_data_rel)
    graph_rel.render("./DecideTree")

# time.clock() was removed in Python 3.8; perf_counter() is available from 3.3 on
start = time.perf_counter()
main()
elapsed = (time.perf_counter()-start)
print("run time: "+str(elapsed)+" s")
--------------------------------------------------------------------------------
/原始信息.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LittleSec/2017CUMCMCoding/ce88d2a575dc9a21c07f6175edf8a5098d4bbb24/原始信息.xlsx

--------------------------------------------------------------------------------
/多元线性回归.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import numpy as np
import time
import xlrd
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection provides train_test_split
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn.model_selection import cross_val_predict

'''Read the data'''
def ReadFile(filePath):
    data = xlrd.open_workbook(filePath)
    table = data.sheet_by_name('相对密度')

    priceList = table.col_values(6)[1:] # task price
    nrows = table.nrows
    datasRel = []
    for i in range(1, nrows):
        datasRel.append(table.row_values(i)[1:6]) # relative densities
    return np.array(datasRel), np.array(priceList)


def main():
    strFilePath = './用于决策树与回归的密度数据.xlsx'

    X, y = ReadFile(strFilePath)
    #X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
    linreg = LinearRegression()
    #linreg.fit(X_train, y_train)
    linreg.fit(X, y)
    print("Intercept: "+str(linreg.intercept_))
    print("Coefficients: "+str(linreg.coef_))
    '''
    y_pred = linreg.predict(X_test)
    # MSE computed with scikit-learn
    print("MSE:"+str(metrics.mean_squared_error(y_test, y_pred)))
    # RMSE computed with scikit-learn
    print("RMSE:"+str(np.sqrt(metrics.mean_squared_error(y_test, y_pred))))
    '''
    # 10-fold cross-validation
    predicted = cross_val_predict(linreg, X, y, cv=10)
    # MSE computed with scikit-learn
    print("MSE: "+str(metrics.mean_squared_error(y, predicted)))
    # RMSE computed with scikit-learn
    print("RMSE: "+str(np.sqrt(metrics.mean_squared_error(y, predicted))))

# time.clock() was removed in Python 3.8; perf_counter() is available from 3.3 on
start = time.perf_counter()
main()
elapsed = (time.perf_counter()-start)
print("run time: "+str(elapsed)+" s")
--------------------------------------------------------------------------------
/测试最佳价格方案.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

'''
Approximate run time of the search for the optimal k1, k2: 290.6369049999994 s
'''

from sklearn import tree
import numpy as np
import xlrd
import time
import copy

'''Read the relative densities, prices and completion status'''
def ReadFile(filePath):
    # table 1: the completed tasks
    data = xlrd.open_workbook(filePath)
    table = data.sheet_by_name('相对密度')

    comList = table.col_values(7)[1:] # completion status
    nrows = table.nrows
    datasRel = []
    for i in range(1, nrows):
        datasRel.append(table.row_values(i)[1:7]) # relative densities and price
    return np.array(datasRel), np.array(comList)

'''Read ready-made test data directly'''
def ReadFile2(filePath):
    # table 1: the completed tasks
    data = xlrd.open_workbook(filePath)
    table = data.sheet_by_name('相对密度')

    nrows = table.nrows
    datasRel = []
    for i in range(1, nrows):
        datasRel.append(table.row_values(i)[1:7]) # relative densities and price
    return np.array(datasRel)

'''Read X2 and X3 for computing the price; otherwise the same as ReadFile'''
def ReadFile3(filePath):
    # table 1: the completed tasks
    data = xlrd.open_workbook(filePath)
    table = data.sheet_by_name('相对密度')

    x2 = table.col_values(3)[1:] # quota
    x3 = table.col_values(4)[1:] # time

    nrows = table.nrows
    datasRel = []
    for i in range(1, nrows):
        datasRel.append(table.row_values(i)[1:6]) # relative densities; the price is set by the caller
    return datasRel, np.array(x2), np.array(x3) # datasRel stays a list because rows are appended to later

def main():
    strFilePath = './用于决策树与回归的密度数据.xlsx'

    reldata, target = ReadFile(strFilePath)
    clfrel = tree.DecisionTreeClassifier() # the maximum depth can be set here
    clfrel.fit(np.array(reldata), np.array(target))

    result_max = 0
    result_max_k = []

    date_no_price, x2, x3 = ReadFile3(strFilePath)
    # on equal counts the strict '>' below keeps the first (k1, k2) that reached the maximum
    for k1 in range(1, 201): # actual interval is (0, 20] with a step of 0.1
        for k2 in range(1, 201):
            datetest = copy.deepcopy(date_no_price) # plain assignment would alias the list (reference type)
            xx2 = k1/10 * (x2 - 0.589)
            xx3 = k2/10 * (x3 - 0.884)
            z = xx2 + xx3 + 67.25
            for i in range(len(z)):
                datetest[i].append(z[i])
            res = clfrel.predict(np.array(datetest)) # predict
            # count the tasks predicted as completed
            count = 0
            for i in range(len(res)):
                if(res[i]==1):
                    count += 1
            print("k1=%f, k2=%f, count=%d" % (k1/10, k2/10, count))
            if(count > result_max):
                result_max = count
                result_max_k = [k1/10, k2/10]
    print("Completed tasks in the test set: %d" % (result_max))
    print("Corresponding k1, k2: %r" % (result_max_k))


    '''Test with a ready-made data table
    path1= '/Users/littlesec/Downloads/已结束项目处理数据2.xlsx'
    testList = ReadFile2(path1)
    res = clfrel.predict(testList)
    #print(res)
    count = 0
    for i in range(len(res)):
        if(res[i]==1):
            count += 1
    print(count)
    '''

# time.clock() was removed in Python 3.8; perf_counter() is available from 3.3 on
start = time.perf_counter()
main()
elapsed = (time.perf_counter()-start)
print("run time: "+str(elapsed)+" s")
--------------------------------------------------------------------------------
/用于决策树与回归的密度数据.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LittleSec/2017CUMCMCoding/ce88d2a575dc9a21c07f6175edf8a5098d4bbb24/用于决策树与回归的密度数据.xlsx

--------------------------------------------------------------------------------
/第三问编程数据.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LittleSec/2017CUMCMCoding/ce88d2a575dc9a21c07f6175edf8a5098d4bbb24/第三问编程数据.xlsx

--------------------------------------------------------------------------------
/第四问编程数据.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LittleSec/2017CUMCMCoding/ce88d2a575dc9a21c07f6175edf8a5098d4bbb24/第四问编程数据.xlsx

--------------------------------------------------------------------------------
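The coefficient enumeration in 测试最佳价格方案.py can be isolated from the spreadsheet and the classifier to make its logic easier to check. The sketch below rests on assumptions: `price` and `best_coefficients` are hypothetical names, and the caller-supplied `is_completed` predicate stands in for the trained decision tree. The constants 0.589, 0.884 and 67.25 and the 0.1 step are taken from the script.

```python
def price(x2, x3, k1, k2):
    """Price formula from the script: k1*(x2 - 0.589) + k2*(x3 - 0.884) + 67.25."""
    return k1 * (x2 - 0.589) + k2 * (x3 - 0.884) + 67.25


def best_coefficients(rows, is_completed, steps=range(1, 201), scale=10):
    """Enumerate k1, k2 in steps of 1/scale and keep the pair with the most completions.

    rows: iterable of (x2, x3) pairs, i.e. relative quota and time densities.
    is_completed: hypothetical stand-in for the classifier, called as
        is_completed(x2, x3, price) and returning True or False.
    The strict '>' keeps the first pair that reaches the maximum count.
    """
    rows = list(rows)
    best_count, best_k = 0, None
    for i in steps:
        for j in steps:
            k1, k2 = i / scale, j / scale
            count = sum(1 for x2, x3 in rows
                        if is_completed(x2, x3, price(x2, x3, k1, k2)))
            if count > best_count:
                best_count, best_k = count, (k1, k2)
    return best_count, best_k
```

Passing the predicate in keeps the search testable without scikit-learn; in the script its role is played by `clfrel.predict` applied to the full feature rows.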