├── result.jpg
├── iris_test_data.jpg
├── iris_test_data.txt
├── README.md
├── iris_data.txt
└── bayes_iris.py

/result.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Erikfather/bayes-python/HEAD/result.jpg
--------------------------------------------------------------------------------

/iris_test_data.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Erikfather/bayes-python/HEAD/iris_test_data.jpg
--------------------------------------------------------------------------------

/iris_test_data.txt:
--------------------------------------------------------------------------------
4.8,3.0,1.4,0.3,Iris-setosa
5.1,3.8,1.6,0.2,Iris-setosa
5.7,3.0,4.2,1.2,Iris-versicolor
6.3,2.5,5.0,1.9,Iris-virginica
5.7,2.9,4.2,1.3,Iris-versicolor
4.6,3.2,1.4,0.2,Iris-setosa
5.3,3.7,1.5,0.2,Iris-setosa
6.5,3.0,5.2,2.0,Iris-virginica
5.0,3.3,1.4,0.2,Iris-setosa
6.2,2.9,4.3,1.3,Iris-versicolor
5.1,2.5,3.0,1.1,Iris-versicolor
5.7,2.8,4.1,1.3,Iris-versicolor
6.7,3.0,5.2,2.3,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# bayes-python
### For the full code, see bayes_iris.py
### I used the iris_data dataset directly: for each species, the first 45 rows form the training set and the remaining 5 rows are stored separately in the test set iris_test_data, whose rows I shuffled by hand.
#### The test set is shown below:
![image](https://github.com/Erikfather/bayes-python/blob/master/iris_test_data.jpg)
#### Because the attributes in this dataset are continuous, each one is modeled with a probability density function (a normal distribution per class and attribute).
#### The experiment proceeds as follows:
#### (1) Read the dataset.
#### (2) On the training set, compute the mean and variance of every attribute for each class.
#### (3) Classify the test set.
#### (4) Estimate the class prior probabilities; here every class makes up the same share of the data.
#### (5) Use the probability density function to compute, for each test row, the conditional probability (density) of each attribute under each class.
#### (6) Compute the unnormalized posterior: prior × product of the per-attribute conditional densities.
#### (7) Compare the posteriors across classes and assign the row to the class with the largest one.

#### The results are as follows:
![image](https://github.com/Erikfather/bayes-python/blob/master/result.jpg)
#### Comparing these results with the test set shows that every row is classified correctly.

--------------------------------------------------------------------------------

/iris_data.txt:
--------------------------------------------------------------------------------
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
5.2,3.4,1.4,0.2,Iris-setosa
4.7,3.2,1.6,0.2,Iris-setosa
4.8,3.1,1.6,0.2,Iris-setosa
5.4,3.4,1.5,0.4,Iris-setosa
5.2,4.1,1.5,0.1,Iris-setosa
5.5,4.2,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.0,3.2,1.2,0.2,Iris-setosa
5.5,3.5,1.3,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
4.4,3.0,1.3,0.2,Iris-setosa
5.1,3.4,1.5,0.2,Iris-setosa
5.0,3.5,1.3,0.3,Iris-setosa
4.5,2.3,1.3,0.3,Iris-setosa
4.4,3.2,1.3,0.2,Iris-setosa
5.0,3.5,1.6,0.6,Iris-setosa
5.1,3.8,1.9,0.4,Iris-setosa
4.8,3.0,1.4,0.3,Iris-setosa
5.1,3.8,1.6,0.2,Iris-setosa
4.6,3.2,1.4,0.2,Iris-setosa
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
6.3,3.3,4.7,1.6,Iris-versicolor
4.9,2.4,3.3,1.0,Iris-versicolor
6.6,2.9,4.6,1.3,Iris-versicolor
5.2,2.7,3.9,1.4,Iris-versicolor
5.0,2.0,3.5,1.0,Iris-versicolor
5.9,3.0,4.2,1.5,Iris-versicolor
6.0,2.2,4.0,1.0,Iris-versicolor
6.1,2.9,4.7,1.4,Iris-versicolor
5.6,2.9,3.6,1.3,Iris-versicolor
6.7,3.1,4.4,1.4,Iris-versicolor
5.6,3.0,4.5,1.5,Iris-versicolor
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
5.6,2.5,3.9,1.1,Iris-versicolor
5.9,3.2,4.8,1.8,Iris-versicolor
6.1,2.8,4.0,1.3,Iris-versicolor
6.3,2.5,4.9,1.5,Iris-versicolor
6.1,2.8,4.7,1.2,Iris-versicolor
6.4,2.9,4.3,1.3,Iris-versicolor
6.6,3.0,4.4,1.4,Iris-versicolor
6.8,2.8,4.8,1.4,Iris-versicolor
6.7,3.0,5.0,1.7,Iris-versicolor
6.0,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1.0,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1.0,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
6.0,3.4,4.5,1.6,Iris-versicolor
6.7,3.1,4.7,1.5,Iris-versicolor
6.3,2.3,4.4,1.3,Iris-versicolor
5.6,3.0,4.1,1.3,Iris-versicolor
5.5,2.5,4.0,1.3,Iris-versicolor
5.5,2.6,4.4,1.2,Iris-versicolor
6.1,3.0,4.6,1.4,Iris-versicolor
5.8,2.6,4.0,1.2,Iris-versicolor
5.0,2.3,3.3,1.0,Iris-versicolor
5.6,2.7,4.2,1.3,Iris-versicolor
5.7,3.0,4.2,1.2,Iris-versicolor
5.7,2.9,4.2,1.3,Iris-versicolor
6.2,2.9,4.3,1.3,Iris-versicolor
5.1,2.5,3.0,1.1,Iris-versicolor
5.7,2.8,4.1,1.3,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
7.6,3.0,6.6,2.1,Iris-virginica
4.9,2.5,4.5,1.7,Iris-virginica
7.3,2.9,6.3,1.8,Iris-virginica
6.7,2.5,5.8,1.8,Iris-virginica
7.2,3.6,6.1,2.5,Iris-virginica
6.5,3.2,5.1,2.0,Iris-virginica
6.4,2.7,5.3,1.9,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica
6.4,3.2,5.3,2.3,Iris-virginica
6.5,3.0,5.5,1.8,Iris-virginica
7.7,3.8,6.7,2.2,Iris-virginica
7.7,2.6,6.9,2.3,Iris-virginica
6.0,2.2,5.0,1.5,Iris-virginica
6.9,3.2,5.7,2.3,Iris-virginica
5.6,2.8,4.9,2.0,Iris-virginica
7.7,2.8,6.7,2.0,Iris-virginica
6.3,2.7,4.9,1.8,Iris-virginica
6.7,3.3,5.7,2.1,Iris-virginica
7.2,3.2,6.0,1.8,Iris-virginica
6.2,2.8,4.8,1.8,Iris-virginica
6.1,3.0,4.9,1.8,Iris-virginica
6.4,2.8,5.6,2.1,Iris-virginica
7.2,3.0,5.8,1.6,Iris-virginica
7.4,2.8,6.1,1.9,Iris-virginica
7.9,3.8,6.4,2.0,Iris-virginica
6.4,2.8,5.6,2.2,Iris-virginica
6.3,2.8,5.1,1.5,Iris-virginica
6.1,2.6,5.6,1.4,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica
6.3,3.4,5.6,2.4,Iris-virginica
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
6.9,3.1,5.4,2.1,Iris-virginica
6.7,3.1,5.6,2.4,Iris-virginica
6.9,3.1,5.1,2.3,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
--------------------------------------------------------------------------------

/bayes_iris.py:
--------------------------------------------------------------------------------
#coding:utf-8
import math

Iris_setosa_data=[]
Iris_versicolor_data=[]
Iris_virginica_data=[]
# Read the training data. For each species the first 45 rows are used for training;
# the remaining 5 rows were copied into the separate test file iris_test_data.txt.
def read_train_data(filename):
    with open(filename, 'r') as f:
        all_lines = f.readlines()
    # Rows 1-50 are Iris-setosa, 51-100 Iris-versicolor and 101-150 Iris-virginica,
    # so the first 45 rows of each block are the 0-based slices [0:45], [50:95], [100:145].
    for line in all_lines[0:45]:
        line = line.strip().split(',')
        Iris_setosa_data.append(line[0:4])
    for line in all_lines[50:95]:
        line = line.strip().split(',')
        Iris_versicolor_data.append(line[0:4])
    for line in all_lines[100:145]:
        line = line.strip().split(',')
        Iris_virginica_data.append(line[0:4])
    return Iris_setosa_data, Iris_versicolor_data, Iris_virginica_data

test_data = []
# Read the test data set
def read_test_data(testname):
    with open(testname, 'r') as f:
        all_lines = f.readlines()
    for line in all_lines:
        line = line.strip()
        if not line:  # skip blank lines such as a trailing newline
            continue
        test_data.append(line.split(','))  # split each row on commas
    return test_data

# Compute the mean and variance of each attribute for one class
def calculate_junzhi_and_fangcha(train_data):
    n = len(train_data)  # number of training rows in this class
    # Mean of each of the four attributes
    means = [sum(float(x[i]) for x in train_data) / n for i in range(4)]
    # Variance of each attribute (maximum-likelihood estimate: divide by n)
    variances = [sum((float(x[i]) - means[i]) ** 2 for x in train_data) / n
                 for i in range(4)]
    # Returns u_x1, u_x2, u_x3, u_x4, variance_x1, variance_x2, variance_x3, variance_x4
    return tuple(means + variances)

# Normal (Gaussian) probability density
#   N(x; u, variance) = exp(-(x - u)**2 / (2 * variance)) / sqrt(2 * pi * variance)
def normal_pdf(x, u, variance):
    return math.exp(-(x - u) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Estimate the class-conditional density of each attribute
def calculate_P_xi_c(u_x1, u_x2, u_x3, u_x4, variance_x1, variance_x2, variance_x3, variance_x4, line_data):
    p_x1_c = normal_pdf(float(line_data[0]), u_x1, variance_x1)
    p_x2_c = normal_pdf(float(line_data[1]), u_x2, variance_x2)
    p_x3_c = normal_pdf(float(line_data[2]), u_x3, variance_x3)
    p_x4_c = normal_pdf(float(line_data[3]), u_x4, variance_x4)
    return p_x1_c, p_x2_c, p_x3_c, p_x4_c


if __name__ == '__main__':
    filename = 'iris_data.txt'
    testname = 'iris_test_data.txt'
    Iris_setosa_data, Iris_versicolor_data, Iris_virginica_data = read_train_data(filename)

    # Mean and variance of each attribute for Iris_setosa
    (Iris_setosa_u_x1, Iris_setosa_u_x2, Iris_setosa_u_x3, Iris_setosa_u_x4,
     Iris_setosa_variance_x1, Iris_setosa_variance_x2, Iris_setosa_variance_x3,
     Iris_setosa_variance_x4) = calculate_junzhi_and_fangcha(Iris_setosa_data)
    # Mean and variance of each attribute for Iris_versicolor
    (Iris_versicolor_u_x1, Iris_versicolor_u_x2, Iris_versicolor_u_x3, Iris_versicolor_u_x4,
     Iris_versicolor_variance_x1, Iris_versicolor_variance_x2, Iris_versicolor_variance_x3,
     Iris_versicolor_variance_x4) = calculate_junzhi_and_fangcha(Iris_versicolor_data)
    # Mean and variance of each attribute for Iris_virginica
    (Iris_virginica_u_x1, Iris_virginica_u_x2, Iris_virginica_u_x3, Iris_virginica_u_x4,
     Iris_virginica_variance_x1, Iris_virginica_variance_x2, Iris_virginica_variance_x3,
     Iris_virginica_variance_x4) = calculate_junzhi_and_fangcha(Iris_virginica_data)

    # Start testing
    test_data = read_test_data(testname)
    # Estimate the class priors as each class's share of the training rows
    total = len(Iris_setosa_data) + len(Iris_versicolor_data) + len(Iris_virginica_data)
    p1 = len(Iris_setosa_data) / total
    p2 = len(Iris_versicolor_data) / total
    p3 = len(Iris_virginica_data) / total
    for x in test_data:
        # Conditional densities of each attribute under Iris_setosa
        P_x1_Iris_setosa, P_x2_Iris_setosa, P_x3_Iris_setosa, P_x4_Iris_setosa = calculate_P_xi_c(
            Iris_setosa_u_x1, Iris_setosa_u_x2, Iris_setosa_u_x3, Iris_setosa_u_x4,
            Iris_setosa_variance_x1, Iris_setosa_variance_x2, Iris_setosa_variance_x3, Iris_setosa_variance_x4, x)

        # Conditional densities of each attribute under Iris_versicolor
        P_x1_Iris_versicolor, P_x2_Iris_versicolor, P_x3_Iris_versicolor, P_x4_Iris_versicolor = calculate_P_xi_c(
            Iris_versicolor_u_x1, Iris_versicolor_u_x2, Iris_versicolor_u_x3, Iris_versicolor_u_x4,
            Iris_versicolor_variance_x1, Iris_versicolor_variance_x2, Iris_versicolor_variance_x3, Iris_versicolor_variance_x4, x)

        # Conditional densities of each attribute under Iris_virginica
        P_x1_Iris_virginica, P_x2_Iris_virginica, P_x3_Iris_virginica, P_x4_Iris_virginica = calculate_P_xi_c(
            Iris_virginica_u_x1, Iris_virginica_u_x2, Iris_virginica_u_x3, Iris_virginica_u_x4,
            Iris_virginica_variance_x1, Iris_virginica_variance_x2, Iris_virginica_variance_x3, Iris_virginica_variance_x4, x)

        # Unnormalized posterior of each class: prior * product of the conditional densities
        P_Iris_setosa = p1 * P_x1_Iris_setosa * P_x2_Iris_setosa * P_x3_Iris_setosa * P_x4_Iris_setosa
        P_Iris_versicolor = p2 * P_x1_Iris_versicolor * P_x2_Iris_versicolor * P_x3_Iris_versicolor * P_x4_Iris_versicolor
        P_Iris_virginica = p3 * P_x1_Iris_virginica * P_x2_Iris_virginica * P_x3_Iris_virginica * P_x4_Iris_virginica

        # Assign the row to the class with the largest posterior
        posteriors = {'Iris_setosa': P_Iris_setosa,
                      'Iris_versicolor': P_Iris_versicolor,
                      'Iris_virginica': P_Iris_virginica}
        best = max(posteriors, key=posteriors.get)
        print(x[0], x[1], x[2], x[3], ": this row belongs to class " + best)
--------------------------------------------------------------------------------
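The seven README steps can also be condensed into a class-agnostic sketch of Gaussian naive Bayes. This is a hypothetical rewrite, not the author's script: the names `parse_line`, `fit`, `log_normal_pdf`, `predict` and the `var_eps` smoothing constant are illustrative assumptions, and it assumes every field but the last is numeric with the class label last. It sums log densities instead of multiplying densities, so long products of small numbers cannot underflow, and it uses the full normal density including the 1/sqrt(2πσ²) factor.

```python
import math
from collections import defaultdict


def parse_line(line):
    """Split one CSV row like "5.1,3.5,1.4,0.2,Iris-setosa" into (features, label).

    Assumes every field but the last is numeric and the last field is the label.
    """
    *values, label = line.strip().split(",")
    return [float(v) for v in values], label


def fit(rows, var_eps=1e-9):
    """Estimate (prior, per-feature means, per-feature variances) for every class.

    rows: list of (features, label) pairs. Variances are maximum-likelihood
    estimates (divide by n); var_eps is an assumed small constant that keeps a
    zero-variance feature from causing a division by zero.
    """
    by_class = defaultdict(list)
    for features, label in rows:
        by_class[label].append(features)

    model = {}
    for label, samples in by_class.items():
        n = len(samples)
        columns = list(zip(*samples))  # one tuple of values per feature
        means = [sum(col) / n for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / n + var_eps
                     for col, m in zip(columns, means)]
        model[label] = (n / len(rows), means, variances)
    return model


def log_normal_pdf(x, mean, variance):
    """Log of the normal density exp(-(x-mean)^2 / (2*variance)) / sqrt(2*pi*variance)."""
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)


def predict(model, features):
    """Return the label with the largest log posterior (up to a shared constant)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            log_normal_pdf(x, m, v) for x, m, v in zip(features, means, variances))

    return max(model, key=log_posterior)
```

To apply it to this repository's files, one could parse the first 45 rows of each class block of iris_data.txt with `parse_line`, call `fit` once, and then call `predict` on the features of each row of iris_test_data.txt. Because the logarithm is monotonic, picking the largest log score selects the same class as picking the largest prior × product of densities.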