├── README.md
├── RF_MexStandalone-v0.02.zip
├── data
    ├── description
    ├── 乳腺癌数据集.xlsx
    └── 乳腺癌数据集简介.xlsx
└── main.m


/README.md:
--------------------------------------------------------------------------------
 1 | # Random-Forest-MATLAB
 2 | 
 3 | Random forest toolkit, MATLAB version
 4 | 
 5 | This repository uses MATLAB to implement the random forest (RF) algorithm; the decision trees are ID3, C4.5, and CART.
 6 | 
 7 | I have implemented these in different ways.
 8 | 
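 9 | This repository reproduces the random forest implementation from Chapter 30 of 《MATLAB神经网络43个案例分析》 (MATLAB Neural Networks: 43 Case Studies), "Design of an Ensemble Classifier Based on the Random Forest Idea (Breast Cancer Diagnosis)".
10 | 
11 | It includes the breast cancer dataset from the University of Wisconsin School of Medicine, with 569 cases in total: 357 benign and 212 malignant. This implementation randomly selects 500 cases as the training set and uses the remaining 69 as the test set.
12 | 
13 | It also includes the open-source randomforest-matlab toolbox developed by Abhishek Jaiantilal at the University of Colorado Boulder (download: https://code.google.com/p/randomforest-matlab/).
14 | The reproduction code is in the main.m script.
15 | 
16 | The calling syntax is: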
17 | 
18 | model = classRF_train(X,Y,ntree,mtry,extra_options)
19 | 
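20 | Here, X is the input sample matrix of the training set, where each column is a variable (attribute) and each row is a sample; Y
21 | is the output vector of the training set, where each row gives the class of the corresponding sample in X; ntree is the number of
22 | decision trees in the random forest (default 500); mtry is the number of attributes in the candidate split set (default floor(sqrt(M)), where M is the total
23 | number of attributes and floor denotes rounding down); extra_options holds optional parameters; model is the trained random forest
24 | classifier.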
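
The default value of mtry described above can be sketched as follows. This is a minimal illustration, not the toolbox's own code: the random matrix X, the labels Y, and the attribute count of 30 are hypothetical stand-ins, and the training call runs only if classRF_train is found on the MATLAB path. Note that main.m passes mtry = 3 explicitly instead of relying on the default.

```matlab
% Hypothetical stand-in data: 6 samples, 30 attributes, labels 1 or 2.
% The attribute count M = 30 is an assumption chosen for illustration.
X = rand(6, 30);
Y = [1; 2; 1; 2; 1; 2];

M = size(X, 2);            % total number of attributes
ntree = 500;               % documented default number of trees
mtry = floor(sqrt(M));     % documented default: floor(sqrt(30)) = 5

% Train only when the randomforest-matlab toolbox is on the path.
if exist('classRF_train', 'file')
    model = classRF_train(X, Y, ntree, mtry);
end
```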
25 | 
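26 | [Y_hat, votes] = classRF_predict(X,model,extra_options)
27 | 
28 | Here, X is the input matrix of the samples to predict, where each column is a variable (attribute) and each row is a sample;
29 | model is the trained random forest classifier; extra_options holds optional parameters; Y_hat is the predicted class of each
30 | sample; votes is the unnormalized class weights of the samples to predict, i.e. the number of decision trees that assign each
31 | sample to each class.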
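
To make the meaning of votes concrete, here is a minimal sketch of how a majority-vote class and a vote share could be derived from it. The vote counts are hypothetical, and the sketch assumes that column k of votes corresponds to class label k (as the axis labels in main.m suggest) and that the forest has 500 trees. It does not claim to mirror the internals of classRF_predict.

```matlab
% Hypothetical vote counts for 3 samples from a 500-tree forest.
% Column 1 = trees voting for class 1, column 2 = trees voting for class 2.
votes = [480  20;
         130 370;
         260 240];
ntree = 500;

[topVotes, Y_hat] = max(votes, [], 2);   % majority class per sample
voteShare = topVotes / ntree;            % fraction of trees that agree
rowTotals = sum(votes, 2);               % each row sums to ntree
```

A vote share close to 0.5, as in the third sample, marks a prediction on which the trees are nearly split.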
32 | 


--------------------------------------------------------------------------------
/RF_MexStandalone-v0.02.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/QyqByte/Random-Forest-MATLAB/733462aaefa992e28344c8cf9d2cab6e76ba8c1e/RF_MexStandalone-v0.02.zip


--------------------------------------------------------------------------------
/data/description:
--------------------------------------------------------------------------------
1 | This part is the original data. 
2 | 


--------------------------------------------------------------------------------
/data/乳腺癌数据集.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/QyqByte/Random-Forest-MATLAB/733462aaefa992e28344c8cf9d2cab6e76ba8c1e/data/乳腺癌数据集.xlsx


--------------------------------------------------------------------------------
/data/乳腺癌数据集简介.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/QyqByte/Random-Forest-MATLAB/733462aaefa992e28344c8cf9d2cab6e76ba8c1e/data/乳腺癌数据集简介.xlsx


--------------------------------------------------------------------------------
/main.m:
--------------------------------------------------------------------------------
 1 | clear 
 2 | clc
 3 | warning off 
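 4 | % Load the data (expects data.mat on the MATLAB path; it is not listed in the repository tree)
 5 | load data.mat
 6 | a = randperm(569);
 7 | Train = data(a(1:500),:);
 8 | Test = data(a(501:end),:);
 9 | % Training data: column 2 holds the class label, columns 3:end hold the attributes
10 | P_train = Train(:,3:end);
11 | T_train = Train(:,2);
12 | % Test data
13 | P_test = Test(:,3:end);
14 | T_test = Test(:,2);
15 | % Create the random forest classifier (500 trees, mtry = 3)
16 | model = classRF_train(P_train,T_train,500,3);
17 | % Predict on the test set
18 | [T_sim, votes] = classRF_predict(P_test, model);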
19 | 
20 | % Result analysis (label 1 = benign, label 2 = malignant)
21 | count_B = nnz(T_train == 1);
22 | count_M = nnz(T_train == 2);
23 | total_B = nnz(data(:,2) == 1);
24 | total_M = nnz(data(:,2) == 2);
25 | number_B = nnz(T_test == 1);
26 | number_M = nnz(T_test == 2);
27 | number_B_sim = nnz(T_sim == 1 & T_test == 1);
28 | number_M_sim = nnz(T_sim == 2 & T_test == 2);
29 | 
30 | disp(['Total cases: ' num2str(569) ...
31 | '    Benign: ' num2str(total_B) '    Malignant: ' num2str(total_M)]);
32 | disp(['Training set cases: ' num2str(500) '    Benign: ' num2str(count_B) ...
33 | '    Malignant: ' num2str(count_M)]);
34 | disp(['Test set cases: ' num2str(69) '    Benign: ' num2str(number_B) ...
35 | '    Malignant: ' num2str(number_M)]);
36 | disp(['Benign tumors correctly diagnosed: ' num2str(number_B_sim) '    Misdiagnosed: ' num2str(number_B - number_B_sim) ...
37 | '    Diagnosis rate: ' num2str(number_B_sim/number_B*100) '%']);
38 | disp(['Malignant tumors correctly diagnosed: ' num2str(number_M_sim) '    Misdiagnosed: ' num2str(number_M - number_M_sim) ...
39 | '    Diagnosis rate: ' num2str(number_M_sim/number_M*100) '%']);
40 | 
41 | % Plot the vote counts
42 | figure
43 | 
44 | index = find(T_sim ~= T_test);
45 | plot(votes(index,1),votes(index,2),'r*') % misclassified samples: red asterisks
46 | hold on
47 | 
48 | index = find(T_sim == T_test);
49 | plot(votes(index,1),votes(index,2),'bo') % correctly classified samples: blue circles
50 | hold on
51 | 
52 | legend('Misclassified samples', 'Correctly classified samples')
53 | 
54 | plot(0:500,500:-1:0,'y-.')
55 | hold on
56 | 
57 | plot(0:500,0:500,'y-.')
58 | hold on 
59 | 
60 | line([100 400 400 100 100],[100 100 400 400 100])
61 | xlabel('Number of trees voting for class 1')
62 | ylabel('Number of trees voting for class 2')
63 | title('Relationship between number of decision trees and classification accuracy')
64 | % 
65 | % % Effect of the number of decision trees on performance
66 | % Accuracy = zeros(1,20); % 1-by-20 vector of zeros, one entry per forest size
67 | % for i = 50:50:1000
68 | %     
69 | %     % Run each setting 10 times and take the mean
70 | %     accuracy = zeros(1,10);
71 | %     for k = 1:10
72 | %         % Create the random forest
73 | %         model = classRF_train(P_train,T_train,i);
74 | %         % Predict on the test set
75 | %         T_sim = classRF_predict(P_test,model);
76 | %         accuracy(k) = length(find(T_sim == T_test))/length(T_test);
77 | %     end
78 | %     Accuracy(i/50) = mean(accuracy);
79 | % end
80 | % % Plot
81 | % figure
82 | % plot(50:50:1000,Accuracy);
83 | % xlabel('Number of decision trees in the random forest');
84 | % ylabel('Classification accuracy');
85 | % title('Effect of the number of decision trees on performance');
86 | % 
87 | 


--------------------------------------------------------------------------------