├── README.md
├── dbscan.m
├── dist.m
├── dpeak_auto.m
├── dpeak_manual.m
├── drawshapes.m
├── epsilon.m
├── kmeans.m
└── results
    ├── original_scatter.png
    ├── result_1.png
    ├── result_2.png
    ├── result_3.png
    ├── result_iris.png
    ├── result_spiral.png
    └── result_wine.png


/README.md:
--------------------------------------------------------------------------------
# Clustering

> A comparison of three clustering algorithms, tested on datasets with different data distributions.



## Dataset Download

Click [this](https://pan.baidu.com/s/1UNRLugkOKTIfHOzgdH1VEQ) to get the dataset.



## Description

This project contains three clustering algorithms: K-Means, DBSCAN, and DPeak. The `datasets` folder contains six datasets; their scatter plots are shown below:

![](./results/original_scatter.png)



## Results

```matlab
>>kmeans('datasets/1/', 7, 0.01)
>>dbscan('datasets/1/', 4, 6)
>>dpeak_auto('datasets/1/', 4, 'gaussian',7)
```

![1](./results/result_1.png)

```matlab
>>kmeans('datasets/2/', 5, 0.01)
>>dbscan('datasets/2/', 10, 3)
>>dpeak_auto('datasets/2/', 3, 'gaussian',5)
```

![2](./results/result_2.png)

```matlab
>>kmeans('datasets/3/', 5, 0.0275)
>>dbscan('datasets/3/', 8, 0.023)
>>dpeak_auto('datasets/3/', 4, 'gaussian',5)
```

![3](./results/result_3.png)

```matlab
>>kmeans('datasets/iris/', 3, 0.01)
>>dbscan('datasets/iris/', 1, 0.078)
>>dpeak_auto('datasets/iris/', 4, 'gaussian',3)
```

![iris](./results/result_iris.png)

```matlab
>>kmeans('datasets/wine/', 3, 0.01)
>>dbscan('datasets/wine/', 1, 0.13)
>>dpeak_auto('datasets/wine/', 5.64, 'gaussian',3)
```

![wine](./results/result_wine.png)

```matlab
>>kmeans('datasets/spiral/', 3, 0.05)
>>dbscan('datasets/spiral/', 15, 3.5689)
>>dpeak_manual('datasets/spiral/', 5.64, 'gaussian')
```

![spiral](./results/result_spiral.png)

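

## Notes on `epsilon.m`

The repository's `epsilon(x, k)` helper computes an analytical estimate of the DBSCAN neighborhood radius. As a reading aid, the expression in that file can be written out as below. Here $m$ and $n$ are the row and column counts of `x`, and $k$ is the neighborhood size, as in the file's own comments. The symbols $V$ and $x_{ij}$ are notation introduced here for illustration, and the interpretation that follows is an assumption about the formula's intent, not something the source states.

$$
V = \prod_{j=1}^{n}\Bigl(\max_i x_{ij} - \min_i x_{ij}\Bigr),
\qquad
\mathrm{Eps} = \left(\frac{V \, k \, \Gamma\!\left(\tfrac{n}{2}+1\right)}{m \,\sqrt{\pi^{n}}}\right)^{1/n}
$$

One way to read this: an $n$-dimensional ball of radius $r$ has volume $\pi^{n/2} r^{n} / \Gamma\!\left(\tfrac{n}{2}+1\right)$. If the $m$ points were spread uniformly over the bounding box of volume $V$, requiring such a ball to contain $k$ points on average gives

$$
\frac{m}{V}\cdot\frac{\pi^{n/2}\,\mathrm{Eps}^{\,n}}{\Gamma\!\left(\tfrac{n}{2}+1\right)} = k ,
$$

which rearranges to the expression above. The README does not say whether the radius values in the `dbscan` calls listed under Results were obtained this way.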
--------------------------------------------------------------------------------
/dbscan.m:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/imaginespark/clustering/6306ff2adab0ec2f1a9214000bb2eaeb4750f2c1/dbscan.m


--------------------------------------------------------------------------------
/dist.m:
--------------------------------------------------------------------------------
function [D] = dist(i, x)
% Calculates the Euclidean distances between the object i and all objects in x
% -------------------------------------------------------------------------
% Input:
% i - an object (1,n)
% x - data matrix (m,n); m-objects, n-variables
% -------------------------------------------------------------------------
% Output:
% D - Euclidean distances (1,m)

[m,n] = size(x);
% Replicate i to (m,n), then sum the squared differences over the n variables
D = sqrt(sum((((ones(m,1)*i)-x).^2)'));

% For n == 1 the transposed matrix is a row vector, so sum() above collapses
% it to a scalar; compute the absolute differences directly instead
if n == 1
    D = abs((ones(m,1) * i - x))';
end

end


--------------------------------------------------------------------------------
/dpeak_auto.m:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/imaginespark/clustering/6306ff2adab0ec2f1a9214000bb2eaeb4750f2c1/dpeak_auto.m


--------------------------------------------------------------------------------
/dpeak_manual.m:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/imaginespark/clustering/6306ff2adab0ec2f1a9214000bb2eaeb4750f2c1/dpeak_manual.m


--------------------------------------------------------------------------------
/drawshapes.m:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/imaginespark/clustering/6306ff2adab0ec2f1a9214000bb2eaeb4750f2c1/drawshapes.m


--------------------------------------------------------------------------------
/epsilon.m:
--------------------------------------------------------------------------------
function [Eps] = epsilon(x, k)
% Analytical way of estimating neighborhood radius for DBSCAN
% -------------------------------------------------------------------------
% Input:
% x - data matrix (m,n); m-objects, n-variables
% k - number of objects in a neighborhood of an object
% (minimal number of objects considered as a cluster)
% -------------------------------------------------------------------------
% Output:
% Eps - estimated neighborhood radius

[m,n] = size(x);
% prod(max(x)-min(x)) is the volume of the data's axis-aligned bounding box
Eps = ((prod(max(x)-min(x))*k*gamma(.5*n+1))/(m*sqrt(pi.^n))).^(1/n);

end


--------------------------------------------------------------------------------
/kmeans.m:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/imaginespark/clustering/6306ff2adab0ec2f1a9214000bb2eaeb4750f2c1/kmeans.m


--------------------------------------------------------------------------------
/results/original_scatter.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/imaginespark/clustering/6306ff2adab0ec2f1a9214000bb2eaeb4750f2c1/results/original_scatter.png


--------------------------------------------------------------------------------
/results/result_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/imaginespark/clustering/6306ff2adab0ec2f1a9214000bb2eaeb4750f2c1/results/result_1.png


--------------------------------------------------------------------------------
/results/result_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/imaginespark/clustering/6306ff2adab0ec2f1a9214000bb2eaeb4750f2c1/results/result_2.png


--------------------------------------------------------------------------------
/results/result_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/imaginespark/clustering/6306ff2adab0ec2f1a9214000bb2eaeb4750f2c1/results/result_3.png


--------------------------------------------------------------------------------
/results/result_iris.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/imaginespark/clustering/6306ff2adab0ec2f1a9214000bb2eaeb4750f2c1/results/result_iris.png


--------------------------------------------------------------------------------
/results/result_spiral.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/imaginespark/clustering/6306ff2adab0ec2f1a9214000bb2eaeb4750f2c1/results/result_spiral.png


--------------------------------------------------------------------------------
/results/result_wine.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/imaginespark/clustering/6306ff2adab0ec2f1a9214000bb2eaeb4750f2c1/results/result_wine.png


--------------------------------------------------------------------------------