├── .gitignore
├── Cluster Analysis.ipynb
├── README.md
├── Silhouette Analysis.ipynb
├── datasets
    ├── Country clusters.csv
    └── customer_information.csv
└── images
    ├── centroid.png
    └── euclidean.png

/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Cluster Analysis
2 | 
3 | Cluster analysis plays an important role in exploratory data analysis, data preprocessing, and unsupervised learning tasks. This repository contains two notebooks: [Cluster Analysis.ipynb](https://github.com/alexandrehsd/Cluster-Analysis/blob/master/Cluster%20Analysis.ipynb) and [Silhouette Analysis.ipynb](https://github.com/alexandrehsd/Cluster-Analysis/blob/master/Silhouette%20Analysis.ipynb).
4 | 
5 | [Cluster Analysis.ipynb](https://github.com/alexandrehsd/Cluster-Analysis/blob/master/Cluster%20Analysis.ipynb) addresses questions like:
6 | 
7 | 1. How to perform cluster analysis using the K-Means technique?
8 | 2. How to find the optimal number of clusters?
9 | 3. How to identify appropriate features?
10 | 4. Why and when do we need to standardize the data?
11 | 5. What are the pros and cons of using K-Means?
12 | 6. How to interpret the results?
13 | 
14 | [Silhouette Analysis.ipynb](https://github.com/alexandrehsd/Cluster-Analysis/blob/master/Silhouette%20Analysis.ipynb) covers alternative ways of choosing the optimal number of clusters for the K-Means algorithm. More specifically, it shows how to perform silhouette analysis and plot the decision boundaries of K-Means for 2-dimensional data.
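As a rough illustration of the two ideas the notebooks cover (K-Means clustering and silhouette analysis), here is a minimal standard-library sketch. It is not taken from the notebooks. It uses the latitude/longitude values from `datasets/Country clusters.csv`, treats them as plain Euclidean coordinates (a simplification, not a geographic distance), and seeds the centroids with a deterministic farthest-first rule instead of the random initialization most libraries use. The helper names `farthest_first_seeds`, `kmeans`, and `silhouette` are hypothetical, and singleton clusters are assumed to score 0.

```python
import math

# Latitude/longitude pairs copied from datasets/Country clusters.csv.
COUNTRIES = {
    "USA": (44.97, -103.77),
    "Canada": (62.4, -96.8),
    "France": (46.75, 2.4),
    "UK": (54.01, -2.53),
    "Germany": (51.15, 10.4),
    "Australia": (-25.45, 133.11),
}


def farthest_first_seeds(points, k):
    """Pick k seeds deterministically: start at points[0], then repeatedly
    take the point farthest from all seeds chosen so far (an assumption made
    for repeatability; it replaces random initialization)."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until
    the labels stop changing."""
    centroids = farthest_first_seeds(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        # Mean distance from p to the other members of every cluster.
        mean_dist = {}
        for c in set(labels):
            others = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if others:
                mean_dist[c] = sum(math.dist(p, q) for q in others) / len(others)
        own = labels[i]
        if own not in mean_dist or len(mean_dist) < 2:
            scores.append(0.0)
            continue
        a = mean_dist[own]  # cohesion: mean distance within own cluster
        b = min(d for c, d in mean_dist.items() if c != own)  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


points = list(COUNTRIES.values())
for k in (2, 3):
    labels, _ = kmeans(points, k)
    print(k, dict(zip(COUNTRIES, labels)), round(silhouette(points, labels), 3))
```

A higher mean silhouette suggests tighter, better-separated clusters, so comparing the printed scores across values of `k` is one way to pick the number of clusters. In practice a library such as scikit-learn would normally be used instead of hand-written loops.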
15 | 
--------------------------------------------------------------------------------
/datasets/Country clusters.csv:
--------------------------------------------------------------------------------
1 | Country,Latitude,Longitude,Language
2 | USA,44.97,-103.77,English
3 | Canada,62.4,-96.8,English
4 | France,46.75,2.4,French
5 | UK,54.01,-2.53,English
6 | Germany,51.15,10.4,German
7 | Australia,-25.45,133.11,English
8 | 
--------------------------------------------------------------------------------
/datasets/customer_information.csv:
--------------------------------------------------------------------------------
1 | Satisfaction,Loyalty
2 | 2,-0.30721628440590587
3 | 9,1.8734767587775603
4 | 5,0.13607977936588078
5 | 6,2.392348615866672
6 | 1,-0.8011036080997473
7 | 1,1.156677520593523
8 | 4,-0.5275938159138471
9 | 5,-0.9449110483792662
10 | 0,0.18873059419133087
11 | 5,-2.400935099170117
12 | 7,0.6394879286798751
13 | 4,0.5747132207694152
14 | 0,-0.9544690435861138
15 | 6,1.0287194261030335
16 | 5,1.1171158215726769
17 | 9,2.1296167688701333
18 | 4,-1.337787608117793
19 | 9,0.5510207029381178
20 | 3,-0.2681519020581371
21 | 1,0.712699142645814
22 | 0,-2.0743459157706443
23 | 3,-2.5459207099497503
24 | 3,-0.019552939458741925
25 | 6,1.105755440888321
26 | 2,2.5181398885511093
27 | 2,0.0722567754476593
28 | 7,0.5619811481323445
29 | 5,2.715024960173098
30 | 3,-0.870223860688915
31 | 2,0.2197233400323544
32 | 0,0.24760503427805647
33 | 4,-0.37923673149001846
34 | 8,-2.1872418924556465
35 | 2,0.049740244352934226
36 | 3,1.4131780399286682
37 | 2,-1.342757494310959
38 | 7,1.1059630860114316
39 | 5,2.4214786950513765
40 | 7,-0.8950884385005224
41 | 7,-0.7402360324165902
42 | 3,-0.1454274286871189
43 | 5,-0.6473458575929039
44 | 7,-1.6763536118019813
45 | 6,-1.4151850104043673
46 | 6,-0.8776683835216703
47 | 7,-1.0396765339870804
48 | 7,-1.5330394143804202
49 | 1,1.5667997972922167
50 | 7,2.971337007254373
51 | 0,-0.7364256273365193
52 | 
--------------------------------------------------------------------------------
/images/centroid.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrehsd/cluster-analysis/3f1178caad47a567dfeac7ffd7ecd5a0d05a280a/images/centroid.png
--------------------------------------------------------------------------------
/images/euclidean.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrehsd/cluster-analysis/3f1178caad47a567dfeac7ffd7ecd5a0d05a280a/images/euclidean.png
--------------------------------------------------------------------------------