├── LICENSE
├── README.md
├── kmeans.vba
└── kmeans.xlsm


/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 bquanttrading, asmquant, gpolic

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# k-means

K-means is an algorithm for cluster analysis (clustering). It is the process of partitioning a set of data into related groups / clusters.
K-means clustering is useful for Data Mining and Business Intelligence.

Here is k-means in plain English:
https://hackerbits.com/data/k-means-data-mining-algorithm/

This script is based on the work of bquanttrading.
His blog on market modelling and market analytics:

https://asmquantmacro.com


# What does it do

k-means will classify each record in your data, placing it into a group (cluster). You do not need to specify the properties of each group; k-means determines the groups itself. However, you usually need to provide the number of groups that you want in the output.

The records in the same cluster are similar to each other. Records in different clusters are dissimilar.

Each row of your Excel data should be a record/observation with one or more features. Each column is a feature in the observation.

As an example, here is a data set with the height and weight of 25,000 children in Hong Kong: http://socr.ucla.edu/docs/resources/SOCR_Data/SOCR_Data_Dinov_020108_HeightsWeights.html

Each row in the data represents a person. Each column is a feature of the person.

Currently the script works _only_ with numerical data.



# How does it work

* Enter your data in a new Excel worksheet
* Enter the name of the worksheet in cell C4, and the range of the data in C5
* Enter the name of the worksheet where the output should be placed in C6 (you can use the one where your data is)
* Enter the cell where the output will be written in C7
* Enter the number of groups in your data in C8
* Click the button to start
* Check the Result sheet

If you do not know the number of clusters/groups contained in your data, try different values, for example 1 up to 10.
Execute the script several times and observe the GAP figure.
The number of clusters at which GAP reaches its maximum value is a good choice for this data set.

For example, when varying the number of clusters on the IRIS data set, GAP reaches its maximum at 3 clusters.
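As a rough sketch of what the GAP figure is, read off `kmeans.vba` rather than taken from the paper's full procedure: let $n$ be the number of records, $p$ the number of feature columns, $k$ the number of clusters, and $D$ the "Distance" value (the sum of the Euclidean distances from each record to the centroid of its cluster). The symbols $n$, $p$, $k$, $D$, $W_k$ and $E_k$ are labels chosen here for illustration; they are not names used by the script. All logarithms are natural logarithms.

```latex
W_k = \frac{D}{2n}, \qquad
E_k = \log\!\left(\frac{n\,p}{12}\right) - \frac{2}{p}\,\log k, \qquad
\mathrm{GAP}(k) = E_k - \log W_k
```

Here $E_k$ appears to be a closed-form stand-in for the expected value of $\log W_k$, so the figure is best treated as a guide for comparing different values of $k$ on the same data set, not as an absolute score.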

The original paper that describes the GAP calculation: https://web.stanford.edu/~hastie/Papers/gap.pdf

# The results

The result is a number assigned to each record that indicates the group/cluster the record belongs to.

The Result sheet contains information on the clusters, along with the cluster centers.


# Performance

The lower the "Distance" value, the more accurate the output.

Execute the algorithm several times to find the best results.

The script stops when the clusters have stabilized (no record changes cluster) or when the maximum number of iterations is reached, whichever comes first. You can increase the number of iterations for better results.

Unfortunately, Excel VBA runs on a single thread, so it cannot take full advantage of a multi-core CPU.

# Why is this different?

The script calculates the initial centroids using the _k-means++_ algorithm. You do not have to provide the initial centroids.
It also provides an indication of the number of groups contained in the data, using the GAP calculation.

# More info

This implements the k-means++ algorithm of David Arthur and Sergei Vassilvitskii, which chooses the initial centroids.
https://theory.stanford.edu/~sergei/papers/kMeansPP-soda.pdf

The example dataset provided in kmeans.xlsm is _IRIS_ from UC Irvine Machine Learning Repository: https://archive.ics.uci.edu/dataset/53/iris


--------------------------------------------------------------------------------
/kmeans.vba:
--------------------------------------------------------------------------------
Option Base 1
Option Explicit

Public Sub kmeans()
    Dim wkSheet As Worksheet
    Set wkSheet = ActiveWorkbook.Worksheets("Start")

    Dim MaximumIterations As Integer: MaximumIterations = wkSheet.Range("MaximumIterations").Value
    Dim DataSht As String: DataSht = wkSheet.Range("InputSheet").Value
    Dim DataRange As String: DataRange = wkSheet.Range("InputRange").Value
    Dim DataRecords As Variant: DataRecords = Worksheets(DataSht).Range(DataRange).Value
    Dim NUMBER_OF_RECORDS As Integer: NUMBER_OF_RECORDS = UBound(DataRecords, 1)
    Dim NUMCLUSTERS As Integer: NUMCLUSTERS = wkSheet.Range("Clusters").Value
    Dim ClusterIndexes As Variant, Centroids As Variant, InitialCentroidsCalc As Variant
    Dim ClustersUpdated As Integer, counter As Integer: counter = 1
    Dim StartTime As Double

    StartTime = Timer
    Application.StatusBar = " [ Initialize. ]"

    ' initialize centroids with kmeans++ method
    InitialCentroidsCalc = ComputeInitialCentroidsCalc(DataRecords, NUMCLUSTERS)

    Application.StatusBar = " [ Start.. ]"
    'Application.ScreenUpdating = False

    ' First pass. Assign each record (observation) to an initial cluster.
    ' ClusterIndexes is updated by this call
    ClustersUpdated = FindClosestCentroid(DataRecords, InitialCentroidsCalc, ClusterIndexes)

    ' The result returned from FindClosestCentroid is not relevant right now
    ClustersUpdated = 1

    ' Run k-means until no record changes cluster or MaximumIterations is reached
    While counter <= MaximumIterations And ClustersUpdated > 0
        Application.StatusBar = " [ Pass: " & CStr(counter) & " ]"

        ' calculate new centroids for each cluster
        Centroids = ComputeCentroids(DataRecords, ClusterIndexes, NUMCLUSTERS)

        ' assign each record to a cluster based on the new centroids
        ClustersUpdated = FindClosestCentroid(DataRecords, Centroids, ClusterIndexes)
        counter = counter + 1
    Wend

    Application.StatusBar = " Completed after " & CStr(counter - 1) & " iterations"
    'Application.ScreenUpdating = True

    ' show the clusters assigned in the output sheet/range
    Dim ClusterOutputSht As String: ClusterOutputSht = wkSheet.Range("OutputSheet").Value
    Dim ClusterOutputRange As String: ClusterOutputRange = wkSheet.Range("OutputRange").Value
    Worksheets(ClusterOutputSht).Range(ClusterOutputRange).Resize(NUMBER_OF_RECORDS, 1).Value = WorksheetFunction.Transpose(ClusterIndexes)

    Call ShowResult(DataRecords, ClusterIndexes, Centroids, NUMCLUSTERS)

    ' show more results
    Dim Distance As Double, ExpO As Double, Wk As Double

    Distance = CalculateDistances(DataRecords, Centroids, ClusterIndexes)
    ExpO = CalculateExpectation(DataRecords, NUMCLUSTERS)

    ' 2# forces Double arithmetic; Integer * Integer overflows above 16,383 records
    Wk = Distance / (2# * NUMBER_OF_RECORDS)

    wkSheet.Range("C16").Value = Distance
    wkSheet.Range("C17").Value = ExpO - Log(Wk)     ' the GAP figure
    'wkSheet.Range("C18").Value = ExpO
    'wkSheet.Range("C19").Value = Wk

    'MsgBox "Time elapsed " & Round(Timer - StartTime, 2) & " seconds", vbInformation
End Sub


Function CalculateDistances(ByRef DataRecords As _
        Variant, ByRef Centroids As Variant, ByRef Cluster_Indexes As Variant) As Variant
    Dim NUMBER_OF_RECORDS As Integer: NUMBER_OF_RECORDS = UBound(DataRecords, 1)
    Dim NUMBER_OF_COLUMNS As Integer: NUMBER_OF_COLUMNS = UBound(DataRecords, 2)
    Dim NUMCLUSTERS As Integer: NUMCLUSTERS = UBound(Centroids, 1)
    Dim DistanceInCluster() As Variant: ReDim DistanceInCluster(NUMCLUSTERS)

    ' each variable needs its own type; "Dim a, b, c As Integer" leaves a and b as Variant
    Dim clusterCounter As Integer, recordCounter As Integer, recordsInCluster As Integer
    Dim DistanceSum As Double: DistanceSum = 0

    For clusterCounter = 1 To NUMCLUSTERS

        recordsInCluster = 0
        For recordCounter = 1 To NUMBER_OF_RECORDS

            If Cluster_Indexes(recordCounter) = clusterCounter Then
                DistanceInCluster(clusterCounter) = DistanceInCluster(clusterCounter) + _
                    EuclideanDistance(Application.Index(Centroids, clusterCounter, 0), Application.Index(DataRecords, recordCounter, 0), NUMBER_OF_COLUMNS)
                recordsInCluster = recordsInCluster + 1
            End If

        Next recordCounter

        'DistanceSum = DistanceSum + Sqr(DistanceInCluster(clusterCounter) / recordsInCluster)
        DistanceSum = DistanceSum + DistanceInCluster(clusterCounter)
    Next clusterCounter

    CalculateDistances = DistanceSum
End Function


Function CalculateExpectation(ByRef DataRecords As Variant, NUMCLUSTERS As Integer) As Double
    Dim NUMBER_OF_RECORDS As Integer: NUMBER_OF_RECORDS = UBound(DataRecords, 1)
    Dim NUMBER_OF_COLUMNS As Integer: NUMBER_OF_COLUMNS = UBound(DataRecords, 2)

    ' CDbl avoids an Integer overflow when records * columns exceeds 32,767
    CalculateExpectation = Log((CDbl(NUMBER_OF_RECORDS) * NUMBER_OF_COLUMNS) / 12) - ((2 / NUMBER_OF_COLUMNS) * Log(NUMCLUSTERS))
End Function


' Select initial centroids
'
Function ComputeInitialCentroidsCalc(ByRef DataRecords As Variant, NUMCLUSTERS As Integer) As Variant

    Dim NUMBER_OF_RECORDS As Integer: NUMBER_OF_RECORDS = UBound(DataRecords, 1)
    Dim NUMBER_OF_COLUMNS As Integer: NUMBER_OF_COLUMNS = _
        UBound(DataRecords, 2)
    Dim Taken() As Variant: ReDim Taken(NUMBER_OF_RECORDS)

    Dim InitialCentroidsCalc As Variant: ReDim InitialCentroidsCalc(NUMCLUSTERS, NUMBER_OF_COLUMNS) As Variant
    Dim minDistSquared As Variant: ReDim minDistSquared(NUMBER_OF_RECORDS)
    Dim counter As Integer, CentroidsFound As Integer, FirstCentroidIndex As Integer
    Dim dist As Double
    Dim preventLoop As Boolean: preventLoop = True
    Dim FirstCentroid As Variant: ReDim FirstCentroid(NUMBER_OF_COLUMNS)

    ' seed Rnd from the system timer; without this, Rnd replays the same sequence in every new Excel session
    Randomize

    FirstCentroidIndex = Int(Rnd * NUMBER_OF_RECORDS) + 1    ' The first centroid is random !

    ' Disabled alternative to the standard kmeans++ algorithm:
    ' choose the first centroid from the mean values instead of by random selection
    ' First Centroid - Choose the record that is closest to the mean
    ' ------------------------------------------------------------------
    '    Dim colCounter As Integer
    '    For colCounter = 1 To NUMBER_OF_COLUMNS
    '        For counter = 1 To NUMBER_OF_RECORDS
    '            FirstCentroid(colCounter) = FirstCentroid(colCounter) + DataRecords(counter, colCounter)
    '        Next counter
    '        FirstCentroid(colCounter) = FirstCentroid(colCounter) / NUMBER_OF_RECORDS    ' find the mean
    '    Next colCounter
    '
    '    Dim MinimumDistance As Double: MinimumDistance = 99999999
    '    Dim MinRecord As Variant
    '    Dim recordNumber As Integer
    '    For recordNumber = 1 To NUMBER_OF_RECORDS    ' calculate distance to all records and select the record closest to the mean
    '        dist = EuclideanDistance(Application.Index(DataRecords, recordNumber, 0), FirstCentroid, NUMBER_OF_COLUMNS)
    '        If dist < MinimumDistance Then
    '            FirstCentroidIndex = recordNumber    ' the record with lowest distance to the means will be 1st centroid
    '            MinimumDistance = dist
    '        End If
    '    Next recordNumber    ' check with next data record
    ' ------------------------------------------------------------------

    For _
        counter = 1 To NUMBER_OF_COLUMNS
        ' put this data record in FirstCentroid
        FirstCentroid(counter) = DataRecords(FirstCentroidIndex, counter)

        ' and put it also in the array of results
        InitialCentroidsCalc(1, counter) = FirstCentroid(counter)
    Next counter

    ' mark point as Taken. We have one cluster center
    Taken(FirstCentroidIndex) = 1
    CentroidsFound = 1

    For counter = 1 To NUMBER_OF_RECORDS

        If Not counter = FirstCentroidIndex Then
            dist = EuclideanDistance(FirstCentroid, Application.Index(DataRecords, counter, 0), NUMBER_OF_COLUMNS)
            minDistSquared(counter) = dist * dist
        End If

    Next counter

    ' main loop
    Do While CentroidsFound < NUMCLUSTERS And preventLoop = True

        ' sum all the squared distances of the points not already taken
        Dim distSqSum As Double: distSqSum = 0
        For counter = 1 To NUMBER_OF_RECORDS

            If Not Taken(counter) = 1 Then
                distSqSum = distSqSum + minDistSquared(counter)
            End If

        Next counter

        ' add one new point.
        ' each point is chosen with probability proportional to D(x)^2
        Dim R As Double
        R = Rnd * distSqSum

        ' the index of the next point to be added as cluster center
        Dim nextpoint As Integer
        nextpoint = -1


        ' scan through the squared distances until the running sum exceeds R
        Dim sum As Double: sum = 0
        For counter = 1 To NUMBER_OF_RECORDS

            If Not Taken(counter) = 1 Then
                sum = sum + minDistSquared(counter)

                If sum > R Then
                    nextpoint = counter
                    Exit For
                End If

            End If

        Next counter

        ' if a new point was not found yet, just pick the last available data record
        If nextpoint = -1 Then
            For counter = NUMBER_OF_RECORDS To 1 Step -1

                If Not Taken(counter) = 1 Then
                    nextpoint = counter
                    Exit For    ' stop at the first hit; otherwise the loop ends on the first available record, not the last
                End If

            Next counter
        End If

        If nextpoint >= 0 Then

            ' we found the next cluster center! Mark the data record as Taken
            CentroidsFound = CentroidsFound + 1
            Taken(nextpoint) = 1

            ' copy the data in the array to our result
            For counter = 1 To NUMBER_OF_COLUMNS
                InitialCentroidsCalc(CentroidsFound, counter) = DataRecords(nextpoint, counter)
            Next counter

            ' need to find more centroids.
            ' we will adjust minDistSquared
            If CentroidsFound < NUMCLUSTERS Then

                For counter = 1 To NUMBER_OF_RECORDS

                    If Not Taken(counter) = 1 Then

                        ' find the distance to the new centroid
                        Dim dista As Double, distSquared As Double

                        dista = EuclideanDistance(Application.Index(InitialCentroidsCalc, CentroidsFound, 0), Application.Index(DataRecords, counter, 0), NUMBER_OF_COLUMNS)
                        distSquared = dista * dista

                        ' if the distance to the new centroid is lower than the previous, then use it
                        If distSquared < minDistSquared(counter) Then
                            minDistSquared(counter) = distSquared
                        End If
                    End If

                Next counter

            End If

        Else    ' there is no cluster center found
            preventLoop = False    ' make sure that the while loop can terminate
        End If
    Loop

    ComputeInitialCentroidsCalc = InitialCentroidsCalc
End Function


Public Function EuclideanDistance(X As Variant, Y As Variant, NumberOfObservations As Integer) As Double
    Dim counter As Integer
    Dim RunningSumSqr As Double: RunningSumSqr = 0

    For counter = 1 To NumberOfObservations
        RunningSumSqr = RunningSumSqr + ((X(counter) - Y(counter)) ^ 2)
    Next counter

    EuclideanDistance = Sqr(RunningSumSqr)
End Function



' For each record in Data Records, find the closest Centroid (cluster)
' The result is calculated and placed in Cluster_Indexes()
' This number is the cluster where we placed the record.
' This is more efficient than creating new arrays with clusters
'
Public Function FindClosestCentroid(ByRef DataRecords As Variant, ByRef Centroids As Variant, ByRef Cluster_Indexes As Variant) As Integer
    Dim NUMCLUSTERS As Integer: NUMCLUSTERS = UBound(Centroids, 1)
    Dim NUMBER_OF_COLUMNS As Integer: NUMBER_OF_COLUMNS = UBound(Centroids, 2)
    Dim NUMBER_OF_RECORDS As Integer: NUMBER_OF_RECORDS = UBound(DataRecords, 1)
    Dim idx() As Variant: ReDim idx(NUMBER_OF_RECORDS) As Variant
    Dim recordsCounter As Integer, clusterCounter As Integer
    Dim changeCounter As Integer: changeCounter = 0

    For recordsCounter = 1 To NUMBER_OF_RECORDS

        Dim MinimumDistance As Double: MinimumDistance = 99999999
        Dim MinCluster As Integer
        Dim dist As Double: dist = 0

        ' calculate distance to all centroids and assign to the minimum distance cluster
        For clusterCounter = 1 To NUMCLUSTERS
            dist = EuclideanDistance(Application.Index(DataRecords, recordsCounter, 0), Application.Index(Centroids, clusterCounter, 0), NUMBER_OF_COLUMNS)
            If dist < MinimumDistance Then

                ' remember the closest cluster found so far for this record
                MinCluster = clusterCounter
                MinimumDistance = dist
            End If
        Next clusterCounter

        ' change the cluster index to the closest cluster
        idx(recordsCounter) = MinCluster

        ' During the first run Cluster_Indexes is Empty
        If Not (IsEmpty(Cluster_Indexes)) Then

            ' If the old cluster index is not the same as the new one
            If Not (Cluster_Indexes(recordsCounter) = idx(recordsCounter)) Then

                ' indicate that a change occurred
                changeCounter = changeCounter + 1
            End If

        End If

    Next recordsCounter    ' next record

    FindClosestCentroid = changeCounter

    ' update the clusters
    Cluster_Indexes = idx()
End Function

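
' --- Usage sketch (not part of the original script) ---
' A minimal, hypothetical demo of FindClosestCentroid on four hand-made 2-D records
' and two hand-picked centroids. The name DemoFindClosestCentroid and all sample
' values are assumptions made for illustration only. It assumes the Sub is pasted
' into the same module; run it from the VBA editor and read the Immediate window.
Public Sub DemoFindClosestCentroid()
    Dim records As Variant: ReDim records(1 To 4, 1 To 2)
    Dim cents As Variant: ReDim cents(1 To 2, 1 To 2)
    Dim idx As Variant          ' left Empty on purpose, as in the first pass of kmeans()
    Dim changed As Integer, i As Integer

    ' two records near (1, 1) and two records near (9, 9)
    records(1, 1) = 1: records(1, 2) = 1
    records(2, 1) = 1: records(2, 2) = 2
    records(3, 1) = 9: records(3, 2) = 9
    records(4, 1) = 9: records(4, 2) = 8

    cents(1, 1) = 1: cents(1, 2) = 1
    cents(2, 1) = 9: cents(2, 2) = 9

    ' first call: idx is Empty, so no changes are counted and idx gets filled in
    changed = FindClosestCentroid(records, cents, idx)
    For i = 1 To 4
        Debug.Print "record " & i & " -> cluster " & idx(i)
    Next i

    ' second call with the same centroids: new assignments are compared with the previous ones
    changed = FindClosestCentroid(records, cents, idx)
    Debug.Print "records that changed cluster: " & changed
End Sub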


' Show the results in the Result sheet
'
Public Sub ShowResult(ByRef DataRecords As Variant, ByRef Cluster_Indexes As Variant, ByRef Centroids As Variant, NUMCLUSTERS As Integer)
    Dim resultSheet As Worksheet
    Dim lRowLast As Integer, lColLast As Integer, counter As Integer
    Dim Rng As Range
    Dim ClusterObjects() As Variant: ReDim ClusterObjects(NUMCLUSTERS) As Variant
    Dim NUMBER_OF_RECORDS As Integer: NUMBER_OF_RECORDS = UBound(DataRecords, 1)

    Set resultSheet = ActiveWorkbook.Worksheets("Result")


    ' clear the old data in Result sheet
    With resultSheet
        lRowLast = .UsedRange.Row + .UsedRange.Rows.Count - 1
        lColLast = .UsedRange.Column + .UsedRange.Columns.Count - 1
        Set Rng = .Range(.Range("B4"), .Cells(lRowLast, lColLast))
    End With
    Rng.ClearContents

    ' initialize the record count of each cluster and write the cluster numbers as headers
    For counter = 1 To NUMCLUSTERS
        ClusterObjects(counter) = 0
        resultSheet.Cells(4, 1 + counter).Value = counter
    Next counter

    ' count the records assigned to each cluster
    For counter = 1 To NUMBER_OF_RECORDS
        ClusterObjects(Cluster_Indexes(counter)) = ClusterObjects(Cluster_Indexes(counter)) + 1
    Next counter

    ' Show the record count per cluster and the final centroids in the results
    resultSheet.Range("B5").Resize(1, NUMCLUSTERS).Value = ClusterObjects
    resultSheet.Range("B9").Resize(UBound(Centroids, 1), UBound(Centroids, 2)).Value = Centroids

End Sub


' This will sum all the records in a cluster, and average the values.
' The calculated averages will form the new Centroids
'
Public Function ComputeCentroids(DataRecords As Variant, ClusterIdx As Variant, Number_Of_Clusters As Integer) As Variant
    Dim NUMBER_OF_RECORDS As Integer: NUMBER_OF_RECORDS = UBound(DataRecords, 1)
    Dim NUMBER_OF_FEATURES As Integer: NUMBER_OF_FEATURES = UBound(DataRecords, 2)
    Dim clusterNumber As Integer, columnNumber As Integer, recordNumber As Integer, counter As Integer
    Dim tempSum() As Variant: ReDim tempSum(Number_Of_Clusters, NUMBER_OF_FEATURES) As Variant
    Dim Centroids() As Variant: ReDim Centroids(Number_Of_Clusters, NUMBER_OF_FEATURES) As Variant

    For clusterNumber = 1 To Number_Of_Clusters

        For columnNumber = 1 To NUMBER_OF_FEATURES

            counter = 0
            For recordNumber = 1 To NUMBER_OF_RECORDS
                If ClusterIdx(recordNumber) = clusterNumber Then

                    ' if this record is part of the cluster then add
                    Centroids(clusterNumber, columnNumber) = Centroids(clusterNumber, columnNumber) + DataRecords(recordNumber, columnNumber)
                    counter = counter + 1
                End If
            Next recordNumber

            If counter > 0 Then

                ' compute the new centroid averaging all records in the cluster
                Centroids(clusterNumber, columnNumber) = Centroids(clusterNumber, columnNumber) / counter
            Else
                Centroids(clusterNumber, columnNumber) = 0
            End If

        Next columnNumber

    Next clusterNumber

    ComputeCentroids = Centroids
End Function




--------------------------------------------------------------------------------
/kmeans.xlsm:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gpolic/kmeans-excel/99282c3334806aed801e2038e8a6f23b2c9ef65d/kmeans.xlsm
--------------------------------------------------------------------------------