K-means clustering

The goal of the K-means clustering algorithm is to find K groups in the data such that the data points within each group are similar to one another. The algorithm works iteratively, assigning each data point to one of the K groups based on the features provided. In other words, data points are clustered by feature similarity.

The value of K is chosen up front, before the algorithm runs, and different results can be obtained by altering it. Once K has been selected and the algorithm is started, two major steps keep repeating until the clusters no longer change.

These two repeated steps are Step 2 and Step 3:

Step 2: Assign each data point in the dataset to one of the K clusters. This is done by calculating the distance from the data point to each cluster centroid and choosing the nearest centroid. Any of the distance functions discussed earlier can be used for this calculation.
Step 3: Recalculate the centroid of each cluster. This is done by taking the mean of all data points assigned to that cluster.
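
Steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names (euclidean, assign_points, update_centroids) are hypothetical, Euclidean distance is assumed as the distance function, and an empty cluster is assumed to keep its previous centroid, a case the text does not address.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length points (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign_points(points, centroids, distance=euclidean):
    """Step 2: for every point, return the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: distance(p, centroids[i]))
        for p in points
    ]

def update_centroids(points, labels, centroids):
    """Step 3: move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:
            # Assumption: a cluster with no points keeps its previous centroid.
            new_centroids.append(tuple(old))
            continue
        # zip(*members) groups the coordinates dimension by dimension.
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids

# Toy data: two obvious groups and two deliberately poor starting centroids.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

labels = assign_points(points, centroids)                 # one pass of Step 2
centroids = update_centroids(points, labels, centroids)   # one pass of Step 3
```

After this single pass, the first two points share one cluster and the last two share the other, and each centroid has moved to the mean of its members. Passing a different function as the distance argument swaps in another distance measure without changing the rest of the code.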

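The final output of the algorithm is K clusters, each containing similar data points. The complete procedure is:

Step 1: Select K seeds (initial centroids) such that d(k_i, k_j) > d_min for every pair of seeds.
Step 2: Assign each point to the cluster whose centroid is at minimum distance.
Step 3: Compute the new cluster centroids as the mean of the points in each cluster.
Step 4: Reassign points to the clusters (as in Step 2).
Step 5: Iterate until no point changes cluster.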
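
The whole procedure, from seed selection to the stopping condition, might look like the sketch below. Several details are assumptions and do not come from the text: the greedy seed picker (the text only states the constraint d(k_i, k_j) > d_min, not how seeds are found), the Euclidean distance, the max_iter safety cap, and the rule that an empty cluster keeps its old centroid. The names select_seeds and k_means are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance (assumed; any distance function could be used)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def select_seeds(points, k, d_min):
    """Greedily pick k seeds so that every pair is more than d_min apart."""
    seeds = []
    for p in points:
        if all(dist(p, s) > d_min for s in seeds):
            seeds.append(tuple(p))
        if len(seeds) == k:
            return seeds
    raise ValueError("could not find k seeds farther apart than d_min")

def k_means(points, k, d_min, max_iter=100):
    """Run K-means until no point changes cluster (or max_iter is reached)."""
    centroids = select_seeds(points, k, d_min)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: dist(p, centroids[i])) for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the clustering is stable
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2, d_min=3.0)
```

On this toy dataset the first three points end up in one cluster and the last three in the other. Changing k or d_min changes which seeds are chosen and can therefore change the final clusters.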

Here are some areas where clustering algorithms are used:

City planning
Earthquake studies
Insurance
Marketing
Medicine, for the analysis of antimicrobial activity and medical imaging
Crime analysis
Robotics, for anomaly detection and natural language processing
