Abstract: Cluster analysis is an important technique in data mining. In the 5G era, data sets are both very large and high-dimensional. The K-means algorithm is sensitive to outliers, and the choice of the K value and of the initial cluster centers affects the stability and accuracy of the clustering result; a poor choice can even cause the algorithm to converge to a local optimum. Improving the K-means algorithm has therefore attracted the attention of many researchers. This article summarizes the current state of research on K-means clustering. First, it introduces the principle of the K-means algorithm. Second, it classifies and summarizes existing improved algorithms, both density-based and distance-based, according to the three problems they address: the selection of the initial cluster centers, the determination of the K value, and the handling of outliers. The advantages and disadvantages of each improved algorithm are analyzed. Finally, the K-means algorithm is assessed as a whole, and possible future research directions and trends are discussed.