Abstract: Unsupervised clustering algorithms are an important part of machine learning, and K-Means is among the most widely used and efficient of them. Studies have found that on high-dimensional data sets, or on data sets with an uneven density distribution, K-Means has difficulty choosing both the number of clusters and the initial centroids. 【Objective】To better understand and address these two problems of K-Means, centroid initialization and cluster-number selection, 【Method】an unsupervised clustering algorithm driven by knowledge induction and a useless-center strategy is proposed. The algorithm first introduces the concept of high-density knowledge points, computes the high-density knowledge points of the data set, and stores them in a pre-initialized centroid set. Next, it analyzes the data set under a Gaussian distribution assumption to estimate the number of clusters closest to the optimum. It then applies the useless-center strategy to the pre-initialized centroid set to select the optimal initial centroids. Finally, these selected centroids are used as the initial centroids for K-Means clustering.【Result】Experiments on 5 different types of data sets show that the proposed algorithm achieves better clustering performance than the 5 comparison algorithms.【Conclusion】The proposed algorithm performs well on data sets with an uneven density distribution.
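The general idea of seeding K-Means from high-density points can be illustrated with a minimal sketch. This is not the authors' algorithm. Several assumptions apply: density is a simple neighbour count within a hypothetical `radius`; "high-density" means at or above a hypothetical `quantile` of those counts; and the number of clusters `k` is passed in directly, not estimated from a Gaussian analysis. The farthest-first selection among dense candidates is a plainly swapped-in stand-in for the paper's useless-center strategy. All function and parameter names are hypothetical.

```python
import math


def local_density(points, radius):
    """Neighbour count within `radius` for each point (assumed density measure)."""
    n = len(points)
    return [
        sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) <= radius)
        for i in range(n)
    ]


def select_initial_centroids(points, k, radius, quantile=0.5):
    """Hypothetical seeding: keep high-density points, then pick k of them far apart.

    The farthest-first step is a stand-in for the paper's useless-center strategy,
    not a reproduction of it.
    """
    dens = local_density(points, radius)
    cutoff = sorted(dens)[int(quantile * (len(dens) - 1))]
    candidates = [i for i, d in enumerate(dens) if d >= cutoff]  # "high-density" set
    if len(candidates) < k:
        raise ValueError("fewer high-density candidates than k")
    chosen = [max(candidates, key=lambda i: dens[i])]  # start from the densest point
    while len(chosen) < k:
        # Add the candidate whose nearest already-chosen centre is farthest away.
        nxt = max(candidates,
                  key=lambda i: min(math.dist(points[i], points[j]) for j in chosen))
        chosen.append(nxt)
    return [tuple(points[i]) for i in chosen]


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, centroids, max_iter=100):
    """Standard Lloyd iterations starting from the supplied centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new = []
        for c_idx, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c_idx]
            # Mean of the members; an empty cluster keeps its previous centroid.
            new.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else old)
        if new == centroids:  # converged
            break
        centroids = new
    return centroids, assign(points, centroids)
```

On two well-separated blobs, the dense-point seeding places one initial centroid in each blob, so K-Means converges in one update. The fixed `radius` is the main weakness of this simplification: a single cutoff cannot adapt to regions of very different density, which is the case the paper targets.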