Abstract:Ball k-means, as an accurate acceleration algorithm for k-means, improves efficiency while guaranteeing consistent clustering results. However, its acceleration effect diminishes on high?dimensional data, and the neighbor?searching step involves high complexity. To enhance its adaptability to high?dimensional data while preserving its acceleration advantages in large?k clustering, this paper proposes an optimized algorithm named Ball k?means?G*. In the data point assignment step, a primal?geometry structure is introduced by projecting data points and candidate centroids into two?dimensional space, where a lower bound for distances is established to skip redundant high?dimensional computations. In the neighbor?cluster search, a forced?point mechanism is incorporated: centroids are reduced to three dimensions, and combined with the original algorithm’s pruning conditions to eliminate non?neighbor distance calculations early. Experiments on real datasets with different scales and dimensions show that the proposed algorithm significantly outperforms five existing exact accelerated k?means methods. Compared with the original Ball k?means, it achieves an average efficiency improvement of approximately 20.66% across datasets of various dimensions. The results demonstrate that Ball k?means?G* effectively alleviates the performance degradation of Ball k?means in high?dimensional settings and is applicable to clustering tasks across various data dimensions and with large k values.