基于原始几何概念的Ball k-means聚类算法
DOI:
作者:
作者单位:

华东交通大学理学院

作者简介:

通讯作者:

中图分类号:

基金项目:

江西省自然科学(20242BAB25005)


Ball k-means Clustering Algorithm Based on Primitive Geometric Concepts
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    Ball k-means作为k-means的精确加速算法,能在保证聚类结果一致的同时提升效率,但高维数据下加速效果衰减且邻居簇查找复杂度高。【目的】为提升其高维数据适应能力并保留其在大k聚类中的加速优势,本文提出一种基于原始几何概念的优化算法Ball k-means-G*。【方法】在数据点分配步骤中引入原始几何结构,将数据点与候选质心降维至二维,确定距离下界以跳过冗余高维计算;在邻居簇查找中引入强迫点机制,将质心降至三维,结合原算法判断条件提前排除非邻居簇距离计算。【结果】在不同规模、维度的真实数据集实验显示,该算法性能显著优于5种对比精确加速k-means算法。与原始Ball k-means比,各维度数据集效率平均提升约20.66%。【结论】Ball k-means-G*有效缓解了Ball k-means在高维数据下的性能衰减问题,适用于各类维度与大k聚类任务。

    Abstract:

    Ball k-means, as an accurate acceleration algorithm for k-means, improves efficiency while guaranteeing consistent clustering results. However, its acceleration effect diminishes on high?dimensional data, and the neighbor?searching step involves high complexity. To enhance its adaptability to high?dimensional data while preserving its acceleration advantages in large?k clustering, this paper proposes an optimized algorithm named Ball k?means?G*. In the data point assignment step, a primal?geometry structure is introduced by projecting data points and candidate centroids into two?dimensional space, where a lower bound for distances is established to skip redundant high?dimensional computations. In the neighbor?cluster search, a forced?point mechanism is incorporated: centroids are reduced to three dimensions, and combined with the original algorithm’s pruning conditions to eliminate non?neighbor distance calculations early. Experiments on real datasets with different scales and dimensions show that the proposed algorithm significantly outperforms five existing exact accelerated k?means methods. Compared with the original Ball k?means, it achieves an average efficiency improvement of approximately 20.66% across datasets of various dimensions. The results demonstrate that Ball k?means?G* effectively alleviates the performance degradation of Ball k?means in high?dimensional settings and is applicable to clustering tasks across various data dimensions and with large k values.

    参考文献
    相似文献
    引证文献
引用本文
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2025-12-14
  • 最后修改日期:2026-01-28
  • 录用日期:2026-02-15
  • 在线发布日期: 2026-03-20
  • 出版日期:
关闭