Unsupervised Clustering Algorithm based on Knowledge-Induced Drive Useless Center
Affiliation:

1. School of Science, East China Jiaotong University; 2. East China Jiaotong University

Fund Project:

National Natural Science Foundation of China (12361004)

    Abstract: Unsupervised clustering is an important part of machine learning, and K-Means is a widely used, efficient unsupervised clustering algorithm. On high-dimensional data sets or data sets with uneven density distribution, however, it is hard to agree on the number of clusters for K-Means, and selecting the initial centroids is also difficult. 【Objective】To better understand and improve how K-Means chooses its initial centroids and its number of clusters. 【Method】An unsupervised clustering algorithm based on knowledge-induced driving of useless centers is proposed. The algorithm first introduces the concept of high-density knowledge points, computes the high-density knowledge points in the data set, and stores them in a pre-initialization centroid set. It then analyzes the data set using ideas from the Gaussian distribution to estimate a number of clusters close to the optimum, and applies a useless-center strategy to the pre-initialization centroid set to select the optimal initial centroids. Finally, the selected centroids are used as the initial centroids of K-Means for clustering. 【Result】Experiments on 5 data sets of different types show that the proposed algorithm achieves better clustering performance than the other 5 comparison algorithms. 【Conclusion】The proposed algorithm performs well on data sets with uneven density.
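    The abstract names the stages of the method but gives no formulas, so the following is only an illustrative sketch of the overall flow (density-ranked candidate centroids, pruning of redundant candidates, then K-Means from the surviving centroids), not the authors' algorithm. Everything specific in it is an assumption: density is approximated by the mean distance to the nearest neighbours, a greedy farthest-first selection stands in for the useless-center strategy, and the number of clusters `k` is passed in by the caller because the Gaussian-based estimate is not described. All function and parameter names are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def density_candidates(points, n_neighbors, n_candidates):
    """Return the n_candidates densest points, densest first.
    Assumption: a point is dense when the mean distance to its
    n_neighbors nearest points is small."""
    scored = []
    for i, p in enumerate(points):
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scored.append((sum(d[:n_neighbors]) / n_neighbors, i))
    scored.sort()  # ties are broken by original index
    return [points[i] for _, i in scored[:n_candidates]]

def prune_candidates(candidates, k):
    """Keep k well-separated candidates (requires k <= len(candidates)).
    Assumption: greedy farthest-first selection is a stand-in for the
    paper's useless-center strategy, which the abstract does not specify."""
    chosen = [0]  # start from the densest candidate
    while len(chosen) < k:
        rest = [i for i in range(len(candidates)) if i not in chosen]
        chosen.append(max(
            rest,
            key=lambda i: min(dist(candidates[i], candidates[j]) for j in chosen),
        ))
    return [candidates[i] for i in chosen]

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]

def kmeans(points, init_centroids, n_iter=100):
    """Plain Lloyd iterations started from the supplied centroids."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(
                tuple(sum(x) / len(members) for x in zip(*members)) if members else c
            )
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Toy data: two well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
candidates = density_candidates(points, n_neighbors=2, n_candidates=4)
init = prune_candidates(candidates, k=2)
centroids, labels = kmeans(points, init)
print(init, labels)
```

    On this toy data the two densest points are the group centres, so the pruning step keeps one candidate per group and K-Means converges on the first iteration. Starting K-Means from density-derived centroids rather than random ones is the design choice the abstract motivates; how well it works on real data depends on the details the full paper provides.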

History
  • Received: 2025-03-25
  • Last revised: 2025-04-26
  • Accepted: 2025-05-29
  • Published online: 2026-06-05