kFuse: A novel density based agglomerative clustering

📅 2025-05-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing agglomerative clustering methods rely on multi-parameter tuning and suffer from unstable connectivity distance computation, leading to poor generalizability across datasets. This paper proposes kFuse, a density-driven agglomerative clustering method requiring only the desired number of final clusters as input. Its key contributions are: (1) automatic subcluster partitioning via natural neighbors, eliminating manual specification of local density thresholds; (2) a dual-criterion merging mechanism integrating boundary connectivity with density statistics (mean and variance), enhancing structural sensitivity and stability; and (3) an adaptive hierarchical merging rule that replaces conventional single-distance metrics. Experiments on synthetic and real-world datasets are reported to show that kFuse improves clustering accuracy and robustness, particularly under challenging conditions including complex cluster shapes, heterogeneous densities, and noise contamination.

📝 Abstract
Agglomerative clustering has emerged as a vital tool in data analysis due to its intuitive and flexible characteristics. However, existing agglomerative clustering methods often involve additional parameters for sub-cluster partitioning and inter-cluster similarity assessment. This necessitates different parameter settings across various datasets, which is undoubtedly challenging in the absence of prior knowledge. Moreover, existing agglomerative clustering techniques are constrained by the calculation method of connection distance, leading to unstable clustering results. To address these issues, this paper introduces a novel density-based agglomerative clustering method, termed kFuse. kFuse comprises four key components: (1) sub-cluster partitioning based on natural neighbors; (2) determination of boundary connectivity between sub-clusters through the computation of adjacent samples and shortest distances; (3) assessment of density similarity between sub-clusters via the calculation of mean density and variance; and (4) establishment of merging rules between sub-clusters based on boundary connectivity and density similarity. kFuse requires the specification of the number of clusters only at the final merging stage. Additionally, by comprehensively considering adjacent samples, distances, and densities among different sub-clusters, kFuse significantly enhances accuracy during the merging phase, thereby greatly improving its identification capability. Experimental results on both synthetic and real-world datasets validate the effectiveness of kFuse.
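
Component (1) builds on natural neighbors, which are usually found without a user-chosen k: the neighborhood size grows until every point is some other point's neighbor, or until the count of such "orphan" points stops shrinking. The sketch below illustrates that general idea only. The function name `natural_neighbors`, the stopping rule, and the brute-force distance sort are assumptions made for illustration, not the paper's procedure.

```python
import math


def natural_neighbors(points):
    """Return (lam, neighbors); neighbors[i] is the set of mutual neighbors of point i.

    Simplified search (an assumption, not the paper's exact rule): grow the
    neighborhood size one step at a time and stop once no point lacks a
    reverse neighbor, or once the number of such points stops changing.
    """
    n = len(points)
    # order[i] lists every other point, sorted by distance from point i
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))
        for i in range(n)
    ]
    knn = [set() for _ in range(n)]      # neighbors collected so far
    reverse = [set() for _ in range(n)]  # points that list i as a neighbor
    prev_orphans = None
    lam = 0
    for r in range(n - 1):
        for i in range(n):
            j = order[i][r]              # the (r + 1)-th nearest neighbor of i
            knn[i].add(j)
            reverse[j].add(i)
        lam = r + 1
        orphans = sum(1 for i in range(n) if not reverse[i])
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    # A natural neighbor is mutual: j is in i's list and i is in j's list
    neighbors = [knn[i] & reverse[i] for i in range(n)]
    return lam, neighbors


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
lam, neighbors = natural_neighbors(points)
print(lam, neighbors)
```

On this toy input the two distant points end up as each other's only natural neighbor, which is the kind of local grouping a sub-cluster partition could start from.
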
Problem

Research questions and friction points this paper is trying to address.

Reduces parameter tuning in agglomerative clustering to a single input, the number of final clusters
Improves stability by redefining connection distance calculation
Enhances accuracy via boundary connectivity and density similarity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sub-cluster partitioning using natural neighbors
Boundary connectivity via adjacent samples and shortest distances
Density similarity assessment with mean density and variance
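
The quantities named in the list above can be made concrete with a small sketch. It assumes sub-cluster labels are already available and uses definitions chosen for illustration only: point density as the inverse mean distance to the k nearest neighbors, adjacency as a k-nearest-neighbor link crossing two sub-clusters, and shortest distance as the minimum pairwise distance between them. The names `knn_lists`, `point_density`, `density_stats`, and `boundary_connectivity` are hypothetical, and the paper's own definitions and merging rule may differ.

```python
import math
from statistics import fmean, pvariance


def knn_lists(points, k):
    """Indices of the k nearest neighbors of every point (brute force)."""
    n = len(points)
    return [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))[:k]
        for i in range(n)
    ]


def point_density(points, knn):
    """Assumed density: inverse mean distance to the k nearest neighbors.

    Assumes no duplicate points, so the mean distance is never zero.
    """
    return [
        1.0 / fmean([math.dist(points[i], points[j]) for j in knn[i]])
        for i in range(len(points))
    ]


def density_stats(density, labels, c):
    """Mean and population variance of the densities in sub-cluster c."""
    vals = [d for d, lab in zip(density, labels) if lab == c]
    return fmean(vals), pvariance(vals)


def boundary_connectivity(points, knn, labels, a, b):
    """Count adjacent sample pairs between sub-clusters a and b (a != b)
    and return that count with the shortest distance between them."""
    adjacent = set()
    for i, nbrs in enumerate(knn):
        for j in nbrs:
            if {labels[i], labels[j]} == {a, b}:
                adjacent.add((min(i, j), max(i, j)))
    shortest = min(
        math.dist(points[i], points[j])
        for i in range(len(points)) if labels[i] == a
        for j in range(len(points)) if labels[j] == b
    )
    return len(adjacent), shortest


# Two square sub-clusters separated by a gap of 6 units
points = [(0, 0), (0, 5), (5, 0), (5, 5), (11, 0), (11, 5), (16, 0), (16, 5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
knn = knn_lists(points, k=3)
density = point_density(points, knn)
print(density_stats(density, labels, 0), density_stats(density, labels, 1))
print(boundary_connectivity(points, knn, labels, 0, 1))
```

A merging rule could then favor pairs of sub-clusters with many adjacent samples, a short gap, and similar density statistics. How these signals are weighted and combined is left open here because the summary does not specify it.
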
Huan Yan
Tsinghua University
Spatio-temporal data mining, recommender system
Junjie Hu
Department of Computer Science and Engineering, Shanghai Jiao Tong University