🤖 AI Summary
It is unclear how meaningful minimum spanning trees (MSTs) are in low-dimensional partitional clustering tasks, even though they are a convenient and relatively fast-to-compute representation of datasets. Method: We identify upper bounds for the agreement between the best (oracle) algorithm and expert labels on a large battery of benchmark data, and then review, study, extend, and generalise a few existing state-of-the-art MST-based partitioning schemes, which leads to some new approaches. Contribution/Results: The oracle bounds show that MST methods can be very competitive. The Genie and information-theoretic methods often outperform non-MST algorithms, including K-means, Gaussian mixtures, spectral clustering, Birch, density-based methods, and classical hierarchical agglomerative procedures. There is still some room for improvement, so the development of novel algorithms is encouraged.
📝 Abstract
Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they are meaningful in low-dimensional partitional data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm and the expert labels from a large battery of benchmark data, we discover that MST methods can be very competitive. Next, we review, study, extend, and generalise a few existing, state-of-the-art MST-based partitioning schemes. This leads to some new noteworthy approaches. Overall, the Genie and the information-theoretic methods often outperform the non-MST algorithms such as K-means, Gaussian mixtures, spectral clustering, Birch, density-based, and classical hierarchical agglomerative procedures. Nevertheless, we identify that there is still some room for improvement, and thus the development of novel algorithms is encouraged.
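To make the idea of MST-based partitioning concrete, here is a minimal Python sketch of the simplest such scheme: build the Euclidean MST and delete its k-1 heaviest edges, which is equivalent to single linkage. This is not the Genie or the information-theoretic method discussed in the abstract; it only illustrates the common starting point. The function name `mst_cut_clustering` is hypothetical, and the sketch assumes SciPy is available and that all points are distinct (SciPy treats zero distances in a dense matrix as missing edges).

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_cut_clustering(X, k):
    """Split the rows of X into k clusters by deleting the k-1 heaviest MST edges.

    Illustrative baseline only (equivalent to single linkage); assumes
    distinct points so that the MST has exactly n-1 edges.
    """
    D = squareform(pdist(X))                 # dense Euclidean distance matrix
    mst = minimum_spanning_tree(D).tocoo()   # spanning tree as an edge list
    # Keep all but the k-1 longest edges; the tree falls apart into k subtrees.
    keep = np.argsort(mst.data)[: len(mst.data) - (k - 1)]
    forest = csr_matrix(
        (mst.data[keep], (mst.row[keep], mst.col[keep])), shape=D.shape
    )
    # Each connected component of the remaining forest is one cluster.
    _, labels = connected_components(forest, directed=False)
    return labels


if __name__ == "__main__":
    # Two well-separated groups of three points each.
    X = np.array(
        [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float
    )
    print(mst_cut_clustering(X, 2))
```

Cutting the longest edges is sensitive to outliers and tends to produce very unbalanced clusters, which is one reason more elaborate MST-based schemes exist.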