Clustering with minimum spanning trees: How good can it be?

📅 2023-03-10
🏛️ Journal of Classification
📈 Citations: 10 (influential: 1)
🤖 AI Summary
Problem: It is unclear how meaningful minimum spanning tree (MST)-based methods are for low-dimensional partitional clustering, and how close they come to the best achievable results. Method: We estimate upper bounds on the agreement between the best (oracle) algorithm and expert-provided labels on a large battery of benchmark datasets. We then review, study, extend, and generalise several state-of-the-art MST-based partitioning schemes, which yields some new approaches. Contribution/Results: MST methods can be very competitive. The Genie and information-theoretic methods often outperform non-MST algorithms such as K-means, Gaussian mixtures, spectral clustering, Birch, density-based methods, and classical hierarchical agglomerative procedures. There is nevertheless still room for improvement, which motivates the development of new algorithms.
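The oracle bound takes, for each benchmark dataset, the best agreement that any algorithm in the battery reaches with the expert labels. A minimal sketch of that idea follows, assuming the adjusted Rand index as the agreement measure (the paper's own measure may differ); `adjusted_rand_index` and `oracle_score` are hypothetical names chosen for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat partitions (1.0 = identical up to relabelling)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    total = comb(len(labels_true), 2)  # number of point pairs
    if total == 0:
        return 1.0  # fewer than two points: nothing to disagree on
    # pairs placed together in both partitions (from the contingency table)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # pairs placed together within each partition separately
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are all-in-one or all-singletons
    return (sum_ij - expected) / (max_index - expected)


def oracle_score(labels_true, candidates):
    """Best agreement reached by any candidate partition: the hypothetical 'oracle'."""
    return max(adjusted_rand_index(labels_true, labels) for labels in candidates.values())
```

For example, `oracle_score([0, 0, 0, 1, 1, 1], {"a": [1, 1, 1, 0, 0, 0], "b": [0, 0, 1, 1, 2, 2]})` returns 1.0, because candidate "a" matches the reference labels up to relabelling.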
📝 Abstract
Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they are meaningful in low-dimensional partitional data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm and the expert labels from a large battery of benchmark data, we discover that MST methods can be very competitive. Next, we review, study, extend, and generalise a few existing, state-of-the-art MST-based partitioning schemes. This leads to some new noteworthy approaches. Overall, the Genie and the information-theoretic methods often outperform the non-MST algorithms such as K-means, Gaussian mixtures, spectral clustering, Birch, density-based, and classical hierarchical agglomerative procedures. Nevertheless, we identify that there is still some room for improvement, and thus the development of novel algorithms is encouraged.
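As a minimal sketch of the simplest MST-based partitioning scheme (not Genie or the information-theoretic methods), one can build the Euclidean MST with Prim's algorithm and delete its k-1 heaviest edges; the remaining connected components coincide with the single-linkage clustering. The function names are hypothetical, and the O(n^2) variant of Prim's algorithm is chosen for brevity rather than speed.

```python
from math import dist


def euclidean_mst(points):
    """Prim's algorithm on the complete Euclidean graph, O(n^2) time.
    Returns the n-1 tree edges as (weight, i, j) tuples."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # add the vertex closest to the current tree
        u = min((v for v in range(n) if not in_tree[v]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_cut_clustering(points, k):
    """Delete the k-1 heaviest MST edges; each remaining component is one cluster."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    kept = sorted(euclidean_mst(points))[: n - k]  # drop the k-1 heaviest edges
    parent = list(range(n))

    def find(x):
        # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)
    # relabel components as 0, 1, ..., k-1 in order of first appearance
    labels, ids = [], {}
    for x in range(n):
        labels.append(ids.setdefault(find(x), len(ids)))
    return labels
```

For example, `mst_cut_clustering([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` returns `[0, 0, 0, 1, 1, 1]`. Because only the longest edges are cut, a single distant outlier can end up as a cluster of its own, which is a well-known weakness of single linkage.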
Problem

Research questions and friction points this paper is trying to address.

Quantifying MST effectiveness in low-dimensional clustering tasks
Comparing the agreement of MST methods with expert labels on benchmark datasets
Developing improved MST-based partitioning schemes for clustering
Innovation

Methods, ideas, or system contributions that make the work stand out.

MST-based clustering methods can be very competitive with traditional algorithms
Genie and information-theoretic approaches often outperform non-MST algorithms
Extending and generalising existing MST partitioning schemes yields new approaches
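Genie behaves like single linkage, but whenever the Gini index of the current cluster sizes exceeds a threshold g, it forces a smallest cluster to merge with its nearest neighbour, which prevents outliers from forming their own clusters. The following is a simplified sketch, not the reference implementation: it assumes the MST edges are already available as `(weight, i, j)` tuples, and `gini_index`, `genie_on_mst`, and the example threshold `g=0.3` are illustrative choices.

```python
def gini_index(sizes):
    """Normalised Gini index of cluster sizes: 0 means perfectly balanced."""
    m, total = len(sizes), sum(sizes)
    if m <= 1:
        return 0.0
    diffs = sum(abs(a - b) for i, a in enumerate(sizes) for b in sizes[i + 1:])
    return diffs / ((m - 1) * total)


def genie_on_mst(n, mst_edges, k, g=0.3):
    """Genie-style merging driven by a precomputed spanning tree of n points."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    unused = sorted(mst_edges)  # tree edges: each always joins two distinct clusters
    clusters = n
    while clusters > k:
        sizes = [size[r] for r in {find(x) for x in range(n)}]
        if gini_index(sizes) > g:
            smallest = min(sizes)
            # lightest unused edge touching a cluster of minimal size
            idx = next(t for t, (_, i, j) in enumerate(unused)
                       if min(size[find(i)], size[find(j)]) == smallest)
        else:
            idx = 0  # plain single linkage: lightest unused edge
        _, i, j = unused.pop(idx)
        ri, rj = find(i), find(j)
        parent[ri] = rj
        size[rj] += size[ri]
        clusters -= 1
    # relabel components as 0, 1, ..., k-1 in order of first appearance
    labels, ids = [], {}
    for x in range(n):
        labels.append(ids.setdefault(find(x), len(ids)))
    return labels
```

With the hypothetical chain `edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (1, 3, 4), (9, 4, 5)]`, `genie_on_mst(6, edges, 2)` returns `[0, 0, 0, 1, 1, 1]`, while `g=1.0` disables the correction (the index never exceeds 1 here) and reproduces single linkage, `[0, 0, 0, 0, 0, 1]`.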
M. Gagolewski
Systems Research Institute, Polish Academy of Sciences
Anna Cena
Warsaw University of Technology, Faculty of Mathematics and Information Science
Maciej Bartoszuk
QED Software
Łukasz Brzozowski
Warsaw University of Technology, Faculty of Mathematics and Information Science