AuToMATo: An Out-Of-The-Box Persistence-Based Clustering Algorithm

📅 2024-08-13

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Clustering algorithms usually depend on manually tuned hyperparameters, which limits their robustness and makes them awkward to use in topological data analysis pipelines such as the Mapper algorithm. There, a clustering step is applied to many subsets of the data, so a method that needs no tuning is desirable. To address this, we propose AuToMATo, a clustering algorithm based on persistent homology. It is not parameter-free per se, but it comes with default parameter choices that make it usable out of the box. AuToMATo combines the ToMATo clustering algorithm with a bootstrapping procedure that separates significant peaks of an estimated density function from non-significant ones. In a thorough comparison with its parameters fixed to their defaults, AuToMATo compares favorably against parameter-free clustering algorithms and, in many instances, significantly outperforms even the best parameter selection for other algorithms. It also performs well when used with Mapper. An open-source Python implementation that is fully compatible with the scikit-learn architecture is publicly available.

📝 Abstract

We present AuToMATo, a novel clustering algorithm based on persistent homology. While AuToMATo is not parameter-free per se, we provide default choices for its parameters that make it into an out-of-the-box clustering algorithm that performs well across the board. AuToMATo combines the existing ToMATo clustering algorithm with a bootstrapping procedure in order to separate significant peaks of an estimated density function from non-significant ones. We perform a thorough comparison of AuToMATo (with its parameters fixed to their defaults) against many other state-of-the-art clustering algorithms. We find not only that AuToMATo compares favorably against parameter-free clustering algorithms, but in many instances also significantly outperforms even the best selection of parameters for other algorithms. AuToMATo is motivated by applications in topological data analysis, in particular the Mapper algorithm, where it is desirable to work with a clustering algorithm that does not need tuning of its parameters. Indeed, we provide evidence that AuToMATo performs well when used with Mapper. Finally, we provide an open-source implementation of AuToMATo in Python that is fully compatible with the standard scikit-learn architecture.
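The idea described in the abstract (ToMATo plus a bootstrap that decides which density peaks are significant) can be illustrated with a small, self-contained NumPy sketch. This is not the authors' implementation. The Gaussian kernel density estimate, the symmetrised k-nearest-neighbour graph, the sup-norm bootstrap used to pick the merging threshold, and all names and default values (`automato_sketch`, `h`, `k`, `n_boot`, `alpha`) are illustrative assumptions. The ToMATo step follows the usual union-find sweep: vertices are processed by decreasing density, and two clusters are merged when the lower of their peaks rises less than `tau` above the current vertex.

```python
import numpy as np


def kde(sample, points, h):
    """Unnormalised Gaussian KDE of `sample` evaluated at `points` (assumed estimator)."""
    d2 = ((points[:, None, :] - sample[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * h * h)).mean(axis=1)


def knn_graph(X, k):
    """Symmetrised k-nearest-neighbour graph as a list of neighbour sets (assumed graph)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, 1:k + 1]  # column 0 is (usually) the point itself
    nbrs = [set() for _ in range(len(X))]
    for i, row in enumerate(idx):
        for j in row:
            nbrs[i].add(int(j))
            nbrs[int(j)].add(i)
    return nbrs


def tomato(f, nbrs, tau):
    """ToMATo-style mode seeking: merge clusters whose peak prominence is below tau."""
    parent = np.full(len(f), -1)  # -1 marks vertices not yet processed

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in np.argsort(-f):  # sweep vertices by decreasing density
        higher = [j for j in nbrs[i] if parent[j] != -1]
        if not higher:
            parent[i] = i  # i is a local peak of the density on the graph
            continue
        # attach i to the cluster of its highest-density processed neighbour
        parent[i] = find(max(higher, key=lambda j: f[j]))
        for j in higher:
            ri, rj = find(i), find(j)
            if ri != rj and min(f[ri], f[rj]) < f[i] + tau:
                low, high = (ri, rj) if f[ri] < f[rj] else (rj, ri)
                parent[low] = high  # the lower peak dies at height f[i]
    roots = np.array([find(i) for i in range(len(f))])
    return np.unique(roots, return_inverse=True)[1]  # relabel roots as 0, 1, ...


def automato_sketch(X, h=0.5, k=15, n_boot=100, alpha=0.1, seed=0):
    """Hypothetical ToMATo + bootstrap pipeline; not the published AuToMATo code."""
    rng = np.random.default_rng(seed)
    f = kde(X, X, h)
    # Bootstrap the sup-norm deviation of the density estimate on the data points.
    sups = []
    for _ in range(n_boot):
        resample = X[rng.integers(0, len(X), size=len(X))]
        sups.append(np.max(np.abs(kde(resample, X, h) - f)))
    c = np.quantile(sups, 1.0 - alpha)
    # Keep only peaks whose prominence exceeds 2c; merge all others.
    return tomato(f, knn_graph(X, k), tau=2.0 * c)
```

For example, on two well-separated Gaussian blobs, `automato_sketch(X)` should return one label per blob without any tuning of `tau`. The threshold `2c` is motivated by the stability of persistence diagrams: if the estimated density is within `c` of the target in sup norm, each diagram point moves by at most `c` in bottleneck distance, so only peaks with prominence above `2c` are clearly separated from the diagonal. Whether AuToMATo uses this exact bootstrap statistic is not stated here. The paper's procedure may differ.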

Problem

Research questions and friction points this paper is trying to address.

Develops an out-of-the-box clustering algorithm based on persistent homology, with default parameters that do not need tuning

Separates significant density peaks from non-significant ones via a bootstrapping procedure

Enables automated clustering for topological data analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines the ToMATo algorithm with a bootstrapping procedure

Uses persistent homology to assess the significance of density peaks

Provides out-of-the-box parameter defaults for clustering
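To make "persistent homology for density peak significance" concrete: for a density sampled along a path, the 0-dimensional persistence diagram of its superlevel sets pairs each local maximum (its birth height) with the height at which its component merges into one with a higher peak (its death). The difference, birth minus death, is the peak's prominence, and significance tests threshold exactly this quantity. The pure-Python sketch below is illustrative only. The function name `superlevel_persistence_1d` is hypothetical, and the 1-D path stands in for the neighbourhood graph that ToMATo uses in general.

```python
import math


def superlevel_persistence_1d(values):
    """0-dimensional persistence of superlevel sets of a function sampled on a path.

    Returns sorted (birth, death) pairs. Each local maximum is born at its height
    and dies where its component merges into one with a higher peak (elder rule).
    The global maximum never dies, so its death is -inf.
    """
    n = len(values)
    parent = [None] * n  # None marks vertices not yet in the superlevel set

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(n), key=lambda v: -values[v]):  # lower the level
        parent[i] = i  # vertex i enters the superlevel set as its own component
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # elder rule: the component with the lower peak dies at values[i]
                young, old = (ri, rj) if values[ri] < values[rj] else (rj, ri)
                if values[young] > values[i]:  # skip zero-persistence pairs
                    pairs.append((values[young], values[i]))
                parent[young] = old
    roots = {find(i) for i in range(n)}
    pairs.extend((values[r], -math.inf) for r in roots)
    return sorted(pairs)
```

For instance, `superlevel_persistence_1d([0, 2, 1.9, 2.1, 0])` pairs the small peak at height 2 with the saddle at 1.9 (prominence about 0.1), while the global peak at 2.1 never dies. Any reasonable significance threshold would therefore keep only one mode.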
