🤖 AI Summary
Clustering algorithms often depend on manually tuned hyperparameters, which is a particular obstacle in topological data analysis pipelines such as Mapper, where the clustering step should work without tuning. AuToMATo is a clustering algorithm based on persistent homology that addresses this problem. It is not strictly parameter-free, but it ships with default parameter choices that let it work out of the box. It combines the existing ToMATo clustering algorithm with a bootstrapping procedure that separates significant peaks of an estimated density function from non-significant ones. In experiments with its parameters fixed to their defaults, AuToMATo compares favorably against parameter-free clustering algorithms and in many instances significantly outperforms other algorithms even under their best parameter selection. The authors also report evidence that it performs well when used with Mapper. An open-source Python implementation, fully compatible with the scikit-learn architecture, is available.
📝 Abstract
We present AuToMATo, a novel clustering algorithm based on persistent homology. While AuToMATo is not parameter-free per se, we provide default choices for its parameters that make it into an out-of-the-box clustering algorithm that performs well across the board. AuToMATo combines the existing ToMATo clustering algorithm with a bootstrapping procedure in order to separate significant peaks of an estimated density function from non-significant ones. We perform a thorough comparison of AuToMATo (with its parameters fixed to their defaults) against many other state-of-the-art clustering algorithms. We find not only that AuToMATo compares favorably against parameter-free clustering algorithms, but in many instances also significantly outperforms even the best selection of parameters for other algorithms. AuToMATo is motivated by applications in topological data analysis, in particular the Mapper algorithm, where it is desirable to work with a clustering algorithm that does not need tuning of its parameters. Indeed, we provide evidence that AuToMATo performs well when used with Mapper. Finally, we provide an open-source implementation of AuToMATo in Python that is fully compatible with the standard scikit-learn architecture.
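To make the idea of separating significant density peaks from non-significant ones concrete, here is a minimal 1-D sketch. It is not the authors' procedure or the AuToMATo API. It assumes a Gaussian kernel density estimate on a grid, measures each peak by its prominence (the 0-dimensional persistence of the superlevel sets), and keeps the peaks whose prominence exceeds twice a bootstrap quantile of the sup-norm deviation of the density estimate. The function names, the bandwidth, and the threshold rule are all illustrative assumptions.

```python
import numpy as np

def kde_on_grid(samples, grid, bandwidth):
    """Gaussian kernel density estimate evaluated on a 1-D grid."""
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2 * np.pi)
    return np.exp(-0.5 * diffs**2).sum(axis=1) / norm

def peak_prominences(density):
    """Map each local maximum (grid index) to its prominence via union-find.

    Grid points are visited from highest to lowest density. When two
    components meet, the one with the lower peak dies at the current level.
    The global maximum never dies and gets infinite prominence.
    """
    parent, prominence = {}, {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    order = [int(i) for i in np.argsort(-density)]
    for i in order:
        parent[i] = i
        roots = {find(j) for j in (i - 1, i + 1) if j in parent}
        if len(roots) == 1:
            parent[i] = roots.pop()
        elif len(roots) == 2:
            a, b = roots
            hi, lo = (a, b) if density[a] >= density[b] else (b, a)
            prominence[lo] = density[lo] - density[i]
            parent[lo] = hi
            parent[i] = hi
    prominence[find(order[0])] = np.inf
    return prominence

def significant_peaks(samples, grid, bandwidth, n_boot=100, alpha=0.05, seed=0):
    """Keep peaks whose prominence exceeds a bootstrap-derived threshold (assumed rule)."""
    rng = np.random.default_rng(seed)
    density = kde_on_grid(samples, grid, bandwidth)
    sup_devs = []
    for _ in range(n_boot):
        resample = rng.choice(samples, size=len(samples), replace=True)
        boot_density = kde_on_grid(resample, grid, bandwidth)
        sup_devs.append(np.max(np.abs(boot_density - density)))
    threshold = 2 * np.quantile(sup_devs, 1 - alpha)
    proms = peak_prominences(density)
    return sorted(i for i, p in proms.items() if p > threshold), threshold

# Toy data: two well-separated 1-D blobs.
data_rng = np.random.default_rng(42)
samples = np.concatenate([data_rng.normal(-3.0, 0.5, 300),
                          data_rng.normal(3.0, 0.5, 300)])
grid = np.linspace(-6.0, 6.0, 241)
peaks, threshold = significant_peaks(samples, grid, bandwidth=0.4)
print("significant peak locations:", grid[peaks])
```

In a clustering setting, each surviving peak would seed one cluster and points would be assigned to the peak they climb to. The sketch stops at peak selection, since that is the step the bootstrap is meant to automate.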