🤖 AI Summary
This work proposes ToMAToMP, the first topological clustering method that handles several functions at once with theoretical guarantees, addressing key limitations of the classical ToMATo algorithm. ToMATo requires a user-tuned graph on the data points, is sensitive to outlier values in the function range, and supports only a single scalar function, which makes it unsuitable for multi-parameter settings. ToMAToMP instead builds on multi-parameter persistent homology, using the MMA decomposition to design a clustering algorithm with provable robustness properties. As corollaries, the same construction makes ToMATo independent of graph tuning and robust to outliers, so all three limitations are addressed: graph dependency, outlier sensitivity, and single-function restriction. Numerical experiments on various datasets show strong improvement over both topological and non-topological clustering baselines in efficiency and clustering quality.
📝 Abstract
Topological clustering, whose main algorithm is ToMATo, is a clustering method from Topological Data Analysis (TDA) that has been applied successfully in several applications over the last few years. This is due to its high versatility, as clusters are detected from the persistent components in the sublevel sets of any user-defined function (gene expression, pixel values, etc.), and its efficiency, as topological clustering enjoys robustness guarantees. However, ToMATo is also limited in several ways. First, a graph on the data points must be provided as a hyper-parameter of the method, and its fine-tuning is left to the user. Second, ToMATo is known to be very sensitive to outlier values in the function range. Finally, and most importantly, ToMATo can only handle one function at a time, whereas using several functions is critical in various applications. In this article, we introduce ToMAToMP: the first topological clustering method able to handle several functions at the same time with theoretical guarantees. More specifically, we leverage a recent tool from multi-parameter persistent homology, called MMA decomposition, to design our clustering algorithm, and prove that it enjoys robustness properties. As corollaries, we show that it can be used to make ToMATo independent of graph tuning and robust to outliers. Finally, we provide a set of numerical experiments showcasing the efficiency and quality of the clusterings produced by ToMAToMP, which show strong improvement over non-topological and topological baselines on various datasets.
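
To make the single-function setting concrete, here is a minimal sketch of classical ToMATo-style clustering as it is commonly described: points are swept in decreasing order of one function, each point joins the cluster of its highest already-seen neighbor, and two clusters are merged when the lower peak has prominence below a threshold. It is not ToMAToMP and does not involve the MMA decomposition. The names `tomato_sketch`, `values`, `neighbors`, and `tau` are hypothetical, and the sketch tracks peaks of the function (superlevel sets); negating the function gives the sublevel-set convention used in the abstract. Note that both the graph and the single function are inputs, which are exactly the limitations the abstract lists.

```python
def tomato_sketch(values, neighbors, tau):
    """Illustrative single-function topological clustering (assumed sketch).

    values:    one function value per point
    neighbors: adjacency lists of the user-supplied graph
    tau:       prominence threshold below which a peak is merged away
    Returns, for each point, the index of the peak that labels its cluster.
    """
    n = len(values)
    parent = [-1] * n  # -1 marks a point that has not been swept yet

    def find(i):
        # Union-find lookup with path halving; only called on swept points.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sweep points from the highest function value to the lowest.
    for i in sorted(range(n), key=lambda k: -values[k]):
        seen = [j for j in neighbors[i] if parent[j] != -1]
        if not seen:
            parent[i] = i  # no higher neighbor: i is a local peak
            continue
        # Attach i to the cluster of its highest already-swept neighbor.
        parent[i] = find(max(seen, key=lambda j: values[j]))
        # Point i connects neighboring clusters; merge low-prominence ones.
        for j in seen:
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            lo, hi = (ri, rj) if values[ri] < values[rj] else (rj, ri)
            if values[lo] - values[i] < tau:
                parent[lo] = hi  # the lower peak is absorbed by the higher
    return [find(i) for i in range(n)]


# Toy example: a path graph 0-1-2-3-4-5-6 with two peaks (points 1 and 5).
values = [1.0, 5.0, 2.0, 0.5, 3.0, 6.0, 2.0]
path = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]

print(tomato_sketch(values, path, tau=1.0))  # [1, 1, 1, 5, 5, 5, 5]
print(tomato_sketch(values, path, tau=5.0))  # [5, 5, 5, 5, 5, 5, 5]
```

In this toy run, the peak at point 1 has prominence 4.5 (its value 5.0 minus the value 0.5 at the saddle, point 3), so it survives as its own cluster for `tau=1.0` and is merged into the peak at point 5 for `tau=5.0`.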