G-Mapper: Learning a Cover in the Mapper Construction

📅 2023-09-12
🏛️ arXiv.org
📈 Citations: 2
Influential: 0

🤖 AI Summary
This work addresses a limitation of the Mapper algorithm: its cover parameter must be tuned by hand and does not adapt to the intrinsic structure of the data. We propose a data-adaptive method that optimizes the cover automatically. Its core ideas are: (i) adapting G-means clustering, which relies on the Anderson–Darling normality test, to decide statistically whether each cover interval should be split; and (ii) using a Gaussian mixture model (GMM) to choose how an interval is split according to the distribution of the data. On synthetic and real-world datasets, the resulting Mapper graphs retain the essential structure of the data, and the method runs significantly faster than a previous iterative approach. The implementation is publicly available.
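The statistical test behind the splitting decision can be sketched as follows. This is a minimal Python sketch, not the authors' implementation. It computes the Anderson–Darling statistic for normality on the lens values that fall in one cover interval. As in G-means, it splits the interval when the corrected statistic exceeds a critical value. For a one-dimensional lens no projection is needed, whereas G-means first projects multidimensional points onto one direction. Two values are taken from the original G-means paper and should be treated as assumptions: the correction factor 1 + 4/n − 25/n² and the critical value 1.8692 (significance level 0.0001). The function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def anderson_darling_normal(values):
    """Anderson-Darling A^2 for normality; mean and std are estimated from the data.

    Needs at least two distinct values.
    """
    n = len(values)
    mu, sigma = fmean(values), stdev(values)  # sample estimates (n - 1 denominator)
    std_normal, eps = NormalDist(), 1e-12  # eps guards against log(0) in the tails
    cdf = [min(max(std_normal.cdf((x - mu) / sigma), eps), 1 - eps) for x in sorted(values)]
    total = 0.0
    for i in range(n):
        # Term (2j - 1)[ln F(y_j) + ln(1 - F(y_{n+1-j}))] with 1-based j = i + 1.
        total += (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
    return -n - total / n


def should_split(values, critical_value=1.8692):
    """G-means-style rule: reject normality (and split) if the corrected A^2 is too large.

    Assumed, following G-means: correction factor (1 + 4/n - 25/n^2) and
    critical value 1.8692 for significance level 0.0001.
    """
    n = len(values)
    a2_star = anderson_darling_normal(values) * (1 + 4 / n - 25 / n**2)
    return a2_star > critical_value


# Usage: lens values from two hypothetical cover intervals.
normal_like = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
bimodal = [x - 4 for x in normal_like] + [x + 4 for x in normal_like]
print(should_split(normal_like))  # False: looks Gaussian, keep the interval
print(should_split(bimodal))      # True: two modes, split the interval
```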
📝 Abstract
The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset. However, the Mapper algorithm requires tuning several parameters in order to generate a "nice" Mapper graph. This paper focuses on selecting the cover parameter. We present an algorithm that optimizes the cover of a Mapper graph by splitting a cover repeatedly according to a statistical test for normality. Our algorithm is based on G-means clustering which searches for the optimal number of clusters in $k$-means by iteratively applying the Anderson-Darling test. Our splitting procedure employs a Gaussian mixture model to carefully choose the cover according to the distribution of the given data. Experiments for synthetic and real-world datasets demonstrate that our algorithm generates covers so that the Mapper graphs retain the essence of the datasets, while also running significantly faster than a previous iterative method.
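The GMM-guided split can be illustrated with a minimal sketch under explicit assumptions. The lens is one-dimensional, and a two-component Gaussian mixture is fitted to the lens values of an interval by expectation-maximization. The interval is then cut where the two weighted component densities are equal. Three things here are hypothetical choices for illustration: this cut rule, the half-split initialization, and the names fit_gmm_1d and split_point. The paper's exact rule for placing the two new intervals and their overlap may differ.

```python
import math
from statistics import NormalDist, fmean, pstdev


def fit_gmm_1d(values, max_iter=200, tol=1e-9):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization (needs >= 2 values)."""
    xs = sorted(values)
    n = len(xs)
    spread = pstdev(xs) or 1.0
    # Initialize the two components from the lower and upper halves of the data.
    halves = [xs[: n // 2], xs[n // 2 :]]
    weights = [0.5, 0.5]
    means = [fmean(h) for h in halves]
    sds = [pstdev(h) or spread for h in halves]
    prev_ll = -math.inf
    for _ in range(max_iter):
        dists = [NormalDist(m, s) for m, s in zip(means, sds)]
        # E-step: posterior responsibility of each component for each point.
        resp, ll = [], 0.0
        for x in xs:
            p = [w * d.pdf(x) for w, d in zip(weights, dists)]
            total = sum(p) or 1e-300
            ll += math.log(total)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mixing weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-6 * spread)
        if ll - prev_ll < tol:  # log-likelihood has stopped improving
            break
        prev_ll = ll
    return weights, means, sds


def split_point(weights, means, sds, steps=100):
    """Point between the two means where the weighted component densities are equal."""
    (m0, s0, w0), (m1, s1, w1) = sorted(zip(means, sds, weights))  # left, right
    left, right = NormalDist(m0, s0), NormalDist(m1, s1)

    def gap(x):  # positive where the left component dominates
        return w0 * left.pdf(x) - w1 * right.pdf(x)

    a, b = m0, m1
    if gap(a) * gap(b) > 0:
        return (m0 + m1) / 2  # no sign change between the means: fall back to the midpoint
    for _ in range(steps):  # bisection on the sign change
        mid = (a + b) / 2
        if gap(a) * gap(mid) <= 0:
            b = mid
        else:
            a = mid
    return (a + b) / 2


# Usage: a symmetric two-mode sample of lens values inside one interval.
base = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
data = [x - 4 for x in base] + [x + 4 for x in base]
weights, means, sds = fit_gmm_1d(data)
cut = split_point(weights, means, sds)
print(cut)  # close to 0, the valley between the two modes
```

Cutting at the density crossing keeps each new interval centred on one mode. Simply halving the interval ignores unequal component widths and weights.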
Problem

Research questions and friction points this paper is trying to address.

Optimizing the cover parameter in the Mapper construction
Using G-means clustering to select the cover
Improving the fidelity of Mapper graphs and the speed of cover selection
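For context, the hand-tuned cover that G-Mapper replaces is typically a uniform family of overlapping intervals over a one-dimensional lens. It is controlled by the number of intervals and their overlap fraction, often called resolution and gain. The sketch below is a generic illustration of that baseline, not code from the paper. uniform_cover and its parameters are hypothetical names.

```python
def uniform_cover(lo, hi, n_intervals, overlap):
    """Equal-length intervals covering [lo, hi]; neighbours share a fraction `overlap` of their length."""
    if n_intervals < 1 or not 0 <= overlap < 1:
        raise ValueError("need n_intervals >= 1 and 0 <= overlap < 1")
    # Solve lo + (n - 1) * step + length == hi with step = length * (1 - overlap).
    length = (hi - lo) / (1 + (n_intervals - 1) * (1 - overlap))
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


print(uniform_cover(0, 10, 3, 0.5))  # [(0.0, 5.0), (2.5, 7.5), (5.0, 10.0)]
```

Every choice of n_intervals and overlap yields a different Mapper graph, which is why selecting them by hand is difficult.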
Innovation

Methods, ideas, or system contributions that make the work stand out.

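Optimizes the cover of the Mapper graph automatically
Adapts G-means clustering (Anderson–Darling normality test) to decide which cover intervals to split
Applies a Gaussian mixture model to choose how each interval is split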
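A cover, whether uniform or optimized, turns into a Mapper graph as follows. For a one-dimensional lens, the points whose lens value falls in an interval are clustered, and each cluster becomes a node. Two nodes are joined when their clusters share a point. In this minimal sketch, single-linkage clustering with a fixed distance threshold eps is an assumed stand-in for whatever clustering algorithm a real Mapper pipeline uses (DBSCAN is a common choice). mapper_graph and single_linkage_clusters are hypothetical names.

```python
import math
from itertools import combinations


def single_linkage_clusters(indices, points, eps):
    """Group point indices into connected components of the eps-neighbourhood graph."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, lens, cover, eps):
    """Nodes are clusters of each interval's preimage; edges join clusters sharing a point."""
    nodes = []
    for a, b in cover:
        preimage = [i for i, v in enumerate(lens) if a <= v <= b]
        nodes.extend(single_linkage_clusters(preimage, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges


# Usage: 40 points on a circle, lens = x-coordinate, three half-overlapping intervals.
pts = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40)) for k in range(40)]
lens = [x for x, _ in pts]
nodes, edges = mapper_graph(pts, lens, [(-1.0, 0.0), (-0.5, 0.5), (0.0, 1.0)], eps=0.2)
print(len(nodes), len(edges))  # 4 4
```

The middle interval's preimage splits into an upper and a lower arc, so the graph is a 4-cycle that recovers the loop in the data. A cover with too few or too coarse intervals could miss that loop, which is the sensitivity G-Mapper's cover optimization targets.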