🤖 AI Summary
This work addresses two key limitations in topological data analysis (TDA): the conceptual and practical separation between filtration and Mapper methods, and the limited modeling capacity of spherical neighborhoods. We propose the *Box Filtration* framework, which replaces balls centered at the data points with hyperrectangles (boxes) that need not be centered at the points and that grow non-uniformly and asymmetrically across dimensions according to the point distribution. This unifies filtration and Mapper construction under a single paradigm. Theoretically, the persistence diagrams of the box filtration satisfy the classical Gromov–Hausdorff stability. In addition, the filtration built from pairwise box intersections is identical to the one built from higher-order intersections, whereas the Vietoris–Rips (VR) and Čech filtrations differ. Algorithmically, the approach combines linear programming (LP) for box growth, two cover types (point covers and pixel covers), pixelization of the space, and persistent homology computation. Experiments show more accurate topological summaries than VR and distance-to-measure (DTM) filtrations. The time complexity is $O(m|U(0)|\log(mn\pi)L(q))$. An open-source implementation is publicly available.
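The claim that pairwise and higher-order box intersections agree rests on a property of axis-aligned boxes. If every pair of boxes in a family intersects, then in each coordinate the intervals overlap pairwise, so they share a common point (the one-dimensional Helly theorem), and the whole family shares a point. Balls lack this property: three disks of radius 1 centered at the vertices of an equilateral triangle with side 1.9 intersect pairwise, yet they have no common point, because the circumradius $1.9/\sqrt{3} \approx 1.097$ exceeds 1. That gap is why the VR and Čech filtrations differ. The sketch below assumes closed boxes stored as (lower corner, upper corner) tuples. The helper names `common_intersection` and `pairwise_intersecting` are hypothetical and are not taken from the authors' software.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple

# Hypothetical representation: a closed axis-aligned box as (lower corner, upper corner).
Box = Tuple[Tuple[float, ...], Tuple[float, ...]]


def common_intersection(boxes: Sequence[Box]) -> Optional[Box]:
    """Return the box shared by all `boxes`, or None if they have no common point."""
    dim = len(boxes[0][0])
    # Per coordinate, closed intervals intersect in [max of lows, min of highs].
    lo = tuple(max(b[0][k] for b in boxes) for k in range(dim))
    hi = tuple(min(b[1][k] for b in boxes) for k in range(dim))
    if all(lo_k <= hi_k for lo_k, hi_k in zip(lo, hi)):
        return lo, hi
    return None


def pairwise_intersecting(boxes: Sequence[Box]) -> bool:
    """True if every pair of boxes shares at least one point."""
    return all(common_intersection([a, b]) is not None
               for a, b in combinations(boxes, 2))


boxes = [((0.0, 0.0), (2.0, 2.0)),
         ((1.0, 1.0), (3.0, 3.0)),
         ((1.5, 0.0), (4.0, 1.5))]
print(pairwise_intersecting(boxes))  # True
print(common_intersection(boxes))    # ((1.5, 1.0), (2.0, 1.5))
```

Because a pairwise check already certifies a common point for boxes, a nerve of boxes can be built as the clique complex of its intersection graph, which is how a VR complex is built. For balls, the same shortcut would overcount simplices relative to the Čech complex.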
📝 Abstract
We define a new framework that unifies the filtration and Mapper approaches from TDA, and present efficient algorithms to compute it. In what we term the box filtration of a point cloud data set (PCD), we grow boxes (hyperrectangles) that are not necessarily centered at each point, in place of balls centered at the points. The boxes grow non-uniformly and asymmetrically in different dimensions based on the distribution of points. We present two approaches to handling the boxes: a point cover, where each point is assigned its own box at the start, and a pixel cover, which works with a pixelization of the space of the PCD. Any box cover in either setting automatically gives a Mapper of the PCD. We show that the persistence diagrams generated by the box filtration using both point and pixel covers satisfy the classical stability based on the Gromov–Hausdorff distance. Using boxes also implies that the box filtration is identical for pairwise or higher-order intersections, whereas the VR and Čech filtrations are not the same. Growth in each dimension is computed by solving a linear program (LP) that optimizes a cost functional balancing the cost of expansion against the benefit of including more points in the box. The box filtration algorithm runs in $O(m|U(0)|\log(mn\pi)L(q))$ time, where $m$ is the number of increment steps considered for box growth, $|U(0)|$ is the number of boxes in the initial cover ($\leq$ the number of points), $\pi$ is the step length for increasing each box dimension, each LP is solved in $O(L(q))$ time, $n$ is the PCD dimension, and $q = n \times |X|$, with $X$ the PCD. We demonstrate through multiple examples that the box filtration can summarize the topology of the PCD more accurately than VR and distance-to-measure (DTM) filtrations. Software for our implementation is available at https://github.com/pragup/Box-Filteration.
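To show how growing boxes yield a filtration, here is a toy point-cover sketch. It does **not** reproduce the paper's LP-driven growth. Instead, it assumes fixed, hypothetical per-side growth weights (`w_lo`, `w_hi`): after $t$ steps of length `step` (playing the role of $\pi$), box $i$ extends `t * step * w` below and above its point in each dimension. Because boxes only grow, each simplex has a well-defined birth step: the first step at which its boxes share a point. The simplices alive at a fixed step form the nerve of that box cover. This is assumed here to be the Mapper-style complex the abstract refers to. All function names are illustrative.

```python
from itertools import combinations

# Hypothetical sketch: box i grows from its point by t * step * weight on each side.
# The paper instead chooses the growth in each dimension by solving an LP.


def box_at(center, w_lo, w_hi, t, step):
    """Box of one point after t growth steps; w_lo/w_hi are per-dimension weights >= 0."""
    lo = tuple(c - t * step * a for c, a in zip(center, w_lo))
    hi = tuple(c + t * step * b for c, b in zip(center, w_hi))
    return lo, hi


def intersects(boxes):
    """True if the closed boxes share a common point (coordinate-wise interval test)."""
    dim = len(boxes[0][0])
    return all(max(b[0][k] for b in boxes) <= min(b[1][k] for b in boxes)
               for k in range(dim))


def box_filtration(points, w_lo, w_hi, step, m, max_dim=2):
    """Map each simplex (tuple of point indices) to the first step in 0..m at which
    its boxes share a point. Boxes only grow, so a simplex never disappears."""
    births = {}
    for t in range(m + 1):
        boxes = [box_at(p, a, b, t, step) for p, a, b in zip(points, w_lo, w_hi)]
        for size in range(1, max_dim + 2):
            for s in combinations(range(len(points)), size):
                if s not in births and intersects([boxes[i] for i in s]):
                    births[s] = t
    return births


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
w = [(1.0, 0.5)] * 3  # grow twice as fast in x as in y (symmetric here; w_lo may differ from w_hi)
print(box_filtration(points, w, w, step=0.25, m=4))
# {(0,): 0, (1,): 0, (2,): 0, (0, 1): 2, (0, 2): 4, (1, 2): 4, (0, 1, 2): 4}
```

The anisotropic weights let the horizontal pair connect at step 2 while the vertical pair waits until step 4. The triangle's birth equals the latest of its edge births, as the box intersection property predicts. A practical implementation would pass these birth values to a persistent homology library rather than enumerating all index combinations.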