Distributed Discrete Morse Sandwich: Efficient Computation of Persistence Diagrams for Massive Scalar Data

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of efficiently computing persistence diagrams for large-scale 3D scalar fields in distributed-memory environments, this paper introduces the first end-to-end distributed Discrete Morse Sandwich (DMS) framework. Methodologically, it contributes a self-correcting distributed pairing algorithm, redesigns key data structures to enable cross-node gradient propagation, and integrates a computation-token mechanism with a dedicated communication thread to overlap communication and computation under a hybrid MPI-plus-multithreading programming model. The result is the first scalable multi-node DMS implementation, breaking prior scalability bottlenecks and enabling persistent homology analysis on 3D scalar fields with up to 10 billion cells. Experiments demonstrate strong and weak scaling up to 512 cores; the method achieves an average 8× speedup over DIPHA and computes the full persistence diagram of a 6-billion-vertex dataset in 174 seconds.

📝 Abstract
The persistence diagram, which describes the topological features of a dataset, is a key descriptor in Topological Data Analysis. The "Discrete Morse Sandwich" (DMS) method has been reported to be the most efficient algorithm for computing persistence diagrams of 3D scalar fields on a single node, using shared-memory parallelism. In this work, we extend DMS to distributed-memory parallelism for the efficient and scalable computation of persistence diagrams for massive datasets across multiple compute nodes. On the one hand, we can leverage the embarrassingly parallel procedure of the first and most time-consuming step of DMS (namely the discrete gradient computation). On the other hand, the efficient distributed computations of the subsequent DMS steps are much more challenging. To address this, we have extensively revised the DMS routines by contributing a new self-correcting distributed pairing algorithm, redesigning key data structures and introducing computation tokens to coordinate distributed computations. We have also introduced a dedicated communication thread to overlap communication and computation. Detailed performance analyses show the scalability of our hybrid MPI+thread approach for strong and weak scaling using up to 16 nodes of 32 cores (512 cores total). Our algorithm outperforms DIPHA, a reference method for the distributed computation of persistence diagrams, with an average speedup of 8× on 512 cores. We show the practical capabilities of our approach by computing the persistence diagram of a public 3D scalar field of 6 billion vertices in 174 seconds on 512 cores. Finally, we provide a usage example of our open-source implementation at https://github.com/eve-le-guillou/DDMS-example.
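The abstract's "dedicated communication thread to overlap communication and computation" follows a general producer/consumer pattern. The toy sketch below illustrates only that pattern in plain Python threads: the main thread keeps computing while a separate thread drains an outbox of pending messages, with a stop token ending the loop. All names here are illustrative assumptions; this is not the paper's MPI implementation, where the communication thread would drive actual MPI sends and receives.

```python
import threading
import queue

def run_with_comm_thread(chunks):
    """Toy sketch of communication/computation overlap (hypothetical names,
    not the DMS API): a dedicated thread forwards pending messages while
    the main thread keeps computing on local chunks."""
    outbox = queue.Queue()
    delivered = []

    def comm_loop():
        # Dedicated communication thread: in the real setting this would
        # issue MPI sends/receives; here it just records each message.
        while True:
            msg = outbox.get()
            if msg is None:  # stop token: no more messages will arrive
                break
            delivered.append(msg)

    comm = threading.Thread(target=comm_loop)
    comm.start()
    total = 0
    for c in chunks:
        total += c * c               # local "computation" on this chunk
        outbox.put(("partial", c))   # hand off a result without blocking
    outbox.put(None)                 # tell the communication thread to finish
    comm.join()
    return total, delivered
```

The key property is that `outbox.put` returns immediately, so the compute loop never waits on message delivery; in the paper's hybrid MPI+thread setting, that slack is what hides communication latency behind computation.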
Problem

Research questions and friction points this paper is trying to address.

Extend DMS to distributed-memory parallelism for massive datasets
Develop efficient distributed computations for DMS steps
Outperform DIPHA in distributed persistence diagram computation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends DMS to distributed-memory parallelism
Introduces self-correcting distributed pairing algorithm
Uses hybrid MPI+thread approach for scalability
Eve Le Guillou
CNRS, Sorbonne Université and University of Lille
Pierre Fortin
Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France
Julien Tierny
French National Centre for Scientific Research, Sorbonne University
Topological Data Analysis · Visualization · Computational Topology · Visual Data Science · Uncertainty