🤖 AI Summary
To address the scalability limitations of spectral clustering on large graphs, this paper proposes a parallel multi-scale framework. First, a parallelizable, structure-preserving graph coarsening algorithm generates multiple high-fidelity coarse graphs. Second, spectral clustering is executed in parallel on these coarse graphs. Third, optimal transport is used to align and fuse the resulting partitions, improving consistency and clustering quality. By combining graph coarsening, parallel computation, and optimal transport, the method achieves a several-fold runtime reduction on both synthetic and real-world datasets while preserving structural fidelity and outperforming existing baselines in NMI and F1 scores. Key contributions are: (1) the first structure-preserving coarsening scheme that supports parallel coarsening; and (2) the first application of optimal transport to multi-partition fusion, improving clustering robustness and accuracy.
📝 Abstract
Clustering the nodes of a graph is a cornerstone of graph analysis and has been extensively studied. However, some popular methods are not suitable for very large graphs: e.g., spectral clustering requires computing the spectral decomposition of the Laplacian matrix, which is impractical for large graphs with a large number of communities. This work introduces PASCO, an overlay that accelerates clustering algorithms. Our method consists of three steps: (1) we compute several independent small graphs representing the input graph by applying an efficient, structure-preserving coarsening algorithm; (2) a clustering algorithm is run in parallel on each small graph, yielding several partitions of the initial graph; (3) these partitions are aligned and combined with an optimal transport method to output the final partition. The PASCO framework is based on two key contributions: a novel global algorithm structure designed to enable parallelization, and a fast, empirically validated graph coarsening algorithm that preserves structural properties. We demonstrate the strong performance of PASCO in terms of computational efficiency, structural preservation, and output partition quality, evaluated on both synthetic and real-world graph datasets.
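To make the third step more concrete, here is a minimal sketch of why the partitions must be aligned before they can be combined: each clustering run names its clusters arbitrarily, so "cluster 0" in one partition may correspond to "cluster 2" in another. The sketch is not the PASCO algorithm. It assumes hard labels and the same number of clusters `k` in every partition, and it replaces the optimal transport alignment with a brute-force search over label permutations followed by a per-node majority vote. The function names `overlap_matrix`, `align_labels`, and `fuse_partitions` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import permutations


def overlap_matrix(reference, partition, k):
    """counts[i][j] = number of nodes labelled i in `partition` and j in `reference`."""
    counts = [[0] * k for _ in range(k)]
    for ref_label, label in zip(reference, partition):
        counts[label][ref_label] += 1
    return counts


def align_labels(reference, partition, k):
    """Relabel `partition` so it agrees with `reference` on as many nodes as possible.

    Simplified stand-in for the optimal transport alignment (an assumption of
    this sketch): exhaustive search over label permutations, feasible only for
    small k.
    """
    counts = overlap_matrix(reference, partition, k)
    best = max(
        permutations(range(k)),
        key=lambda perm: sum(counts[i][perm[i]] for i in range(k)),
    )
    return [best[label] for label in partition]


def fuse_partitions(partitions, k):
    """Align every partition to the first one, then take a majority vote per node."""
    reference = partitions[0]
    aligned = [reference] + [align_labels(reference, p, k) for p in partitions[1:]]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]


# Three partitions of 9 nodes into 3 clusters. The second and third use
# different label names and each misplaces one node.
p0 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
p1 = [2, 2, 2, 0, 0, 0, 1, 1, 0]
p2 = [1, 1, 0, 2, 2, 2, 0, 0, 0]
fused = fuse_partitions([p0, p1, p2], k=3)
print(fused)
```

The permutation search costs k! evaluations, so it only serves to illustrate the idea; for larger `k`, an assignment solver such as `scipy.optimize.linear_sum_assignment` could be applied to the same overlap matrix. Neither variant handles partitions with differing numbers of clusters, which this sketch simply assumes away.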