PASCO (PArallel Structured COarsening): an overlay to speed up graph clustering algorithms

📅 2024-12-18
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scalability limits of spectral clustering on large graphs, this paper proposes a parallel multi-scale framework. First, a parallelizable, structure-preserving graph coarsening algorithm generates several coarse graphs that represent the input graph. Second, a clustering algorithm (such as spectral clustering) is run in parallel on these coarse graphs. Third, optimal transport is used to align and fuse the resulting partitions, improving their consistency and the quality of the final clustering. By combining graph coarsening, parallel computation, and optimal transport, the method achieves a significant speedup while preserving structural properties: the summary reports a several-fold runtime reduction on both synthetic and real-world datasets, along with NMI and F1 scores above those of existing baselines. Key contributions are: (1) the first structure-preserving coarsening scheme that supports parallel coarsening; and (2) the first application of optimal transport to multi-partition fusion, improving clustering robustness and accuracy.

📝 Abstract
Clustering the nodes of a graph is a cornerstone of graph analysis and has been extensively studied. However, some popular methods are not suitable for very large graphs: for example, spectral clustering requires computing the spectral decomposition of the Laplacian matrix, which is impractical for large graphs with many communities. This work introduces PASCO, an overlay that accelerates clustering algorithms. Our method consists of three steps: (1) we compute several independent small graphs representing the input graph by applying an efficient, structure-preserving coarsening algorithm; (2) a clustering algorithm is run in parallel on each small graph, providing several partitions of the initial graph; (3) these partitions are aligned and combined with an optimal transport method to output the final partition. The PASCO framework is based on two key contributions: a novel global algorithm structure designed to enable parallelization, and a fast, empirically validated graph coarsening algorithm that preserves structural properties. We demonstrate the strong performance of PASCO in terms of computational efficiency, structural preservation, and output partition quality, evaluated on both synthetic and real-world graph datasets.
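The first two steps of the abstract (coarsen several times, then cluster each coarse graph in parallel and lift the result back to the original nodes) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the coarsening is plain random edge contraction, not the paper's structure-preserving algorithm, and the clustering stand-in simply labels connected components rather than running spectral clustering. The function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def coarsen(edges, n, target, seed):
    """Assumed coarsening: contract random edges until `target` supernodes remain.

    Returns a list mapping each original node to a supernode id in 0..m-1.
    """
    rng = random.Random(seed)
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    count = n
    pool = list(edges)
    rng.shuffle(pool)
    for u, v in pool:
        if count <= target:
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            count -= 1
    roots = sorted({find(u) for u in range(n)})
    index = {r: i for i, r in enumerate(roots)}
    return [index[find(u)] for u in range(n)]


def build_coarse_edges(edges, assign):
    """Edges between distinct supernodes, deduplicated."""
    return {(min(assign[u], assign[v]), max(assign[u], assign[v]))
            for u, v in edges if assign[u] != assign[v]}


def cluster_components(coarse_edges, m):
    """Stand-in clustering: label each supernode by its connected component."""
    adj = {i: [] for i in range(m)}
    for a, b in coarse_edges:
        adj[a].append(b)
        adj[b].append(a)
    label = [-1] * m
    current = 0
    for start in range(m):
        if label[start] != -1:
            continue
        label[start] = current
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if label[y] == -1:
                    label[y] = current
                    stack.append(y)
        current += 1
    return label


def one_run(edges, n, target, seed):
    """Coarsen, cluster the coarse graph, then lift labels to original nodes."""
    assign = coarsen(edges, n, target, seed)
    m = max(assign) + 1
    coarse_labels = cluster_components(build_coarse_edges(edges, assign), m)
    return [coarse_labels[assign[u]] for u in range(n)]


def candidate_partitions(edges, n, target, runs):
    """Run independent coarsen-and-cluster passes in parallel, one per seed."""
    with ThreadPoolExecutor() as executor:
        return list(executor.map(lambda s: one_run(edges, n, target, s), range(runs)))
```

Each run is independent of the others, which is what makes the overlay parallelizable: any clustering routine could replace `cluster_components` without changing the surrounding structure.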
Problem

Research questions and friction points this paper is trying to address.

Accelerates clustering for large graphs with many communities
Preserves structural properties during parallel coarsening
Combines partitions using optimal transport for final output
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel coarsening for graph clustering acceleration
Structure-preserving algorithm for efficient graph representation
Optimal transport method for partition alignment and combination
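The alignment-and-combination idea can be sketched in a simplified form. This page does not describe the paper's optimal transport formulation, so the sketch below swaps in a hard one-to-one label matching (the assignment problem, which is the special case of discrete optimal transport with uniform marginals on equally sized supports) followed by a per-node majority vote. The names `align` and `fuse` are hypothetical, and the brute-force search over permutations is only viable for a small number of clusters `k`.

```python
from collections import Counter
from itertools import permutations


def align(reference, partition, k):
    """Relabel `partition` with the label permutation that best matches `reference`.

    Assumption: both partitions use labels in 0..k-1.
    """
    # overlap[c][r] counts nodes labeled c in `partition` and r in `reference`.
    overlap = [[0] * k for _ in range(k)]
    for r, c in zip(reference, partition):
        overlap[c][r] += 1
    # Exhaustive search over one-to-one label matchings (small k only).
    best = max(permutations(range(k)),
               key=lambda perm: sum(overlap[c][perm[c]] for c in range(k)))
    return [best[c] for c in partition]


def fuse(partitions, k):
    """Align every partition to the first one, then take a majority vote per node."""
    reference = partitions[0]
    aligned = [reference] + [align(reference, p, k) for p in partitions[1:]]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]
```

Alignment is needed because cluster labels are arbitrary: two runs can describe the same communities under swapped label names, and voting on unaligned labels would mix them up.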
Etienne Lasalle
Inria, ENS de Lyon, CNRS, Université Claude Bernard Lyon 1, LIP, UMR 5668, 69342, Lyon cedex 07, France.
Rémi Vaudaine
Inria, ENS de Lyon, CNRS, Université Claude Bernard Lyon 1, LIP, UMR 5668, 69342, Lyon cedex 07, France.
Titouan Vayer
Inria
optimal transport, graphs, inverse problems
Pierre Borgnat
CNRS, ENS de Lyon, LPENSL, UMR5672, F-69342, Lyon cedex 07, France.
Rémi Gribonval
Inria & ENS de Lyon
signal processing, machine learning, sparsity, inverse problems, dimension reduction
Paulo Gonçalves
Inria, ENS de Lyon, CNRS, Université Claude Bernard Lyon 1, LIP, UMR 5668, 69342, Lyon cedex 07, France.
M. Karsai
Department of Network and Data Science, Central European University, 1100 Vienna, Austria; National Laboratory for Health Security, HUN-REN Alfréd Rényi Institute of Mathematics, 1053 Budapest, Hungary.