Adaptive Local Clustering over Attributed Graphs

📅 2025-03-26

🤖 AI Summary

Existing local clustering methods for attributed graphs are sensitive to topological noise, ignore node attributes, or incur high computational overhead. To address these issues, the paper proposes LACA, which formulates local clustering as the estimation of a bidirectional diffusion distribution (BDD). The method combines a fast, theoretically grounded preprocessing technique for node attributes, an adaptive algorithm for diffusing vectors over the graph, and a three-step BDD approximation scheme, while maintaining strong locality. In experiments against 17 competitors on 8 real datasets, LACA achieves the best result quality measured against ground-truth local clusters and runs up to orders of magnitude faster. The source code is publicly available.

📝 Abstract

Given a graph $G$ and a seed node $v_s$, the objective of local graph clustering (LGC) is to identify a subgraph $C_s \subseteq G$ (a.k.a. local cluster) surrounding $v_s$ in time roughly linear in the size of $C_s$. This approach yields personalized clusters without accessing the entire graph, which makes it well suited to applications involving large graphs. However, most existing solutions rely solely on the topological connectivity between nodes in $G$, which leaves them vulnerable to the missing or noisy links that are common in real-world graphs. To address this issue, this paper leverages the complementary nature of graph topology and node attributes to enhance local clustering quality. To exploit the attribute information effectively, we first formulate LGC as the estimation of the bidirectional diffusion distribution (BDD), which is specialized for capturing the multi-hop affinity between nodes in the presence of attributes. We then propose LACA, an efficient and effective approach for LGC that achieves strong empirical performance on multiple real datasets while maintaining strong locality. The core components of LACA include (i) a fast and theoretically grounded preprocessing technique for node attributes, (ii) an adaptive algorithm for diffusing any vector over $G$ with rigorous theoretical guarantees and expedited convergence, and (iii) an effective three-step scheme for BDD approximation. Extensive experiments, comparing against 17 competitors on 8 real datasets, show that LACA outperforms all competitors in result quality measured against ground-truth local clusters, while also being up to orders of magnitude faster. The code is available at https://github.com/HaoranZ99/alac.
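
To make the notion of locality concrete, the following is a minimal sketch of a generic push-style diffusion from a seed node, in the spirit of approximate personalized PageRank. It is not LACA's adaptive diffusion algorithm and it ignores node attributes entirely. The function name `approx_ppr_push`, the parameters `alpha` and `eps`, and the toy graph are illustrative assumptions, not taken from the paper or its repository. The point is only that the loop touches nodes near the seed whose residual mass is large enough, so the work depends on the size of the explored neighborhood rather than on the whole graph.

```python
from collections import defaultdict, deque


def approx_ppr_push(adj, seed, alpha=0.15, eps=1e-4):
    """Push-style approximate personalized PageRank (illustrative only).

    adj   : dict mapping node -> list of neighbors (undirected, unweighted)
    seed  : seed node v_s
    alpha : restart probability (assumed value)
    eps   : per-degree residual threshold controlling locality (assumed value)
    """
    p = defaultdict(float)  # diffusion scores accumulated so far
    r = defaultdict(float)  # residual mass that has not been pushed yet
    r[seed] = 1.0
    queue = deque([seed])
    queued = {seed}

    while queue:
        u = queue.popleft()
        queued.discard(u)
        deg_u = len(adj[u])
        # Skip nodes whose residual is too small to be worth pushing.
        if deg_u == 0 or r[u] < eps * deg_u:
            continue
        mass = r[u]
        p[u] += alpha * mass          # keep an alpha fraction at u
        r[u] = 0.0
        share = (1.0 - alpha) * mass / deg_u
        for v in adj[u]:              # spread the rest over the neighbors
            r[v] += share
            if v not in queued and r[v] >= eps * len(adj[v]):
                queue.append(v)
                queued.add(v)
    return dict(p)


# Toy graph (hypothetical): two triangles joined by the bridge edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
scores = approx_ppr_push(adj, seed=0)
```

Nodes in the seed's own triangle receive more mass than nodes across the bridge. A common follow-up in topology-only methods is to rank nodes by degree-normalized score and cut the ranking to obtain a local cluster; an attribute-aware method such as the one described above would additionally account for node attributes when measuring affinity.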

Problem

Research questions and friction points this paper is trying to address.

Enhance local clustering quality using graph topology and node attributes

Formulate local graph clustering as bidirectional diffusion distribution estimation

Propose LACA for efficient and effective local clustering with attributes

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages graph topology and node attributes

Formulates LGC as bidirectional diffusion distribution (BDD) estimation

Proposes LACA with adaptive diffusion and preprocessing
