🤖 AI Summary
This work addresses the high computational cost of rerunning the Leiden algorithm from scratch to recompute community structure in large dynamic graphs that undergo frequent vertex and edge updates. The one existing incremental approach lacks a detailed theoretical analysis and is inefficient on large graphs. Using boundedness analysis, the paper shows that the existing algorithms are relatively unbounded, analyzes how vertex community memberships change when the graph changes, and introduces the Hierarchical Incremental Tree Leiden (HIT-Leiden) algorithm. HIT-Leiden maintains connected components and hierarchical community structures to reduce the range of affected vertices. Experiments on multiple datasets show speedups of up to five orders of magnitude over existing methods.
📝 Abstract
As a well-known community detection algorithm, Leiden has been widely used in scenarios such as large language model generation (e.g., Graph-RAG), anomaly detection, and biological analysis. In these scenarios, the graphs are often large and dynamic, with vertices and edges inserted and deleted frequently, so rerunning Leiden from scratch to obtain the updated communities after each change is costly. Recently, one work has studied how to maintain Leiden communities in dynamic graphs, but it lacks a detailed theoretical analysis, and its algorithms are inefficient for large graphs. To address these issues, we first show theoretically that the existing algorithms are relatively unbounded, using boundedness analysis (a powerful tool for analyzing incremental algorithms on dynamic graphs), and we also analyze how the community memberships of vertices change when the graph changes. Based on this theoretical analysis, we develop a novel, efficient maintenance algorithm called Hierarchical Incremental Tree Leiden (HIT-Leiden), which effectively reduces the range of affected vertices by maintaining the connected components and hierarchical community structures. Comprehensive experiments on various datasets demonstrate the superior performance of HIT-Leiden. In particular, it achieves speedups of up to five orders of magnitude over existing methods.
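To make the notion of "affected vertices" concrete, the following is a minimal Python sketch of one common screening heuristic from the dynamic community detection literature. It is not the HIT-Leiden algorithm, whose details the abstract does not give. It assumes that only inter-community insertions and intra-community deletions can invalidate the current partition, and that a community touched by a deletion must be rechecked for connectivity. The function names `seed_affected` and `split_components` and the `(op, u, v)` update format are hypothetical and chosen only for illustration.

```python
from collections import deque


def seed_affected(updates, community):
    """Return the vertices whose membership may need rechecking.

    updates:   iterable of (op, u, v), op is "+" (insert) or "-" (delete).
    community: dict mapping vertex -> community id; unseen vertices are new.

    Assumption (illustrative heuristic, not taken from the paper): an
    insertion between different communities or a deletion inside one
    community can change the partition; other updates are skipped.
    """
    affected = set()
    for op, u, v in updates:
        cu, cv = community.get(u), community.get(v)
        same = cu is not None and cu == cv
        if (op == "+" and not same) or (op == "-" and same):
            affected.update((u, v))
    return affected


def split_components(members, adj):
    """Split one community into connected components.

    members: vertices currently assigned to the community.
    adj:     dict mapping vertex -> set of neighbours (after the updates).
    The BFS only follows edges that stay inside `members`.
    """
    members = set(members)
    seen, parts = set(), []
    for start in sorted(members):
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            x = queue.popleft()
            comp.append(x)
            for y in adj.get(x, ()):
                if y in members and y not in seen:
                    seen.add(y)
                    queue.append(y)
        parts.append(sorted(comp))
    return parts


# Two communities {0, 1, 2} and {3, 4, 5}.
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
updates = [("+", 0, 5), ("-", 3, 4), ("-", 2, 3)]
print(sorted(seed_affected(updates, community)))  # [0, 3, 4, 5]

# A path community 0-1-2 after edge (1, 2) has been deleted.
print(split_components([0, 1, 2], {0: {1}, 1: {0}, 2: set()}))  # [[0, 1], [2]]
```

In this sketch only four of the six vertices are seeded for re-examination, and the inter-community deletion `("-", 2, 3)` is skipped entirely. That shows the general motivation for incremental maintenance: the work scales with the size of the change, not with the size of the graph.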