AI Summary
The applicability boundaries between bottom-up (BU) and top-down (TD) approaches in hierarchical community detection remain poorly understood. Method: We conduct a theoretical analysis of exact recovery of tree-structured communities under the Hierarchical Stochastic Block Model (HSBM). Contribution/Results: We prove that a BU agglomerative algorithm recovers both the hierarchical tree and the community structure, and that it attains the information-theoretic threshold for exact recovery at intermediate levels of the hierarchy. Its recovery conditions are less restrictive than those known for TD methods, which extends the feasible region for exact recovery at intermediate levels. We also observe that TD approaches can produce dendrograms with inversions. Numerical experiments on synthetic and real data sets confirm that BU algorithms outperform their TD counterparts.
Abstract
Hierarchical clustering of networks consists in finding a tree of communities, such that lower levels of the hierarchy reveal finer-grained community structures. There are two main classes of algorithms tackling this problem. Divisive ($\textit{top-down}$) algorithms recursively partition the nodes into two communities, until a stopping rule indicates that no further split is needed. In contrast, agglomerative ($\textit{bottom-up}$) algorithms first identify the smallest community structure and then repeatedly merge the communities using a $\textit{linkage}$ method. In this article, we establish theoretical guarantees for the recovery of the hierarchical tree and community structure of a Hierarchical Stochastic Block Model by a bottom-up algorithm. We also establish that this bottom-up algorithm attains the information-theoretic threshold for exact recovery at intermediate levels of the hierarchy. Notably, these recovery conditions are less restrictive compared to those existing for top-down algorithms. This shows that bottom-up algorithms extend the feasible region for achieving exact recovery at intermediate levels. Numerical experiments on both synthetic and real data sets confirm the superiority of bottom-up algorithms over top-down algorithms. We also observe that top-down algorithms can produce dendrograms with inversions. These findings contribute to a better understanding of hierarchical clustering techniques and their applications in network analysis.
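To make the bottom-up procedure described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the smallest communities are already known (the ground-truth block labels stand in for the initial community-detection step) and uses the edge density between two clusters as the linkage, which is an assumption made here for illustration. The function names `sample_hsbm`, `edge_density`, and `bottom_up_merge`, as well as the block sizes and edge probabilities, are hypothetical.

```python
import random
from itertools import combinations


def sample_hsbm(sizes, probs, seed=0):
    """Sample an undirected graph: nodes in blocks a and b link with probability probs[a][b]."""
    rng = random.Random(seed)
    labels = [block for block, size in enumerate(sizes) for _ in range(size)]
    edges = set()
    for u, v in combinations(range(len(labels)), 2):  # u < v always holds here
        if rng.random() < probs[labels[u]][labels[v]]:
            edges.add((u, v))
    return edges, labels


def edge_density(edges, nodes_a, nodes_b):
    """Fraction of possible cross pairs between two disjoint clusters that are edges."""
    links = sum((min(u, v), max(u, v)) in edges for u in nodes_a for v in nodes_b)
    return links / (len(nodes_a) * len(nodes_b))


def bottom_up_merge(edges, labels):
    """Repeatedly merge the two clusters with the highest cross edge density.

    Assumption: the bottom communities are given by `labels`; a full pipeline
    would estimate them first. Returns a list of (id_a, id_b, linkage) merges.
    """
    clusters = {}
    for node, block in enumerate(labels):
        clusters.setdefault(block, []).append(node)
    next_id = max(clusters) + 1
    merges = []
    while len(clusters) > 1:
        density, a, b = max(
            (edge_density(edges, clusters[a], clusters[b]), a, b)
            for a, b in combinations(sorted(clusters), 2)
        )
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)  # merged cluster gets a new id
        merges.append((a, b, density))
        next_id += 1
    return merges


# Hypothetical two-level hierarchy: blocks 0 and 1 are siblings, as are blocks 2 and 3.
p_in, p_sibling, p_far = 0.5, 0.3, 0.05
probs = [
    [p_in, p_sibling, p_far, p_far],
    [p_sibling, p_in, p_far, p_far],
    [p_far, p_far, p_in, p_sibling],
    [p_far, p_far, p_sibling, p_in],
]
edges, labels = sample_hsbm([30, 30, 30, 30], probs, seed=0)
merges = bottom_up_merge(edges, labels)
for a, b, density in merges:
    print(f"merge {a} + {b} at linkage {density:.3f}")
```

In this toy setting the sibling blocks are merged first and the two resulting super-communities last, with linkage values that do not increase along the merge sequence, so the resulting dendrogram has no inversion. The gap between `p_sibling` and `p_far` is deliberately large; the recovery thresholds discussed in the abstract concern much harder regimes that this sketch does not explore.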