🤖 AI Summary
This paper investigates exact community recovery in general two-community block models with node-attribute side information. It covers both Bernoulli and Gaussian edge observation models, which include the stochastic block model (SBM), submatrix localization, and ℤ₂ synchronization as special cases. We propose a low-complexity, non-iterative spectral algorithm that combines the eigenvectors of the pairwise observation matrix with a channel model for the side information. For general side information, we establish, for the first time, that this spectral method achieves the information-theoretic threshold for exact recovery. Using entrywise eigenvector analysis (Abbe et al., 2020), we show that the algorithm mimics the genie-aided estimators, in which each label is estimated optimally given all remaining labels. Our results hold across the full spectrum of graph sparsity, from sparse to dense regimes, thereby substantially extending both the theoretical limits and practical applicability of spectral methods for community detection.
📝 Abstract
We study the problem of exact community recovery in general two-community block models in the presence of node-attributed side information. We allow for a very general side information channel for node attributes and, for pairwise (edge) observations, consider both Bernoulli and Gaussian matrix models, capturing the Stochastic Block Model, Submatrix Localization, and $\mathbb{Z}_2$-Synchronization as special cases. A recent work of Dreveton et al. (2024) characterized the information-theoretic limit of a very general exact recovery problem with side information. In this paper, we show algorithmic achievability in the above important cases by designing a simple but optimal spectral algorithm that incorporates side information (when present) along with the eigenvectors of the pairwise observation matrix. Using the powerful tool of entrywise eigenvector analysis of Abbe et al. (2020), we show that our spectral algorithm can mimic the so-called genie-aided estimators, where the $i^{\mathrm{th}}$ genie-aided estimator optimally computes the estimate of the $i^{\mathrm{th}}$ label when all remaining labels are revealed by a genie. This perspective provides a unified understanding of the optimality of spectral algorithms for various exact recovery problems in a recent line of work.
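To make the idea of a spectral estimate that mimics a genie-aided estimator concrete, here is a minimal Python sketch for one special case. It is not the authors' algorithm. It assumes a balanced, assortative SBM (p > q) with known parameters, and it assumes the side information is each node's label observed through a binary symmetric channel with flip probability `alpha`. The function names, the channel, and the parameter values are all hypothetical choices made for illustration. The genie-aided score for node i is approximately log(p(1-q)/(q(1-p))) · Σ_j A_ij σ_j plus the side-information log-likelihood ratio; the sketch replaces the unknown sum Σ_j A_ij σ_j with √n · λ₂ · u₂[i], using the second eigenpair (λ₂, u₂) of A.

```python
import numpy as np

def sample_sbm_with_side_info(n, p, q, alpha, rng):
    """Balanced two-community SBM plus labels seen through a binary symmetric channel.

    Illustrative data generator (an assumption, not taken from the paper).
    """
    sigma = np.ones(n, dtype=int)
    sigma[n // 2:] = -1                        # first half +1, second half -1
    same = np.equal.outer(sigma, sigma)        # True where two nodes share a community
    probs = np.where(same, p, q)               # edge probability for each pair
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(float)        # symmetric adjacency, zero diagonal
    flips = rng.random(n) < alpha
    y = np.where(flips, -sigma, sigma)         # noisy copy of each label
    return A, sigma, y

def spectral_with_side_info(A, y, p, q, alpha):
    """Combine the second eigenvector of A with side-information log-likelihood ratios.

    Assumes p > q, so the community structure sits in the second-largest eigenvalue.
    """
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eigh(A)       # eigenvalues in ascending order
    lam2, u2 = eigvals[-2], eigvecs[:, -2]     # second-largest eigenpair
    llr = y * np.log((1 - alpha) / alpha)      # per-node LLR under the assumed channel
    if u2 @ llr < 0:                           # eigenvectors have a sign ambiguity;
        u2 = -u2                               # align the sign with the side information
    # Weight of the edge term in the genie-aided log-likelihood ratio. The non-edge
    # term is dropped: for balanced communities it involves only sum_{j != i} sigma_j,
    # which has magnitude at most 1.
    edge_weight = np.log(p * (1 - q) / (q * (1 - p)))
    # Since A u2 = lam2 u2 and u2 is roughly sigma / sqrt(n),
    # sqrt(n) * lam2 * u2[i] approximates sum_j A[i, j] * sigma[j].
    score = edge_weight * np.sqrt(n) * lam2 * u2 + llr
    return np.where(score >= 0, 1, -1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, q, alpha = 0.3, 0.05, 0.2
    A, sigma, y = sample_sbm_with_side_info(300, p, q, alpha, rng)
    est = spectral_with_side_info(A, y, p, q, alpha)
    print("side information alone, fraction correct:", np.mean(y == sigma))
    print("spectral + side information, fraction correct:", np.mean(est == sigma))
```

The estimator is non-iterative: it needs one eigendecomposition and one entrywise sign, with the side information entering only as an additive per-node term. Because the side information fixes the global sign of the eigenvector, the estimate can be compared with the true labels directly rather than up to a label flip.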