🤖 AI Summary
This paper addresses exact recovery of node labels in the two-community Geometric Hidden Community Model (GHCM), a spatially embedded network model that captures the transitivity common in real-world networks. We show that the previously known information-theoretic impossibility threshold is tight for the two-community GHCM. To achieve this, we propose a two-phase, linear-time algorithm that does not rely on the “distinctness-of-distributions” assumption, thereby partially resolving the conjecture of Gaudio et al. that this assumption is unnecessary. Phase I explores the spatial graph in a data-driven manner to produce an almost exact labeling, and Phase II refines that labeling to achieve exact recovery above the threshold. The results extend achievability to geometric formulations of classical inference problems, including the planted dense subgraph problem and submatrix localization, in which the assumption does not hold.
📝 Abstract
This paper considers the problem of label recovery in random graphs and matrices. Motivated by transitive behavior in real-world networks (i.e., “the friend of my friend is my friend”), a recent line of work considers spatially embedded networks, which exhibit such behavior. In particular, the Geometric Hidden Community Model (GHCM), introduced by Gaudio, Guan, Niu, and Wei, models a network as a labeled Poisson point process in which every pair of vertices is associated with a pairwise observation whose distribution depends on the labels and positions of the vertices. The GHCM is in turn a generalization of the Geometric SBM (proposed by Baccelli and Sankararaman). Gaudio et al. provided a threshold below which exact recovery is information-theoretically impossible. Above the threshold, they provided a linear-time algorithm that succeeds in exact recovery under a certain “distinctness-of-distributions” assumption, which they conjectured to be unnecessary. In this paper, we partially resolve the conjecture by showing that the threshold is indeed tight for the two-community GHCM. We provide a two-phase, linear-time algorithm: Phase I explores the spatial graph in a data-driven manner to yield an almost exact labeling, which Phase II refines to achieve exact recovery. Our results extend achievability to geometric formulations of well-known inference problems, such as the planted dense subgraph problem and submatrix localization, in which the distinctness-of-distributions assumption does not hold.
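To make the model description concrete, the following is a minimal sketch of sampling a toy two-community instance in the spirit of the GHCM. It is an illustration under stated assumptions, not the paper's exact definition: the 2-D torus, the visibility radius `sqrt(log(volume))`, the uniform ±1 labels, and the Bernoulli observations with hypothetical parameters `p_same` and `p_diff` are all choices made here for simplicity, and every function name is hypothetical.

```python
import math
import random


def sample_poisson(mean, rng):
    """Sample Poisson(mean) by counting unit-rate exponential arrivals in [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def torus_distance(a, b, side):
    """Euclidean distance between points a and b on a torus with the given side length."""
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y)
        d = min(d, side - d)  # wrap around the torus
        total += d * d
    return math.sqrt(total)


def sample_toy_ghcm(volume, intensity, p_same, p_diff, seed=0):
    """Sample positions, labels, and pairwise observations (all modelling choices are assumptions)."""
    rng = random.Random(seed)
    side = math.sqrt(volume)              # assumed: 2-D torus of area `volume`
    radius = math.sqrt(math.log(volume))  # assumed: visibility radius for d = 2

    # Poisson point process: Poisson number of points, placed uniformly at random.
    n_points = sample_poisson(intensity * volume, rng)
    positions = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_points)]
    labels = [rng.choice((-1, 1)) for _ in range(n_points)]  # assumed: balanced communities

    # Pairwise observations exist only for mutually visible pairs; their
    # distribution depends on whether the two labels agree.
    observations = {}
    for i in range(n_points):
        for j in range(i + 1, n_points):
            if torus_distance(positions[i], positions[j], side) <= radius:
                p = p_same if labels[i] == labels[j] else p_diff
                observations[(i, j)] = 1 if rng.random() < p else 0
    return positions, labels, observations


if __name__ == "__main__":
    pos, lab, obs = sample_toy_ghcm(volume=100.0, intensity=1.0, p_same=0.8, p_diff=0.1)
    print(len(pos), "points,", len(obs), "observed pairs")
```

The quadratic pair loop is used only for readability; it is not the linear-time procedure the abstract refers to. Setting `p_same` and `p_diff` so that one community is denser than the rest would give a toy planted-dense-subgraph variant, again only as an illustration.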