Sharp exact recovery threshold for two-community Euclidean random graphs

📅 2025-01-22
🤖 AI Summary
This paper addresses exact recovery of node labels in the two-community Geometric Hidden Community Model (GHCM), a spatially embedded random graph model in Euclidean space motivated by transitivity, a property common in real-world networks. The goal is to characterize the sharp information-theoretic threshold for exact recovery. We show that the impossibility threshold previously established by Gaudio et al. is tight in the two-community case. To achieve this, we propose a two-phase, linear-time algorithm that does not rely on the "distinctness-of-distributions" assumption required by earlier work, thereby partially resolving Gaudio et al.'s conjecture that this assumption is unnecessary. Phase I explores the spatial graph in a data-driven manner to produce an almost exact labeling, and Phase II refines that labeling to achieve exact recovery above the threshold. The results also extend achievability to geometric formulations of well-known inference problems, such as the planted dense subgraph problem and submatrix localization, where the distinctness-of-distributions assumption does not hold.

📝 Abstract
This paper considers the problem of label recovery in random graphs and matrices. Motivated by transitive behavior in real-world networks (i.e., "the friend of my friend is my friend"), a recent line of work considers spatially embedded networks, which exhibit transitive behavior. In particular, the Geometric Hidden Community Model (GHCM), introduced by Gaudio, Guan, Niu, and Wei, models a network as a labeled Poisson point process where every pair of vertices is associated with a pairwise observation whose distribution depends on the labels and positions of the vertices. The GHCM is in turn a generalization of the Geometric SBM (proposed by Baccelli and Sankararaman). Gaudio et al. provided a threshold below which exact recovery is information-theoretically impossible. Above the threshold, they provided a linear-time algorithm that succeeds in exact recovery under a certain "distinctness-of-distributions" assumption, which they conjectured to be unnecessary. In this paper, we partially resolve the conjecture by showing that the threshold is indeed tight for the two-community GHCM. We provide a two-phase, linear-time algorithm that explores the spatial graph in a data-driven manner in Phase I to yield an almost exact labeling, which is refined to achieve exact recovery in Phase II. Our results extend achievability to geometric formulations of well-known inference problems, such as the planted dense subgraph problem and submatrix localization, in which the distinctness-of-distributions assumption does not hold.
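The model and the two-phase structure described in the abstract can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions that are not taken from the paper: points live on a one-dimensional torus rather than general Euclidean space, the pairwise observations are Bernoulli edges (a Geometric SBM special case of the GHCM) with hypothetical parameters `p_in` and `p_out`, and a noisy copy of the true labels stands in for the Phase I output. The hypothetical `refine` function is a generic local likelihood vote in the spirit of a Phase II refinement, not the authors' algorithm, and it runs in quadratic rather than linear time for readability.

```python
import math
import random


def torus_dist(x, y, side):
    """Distance between two points on the 1-D torus [0, side)."""
    d = abs(x - y) % side
    return min(d, side - d)


def sample_instance(lam, side, radius, p_in, p_out, seed=0):
    """Sample a simplified two-community geometric graph (illustrative assumption).

    Points form a Poisson process of intensity `lam` on the 1-D torus [0, side),
    generated from i.i.d. exponential gaps. Each point gets label +1 or -1 with
    probability 1/2. A pair within distance `radius` is joined with probability
    `p_in` (same label) or `p_out` (different labels); farther pairs never are.
    """
    rng = random.Random(seed)
    positions = []
    t = rng.expovariate(lam)
    while t < side:
        positions.append(t)
        t += rng.expovariate(lam)
    labels = [rng.choice((1, -1)) for _ in positions]
    edges = set()
    n = len(positions)
    for u in range(n):
        for v in range(u + 1, n):
            if torus_dist(positions[u], positions[v], side) <= radius:
                p = p_in if labels[u] == labels[v] else p_out
                if rng.random() < p:
                    edges.add((u, v))  # stored with u < v
    return positions, labels, edges


def refine(positions, edges, labels_hat, radius, side, p_in, p_out):
    """One local likelihood-vote pass over an almost exact labeling (hypothetical).

    For each vertex v, sum over spatial neighbors u (distance <= radius) the
    log-likelihood ratio of "v is +1" versus "v is -1", given labels_hat[u]
    and whether (u, v) is an edge. Requires 0 < p_in, p_out < 1.
    Ties keep the current label; all votes use the input labeling.
    """
    w_edge = math.log(p_in / p_out)
    w_non = math.log((1 - p_in) / (1 - p_out))
    n = len(positions)
    refined = []
    for v in range(n):
        score = 0.0
        for u in range(n):
            if u == v or torus_dist(positions[u], positions[v], side) > radius:
                continue
            linked = (min(u, v), max(u, v)) in edges
            score += labels_hat[u] * (w_edge if linked else w_non)
        refined.append(1 if score > 0 else -1 if score < 0 else labels_hat[v])
    return refined


if __name__ == "__main__":
    lam, side, radius, p_in, p_out = 3.0, 200.0, 2.0, 0.8, 0.2
    pos, truth, edges = sample_instance(lam, side, radius, p_in, p_out, seed=1)
    rng = random.Random(2)
    # Stand-in for a Phase I output: flip about 5% of the true labels.
    noisy = [-s if rng.random() < 0.05 else s for s in truth]
    fixed = refine(pos, edges, noisy, radius, side, p_in, p_out)
    print("errors before:", sum(a != b for a, b in zip(noisy, truth)))
    print("errors after: ", sum(a != b for a, b in zip(fixed, truth)))
```

The vote weights follow from the assumed Bernoulli model: a same-label neighbor contributes log(p_in / p_out) for an edge and log((1 - p_in) / (1 - p_out)) for a non-edge, and the sign flips for an opposite-label neighbor. Exact recovery asks that every vertex be labeled correctly (up to a global sign flip), so a single vote pass like this one only succeeds when the local neighborhoods carry enough information, which is the role of the threshold discussed above.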
Problem
Geometric Hidden Community Model
Label Recovery
Information-Theoretic Limits
Innovation
Two-stage Algorithm
Geometric Hidden Community Model (GHCM)
Label Recovery
Julia Gaudio
Northwestern University
Discrete Probability, Random Graphs, Network Inference
Charlie K. Guan
Department of Industrial Engineering and Management Sciences, Northwestern University