🤖 AI Summary
Problem: Existing methods struggle to jointly identify principal communities and learn vertex embeddings from graph data, are not robust to label noise, and incur high computational costs. Method: We propose a joint optimization framework that (i) formally defines “principal communities” and introduces a theoretically grounded community importance score, which quantifies community salience through joint analysis of the adjacency matrix and noisy labels; and (ii) designs a global encoder, driven by a Bernoulli graph model, that integrates label-conditional density constraints with spectral compression to generate discriminative, low-dimensional embeddings, retaining only the dimensions that correspond to principal communities. Results: Experiments on synthetic and real-world graphs show improved accuracy in principal community identification, better embedding separability, stronger downstream classification performance, robustness to label noise, and scalability to large graphs.
📝 Abstract
In this paper, we introduce the concept of principal communities and propose a principal graph encoder embedding method that simultaneously detects these communities and embeds the vertices. Given a graph adjacency matrix with vertex labels, the method computes a sample community score for each community, ranks the scores to measure community importance, and estimates a set of principal communities. It then produces a vertex embedding by retaining only the dimensions corresponding to these principal communities. Theoretically, we define the population versions of the encoder embedding and the community score under a random Bernoulli graph distribution. We prove that the population principal graph encoder embedding preserves the conditional density of the vertex labels and that the population community score distinguishes the principal communities. We conduct a variety of simulations to demonstrate the finite-sample accuracy in detecting ground-truth principal communities, as well as the advantages in embedding visualization and subsequent vertex classification. The method is further applied to a set of real-world graphs, showcasing its numerical advantages, including robustness to label noise and computational scalability.
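
The pipeline described in the abstract (embed, score each community, keep only the principal dimensions) can be illustrated with a minimal sketch. Several pieces here are assumptions rather than the paper's definitions: the encoder is taken to be a one-hot label matrix normalized by community size, the `community_scores` function uses a stand-in score (the spread of class-conditional means in each dimension), and the `threshold` value, the function names, and the simulated block matrix `B` are all hypothetical.

```python
import numpy as np

def encoder_embedding(A, y, K):
    """Encoder embedding Z = A @ W, assuming W is the one-hot label
    matrix with each column scaled by 1 / (community size)."""
    n = A.shape[0]
    counts = np.bincount(y, minlength=K)
    W = np.zeros((n, K))
    W[np.arange(n), y] = 1.0 / counts[y]
    return A @ W  # shape (n, K): one dimension per community

def community_scores(Z, y, K):
    """Stand-in score (NOT the paper's definition): for each dimension,
    the spread between the largest and smallest class-conditional mean."""
    means = np.vstack([Z[y == j].mean(axis=0) for j in range(K)])
    return means.max(axis=0) - means.min(axis=0)

def principal_embedding(A, y, K, threshold):
    """Keep only the dimensions whose score exceeds a hypothetical threshold."""
    Z = encoder_embedding(A, y, K)
    scores = community_scores(Z, y, K)
    keep = np.flatnonzero(scores > threshold)
    return Z[:, keep], scores, keep

# Simulated Bernoulli graph with 3 communities of 100 vertices each.
# Column 2 of B is constant, so connections to community 2 carry no
# label information; under this sketch it should not be principal.
rng = np.random.default_rng(0)
B = np.array([[0.5, 0.1, 0.2],
              [0.1, 0.5, 0.2],
              [0.2, 0.2, 0.2]])
y = np.repeat(np.arange(3), 100)
P = B[y][:, y]                               # edge probabilities, shape (300, 300)
upper = np.triu(rng.random(P.shape) < P, 1)  # sample the upper triangle, no self-loops
A = (upper | upper.T).astype(float)          # symmetric adjacency matrix

Z_principal, scores, keep = principal_embedding(A, y, K=3, threshold=0.1)
print("scores:", np.round(scores, 3))
print("principal communities:", keep)
```

In this simulation the first two dimensions separate the labels and the third does not, so the retained embedding has two columns. A fixed threshold is used only for simplicity; the abstract describes ranking the scores, and the rule for choosing the cutoff is not specified there.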