🤖 AI Summary
This work addresses the fundamental problem of structural information loss in network embedding. We systematically characterize the encoding capacity of embeddings with respect to graph generative models, identifying conditions under which the original graph structure can be fully recovered, partially recovered, or not recovered at all. We propose a theoretical framework grounded in invertibility analysis and equivalence-class decomposition, revealing that non-invertible embeddings preserve community-level equivalence classes while discarding edge-density information. Furthermore, we prove an inherent limitation of embedding-only link prediction: under common topological conditions, it generates spurious edges disproportionately, which can either exacerbate or mitigate existing structural biases. Empirical evaluations confirm the framework's explanatory and predictive power for performance boundaries in both community detection and link prediction.
📝 Abstract
We analyze a simple algorithm for network embedding, explicitly characterizing conditions under which the learned representation encodes the graph's generative model fully, partially, or not at all. In cases where the embedding loses some information (i.e., is not invertible), we describe the equivalence classes of graphons that map to the same embedding, finding that these classes preserve community structure but lose substantial density information. Finally, we show implications for community detection and link prediction. Our results suggest strong limitations on the effectiveness of link prediction based on embeddings alone, and we show common conditions under which naive link prediction adds edges in a disproportionate manner that can either mitigate or exacerbate structural biases.
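The abstract's central claim, that a non-invertible embedding can preserve community structure while losing density information, can be illustrated with a small sketch. This is not the paper's actual algorithm; it is a hypothetical example assuming a row-normalized adjacency spectral embedding applied to two stochastic block models that share the same community structure but differ in overall edge density. The helpers `sample_sbm` and `embed` are names introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sbm(B, sizes):
    """Sample an undirected adjacency matrix from a stochastic block model.

    B is the block probability matrix; sizes gives the community sizes.
    Returns the adjacency matrix and the true community labels.
    """
    z = np.repeat(np.arange(len(sizes)), sizes)
    P = B[z][:, z]                                # edge probabilities per pair
    A = np.triu((rng.random(P.shape) < P), 1).astype(float)
    return A + A.T, z

def embed(A, d=2):
    """Spectral embedding: top-d eigenvectors of A, rows scaled to unit norm.

    Row normalization throws away eigenvalue magnitude, so overall edge
    density is discarded while the community geometry is kept.
    """
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(-np.abs(vals))[:d]
    X = vecs[:, top]
    return X / np.linalg.norm(X, axis=1, keepdims=True)

B = np.array([[0.6, 0.1],
              [0.1, 0.6]])
A1, z = sample_sbm(B, [100, 100])         # denser graph
A2, _ = sample_sbm(0.5 * B, [100, 100])   # same block structure, half the density

X1, X2 = embed(A1), embed(A2)
# Community labels are recoverable from either embedding (e.g. by the sign
# of the second coordinate), even though A1 has roughly twice the edge
# density of A2 -- that density information is not present in X1 or X2.
```

Under this sketch, the two graphs fall into the same "equivalence class" from the embedding's point of view: clustering either embedding recovers the communities, but no function of the normalized embedding alone can distinguish the dense graph from the sparse one.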