🤖 AI Summary
Link prediction in large-scale networks faces significant challenges, including difficulties in modeling global structure, high computational costs, and limited interpretability. This work proposes a novel architecture that integrates the Overlapping Stochastic Block Model (OSBM) with a sparse Graph Transformer, uniquely combining interpretable generative community modeling with efficient graph representation learning. The approach leverages expander-augmented sparse attention, a neural variational encoder, and an OSBM-based edge decoder to achieve structured posterior inference and transparent decision-making while maintaining near-linear time complexity. Experimental results demonstrate that the model achieves an average rank of 1.6 across multiple benchmarks, accelerates training by up to sixfold, and generates semantically coherent community structures.
📝 Abstract
Link prediction is a cornerstone of the Web ecosystem, powering applications from recommendation and search to knowledge graph completion and collaboration forecasting. However, large-scale networks present unique challenges: they contain hundreds of thousands of nodes and edges with heterogeneous, overlapping community structures that evolve over time. Existing approaches face notable limitations: traditional graph neural networks struggle to capture global structural dependencies, while recent graph transformers achieve strong performance but incur quadratic complexity and lack interpretable latent structure. We propose **TGSBM** (Transformer-Guided Stochastic Block Model), a framework that integrates the principled generative structure of Overlapping Stochastic Block Models with the representational power of sparse Graph Transformers. TGSBM comprises three main components: (i) *expander-augmented sparse attention* that enables near-linear complexity and efficient global mixing, (ii) a *neural variational encoder* that infers structured posteriors over community memberships and strengths, and (iii) a *neural edge decoder* that reconstructs links via OSBM's generative process, preserving interpretability. Experiments across diverse benchmarks demonstrate competitive performance (mean rank 1.6 under the HeaRT protocol), superior scalability (up to 6× faster training), and interpretable community structures. These results position TGSBM as a practical approach that balances accuracy, efficiency, and transparency for large-scale link prediction.
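To make component (i) concrete, here is a minimal sketch of one standard way to build an expander-augmented sparse attention pattern: attend along observed graph edges for local structure, and overlay a random regular expander (a union of random permutations) for cheap global mixing. The function names and the exact construction are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def expander_mask(n, graph_edges, degree=4, seed=0):
    """Sparse attention mask: observed graph edges plus a random regular
    expander overlay (union of `degree` random permutations).
    Illustrative sketch; the paper's exact construction may differ."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    for i, j in graph_edges:              # local structure: attend along edges
        mask[i, j] = mask[j, i] = True
    for _ in range(degree):               # global mixing: random-permutation overlay
        perm = rng.permutation(n)
        mask[np.arange(n), perm] = True
        mask[perm, np.arange(n)] = True
    np.fill_diagonal(mask, True)          # every node attends to itself
    return mask

def sparse_attention(Q, K, V, mask):
    """Scaled dot-product attention restricted to the masked pairs."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores = np.where(mask, scores, -np.inf)       # block non-masked pairs
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)              # row-wise softmax
    return w @ V
```

Because each row of the mask has O(deg + degree) nonzeros rather than n, the attention cost scales near-linearly in the number of nodes, which is the efficiency argument the abstract makes.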
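Component (iii) can likewise be sketched. In an Overlapping SBM-style decoder, each node carries a (possibly overlapping) community-membership vector, and the probability of an edge depends on the learned interaction strengths between the communities the two endpoints belong to. The sketch below assumes a sigmoid-Bernoulli form; the symbols `Z`, `W`, and `b` are illustrative names, not the paper's notation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def osbm_decode(Z, W, b=0.0):
    """OSBM-style edge decoder: p(A_ij = 1) = sigmoid(z_i^T W z_j + b).
    Rows of Z are soft community memberships in [0, 1] (overlap allowed);
    W[k, l] is the interaction strength between communities k and l.
    Illustrative sketch, not the paper's exact generative process."""
    logits = Z @ W @ Z.T + b
    return sigmoid(logits)

# Toy usage: 4 nodes, 2 communities; node 2 overlaps both communities.
Z = np.array([[1., 0.],
              [1., 0.],
              [1., 1.],
              [0., 1.]])
W = np.array([[3., -3.],     # assortative: strong intra-community affinity,
              [-3., 3.]])    # weak cross-community affinity
P = osbm_decode(Z, W, b=-1.0)
```

The interpretability claim follows from this structure: a predicted link can be read off as "nodes i and j share community k, and W says community k is densely connected internally," rather than as an opaque embedding similarity.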