AI Summary
This work addresses the reliance of link prediction in graph machine learning on complex structural priors or memory-intensive embeddings by proposing PENCIL, a minimalist approach that uses only a standard encoder-only Transformer. By applying self-attention over sampled local subgraphs, PENCIL implicitly generalizes diverse heuristic rules and subgraph structures without requiring node features, explicit topological encodings, or node ID embeddings. Experiments show that PENCIL outperforms graph neural network methods that rely on handcrafted heuristics across multiple benchmark datasets, is substantially more parameter-efficient than ID-embedding-based models, and maintains state-of-the-art performance even in the absence of node features. These findings support the scalability and effectiveness of pure Transformer architectures for large-scale graph link prediction.
Abstract
Link prediction is a core challenge in graph machine learning, demanding models that capture rich and complex topological dependencies. While Graph Neural Networks (GNNs) are the standard solution, state-of-the-art pipelines often rely on explicit structural heuristics or memory-intensive node embeddings: approaches that struggle to generalize or scale to massive graphs. Emerging Graph Transformers (GTs) offer a potential alternative but often incur significant overhead due to complex structural encodings, hindering their application to large-scale link prediction. We challenge these sophisticated paradigms with PENCIL, an encoder-only plain Transformer that replaces hand-crafted priors with attention over sampled local subgraphs, retaining the scalability and hardware efficiency of standard Transformers. Through experimental and theoretical analysis, we show that PENCIL extracts richer structural signals than GNNs, implicitly generalizing a broad class of heuristics and subgraph-based expressivity. Empirically, PENCIL outperforms heuristic-informed GNNs and is far more parameter-efficient than ID-embedding-based alternatives, while remaining competitive across diverse benchmarks, even without node features. Our results challenge the prevailing reliance on complex engineering techniques, demonstrating that simple design choices are potentially sufficient to achieve the same capabilities.
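The core idea, attention over a sampled local subgraph with no node features, ID embeddings, or explicit structural encodings, can be sketched minimally. Everything below is an illustrative assumption, not PENCIL's actual design: the hop-based sampler, the choice of subgraph adjacency rows as featureless token inputs, the single untrained attention layer, and the dot-product link score are all stand-ins for the paper's components.

```python
import numpy as np

def sample_subgraph(adj, u, v, num_hops=1, max_nodes=8, seed=0):
    """Collect nodes within num_hops of a candidate edge (u, v),
    randomly truncating to max_nodes (endpoints always kept)."""
    rng = np.random.default_rng(seed)
    frontier, nodes = {u, v}, {u, v}
    for _ in range(num_hops):
        nxt = set()
        for n in frontier:
            nxt |= set(np.flatnonzero(adj[n]))
        frontier = nxt - nodes
        nodes |= frontier
    nodes = list(nodes)
    if len(nodes) > max_nodes:
        others = [n for n in nodes if n not in (u, v)]
        nodes = [u, v] + list(rng.choice(others, size=max_nodes - 2,
                                         replace=False))
    return sorted(nodes)

def attention(X, seed=42):
    """Single-head self-attention with random (untrained) projections."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V

# Toy graph: a 6-node ring plus one chord.
adj = np.zeros((6, 6), dtype=int)
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (1, 4)]:
    adj[a, b] = adj[b, a] = 1

nodes = sample_subgraph(adj, 0, 2)
# Tokenize each node by its adjacency pattern *within* the subgraph:
# a featureless, ID-free input (an illustrative choice, not PENCIL's).
X = adj[np.ix_(nodes, nodes)].astype(float)
H = attention(X)
# Score the candidate link (0, 2) from its endpoint representations.
score = float(H[nodes.index(0)] @ H[nodes.index(2)])
print(len(nodes), H.shape)
```

The point of the sketch is that the Transformer sees only the sampled subgraph's internal structure, so any heuristic computable from that neighborhood (common neighbors, short paths, and the like) is in principle recoverable by attention, which is the intuition behind the paper's claim of implicitly generalizing heuristics.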