AI Summary
This work addresses the performance degradation and training instability commonly observed in mini-batch training of graph neural networks (GNNs), which arise when subgraph sampling disrupts topological structure and introduces boundary effects. Using backward error analysis, the authors show that mini-batch stochastic gradient descent (SGD) on graphs implicitly optimizes the sampled loss plus a regularization term proportional to the mini-batch gradient variance, a quantity shaped by the sampler. Although random node sampling (RNS) discards local structural information, it yields an expected loss closer to the full-graph loss and lower per-batch gradient variance. The study reframes the choice of graph sampler as a form of implicit regularization and identifies RNS as an efficient, scalable, and theoretically grounded GNN training strategy. On 8 of 10 benchmark datasets, RNS matches or exceeds full-graph training performance at a fraction of the wall-clock time and memory.
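The implicit objective described above can be written schematically as follows. This is only a sketch of its general shape, not the paper's exact statement: the symbols $S$ (the sampler), $B$ (a mini-batch subgraph drawn from $S$), $L_B$ (the loss on that subgraph), $\theta$ (the model parameters), and the coefficient $\lambda > 0$ are all notation assumed here for illustration, and the source does not give the value or form of $\lambda$.

```latex
\tilde{L}_{S}(\theta) \;\approx\;
\underbrace{\mathbb{E}_{B \sim S}\big[L_B(\theta)\big]}_{\text{sampled loss}}
\;+\;
\lambda \,
\underbrace{\mathbb{E}_{B \sim S}\Big[\big\lVert \nabla L_B(\theta)
  - \mathbb{E}_{B' \sim S}\big[\nabla L_{B'}(\theta)\big] \big\rVert^{2}\Big]}_{\text{mini-batch gradient variance}}
```

Both terms depend on the sampler $S$: the first through how close the expected sampled loss is to the full-graph loss, the second through how much per-batch gradients vary. The summary's claim is that RNS does well on both.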
Abstract
Mini-batch training of Graph Neural Networks (GNNs) is fundamentally different from training on i.i.d. data: sampling a subgraph alters the topology and introduces boundary effects, leading prior work to develop structure-aware samplers that preserve local connectivity and reduce embedding variance. Surprisingly, we demonstrate that the simplest possible scheme, Random Node Sampling (RNS), training on the induced subgraph of uniformly sampled nodes, matches or outperforms full-graph training on 8 of 10 datasets at a fraction of the wall-clock time and memory. To explain this, we apply backward error analysis to graph mini-batch Stochastic Gradient Descent (SGD) and show that it implicitly minimizes the sampled loss plus a regularizer proportional to the mini-batch gradient variance, a quantity directly shaped by the sampler. Although RNS discards local structure, it produces mini-batches whose expected loss is closer to the full-graph loss, and whose per-batch gradients have lower variance, yielding a better implicit objective. Our analysis reframes the choice of graph sampler as a form of implicit regularization, and identifies RNS as a strong, theoretically grounded method for scalable GNN training.
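To make the RNS scheme concrete, here is a minimal NumPy sketch of "training on the induced subgraph of uniformly sampled nodes". It is an illustration under stated assumptions, not the authors' implementation: the function name `random_node_sample`, the `edge_index` layout (a 2 x E array of directed edges), and the toy path graph are all hypothetical choices made for this example.

```python
import numpy as np


def random_node_sample(edge_index, num_nodes, batch_size, rng):
    """Uniformly sample nodes and return the subgraph they induce.

    edge_index: int array of shape (2, E) holding directed edges (src, dst).
    Returns the sampled node ids (sorted) and the induced edges, relabeled
    to the range [0, batch_size). All names here are illustrative.
    """
    # Uniform sampling without replacement: every node is equally likely.
    nodes = np.sort(rng.choice(num_nodes, size=batch_size, replace=False))

    # relabel[v] is the position of node v in the batch, or -1 if not sampled.
    relabel = np.full(num_nodes, -1, dtype=np.int64)
    relabel[nodes] = np.arange(batch_size)

    # Keep an edge only if both endpoints were sampled (induced subgraph).
    src, dst = edge_index
    keep = (relabel[src] >= 0) & (relabel[dst] >= 0)
    sub_edge_index = np.stack([relabel[src[keep]], relabel[dst[keep]]])
    return nodes, sub_edge_index


# Toy graph: a 6-node path 0-1-2-3-4-5, stored with both edge directions.
src = np.array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5])
dst = np.array([1, 0, 2, 1, 3, 2, 4, 3, 5, 4])
edge_index = np.stack([src, dst])

rng = np.random.default_rng(0)
nodes, sub_edges = random_node_sample(edge_index, num_nodes=6, batch_size=4, rng=rng)
print("sampled nodes:", nodes)
print("induced edges (batch-local ids):", sub_edges.tolist())
```

Edges with an endpoint outside the batch are simply dropped, which is the loss of local structure the abstract refers to. A GNN training step would then run message passing over `sub_edges` only and compute the loss on the sampled nodes.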