Implicit Regularization of Mini-Batch Training in Graph Neural Networks

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance degradation and training instability commonly observed in mini-batch training of graph neural networks (GNNs), which are usually attributed to subgraph sampling that alters the topology and introduces boundary effects. Using backward error analysis, the authors show that mini-batch stochastic gradient descent on graphs implicitly minimizes the sampled loss plus a regularization term proportional to the mini-batch gradient variance, a quantity shaped by the sampler. Random node sampling (RNS), which trains on the induced subgraph of uniformly sampled nodes, discards local structural information, yet it yields an expected loss closer to the full-graph loss and lower per-batch gradient variance. The study reframes graph sampler selection as an implicit regularization mechanism and identifies RNS as a strong, theoretically grounded strategy for scalable GNN training. On 8 of 10 benchmark datasets, RNS matches or outperforms full-graph training at a fraction of the wall-clock time and memory.
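
For orientation, the kind of statement that backward error analysis produces can be written out for the standard i.i.d. setting. The block below is a sketch of the known result for SGD with random shuffling (Smith et al., 2021, "On the Origin of Implicit Regularization in Stochastic Gradient Descent"), not the paper's graph-specific theorem. The symbols are notation assumed here: $\epsilon$ is the learning rate, $m$ the number of batches per epoch, $C$ the full loss, and $\hat{C}_k$ the loss on batch $k$.

```latex
\tilde{C}(\omega)
  = C(\omega) + \frac{\epsilon}{4m} \sum_{k=0}^{m-1} \bigl\| \nabla \hat{C}_k(\omega) \bigr\|^2
  = C(\omega) + \frac{\epsilon}{4} \bigl\| \nabla C(\omega) \bigr\|^2
    + \frac{\epsilon}{4m} \sum_{k=0}^{m-1} \bigl\| \nabla \hat{C}_k(\omega) - \nabla C(\omega) \bigr\|^2
```

The second equality uses $\frac{1}{m}\sum_k \nabla \hat{C}_k = \nabla C$, which holds when the batches partition the data into equal parts. On a graph this identity generally fails, because the loss on a sampled subgraph is computed on an altered topology. That is presumably why the summary distinguishes the sampled loss from the full-graph loss.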

📝 Abstract

Mini-batch training of Graph Neural Networks (GNNs) is fundamentally different from training on i.i.d. data: sampling a subgraph alters the topology and introduces boundary effects, leading prior work to develop structure-aware samplers that preserve local connectivity and reduce embedding variance. Surprisingly, we demonstrate that the simplest possible scheme, Random Node Sampling (RNS), training on the induced subgraph of uniformly sampled nodes, matches or outperforms full-graph training on 8 of 10 datasets at a fraction of the wall-clock time and memory. To explain this, we apply backward error analysis to graph mini-batch Stochastic Gradient Descent (SGD) and show that it implicitly minimizes the sampled loss plus a regularizer proportional to the mini-batch gradient variance, a quantity directly shaped by the sampler. Although RNS discards local structure, it produces mini-batches whose expected loss is closer to the full-graph loss, and whose per-batch gradients have lower variance, yielding a better implicit objective. Our analysis reframes the choice of graph sampler as a form of implicit regularization, and identifies RNS as a strong, theoretically grounded method for scalable GNN training.
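
The sampling scheme described in the abstract (draw nodes uniformly, keep only the edges among them) can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation. The function name `random_node_subgraph`, the `(2, E)` edge-list layout, and the toy path graph are assumptions made for the example.

```python
import numpy as np


def random_node_subgraph(edge_index, num_nodes, batch_size, rng):
    """Sample nodes uniformly without replacement and return the induced subgraph.

    edge_index: int array of shape (2, E), one column per directed edge.
    Returns (nodes, sub_edge_index); sub_edge_index uses local ids 0..batch_size-1.
    """
    nodes = np.sort(rng.choice(num_nodes, size=batch_size, replace=False))

    # Map global node id -> local id; -1 marks nodes outside the batch.
    local_id = np.full(num_nodes, -1, dtype=np.int64)
    local_id[nodes] = np.arange(batch_size)

    # Keep an edge only if both endpoints were sampled (induced subgraph).
    src, dst = edge_index
    keep = (local_id[src] >= 0) & (local_id[dst] >= 0)
    sub_edge_index = np.stack([local_id[src[keep]], local_id[dst[keep]]])
    return nodes, sub_edge_index


# Toy example (hypothetical data): an undirected path graph 0-1-2-3-4-5,
# stored as directed edges in both directions.
rng = np.random.default_rng(0)
edge_index = np.array([[0, 1, 1, 2, 2, 3, 3, 4, 4, 5],
                       [1, 0, 2, 1, 3, 2, 4, 3, 5, 4]])
nodes, sub_edges = random_node_subgraph(edge_index, num_nodes=6, batch_size=4, rng=rng)
print("sampled nodes:", nodes)
print("induced edges (local ids):", sub_edges.T.tolist())
```

Edges with an endpoint outside the batch are dropped, which is the boundary effect the abstract refers to. No neighborhood expansion or structure-aware selection is attempted in this sketch.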

Problem

Research questions and friction points this paper is trying to address.

Graph Neural Networks

Mini-batch Training

Implicit Regularization

Random Node Sampling

Gradient Variance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicit Regularization

Graph Neural Networks

Mini-batch Training