🤖 AI Summary
This work addresses the lack of a systematic mathematical understanding of the generalization capabilities of graph neural networks (GNNs). It presents a unified synthesis and comparison of three major theoretical frameworks. The first is uniform convergence analysis based on hypothesis class complexity. The second uses simplified models in asymptotic regimes such as infinite-width or large-graph limits, including Gaussian processes, neural tangent kernels, and graphon operators. The third consists of high-dimensional statistical approaches grounded in random graph models such as the contextual stochastic block model. The paper critically examines the core results, limitations, and open challenges within each framework. In doing so, it lays a common conceptual foundation for GNN generalization theory, clarifies its underlying mathematical principles, and outlines directions for future research.
📝 Abstract
Graph Neural Networks (GNNs) are currently the most popular approach for learning and prediction on graph-structured data and are deployed in fields ranging from social network analysis to drug discovery. However, the mathematical understanding of their performance remains limited. We discuss the various perspectives used to study statistical generalisation in GNNs and identify three broad frameworks. The first, rooted in learning theory, relies on uniform convergence bounds and the complexity of the hypothesis class of specific GNN architectures. It also builds on the expressivity of GNNs, typically studied through the lens of graph isomorphism tests. The second simplifies the neural architecture by analysing GNNs in the asymptotic limit of infinitely many parameters or infinite graph size. This approach approximates GNNs by Gaussian processes, neural tangent kernels or graphon neural network operators, which make it possible to study the generalisation or stability of trained GNNs. The third framework studies GNNs under random graph models, often the contextual stochastic block model, and derives non-asymptotic error rates using tools from high-dimensional statistics. We highlight some key theoretical results and discuss a few limitations and open research questions for each perspective.
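The contextual stochastic block model (CSBM) named above couples a random graph with node features. The code below is a minimal sketch that makes the setting concrete. It assumes one common two-class parametrisation: labels y_i in {-1, +1}, features y_i·μ plus isotropic Gaussian noise of scale σ, and an edge probability of p within a class and q across classes. The function names `sample_csbm`, `mean_aggregate` and `accuracy_along_mu` are hypothetical. The sign-of-projection classifier is an illustrative choice, not the estimator analysed in any particular paper.

```python
import random


def sample_csbm(n, p, q, mu, sigma=1.0, seed=None):
    """Sample a two-class CSBM graph (assumed parametrisation, see lead-in).

    n: number of nodes; p / q: intra- / inter-class edge probabilities;
    mu: list of floats, class mean direction (class +1 -> +mu, class -1 -> -mu);
    sigma: standard deviation of the isotropic Gaussian feature noise.
    """
    rng = random.Random(seed)
    # Labels in {-1, +1}, drawn uniformly at random.
    labels = [rng.choice((-1, 1)) for _ in range(n)]
    # Features: x_i = y_i * mu + sigma * z_i with z_i ~ N(0, I).
    features = [[y * m + sigma * rng.gauss(0.0, 1.0) for m in mu] for y in labels]
    # Undirected edges (i < j), no self-loops: probability p if same class, else q.
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                edges.append((i, j))
    return labels, features, edges


def mean_aggregate(n, features, edges):
    """One parameter-free graph convolution: average over neighbours plus self."""
    neigh = [[i] for i in range(n)]  # self-loop so isolated nodes keep their feature
    for i, j in edges:
        neigh[i].append(j)
        neigh[j].append(i)
    d = len(features[0])
    return [[sum(features[k][c] for k in nb) / len(nb) for c in range(d)] for nb in neigh]


def accuracy_along_mu(features, labels, mu):
    """Predict sign(<x_i, mu>) and return the fraction of correctly labelled nodes."""
    correct = 0
    for x, y in zip(features, labels):
        score = sum(a * b for a, b in zip(x, mu))
        pred = 1 if score >= 0 else -1
        correct += pred == y
    return correct / len(labels)


if __name__ == "__main__":
    mu = [0.5, 0.0]
    labels, X, E = sample_csbm(300, 0.05, 0.005, mu, seed=1)
    print("raw features:       ", accuracy_along_mu(X, labels, mu))
    print("after one averaging:", accuracy_along_mu(mean_aggregate(300, X, E), labels, mu))
```

Comparing the two printed accuracies shows, on one sampled instance, the kind of question the third framework studies rigorously: when p exceeds q, neighbourhood averaging tends to shrink the feature noise while preserving the class signal. Non-asymptotic error rates quantify this effect in terms of n, p, q, ‖μ‖ and σ.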