🤖 AI Summary
This work investigates the convergence of message-passing graph neural networks (MP-GNNs) to a continuous limit on large-scale random graphs. Prior studies establish this convergence only for mean-normalized aggregation schemes (e.g., propagation by the adjacency matrix or the graph Laplacian). We develop the first non-asymptotic convergence theory applicable to general aggregation functions, including attention mechanisms, coordinate-wise max-pooling, degree-normalized convolutions, and moment-based statistics, many of which are nonlinear and non-mean-type. Leveraging McDiarmid's inequality and a generalized operator-theoretic model of random graph operators, we derive high-probability convergence bounds under mild assumptions. The McDiarmid-based argument does not cover coordinate-wise maximum aggregation; we treat that case separately and obtain a different convergence rate. Our results substantially broaden the theoretical scope of MP-GNN continuous-limit analysis and explicitly characterize how distinct aggregation mechanisms affect convergence behavior.
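To make the class of aggregations concrete, here is a minimal NumPy sketch of one message-passing step with swappable aggregation functions (degree-normalized mean, coordinate-wise max, second moment, and softmax attention). This is an illustrative toy, not the paper's architecture; the function names and the attention score function are assumptions.

```python
import numpy as np

def mp_layer(A, X, aggregate):
    """One message-passing step: each node aggregates its neighbours' features.

    A: (n, n) adjacency matrix, X: (n, d) node features.
    `aggregate` maps a (k, d) array of neighbour features to a (d,) vector.
    """
    out = np.zeros_like(X)
    for i in range(A.shape[0]):
        nbrs = X[A[i] > 0]          # features of node i's neighbours
        if len(nbrs) > 0:
            out[i] = aggregate(nbrs)
    return out

def mean_agg(nbrs):
    # Degree-normalized mean: the case covered by prior work.
    return nbrs.mean(axis=0)

def max_agg(nbrs):
    # Coordinate-wise maximum: the case needing a separate argument.
    return nbrs.max(axis=0)

def moment_agg(nbrs):
    # Second moment, one example of a moment-based scheme.
    return (nbrs ** 2).mean(axis=0)

def attn_agg(nbrs):
    # Softmax attention with an assumed score function (feature norm).
    scores = np.exp(np.linalg.norm(nbrs, axis=1))
    return (scores / scores.sum()) @ nbrs

# Usage: H = mp_layer(A, X, max_agg)
```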
📝 Abstract
We study the convergence of message passing graph neural networks on random graph models to their continuous counterpart as the number of nodes tends to infinity. Until now, this convergence was only known for architectures with aggregation functions in the form of normalized means or, equivalently, applications of classical operators such as the adjacency matrix or the graph Laplacian. We extend such results to a large class of aggregation functions that encompasses all classically used message passing graph neural networks, such as attention-based message passing, max convolutional message passing, (degree-normalized) convolutional message passing, and moment-based aggregation message passing. Under mild assumptions, we give non-asymptotic high-probability bounds to quantify this convergence. Our main result is based on McDiarmid's inequality. Interestingly, this result does not apply to the case where the aggregation is a coordinate-wise maximum. We treat this case separately and obtain a different convergence rate.
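As an informal illustration of the kind of statement being proved, the sketch below samples graphs of growing size from an assumed smooth dense graphon, applies degree-normalized mean aggregation, and compares the result against its continuous counterpart. The graphon `W`, signal `f`, and grid quadrature are illustrative assumptions, not taken from the paper; the printed sup-norm error should shrink as `n` grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def W(x, y):
    # Assumed smooth, dense graphon: edge probability between positions x, y.
    return 0.8 * np.exp(-(x - y) ** 2)

def f(u):
    # Scalar node signal evaluated at latent positions.
    return np.sin(2 * np.pi * u)

grid = np.linspace(0.0, 1.0, 2001)  # quadrature grid for the continuous limit

def continuum_mean(x):
    # Continuous counterpart of mean aggregation at latent position x:
    # E[W(x, U) f(U)] / E[W(x, U)], with U uniform on [0, 1].
    w = W(x, grid)
    return (w * f(grid)).mean() / w.mean()

for n in [200, 800, 3200]:
    u = rng.uniform(size=n)                       # latent node positions
    P = W(u[:, None], u[None, :])                 # edge probabilities
    A = (rng.uniform(size=(n, n)) < P).astype(float)
    A = np.triu(A, 1)
    A = A + A.T                                   # undirected, no self-loops
    deg = A.sum(axis=1)
    discrete = (A @ f(u)) / np.maximum(deg, 1.0)  # mean aggregation on sample
    limit = np.array([continuum_mean(x) for x in u])
    print(n, np.abs(discrete - limit).max())      # sup-norm error over nodes
```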