🤖 AI Summary
This paper reveals a nontrivial trade-off between the expressive power and generalization performance of Graph Neural Networks (GNNs): when graph labels are determined by structural features, greater expressivity harms generalization unless the training set is sufficiently large or the training and test graphs are structurally close.
Method: The authors introduce a family of premetrics that quantify structural similarity between graphs, and build a theoretical framework linking expressivity, structural distance, model complexity, and generalization error, which yields an interpretable, data-dependent generalization bound.
Contribution/Results: The theory precisely characterizes the generalization cost of increased expressivity under structural label assumptions. Empirical validation, via structure-aware label modeling and extensive experiments, confirms that overly expressive GNNs degrade performance under limited samples or structural distribution shift. Theoretical predictions align closely with empirical observations, offering both explanatory insight and practical guidance for GNN design.
📝 Abstract
Graph Neural Networks (GNNs) are powerful tools for learning on structured data, yet the relationship between their expressivity and predictive performance remains unclear. We introduce a family of premetrics that capture different degrees of structural similarity between graphs and relate these similarities to generalization and, consequently, to the performance of expressive GNNs. By considering a setting where graph labels are correlated with structural features, we derive generalization bounds that depend on the distance between training and test graphs, model complexity, and training set size. These bounds reveal that more expressive GNNs may generalize worse unless their increased complexity is balanced by a sufficiently large training set or reduced distance between training and test graphs. Our findings relate expressivity and generalization, offering theoretical insights supported by empirical results.
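The abstract does not spell out the paper's premetric family, but the idea of a premetric (nonnegative, zero on identical graphs, yet possibly zero on distinct graphs) can be illustrated with a sketch based on Weisfeiler-Leman (WL) color refinement, the same refinement that bounds the expressivity of message-passing GNNs. The function names here (`wl_histogram`, `wl_premetric`) and the choice of an L1 distance over normalized color histograms are illustrative assumptions, not the paper's actual construction:

```python
# Illustrative sketch (not the paper's definition): a WL-based
# premetric between graphs given as adjacency dicts {node: [neighbors]}.
from collections import Counter

def wl_histogram(adj, iterations=3):
    """Counter of WL-refined node colors accumulated over all iterations."""
    colors = {v: 0 for v in adj}          # uniform initial coloring
    hist = Counter(colors.values())
    for _ in range(iterations):
        # A node's new color hashes its own color together with the
        # sorted multiset of its neighbors' colors.
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
        hist.update(colors.values())
    return hist

def wl_premetric(adj_g, adj_h, iterations=3):
    """L1 distance between normalized WL color histograms.

    A premetric, not a metric: it is zero on identical graphs but can
    also be zero for non-isomorphic graphs that 1-WL cannot distinguish
    (e.g. a 6-cycle vs. two disjoint triangles).
    """
    hg = wl_histogram(adj_g, iterations)
    hh = wl_histogram(adj_h, iterations)
    ng, nh = sum(hg.values()), sum(hh.values())
    return sum(abs(hg[k] / ng - hh[k] / nh) for k in set(hg) | set(hh))
```

For example, a triangle and a 3-node path are separated by this premetric (their degree structure differs after one refinement step), while a 6-cycle and two disjoint triangles are not, since both are 2-regular and 1-WL colors every node identically. This blind spot is exactly the kind of structural coarseness a premetric tolerates and a metric would not.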