🤖 AI Summary
This paper challenges the prevailing consensus in graph neural network (GNN) research that over-smoothing is the primary cause of performance degradation in deep GNNs.
Method: The authors systematically decouple the three core operations of GNNs (aggregation, linear transformation, and nonlinear activation) and use theoretical analysis and gradient-propagation modeling to isolate each operation's individual effect. They then validate the findings empirically by reproducing and strengthening classical stabilization techniques (e.g., residual connections, normalization).
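To make the decoupling concrete, here is a minimal numpy sketch (not the paper's code; the toy graph, feature sizes, and variable names are illustrative) that applies the three core operations of a GCN-style layer one at a time:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: 4 nodes, symmetric adjacency with self-loops (GCN convention).
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
deg = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(deg, deg))  # symmetric normalization D^{-1/2} A D^{-1/2}

H = rng.standard_normal((4, 8))          # node features (4 nodes, 8 dims)
W = rng.standard_normal((8, 8)) * 0.1    # layer weights

# The three core operations, applied separately so each can be studied in isolation:
H_agg = A_hat @ H             # 1) aggregation (neighborhood averaging)
H_lin = H_agg @ W             # 2) linear transformation
H_out = np.maximum(H_lin, 0)  # 3) nonlinear activation (ReLU)
```

Separating the steps like this is what lets one ask which of the three (per the paper, the transformation and activation, not the aggregation) drives vanishing gradients in deep stacks.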
Contribution/Results: The analysis shows that performance degradation stems predominantly from vanishing gradients induced by the linear transformations and nonlinear activations, not from aggregation-induced over-smoothing; consequently, the observed degradation is not a phenomenon intrinsic to GNNs. Well-designed deep GNNs can train stably without degradation. The work provides a theoretical foundation and a practical architectural paradigm for building GNNs with hundreds of layers, redefining the design principles for deep GNNs.
📝 Abstract
Oversmoothing has been recognized as a main obstacle to building deep Graph Neural Networks (GNNs), limiting their performance. This position paper argues that the influence of oversmoothing has been overstated and advocates for further exploration of deep GNN architectures. Considering the three core operations of GNNs (aggregation, linear transformation, and non-linear activation), we show that prior studies have mistakenly confused oversmoothing with the vanishing gradient, which is caused by transformation and activation rather than aggregation. This finding challenges the prior belief that oversmoothing is unique to GNNs. Furthermore, we demonstrate that classical solutions such as skip connections and normalization enable the successful stacking of deep GNN layers without performance degradation. Our results clarify misconceptions about oversmoothing and shed new light on the potential of deep GNNs.
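The claim that skip connections and normalization permit deep stacking can be sanity-checked with a small numpy experiment (a sketch under illustrative assumptions: a random dense graph, random untrained weights, and node-feature distance from the mean as a crude smoothness proxy; this is not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(1)

def layer_norm(H, eps=1e-5):
    # Per-node normalization over the feature dimension.
    mu = H.mean(axis=1, keepdims=True)
    sd = H.std(axis=1, keepdims=True)
    return (H - mu) / (sd + eps)

def smoothness(H):
    # Distance of node features from their mean: near 0 => features collapsed.
    return np.linalg.norm(H - H.mean(axis=0, keepdims=True))

n, d, depth = 16, 32, 64
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.maximum(A, A.T)               # symmetrize
np.fill_diagonal(A, 1.0)             # self-loops
deg = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(deg, deg))

H_plain = H_res = rng.standard_normal((n, d))
Ws = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(depth)]
for W in Ws:
    # Vanilla deep GCN layer: features of different nodes converge.
    H_plain = np.maximum(A_hat @ H_plain @ W, 0)
    # Same layer wrapped with a skip connection and normalization.
    H_res = H_res + layer_norm(np.maximum(A_hat @ H_res @ W, 0))

print(smoothness(H_plain), smoothness(H_res))
```

With the stabilized variant, node features remain distinguishable after many layers, whereas the plain stack's features collapse toward their mean, consistent with the abstract's argument.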