Backward Oversmoothing: why is it hard to train deep Graph Neural Networks?

📅 2025-05-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Training deep Graph Neural Networks (GNNs) suffers from severe optimization difficulties, yet the root cause remains poorly understood. Method: This work identifies and formalizes "backward oversmoothing", a layer-wise linear smoothing of the backpropagated errors used to compute gradients, as the fundamental culprit. Unlike forward smoothing, this phenomenon is independent of nonlinear activations and arises inherently in deep GNNs from graph convolutional aggregation. We theoretically prove its uniqueness to deep GNNs (absent in MLPs), analyze gradient flow dynamics, establish existence conditions for spurious stationary points (high-loss regions with near-zero gradients), and validate the claims via controlled GNN/MLP comparisons. Contribution/Results: Backward oversmoothing induces vanishing gradients and gives rise to many pathological stationary points, rendering optimization ill-conditioned. Crucially, we demonstrate that the coupling between forward and backward smoothing, not forward smoothing alone, is the core mechanism behind training failure in deep GNNs, establishing backward oversmoothing as the essential source of their optimization pathology.
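The layer-wise linear smoothing of errors described above can be illustrated numerically. The sketch below is my own toy illustration, not the paper's code: a randomly generated graph stands in for a real dataset, and the backpropagated error is repeatedly multiplied by a GCN-style normalized aggregation matrix, which is exactly the linear operator the backward pass applies at every layer regardless of the activation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20

# Random undirected graph with self-loops (a stand-in for real data).
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 1.0)

# Symmetrically normalized aggregation, as in GCN-style layers.
d = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(d, d))

# The dominant eigenvector of A_hat (eigenvalue 1) is proportional to
# sqrt(d); smoothing pushes any signal toward this single direction.
v = np.sqrt(d)
v /= np.linalg.norm(v)

err = rng.standard_normal(n)  # output-layer error signal
orth0 = np.linalg.norm(err - (v @ err) * v)  # node-varying part of the error

for _ in range(10):
    # Each backward step multiplies the error by A_hat (symmetric here);
    # a ReLU mask would only zero some entries, not undo this smoothing.
    err = A_hat @ err

orth = np.linalg.norm(err - (v @ err) * v)
print(orth / orth0)  # ratio well below 1: per-node error detail has decayed
```

After ten aggregation steps the component of the error that varies across nodes has collapsed, so the gradients computed from it carry almost no node-specific information, which is the "backward oversmoothing" effect the summary describes.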

📝 Abstract
Oversmoothing has long been identified as a major limitation of Graph Neural Networks (GNNs): input node features are smoothed at each layer and converge to a non-informative representation, provided the weights of the GNN are sufficiently bounded. This assumption is crucial: if, on the contrary, the weights are sufficiently large, then oversmoothing may not happen. In theory, GNNs could thus learn not to oversmooth. However, this rarely happens in practice, which prompts us to examine oversmoothing from an optimization point of view. In this paper, we analyze backward oversmoothing, that is, the notion that the backpropagated errors used to compute gradients are also subject to oversmoothing from output to input. With non-linear activation functions, we outline the key role of the interaction between forward and backward smoothing. Moreover, we show that, due to backward oversmoothing, GNNs provably exhibit many spurious stationary points: as soon as the last layer is trained, the whole GNN is at a stationary point. As a result, we can exhibit regions where gradients are near-zero while the loss remains high. The proof relies on the fact that, unlike forward oversmoothing, backward errors undergo linear oversmoothing even in the presence of a non-linear activation function, so that the average of the output error plays a key role. Additionally, we show that this phenomenon is specific to deep GNNs, and we exhibit counter-examples with Multi-Layer Perceptrons. This paper is a step toward a more complete understanding of the optimization landscape specific to GNNs.
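The abstract's claim that backward errors are smoothed linearly can be sketched in standard GCN notation (the notation below is mine and may differ from the paper's). For a layer $H_{\ell+1} = \sigma(\hat{A} H_\ell W_\ell)$ with normalized adjacency $\hat{A}$, writing $Z_\ell = \hat{A} H_\ell W_\ell$ and $\delta_\ell = \partial \mathcal{L} / \partial H_\ell$, the chain rule gives

```latex
\delta_\ell = \hat{A}^\top \bigl( \delta_{\ell+1} \odot \sigma'(Z_\ell) \bigr) W_\ell^\top
```

The nonlinearity enters only through the elementwise mask $\sigma'(Z_\ell)$ (for ReLU, entries in $\{0,1\}$), so the recursion is linear in the error $\delta_{\ell+1}$, and each backward step applies the smoothing operator $\hat{A}^\top$. The error is therefore driven toward the dominant eigendirection of $\hat{A}$, which is why the average of the output error plays the key role the abstract mentions.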
Problem

Research questions and friction points this paper is trying to address.

Analyzing backward oversmoothing in deep Graph Neural Networks
Exploring spurious stationary points due to backward oversmoothing
Understanding optimization challenges specific to deep GNNs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes backward oversmoothing in GNNs
Identifies spurious stationary points in training
Explores forward-backward smoothing interaction effects