On Vanishing Gradients, Over-Smoothing, and Over-Squashing in GNNs: Bridging Recurrent and Graph Learning

📅 2025-02-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses three fundamental limitations of Graph Neural Networks (GNNs): vanishing gradients, over-smoothing, and over-squashing. It identifies their shared dynamical origin as the degeneration of gradient flow that arises because GNNs implicitly behave as unstable recurrent systems. The authors propose a state-space reformulation of GNNs as controllable linear dynamical systems that adds no trainable parameters, establishing a theoretical link among vanishing gradients, over-smoothing, and over-squashing. By combining graph rewiring with this state-space view, the approach mitigates representation degradation at zero parameter overhead and enables stable training of deep GNNs. The paper proves that standard GNN architectures are inherently prone to vanishing gradients and empirically validates substantial improvements in representation quality and downstream task performance across multiple benchmark datasets.

📝 Abstract
Graph Neural Networks (GNNs) are models that leverage the graph structure to transmit information between nodes, typically through the message-passing operation. While widely successful, this approach is well known to suffer from the over-smoothing and over-squashing phenomena, which result in representational collapse as the number of layers increases and insensitivity to the information contained at distant and poorly connected nodes, respectively. In this paper, we present a unified view of these problems through the lens of vanishing gradients, using ideas from linear control theory for our analysis. We propose an interpretation of GNNs as recurrent models and empirically demonstrate that a simple state-space formulation of a GNN effectively alleviates over-smoothing and over-squashing at no extra trainable parameter cost. Further, we show theoretically and empirically that (i) GNNs are by design prone to extreme gradient vanishing even after a few layers; (ii) Over-smoothing is directly related to the mechanism causing vanishing gradients; (iii) Over-squashing is most easily alleviated by a combination of graph rewiring and vanishing gradient mitigation. We believe our work will help bridge the gap between the recurrent and graph neural network literature and will unlock the design of new deep and performant GNNs.
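The recurrent view in the abstract can be made concrete with a small numerical sketch (a generic illustration under stated assumptions, not the authors' code): for a linear GNN with tied weights, L layers of message passing multiply the input by the normalized adjacency Â raised to the power L, so the input-output Jacobian's non-dominant singular values decay geometrically with depth. This is the vanishing-gradient mechanism tied to over-smoothing, while a state-space-style update that re-injects the input preserves a direct gradient path.

```python
# Hypothetical sketch (not the paper's implementation): treat a linear GNN with
# tied weights as the recurrence h_{t+1} = A_hat @ h_t. After L layers the
# input-output Jacobian is A_hat^L; since all non-dominant eigenvalues of A_hat
# lie strictly inside (-1, 1), those gradient directions vanish geometrically.
import numpy as np

def normalized_adjacency(n):
    """Symmetrically normalized adjacency with self-loops for an n-node path graph."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    A_hat = A + np.eye(n)                  # add self-loops
    d = A_hat.sum(axis=1)
    return A_hat / np.sqrt(np.outer(d, d))  # D^{-1/2} (A + I) D^{-1/2}

A_hat = normalized_adjacency(8)

def second_singular_value(L):
    """Second-largest singular value of the depth-L Jacobian A_hat^L."""
    J = np.linalg.matrix_power(A_hat, L)
    return np.linalg.svd(J, compute_uv=False)[1]

# Non-dominant gradient directions shrink as depth grows. By contrast, a
# state-space update h_{t+1} = A_hat @ h_t + x (zero extra parameters) keeps a
# direct input path: its input Jacobian I + A_hat + ... + A_hat^L never collapses.
for L in (1, 10, 50):
    print(L, second_singular_value(L))
```

The top singular value stays at 1 (the constant-like dominant direction that all node features collapse toward, i.e. over-smoothing), while every other direction, and hence the gradient signal distinguishing nodes, decays with depth.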
Problem

Research questions and friction points this paper is trying to address.

GNNs suffer from vanishing gradients
Over-smoothing linked to vanishing gradients
Over-squashing mitigated by graph rewiring
Innovation

Methods, ideas, or system contributions that make the work stand out.

State-space formulation alleviates issues
Graph rewiring mitigates over-squashing
Recurrent model interpretation of GNNs
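Why rewiring helps with over-squashing can be illustrated with effective resistance, a standard measure of graph bottlenecks (a minimal sketch under that assumption, not the paper's rewiring algorithm): by Rayleigh monotonicity, adding an edge can only lower the resistance between any pair of nodes, so a single shortcut across a bottleneck sharply improves how well distant nodes can exchange information.

```python
# Hypothetical sketch (not the paper's rewiring method): compute the effective
# resistance between two nodes via the Laplacian pseudoinverse, before and after
# adding one rewiring edge. On a 10-node path, the endpoint-to-endpoint
# resistance is 9 (nine unit resistors in series); a direct shortcut edge puts
# 9 in parallel with 1, giving 0.9.
import numpy as np

def effective_resistance(edges, n, u, v):
    """Resistance distance between nodes u and v of an n-node graph."""
    L = np.zeros((n, n))
    for i, j in edges:                     # build the graph Laplacian
        L[i, i] += 1.0; L[j, j] += 1.0
        L[i, j] -= 1.0; L[j, i] -= 1.0
    Lp = np.linalg.pinv(L)                 # Moore-Penrose pseudoinverse
    e = np.zeros(n); e[u] = 1.0; e[v] = -1.0
    return float(e @ Lp @ e)

n = 10
path = [(i, i + 1) for i in range(n - 1)]
print(effective_resistance(path, n, 0, n - 1))                  # → 9.0
print(effective_resistance(path + [(0, n - 1)], n, 0, n - 1))   # → 0.9
```

The drop from 9.0 to 0.9 with one added edge is the intuition behind the bullet above: rewiring relieves the topological bottleneck that forces long-range information to be squashed through a few narrow paths.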