🤖 AI Summary
Deep graph neural networks (GNNs) often suffer from degraded expressiveness and oversmoothing caused by repeated message passing, especially on heterophilic graphs. To address this, we propose an adaptive initial residual connection mechanism that dynamically modulates residual strength at the node level, enhancing deep information propagation while mitigating oversmoothing. Theoretically, we establish the first Dirichlet energy lower bound for residual connections with nonlinear activations, rigorously proving that they preserve embedding diversity and unifying the analysis of static and adaptive residual settings. Our method supports both learnable and heuristic residual-strength configurations; the heuristic variant avoids learning residual strengths and thereby improves time complexity. Extensive experiments demonstrate that our approach significantly outperforms standard and state-of-the-art GNNs across diverse graph benchmarks, particularly heterophilic ones. Notably, the heuristic variant achieves performance comparable to the learnable version while offering superior efficiency and practicality.
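The adaptive mechanism described above can be sketched as a single message-passing layer that blends aggregated neighbor information with the initial embeddings using a per-node strength. This is an illustrative sketch, not the paper's exact update: the function and variable names, the placement of the weight matrix, and the choice of ReLU are assumptions, since the summary does not spell out the layer equation.

```python
import numpy as np

def adaptive_initial_residual_layer(H, H0, A_hat, W, alpha):
    """One message-passing layer with a per-node (adaptive) initial residual.

    H     : (n, d) current node embeddings
    H0    : (n, d) initial (layer-0) embeddings
    A_hat : (n, n) normalized adjacency, e.g. D^{-1/2} (A + I) D^{-1/2}
    W     : (d, d) layer weight matrix (hypothetical placement)
    alpha : (n,)   per-node residual strengths in [0, 1]
    """
    a = alpha[:, None]                     # broadcast one strength per node
    Z = (1.0 - a) * (A_hat @ H) + a * H0   # blend neighbors with initial state
    return np.maximum(Z @ W, 0.0)          # ReLU nonlinearity (assumed)
```

A static residual connection corresponds to the special case where `alpha` is the same scalar for every node; the heuristic variant would fix `alpha` by a rule rather than learning it.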
📝 Abstract
Message passing is the core operation in graph neural networks, where each node updates its embedding by aggregating information from its neighbors. However, in deep architectures, this process often leads to diminished expressiveness. A popular solution is to use residual connections, where the input from the current (or initial) layer is added to the aggregated neighbor information to preserve embeddings across layers. Following a recent line of research, we investigate an adaptive residual scheme in which different nodes have varying residual strengths. We prove that this approach prevents oversmoothing; in particular, we show that the Dirichlet energy of the embeddings remains bounded away from zero. This is the first theoretical guarantee not only for the adaptive setting, but also for static residual connections (where residual strengths are shared across nodes) with activation functions. Furthermore, extensive experiments show that this adaptive approach outperforms standard and state-of-the-art message passing mechanisms, especially on heterophilic graphs. To improve the time complexity of our approach, we introduce a variant in which residual strengths are not learned but instead set heuristically, a choice that performs as well as the learnable version.
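The central theoretical claim, that plain message passing drives the Dirichlet energy toward zero (oversmoothing) while an initial residual keeps it bounded away from zero, can be illustrated numerically. The sketch below uses a linear propagation on a 6-cycle with self-loops and omits weight matrices and activations; the graph, the residual strength, and all names are assumptions chosen for illustration, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# 6-cycle: with self-loops every node has degree 3, so the
# symmetric-normalized propagation matrix is simply (A + I) / 3.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
P = (A + np.eye(n)) / 3.0
L_sym = np.eye(n) - P  # normalized graph Laplacian

def dirichlet_energy(H):
    """E(H) = tr(H^T L_sym H); zero iff embeddings have collapsed."""
    return float(np.trace(H.T @ L_sym @ H))

H0 = rng.standard_normal((n, 3))
H_plain, H_res, alpha = H0.copy(), H0.copy(), 0.5  # alpha: assumed strength
for _ in range(100):
    H_plain = P @ H_plain                            # plain message passing
    H_res = (1 - alpha) * (P @ H_res) + alpha * H0   # initial residual

print(dirichlet_energy(H_plain))  # collapses toward 0 (oversmoothing)
print(dirichlet_energy(H_res))    # stays bounded away from 0
```

The residual iteration converges to a fixed point that retains a constant fraction of the initial embeddings' energy, which is the behavior the lower-bound result formalizes (there, additionally in the presence of nonlinear activations).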