🤖 AI Summary
This work formally defines and studies the “spatiotemporal over-squashing” problem in spatiotemporal graph neural networks (STGNNs): because the temporal dimension increases the amount of information that must be propagated over the graph topology, information exchange between distant spatiotemporal nodes is impeded, and, counterintuitively, convolutional STGNNs preferentially propagate information from temporally distant rather than nearby time steps. Theoretical analysis shows that this phenomenon affects both joint (time-and-space) and sequential (time-then-space) processing paradigms equally. Experiments on synthetic data and real-world benchmarks (including PEMS and METR-LA) confirm this propagation bias in existing models. The contributions are: (i) the first formal characterization of spatiotemporal over-squashing and of how it differs from the static case; (ii) a proof that time-and-space and time-then-space architectures are equally affected, which justifies computationally efficient time-then-space implementations; and (iii) principled guidance for designing more effective STGNNs.
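The claim that convolutional STGNNs favor temporally distant inputs can be illustrated with a toy linear model. The sketch below is our own assumption, not the paper's construction: each layer is a causal kernel of size `P` that averages the current step and up to `P-1` past steps, normalized by the number of steps actually available, so the first time step only averages itself. Stacking `L` such layers gives the matrix power `R^L`, and the last row of `R^L` is the sensitivity of the last output to each input step. The helper names `causal_avg_matrix` and `last_step_sensitivity` and all sizes are hypothetical.

```python
import numpy as np

def causal_avg_matrix(T: int, P: int) -> np.ndarray:
    """Toy causal temporal kernel (assumption): output step t averages input
    steps max(0, t - P + 1) .. t, normalized by how many of them exist."""
    R = np.zeros((T, T))
    for t in range(T):
        lo = max(0, t - P + 1)
        R[t, lo:t + 1] = 1.0 / (t + 1 - lo)
    return R

def last_step_sensitivity(T: int, P: int, L: int) -> np.ndarray:
    """Entry s is d h_{T-1}^{(L)} / d x_s for L stacked linear causal layers,
    i.e. the last row of R^L."""
    R_L = np.linalg.matrix_power(causal_avg_matrix(T, P), L)
    return R_L[-1]

# Hypothetical sizes: 12 time steps, kernel size 2, 24 layers (receptive field > T).
sens = last_step_sensitivity(T=12, P=2, L=24)
print(np.round(sens, 3))
print("most influential input step:", int(np.argmax(sens)))
```

In this toy setting the oldest step (index 0) carries the largest weight: step 0 depends only on itself, so every backward path that reaches it stays there, and its share grows with depth. With `L=1` the last output depends only on the two most recent steps, with weight 0.5 each. Other normalizations, padding schemes, or dilations may behave differently.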
📝 Abstract
Graph Neural Networks (GNNs) have achieved remarkable success across various domains. However, recent theoretical advances have identified fundamental limitations in their information propagation capabilities, such as over-squashing, where distant nodes fail to effectively exchange information. While extensively studied in static contexts, this issue remains unexplored in Spatiotemporal GNNs (STGNNs), which process sequences associated with graph nodes. Yet the temporal dimension amplifies this challenge by increasing the amount of information that must be propagated. In this work, we formalize the spatiotemporal over-squashing problem and show how it differs from the static case. Our analysis reveals that, counterintuitively, convolutional STGNNs favor information propagation from points temporally distant rather than close in time. Moreover, we prove that architectures following either the time-and-space or the time-then-space processing paradigm are equally affected by this phenomenon, providing theoretical justification for computationally efficient implementations. We validate our findings on synthetic and real-world datasets, providing deeper insights into the operational dynamics of STGNNs and principled guidance for more effective designs.
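For linear models with one scalar feature per node and time step, the equivalence between the two paradigms can be checked directly. A time-and-space layer maps `H` to `A_hat @ H @ R.T`; because left (spatial) and right (temporal) multiplications commute, `L` interleaved layers equal `L` temporal layers followed by `L` spatial layers, both giving `A_hat^L @ X @ (R^L).T`. The sensitivity of output `H[v, t]` to input `X[u, s]` therefore factorizes as `(A_hat^L)[v, u] * (R^L)[t, s]`, so spatial and temporal bottlenecks multiply. The path graph, the size-2 causal averaging kernel, and all sizes below are hypothetical choices; nonlinear STGNNs do not satisfy this identity exactly, and this sketch is not the paper's proof.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, L = 5, 8, 3  # hypothetical: nodes, time steps, layers

# Hypothetical spatial operator: symmetric-normalized adjacency of a path graph with self-loops.
A = np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)
d = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(d, d))

# Hypothetical temporal operator: causal kernel of size 2 (steps t and t-1), row-normalized.
R = np.tril(np.ones((T, T))) - np.tril(np.ones((T, T)), k=-2)
R = R / R.sum(axis=1, keepdims=True)

X = rng.standard_normal((N, T))  # one scalar feature per (node, time step)

def time_and_space(X: np.ndarray) -> np.ndarray:
    """Interleave temporal (right) and spatial (left) mixing in every layer."""
    H = X
    for _ in range(L):
        H = A_hat @ H @ R.T
    return H

def time_then_space(X: np.ndarray) -> np.ndarray:
    """Run the whole temporal stack first, then the whole spatial stack."""
    H = X
    for _ in range(L):
        H = H @ R.T
    for _ in range(L):
        H = A_hat @ H
    return H

AL = np.linalg.matrix_power(A_hat, L)
RL = np.linalg.matrix_power(R, L)
assert np.allclose(time_and_space(X), time_then_space(X))
assert np.allclose(time_then_space(X), AL @ X @ RL.T)
```

In the time-then-space form the temporal stack runs once per node before any message passing, which is one reason such implementations are cheaper.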