🤖 AI Summary
To address the limitations of conventional Transformers in modeling dynamic, complex inter-variable dependencies and long-range temporal correlations in multivariate time series forecasting, this paper proposes a spectral-domain joint modeling framework, the Spectral Operator Neural Network (Sonnet). Methodologically, it introduces (1) learnable wavelet transforms combined with the Koopman operator to jointly encode temporal and inter-variable dynamics in the spectral domain; and (2) a Multivariable Coherence Attention (MVCA) mechanism that quantifies dynamic coupling among variables via spectral coherence, the first such use in forecasting. Evaluated on 47 benchmark tasks, Sonnet achieves state-of-the-art (SOTA) performance on 34 of them, reducing average MAE by 1.1% against the strongest per-task baseline. Moreover, substituting MVCA for the naive attention in other architectures reduces MAE by 10.7% on average on the most challenging tasks.
📝 Abstract
Multivariable time series forecasting methods can integrate information from exogenous variables, leading to significant gains in prediction accuracy. The Transformer architecture has been widely applied in time series forecasting models due to its ability to capture long-range sequential dependencies. However, a naïve application of Transformers often struggles to effectively model complex relationships among variables over time. To mitigate this, we propose a novel architecture, namely the Spectral Operator Neural Network (Sonnet). Sonnet applies learnable wavelet transformations to the input and incorporates spectral analysis using the Koopman operator. Its predictive skill relies on Multivariable Coherence Attention (MVCA), an operation that leverages spectral coherence to model variable dependencies. Our empirical analysis shows that Sonnet yields the best performance on $34$ out of $47$ forecasting tasks, with an average mean absolute error (MAE) reduction of $1.1\%$ against the most competitive baseline (which differs per task). We further show that MVCA -- when substituted for the naïve attention used in various deep learning models -- can remedy their deficiencies, reducing MAE by $10.7\%$ on average on the most challenging forecasting tasks.
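The abstract describes MVCA as using spectral coherence to model dependencies among variables. As a minimal illustrative sketch (not the paper's implementation), the following uses SciPy's `scipy.signal.coherence` to show how coherence distinguishes a pair of coupled series from an independent one; the synthetic signals and the frequency-averaged coupling score are assumptions for illustration only:

```python
# Illustrative sketch: spectral coherence as a pairwise coupling score,
# the quantity MVCA is described as building attention on. Not the
# paper's implementation; synthetic data and scoring are assumptions.
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(0)
t = np.arange(4096)
shared = np.sin(2 * np.pi * t / 64)           # oscillation shared by x and y
x = shared + 0.5 * rng.standard_normal(t.size)
y = shared + 0.5 * rng.standard_normal(t.size)
z = rng.standard_normal(t.size)               # independent noise series

# Magnitude-squared coherence per frequency bin (values in [0, 1]).
f, cxy = coherence(x, y, fs=1.0, nperseg=256)
_, cxz = coherence(x, z, fs=1.0, nperseg=256)

# Averaging over frequencies gives one scalar dependency score per pair;
# the coupled pair (x, y) scores higher than the independent pair (x, z).
print(f"coherence(x, y) = {cxy.mean():.3f}")
print(f"coherence(x, z) = {cxz.mean():.3f}")
```

In an attention mechanism such scores could weight cross-variable mixing, but how Sonnet parameterizes and learns this is specified in the paper itself.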