🤖 AI Summary
This work addresses an optimization conflict in long-horizon multivariate time series forecasting: when a single model is trained to capture both within-series autoregressive dynamics and cross-dimension interactions, the high-variance updates needed for the latter can corrupt the gradients that support the former. To resolve this, the authors propose AltTS, a dual-path framework that explicitly decouples the two modeling objectives. One path uses a linear autoregressive predictor to capture temporal dependencies; the other uses a Transformer with Cross-Relation Self-Attention (CRSA) to model relationships across dimensions. The two paths are trained with alternating optimization, which isolates gradient noise and reduces interference between them. The authors report that the method consistently outperforms prior approaches on multiple benchmark datasets, with the largest gains on long-horizon forecasting.
📝 Abstract
Multivariate time series forecasting involves two qualitatively distinct factors: (i) stable within-series autoregressive (AR) dynamics, and (ii) intermittent cross-dimension interactions that can become spurious over long horizons. We argue that fitting a single model to capture both effects creates an optimization conflict: the high-variance updates needed for cross-dimension modeling can corrupt the gradients that support autoregression, resulting in brittle training and degraded long-horizon accuracy. To address this, we propose ALTTS, a dual-path framework that explicitly decouples autoregression and cross-relation (CR) modeling. In ALTTS, the AR path is instantiated with a linear predictor, while the CR path uses a Transformer equipped with Cross-Relation Self-Attention (CRSA); the two branches are coordinated via alternating optimization to isolate gradient noise and reduce cross-block interference. Extensive experiments on multiple benchmarks show that ALTTS consistently outperforms prior methods, with the most pronounced improvements on long-horizon forecasting. Overall, our results suggest that carefully designed optimization strategies, rather than ever more complex architectures, can be a key driver of progress in multivariate time series forecasting.
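The alternating scheme described above can be illustrated with a toy sketch. It is not the authors' implementation. It assumes synthetic data, a linear stand-in for the CR path (the paper uses a Transformer with CRSA), and exact least-squares block solves in place of gradient steps. All names (`ar_predict`, `cr_predict`, `w_ar`, `W_cr`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): D series with AR(1) dynamics, plus a weak
# influence of series 0 on series 1 as the only cross-dimension effect.
T, D, L = 300, 3, 4  # time steps, dimensions, lookback length
series = np.zeros((T, D))
for t in range(1, T):
    series[t] = 0.8 * series[t - 1] + 0.1 * rng.standard_normal(D)
    series[t, 1] += 0.3 * series[t - 1, 0]

# Sliding windows: X has shape (N, L, D); Y is the next step, shape (N, D).
X = np.stack([series[i:i + L] for i in range(T - L)])
Y = series[L:]


def ar_predict(X, w_ar):
    # AR path: each series is forecast from its own lags, with shared weights.
    return np.einsum("nld,l->nd", X, w_ar)


def cr_predict(X, W_cr):
    # CR path (linear stand-in): correction from the last values of the others.
    return X[:, -1, :] @ W_cr


def mse(w_ar, W_cr):
    return float(np.mean((Y - ar_predict(X, w_ar) - cr_predict(X, W_cr)) ** 2))


w_ar = np.zeros(L)
W_cr = np.zeros((D, D))  # the diagonal stays zero: no self-interaction
losses = [mse(w_ar, W_cr)]

for _ in range(5):
    # Step A: freeze the CR path and refit the AR path on what CR leaves over.
    resid = Y - cr_predict(X, W_cr)
    lags = X.transpose(0, 2, 1).reshape(-1, L)  # rows ordered (window, series)
    w_ar = np.linalg.lstsq(lags, resid.reshape(-1), rcond=None)[0]
    losses.append(mse(w_ar, W_cr))

    # Step B: freeze the AR path and refit the CR path on what AR leaves over.
    resid = Y - ar_predict(X, w_ar)
    last = X[:, -1, :]
    for d in range(D):
        others = [j for j in range(D) if j != d]
        W_cr[others, d] = np.linalg.lstsq(
            last[:, others], resid[:, d], rcond=None
        )[0]
    losses.append(mse(w_ar, W_cr))

print("initial MSE:", losses[0])
print("final MSE:", losses[-1])
```

Because each step exactly minimizes the squared error over one block while the other is held fixed, the recorded loss cannot increase from one step to the next. A gradient-based version would replace each `lstsq` call with one or more optimizer steps on that block's parameters only, which is the sense in which updates to one path are kept out of the other.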