DSAT-HD: Dual-Stream Adaptive Transformer with Hybrid Decomposition for Multivariate Time Series Forecasting

📅 2025-09-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing methods for long-term multivariate time series forecasting model multi-scale features inadequately and rely heavily on strong seasonal priors. Method: This paper proposes a dual-stream adaptive Transformer architecture featuring: (1) an EMA-Fourier hybrid decomposition coupled with RevIN normalization and noisy Top-k gating for adaptive trend-seasonal separation; (2) a multi-scale sparse allocation mechanism integrated with a hybrid attention module combining CNN-based local modeling and global self-attention; and (3) a dual-stream CNN-MLP residual framework with an expert collaboration loss. Contribution/Results: Across nine benchmark datasets, the method outperforms existing approaches overall and reaches state-of-the-art performance on some of them. It also generalizes better in cross-dataset transfer tasks.
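
The summary mentions RevIN normalization without describing it. As a rough illustration, the sketch below shows the core idea of reversible instance normalization: each input window is standardized with its own statistics, and those statistics are reused to map the forecast back to the original scale. It is a simplified, standard-library-only sketch; the function names and the `eps` value are assumptions, and the learnable affine parameters of the original RevIN formulation are left out.

```python
import math


def revin_normalize(series, eps=1e-5):
    """Standardize one window with its own mean and standard deviation."""
    mean = sum(series) / len(series)
    var = sum((v - mean) ** 2 for v in series) / len(series)
    std = math.sqrt(var + eps)  # eps guards against constant windows
    normalized = [(v - mean) / std for v in series]
    return normalized, (mean, std)


def revin_denormalize(values, stats):
    """Map model outputs back to the original scale of the window."""
    mean, std = stats
    return [v * std + mean for v in values]


window = [10.0, 12.0, 11.0, 15.0, 14.0]
normalized, stats = revin_normalize(window)
restored = revin_denormalize(normalized, stats)
```

In a forecasting pipeline the model would run on `normalized`, and `revin_denormalize` would be applied to its predictions rather than to the inputs themselves.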

📝 Abstract

Time series forecasting is crucial for applications such as weather, traffic, electricity, and energy prediction. Most current forecasting methods are based on Transformers. However, existing approaches mostly model a limited set of temporal ranges or fixed scales, which makes it hard to capture diverse features across different ranges. Additionally, traditional seasonality-trend decomposition methods such as STL require pre-specified seasonal periods and typically handle only a single, fixed seasonality. We propose the Hybrid Decomposition Dual-Stream Adaptive Transformer (DSAT-HD), which integrates three key innovations to address these limitations: 1) A hybrid decomposition mechanism combining EMA and Fourier decomposition with RevIN normalization, dynamically balancing seasonal and trend components through noisy Top-k gating; 2) A multi-scale adaptive pathway that uses a sparse allocator to route features to four parallel Transformer layers and a sparse combiner to merge their outputs, enhanced by hybrid attention combining local CNNs and global interactions; 3) A dual-stream residual learning framework in which CNN and MLP branches separately process the seasonal and trend components, coordinated by a balanced loss function that minimizes expert collaboration variance. Extensive experiments on nine datasets demonstrate that DSAT-HD outperforms existing methods overall and achieves state-of-the-art performance on some datasets. Notably, it also exhibits stronger generalization across various transfer scenarios.
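
To make the EMA and Fourier decomposition idea concrete, here is a minimal sketch of one plausible way to combine the two: an exponential moving average gives the trend, and the strongest frequencies of the detrended signal give the seasonal part, so no seasonal period has to be specified in advance. The abstract does not say how DSAT-HD actually combines them, so the function names, the `alpha` and `k` parameters, and the order of the steps are all assumptions. A naive O(n²) DFT keeps the example dependency-free.

```python
import cmath
import math


def ema_trend(x, alpha=0.3):
    """Trend via EMA: trend[t] = alpha * x[t] + (1 - alpha) * trend[t - 1]."""
    trend = [x[0]]
    for value in x[1:]:
        trend.append(alpha * value + (1 - alpha) * trend[-1])
    return trend


def topk_fourier(x, k=1):
    """Rebuild x from its k largest-magnitude non-DC frequencies (naive DFT)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]
    # Rank the positive frequencies 1..n//2 by magnitude and keep the top k.
    ranked = sorted(range(1, n // 2 + 1), key=lambda f: abs(spectrum[f]), reverse=True)
    keep = set()
    for f in ranked[:k]:
        keep.add(f)
        keep.add((n - f) % n)  # conjugate partner keeps the output real
    return [
        sum(spectrum[f] * cmath.exp(2j * math.pi * f * t / n) for f in keep).real / n
        for t in range(n)
    ]


def hybrid_decompose(x, alpha=0.3, k=1):
    """Split x into trend + seasonal + residual (assumed combination order)."""
    trend = ema_trend(x, alpha)
    detrended = [xi - ti for xi, ti in zip(x, trend)]
    seasonal = topk_fourier(detrended, k)
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual


series = [0.1 * t + math.sin(2 * math.pi * t / 12) for t in range(48)]
trend, seasonal, residual = hybrid_decompose(series, alpha=0.3, k=1)
```

By construction the three components sum back to the input, so whatever the top-k filter does not capture ends up in `residual`.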

Problem

Research questions and friction points this paper is trying to address.

Capturing diverse features across different time series ranges

Handling complex seasonality-trend decomposition without pre-specified periods

Improving multivariate time series forecasting through a dual-stream architecture

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid decomposition balances seasonal and trend components

Multi-scale adaptive pathway routes features to Transformers

Dual-stream residual learning processes seasonal and trend separately
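
The routing and balancing ideas listed above can be sketched in a few lines. The gate below follows the general noisy Top-k gating recipe known from sparse mixture-of-experts models: add scaled Gaussian noise to the gate logits, keep the k largest, and apply a softmax over only those. The variance function is one simple reading of a loss that "minimizes expert collaboration variance". Both are assumptions for illustration; the page does not give the exact formulation used in DSAT-HD, and all names here are hypothetical.

```python
import math
import random


def noisy_topk_gate(clean_logits, noise_scales, k=2, rng=None):
    """Return sparse gate weights: softmax over the k largest noisy logits."""
    rng = rng or random.Random(0)
    noisy = [
        logit + rng.gauss(0.0, 1.0) * scale
        for logit, scale in zip(clean_logits, noise_scales)
    ]
    top = sorted(range(len(noisy)), key=lambda i: noisy[i], reverse=True)[:k]
    peak = max(noisy[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(noisy[i] - peak) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(noisy))]


def importance_variance(gates_batch):
    """Variance of per-expert total gate weight over a batch (0 = balanced)."""
    n_experts = len(gates_batch[0])
    importance = [sum(g[e] for g in gates_batch) for e in range(n_experts)]
    mean = sum(importance) / n_experts
    return sum((v - mean) ** 2 for v in importance) / n_experts


# Four experts, matching the four parallel Transformer layers in the abstract.
gates = noisy_topk_gate([2.0, 0.5, 1.0, -1.0], [0.1, 0.1, 0.1, 0.1], k=2)
penalty = importance_variance([gates])
```

Adding a term like `penalty` to the training loss would push the router to spread load across experts instead of collapsing onto one or two of them.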

🔎 Similar Papers

Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting