Three-Stage Learning Unlocks Strong Performance in Simple Models for Long-Term Time Series Forecasting

📅 2026-05-13
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work asks how to get the most out of simple time series models for long-term forecasting without resorting to complex architectures. The authors propose STAIR, a three-stage training paradigm that applies, in order, a shared temporal mapping, channel-wise fine-tuning, and residual cross-variable learning. Two supporting components, Shared-to-Individual fine-tuning and alpha-RevIN, relax the strict channel-independence assumption and the overly strong normalization prior of standard RevIN. With a shallow MLP or linear backbone, the method matches or outperforms recent strong baselines on nine long-term forecasting benchmarks while remaining structurally simple and computationally efficient.
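To make the "strong normalization" prior concrete, here is a minimal NumPy sketch of standard RevIN-style instance normalization (without the learnable affine parameters of the original RevIN). It is not alpha-RevIN, whose exact form this page does not give. The function names and the toy windows are assumptions made for illustration only.

```python
import numpy as np

def revin_normalize(x, eps=1e-5):
    """Normalize each window and channel to zero mean and unit variance.

    x has shape (N, L, C): N windows, L lookback steps, C channels.
    The statistics are returned so the forecast can be mapped back later.
    """
    mean = x.mean(axis=1, keepdims=True)
    std = np.sqrt(x.var(axis=1, keepdims=True) + eps)
    return (x - mean) / std, mean, std

def revin_denormalize(y, mean, std):
    """Restore the level and scale that revin_normalize removed."""
    return y * std + mean

# Two toy windows with the same shape but very different level and scale.
t = np.arange(48)
wave = np.sin(2 * np.pi * t / 24)
x = np.stack([wave, 5.0 * wave + 100.0])[:, :, None]  # shape (2, 48, 1)

z, mean, std = revin_normalize(x)
# After normalization the predictor sees (almost) identical inputs,
# so it cannot use the original level or amplitude of either window.
same_input = np.allclose(z[0], z[1], atol=1e-4)
round_trip = np.allclose(revin_denormalize(z, mean, std), x)
```

Because level and amplitude are fully removed before the predictor runs, the model can only learn shape. That is one way to read the "overly strong normalization prior" that the summary says alpha-RevIN is meant to mitigate.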
📝 Abstract
Recent studies on long-term time series forecasting have shown that simple linear models and MLP-based predictors can achieve strong performance without increasingly complex architectures. However, many competitive baselines still rely on structural priors such as frequency-domain modeling, explicit decomposition, multi-scale mixing, or sophisticated cross-variable interaction modules, while paying less attention to how simple temporal mappings should be trained and organized. In this paper, we propose STAIR, short for Stagewise Temporal Adaptation via Individualization and Residual Learning, a training paradigm for long-term time series forecasting that aims to unlock the capacity of simple temporal mapping models without introducing complex architectural modules. STAIR decomposes forecasting ability into three progressive stages: it first learns common temporal dynamics across variables through a shared temporal mapping, then adapts the shared model to each variable via channel-wise fine-tuning to capture variable-specific patterns, and finally complements the backbone with cross-variable information through residual learning. We further introduce Shared-to-Individual Fine-tuning and alpha-RevIN to mitigate the limitations of strict channel independence and the overly strong normalization prior induced by standard RevIN. This design gradually increases modeling flexibility while keeping the core temporal predictor as a shallow MLP in the main experiments, with linear variants analyzed separately. Experiments on nine long-term forecasting benchmarks show that STAIR matches or outperforms recent strong baselines while preserving a simple temporal backbone, providing a concise and effective modeling perspective for long-term time series forecasting.
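The three stages described in the abstract can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions, not the authors' implementation: the backbone here is linear, each stage is fitted with closed-form ridge regression instead of gradient-based training, normalization is omitted, and all function names, hyperparameters, and the synthetic data are hypothetical.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a (T, C) array into inputs (N, L, C) and targets (N, H, C)."""
    n = series.shape[0] - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y

def ridge(A, B, lam):
    """Closed-form ridge regression: argmin_W ||A W - B||^2 + lam ||W||^2."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ B)

def fit_three_stage(X, Y, lam=1e-3, lam_ind=1.0):
    N, L, C = X.shape
    H = Y.shape[1]
    # Stage 1: one shared map (L -> H) fitted on all channels pooled together.
    Xp = X.transpose(0, 2, 1).reshape(N * C, L)
    Yp = Y.transpose(0, 2, 1).reshape(N * C, H)
    W_shared = ridge(Xp, Yp, lam)
    # Stage 2: one map per channel, started from the shared map. The
    # per-channel correction is penalized, so each map stays near W_shared.
    W_ind = np.stack([
        W_shared + ridge(X[:, :, c], Y[:, :, c] - X[:, :, c] @ W_shared, lam_ind)
        for c in range(C)
    ])  # shape (C, L, H)
    # Stage 3: backbone frozen; a cross-variable model sees all channels
    # at once and is fitted only to what the backbone leaves unexplained.
    base = np.einsum("nlc,clh->nhc", X, W_ind)
    W_res = ridge(X.reshape(N, L * C), (Y - base).reshape(N, H * C), lam)
    return W_shared, W_ind, W_res

def predict(X, W_ind, W_res):
    """Per-channel backbone forecast plus the cross-variable residual."""
    N, L, C = X.shape
    H = W_ind.shape[2]
    base = np.einsum("nlc,clh->nhc", X, W_ind)
    resid = (X.reshape(N, L * C) @ W_res).reshape(N, H, C)
    return base + resid

# Synthetic three-channel series (illustration only, not a benchmark).
rng = np.random.default_rng(0)
t = np.arange(300)
wave = np.sin(2 * np.pi * t / 24)
series = np.stack([wave, 0.5 * np.roll(wave, 3) + 0.1 * t / 300, wave ** 2], axis=1)
series = series + 0.05 * rng.standard_normal(series.shape)

X, Y = make_windows(series, lookback=24, horizon=8)
W_shared, W_ind, W_res = fit_three_stage(X, Y)

def mse(P):
    return float(np.mean((Y - P) ** 2))

mse_shared = mse(np.einsum("nlc,lh->nhc", X, W_shared))
mse_individual = mse(np.einsum("nlc,clh->nhc", X, W_ind))
mse_full = mse(predict(X, W_ind, W_res))
print(mse_shared, mse_individual, mse_full)
```

In this sketch each stage fits only what the previous one left unexplained, so the training error cannot increase from one stage to the next. These are training errors on toy data and say nothing about held-out accuracy or about the results reported in the paper.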
Problem

Research questions and friction points this paper is trying to address.

long-term time series forecasting
simple models
temporal mapping
model training paradigm
forecasting performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

three-stage learning
temporal adaptation
residual learning
channel-wise fine-tuning
long-term forecasting