Selective Learning for Deep Time Series Forecasting

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep time-series forecasting is highly susceptible to noise and anomalies, which lead to overfitting and degraded generalization. To address this, we propose a selective learning strategy that abandons uniform optimization across all time steps and instead dynamically identifies high-confidence, generalizable steps for training. The core innovation is a dual-mask mechanism that jointly filters unreliable time steps: an uncertainty mask derived from residual entropy and an anomaly mask generated via residual lower-bound estimation. The method is modular and integrates seamlessly into mainstream deep models (e.g., Informer, TimesNet, iTransformer), requiring only lightweight components for end-to-end optimization. Extensive evaluation across eight real-world datasets demonstrates substantial improvements in robustness and accuracy, with MSE reductions over state-of-the-art baselines ranging from 6.5% (iTransformer) to 37.4% (Informer).
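A hedged sketch of what such a dual-mask selective loss could look like. The rolling z-score "uncertainty" test and the median + MAD "anomaly" bound below are simple stand-ins for the paper's residual-entropy and residual-lower-bound estimators, not the authors' actual formulation; all thresholds are illustrative assumptions.

```python
import numpy as np

def selective_mse(pred, target, window=5, z_thresh=2.5, anomaly_k=3.0):
    """Masked MSE over a 1-D forecast: timesteps flagged as uncertain or
    anomalous are excluded from the loss. Both flagging rules are
    placeholder heuristics standing in for the paper's dual masks."""
    residual = pred - target
    sq = residual ** 2

    # "Uncertainty" stand-in: flag timesteps whose residual deviates
    # strongly from a rolling local mean (rolling z-score).
    kernel = np.ones(window) / window
    local_mean = np.convolve(residual, kernel, mode="same")
    local_sq = np.convolve(residual ** 2, kernel, mode="same")
    local_std = np.sqrt(np.maximum(local_sq - local_mean ** 2, 1e-8))
    uncertainty_mask = np.abs(residual - local_mean) < z_thresh * local_std

    # "Anomaly" stand-in: robust global bound via median + k * MAD on the
    # absolute residuals.
    med = np.median(np.abs(residual))
    mad = np.median(np.abs(np.abs(residual) - med)) + 1e-8
    anomaly_mask = np.abs(residual) <= med + anomaly_k * mad

    # Only timesteps passing both masks contribute to the loss.
    mask = uncertainty_mask & anomaly_mask
    if not mask.any():                      # degenerate case: plain MSE
        return float(sq.mean())
    return float(sq[mask].mean())
```

With one large injected anomaly, the masked loss stays near the clean-timestep error while the plain MSE is dominated by the outlier, which is the overfitting pressure selective learning is meant to remove.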

📝 Abstract
Benefiting from its high capacity for capturing complex temporal patterns, deep learning (DL) has significantly advanced time series forecasting (TSF). However, deep models tend to suffer from severe overfitting due to the inherent vulnerability of time series to noise and anomalies. The prevailing DL paradigm uniformly optimizes all timesteps through the MSE loss, fitting uncertain and anomalous timesteps without distinction and ultimately overfitting. To address this, we propose a novel selective learning strategy for deep TSF. Specifically, selective learning screens a subset of timesteps for the MSE loss in optimization, guiding the model to focus on generalizable timesteps while disregarding non-generalizable ones. Our framework introduces a dual-mask mechanism to select these timesteps: (1) an uncertainty mask that leverages residual entropy to filter uncertain timesteps, and (2) an anomaly mask that employs residual lower-bound estimation to exclude anomalous timesteps. Extensive experiments across eight real-world datasets demonstrate that selective learning significantly improves the predictive performance of typical state-of-the-art deep models, including a 37.4% MSE reduction for Informer, 8.4% for TimesNet, and 6.5% for iTransformer.
Problem

Research questions and friction points this paper is trying to address.

Addresses overfitting in deep time series forecasting models
Filters uncertain and anomalous timesteps during model optimization
Improves forecasting accuracy through selective learning strategy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective learning strategy screens timesteps for MSE loss
Dual-mask mechanism filters uncertain and anomalous timesteps
Uncertainty and anomaly masks improve deep forecasting performance
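The modular, end-to-end integration described above can be sketched with a toy gradient-descent loop: the selection mask is recomputed from the current residuals each step and simply gates which timesteps feed the MSE gradient. The keep-the-90%-smallest-residuals rule, the linear model, and the synthetic data are all illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y is a noisy linear function of x, with a few injected anomalies.
x = rng.normal(size=(200, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = x @ w_true + 0.1 * rng.normal(size=200)
y[::25] += 10.0                      # anomalous timesteps

w = np.zeros(3)
lr = 0.1
for step in range(200):
    pred = x @ w
    residual = pred - y
    # Placeholder selection rule: keep the 90% of timesteps with the
    # smallest absolute residuals (stands in for the dual-mask criteria).
    keep = np.abs(residual) <= np.quantile(np.abs(residual), 0.9)
    # MSE gradient computed only over the selected timesteps.
    grad = 2 * (x[keep].T @ residual[keep]) / keep.sum()
    w -= lr * grad
```

Because the anomalous timesteps produce the largest residuals, they are excluded from the gradient and the fitted weights land near `w_true`; training on all timesteps would let the injected spikes pull the solution away.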
👥 Authors
Yisong Fu
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences
Zezhi Shao
Institute of Computing Technology, Chinese Academy of Sciences
Time Series Forecasting · Spatial-Temporal Data Mining · Graph Data Mining
Chengqing Yu
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences
Yujie Li
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences
Zhulin An
Institute of Computing Technology, Chinese Academy of Sciences
Automatic Deep Learning · Lifelong Learning
Qi Wang
Department of Automation, Tsinghua University
Yongjun Xu
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences
Fei Wang
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences