Abstain Mask Retain Core: Time Series Prediction by Adaptive Masking Loss with Representation Consistency

📅 2025-10-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Mainstream time-series forecasting models rely on the “long-sequence information gain hypothesis,” which often leads to overfitting to noise and redundant fluctuations, thereby impairing critical feature extraction. Method: Challenging this assumption, we empirically discover that moderate truncation of historical inputs improves forecasting accuracy. Grounded in information bottleneck theory, we propose (i) a dynamic adaptive masking loss that suppresses redundant feature learning via gradient-guided regularization, and (ii) a representation consistency constraint that stabilizes the mapping among input, label, and prediction. Our method requires no architectural modifications and is agnostic to underlying time-series models. Contribution/Results: Evaluated across multiple real-world benchmarks, our approach significantly reduces prediction error—achieving an average 12.7% reduction in MAE—while enhancing model robustness and generalization. It establishes a novel paradigm for information filtering in time-series modeling.

📝 Abstract
Time series forecasting plays a pivotal role in critical domains such as energy management and financial markets. Although deep learning-based approaches (e.g., MLP, RNN, Transformer) have achieved remarkable progress, the prevailing "long-sequence information gain hypothesis" exhibits inherent limitations. Through systematic experimentation, this study reveals a counterintuitive phenomenon: appropriately truncating historical data can paradoxically enhance prediction accuracy, indicating that existing models learn substantial redundant features (e.g., noise or irrelevant fluctuations) during training, thereby compromising effective signal extraction. Building upon information bottleneck theory, we propose an innovative solution termed Adaptive Masking Loss with Representation Consistency (AMRC), which features two core components: 1) Dynamic masking loss, which adaptively identifies highly discriminative temporal segments to guide gradient descent during model training; 2) Representation consistency constraint, which stabilizes the mapping relationships among inputs, labels, and predictions. Experimental results demonstrate that AMRC effectively suppresses redundant feature learning while significantly improving model performance. This work not only challenges conventional assumptions in temporal modeling but also provides novel theoretical insights and methodological breakthroughs for developing efficient and robust forecasting models.
Problem

Research questions and friction points this paper is trying to address.

Addressing redundant feature learning in time series prediction
Challenging the long-sequence information gain hypothesis limitations
Improving forecasting accuracy by suppressing noise and irrelevant fluctuations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive masking loss identifies key temporal segments
Representation consistency stabilizes input-output mapping relationships
Dynamic masking suppresses redundant feature learning in training
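The summary and abstract describe the method only at a high level, so the exact formulation is not given here. As a rough, hypothetical sketch of the general idea, one could imagine a loss that keeps only the lower-error fraction of per-timestep terms (approximating "masking redundant segments") and adds a penalty tying input and label representations together. The function name, `keep_ratio`, and `lam` are illustrative assumptions, not the authors' definitions:

```python
import numpy as np

def amrc_style_loss(pred, target, input_repr, label_repr,
                    keep_ratio=0.7, lam=0.1):
    """Illustrative sketch only -- NOT the paper's exact AMRC loss.

    Per-timestep absolute errors are ranked, and only the lowest
    `keep_ratio` fraction contributes to the loss, mimicking the idea of
    masking out noisy/redundant segments. A mean-squared consistency
    term (weighted by `lam`) penalizes divergence between input and
    label representations.
    """
    err = np.abs(pred - target)              # per-timestep error terms
    thresh = np.quantile(err, keep_ratio)    # adaptive masking threshold
    mask = err <= thresh                     # retain the informative core
    masked_mae = err[mask].mean() if mask.any() else err.mean()
    consistency = np.mean((input_repr - label_repr) ** 2)
    return masked_mae + lam * consistency
```

With `keep_ratio=1.0` the mask keeps everything and the first term reduces to plain MAE, so the sketch degrades gracefully to a standard loss; the masking only changes which timesteps receive gradient.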
Renzhao Liang, Beihang University
Sizhe Xu, New York University
Chenggang Xie, Beihang University
Jingru Chen, Peking University
Feiyang Ren, New York University
Shu Yang, New York University
Takahiro Yabe, Assistant Professor, New York University
Human mobility, Urban resilience, Computational social science, Complex systems