🤖 AI Summary
Time-series forecasting suffers significant performance degradation under distribution shift; existing normalization methods (e.g., RevIN) rely on local statistics and assume channel independence, leaving them fragile to missing values, noise, and global distribution changes. To address this, we propose APT, a lightweight, plug-and-play module that enables distribution-aware, globally adaptive normalization by dynamically generating affine parameters from timestamps via prototype learning. Its core innovations are: (i) timestamp-embedded prototype clustering that explicitly models how the distribution evolves over time; (ii) decoupling of inter-channel dependencies; and (iii) full compatibility with arbitrary backbone architectures and normalization strategies. APT incurs negligible computational overhead and requires no additional supervision. Extensive experiments across six benchmark datasets and diverse backbone–normalization combinations demonstrate substantial improvements in both robustness and accuracy, particularly under severe distribution shift, missing data, and high noise levels.
📝 Abstract
Time series forecasting under distribution shift remains challenging, as existing deep learning models often rely on local statistical normalization (e.g., mean and variance) that fails to capture global distribution shift. Methods like RevIN and its variants attempt to decouple distribution from pattern but still struggle with missing values, noisy observations, and ill-suited channel-wise affine transformations. To address these limitations, we propose Affine Prototype Timestamp (APT), a lightweight and flexible plug-in module that injects global distribution features into the normalization-forecasting pipeline. By leveraging timestamp-conditioned prototype learning, APT dynamically generates affine parameters that modulate both the input and output series, enabling the backbone to learn from self-supervised, distribution-aware clustered instances. APT is compatible with arbitrary forecasting backbones and normalization strategies while introducing minimal computational overhead. Extensive experiments across six benchmark datasets and multiple backbone–normalization combinations demonstrate that APT significantly improves forecasting performance under distribution shift.
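To make the mechanism concrete, here is a minimal numpy sketch of the idea described above: a timestamp is embedded, soft-assigned to a set of learnable prototypes, and the mixed per-prototype affine parameters modulate a RevIN-style normalized window. All names (`n_prototypes`, `timestamp_embed`, the sinusoidal encoding, the softmax assignment) are illustrative assumptions, not the paper's actual implementation, and the parameters below are randomly initialized rather than trained.

```python
# Hypothetical sketch of timestamp-conditioned prototype affine generation.
# Names and design choices are illustrative assumptions, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
n_prototypes, ts_dim, n_channels = 4, 8, 3

# "Learnable" parameters (trained end-to-end in practice, random here):
prototypes  = rng.normal(size=(n_prototypes, ts_dim))      # prototype keys
proto_gamma = rng.normal(size=(n_prototypes, n_channels))  # per-prototype scale
proto_beta  = rng.normal(size=(n_prototypes, n_channels))  # per-prototype shift

def timestamp_embed(hour, weekday, dim=ts_dim):
    """One plausible choice: sinusoidal encoding of calendar features."""
    freqs = np.arange(1, dim // 4 + 1)
    h = 2 * np.pi * hour / 24.0
    w = 2 * np.pi * weekday / 7.0
    return np.concatenate([np.sin(freqs * h), np.cos(freqs * h),
                           np.sin(freqs * w), np.cos(freqs * w)])

def apt_affine(ts_emb, temperature=1.0):
    """Soft-assign the timestamp to prototypes, then mix their affine params."""
    logits = prototypes @ ts_emb / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                 # softmax over prototypes
    return weights @ proto_gamma, weights @ proto_beta  # (n_channels,) each

# Modulate a locally normalized input window (length 24, 3 channels):
x = rng.normal(size=(24, n_channels))
x_norm = (x - x.mean(0)) / (x.std(0) + 1e-5)   # RevIN-style instance norm
gamma, beta = apt_affine(timestamp_embed(hour=9, weekday=1))
x_in = x_norm * gamma + beta                   # distribution-aware backbone input
```

A symmetric affine step would be applied on the output side before denormalization, so that the backbone sees instances implicitly clustered by their timestamp-indexed distribution regime.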