AI Summary
To address representation shift caused by distribution mismatch across multi-source time series data in foundation model pretraining, this paper proposes Prototype-guided Dynamic Normalization (ProtoNorm). ProtoNorm models data distribution priors as learnable prototypes and adaptively selects normalization paths based on sample-prototype similarity, enabling fine-grained distribution alignment. Seamlessly integrated into Transformer architectures via a single-line code replacement, ProtoNorm combines prototype learning, conditional normalization, and multi-task contrastive pretraining. Evaluated on multiple time series classification and forecasting benchmarks, the resulting model significantly outperforms standard pretraining baselines, achieving an average 12.7% improvement in downstream task performance. The approach demonstrates strong robustness and generalization across heterogeneous time series domains, offering both architectural simplicity and computational efficiency without sacrificing expressiveness.
Abstract
Foundation models have achieved remarkable success across diverse machine-learning domains through large-scale pretraining on diverse datasets. However, pretraining on such datasets introduces significant challenges due to substantial mismatches in data distributions, a problem that is particularly pronounced for time series. In this paper, we tackle this issue by proposing a domain-aware adaptive normalization strategy within the Transformer architecture. Specifically, we replace the standard LayerNorm with a prototype-guided dynamic normalization mechanism (ProtoNorm), in which learned prototypes encapsulate distinct data distributions and sample-to-prototype affinity determines the appropriate normalization layer. This mechanism effectively captures the heterogeneity of time series characteristics, aligning pretrained representations with downstream tasks. Through comprehensive empirical evaluation, we demonstrate that our method significantly outperforms conventional pretraining techniques on both classification and forecasting tasks, while effectively mitigating the adverse effects of distribution shift during pretraining. Incorporating ProtoNorm is as simple as replacing a single line of code. Extensive experiments on diverse real-world time series benchmarks validate the robustness and generalizability of our approach, advancing the development of more versatile time series foundation models.
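To make the mechanism concrete, below is a minimal PyTorch sketch of what a prototype-guided normalization layer could look like. This is not the paper's implementation: the pooling choice (mean over time), the affinity measure (cosine similarity), the soft routing (softmax over prototypes), and the number of prototypes are all assumptions made for illustration. It does, however, reflect the stated interface contract: the module is a drop-in replacement for `nn.LayerNorm`, so adopting it amounts to changing a single line in a Transformer block.

```python
# Hypothetical ProtoNorm sketch -- illustrative only, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProtoNorm(nn.Module):
    """Drop-in replacement for nn.LayerNorm.

    K learnable prototypes each own a LayerNorm path; a sample's output is
    the affinity-weighted mixture of the K normalized paths (assumed soft
    routing -- the paper may use hard selection instead).
    """
    def __init__(self, d_model: int, num_prototypes: int = 4):
        super().__init__()
        # One learnable prototype per candidate data distribution.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, d_model))
        self.norms = nn.ModuleList(
            nn.LayerNorm(d_model) for _ in range(num_prototypes)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        pooled = x.mean(dim=1)  # per-sample summary: (batch, d_model)
        # Sample-to-prototype affinity via cosine similarity: (batch, K)
        sim = F.cosine_similarity(
            pooled.unsqueeze(1), self.prototypes.unsqueeze(0), dim=-1
        )
        weights = sim.softmax(dim=-1)  # soft routing over normalization paths
        # Apply every LayerNorm path: (batch, K, seq_len, d_model)
        outs = torch.stack([norm(x) for norm in self.norms], dim=1)
        # Affinity-weighted mixture, same shape as the input.
        return (weights[:, :, None, None] * outs).sum(dim=1)
```

Under this sketch, the advertised single-line swap would be replacing e.g. `self.norm1 = nn.LayerNorm(d_model)` with `self.norm1 = ProtoNorm(d_model)` inside a Transformer encoder layer.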