Bridging Distribution Gaps in Time Series Foundation Model Pretraining with Prototype-Guided Normalization

πŸ“… 2025-04-15
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address representation shift caused by distribution mismatch across multi-source time series data in foundation model pretraining, this paper proposes Prototype-guided Dynamic Normalization (ProtoNorm). ProtoNorm models data distribution priors as learnable prototypes and adaptively selects normalization paths based on sample-prototype similarity, enabling fine-grained distribution alignment. Seamlessly integrated into Transformer architectures via a single-line code replacement, ProtoNorm combines prototype learning, conditional normalization, and multi-task contrastive pretraining. Evaluated on multiple time series classification and forecasting benchmarks, the resulting model significantly outperforms standard pretraining baselines, achieving an average 12.7% improvement in downstream task performance. The approach demonstrates strong robustness and generalization across heterogeneous time series domains, offering both architectural simplicity and computational efficiency without sacrificing expressiveness.

πŸ“ Abstract
Foundation models have achieved remarkable success across diverse machine-learning domains through large-scale pretraining on diverse datasets. However, pretraining on such datasets introduces significant challenges due to substantial mismatches in data distributions, a problem particularly pronounced with time series data. In this paper, we tackle this issue by proposing a domain-aware adaptive normalization strategy within the Transformer architecture. Specifically, we replace the traditional LayerNorm with a prototype-guided dynamic normalization mechanism (ProtoNorm), where learned prototypes encapsulate distinct data distributions, and sample-to-prototype affinity determines the appropriate normalization layer. This mechanism effectively captures the heterogeneity of time series characteristics, aligning pretrained representations with downstream tasks. Through comprehensive empirical evaluation, we demonstrate that our method significantly outperforms conventional pretraining techniques across both classification and forecasting tasks, while effectively mitigating the adverse effects of distribution shifts during pretraining. Incorporating ProtoNorm is as simple as replacing a single line of code. Extensive experiments on diverse real-world time series benchmarks validate the robustness and generalizability of our approach, advancing the development of more versatile time series foundation models.
Problem

Research questions and friction points this paper is trying to address.

Addressing data distribution mismatches in time series pretraining
Proposing adaptive normalization for Transformer time series models
Improving robustness against distribution shifts in foundation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prototype-guided dynamic normalization (ProtoNorm) replaces LayerNorm
Learned prototypes encapsulate distinct data distributions
Sample-to-prototype affinity determines normalization layer
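The mechanism outlined above (learnable prototypes, sample-to-prototype affinity, affinity-selected normalization) can be illustrated with a minimal sketch. This is a hypothetical NumPy illustration, not the authors' implementation: the class name `ProtoNormSketch`, the dot-product affinity, and the soft mixing of per-prototype affine parameters are all assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class ProtoNormSketch:
    """Hypothetical sketch of prototype-guided normalization.

    Each of K learnable prototypes owns its own affine parameters
    (gamma_k, beta_k); a sample is standardized as in LayerNorm,
    then scaled/shifted by the affinity-weighted mix of those
    per-prototype parameters (an assumed soft variant of
    "affinity determines the normalization path").
    """
    def __init__(self, d_model, num_prototypes=4, eps=1e-5, seed=0):
        rng = np.random.default_rng(seed)
        self.prototypes = rng.standard_normal((num_prototypes, d_model))
        self.gamma = np.ones((num_prototypes, d_model))   # per-prototype scale
        self.beta = np.zeros((num_prototypes, d_model))   # per-prototype shift
        self.eps = eps

    def __call__(self, x):
        # x: (batch, d_model). Standardize per sample, as LayerNorm does.
        mu = x.mean(-1, keepdims=True)
        var = x.var(-1, keepdims=True)
        x_hat = (x - mu) / np.sqrt(var + self.eps)
        # Affinity of each sample to each prototype (dot-product similarity).
        affinity = softmax(x @ self.prototypes.T)          # (batch, K)
        # Affinity-weighted affine parameters, chosen per sample.
        gamma = affinity @ self.gamma                      # (batch, d_model)
        beta = affinity @ self.beta
        return gamma * x_hat + beta
```

In a Transformer, swapping `nn.LayerNorm(d_model)` for such a module is what the paper's "single-line code replacement" claim refers to; the sketch above only shows the forward pass, with prototypes and affine parameters assumed to be learned during pretraining.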
Peiliang Gong
Key Laboratory of Brain-Machine Intelligence Technology, Ministry of Education, College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Emadeldeen Eldele
Assistant Professor, Khalifa University
time series, self-supervised learning, deep learning, domain adaptation, EEG
Min Wu
Professor, IEEE Fellow, China University of Geosciences
Process control, Robust control, Intelligent systems
Zhenghua Chen
Institute for Infocomm Research (I2R) and the Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore
Xiaoli Li
Institute for Infocomm Research (I2R) and the Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore
Daoqiang Zhang
Nanjing University of Aeronautics and Astronautics
Machine learning, pattern recognition, medical image analysis, data mining