AI Summary
To address representation shift caused by distribution mismatch across multi-source time series data in foundation model pretraining, this paper proposes Prototype-guided Dynamic Normalization (ProtoNorm). ProtoNorm models data distribution priors as learnable prototypes and adaptively selects normalization paths based on sample-prototype similarity, enabling fine-grained distribution alignment. Seamlessly integrated into Transformer architectures via a single-line code replacement, ProtoNorm combines prototype learning, conditional normalization, and multi-task contrastive pretraining. Evaluated on multiple time series classification and forecasting benchmarks, the resulting model significantly outperforms standard pretraining baselines, achieving an average 12.7% improvement in downstream task performance. The approach demonstrates strong robustness and generalization across heterogeneous time series domains, offering both architectural simplicity and computational efficiency without sacrificing expressiveness.
Abstract
Foundation models have achieved remarkable success across diverse machine-learning domains through large-scale pretraining on diverse datasets. However, pretraining on such datasets introduces significant challenges due to substantial mismatches in data distributions, a problem that is particularly pronounced for time series. In this paper, we tackle this issue by proposing a domain-aware adaptive normalization strategy within the Transformer architecture. Specifically, we replace the standard LayerNorm with a prototype-guided dynamic normalization mechanism (ProtoNorm), in which learned prototypes encapsulate distinct data distributions and sample-to-prototype affinity determines the appropriate normalization layer. This mechanism effectively captures the heterogeneity of time series characteristics, aligning pretrained representations with downstream tasks. Through comprehensive empirical evaluation, we demonstrate that our method significantly outperforms conventional pretraining techniques on both classification and forecasting tasks, while effectively mitigating the adverse effects of distribution shift during pretraining. Incorporating ProtoNorm is as simple as replacing a single line of code. Extensive experiments on diverse real-world time series benchmarks validate the robustness and generalizability of our approach, advancing the development of more versatile time series foundation models.
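To make the mechanism concrete, below is a minimal PyTorch sketch of what a prototype-guided normalization layer could look like. This is not the paper's implementation: the pooling choice (mean over time), the affinity measure (cosine similarity), the soft routing (softmax over prototypes), and the number of prototypes are all assumptions made for illustration. It does, however, reflect the stated interface contract: the module is a drop-in replacement for `nn.LayerNorm`, so adopting it amounts to changing a single line in a Transformer block.

```python
# Hypothetical ProtoNorm sketch -- illustrative only, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProtoNorm(nn.Module):
    """Drop-in replacement for nn.LayerNorm.

    K learnable prototypes each own a LayerNorm path; a sample's output is
    the affinity-weighted mixture of the K normalized paths (assumed soft
    routing -- the paper may use hard selection instead).
    """
    def __init__(self, d_model: int, num_prototypes: int = 4):
        super().__init__()
        # One learnable prototype per candidate data distribution.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, d_model))
        self.norms = nn.ModuleList(
            nn.LayerNorm(d_model) for _ in range(num_prototypes)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        pooled = x.mean(dim=1)  # per-sample summary: (batch, d_model)
        # Sample-to-prototype affinity via cosine similarity: (batch, K)
        sim = F.cosine_similarity(
            pooled.unsqueeze(1), self.prototypes.unsqueeze(0), dim=-1
        )
        weights = sim.softmax(dim=-1)  # soft routing over normalization paths
        # Apply every LayerNorm path: (batch, K, seq_len, d_model)
        outs = torch.stack([norm(x) for norm in self.norms], dim=1)
        # Affinity-weighted mixture, same shape as the input.
        return (weights[:, :, None, None] * outs).sum(dim=1)
```

Under this sketch, the advertised single-line swap would be replacing e.g. `self.norm1 = nn.LayerNorm(d_model)` with `self.norm1 = ProtoNorm(d_model)` inside a Transformer encoder layer.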