Be Wary of Your Time Series Preprocessing

📅 2026-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the impact of normalization strategies in time series preprocessing on the representational capacity of Transformer models. Focusing on commonly used methods such as Standard and Min-Max normalization, it provides the first theoretical analysis of how these techniques influence the discriminative power of the representation space and introduces a quantitative evaluation framework to assess this capability. Through theoretical bounds and systematic experiments across multiple benchmark datasets—complemented by comparisons between instance-wise normalization and global scaling—the work demonstrates that normalization significantly affects model performance, yet no universally optimal strategy exists. Notably, for certain tasks, omitting normalization altogether yields superior results, revealing that preprocessing choices must be co-designed with task-specific characteristics.
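The two normalization methods under study can be sketched as follows. This is a minimal NumPy illustration (not code from the paper); the `eps` guard against zero variance or zero range is an assumption for numerical safety:

```python
import numpy as np

def standard_scale(x, eps=1e-8):
    """Standard (z-score) normalization: zero mean, unit variance."""
    return (x - x.mean()) / (x.std() + eps)

def min_max_scale(x, eps=1e-8):
    """Min-Max scaling: maps values into [0, 1]."""
    return (x - x.min()) / (x.max() - x.min() + eps)

series = np.array([2.0, 4.0, 6.0, 8.0])
z = standard_scale(series)   # mean ~0, std ~1
m = min_max_scale(series)    # min ~0, max ~1
```

Both are affine maps of the raw series, but they anchor the representation to different statistics (mean/variance vs. range), which is what the paper's theoretical bounds distinguish.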

📝 Abstract
Normalization and scaling are fundamental preprocessing steps in time series modeling, yet their role in Transformer-based models remains underexplored from a theoretical perspective. In this work, we present the first formal analysis of how different normalization strategies, specifically instance-based and global scaling, impact the expressivity of Transformer-based architectures for time series representation learning. We propose a novel expressivity framework tailored to time series, which quantifies a model's ability to distinguish between similar and dissimilar inputs in the representation space. Using this framework, we derive theoretical bounds for two widely used normalization methods: Standard and Min-Max scaling. Our analysis reveals that the choice of normalization strategy can significantly influence the model's representational capacity, depending on the task and data characteristics. We complement our theory with empirical validation on classification and forecasting benchmarks using multiple Transformer-based models. Our results show that no single normalization method consistently outperforms others, and in some cases, omitting normalization entirely leads to superior performance. These findings highlight the critical role of preprocessing in time series learning and motivate the need for more principled normalization strategies tailored to specific tasks and datasets.
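The instance-based vs. global distinction from the abstract can be made concrete with a small sketch (illustrative only, not the paper's implementation; function names are hypothetical). Instance-wise normalization uses per-series statistics, while global scaling pools statistics over the whole dataset:

```python
import numpy as np

def instance_normalize(batch, eps=1e-8):
    """Standard-scale each series with its own mean and std."""
    mean = batch.mean(axis=-1, keepdims=True)
    std = batch.std(axis=-1, keepdims=True)
    return (batch - mean) / (std + eps)

def global_normalize(batch, eps=1e-8):
    """Standard-scale all series with statistics pooled over the dataset."""
    return (batch - batch.mean()) / (batch.std() + eps)

# Two series with identical shape but very different amplitude
batch = np.array([[1.0, 2.0, 3.0],
                  [100.0, 200.0, 300.0]])
inst = instance_normalize(batch)
glob = global_normalize(batch)
# Instance-wise scaling maps both series to the same values,
# discarding amplitude information; global scaling preserves it.
```

This tiny example shows the expressivity trade-off the paper formalizes: instance-wise scaling collapses inputs that differ only in scale, which helps some tasks and hurts others where amplitude is discriminative.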
Problem

Research questions and friction points this paper is trying to address.

time series
normalization
scaling
Transformer
expressivity
Innovation

Methods, ideas, or system contributions that make the work stand out.

time series normalization
Transformer expressivity
representation learning
preprocessing impact
theoretical analysis