🤖 AI Summary
In multivariate time series forecasting, local embeddings often degenerate into sequence identifiers, undermining model generalization and transferability. To address this, this paper initiates the first systematic study of embedding regularization. We propose a suite of synergistic regularization strategies—including embedding perturbation (with periodic reset), contrastive constraints, sparsity enforcement, and Dropout variants—to prevent local embeddings from overfitting to individual sequence IDs and instead encourage them to capture transferable temporal patterns. Integrated into mainstream architectures such as Informer and Autoformer, our approach achieves average MAE reductions of 3.2–7.8% across multiple benchmark datasets. Notably, it significantly improves zero-shot and few-shot generalization performance. This work establishes a reusable, empirically validated embedding regularization paradigm for time-series foundation models.
📝 Abstract
In forecasting multiple time series, accounting for the individual features of each sequence can be challenging. To address this, modern deep learning methods for time series analysis combine a shared (global) model with local layers, specific to each time series, often implemented as learnable embeddings. Ideally, these local embeddings should encode meaningful representations of the unique dynamics of each sequence. However, when these are learned end-to-end as parameters of a forecasting model, they may end up acting as mere sequence identifiers. Shared processing blocks may then become reliant on such identifiers, limiting their transferability to new contexts. In this paper, we address this issue by investigating methods to regularize the learning of local learnable embeddings for time series processing. Specifically, we perform the first extensive empirical study on the subject and show how such regularizations consistently improve performance in widely adopted architectures. Furthermore, we show that methods attempting to prevent the co-adaptation of local and global parameters by means of embeddings perturbation are particularly effective in this context. In this regard, we include in the comparison several perturbation-based regularization methods, going as far as periodically resetting the embeddings during training. The obtained results provide an important contribution to understanding the interplay between learnable local parameters and shared processing layers: a key challenge in modern time series processing models and a step toward developing effective foundation models for time series.