🤖 AI Summary
This paper addresses the unified challenge of modeling non-stationary univariate series, strongly coupled multivariate series, and exogenous covariates in multidimensional time series forecasting. The authors propose Timer-XL, a causal decoder-only Transformer foundation model designed for general-purpose time series prediction. The method introduces: (1) a multivariate next-token prediction paradigm that naturally extends univariate modeling to long-context, joint multivariate forecasting; (2) TimeAttention, a causally constrained attention mechanism paired with position encodings that enforce both temporal causality and variable equivalence; and (3) patch-based representation learning combined with large-scale pretraining to enable zero-shot generalization across tasks. Timer-XL achieves state-of-the-art performance on univariate, multivariate, and covariate-augmented forecasting benchmarks, and demonstrates strong zero-shot generalization as a pretrained time series foundation model.
📝 Abstract
We present Timer-XL, a causal Transformer for unified time series forecasting. To uniformly predict multidimensional time series, we generalize next token prediction, predominantly adopted for 1D token sequences, to multivariate next token prediction. This paradigm formulates various forecasting tasks as a long-context prediction problem. We opt for decoder-only Transformers that capture causal dependencies from varying-length contexts for unified forecasting, making predictions on non-stationary univariate time series, multivariate series with complicated dynamics and correlations, as well as covariate-informed contexts that include exogenous variables. Technically, we propose a universal TimeAttention to capture fine-grained intra- and inter-series dependencies of flattened time series tokens (patches), further enhanced by a carefully designed position embedding for temporal causality and variable equivalence. Timer-XL achieves state-of-the-art performance across task-specific forecasting benchmarks through a unified approach. Based on large-scale pre-training, Timer-XL also achieves state-of-the-art zero-shot performance, making it a promising architecture for pre-trained time series models. Code is available at https://github.com/thuml/Timer-XL.
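To make the attention structure concrete, here is a minimal sketch of the kind of mask TimeAttention implies when multivariate patches are flattened into one token sequence: a token for variable `v` at time patch `t` may attend to any variable (variable equivalence) at the same or an earlier time patch (temporal causality). The function name, token layout (time-major flattening), and NumPy formulation are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def time_attention_mask(num_vars: int, num_patches: int) -> np.ndarray:
    """Boolean attention mask over flattened (time patch, variable) tokens.

    Token index = t * num_vars + v. Entry [i, j] is True when token i is
    allowed to attend to token j. Illustrative sketch, not the paper's code.
    """
    # Lower-triangular matrix: each time patch sees itself and earlier patches.
    causal_over_time = np.tril(np.ones((num_patches, num_patches), dtype=bool))
    # All-ones block: every variable may attend to every variable.
    all_vars = np.ones((num_vars, num_vars), dtype=bool)
    # Kronecker product combines the two constraints over the flattened sequence.
    return np.kron(causal_over_time, all_vars)

mask = time_attention_mask(num_vars=2, num_patches=3)
# Token (t=0, v=0) can attend to (t=0, v=1) but not to any t=1 token.
```

In a standard decoder-only Transformer, a mask like this would be passed to the attention layer so that causality holds along time while all series remain mutually visible within each step of the context.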