🤖 AI Summary
This work investigates the approximation and generalization capabilities of Transformers as time-series foundation models, focusing on their ability to automatically fit autoregressive (AR) models in univariate and multivariate settings. Theoretically, it establishes the first proof that Transformers can, via gradient descent, exactly recover multivariate AR processes of arbitrary order. Leveraging Dobrushin's condition, it further derives a pretraining generalization error bound that explicitly accounts for temporal and cross-variable probabilistic dependencies, providing formal theoretical grounding for foundation models such as MOIRAI. Empirically, MOIRAI adapts to covariates at varying scales and achieves both high expressivity and strong generalization in multivariate forecasting. Key contributions: (1) the first exact-fitting guarantee for multivariate AR modeling with Transformers; (2) the first generalization bound for time-series foundation models that incorporates probabilistic dependency structure; and (3) unified theoretical and empirical validation of automatic, data-driven AR modeling.
📝 Abstract
We give a comprehensive analysis of transformers as time series foundation models, focusing on their approximation and generalization capabilities. First, we demonstrate that there exist transformers that fit an autoregressive model to an input univariate time series via gradient descent. We then analyze MOIRAI, a multivariate time series foundation model capable of handling an arbitrary number of covariates. We prove that it can automatically fit autoregressive models with an arbitrary number of covariates, offering insights into its design and empirical success. For generalization, we establish pretraining bounds when the data satisfies Dobrushin's condition. Experiments support our theoretical findings, highlighting the efficacy of transformers as time series foundation models.
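To make the core object concrete, the following is a minimal sketch (not the paper's transformer construction) of the procedure the approximation results concern: fitting the coefficients of an AR(p) model to a univariate series by plain gradient descent on the one-step prediction error. The specific process, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth AR(2) process (illustrative choice):
#   x_t = 0.6 x_{t-1} - 0.3 x_{t-2} + noise
true_coeffs = np.array([0.6, -0.3])
p, T = len(true_coeffs), 5000
x = np.zeros(T)
for t in range(p, T):
    lags = x[t - p : t][::-1]           # (x_{t-1}, ..., x_{t-p})
    x[t] = true_coeffs @ lags + rng.standard_normal()

# Design matrix of lagged values: row for time t holds (x_{t-1}, ..., x_{t-p})
X = np.column_stack([x[p - k - 1 : T - k - 1] for k in range(p)])
y = x[p:]

# Gradient descent on the mean squared one-step prediction error
w = np.zeros(p)
lr = 0.1
for _ in range(1000):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= lr * grad

print(w)  # close to [0.6, -0.3]
```

The paper's univariate result shows that a transformer can emulate this kind of in-context fitting on the input series itself; the multivariate analysis extends the idea to AR models with an arbitrary number of covariates.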