🤖 AI Summary
To address high domain-specific customization costs and poor generalization in multivariate time series forecasting, this paper pioneers the adaptation of pretrained decoder-only large language models (e.g., Llama, Phi) to this non-textual task. We propose a multivariate patching embedding scheme that maps temporal segments into the LLM token space while preserving temporal continuity and inter-variable dynamic dependencies. This constitutes the first LLM-adaptive embedding framework specifically designed for multivariate time series and provides the first empirical validation of cross-modal knowledge transfer from LLMs to time series prediction. Furthermore, we introduce a weight-based diagnostic tool to enhance model interpretability. Evaluated on multiple benchmark datasets, our method achieves prediction accuracy comparable to state-of-the-art specialized models—including Autoformer and Informer—while substantially reducing modeling complexity and domain customization overhead.
📝 Abstract
Pre-trained Large Language Models (LLMs) encapsulate large amounts of knowledge and take enormous amounts of compute to train. We make use of this resource, together with the observation that LLMs are able to transfer knowledge and performance from one domain or even modality to another, seemingly unrelated, area, to help with multivariate demand time series forecasting. Attention in transformer-based methods requires something worth attending to -- more than just samples of a time series. We explore different methods to map multivariate input time series into the LLM token embedding space. In particular, our novel multivariate patching strategy for embedding time series features into decoder-only pre-trained Transformers produces results competitive with state-of-the-art time series forecasting models. We also use recently developed weight-based diagnostics to validate our findings.
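The patching idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: each variable's series is sliced into overlapping patches, and each patch is projected into the LLM's embedding dimension so it can serve as a "token". The patch length, stride, embedding size, and the random matrix standing in for a learned linear projection are all illustrative assumptions.

```python
import numpy as np

def patch_embed(series, patch_len=16, stride=8, d_model=64, rng=None):
    """Map a multivariate series of shape (T, C) to token-like embeddings.

    Each variable is sliced into overlapping patches of length `patch_len`
    taken every `stride` steps; each patch is linearly projected into the
    model dimension `d_model`. A random matrix stands in for the learned
    projection an actual model would train.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    T, C = series.shape
    # stand-in for a learned linear projection: (patch_len, d_model)
    W = rng.standard_normal((patch_len, d_model)) / np.sqrt(patch_len)
    starts = range(0, T - patch_len + 1, stride)
    # one token per (variable, patch): shape (C * num_patches, d_model)
    tokens = np.stack([series[s:s + patch_len, c] @ W
                       for c in range(C) for s in starts])
    return tokens

# T=96 time steps, C=3 variables -> 11 patches per variable, 33 tokens total
series = np.random.default_rng(1).standard_normal((96, 3))
tokens = patch_embed(series)
print(tokens.shape)  # (33, 64)
```

Patching in this style trades temporal resolution for sequence length: the LLM attends over a short sequence of patch tokens rather than thousands of raw samples, which gives the attention mechanism semantically richer units to work with.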