🤖 AI Summary
Existing approaches for integrating time series into large language models (LLMs) typically perform shallow fusion only at the input layer, so temporal representations degrade rapidly in deeper layers and cross-modal alignment and semantic adaptation fail. To address this, the authors propose the Multi-layer Steerable Embedding Fusion (MSEF) framework, which injects semantically rich time-series embeddings—extracted by a pretrained time-series foundation model—into intermediate LLM layers via layer-specific steering vectors. This enables sustained cross-modal interaction and alignment between the textual and temporal modalities throughout the LLM's depth. MSEF mitigates temporal information decay and improves few-shot cross-modal understanding. Evaluated on seven benchmark datasets, MSEF achieves an average 31.8% reduction in mean squared error (MSE) over strong baselines, a substantial improvement in forecasting accuracy. The implementation is publicly available.
📝 Abstract
Time series (TS) data are ubiquitous across various application areas, rendering time series forecasting (TSF) a fundamental task. With the astounding advances in large language models (LLMs), a variety of methods have been developed to adapt LLMs for time series forecasting. Despite unlocking the potential of LLMs in comprehending TS data, existing methods are inherently constrained by their shallow integration of TS information, wherein LLMs typically access TS representations only at shallow layers, primarily the input layer. This causes the influence of TS representations to progressively fade in deeper layers and eventually leads to ineffective adaptation between textual embeddings and TS representations. In this paper, we propose Multi-layer Steerable Embedding Fusion (MSEF), a novel framework that enables LLMs to directly access time series patterns at all depths, thereby mitigating the progressive loss of TS information in deeper layers. Specifically, MSEF leverages off-the-shelf time series foundation models to extract semantically rich embeddings, which are fused with intermediate text representations across LLM layers via layer-specific steering vectors. These steering vectors are designed to continuously optimize the alignment between the time series and textual modalities and facilitate a layer-specific adaptation mechanism that enables efficient few-shot learning. Experimental results on seven benchmarks demonstrate significant performance improvements of MSEF over baselines, with an average reduction of 31.8% in terms of MSE. The code is available at https://github.com/One1sAll/MSEF.
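To make the fusion mechanism concrete, the sketch below illustrates the general idea of injecting a projected time-series embedding into each layer's hidden states, gated by a layer-specific steering vector. This is a minimal illustration under assumed shapes and parameter names (`W_proj`, `steer`, `fuse_layer` are hypothetical), not the paper's actual implementation or notation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ts, n_layers, seq_len = 16, 8, 4, 5

# Hypothetical learnable parameters: one projection matrix and one
# steering vector per LLM layer (names are illustrative).
W_proj = [rng.standard_normal((d_ts, d_model)) * 0.1 for _ in range(n_layers)]
steer = [rng.standard_normal(d_model) * 0.1 for _ in range(n_layers)]

def fuse_layer(hidden, ts_embedding, layer):
    """Inject the TS embedding into one layer's text hidden states.

    hidden:       (seq_len, d_model) intermediate text representations.
    ts_embedding: (d_ts,) embedding from a TS foundation model.
    The layer-specific steering vector gates how strongly the projected
    TS signal is added at this depth, so the TS information is refreshed
    at every layer instead of only at the input.
    """
    injected = ts_embedding @ W_proj[layer]       # (d_model,)
    return hidden + steer[layer] * injected       # broadcast over seq_len

# Toy forward pass: apply the fusion at every layer depth.
hidden = rng.standard_normal((seq_len, d_model))
ts_emb = rng.standard_normal(d_ts)
for layer in range(n_layers):
    hidden = fuse_layer(hidden, ts_emb, layer)

print(hidden.shape)
```

In an actual LLM, the residual addition would be interleaved with frozen transformer blocks (e.g. via forward hooks), with only the projections and steering vectors trained, which is what would make the layer-specific adaptation parameter-efficient for few-shot settings.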