🤖 AI Summary
This paper addresses zero-shot time series forecasting—predicting future values with frozen large language models (LLMs), without any fine-tuning—by bridging the representational misalignment between raw numerical sequences and LLMs' pretrained textual knowledge. Methodologically, the authors propose: (1) a time-semantics-aware numerical tokenization scheme that converts sequences into semantically grounded text tokens; (2) the first inference-time Gaussian noise injection strategy, a non-intrusive data augmentation that encourages robust modeling of intrinsic temporal structure; and (3) two novel benchmark datasets that lie outside the LLMs' pre-training corpora, accompanied by theoretically grounded interpretability analysis. Empirical evaluation across diverse multivariate time series benchmarks demonstrates substantial gains over existing zero-shot approaches. Ablation and interpretability studies confirm that noise injection mitigates spurious surface-level numerical biases, thereby improving generalization to unseen temporal patterns.
📝 Abstract
Large Language Models (LLMs) have demonstrated effectiveness as zero-shot time series (TS) forecasters. The key challenge lies in tokenizing TS data into textual representations that align with LLMs' pre-trained knowledge. While existing work often relies on fine-tuning specialized modules to bridge this gap, a distinct and more challenging paradigm aims to leverage truly off-the-shelf LLMs without any fine-tuning, relying solely on strategic tokenization of numerical sequences. The performance of these fully frozen models is acutely sensitive to the textual representation of the input data, as their parameters cannot adapt to distribution shifts. In this paper, we introduce a simple yet highly effective strategy to overcome this brittleness: injecting noise into the raw time series before tokenization. This non-invasive intervention acts as a form of inference-time augmentation, compelling the frozen LLM to extrapolate from robust underlying temporal patterns rather than superficial numerical artifacts. We theoretically analyze this phenomenon and empirically validate its effectiveness across diverse benchmarks. Notably, to eliminate potential biases from data contamination during LLM pre-training, we introduce two novel TS datasets that fall outside the pre-training corpora of all evaluated LLMs, and we consistently observe improved performance on them. This study takes a further step toward directly leveraging off-the-shelf LLMs for time series forecasting.
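The pipeline described above—perturb the raw series with Gaussian noise, then serialize it into text for a frozen LLM—can be sketched as follows. The specific tokenization (fixed-precision, comma-separated values) and the noise scale tied to the series' standard deviation are illustrative assumptions, not the paper's exact scheme; `sigma_ratio` and the helper names are hypothetical.

```python
import numpy as np

def inject_noise(series, sigma_ratio=0.05, seed=None):
    """Inference-time augmentation: add zero-mean Gaussian noise whose
    standard deviation is a fraction of the series' own std (assumed scale)."""
    rng = np.random.default_rng(seed)
    sigma = sigma_ratio * series.std()
    return series + rng.normal(0.0, sigma, size=series.shape)

def tokenize(series, precision=2):
    """Serialize the numeric series into text for the frozen LLM prompt
    (fixed-precision, comma-separated — one plausible scheme)."""
    return ", ".join(f"{v:.{precision}f}" for v in series)

# Toy history: the noisy, tokenized string is what the frozen LLM would
# be prompted with; its text completion is parsed back into forecasts.
history = np.sin(np.linspace(0, 4 * np.pi, 32))
prompt = tokenize(inject_noise(history, seed=0))
```

Because the LLM stays frozen, the only levers are the noise scale and the tokenization; the claim is that mild perturbation washes out digit-level artifacts the model might otherwise latch onto.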