Rethinking the Role of LLMs in Time Series Forecasting

📅 2026-02-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses ongoing skepticism about the effectiveness of large language models (LLMs) in time series forecasting. Through a large-scale empirical evaluation spanning 8 billion observations across 17 diverse scenarios, the authors study LLM-based forecasting (LLM4TSF), comparing pre-alignment and post-alignment strategies and applying token-level routing analysis and prompt engineering. Their findings show that pre-alignment outperforms post-alignment in over 90% of tasks, and reveal complementary roles for LLMs' pretrained knowledge and architectural design. Moreover, a fully intact LLM markedly improves performance in mixed-distribution and cross-domain generalization settings, affirming its value in time series prediction.

📝 Abstract
Large language models (LLMs) have been introduced to time series forecasting (TSF) to incorporate contextual knowledge beyond numerical signals. However, existing studies question whether LLMs provide genuine benefits, often reporting comparable performance without them. We show that such conclusions stem from limited evaluation settings and do not hold at scale. We conduct a large-scale study of LLM-based TSF (LLM4TSF) across 8 billion observations, 17 forecasting scenarios, 4 horizons, multiple alignment strategies, and both in-domain and out-of-domain settings. Our results demonstrate that LLM4TSF indeed improves forecasting performance, with especially large gains in cross-domain generalization; pre-alignment outperforms post-alignment in over 90% of tasks. The pretrained knowledge and the architecture of LLMs both contribute and play complementary roles: pretraining is critical under distribution shifts, while the architecture excels at modeling complex temporal dynamics. Moreover, under large-scale mixed distributions, a fully intact LLM becomes indispensable, as confirmed by token-level routing analysis and prompt-based improvements. Overall, our findings overturn prior negative assessments, establish clear conditions under which LLMs are genuinely useful, and provide practical guidance for effective model design. We release our code at https://github.com/EIT-NLP/LLM4TSF.
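The abstract's central comparison is between pre-alignment (adapting the numeric series into the LLM's representation space before the backbone runs) and post-alignment (aligning the backbone's output to the forecast afterwards). As a rough, hypothetical illustration only, not the paper's actual code: the backbone stand-in, dimensions, and names (`llm_block`, `W_in`, `W_out`) are all assumptions, the two placements can be sketched as:

```python
import numpy as np

LOOKBACK, HIDDEN, HORIZON = 16, 8, 4    # assumed toy sizes for illustration

rng = np.random.default_rng(0)
W_llm = rng.standard_normal((HIDDEN, HIDDEN))   # frozen "LLM backbone" weights
W_in = rng.standard_normal((LOOKBACK, HIDDEN))  # input-side alignment layer
W_out = rng.standard_normal((HIDDEN, HORIZON))  # output head

def llm_block(h):
    """Stand-in for a frozen pretrained backbone: a fixed nonlinear map."""
    return np.tanh(h @ W_llm)

def pre_alignment(series):
    """Map raw observations into the backbone's representation space
    *before* it runs, then decode the forecast from its output."""
    return llm_block(series @ W_in) @ W_out

def post_alignment(series):
    """Feed the raw series (crudely truncated to the hidden size) straight
    into the backbone, aligning only its output *afterwards*."""
    return llm_block(np.resize(series, HIDDEN)) @ W_out

series = rng.standard_normal(LOOKBACK)  # toy lookback window
pre_fc = pre_alignment(series)
post_fc = post_alignment(series)
print(pre_fc.shape, post_fc.shape)      # both are HORIZON-length forecasts
```

The sketch only makes the placement difference concrete: in pre-alignment the backbone sees inputs already projected into its own space, while in post-alignment it consumes raw values and correction happens downstream, which matches the intuition for why pre-alignment would let pretrained representations help more.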
Problem

Research questions and friction points this paper is trying to address.

large language models
time series forecasting
cross-domain generalization
model evaluation
distribution shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM4TSF
time series forecasting
cross-domain generalization
pre-alignment
large-scale evaluation