🤖 AI Summary
To address the modeling and evaluation challenges posed by heterogeneous data (multi-frequency sampling, high dimensionality, and multimodality) in large time-series models (LTSMs), this paper introduces a unified toolbox and benchmark platform for time-series forecasting. Methodologically, it decouples and jointly evaluates the full training stack: preprocessing, tokenization, prompt learning, training paradigms, and data diversity. It further examines a Transformer-based autoregressive architecture, multi-granularity tokenization, instruction-style prompting, and cross-frequency/dimension adaptation techniques. The core contribution lies in systematically uncovering strong synergistic effects among design choices and identifying an effective configuration, which improves zero-shot and few-shot generalization across multiple standard benchmarks, outperforming both state-of-the-art LTSMs and conventional time-series models.
📝 Abstract
Time Series Forecasting (TSF) has long been a challenge in time series analysis. Inspired by the success of Large Language Models (LLMs), researchers are now developing Large Time Series Models (LTSMs), universal transformer-based models that use autoregressive prediction, to improve TSF. However, training LTSMs on heterogeneous time series data poses unique challenges, including diverse frequencies, dimensions, and patterns across datasets. Recent endeavors have examined various design choices aimed at enhancing LTSM training and generalization capabilities, but these design choices are typically studied in isolation and are not benchmarked collectively. In this work, we introduce LTSM-Bundle, a comprehensive toolbox and benchmark for training LTSMs, spanning pre-processing techniques, model configurations, and dataset configurations. It modularizes and benchmarks LTSMs along multiple dimensions, encompassing prompting strategies, tokenization approaches, training paradigms, base model selection, data quantity, and dataset diversity. Furthermore, we combine the most effective design choices identified in our study. Empirical results demonstrate that this combination achieves superior zero-shot and few-shot performance compared to state-of-the-art LTSMs and traditional TSF methods on benchmark datasets.
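To make the two core ideas in the abstract concrete, here is a minimal, generic sketch (not LTSM-Bundle's actual API; all function names are hypothetical) of patch-style tokenization at two granularities and a step-by-step autoregressive forecasting loop, using a toy trend model in place of a trained transformer:

```python
# Hypothetical illustration only: the real LTSM-Bundle tokenizers and models
# are configurable modules; this sketch just shows the underlying concepts.

def patch_tokenize(series, patch_len, stride):
    """Slice a univariate series into (possibly overlapping) patches ("tokens")."""
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, stride)]

def autoregressive_forecast(history, model, horizon):
    """Roll the model forward one step at a time, feeding predictions back in."""
    context = list(history)
    preds = []
    for _ in range(horizon):
        next_val = model(context)   # model maps context -> next value
        preds.append(next_val)
        context.append(next_val)    # prediction becomes part of the context
    return preds

series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

# Multi-granularity tokenization: coarse non-overlapping vs. fine overlapping patches.
coarse = patch_tokenize(series, patch_len=4, stride=4)
fine = patch_tokenize(series, patch_len=2, stride=1)

# Toy stand-in "model": last value plus the most recent first difference.
naive_trend = lambda ctx: ctx[-1] + (ctx[-1] - ctx[-2])
forecast = autoregressive_forecast(series, naive_trend, horizon=3)  # → [8.0, 9.0, 10.0]
```

In a real LTSM, the patches would be embedded and fed to a transformer backbone, and the prompting and training-paradigm choices benchmarked in the paper would govern how that backbone is conditioned and fine-tuned.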