🤖 AI Summary
Time series foundation models (TSFMs) often suffer from overfitting and fail to fully harness their multi-scale forecasting capabilities during downstream fine-tuning.
Method: This paper proposes MSFT, a causality-driven multi-scale fine-tuning framework. It introduces causal analysis to TSFM fine-tuning for the first time, uncovering confounding bias induced by single-scale training. MSFT employs a lightweight joint multi-scale optimization paradigm—using multi-scale chunked inputs, shared-weight encoding, and scale-aware loss functions—to enable cross-scale temporal pattern co-learning without altering model architecture or increasing parameters.
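The joint multi-scale optimization idea can be illustrated with a minimal sketch. Note this is an illustrative reconstruction, not the paper's implementation: the `downsample` pooling, the choice of scales, and the `1/s` loss weights are all assumptions; the actual MSFT scale-aware loss and chunking scheme are defined in the paper.

```python
import numpy as np

def downsample(x, scale):
    """Average-pool a 1D series over non-overlapping windows of length `scale`."""
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

def multiscale_loss(forecast_fn, context, target, scales=(1, 2, 4), weights=None):
    """Hypothetical scale-aware objective: MSE at each sampling scale, combined
    as a weighted sum. `forecast_fn` stands in for the shared-weight TSFM
    encoder applied to every scale (no extra parameters per scale)."""
    if weights is None:
        # Assumed weighting: coarser scales contribute less to the total loss.
        weights = [1.0 / s for s in scales]
    total = 0.0
    for s, w in zip(scales, weights):
        ctx_s, tgt_s = downsample(context, s), downsample(target, s)
        pred_s = forecast_fn(ctx_s)  # same model weights reused at each scale
        total += w * np.mean((pred_s - tgt_s) ** 2)
    return total
```

During fine-tuning, this combined loss would replace the single-scale forecasting loss, so gradients from all scales update the same shared encoder weights in one backward pass.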
Results: Evaluated on encoder-based TSFMs including Moirai, Moment, and UniTS, MSFT significantly outperforms naive fine-tuning, parameter-efficient fine-tuning (PEFT) methods, and state-of-the-art time-series models in both zero-shot transfer and few-shot forecasting tasks. It comprehensively enhances multi-scale modeling efficacy and generalization capability.
📝 Abstract
Time series foundation models (TSFMs) demonstrate impressive zero-shot performance for time series forecasting. However, an important yet underexplored challenge is how to effectively finetune TSFMs on specific downstream tasks. While naive finetuning can yield performance gains, we argue that it falls short of fully leveraging TSFMs' capabilities, often resulting in overfitting and suboptimal performance. Given the diverse temporal patterns across sampling scales and the inherent multi-scale forecasting capabilities of TSFMs, we adopt a causal perspective to analyze the finetuning process, through which we highlight the critical importance of explicitly modeling multiple scales and reveal the shortcomings of naive approaches. Focusing on *encoder-based* TSFMs, we propose **M**ulti**s**cale **f**ine**t**uning (**MSFT**), a simple yet general framework that explicitly integrates multi-scale modeling into the finetuning process. Experimental results on three different backbones (Moirai, Moment, and UniTS) demonstrate that TSFMs finetuned with MSFT not only outperform naive and typical parameter-efficient finetuning methods but also surpass state-of-the-art deep learning methods.