AI Summary
This work addresses the underutilized computational potential of time series foundation models (TSFMs) during inference, where standard sampling strategies often fail to adhere to scaling laws and suffer from insufficient exploration. The authors propose a parameter-free, diversity-aware inference augmentation mechanism that enhances predictive performance under a fixed computational budget by expanding the support of the generation distribution through tailored time series perturbations. A theoretical analysis elucidates the trade-off between diversity and fidelity, yielding a critical sample threshold beyond which diversified sampling is guaranteed to outperform standard sampling. To assess upper-bound performance, the study introduces a RobustMSE metric. Extensive experiments across multiple TSFMs and datasets demonstrate the method's effectiveness, significantly improving inference quality and establishing inference design as a crucial dimension for efficiently optimizing TSFMs.
Abstract
The advancement of Time Series Foundation Models (TSFMs) has been driven primarily by large-scale pre-training, but their inference-time compute potential remains largely untapped. This work systematically investigates two questions: how do TSFMs behave under standard sampling-based inference scaling, and can controlled sampling diversity enhance performance? We first show that TSFMs under standard sampling often fail to adhere to scaling laws due to insufficient exploration of the solution space. Building on this, we investigate diversified inference scaling via tailored time series perturbations that expand the support of the generative distribution. We theoretically analyze the diversity-fidelity trade-off and derive a critical sample threshold beyond which diversified sampling outperforms standard sampling. Extensive experiments across various TSFMs and datasets show that proper diversified inference scaling yields substantial performance gains without parameter updates, establishing inference design as a critical, compute-efficient dimension of TSFM optimization. As an application, we propose RobustMSE, a rigorous metric that quantifies the performance headroom of a TSFM under a fixed budget. Overall, our findings clarify how these factors interact, enabling reliable performance gains through diverse, large-scale parallel inference without re-training TSFMs.
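The abstract describes two ingredients: widening the generative distribution's support by perturbing the input series before sampling, and a headroom metric over the resulting sample pool. The sketch below illustrates one plausible reading, under loud assumptions: the paper's tailored perturbations are not specified here, so the code uses simple Gaussian input noise as a stand-in, and `robust_mse` assumes RobustMSE is the best (minimum) MSE attainable among the N sampled trajectories under the fixed budget. The `model` callable and all parameter names are hypothetical.

```python
import numpy as np

def diversified_forecasts(model, history, n_samples, noise_scale=0.05, rng=None):
    """Sketch of diversity-aware inference: perturb the input series before
    each forward pass to widen the support of the sampling distribution.
    Gaussian noise is an illustrative stand-in for the paper's tailored
    time series perturbations.
    """
    rng = np.random.default_rng() if rng is None else rng
    forecasts = []
    for _ in range(n_samples):
        # Perturbation scaled to the series' own variability (assumption).
        perturbed = history + rng.normal(0.0, noise_scale * history.std(),
                                         size=history.shape)
        forecasts.append(model(perturbed))
    return np.stack(forecasts)  # shape: (n_samples, horizon)

def robust_mse(forecasts, target):
    """Assumed reading of RobustMSE: the minimum per-sample MSE across the
    pool, i.e. the performance headroom reachable within the budget."""
    per_sample_mse = ((forecasts - target) ** 2).mean(
        axis=tuple(range(1, forecasts.ndim)))
    return per_sample_mse.min()
```

By construction this headroom value can only improve (never worsen) as the sample budget grows, which is consistent with the abstract's framing of a critical sample count beyond which diversified sampling pays off.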