🤖 AI Summary
Time-series forecasting demands large-scale data and substantial computational resources, yet dataset distillation is hindered by temporal misalignment caused by strong autocorrelation, and by the absence of diversity-aware priors. To address this, we propose the first lightweight distillation framework tailored for time-series forecasting. Our approach comprises three key innovations: (1) a frequency-domain statistical alignment mechanism that mitigates temporal misalignment and ensures faithful matching between teacher and student models in periodic and trend structures; (2) an information-bottleneck-based inter-sample regularization that explicitly enforces distributional diversity of synthesized trajectories; and (3) a gradient-matching strategy compatible with first-order optimization, enabling efficient and stable training. Evaluated on 20 benchmark datasets, our method achieves an average relative accuracy improvement of 30% while increasing computational overhead by only 2.49%, significantly outperforming existing distillation approaches.
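The gradient-matching idea in point (3) can be illustrated with a toy linear forecaster: the synthetic set is scored by how closely the training gradient it induces matches the gradient from real data. This is a minimal sketch under assumed names and a linear model, not the paper's actual implementation.

```python
import numpy as np

def linear_grad(w, X, y):
    """Gradient of MSE loss for a toy linear forecaster y_hat = X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def gradient_matching_loss(w, X_real, y_real, X_syn, y_syn):
    """Squared distance between gradients induced by real vs. synthetic data.

    Minimizing this w.r.t. (X_syn, y_syn) pushes the small synthetic set to
    reproduce the learning signal of the full dataset -- the core idea of
    gradient-matching dataset condensation, shown here in simplified form.
    """
    g_real = linear_grad(w, X_real, y_real)
    g_syn = linear_grad(w, X_syn, y_syn)
    return float(np.sum((g_real - g_syn) ** 2))
```

Because the objective only needs first-order gradients of a surrogate model, it avoids the bilevel optimization of earlier distillation methods, which is what makes first-order training stable and cheap.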
📝 Abstract
Time-series forecasting is fundamental across many domains, yet training accurate models often requires large-scale datasets and substantial computational resources. Dataset distillation offers a promising alternative by synthesizing compact datasets that preserve the learning behavior of full data. However, extending dataset distillation to time-series forecasting is non-trivial due to two fundamental challenges: (1) temporal bias from strong autocorrelation, which leads to distorted value-term alignment between teacher and student models; and (2) insufficient diversity among synthetic samples, arising from the absence of explicit categorical priors to regularize trajectory variety.
In this work, we propose DDTime, a lightweight and plug-in distillation framework built upon first-order condensation decomposition. To tackle Challenge 1, it revisits value-term alignment through temporal statistics and introduces a frequency-domain alignment mechanism to mitigate autocorrelation-induced bias, ensuring spectral consistency and temporal fidelity. To address Challenge 2, we further design an inter-sample regularization inspired by the information bottleneck principle, which enhances diversity and maximizes information density across synthetic trajectories. The combined objective is theoretically compatible with a wide range of condensation paradigms and supports stable first-order optimization. Extensive experiments on 20 benchmark datasets and diverse forecasting architectures demonstrate that DDTime consistently outperforms existing distillation methods, achieving about 30% relative accuracy gains while introducing about 2.49% computational overhead. All code and distilled datasets will be released.
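The two components described above can be sketched as loss terms: a frequency-domain alignment term comparing amplitude spectra of real and synthetic batches, and an inter-sample diversity penalty. All function names are illustrative stand-ins, assuming batches shaped `(num_series, length)`; the exact DDTime objectives are not reproduced here.

```python
import numpy as np

def frequency_alignment_loss(real, synth):
    """MSE between mean amplitude spectra of real and synthetic batches.

    Matching in the frequency domain compares periodic and trend structure
    directly, sidestepping the autocorrelation-induced bias of pointwise
    alignment in the time domain. Shapes: (num_series, length).
    """
    real_spec = np.abs(np.fft.rfft(real, axis=-1)).mean(axis=0)
    synth_spec = np.abs(np.fft.rfft(synth, axis=-1)).mean(axis=0)
    return float(np.mean((real_spec - synth_spec) ** 2))

def diversity_penalty(synth):
    """Negative mean pairwise squared distance among synthetic trajectories.

    Minimizing this term (it becomes more negative as samples spread out)
    discourages collapsed, near-duplicate trajectories -- a simple proxy
    for the information-bottleneck-style diversity regularizer.
    """
    diffs = synth[:, None, :] - synth[None, :, :]
    return -float(np.mean(np.sum(diffs ** 2, axis=-1)))
```

In practice the two terms would be weighted and added to the base condensation objective; because both are differentiable, the combined loss remains compatible with plain first-order optimizers.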