Online Data Augmentation for Forecasting with Deep Learning

📅 2024-04-25

📈 Citations: 3

✨ Influential: 0

career value

187K/year

🤖 AI Summary

To address insufficient training data and distribution imbalance in few-shot multivariate time series forecasting, this paper proposes the first online data augmentation framework embedded within the training iteration process: synthetic samples are generated dynamically and paired with real samples within each mini-batch, thereby avoiding distributional shift and storage overhead associated with offline augmentation. The method unifies three backbone architectures—TCN, LSTM, and Informer—and integrates seven synthesis techniques, including TS-TCC, GAN-based generation, and diffusion-inspired approaches, augmented by an adaptive online sampling scheduling strategy. Evaluated on six benchmark datasets comprising 3,797 time series, the framework achieves an 8.2% reduction in MASE and a 6.7% reduction in MAE compared to both no-augmentation and offline-augmentation baselines, demonstrating significant improvements in prediction accuracy, generalization, and robustness under limited-data regimes.

Technology Category

Application Category

📝 Abstract

Deep learning approaches are increasingly used to tackle forecasting tasks involving datasets with multiple univariate time series. A key factor in the successful application of these methods is a large enough training sample size, which is not always available. Synthetic data generation techniques can be applied in these scenarios to augment the dataset. Data augmentation is typically applied offline before training a model. However, when training with mini-batches, some batches may contain a disproportionate number of synthetic samples that do not align well with the original data characteristics. This work introduces an online data augmentation framework that generates synthetic samples during the training of neural networks. By creating synthetic samples for each batch alongside their original counterparts, we maintain a balanced representation between real and synthetic data throughout the training process. This approach fits naturally with the iterative nature of neural network training and eliminates the need to store large augmented datasets. We validated the proposed framework using 3797 time series from 6 benchmark datasets, three neural architectures, and seven synthetic data generation techniques. The experiments suggest that online data augmentation leads to better forecasting performance compared to offline data augmentation or no augmentation approaches. The framework and experiments are publicly available.

Problem

Research questions and friction points this paper is trying to address.

Deep Learning

Time Series Prediction

Data Scarcity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Online Data Augmentation

Real-time Synthetic Data Generation

Small-batch Training Optimization

🔎 Similar Papers

Data Augmentation Policy Search for Long-Term Forecasting