Online Data Augmentation for Forecasting with Deep Learning

📅 2024-04-25
📈 Citations: 3
Influential: 0
🤖 AI Summary
To address insufficient training data in deep learning forecasting on datasets of multiple univariate time series, this paper proposes an online data augmentation framework embedded within the training loop. Synthetic samples are generated dynamically and paired with their real counterparts within each mini-batch. This keeps the proportion of real and synthetic data balanced in every batch and avoids the storage overhead of offline augmentation. The method is evaluated with three backbone architectures (TCN, LSTM, and Informer) and seven synthesis techniques, including TS-TCC, GAN-based generation, and diffusion-inspired approaches, together with an adaptive online sampling scheduling strategy. On six benchmark datasets comprising 3,797 time series, the framework reports an 8.2% reduction in MASE and a 6.7% reduction in MAE relative to no-augmentation and offline-augmentation baselines, indicating improved accuracy, generalization, and robustness when training data is limited.
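The batch-imbalance argument can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' code: it assumes one synthetic copy per real sample, and the function names are made up for illustration. Offline, the enlarged pool is shuffled and cut into mini-batches, so the synthetic share varies from batch to batch. Online, each batch is built from real samples plus their freshly generated counterparts, so the share is fixed at one half.

```python
import random
from typing import List


def fractions_per_batch(is_synthetic: List[bool], batch_size: int) -> List[float]:
    """Share of synthetic samples (True entries) in each consecutive mini-batch."""
    out = []
    for start in range(0, len(is_synthetic), batch_size):
        chunk = is_synthetic[start:start + batch_size]
        out.append(sum(chunk) / len(chunk))
    return out


def offline_fractions(n_real: int, batch_size: int, rng: random.Random) -> List[float]:
    # Offline (assumed setup): one synthetic copy per real sample is created up front,
    # the enlarged pool is shuffled, and mini-batches are cut from it.
    pool = [False] * n_real + [True] * n_real
    rng.shuffle(pool)
    return fractions_per_batch(pool, batch_size)


def online_fractions(n_real: int, batch_size: int) -> List[float]:
    # Online (assumed setup): each batch takes batch_size // 2 real samples and
    # adds one synthetic counterpart generated for each of them.
    half = batch_size // 2
    labels: List[bool] = []
    for start in range(0, n_real, half):
        n = min(half, n_real - start)
        labels += [False] * n + [True] * n  # n real followed by n synthetic
    return fractions_per_batch(labels, 2 * half)


off = offline_fractions(512, 32, random.Random(0))
on = online_fractions(512, 32)
print(f"offline synthetic share: min={min(off):.2f} max={max(off):.2f}")
print(f"online synthetic share:  min={min(on):.2f} max={max(on):.2f}")
```

Over a full epoch both schemes contain the same number of synthetic samples; the difference is only in how they are distributed across mini-batches, which is what each gradient step sees.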

📝 Abstract
Deep learning approaches are increasingly used to tackle forecasting tasks involving datasets with multiple univariate time series. A key factor in the successful application of these methods is a large enough training sample size, which is not always available. Synthetic data generation techniques can be applied in these scenarios to augment the dataset. Data augmentation is typically applied offline before training a model. However, when training with mini-batches, some batches may contain a disproportionate number of synthetic samples that do not align well with the original data characteristics. This work introduces an online data augmentation framework that generates synthetic samples during the training of neural networks. By creating synthetic samples for each batch alongside their original counterparts, we maintain a balanced representation between real and synthetic data throughout the training process. This approach fits naturally with the iterative nature of neural network training and eliminates the need to store large augmented datasets. We validated the proposed framework using 3797 time series from 6 benchmark datasets, three neural architectures, and seven synthetic data generation techniques. The experiments suggest that online data augmentation leads to better forecasting performance compared to offline data augmentation or no augmentation approaches. The framework and experiments are publicly available.
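The batch-level pairing described above can be sketched as a generator that, for every mini-batch of real windows, creates one synthetic counterpart per window on the fly. This is a minimal pure-Python sketch under stated assumptions, not the authors' framework: the names `online_augmented_batches`, `jitter`, and `scale` are hypothetical, and jittering and scaling merely stand in for whichever of the seven generation techniques is used. Each window is assumed to hold both the input lags and the forecast horizon, so a transformation is applied to inputs and targets together.

```python
import random
from typing import Callable, Iterator, List, Sequence

Window = List[float]  # assumption: one window = input lags followed by horizon values


def jitter(x: Window, rng: random.Random, sigma: float = 0.05) -> Window:
    """Hypothetical augmenter: add Gaussian noise to every point."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def scale(x: Window, rng: random.Random, sigma: float = 0.1) -> Window:
    """Hypothetical augmenter: multiply the whole window by a random factor near 1."""
    factor = rng.gauss(1.0, sigma)
    return [v * factor for v in x]


def online_augmented_batches(
    windows: Sequence[Window],
    batch_size: int,
    augmenters: Sequence[Callable[[Window, random.Random], Window]],
    rng: random.Random,
) -> Iterator[List[Window]]:
    """Yield mini-batches in which every real window is paired with a fresh synthetic copy.

    Synthetic windows are generated per batch and never stored, so each batch is
    exactly half real (first half) and half synthetic (second half).
    """
    order = list(range(len(windows)))
    rng.shuffle(order)  # shuffle real samples only
    for start in range(0, len(order), batch_size):
        real = [list(windows[i]) for i in order[start:start + batch_size]]
        synthetic = [rng.choice(augmenters)(w, rng) for w in real]
        yield real + synthetic


# Usage: 89 sliding windows of length 12 from a toy seasonal series.
rng = random.Random(42)
series = [float(t % 7) for t in range(100)]
windows = [series[i:i + 12] for i in range(len(series) - 11)]
for batch in online_augmented_batches(windows, 16, [jitter, scale], rng):
    pass  # a forward/backward pass on `batch` would go here
```

Here `batch_size` counts real windows, so each yielded batch holds up to twice that many samples. Because the synthetic windows are regenerated whenever the generator is consumed, every epoch sees new synthetic data and no augmented dataset has to be stored.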
Problem

Research questions and friction points this paper is trying to address.

Deep Learning
Time Series Prediction
Data Scarcity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online Data Augmentation
Real-time Synthetic Data Generation
Mini-batch Training Optimization
Vítor Cerqueira
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal; Laboratory for Artificial Intelligence and Computer Science (LIACC), Portugal
Moisés Santos
Researcher at FEUP
Synthetic Data, Responsible AI, Time Series
Luis Roque
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal; Laboratory for Artificial Intelligence and Computer Science (LIACC), Portugal
Yassine Baghoussi
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal; INESC TEC, Porto, Portugal
Carlos Soares
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal; Laboratory for Artificial Intelligence and Computer Science (LIACC), Portugal; Fraunhofer Portugal AICOS, Portugal