Data augmentation with automated machine learning: approaches and performance comparison with classical data augmentation methods

📅 2024-03-13

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Designing data augmentation by hand is labor-intensive, because it relies on time-consuming trial and error to choose candidate transformations and their hyperparameters. Method: The paper is a survey of AutoML-based data augmentation. It groups approaches into data manipulation, data integration, and data synthesis, and reviews techniques for the three main subtasks of the augmentation process: search space design, hyperparameter optimization, and model evaluation. Contribution/Results: It compares the performance of automated data augmentation techniques with state-of-the-art methods built on classical augmentation, and finds that AutoML-based methods currently outperform the conventional approaches.

📝 Abstract

Data augmentation is arguably the most important regularization technique commonly used to improve the generalization performance of machine learning models. It primarily involves applying appropriate data transformation operations to create new data samples with desired properties. Despite its effectiveness, the process is often challenging because of the time-consuming trial-and-error procedures required to create and test different candidate augmentations and their hyperparameters manually. Automated data augmentation methods aim to automate this process, and state-of-the-art approaches typically rely on automated machine learning (AutoML) principles. This work presents a comprehensive survey of AutoML-based data augmentation techniques. We discuss various approaches for accomplishing data augmentation with AutoML, including data manipulation, data integration, and data synthesis techniques. We present an extensive discussion of techniques for realizing each of the major subtasks of the data augmentation process: search space design, hyperparameter optimization, and model evaluation. Finally, we carry out an extensive comparison and analysis of the performance of automated data augmentation techniques and state-of-the-art methods based on classical augmentation approaches. The results show that AutoML methods for data augmentation currently outperform state-of-the-art techniques based on conventional approaches.
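The three subtasks named in the abstract (search space design, hyperparameter optimization, and model evaluation) can be illustrated with a small policy-search loop. This is a minimal sketch under stated assumptions, not a method taken from the survey: the toy 1-D operations (`add_noise`, `scale`, `roll`), the discretized probability and magnitude grids, the use of plain random search, and the placeholder objective `toy_evaluate` are all hypothetical. In real use, `evaluate` would train a model on data augmented with the candidate policy and return a validation score.

```python
import random
from typing import Callable, Dict, List, Tuple

Signal = List[float]
# An operation maps (sample, magnitude in [0, 1], rng) to a new sample.
Op = Callable[[Signal, float, random.Random], Signal]
# A policy is a sequence of (operation name, application probability, magnitude).
Policy = List[Tuple[str, float, float]]


def add_noise(x: Signal, magnitude: float, rng: random.Random) -> Signal:
    # Additive Gaussian noise whose standard deviation equals the magnitude.
    return [v + rng.gauss(0.0, magnitude) for v in x]


def scale(x: Signal, magnitude: float, rng: random.Random) -> Signal:
    # Multiply every value by (1 + magnitude); rng is unused but kept for a uniform signature.
    return [v * (1.0 + magnitude) for v in x]


def roll(x: Signal, magnitude: float, rng: random.Random) -> Signal:
    # Circularly shift the sample by a fraction `magnitude` of its length.
    if not x:
        return list(x)
    k = int(magnitude * len(x)) % len(x)
    return x[k:] + x[:k]


# Search space design (hypothetical): candidate operations plus discretized grids.
OPS: Dict[str, Op] = {"noise": add_noise, "scale": scale, "roll": roll}
PROBS = [i / 10 for i in range(11)]      # 0.0, 0.1, ..., 1.0
MAGNITUDES = [i / 9 for i in range(10)]  # 10 evenly spaced values in [0, 1]


def sample_policy(rng: random.Random, length: int = 2) -> Policy:
    # Draw one candidate policy uniformly at random from the search space.
    return [(rng.choice(list(OPS)), rng.choice(PROBS), rng.choice(MAGNITUDES))
            for _ in range(length)]


def apply_policy(x: Signal, policy: Policy, rng: random.Random) -> Signal:
    # Apply each operation in order, each one only with its own probability.
    out = list(x)
    for name, prob, magnitude in policy:
        if rng.random() < prob:
            out = OPS[name](out, magnitude, rng)
    return out


def random_search(evaluate: Callable[[Policy], float], n_trials: int = 20,
                  seed: int = 0) -> Tuple[Policy, float]:
    # Hyperparameter optimization by plain random search. `evaluate` is the
    # model-evaluation step and returns a score where higher is better.
    rng = random.Random(seed)
    best_policy: Policy = []
    best_score = float("-inf")
    for _ in range(n_trials):
        policy = sample_policy(rng)
        score = evaluate(policy)
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy, best_score


def toy_evaluate(policy: Policy) -> float:
    # Placeholder objective (hypothetical): favors magnitudes near 0.5,
    # weighted by how often each operation is applied.
    return -sum(abs(m - 0.5) * p for _, p, m in policy)


best, score = random_search(toy_evaluate, n_trials=50, seed=42)
augmented = apply_policy([0.0, 1.0, 2.0, 3.0], best, random.Random(1))
```

Random search is shown only because it is the simplest optimizer to write down. Replacing it with a stronger search strategy changes only `random_search`, while the search space and the evaluation step stay the same.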

Problem

Research questions and friction points this paper is trying to address.

Automatic Data Augmentation

Machine Learning Performance

Comparison with Traditional Methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

AutoML

Data Augmentation

Performance Comparison

🔎 Similar Papers

A Comprehensive Survey on Data Augmentation