A Survey on Mixup Augmentations and Beyond

📅 2024-09-08
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
Addressing the lack of systematic analysis of Mixup regularization and the theory-practice disconnect under label-scarce regimes, this work proposes the first modular unified training framework that refactors Mixup variants into data-mixing, label-interpolation, and stability-optimization modules. It systematically categorizes and empirically evaluates these methods across diverse vision downstream tasks and data modalities, including image, text, and speech. The analysis delineates theoretical applicability boundaries (e.g., limitations of the convex-combination assumption) and practical bottlenecks (e.g., performance degradation induced by cross-modal heterogeneity). As a key contribution, the work introduces the first comprehensive taxonomy of Mixup methods and publicly releases Awesome-Mixup, an open-source curated online repository providing reusable algorithmic patterns and actionable implementation guidelines for researchers and practitioners.

📝 Abstract
As Deep Neural Networks have achieved thrilling breakthroughs in the past decade, data augmentations have garnered increasing attention as regularization techniques when massive labeled data are unavailable. Among existing augmentations, Mixup and related data-mixing methods that convexly combine selected samples and their corresponding labels are widely adopted because they yield high performance by generating data-dependent virtual data while migrating easily to various domains. This survey presents a comprehensive review of foundational mixup methods and their applications. We first elaborate on the training pipeline with mixup augmentations as a unified framework composed of modules; this reformulated framework can accommodate various mixup methods and gives intuitive operational procedures. Then, we systematically investigate the applications of mixup augmentations to vision downstream tasks and various data modalities, as well as some analyses and theorems of mixup. Meanwhile, we summarize the current status and limitations of mixup research and point out directions for future work on effective and efficient mixup augmentations. This survey can provide researchers with the current state of the art in mixup methods and offer insights and guidance in the mixup arena. An online project accompanying this survey is available at https://github.com/Westlake-AI/Awesome-Mixup.
Problem

Research questions and friction points this paper is trying to address.

Surveying Mixup and data-mixing methods for data augmentation
Exploring applications of mixup in vision tasks and data modalities
Analyzing current limitations and future directions for mixup research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixup combines samples and labels convexly
Mixup generates data-dependent virtual data
Mixup easily migrates to various domains
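The convex combination described above can be sketched in a few lines. The following is an illustrative NumPy version, not code from the Awesome-Mixup repository; the function name `mixup` and its signature are chosen here for clarity:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Convexly combine two samples and their one-hot labels.

    lam is drawn from Beta(alpha, alpha), as in the original Mixup
    formulation; the virtual sample and soft label use the same lam.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2       # data-dependent virtual sample
    y = lam * y1 + (1.0 - lam) * y2       # interpolated soft label
    return x, y, lam
```

Because the operation touches only the inputs and labels, not the model, it migrates to any domain where samples can be linearly interpolated, which is one reason mixup transfers so easily across modalities.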
Xin Jin
School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China; School of Artificial Intelligence, Chongqing Technology and Business University, Chongqing, China
Hongyu Zhu
School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China; School of Artificial Intelligence, Chongqing Technology and Business University, Chongqing, China
Siyuan Li
School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China
Zedong Wang
The Hong Kong University of Science and Technology (HKUST)
Deep Learning, Computer Vision, Multi-task Learning
Zicheng Liu
School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China
Chang Yu
School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China
Huafeng Qin
Chongqing Technology and Business University
Biometrics (e.g., vein, face, and gait), computer vision, and machine learning
Stan Z. Li
School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China