GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing

πŸ“… 2024-12-03
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 3
✨ Influential: 0
πŸ€– AI Summary
To address the limited generalization of conventional data augmentation methods under large source-target domain discrepancies in domain adaptation, this paper proposes GenMix, a prompt-guided generative data augmentation framework. GenMix leverages diffusion models for controllable image editing, integrating conditional prompt engineering with fractal mask fusion to simultaneously enhance intra-domain discriminability and cross-domain robustness while preserving semantic consistency and label fidelity. Its core contribution is the first combined use of prompt-driven controllable editing and fractal mixing for data augmentation. Extensive experiments across eight public benchmarks demonstrate that GenMix consistently outperforms state-of-the-art methods, achieving significant improvements in cross-domain classification, fine-grained recognition, self-supervised pretraining, few-shot learning, and adversarial robustness tasks.

πŸ“ Abstract
Data augmentation is widely used to enhance generalization in visual classification tasks. However, traditional methods struggle when source and target domains differ, as in domain adaptation, due to their inability to address domain gaps. This paper introduces GenMix, a generalizable prompt-guided generative data augmentation approach that enhances both in-domain and cross-domain image classification. Our technique leverages image editing to generate augmented images based on custom conditional prompts, designed specifically for each problem type. By blending portions of the input image with its edited generative counterpart and incorporating fractal patterns, our approach mitigates unrealistic images and label ambiguity, improving the performance and adversarial robustness of the resulting models. The efficacy of our method is established with extensive experiments on eight public datasets for general and fine-grained classification, in both in-domain and cross-domain settings. Additionally, we demonstrate performance improvements for self-supervised learning, learning with data scarcity, and adversarial robustness. Compared to the existing state-of-the-art methods, our technique achieves stronger performance across the board.
Problem

Research questions and friction points this paper is trying to address.

Enhancing image classification across domain gaps
Improving adversarial robustness with generative data augmentation
Addressing data scarcity in visual classification tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages image editing with custom prompts
Blends input images with generative counterparts
Incorporates fractal patterns to enhance realism
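The blending step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `fractal_mask` generator below (multi-scale thresholded noise) and the function names are assumptions standing in for whatever fractal-pattern procedure GenMix actually uses, and the diffusion-edited image is taken as a given input.

```python
import numpy as np

def fractal_mask(shape, octaves=4, threshold=0.5, rng=None):
    """Binary mask from multi-scale noise (a stand-in for a fractal pattern)."""
    rng = np.random.default_rng(rng)
    h, w = shape
    acc = np.zeros((h, w))
    for o in range(octaves):
        s = 2 ** o
        # coarse noise at scale s, upsampled by repetition to full resolution
        coarse = rng.random((-(-h // s), -(-w // s)))
        acc += np.kron(coarse, np.ones((s, s)))[:h, :w] / s
    acc = (acc - acc.min()) / (acc.max() - acc.min() + 1e-8)
    return (acc > threshold).astype(np.float32)

def genmix_blend(original, edited, mask):
    """Mix the original image with its generative edit under a binary mask.

    Returns the mixed image and the fraction taken from the edited image,
    which could serve as a label-mixing coefficient (an assumption here).
    """
    m = mask[..., None]                      # broadcast over channels
    mixed = m * edited + (1.0 - m) * original
    lam = float(mask.mean())
    return mixed, lam
```

Thresholded multi-scale noise gives irregular, patchy regions rather than the rectangular crops of CutMix-style mixing, which is one plausible way such a fusion could reduce unrealistic hard edges and label ambiguity.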
Khawar Islam
School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
Muhammad Zaigham Zaheer
Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
Arif Mahmood
Professor of Computer Science at Information Technology University (itu.edu.pk)
Machine Learning · Computer Vision
Karthik Nandakumar
Michigan State University, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)
Trustworthy Machine Learning · Computer Vision · Biometric Recognition · Applied Cryptography
Naveed Akhtar
School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia