Enhanced Generative Data Augmentation for Semantic Segmentation via Stronger Guidance

📅 2024-09-09
🏛️ International Conference on Pattern Recognition Applications and Methods
📈 Citations: 1
Influential: 0
🤖 AI Summary
High pixel-level annotation cost and the lack of semantic diversity and structural controllability in conventional data augmentation pose significant challenges in semantic segmentation. To address these, this paper proposes a generative data augmentation method based on controllable diffusion models. Our key contributions are: (1) the novel introduction of Class-Prompt Appending and Visual Prior Blending mechanisms, which jointly enable class-semantic guidance and faithful preservation of original image structure; and (2) a class-balanced sampling strategy to mitigate long-tail label distribution bias. Extensive experiments on PASCAL VOC demonstrate substantial improvements in segmentation accuracy. The generated samples exhibit high semantic consistency, sharp structural fidelity, and balanced class distribution. The source code is publicly available.

📝 Abstract
Data augmentation is crucial for pixel-wise annotation tasks like semantic segmentation, where labeling requires significant effort and intensive labor. Traditional methods, involving simple transformations such as rotations and flips, create new images but often lack diversity along key semantic dimensions and fail to alter high-level semantic properties. To address this issue, generative models have emerged as an effective solution for augmenting data by generating synthetic images. Controllable generative models offer data augmentation for semantic segmentation by using prompts and visual references from the original image. However, these models struggle to generate synthetic images that accurately reflect the content and structure of the original image, because creating effective prompts and visual references is difficult. In this work, we introduce an effective data augmentation pipeline for semantic segmentation using a controllable diffusion model. Our proposed method includes efficient prompt generation using Class-Prompt Appending and Visual Prior Blending to enhance attention to labeled classes in real images, allowing the pipeline to generate a precise number of augmented images while preserving the structure of segmentation-labeled classes. In addition, we implement a class-balancing algorithm to ensure a balanced training dataset when merging the synthetic and original images. Evaluated on the PASCAL VOC dataset, our pipeline demonstrates its effectiveness in generating high-quality synthetic images for semantic segmentation. Our code is available at https://github.com/chequanghuy/Enhanced-Generative-Data-Augmentation-for-Semantic-Segmentation-via-Stronger-Guidance.
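As a rough illustration of the two guidance mechanisms named in the abstract, a minimal sketch follows. The comma-joined prompt template and the single global blending weight are assumptions for illustration, not the paper's exact implementation:

```python
def append_class_prompt(base_prompt, labeled_classes):
    # Class-Prompt Appending (sketch): append the segmentation-labeled
    # class names to the caption so the diffusion model attends to them.
    # The comma-joined template is an assumed format.
    return base_prompt + ", " + ", ".join(labeled_classes)

def blend_visual_prior(prior_pixels, generated_pixels, alpha=0.5):
    # Visual Prior Blending (sketch): mix the original image (the visual
    # prior) with the generated sample to preserve the structure of the
    # labeled classes. A single global alpha is a simplification.
    return [alpha * p + (1.0 - alpha) * g
            for p, g in zip(prior_pixels, generated_pixels)]
```

For example, `append_class_prompt("a photo of a street", ["car", "person"])` yields a prompt that explicitly names both labeled classes, and a larger `alpha` in `blend_visual_prior` keeps the output closer to the original image.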
Problem

Research questions and friction points this paper is trying to address.

Improving data diversity in semantic segmentation augmentation
Enhancing generative model accuracy for original image content
Balancing class distribution in synthetic and real datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Controllable Diffusion model for data augmentation
Class-Prompt Appending and Visual Prior Blending
Class balancing algorithm for dataset equilibrium
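A minimal sketch of the class-balancing idea listed above. The fill-to-the-maximum quota rule here is an assumption for illustration; the paper's algorithm may differ:

```python
def synthetic_quota_per_class(real_class_counts):
    # Class-balancing sketch: request enough synthetic images per class
    # to bring every class up to the most frequent one, so the merged
    # real + synthetic dataset has a roughly uniform label distribution.
    target = max(real_class_counts.values())
    return {cls: target - n for cls, n in real_class_counts.items()}
```

For instance, with 500 "person" images, 100 "cow" images, and 40 "sofa" images, this sketch asks for 460 synthetic "sofa" and 400 synthetic "cow" samples, and none for "person".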
Quang-Huy Che
University of Information Technology, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
Duc-Tri Le
University of Information Technology, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
Vinh-Tiep Nguyen
University of Information Technology, VNU-HCMC
Deep Learning · Computer Vision · Information Retrieval