Salient Concept-Aware Generative Data Augmentation

📅 2025-10-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing generative data augmentation methods struggle to simultaneously preserve image fidelity and diversity under joint vision-language prompting, primarily due to entanglement between image representations and non-essential attributes (e.g., background), which causes conflicts with text prompts. To address this, we propose a saliency-aware image generation framework that explicitly disentangles and suppresses interfering visual attributes via a novel embedding model. Our method integrates vision-language conditional generation with a personalized disentanglement architecture to enable fine-grained, semantically consistent, and controllable image synthesis. Evaluated on eight fine-grained vision datasets, our approach improves classification accuracy by 0.73% on average under standard settings and by 6.5% under long-tail scenarios. It also significantly enhances vision-language alignment and preserves discriminative class-specific features.

📝 Abstract
Recent generative data augmentation methods conditioned on both image and text prompts struggle to balance fidelity and diversity, as it is challenging to preserve essential image details while aligning with varied text prompts. This challenge arises because representations in the synthesis process often become entangled with non-essential input image attributes, such as environmental context, creating conflicts with text prompts intended to modify these elements. To address this, we propose a personalized image generation framework that uses a salient concept-aware image embedding model to reduce the influence of irrelevant visual details during synthesis, thereby maintaining intuitive alignment between image and text inputs. By generating images that better preserve class-discriminative features with additional controlled variations, our framework effectively enhances the diversity of training datasets and thereby improves the robustness of downstream models. Our approach demonstrates superior performance across eight fine-grained vision datasets, outperforming state-of-the-art augmentation methods with average classification accuracy improvements of 0.73% and 6.5% under conventional and long-tail settings, respectively.
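Neither the abstract nor this page gives implementation details, but the mechanism reads as: embed the input image, softly gate out dimensions that encode non-essential context, and condition the generator on the remaining salient embedding together with the text prompt. Below is a minimal PyTorch sketch of that reading; the module and function names (SalientImageEmbedder, build_condition) and the gating design are assumptions for illustration, not the authors' actual architecture.

import torch
import torch.nn as nn

class SalientImageEmbedder(nn.Module):
    """Hypothetical module: projects an image embedding and softly gates out
    dimensions presumed to encode non-essential attributes (e.g., background)."""
    def __init__(self, dim: int = 768):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, image_emb: torch.Tensor) -> torch.Tensor:
        z = self.proj(image_emb)
        # A gate near 1 keeps class-discriminative dimensions; a gate near 0
        # suppresses context dimensions that would conflict with the prompt.
        return z * self.gate(image_emb)

def build_condition(image_emb: torch.Tensor, text_emb: torch.Tensor,
                    embedder: SalientImageEmbedder) -> torch.Tensor:
    # Condition the generator on the salient image embedding plus the text
    # embedding, so prompts can vary context (e.g., background) without
    # fighting entangled image details.
    return torch.cat([embedder(image_emb), text_emb], dim=-1)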
Problem

Research questions and friction points this paper is trying to address.

Balancing fidelity and diversity in generative data augmentation
Preserving essential image details while aligning with varied text prompts
Reducing entanglement of non-essential image attributes during synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a salient concept-aware embedding to suppress irrelevant visual details
Generates images that preserve class-discriminative features with controlled variations
Enhances dataset diversity and downstream model robustness (see the augmentation sketch after this list)
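The reported long-tail gains imply an augmentation loop that generates extra variants for under-represented classes until the class distribution is balanced. A minimal sketch, assuming a generate(image, prompt) callable standing in for the paper's synthesis pipeline (all helper names here are hypothetical):

from collections import Counter

def augment_long_tail(dataset, generate, prompts, target_per_class):
    """Hypothetical sketch: pad each class up to target_per_class examples
    with synthetic images; generate(image, prompt) is assumed to be the
    salient concept-aware synthesis step described above."""
    counts = Counter(label for _, label in dataset)
    synthetic = []
    for image, label in dataset:
        deficit = target_per_class - counts[label]
        if deficit <= 0:
            continue  # class already has enough real examples
        # Ceil division: spread the deficit across the class's real images.
        per_image = (deficit + counts[label] - 1) // counts[label]
        for i in range(per_image):
            prompt = prompts[i % len(prompts)]  # cycle varied context prompts
            synthetic.append((generate(image, prompt), label))
    return list(dataset) + synthetic

In practice one would likely also filter the synthetic images (e.g., by classifier confidence) before adding them to the training set, though the page does not describe such a step.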
👥 Authors
Tianchen Zhao (AWS DS3)
Xuanbai Chen (AWS DS3)
Zhihua Li (Amazon; research interests: Deep Learning, Computer Vision)
Jun Fang (AWS DS3)
Dongsheng An (AWS DS3)
Xiang Xu (AWS DS3)
Zhuowen Tu (Professor, Cognitive Science, Computer Science & Engineering, UC San Diego; research interests: Computer Vision, Machine Learning, Deep Learning, Neural Computation)
Yifan Xing (AWS DS3)