Generate Aligned Anomaly: Region-Guided Few-Shot Anomaly Image-Mask Pair Synthesis for Industrial Inspection

📅 2025-07-13
🤖 AI Summary
Industrial anomaly detection suffers from a severe scarcity of real anomaly samples, which degrades both localization and classification. To address this, we propose a region-guided, few-shot framework for generating anomaly image-mask pairs, built on a pretrained latent diffusion model. The method combines Localized Concept Decomposition with Adaptive Multi-Round Anomaly Clustering to control the type and location of generated anomalies and to improve their semantic consistency. A region-guided mask generation mechanism aligns each synthesized anomaly with its mask, and a low-quality sample filtering step discards unreliable outputs. Experiments on MVTec AD and LOCO show that the generated anomalies are realistic and accurately localized, and that they improve downstream anomaly localization and classification over prior synthesis methods.

📝 Abstract
Anomaly inspection plays a vital role in industrial manufacturing, but the scarcity of anomaly samples significantly limits the effectiveness of existing methods in tasks such as localization and classification. While several anomaly synthesis approaches have been introduced for data augmentation, they often struggle with low realism, inaccurate mask alignment, and poor generalization. To overcome these limitations, we propose Generate Aligned Anomaly (GAA), a region-guided, few-shot anomaly image-mask pair generation framework. GAA leverages the strong priors of a pretrained latent diffusion model to generate realistic, diverse, and semantically aligned anomalies using only a small number of samples. The framework first employs Localized Concept Decomposition to jointly model the semantic features and spatial information of anomalies, enabling flexible control over the type and location of anomalies. It then utilizes Adaptive Multi-Round Anomaly Clustering to perform fine-grained semantic clustering of anomaly concepts, thereby enhancing the consistency of anomaly representations. Subsequently, a region-guided mask generation strategy ensures precise alignment between anomalies and their corresponding masks, while a low-quality sample filtering module is introduced to further improve the overall quality of the generated samples. Extensive experiments on the MVTec AD and LOCO datasets demonstrate that GAA achieves superior performance in both anomaly synthesis quality and downstream tasks such as localization and classification.

Problem

Research questions and friction points this paper is trying to address.

Scarcity of anomaly samples limits industrial inspection effectiveness
Existing anomaly synthesis lacks realism and mask alignment
Need for few-shot generation of realistic aligned anomaly pairs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages pretrained latent diffusion model
Uses Localized Concept Decomposition
Applies Adaptive Multi-Round Anomaly Clustering
Yilin Lu
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China.
Jianghang Lin
Xiamen University
Multimodal Large Language Model, Vision-Language Model, Semi/Weakly-Supervised Learning
Linhuang Xie
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China.
Kai Zhao
VIVO.
Yansong Qu
Purdue University-West Lafayette
Intelligent Transportation, Autonomous Driving
Shengchuan Zhang
Xiamen University
computer vision, machine learning
Liujuan Cao
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China.
Rongrong Ji
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China.