Large-Scale Universal Defect Generation: Foundation Models and Datasets

📅 2026-04-10

🤖 AI Summary
Existing defect generation methods generalize poorly, produce limited realism, and maintain weak category consistency because they rely on few-shot learning and lack large-scale paired data. To address this, the work introduces UDG, a large-scale universal defect dataset of 300,000 quadruplets (normal image, anomalous image, mask, and textual description), along with UniDG, a foundation model that performs cross-domain defect generation without category-specific fine-tuning. UniDG combines a structured diptych input format, adaptive defect cropping, a Defect-Context Editing strategy, and MM-DiT multimodal attention that fuses reference and target conditions. It is trained in two stages, Diversity-SFT followed by Consistency-RFT, to jointly improve diversity, realism, and reference consistency. Experiments on MVTec-AD and VisA show that UniDG outperforms prior few-shot anomaly generation and image insertion/editing baselines in synthesis quality and in downstream single- and multi-class anomaly detection and localization.

📝 Abstract
Existing defect/anomaly generation methods often rely on few-shot learning, which overfits to specific defect categories because large-scale paired defect editing data is lacking. Substantial variations in defect scale and morphology aggravate the problem, resulting in limited generalization, degraded realism, and weak category consistency. We address these challenges by introducing UDG, a large-scale dataset of 300K normal-abnormal-mask-caption quadruplets spanning diverse domains, and by presenting UniDG, a universal defect generation foundation model that supports both reference-based defect generation and text instruction-based defect editing without per-category fine-tuning. UniDG performs Defect-Context Editing via adaptive defect cropping and a structured diptych input format, and fuses reference and target conditions through MM-DiT multimodal attention. A two-stage training strategy, Diversity-SFT followed by Consistency-RFT, further improves diversity while enhancing realism and reference consistency. Extensive experiments on MVTec-AD and VisA show that UniDG outperforms prior few-shot anomaly generation and image insertion/editing baselines in synthesis quality and in downstream single- and multi-class anomaly detection/localization. Code will be available at https://github.com/RetoFan233/UniDG.
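
The abstract names adaptive defect cropping and a diptych input but does not give their details. The following is a minimal sketch of one plausible reading, not the authors' implementation: a square crop window is sized in proportion to the defect's bounding box (so small and large defects occupy a similar share of the crop), and a reference crop and a target crop are placed side by side as one two-panel image. The function names, the `context_ratio` and `min_size` parameters, and the left/right panel order are all assumptions made for illustration.

```python
import numpy as np


def adaptive_defect_crop(mask, context_ratio=2.0, min_size=32):
    """Return (top, left, size) of a square window centred on the defect.

    Hypothetical helper: the window side scales with the defect's bounding
    box, is at least `min_size`, and is clamped to stay inside the image.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no defect pixels")
    h, w = mask.shape
    y0, y1 = int(ys.min()), int(ys.max()) + 1
    x0, x1 = int(xs.min()), int(xs.max()) + 1

    # Scale the window with the defect extent, then clamp to valid sizes.
    side = max(y1 - y0, x1 - x0)
    size = max(int(round(side * context_ratio)), min_size)
    size = min(size, h, w)

    # Centre on the bounding box, shifting the window back inside the image.
    cy, cx = (y0 + y1) // 2, (x0 + x1) // 2
    top = int(np.clip(cy - size // 2, 0, h - size))
    left = int(np.clip(cx - size // 2, 0, w - size))
    return top, left, size


def make_diptych(reference_crop, target_crop):
    """Concatenate two equally sized crops horizontally (assumed panel order:
    reference on the left, target on the right)."""
    if reference_crop.shape != target_crop.shape:
        raise ValueError("crops must share a shape; resize them first")
    return np.concatenate([reference_crop, target_crop], axis=1)


# Toy example: a 10x10 defect inside a 100x100 image.
mask = np.zeros((100, 100), dtype=np.uint8)
mask[40:50, 60:70] = 1
top, left, size = adaptive_defect_crop(mask)

reference = np.random.rand(100, 100, 3)  # stand-in for a defect reference image
target = np.random.rand(100, 100, 3)     # stand-in for a normal target image
diptych = make_diptych(
    reference[top:top + size, left:left + size],
    target[top:top + size, left:left + size],
)
```

In practice the reference and target crops would come from different images with different windows and would need resizing to a common resolution before being joined; the toy example reuses one window only to keep the sketch short.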
Problem

Research questions and friction points this paper is trying to address.

defect generation
anomaly synthesis
few-shot learning
generalization
realism
Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation model
defect generation
multimodal attention
large-scale dataset
reference-based editing