IAD-Unify: A Region-Grounded Unified Model for Industrial Anomaly Segmentation, Understanding, and Generation

📅 2026-04-14
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the need in industrial quality inspection for a unified framework that jointly supports defect localization, natural-language explanation, and controllable image generation, which existing methods do not provide together. We propose a dual-encoder architecture in which a frozen DINOv2-based region expert extracts regional anomaly evidence and passes it via lightweight token injection into a shared Qwen3.5-4B vision-language backbone. A single model thereby performs anomaly segmentation, region-grounded understanding, and mask-guided generation. The core mechanism is region grounding: removing it degrades location accuracy by more than 76 percentage points, and region-grounded generation achieves the best full-image fidelity and masked-region perceptual quality. Experiments on the newly constructed Anomaly-56K multi-task benchmark and on the MMAD benchmark, including categories unseen during training, show strong cross-category generalization.

📝 Abstract
Real-world industrial inspection requires not only localizing defects, but also explaining them in natural language and generating controlled defect edits. However, existing approaches fail to jointly support all three capabilities within a unified framework and evaluation protocol. We propose IAD-Unify, a dual-encoder unified framework in which a frozen DINOv2-based region expert supplies precise anomaly evidence to a shared Qwen3.5-4B vision-language backbone via lightweight token injection, jointly enabling anomaly segmentation, region-grounded understanding, and mask-guided generation. To enable unified evaluation, we further construct Anomaly-56K, a comprehensive multi-task IAD evaluation platform spanning 59,916 images across 24 categories and 104 defect variants. Controlled ablations yield four findings: (i) region grounding is the decisive mechanism for understanding, and removing it degrades location accuracy by >76 pp; (ii) predicted-region performance closely matches oracle-region performance, confirming deployment viability; (iii) region-grounded generation achieves the best full-image fidelity and masked-region perceptual quality; and (iv) pre-initialized joint training improves understanding at negligible generation cost (-0.16 dB). IAD-Unify further achieves strong performance on the MMAD benchmark, including categories unseen during training, demonstrating robust cross-category generalization.
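
The abstract does not spell out how token injection works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that region features from the frozen encoder are mapped to the backbone's embedding width by a linear projection and spliced into the backbone's token sequence at a fixed position. The function names, the dimensions, and the insertion position are all hypothetical.

```python
import random


def linear_project(features, weight):
    """Map each region feature (length d_region) to the backbone width (d_model).

    `weight` holds d_model rows of length d_region (an assumed projection,
    standing in for whatever adapter the real model uses).
    """
    return [
        [sum(f * w for f, w in zip(feature, row)) for row in weight]
        for feature in features
    ]


def inject_region_tokens(backbone_tokens, region_features, weight, insert_at):
    """Splice projected region tokens into the backbone token sequence."""
    region_tokens = linear_project(region_features, weight)
    return backbone_tokens[:insert_at] + region_tokens + backbone_tokens[insert_at:]


# Toy sizes chosen for illustration only; they do not reflect the real models.
D_REGION, D_MODEL = 4, 6
rng = random.Random(0)
weight = [[rng.uniform(-1.0, 1.0) for _ in range(D_REGION)] for _ in range(D_MODEL)]

backbone_tokens = [[0.0] * D_MODEL for _ in range(5)]   # stand-in backbone tokens
region_features = [[1.0] * D_REGION for _ in range(2)]  # stand-in frozen-encoder output

fused = inject_region_tokens(backbone_tokens, region_features, weight, insert_at=3)
print(len(fused), len(fused[0]))  # 7 6
```

In this sketch the region encoder stays untouched and only the projection would be trainable, which is one way to read "lightweight"; the paper itself should be consulted for the actual design.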
Problem

Research questions and friction points this paper is trying to address.

industrial anomaly segmentation
region-grounded understanding
mask-guided generation
unified framework
anomaly detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

region-grounded understanding
unified anomaly segmentation
mask-guided generation
dual-encoder framework
token injection