🤖 AI Summary
This work addresses the challenges of facade defect detection, where geometric variability, complex backgrounds, low contrast, compound defects, and scarce pixel-level annotations severely limit model generalization. To overcome these issues, the authors propose FacadeFixer, a multi-agent collaborative framework that formulates defect perception as a cooperative reasoning task. Dedicated detection and segmentation agents handle diverse defect types, while a generative agent decouples complex defects through semantic recombination and synthesizes them onto varied clean textures to produce high-fidelity augmented data with expert masks. This study presents the first integration of multi-agent collaborative reasoning with generative semantic recombination, effectively alleviating annotation scarcity and enhancing generalization. The authors also introduce the first multi-task dataset covering six facade categories with pixel-level annotations, demonstrating significant performance gains over existing methods—particularly in pixel-level detection of structural anomalies.
📝 Abstract
Building facade defect inspection is fundamental to structural health monitoring and sustainable urban maintenance, yet it remains a formidable challenge due to extreme geometric variability, low contrast against complex backgrounds, and the inherent complexity of composite defects (e.g., cracks co-occurring with spalling). Such characteristics lead to severe pixel imbalance and feature ambiguity, which, coupled with the critical scarcity of high-quality pixel-level annotations, hinder the generalization of existing detection and segmentation models. To address gaps, we propose \textit{FacadeFixer}, a unified multi-agent framework that treats defect perception as a collaborative reasoning task rather than isolated recognition. Specifically,\textit{FacadeFixer} orchestrates specialized agents for detection and segmentation to handle multi-type defect interference, working in tandem with a generative agent to enable semantic recomposition. This process decouples intricate defects from noisy backgrounds and realistically synthesizes them onto diverse clean textures, generating high-fidelity augmented data with precise expert-level masks. To support this, we introduce a comprehensive multi-task dataset covering six primary facade categories with pixel-level annotations. Extensive experiments demonstrate that \textit{FacadeFixer} significantly outperforms state-of-the-art (SOTA) baselines. Specifically, it excels in capturing pixel-level structural anomalies and highlights generative synthesis as a robust solution to data scarcity in infrastructure inspection. Our code and dataset will be made publicly available.