🤖 AI Summary
This work addresses a limitation of existing zero-shot anomaly detection methods: they rely on static 2D images and cannot accommodate the dynamic, multi-view observations required in industrial inspection. The authors propose a Real-to-Twin task in which real-world observations are semantically aligned with geometrically registered CAD-based Digital Twins, enabling zero-shot anomaly localization without any defect annotations. Their framework, AVATAR, is trained only on defect-free real-CAD pairs; by bridging the simulation-to-reality (Sim2Real) domain gap through semantic alignment, it turns CAD priors into dynamic, anomaly-free references. Anomalies are then identified as regions where the real input cannot be aligned with its synthetic counterpart. Experiments show that the approach substantially outperforms adapted state-of-the-art baselines under drastic viewpoint variations, indicating strong robustness and zero-shot detection capability.
📝 Abstract
The deployment of zero-shot anomaly detection (AD) in embodied industrial inspection is severely bottlenecked by its reliance on passive, fixed-viewpoint 2D imagery. Such formulations inherently fail to accommodate the active, dynamic observations required in real-world environments. To overcome this limitation, we introduce Real-to-Twin Anomaly Detection, a novel task that evaluates physical observations directly against geometrically matched CAD Digital Twins. To tackle this task, we propose AVATAR, a framework designed to learn robust semantic alignment between Real and Digital Twins. By bridging benign Sim2Real domain gaps using only defect-free pairs, AVATAR transforms CAD priors into dynamic, anomaly-free references. This formulation enables the model to localize diverse anomalies in a zero-shot manner as unalignable deviations, eliminating the need for defect annotations. Extensive experiments demonstrate that AVATAR substantially outperforms adapted state-of-the-art baselines and remains robust under severe viewpoint variations. The code and dataset will be made publicly available.
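The abstract does not specify how "unalignable deviations" are scored. The sketch below shows one generic way to turn real-vs-twin misalignment into a localization map, and it rests on several assumptions. It assumes the real observation and its rendered Digital Twin have already been encoded into spatially registered feature maps of shape (H, W, C) by some shared encoder, which is not shown and is hypothetical here. It then scores each location by cosine distance (1 minus cosine similarity). The function names `misalignment_map` and `anomaly_mask` and the threshold value are illustrative choices. This is a common feature-comparison technique, not AVATAR's actual scoring function.

```python
import numpy as np


def misalignment_map(real_feats: np.ndarray, twin_feats: np.ndarray,
                     eps: float = 1e-8) -> np.ndarray:
    """Per-location anomaly score = 1 - cosine similarity (range roughly [0, 2]).

    Hypothetical helper: assumes real_feats and twin_feats are (H, W, C)
    feature maps that are already spatially registered to each other.
    """
    if real_feats.shape != twin_feats.shape:
        raise ValueError("feature maps must have identical shapes")
    # L2-normalize each C-dimensional feature vector (eps avoids division by zero).
    real_n = real_feats / (np.linalg.norm(real_feats, axis=-1, keepdims=True) + eps)
    twin_n = twin_feats / (np.linalg.norm(twin_feats, axis=-1, keepdims=True) + eps)
    # Cosine similarity per spatial location, shape (H, W).
    cos_sim = np.sum(real_n * twin_n, axis=-1)
    # Well-aligned (anomaly-free) regions score near 0; deviations score higher.
    return 1.0 - cos_sim


def anomaly_mask(score_map: np.ndarray, threshold: float) -> np.ndarray:
    """Binary localization mask; the threshold is an assumed, tunable value."""
    return score_map > threshold


# Toy usage: identical features everywhere except one corrupted location.
rng = np.random.default_rng(0)
real = rng.normal(size=(4, 5, 16))
twin = real.copy()
twin[2, 3] = -twin[2, 3]  # simulate a region that cannot be aligned
scores = misalignment_map(real, twin)
mask = anomaly_mask(scores, threshold=1.0)
image_score = float(scores.max())  # one simple image-level score: max over locations
```

Taking the maximum of the score map as an image-level score is only one option; mean-of-top-k pooling is another common choice when isolated noisy locations are a concern.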