Unsupervised Multi-View Visual Anomaly Detection via Progressive Homography-Guided Alignment

📅 2025-11-24
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Unsupervised multi-view visual anomaly detection faces a core challenge: distinguishing genuine defects from benign appearance variations induced by viewpoint changes. To address this, we propose ViewSense-AD, the first framework to incorporate geometric consistency modeling into this task. It introduces a homography-driven Multi-View Alignment Module (MVAM) for precise cross-view feature calibration; a View-Align Latent Diffusion Model (VALDM) that progressively aligns multi-view representations in the latent space; and a lightweight Fusion Refiner Module (FRM) synergized with a memory bank, enabling anomaly discrimination via multi-level normal prototypes. Evaluated on the RealIAD and MANTA benchmarks, ViewSense-AD achieves state-of-the-art performance, significantly reducing false-positive rates while demonstrating strong robustness to large viewpoint variations and complex textures.

πŸ“ Abstract
Unsupervised visual anomaly detection from multi-view images presents a significant challenge: distinguishing genuine defects from benign appearance variations caused by viewpoint changes. Existing methods, often designed for single-view inputs, treat multiple views as a disconnected set of images, leading to inconsistent feature representations and a high false-positive rate. To address this, we introduce ViewSense-AD (VSAD), a novel framework that learns viewpoint-invariant representations by explicitly modeling geometric consistency across views. At its core is our Multi-View Alignment Module (MVAM), which leverages homography to project and align corresponding feature regions between neighboring views. We integrate MVAM into a View-Align Latent Diffusion Model (VALDM), enabling progressive and multi-stage alignment during the denoising process. This allows the model to build a coherent and holistic understanding of the object's surface from coarse to fine scales. Furthermore, a lightweight Fusion Refiner Module (FRM) enhances the global consistency of the aligned features, suppressing noise and improving discriminative power. Anomaly detection is performed by comparing multi-level features from the diffusion model against a learned memory bank of normal prototypes. Extensive experiments on the challenging RealIAD and MANTA datasets demonstrate that VSAD sets a new state-of-the-art, significantly outperforming existing methods in pixel-, view-, and sample-level visual anomaly detection, proving its robustness to large viewpoint shifts and complex textures.
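The core idea behind MVAM is to use a homography to project features from one view into a neighboring view's frame so corresponding regions can be compared directly. The snippet below is a minimal NumPy sketch of that warping step, not the paper's implementation; the function name `warp_features` and the nearest-neighbor sampling are illustrative assumptions (the actual module operates on learned latent features with differentiable sampling).

```python
import numpy as np

def warp_features(feat, H, out_shape):
    """Illustrative sketch: warp an (h, w, c) feature map into a
    neighboring view's frame using a 3x3 homography H.
    Uses inverse mapping with nearest-neighbor sampling."""
    h_out, w_out = out_shape
    Hinv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    # homogeneous target coordinates, shape (3, h_out * w_out)
    tgt = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    src = Hinv @ tgt
    src = src[:2] / src[2]                       # perspective divide
    sx = np.round(src[0]).astype(int).reshape(h_out, w_out)
    sy = np.round(src[1]).astype(int).reshape(h_out, w_out)
    # mask out target pixels whose source falls outside the feature map
    valid = (sx >= 0) & (sx < feat.shape[1]) & (sy >= 0) & (sy < feat.shape[0])
    out = np.zeros((h_out, w_out, feat.shape[2]), dtype=feat.dtype)
    out[valid] = feat[sy[valid], sx[valid]]
    return out, valid

# sanity check: the identity homography leaves the map unchanged
feat = np.random.rand(8, 8, 4)
warped, mask = warp_features(feat, np.eye(3), (8, 8))
```

In the full model, only the valid (overlapping) regions would contribute to the cross-view alignment loss, which is why the sketch returns the visibility mask alongside the warped features.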
Problem

Research questions and friction points this paper is trying to address.

Distinguishing genuine defects from benign appearance variations caused by viewpoint changes
Addressing inconsistent feature representations in multi-view anomaly detection
Reducing false-positive rates in unsupervised visual anomaly detection systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Homography-guided multi-view feature alignment
View-Align Latent Diffusion Model for progressive alignment
Fusion Refiner Module for global consistency enhancement
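The memory-bank scoring described above (comparing multi-level features against normal prototypes) boils down to a nearest-prototype distance. The following is a hedged NumPy sketch of that general scheme, with made-up data; `anomaly_scores` and all shapes are illustrative assumptions, not the paper's actual multi-level pipeline.

```python
import numpy as np

def anomaly_scores(features, memory_bank):
    """Score each query feature by its L2 distance to the nearest
    prototype in the memory bank (larger = more anomalous)."""
    # pairwise squared distances, shape (n_query, n_prototypes)
    d2 = ((features[:, None, :] - memory_bank[None, :, :]) ** 2).sum(-1)
    return np.sqrt(d2.min(axis=1))

rng = np.random.default_rng(0)
bank = rng.normal(size=(64, 16))            # prototypes from normal training data
normal_q = bank[:4] + 0.01 * rng.normal(size=(4, 16))   # near the normal manifold
anomal_q = rng.normal(size=(4, 16)) + 5.0                # far from every prototype
scores = anomaly_scores(np.vstack([normal_q, anomal_q]), bank)
```

Queries close to a stored prototype receive near-zero scores, while features with no nearby normal prototype score high, which is the discrimination mechanism the memory bank enables.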
🔎 Similar Papers
No similar papers found.
Xintao Chen
ShanghaiTech University
Xiaohao Xu
Google; University of Michigan, Ann Arbor
Robust Visual Intelligence · Anomaly Detection · Video & 3D · Computer Vision · Robotics
Bozhong Zheng
ShanghaiTech University
Yun Liu
ShanghaiTech University
Yingna Wu
ShanghaiTech University