AI Summary
Unsupervised multi-view visual anomaly detection faces a core challenge: distinguishing genuine defects from benign appearance variations induced by viewpoint changes. To address this, we propose ViewSense-AD, the first framework to incorporate geometric consistency modeling into this task. It introduces a homography-driven Multi-View Alignment Module (MVAM) for precise cross-view feature calibration; a View-Aligned Latent Diffusion Model (VALDM) that progressively aligns multi-view representations in latent space; and a lightweight Fusion Refinement Module (FRM) coupled with a memory bank, enabling anomaly discrimination against multi-level normal prototypes. Evaluated on the RealIAD and MANTA benchmarks, ViewSense-AD achieves state-of-the-art performance, significantly reducing false-positive rates while remaining robust to large viewpoint variations and complex textures.
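The homography-driven alignment in MVAM can be illustrated with a minimal sketch. The function below is a hypothetical helper, not the paper's implementation: it warps a feature map from a neighboring view into the reference view's coordinate frame using a 3x3 homography with nearest-neighbor sampling, which is the basic operation any homography-based feature calibration builds on.

```python
import numpy as np

def warp_features(feat, H):
    """Warp a (C, h, w) feature map from a neighboring view into the
    reference view via homography H (3x3). Nearest-neighbor sampling;
    out-of-bounds target locations are left as zeros.
    Illustrative sketch only -- MVAM's exact sampling is not specified here."""
    C, h, w = feat.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ones = np.ones_like(xs)
    # Homogeneous target coordinates, mapped into the source view.
    coords = np.stack([xs, ys, ones], axis=0).reshape(3, -1).astype(float)
    src = H @ coords
    src = src / src[2:3]          # perspective divide
    sx = np.round(src[0]).astype(int)
    sy = np.round(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(feat)
    out[:, ys.ravel()[valid], xs.ravel()[valid]] = feat[:, sy[valid], sx[valid]]
    return out
```

With the identity homography the feature map is returned unchanged; a translation homography shifts it, which is how corresponding regions of two views are brought into register before comparison.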
Abstract
Unsupervised visual anomaly detection from multi-view images presents a significant challenge: distinguishing genuine defects from benign appearance variations caused by viewpoint changes. Existing methods, often designed for single-view inputs, treat multiple views as a disconnected set of images, leading to inconsistent feature representations and a high false-positive rate. To address this, we introduce ViewSense-AD (VSAD), a novel framework that learns viewpoint-invariant representations by explicitly modeling geometric consistency across views. At its core is our Multi-View Alignment Module (MVAM), which leverages homography to project and align corresponding feature regions between neighboring views. We integrate MVAM into a View-Aligned Latent Diffusion Model (VALDM), enabling progressive, multi-stage alignment during the denoising process. This allows the model to build a coherent, holistic understanding of the object's surface from coarse to fine scales. Furthermore, a lightweight Fusion Refinement Module (FRM) enhances the global consistency of the aligned features, suppressing noise and improving discriminative power. Anomaly detection is performed by comparing multi-level features from the diffusion model against a learned memory bank of normal prototypes. Extensive experiments on the challenging RealIAD and MANTA datasets demonstrate that VSAD sets a new state-of-the-art, significantly outperforming existing methods at the pixel, view, and sample levels and proving its robustness to large viewpoint shifts and complex textures.
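The memory-bank comparison at the end of the pipeline can be sketched as nearest-prototype scoring: each patch feature is scored by its distance to the closest normal prototype, so features far from every prototype receive high anomaly scores. This is a minimal, PatchCore-style sketch; the paper's exact multi-level scoring scheme is not specified here, and `anomaly_scores` is a hypothetical name.

```python
import numpy as np

def anomaly_scores(features, memory_bank):
    """Score each patch embedding by Euclidean distance to its nearest
    normal prototype.
    features:    (N, D) patch embeddings from the test image
    memory_bank: (M, D) prototypes collected from normal training data
    Returns an (N,) array of anomaly scores (0 = perfectly normal)."""
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ memory_bank.T
        + (memory_bank ** 2).sum(axis=1)
    )
    # Clamp tiny negatives from floating-point error, take nearest prototype.
    return np.sqrt(np.maximum(d2, 0.0)).min(axis=1)
```

A pixel- or view-level anomaly map then follows by reshaping the per-patch scores back onto the spatial grid and aggregating (e.g. taking the maximum for a sample-level score).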