🤖 AI Summary
The core challenge in multi-view anomaly detection is to model the local neighborhoods of normal instances consistently across views. Existing methods represent neighborhoods in each view independently and then learn which neighbors are consistent; this struggles when the same neighbors lie in regions of different density in different views, and the learning step costs O(N²) time, which limits scalability. This paper proposes SCoNE, a learning-free approach that represents consistent neighborhoods directly with multi-view instances, so that the neighborhoods of the same sample are aligned across views without intermediate representations. The spherical neighborhoods are data-dependent (large in sparse regions, small in dense regions), and because no learning is required the method runs in O(N) time. In the reported empirical evaluations, it achieves higher detection accuracy than existing approaches and runs orders of magnitude faster on large datasets.
📝 Abstract
The core problem in multi-view anomaly detection is to represent local neighborhoods of normal instances consistently across all views. Recent approaches represent the local neighborhoods in each view independently, and then capture the consistent neighbors across all views via a learning process. They suffer from two key issues. First, there is no guarantee that they capture consistent neighbors well, especially when the same neighbors lie in regions of varied densities in different views, resulting in inferior detection accuracy. Second, the learning process has a high computational cost of $\mathcal{O}(N^2)$, rendering them inapplicable to large datasets. To address these issues, we propose a novel method termed \textbf{S}pherical \textbf{C}onsistent \textbf{N}eighborhoods \textbf{E}nsemble (SCoNE). It has two unique features: (a) the consistent neighborhoods are represented with multi-view instances directly, requiring no intermediate representations as used in existing approaches; and (b) the neighborhoods have data-dependent properties, which lead to large neighborhoods in sparse regions and small neighborhoods in dense regions. The data-dependent properties enable local neighborhoods in different views to be represented well as consistent neighborhoods, without learning. This leads to $\mathcal{O}(N)$ time complexity. Empirical evaluations show that SCoNE has superior detection accuracy and runs orders of magnitude faster on large datasets than existing approaches.
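The abstract does not spell out how the spherical neighborhoods are built, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes a nearest-neighbor hypersphere ensemble: each ensemble member samples a few instances as centers, gives each center a per-view radius equal to the distance to its nearest other sampled center in that view (so radii are large in sparse regions and small in dense ones), and treats a test instance as covered only if it falls inside the sphere of the same center in every view. The function names `fit_spheres` and `score` and the parameters `psi` (sample size) and `t` (ensemble size) are hypothetical.

```python
import numpy as np

def fit_spheres(views, psi=16, t=50, seed=0):
    """Build t sets of psi spheres. `views` is a list of (N, d_v) arrays
    whose rows are aligned, i.e. row i is the same instance in every view.
    Assumed construction, not taken from the paper."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    ensemble = []
    for _ in range(t):
        # The same sampled instances act as centers in every view.
        idx = rng.choice(n, size=psi, replace=False)
        radii = []
        for X in views:
            C = X[idx]
            D = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2)
            np.fill_diagonal(D, np.inf)
            # Radius = distance to the nearest other center in this view:
            # large where the sample is sparse, small where it is dense.
            radii.append(D.min(axis=1))
        ensemble.append((idx, radii))
    return ensemble

def score(train_views, test_views, ensemble):
    """Anomaly score in [0, 1]: the fraction of ensemble members in which
    the instance lies in no sphere that contains it in all views."""
    m = test_views[0].shape[0]
    scores = np.zeros(m)
    for idx, radii in ensemble:
        inside_all = np.ones((m, len(idx)), dtype=bool)
        for Xtr, Xte, r in zip(train_views, test_views, radii):
            D = np.linalg.norm(Xte[:, None, :] - Xtr[idx][None, :, :], axis=2)
            # Keep a center only if the instance is inside its sphere
            # in this view as well as in all previous views.
            inside_all &= D <= r[None, :]
        scores += ~inside_all.any(axis=1)
    return scores / len(ensemble)
```

In this sketch each instance is compared against `t * psi` centers per view, so for fixed `t` and `psi` the scoring cost grows linearly with the number of instances and no optimization step is involved. Higher scores indicate instances that are rarely covered by a consistent neighborhood.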