🤖 AI Summary
In unsupervised anomaly detection on high-dimensional data, dimensionality reduction often obscures the geometric distribution of anomalies, hindering discriminative capability.
Method: We propose a formal framework that categorizes anomaly detection methods and their results as "on manifold" or "off manifold" relative to the manifold created by dimensionality reduction. We define both terms, show how they differ, and use the distinction to design a strategy for combining detectors such as Isolation Forest.
Contribution/Results: In settings that employ heavy dimensionality reduction, the combined approach significantly improves recall without sacrificing precision. On MNIST, it improves recall by as much as 16 percent compared with simply combining with the best standalone method (Isolation Forest), a result that suggests promise for real-world high-dimensional data.
📝 Abstract
Unsupervised machine learning methods are well suited to searching for anomalies at scale but can struggle with the high-dimensional representation of many modern datasets; hence, dimensionality reduction (DR) is often performed first. In this paper we analyse unsupervised anomaly detection (AD) from the perspective of the manifold created in DR. We present an idealised illustration, "Finding Pegasus", and a novel formal framework with which we categorise AD methods and their results into "on manifold" and "off manifold". We define these terms and show how they differ. We then use this insight to develop an approach to combining AD methods that significantly boosts AD recall without sacrificing precision in situations employing high DR. When tested on MNIST data, our approach improves recall by as much as 16 percent compared with simply combining with the best standalone AD method (Isolation Forest), a result which shows great promise for its application to real-world data.
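The abstract does not spell out how "on manifold" and "off manifold" are defined or how the detectors are combined, so the following is only a rough illustration under stated assumptions, not the paper's method. It assumes a linear manifold obtained with PCA, treats a standardised distance within the reduced space as an "on manifold" score and the reconstruction residual as an "off manifold" score, and combines the two detectors by a simple union of flags. The standardised distance stands in for a detector such as Isolation Forest, and the function names (`fit_pca`, `manifold_scores`, `flag_top`), the synthetic data, and the 0.99 quantile threshold are all hypothetical choices made for this sketch.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA via SVD; return the mean, principal axes, and per-axis latent std."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    axes = vt[:n_components]                      # shape (n_components, n_features)
    latent_std = ((X - mean) @ axes.T).std(axis=0)
    return mean, axes, latent_std

def manifold_scores(X, mean, axes, latent_std):
    """Return (on_score, off_score) for each row of X (assumed definitions)."""
    Z = (X - mean) @ axes.T                       # coordinates within the linear manifold
    X_hat = Z @ axes + mean                       # reconstruction in the input space
    on_score = np.linalg.norm(Z / latent_std, axis=1)   # distance within the manifold
    off_score = np.linalg.norm(X - X_hat, axis=1)       # residual distance away from it
    return on_score, off_score

def flag_top(score, quantile=0.99):
    """Flag rows whose score exceeds the given quantile (hypothetical threshold)."""
    return score > np.quantile(score, quantile)

# Synthetic data: 500 inliers scattered close to the x-y plane in 3-D.
rng = np.random.default_rng(0)
inliers = np.column_stack([
    rng.normal(0, 1, 500),
    rng.normal(0, 1, 500),
    rng.normal(0, 0.05, 500),
])
on_anomaly = np.array([[8.0, 8.0, 0.0]])          # far from the bulk, but in the plane
off_anomaly = np.array([[0.0, 0.0, 5.0]])         # above the plane, projects near the centre
X = np.vstack([inliers, on_anomaly, off_anomaly])
ON_IDX, OFF_IDX = 500, 501

mean, axes, latent_std = fit_pca(X, n_components=2)
on_score, off_score = manifold_scores(X, mean, axes, latent_std)
on_flags = flag_top(on_score)
off_flags = flag_top(off_score)
combined = on_flags | off_flags                   # union of the two detectors

print("on-manifold detector flags :", on_flags[ON_IDX], on_flags[OFF_IDX])
print("off-manifold detector flags:", off_flags[ON_IDX], off_flags[OFF_IDX])
print("combined flags             :", combined[ON_IDX], combined[OFF_IDX])
```

In this toy setting each detector catches only one of the two planted anomalies, while the union catches both, which is the intuition behind combining detectors for higher recall. The sketch says nothing about precision; a union of flags also accumulates each detector's false positives, so how the paper avoids a precision loss is not something this example reproduces.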