Finding Pegasus: Enhancing Unsupervised Anomaly Detection in High-Dimensional Data using a Manifold-Based Approach

📅 2025-02-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Unsupervised anomaly detection (AD) on high-dimensional data usually applies dimensionality reduction (DR) first, which can obscure the geometric distribution of anomalies and weaken detection. Method: The paper analyses AD from the perspective of the manifold created in DR. It introduces an idealised illustration, "Finding Pegasus", and a formal framework that categorises AD methods and their results as "on manifold" or "off manifold", then uses this distinction to combine AD methods such as Isolation Forest. Contribution/Results: The combined approach boosts recall without sacrificing precision in settings with high DR. On MNIST, it improves recall by as much as 16 percent compared with simply combining with the best standalone AD method (Isolation Forest), a result the authors consider promising for application to real-world data.

📝 Abstract

Unsupervised machine learning methods are well suited to searching for anomalies at scale but can struggle with the high-dimensional representation of many modern datasets; hence dimensionality reduction (DR) is often performed first. In this paper we analyse unsupervised anomaly detection (AD) from the perspective of the manifold created in DR. We present an idealised illustration, "Finding Pegasus", and a novel formal framework with which we categorise AD methods and their results into "on manifold" and "off manifold". We define these terms and show how they differ. We then use this insight to develop an approach of combining AD methods which significantly boosts AD recall without sacrificing precision in situations employing high DR. When tested on MNIST data, our approach of combining AD methods improves recall by as much as 16 percent compared with simply combining with the best standalone AD method (Isolation Forest), a result which shows great promise for its application to real-world data.
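
The idea of combining detectors that look in different places can be illustrated with a toy sketch. It is not the paper's method or its formal definitions: it assumes the manifold is a known flat plane (z = 0) rather than one learned by DR, and it uses "on manifold" and "off manifold" informally to mean "unusual within the reduced coordinates" and "unusual in the discarded direction". The data, the thresholds, and the helper `precision_recall` are hypothetical, chosen only for illustration.

```python
from math import dist
from statistics import median

def precision_recall(flags, truth):
    """Return (precision, recall) for boolean predictions against boolean labels."""
    tp = sum(f and t for f, t in zip(flags, truth))
    fp = sum(f and not t for f, t in zip(flags, truth))
    fn = sum(t and not f for f, t in zip(flags, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy 3-D data (made up). Normal points lie close to the plane z = 0,
# which plays the role of the manifold in this sketch.
points = [
    (0.1, 0.2, 0.00), (-0.3, 0.1, 0.01), (0.2, -0.2, -0.01), (0.0, 0.3, 0.00),
    (-0.1, -0.1, 0.01), (0.3, 0.0, -0.01), (-0.2, 0.2, 0.00), (0.1, -0.3, 0.01),
    (5.0, 0.1, 0.00),  # far from the others within the plane
    (0.0, 0.1, 2.00),  # far from the plane itself
]
truth = [False] * 8 + [True, True]

# "Dimensionality reduction" here is simply dropping z (assumed, not learned).
reduced = [(x, y) for x, y, _ in points]
centre = (median(x for x, _ in reduced), median(y for _, y in reduced))

# Detector A scores points inside the reduced space (distance from the centre).
on_scores = [dist(p, centre) for p in reduced]
# Detector B scores the information DR discarded (reconstruction error, |z|).
off_scores = [abs(z) for _, _, z in points]

ON_THRESHOLD = 1.0   # hypothetical cut-offs for this toy data
OFF_THRESHOLD = 1.0
on_flags = [s > ON_THRESHOLD for s in on_scores]
off_flags = [s > OFF_THRESHOLD for s in off_scores]

# Combine by union: a point is anomalous if either detector flags it.
combined = [a or b for a, b in zip(on_flags, off_flags)]

print("on only  :", precision_recall(on_flags, truth))
print("off only :", precision_recall(off_flags, truth))
print("combined :", precision_recall(combined, truth))
```

On this toy data each detector alone finds one of the two anomalies, while the union finds both without adding false positives. Real data will rarely separate this cleanly, and the paper's actual combination strategy may differ from a plain union.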

Problem

Research questions and friction points this paper is trying to address.

Enhancing unsupervised anomaly detection

Addressing high-dimensional data challenges

Developing a manifold-based approach

Innovation

Methods, ideas, or system contributions that make the work stand out.

Manifold-based approach

Combining AD methods

Enhanced recall without sacrificing precision
