🤖 AI Summary
This paper addresses the challenges of modeling high-dimensional data and high computational cost in unsupervised anomaly detection. We propose a novel detection method based on a Randomized Principal Component Analysis (RPCA) Forest, which constructs an ensemble of RPCA trees to exploit randomized subspace partitioning and low-rank approximation for efficient anomaly scoring. To further enhance scalability, we integrate approximate nearest neighbor search for rapid anomaly measurement, balancing accuracy and efficiency. Evaluated on 12 standard benchmark datasets, our method achieves an average AUC improvement of 3.2% over traditional PCA, Isolation Forest, and state-of-the-art deep learning methods, with 1.8–5.4× faster training. It also demonstrates superior robustness in low-sample and high-dimensional regimes. Our core contribution is the first integration of RPCA into a forest-based ensemble framework, unifying subspace diversity, computational scalability, and discriminative power in unsupervised anomaly detection.
📝 Abstract
We propose a novel unsupervised outlier detection method based on Randomized Principal Component Analysis (PCA). Inspired by the performance of the Randomized PCA (RPCA) Forest in approximate K-Nearest Neighbor (KNN) search, we use an RPCA Forest to detect outliers. Experimental results show that the proposed approach outperforms classical and state-of-the-art methods on several datasets and performs competitively on the rest. An extensive analysis of the proposed method reflects its high generalization power and computational efficiency, making it a good choice for unsupervised outlier detection.
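To make the idea concrete, the following is a minimal sketch of how a forest of randomized-PCA trees could be used for approximate-KNN outlier scoring. The text above does not specify the tree construction or the scoring rule, so everything here is an assumption made for illustration: each node splits at the median of the projection onto an approximate leading principal direction (obtained with a small random sketch), and a point's score is its mean distance to the k nearest candidates collected from its leaf in every tree. All function names and parameters (`top_direction`, `build_tree`, `leaf_of`, `outlier_scores`, `leaf_size`, `oversample`) are hypothetical and not taken from the paper.

```python
import numpy as np

def top_direction(X, rng, oversample=2):
    """Approximate leading principal direction via a random sketch (assumed scheme)."""
    Xc = X - X.mean(axis=0)
    omega = rng.standard_normal((X.shape[1], 1 + oversample))
    Q, _ = np.linalg.qr(Xc @ omega)              # orthonormal basis of the sketched range
    _, _, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    return Vt[0]

def build_tree(X, idx, rng, leaf_size):
    """Recursively split the points in idx at the median of their projection."""
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    w = top_direction(X[idx], rng)
    proj = X[idx] @ w
    t = np.median(proj)
    left = proj <= t
    if left.all() or not left.any():             # degenerate split, stop here
        return {"leaf": idx}
    return {"w": w, "t": t,
            "left": build_tree(X, idx[left], rng, leaf_size),
            "right": build_tree(X, idx[~left], rng, leaf_size)}

def leaf_of(tree, x):
    """Route a point down one tree and return the indices stored in its leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if x @ tree["w"] <= tree["t"] else tree["right"]
    return tree["leaf"]

def outlier_scores(X, n_trees=10, leaf_size=16, k=5, seed=0):
    """Score each point by its mean distance to k approximate nearest neighbors."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(X))
    forest = [build_tree(X, idx, rng, leaf_size) for _ in range(n_trees)]
    scores = np.empty(len(X))
    for i, x in enumerate(X):
        # Candidate neighbors: union of the point's leaves across all trees.
        cand = np.unique(np.concatenate([leaf_of(t, x) for t in forest]))
        cand = cand[cand != i]
        d = np.sort(np.linalg.norm(X[cand] - x, axis=1))
        scores[i] = d[:k].mean() if len(d) else np.inf
    return scores
```

Calling `outlier_scores(X)` on an array of shape `(n_samples, n_features)` returns one score per row, with larger values indicating more isolated points. Because each tree draws its own random sketch, the trees partition the data differently, and the union of leaves is what supplies the approximate neighbor candidates; the method described above may differ in how it splits nodes and how it turns neighbors into a score.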