Bagged Regularized k-Distances for Anomaly Detection

📅 2023-12-02
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
In unsupervised anomaly detection, distance-based methods are highly sensitive to the choice of the nearest-neighbor parameter *k*. This paper proposes bagged regularized *k*-distances for anomaly detection (BRDAD), an adaptive weighting framework that casts weight learning as a convex optimization problem: the weights are chosen to minimize a surrogate risk, namely a finite-sample bound on the empirical risk of bagged weighted *k*-distance density estimation (BWDDE), thereby eliminating manual *k* tuning. This appears to be the first work to cast *k*-distance weighting as a tractable convex problem while integrating bagging for enhanced robustness and scalability. The authors establish that the AUC regret converges at rate *O*(1/√*n*) and show that bagging significantly reduces computational complexity. Experiments on benchmark datasets show that the method is highly robust to the choice of *k*, outperforms state-of-the-art distance-based approaches in stability, achieves substantial efficiency gains on large-scale data, and delivers superior anomaly detection performance on real-world datasets.
📝 Abstract
We consider the paradigm of unsupervised anomaly detection, which involves identifying anomalies within a dataset in the absence of labeled examples. Though distance-based methods are top performers for unsupervised anomaly detection, they suffer heavily from sensitivity to the choice of the number of nearest neighbors. In this paper, we propose a new distance-based algorithm called bagged regularized $k$-distances for anomaly detection (BRDAD), which converts the unsupervised anomaly detection problem into a convex optimization problem. Our BRDAD algorithm selects the weights by minimizing the surrogate risk, i.e., the finite-sample bound of the empirical risk of the bagged weighted $k$-distances for density estimation (BWDDE). This approach enables us to successfully address the sensitivity challenge of hyperparameter choice in distance-based algorithms. Moreover, when dealing with large-scale datasets, efficiency issues can be addressed by the bagging technique incorporated in our BRDAD algorithm. On the theoretical side, we establish fast convergence rates of the AUC regret of our algorithm and demonstrate that the bagging technique significantly reduces computational complexity. On the practical side, we conduct numerical experiments on anomaly detection benchmarks to illustrate the insensitivity of our algorithm to parameter selection compared with other state-of-the-art distance-based methods. Moreover, applying the bagging technique in our algorithm brings promising improvements on real-world datasets.
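The bagging idea underlying the abstract can be sketched in a few lines: average a point's *k*-distance over random subsamples ("bags") of the data, so that points in low-density regions receive larger scores. This is an illustrative toy on 1-D data with uniform averaging, not the paper's exact BRDAD procedure; all function names and parameters here are ours.

```python
import random


def k_distance(x, sample, k):
    """Distance from x to its k-th nearest neighbor in sample (1-D toy metric)."""
    return sorted(abs(x - y) for y in sample)[k - 1]


def bagged_k_distance(x, data, k=2, n_bags=20, bag_size=10, seed=0):
    """Average the k-distance of x over random subsamples (bags) of the data.

    Larger scores indicate lower local density, i.e. more anomalous points.
    Illustrative sketch only; BRDAD additionally learns weights over the
    k-distances, which is not reproduced here.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_bags):
        bag = rng.sample(data, bag_size)
        total += k_distance(x, bag, min(k, len(bag)))
    return total / n_bags
```

On a toy dataset of points clustered in [0, 2], a far-away query such as 10.0 receives a much larger bagged score than an in-cluster point, which is the ranking an anomaly detector needs.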
Problem

Research questions and friction points this paper is trying to address.

Address sensitivity to neighbor count in distance-based anomaly detection
Convert unsupervised anomaly detection into a convex optimization problem
Improve efficiency and performance with bagging technique
Innovation

Methods, ideas, or system contributions that make the work stand out.

Converts anomaly detection into convex optimization
Uses bagging to reduce computational complexity
Minimizes surrogate risk for hyperparameter insensitivity
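The weighted *k*-distance at the heart of these contributions can be illustrated minimally: the score of a point is a convex combination of its first *k* nearest-neighbor distances, with weights constrained to the probability simplex. BRDAD learns these weights by minimizing a surrogate risk; the sketch below (1-D toy data, uniform placeholder weights, names are ours) only evaluates the score for given weights.

```python
def weighted_k_distance(x, data, weights):
    """Convex combination of the first len(weights) nearest-neighbor
    distances of x; weights must lie on the probability simplex."""
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1.0) < 1e-9
    dists = sorted(abs(x - y) for y in data)  # 1-D toy metric
    return sum(w * d for w, d in zip(weights, dists))


# Uniform weights over k = 3 neighbors stand in for the learned weights.
k = 3
uniform = [1.0 / k] * k
score = weighted_k_distance(0.0, [1.0, 2.0, 3.0, 4.0], uniform)
```

With uniform weights this reduces to the mean of the first *k* neighbor distances; the paper's point is that choosing non-uniform weights via surrogate-risk minimization removes the need to hand-tune a single *k*.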
Yuchao Cai
School of Statistics, Renmin University of China, China
Yuheng Ma
School of Statistics, Renmin University of China, China
Hanfang Yang
Assistant Professor of Statistics, School of Statistics, Renmin University of China
Hanyuan Hang
Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, The Netherlands