🤖 AI Summary
Semi-supervised anomaly detection commonly assumes that unlabeled data contain only normal instances; in practice, however, such data are often contaminated with anomalies, which degrades model performance. To address this, we propose a robust framework that integrates positive-unlabeled (PU) learning into deep anomaly detection models such as autoencoders and Deep SVDD. Without requiring labeled normal samples, our method approximates the anomaly scores of normal data from the unlabeled data and the labeled anomalies. By incorporating a PU-specific loss function, it mitigates the bias induced by anomaly contamination in the unlabeled set. Experiments on multiple benchmark datasets show better detection performance than existing approaches, confirming robustness to contamination of the unlabeled data. Our approach thus supports semi-supervised anomaly detection under realistic, imperfect labeling conditions.
📝 Abstract
Semi-supervised anomaly detection, which aims to improve detection performance by using a small amount of labeled anomaly data in addition to unlabeled data, has attracted attention. Existing semi-supervised approaches assume that most unlabeled data are normal, and train anomaly detectors by minimizing the anomaly scores for the unlabeled data while maximizing those for the labeled anomaly data. In practice, however, the unlabeled data are often contaminated with anomalies. This weakens the effect of maximizing the anomaly scores for anomalies and prevents the detection performance from improving. To solve this problem, we propose the deep positive-unlabeled anomaly detection framework, which integrates positive-unlabeled (PU) learning with deep anomaly detection models such as autoencoders and deep support vector data descriptions (Deep SVDD). Our approach approximates the anomaly scores for normal data using the unlabeled data and the labeled anomaly data. Therefore, without labeled normal data, it can train anomaly detectors by minimizing the anomaly scores for normal data while maximizing those for the labeled anomaly data. Experiments on various datasets show that our approach achieves better detection performance than existing approaches.
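The PU-style approximation described in the abstract (estimating the anomaly scores of normal data from the unlabeled data and the labeled anomalies) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the contamination ratio `pi` (assumed known or estimated), and the non-negativity clamp (a standard trick from non-negative PU learning) are all assumptions made for this sketch.

```python
import numpy as np

def pu_normal_score(scores_unlabeled, scores_anomaly, pi):
    """Estimate the mean anomaly score of *normal* data via the PU decomposition.

    Since E_unlabeled[s] = (1 - pi) * E_normal[s] + pi * E_anomaly[s],
    we can solve for E_normal[s] without any labeled normal samples:
        E_normal[s] ≈ (E_unlabeled[s] - pi * E_anomaly[s]) / (1 - pi)
    The estimate is clamped at zero, as in non-negative PU learning,
    to avoid a negative (and thus meaningless) expected score.
    """
    est = (scores_unlabeled.mean() - pi * scores_anomaly.mean()) / (1.0 - pi)
    return max(est, 0.0)

def pu_anomaly_loss(scores_unlabeled, scores_anomaly, pi, lam=1.0):
    """Training objective: minimize the estimated normal-data scores
    while maximizing the scores of labeled anomalies (weighted by lam)."""
    return pu_normal_score(scores_unlabeled, scores_anomaly, pi) \
        - lam * scores_anomaly.mean()
```

Here `scores_unlabeled` and `scores_anomaly` would be anomaly scores produced by a deep model (e.g. autoencoder reconstruction error or Deep SVDD distance); with `pi = 0` the objective reduces to the conventional semi-supervised loss that treats all unlabeled data as normal.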