🤖 AI Summary
This work addresses unsupervised tabular anomaly detection, where the absence of ground-truth anomaly labels makes it hard to model localized anomalous patterns. To overcome this limitation, the authors propose PLAG, a framework that generates fine-grained synthetic anomalies guided by pseudo-labels and decomposes a sample's global anomaly score into an accumulation of feature-level contributions. PLAG uses a two-stage filtering mechanism, combining format verification and uncertainty estimation, to improve the fidelity and diversity of the synthesized anomalies. The method requires no real anomaly labels and can be integrated into existing unsupervised detectors as a plug-in module. Experiments show that PLAG achieves state-of-the-art performance against eight representative baselines and, when plugged into existing unsupervised detectors, raises their F1 scores by 0.08 to 0.21.
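The idea of treating a sample's global anomaly score as an accumulation of feature-level contributions, and then using the highest-scoring samples as pseudo-anomalies, can be illustrated with a small sketch. This is not PLAG's implementation: the squared z-score as the per-feature scorer, the `contamination` fraction used to pick pseudo-anomalies, and the helper names `feature_scores` and `pseudo_labels` are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def feature_scores(rows):
    """Per-feature anomaly contribution: squared z-score of each value.

    Hypothetical scorer chosen for illustration; not PLAG's actual scorer.
    """
    n_features = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(n_features)]
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero on constant columns
    return [[((r[j] - mus[j]) / sds[j]) ** 2 for j in range(n_features)] for r in rows]


def pseudo_labels(rows, contamination=0.1):
    """Label the top `contamination` fraction of rows (by summed score) as pseudo-anomalies."""
    per_feature = feature_scores(rows)
    totals = [sum(s) for s in per_feature]  # global score = accumulation of feature-level scores
    k = max(1, round(contamination * len(rows)))
    top = set(sorted(range(len(rows)), key=lambda i: totals[i], reverse=True)[:k])
    labels = [1 if i in top else 0 for i in range(len(rows))]
    return labels, per_feature, totals


rows = [[0, 0], [1, 1], [0, 1], [1, 0], [10, 0.5]]
labels, per_feature, totals = pseudo_labels(rows, contamination=0.2)
print(labels)  # [0, 0, 0, 0, 1]
```

Because each global score is a plain sum, the per-feature list for a flagged row shows which features drove the decision. Here the last row is flagged almost entirely because of its first feature, which is the kind of localized signal that a single global score would hide.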
📝 Abstract
Identifying anomalous instances in tabular data is essential for improving data reliability and maintaining system stability. Because ground-truth anomaly labels are scarce, existing methods mainly rely on unsupervised anomaly detection models or exploit a small number of labeled anomalies to facilitate detection via sample generation or contrastive learning. However, unsupervised methods lack sufficient anomaly awareness, while current generation and contrastive approaches tend to compute anomaly scores globally, overlooking the localized anomaly patterns of individual tabular features and resulting in suboptimal detection performance. To address these limitations, we propose PLAG, a pseudo-label-guided anomaly generation method designed to enhance tabular anomaly detection. Specifically, by using pseudo-anomalies as guidance signals and decomposing the overall anomaly score of a sample into an accumulation of feature-level abnormalities, PLAG not only removes the need for scarce ground-truth labels but also gives the model a new way to capture localized anomalous signals at a fine-grained level. Furthermore, we propose a two-stage data selection strategy that combines format verification and uncertainty estimation to rigorously filter candidate samples, ensuring the fidelity and diversity of the synthetic anomalies. These filtered synthetic anomalies then serve as robust discriminative guidance, helping the model better separate normal and anomalous instances. Extensive experiments demonstrate that PLAG achieves state-of-the-art performance against eight representative baselines. Moreover, as a flexible framework, it integrates seamlessly with existing unsupervised detectors, consistently boosting their F1-scores by 0.08 to 0.21.
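A two-stage filter of the kind the abstract describes (format verification, then uncertainty estimation) might look roughly like the sketch below. The abstract does not specify either stage, so everything here is an assumption: stage 1 checks arity, numeric type, finiteness, and hypothetical per-feature bounds in `schema`; stage 2 uses the spread of scores across a hypothetical ensemble of `scorers` as an uncertainty proxy and keeps only candidates at or below `max_uncertainty`. The sketch does not model the diversity criterion.

```python
import math
from statistics import pstdev


def passes_format(candidate, schema):
    """Stage 1 (hypothetical rules): correct arity, finite numbers, within per-feature bounds."""
    if len(candidate) != len(schema):
        return False
    for value, (lo, hi) in zip(candidate, schema):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if not math.isfinite(value) or not lo <= value <= hi:
            return False
    return True


def uncertainty(candidate, scorers):
    """Stage 2 proxy (assumption): spread of anomaly scores across an ensemble of scorers."""
    return pstdev([score(candidate) for score in scorers])


def two_stage_filter(candidates, schema, scorers, max_uncertainty):
    """Keep candidates that pass the format check and whose ensemble scores agree closely."""
    kept = []
    for c in candidates:
        if not passes_format(c, schema):
            continue  # stage 1: reject malformed or out-of-range samples
        if uncertainty(c, scorers) <= max_uncertainty:
            kept.append(c)  # stage 2: keep only low-uncertainty samples
    return kept


schema = [(0, 10), (0, 10)]
scorers = [lambda x: x[0], lambda x: x[0] + x[1]]
candidates = [[1, 0], [1, 5], [20, 0], [1], [float("nan"), 0], [2, 1]]
print(two_stage_filter(candidates, schema, scorers, max_uncertainty=1.0))  # [[1, 0], [2, 1]]
```

Running the cheap format check first means the ensemble scorers only ever see well-formed samples; `[20, 0]`, `[1]`, and the NaN row are rejected in stage 1, while `[1, 5]` passes the format check but is dropped because the two scorers disagree (scores 1 and 6).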