🤖 AI Summary
Existing semi-supervised anomaly detection methods are limited to point-level or pairwise modeling. They neglect that anomalies manifest as higher-order deviations from group-level contextual patterns, which hinders discriminative representation learning. This paper introduces a set-level anomaly detection paradigm that reformulates anomaly identification as a higher-order interaction modeling task over contextual sets. Methodologically, it proposes an attention-based set encoder, a graded learning objective, and a context-calibrated scoring mechanism that aggregates normalized deviations across contextual sets. Evaluated on ten real-world benchmark datasets, the approach significantly outperforms state-of-the-art methods. Performance also improves consistently with increasing set size, which empirically supports the set-based formulation.
📝 Abstract
Semi-supervised anomaly detection (AD) has shown great promise by effectively leveraging limited labeled data. However, existing methods are typically structured around scoring individual points or simple pairs. Such a point- or pair-centric view not only overlooks the contextual nature of anomalies, which are defined by their deviation from a collective group, but also fails to exploit the rich supervisory signals that can be generated from the combinatorial composition of sets. Consequently, such models struggle to capture the high-order interactions within the data, which are critical for learning discriminative representations. To address these limitations, we propose SetAD, a novel framework that reframes semi-supervised AD as a Set-level Anomaly Detection task. SetAD employs an attention-based set encoder trained via a graded learning objective, where the model learns to quantify the degree of anomalousness within an entire set. This approach directly models the complex group-level interactions that define anomalies. Furthermore, to enhance robustness and score calibration, we propose a context-calibrated anomaly scoring mechanism, which assesses a point's anomaly score by aggregating its normalized deviations from peer behavior across multiple, diverse contextual sets. Extensive experiments on 10 real-world datasets demonstrate that SetAD significantly outperforms state-of-the-art models. Notably, we show that our model's performance consistently improves with increasing set size, providing strong empirical support for the set-based formulation of anomaly detection.
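To make the context-calibrated scoring idea concrete, the following is a minimal sketch and not SetAD's actual implementation. It assumes a hypothetical `raw_scores` function standing in for the learned attention-based set encoder (here, simply each member's distance from the set mean), and hypothetical parameters `set_size` and `num_contexts`. The target point is placed in several randomly sampled contextual sets; in each set its raw score is normalized against its peers' scores as a z-score, and the normalized deviations are averaged.

```python
import random
import statistics


def raw_scores(points):
    """Hypothetical stand-in for a learned set encoder (an assumption):
    scores each member by its Euclidean distance from the set mean."""
    dim = len(points[0])
    center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return [
        sum((p[d] - center[d]) ** 2 for d in range(dim)) ** 0.5
        for p in points
    ]


def context_calibrated_score(index, data, set_size=8, num_contexts=20, rng=None):
    """Average z-score of data[index] relative to its peers over
    several randomly sampled contextual sets."""
    rng = rng or random.Random(0)
    others = [i for i in range(len(data)) if i != index]
    deviations = []
    for _ in range(num_contexts):
        # Build one contextual set: the target plus randomly drawn peers.
        peers = rng.sample(others, set_size - 1)
        scores = raw_scores([data[index]] + [data[i] for i in peers])
        target, peer_scores = scores[0], scores[1:]
        # Normalize the target's score against peer behavior in this set.
        mu = statistics.fmean(peer_scores)
        sigma = statistics.pstdev(peer_scores)
        deviations.append((target - mu) / (sigma + 1e-8))
    # Aggregate the normalized deviations across all contexts.
    return statistics.fmean(deviations)
```

Calling `context_calibrated_score(i, data)` for each index `i` of a list of equal-length numeric tuples yields one score per point, with larger values indicating stronger deviation from peers. Averaging over many contexts is what makes the score less sensitive to any single unlucky draw of peers; the paper's encoder and aggregation rule may differ from this simplified version.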