🤖 AI Summary
Problem: The traditional Histogram-Based Outlier Score (HBOS) assumes feature independence, which limits its ability to detect anomalies that arise from dependencies between features. Method: We propose Extended HBOS (EHBOS), the first HBOS variant to incorporate two-dimensional histograms, which explicitly model pairwise joint feature distributions and capture dependency structures, thereby relaxing the independence assumption. EHBOS combines combinatorial scanning of feature pairs, bivariate density estimation, and weighted score fusion to enable context-sensitive anomaly detection. Contribution/Results: Evaluation on 17 benchmark datasets shows that EHBOS outperforms HBOS, with an average ROC AUC improvement of over 8%. Gains are especially pronounced on datasets exhibiting strong feature interactions. Moreover, EHBOS retains computational efficiency and performs robustly across diverse data characteristics.
📝 Abstract
Histogram-Based Outlier Score (HBOS) is a widely used outlier or anomaly detection method known for its computational efficiency and simplicity. However, its assumption of feature independence limits its ability to detect anomalies in datasets where interactions between features are critical. In this paper, we propose the Extended Histogram-Based Outlier Score (EHBOS), which enhances HBOS by incorporating two-dimensional histograms to capture dependencies between feature pairs. This extension allows EHBOS to identify contextual and dependency-driven anomalies that HBOS fails to detect. We evaluate EHBOS on 17 benchmark datasets, demonstrating its effectiveness and robustness across diverse anomaly detection scenarios. EHBOS outperforms HBOS on several datasets, particularly those where feature interactions are critical in defining the anomaly structure, achieving notable improvements in ROC AUC. These results highlight that EHBOS can be a valuable extension to HBOS, with the ability to model complex feature dependencies. EHBOS offers a powerful new tool for anomaly detection, particularly in datasets where contextual or relational anomalies play a significant role.
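To make the core idea concrete, the following is a minimal sketch of scoring with two-dimensional histograms over feature pairs. It is not the authors' implementation. The function name `pairwise_histogram_scores`, the equal-width binning, the normalisation by the fullest cell, and the plain unweighted sum over pairs are all assumptions made for illustration. The weighting scheme used by EHBOS is not given in the abstract, nor is it stated whether the one-dimensional HBOS terms are kept.

```python
import itertools

import numpy as np


def pairwise_histogram_scores(X, n_bins=10, eps=1e-12):
    """Illustrative outlier score: sum of -log(2D-histogram height) over feature pairs.

    Assumptions (not taken from the paper): equal-width bins, each histogram
    scaled so its fullest cell has height 1, and an unweighted sum over pairs.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    scores = np.zeros(n_samples)

    # Scan every unordered pair of features.
    for i, j in itertools.combinations(range(n_features), 2):
        hist, x_edges, y_edges = np.histogram2d(X[:, i], X[:, j], bins=n_bins)
        hist = hist / hist.max()  # fullest cell -> 1.0

        # Locate each sample's cell. Using only the interior edges yields
        # indices in 0..n_bins-1, with the maximum value in the last bin.
        xi = np.digitize(X[:, i], x_edges[1:-1])
        yi = np.digitize(X[:, j], y_edges[1:-1])

        # Sparse cells give large positive contributions; eps guards log(0).
        scores += -np.log(hist[xi, yi] + eps)

    return scores
```

As a usage example, take 200 points lying exactly on the diagonal `y = x` over `[0, 1]` plus one extra point at `(0.05, 0.95)`. Both coordinates of the extra point fall in well-populated one-dimensional bins, so a per-feature histogram would not single it out. In the joint histogram it sits alone in an otherwise empty corner cell and receives the highest score.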