🤖 AI Summary
Existing semi-supervised outlier detection methods typically treat data as purely numerical and deterministic, overlooking the heterogeneity and uncertainty of real-world datasets. To address this, we propose the Granule Density-based Outlier Factor (GDOF), the first approach to embed sparse outlier labels into a fuzzy granulation process. GDOF constructs an attribute-adaptive granular density ensemble: it models multi-granularity uncertainty via fuzzy sets, captures heterogeneous attribute structures using granular computing principles, and enhances discriminability through label-guided density estimation and attribute-relevance-weighted fusion. Extensive experiments on multiple real-world heterogeneous datasets demonstrate that GDOF achieves state-of-the-art performance with only a minimal number of labeled outliers (e.g., 5–10 samples), significantly outperforming existing semi-supervised methods.
📝 Abstract
Outlier detection, crucial for identifying unusual patterns with significant implications across numerous applications, has drawn considerable research interest. Existing semi-supervised methods typically treat data as purely numerical and handle it in a deterministic manner, thereby neglecting the heterogeneity and uncertainty inherent in complex, real-world datasets. This paper introduces a label-informed outlier detection method for heterogeneous data based on Granular Computing and Fuzzy Sets, namely Granule Density-based Outlier Factor (GDOF). Specifically, GDOF first employs label-informed fuzzy granulation to effectively represent various data types and develops granule density for precise density estimation. Subsequently, granule densities from individual attributes are integrated for outlier scoring by assessing attribute relevance with a limited number of labeled outliers. Experimental results on various real-world datasets show that GDOF stands out in detecting outliers in heterogeneous data with a minimal number of labeled outliers. The integration of Fuzzy Sets and Granular Computing in GDOF offers a practical framework for outlier detection in complex and diverse data types. All relevant datasets and source code are publicly available for further research. This is the author's accepted manuscript of a paper published in IEEE Transactions on Fuzzy Systems. The final version is available at https://doi.org/10.1109/TFUZZ.2024.3514853
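To make the pipeline in the abstract concrete, the following is a minimal sketch of the general idea (per-attribute fuzzy granules, a granule density, and label-weighted fusion). It is not the authors' GDOF implementation. Every name and formula here is an illustrative assumption: the similarity measure in `fuzzy_similarity`, the definition of density as normalized fuzzy cardinality in `granule_densities`, and the density-gap weighting in `outlier_scores` are all hypothetical. In particular, this sketch uses the labels only in the fusion step, whereas the paper also describes label-informed granulation.

```python
from typing import List, Sequence, Union

Value = Union[float, str]


def fuzzy_similarity(a: Value, b: Value, value_range: float) -> float:
    """Assumed similarity in [0, 1]: exact match for categories,
    range-scaled distance for numbers."""
    if isinstance(a, str) or isinstance(b, str):
        return 1.0 if a == b else 0.0
    if value_range == 0:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / value_range)


def granule_densities(column: Sequence[Value]) -> List[float]:
    """Density of each sample's fuzzy granule on one attribute, taken here
    (as an assumption) to be its mean similarity to all samples."""
    numeric = [v for v in column if not isinstance(v, str)]
    value_range = (max(numeric) - min(numeric)) if numeric else 0.0
    n = len(column)
    return [
        sum(fuzzy_similarity(x, y, value_range) for y in column) / n
        for x in column
    ]


def outlier_scores(
    rows: Sequence[Sequence[Value]], labeled_outliers: Sequence[int]
) -> List[float]:
    """Fuse per-attribute densities into one score per sample.

    An attribute gets more weight (hypothetical rule) when the labeled
    outliers are clearly sparser on it than the remaining samples.
    """
    labeled = set(labeled_outliers)
    if not labeled or len(labeled) >= len(rows):
        raise ValueError("need at least one labeled and one unlabeled sample")

    columns = list(zip(*rows))
    densities = [granule_densities(col) for col in columns]

    weights = []
    for dens in densities:
        out = [dens[i] for i in labeled]
        rest = [d for i, d in enumerate(dens) if i not in labeled]
        gap = sum(rest) / len(rest) - sum(out) / len(out)
        weights.append(max(gap, 1e-6))  # keep every weight strictly positive
    total = sum(weights)
    weights = [w / total for w in weights]

    # Low density on a relevant attribute yields a high outlier score.
    return [
        sum(w * (1.0 - dens[i]) for w, dens in zip(weights, densities))
        for i in range(len(rows))
    ]
```

Calling `outlier_scores(rows, labeled_outliers=[4])` on a small mixed numeric and categorical table returns one score in [0, 1] per row, with higher values indicating sparser granules on the attributes that the labeled outliers mark as relevant.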