Consistency-guided semi-supervised outlier detection in heterogeneous data using fuzzy rough sets

📅 2025-12-21
🤖 AI Summary
Problem: Most existing semi-supervised outlier detection methods focus on numerical features and neglect the heterogeneity of data types, which limits how well they can use partial labels to reduce false positives on heterogeneous data. Method: The paper proposes a consistency-guided outlier detection algorithm (COD) that combines fuzzy rough set theory with a classification consistency mechanism. Using a small number of labeled outliers, it constructs label-informed fuzzy similarity relations, uses the consistency of the fuzzy decision system to evaluate how much each attribute contributes to classification, and defines an outlier factor that is integrated with this consistency to predict outliers. Contribution/Results: The approach brings fuzzy rough sets and classification consistency to semi-supervised outlier detection for heterogeneous data, removing the reliance on purely numerical representations. Experiments on 15 datasets show that COD performs better than or comparably with leading outlier detectors.

📝 Abstract
Outlier detection aims to find samples that behave differently from the majority of the data. Semi-supervised detection methods can exploit the supervision provided by partial labels, thereby reducing false positive rates. However, most current semi-supervised methods focus on numerical data and neglect the heterogeneity of data information. In this paper, we propose a semi-supervised, consistency-guided outlier detection algorithm (COD) for heterogeneous data based on fuzzy rough set theory. First, a few labeled outliers are leveraged to construct label-informed fuzzy similarity relations. Next, the consistency of the fuzzy decision system is introduced to evaluate attributes' contributions to knowledge classification. Subsequently, we define the outlier factor based on the fuzzy similarity class and predict outliers by integrating the classification consistency and the outlier factor. The proposed algorithm is extensively evaluated on 15 freshly proposed datasets. Experimental results demonstrate that COD is better than or comparable with the leading outlier detectors.

This manuscript is the accepted author version of a paper published by Elsevier. The final published version is available at https://doi.org/10.1016/j.asoc.2024.112070
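
To make the idea of a fuzzy similarity relation over heterogeneous data concrete, here is a minimal Python sketch. It is not the paper's definition, which this page does not reproduce. It assumes a common fuzzy rough set construction: numerical attributes (pre-scaled to [0, 1]) use a linearly decaying similarity, nominal attributes use crisp equality, and the per-attribute relations are combined with a minimum. The function name `fuzzy_similarity` and the toy data are hypothetical, and the label-informed part of the paper's relation is not modeled here.

```python
import numpy as np

def fuzzy_similarity(X_num, X_cat):
    """Return an n x n fuzzy similarity matrix for mixed data.

    X_num: (n, p) float array, numerical attributes scaled to [0, 1].
    X_cat: (n, q) array of nominal attribute values.
    Assumed construction, not the paper's exact relation.
    """
    n = X_num.shape[0]
    R = np.ones((n, n))
    # Numerical attribute: similarity decays linearly with distance.
    for j in range(X_num.shape[1]):
        col = X_num[:, j]
        R = np.minimum(R, 1.0 - np.abs(col[:, None] - col[None, :]))
    # Nominal attribute: crisp equality (1 if equal, else 0).
    for j in range(X_cat.shape[1]):
        col = X_cat[:, j]
        R = np.minimum(R, (col[:, None] == col[None, :]).astype(float))
    return R

# Toy data: one numerical and one nominal attribute, three samples.
X_num = np.array([[0.0], [0.1], [1.0]])
X_cat = np.array([["a"], ["a"], ["b"]])
R = fuzzy_similarity(X_num, X_cat)
```

The resulting matrix is reflexive and symmetric, so each row can be read as the fuzzy similarity class of one sample.
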
Problem

Research questions and friction points this paper is trying to address.

Detects outliers in heterogeneous data in a semi-supervised setting
Uses fuzzy rough sets to handle mixed data types
Improves accuracy by integrating consistency and outlier factors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses fuzzy rough sets for heterogeneous data
Integrates classification consistency with outlier factor
Leverages labeled outliers to construct similarity relations
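
The three points above can be illustrated with a rough Python sketch of how attribute weights and an outlier factor might be combined. This is an illustration under stated assumptions, not the COD algorithm: the weight is the standard fuzzy rough dependency degree (used here as a stand-in for the paper's consistency measure), the outlier factor is one minus the relative cardinality of a sample's fuzzy similarity class, unlabeled samples are treated as label 0, and all function names and data are hypothetical.

```python
import numpy as np

def attribute_similarity(col, numeric):
    """Fuzzy similarity matrix induced by a single attribute."""
    if numeric:
        # Assumes the column is already scaled to [0, 1].
        return 1.0 - np.abs(col[:, None] - col[None, :])
    return (col[:, None] == col[None, :]).astype(float)

def dependency(R, labels):
    """Standard fuzzy rough dependency of crisp labels on relation R.

    For each x, membership in the lower approximation of its own class
    is the minimum of 1 - R(x, y) over differently labeled y.
    """
    differs = labels[:, None] != labels[None, :]
    lower = np.where(differs, 1.0 - R, 1.0).min(axis=1)
    return lower.mean()

def outlier_scores(relations, weights):
    """Weighted average over attributes of 1 - |[x]_R| / n."""
    n = relations[0].shape[0]
    factors = np.stack([1.0 - R.sum(axis=1) / n for R in relations])
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * factors).sum(axis=0) / w.sum()

# Toy data: the last sample is the only labeled outlier (assumption:
# unlabeled samples are given label 0).
num = np.array([0.1, 0.15, 0.2, 0.9])
cat = np.array(["a", "a", "a", "b"])
labels = np.array([0, 0, 0, 1])

relations = [attribute_similarity(num, True),
             attribute_similarity(cat, False)]
weights = [dependency(R, labels) for R in relations]
scores = outlier_scores(relations, weights)
```

On this toy data the labeled outlier receives the highest score, because its fuzzy similarity classes are small under both attributes.
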
Baiyang Chen
College of Computer Science, Sichuan University, Chengdu, 610065, China
Zhong Yuan
Penn State University
Deep Learning in Health Care, Diffusion Model
Dezhong Peng
Sichuan University
Multi-modal Learning, Multimedia Analysis, Neural Network
Xiaoliang Chen
Department of Computer Science and Operations Research, University of Montreal, Montreal, QC H3C3J7, Canada
Hongmei Chen
School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, China