Outlier detection in mixed-attribute data: a semi-supervised approach with fuzzy approximations and relative entropy

📅 2025-12-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address uncertainty and heterogeneity in anomaly detection for mixed-attribute data, this paper proposes FROD, a semi-supervised framework that combines fuzzy rough set modeling with fuzzy relative entropy. FROD uses a small set of labeled instances to assess how well attribute sets discriminate outliers, via an attribute classification accuracy based on fuzzy approximations. It uses unlabeled data to compute fuzzy relative entropy, which characterizes outliers in terms of uncertainty. Anomaly scores are obtained by combining these two measures. By explicitly modeling the uncertainty and heterogeneity of mixed-attribute data, FROD targets aspects that conventional semi-supervised methods typically overlook. Experiments on 16 public benchmark datasets show that FROD matches or surpasses leading detection algorithms. All code and datasets are publicly released.
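The summary stresses the heterogeneity of mixed-attribute spaces but does not say how FROD relates objects across numerical and categorical attributes. The sketch below assumes a construction common in the fuzzy rough set literature: a crisp equality relation for categorical attributes, a thresholded similarity 1 - |a - b| (within radius `delta`) for min-max normalised numerical attributes, and an element-wise minimum to combine attributes into a relation for an attribute set. The helper names `fuzzy_relation` and `relation_for_subset` and the default `delta=0.3` are hypothetical and not taken from the FROD code.

```python
import numpy as np

def fuzzy_relation(values, categorical, delta=0.3):
    """Fuzzy similarity matrix for a single attribute (hypothetical helper).

    Numerical values are assumed to be min-max normalised to [0, 1].
    """
    col = np.asarray(values)
    if categorical:
        # Nominal attribute: crisp equality relation (1 if equal, else 0).
        return (col[:, None] == col[None, :]).astype(float)
    col = col.astype(float)
    diff = np.abs(col[:, None] - col[None, :])
    # Numerical attribute: similarity 1 - |a - b| within radius delta, else 0.
    return np.where(diff <= delta, 1.0 - diff, 0.0)

def relation_for_subset(columns, categorical_flags, delta=0.3):
    """Relation for an attribute set: element-wise minimum (t-norm) over attributes."""
    mats = [fuzzy_relation(c, f, delta) for c, f in zip(columns, categorical_flags)]
    return np.minimum.reduce(mats)

# Toy mixed-attribute data: one numerical and one categorical attribute.
numeric = [0.0, 0.1, 0.9]
colour = ["red", "red", "blue"]
R = relation_for_subset([numeric, colour], [False, True])
print(R.round(2))
```

Taking the minimum keeps the combined relation reflexive and symmetric, so every object is fully similar to itself; the radius `delta` controls how coarse the resulting fuzzy granules are.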

📝 Abstract
Outlier detection is a critical task in data mining, aimed at identifying objects that significantly deviate from the norm. Semi-supervised methods improve detection performance by leveraging partially labeled data but typically overlook the uncertainty and heterogeneity of real-world mixed-attribute data. This paper introduces a semi-supervised outlier detection method, namely fuzzy rough sets-based outlier detection (FROD), to effectively handle these challenges. Specifically, we first utilize a small subset of labeled data to construct fuzzy decision systems, through which we introduce the attribute classification accuracy based on fuzzy approximations to evaluate the contribution of attribute sets in outlier detection. Unlabeled data is then used to compute fuzzy relative entropy, which provides a characterization of outliers from the perspective of uncertainty. Finally, we develop the detection algorithm by combining attribute classification accuracy with fuzzy relative entropy. Experimental results on 16 public datasets show that FROD is comparable with or better than leading detection algorithms. All datasets and source code are accessible at https://github.com/ChenBaiyang/FROD. This manuscript is the accepted author version of a paper published by Elsevier. The final published version is available at https://doi.org/10.1016/j.ijar.2025.109373
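The abstract describes an attribute classification accuracy derived from fuzzy approximations over a small labeled subset. As an assumed stand-in, the sketch computes the fuzzy lower and upper approximations of each decision class (normal vs. outlier) using the Kleene-Dienes implicator and the min t-norm, then takes the classical rough-set accuracy of classification: total lower-approximation membership divided by total upper-approximation membership. FROD's exact definition may differ, and `fuzzy_lower_upper` and `classification_accuracy` are hypothetical names. Evaluated on the relation induced by a single attribute, such a value could act as that attribute's weight.

```python
import numpy as np

def fuzzy_lower_upper(R, member):
    """Fuzzy lower/upper approximation of a crisp decision class under relation R."""
    R = np.asarray(R, dtype=float)
    member = np.asarray(member, dtype=float)
    # Lower: inf_y max(1 - R(x, y), D(y)), i.e. how certainly x belongs to the class.
    lower = np.min(np.maximum(1.0 - R, member[None, :]), axis=1)
    # Upper: sup_y min(R(x, y), D(y)), i.e. how possibly x belongs to the class.
    upper = np.max(np.minimum(R, member[None, :]), axis=1)
    return lower, upper

def classification_accuracy(R, labels):
    """Sum of lower-approximation memberships over sum of upper-approximation memberships."""
    labels = np.asarray(labels)
    low_total, up_total = 0.0, 0.0
    for c in np.unique(labels):
        lower, upper = fuzzy_lower_upper(R, labels == c)
        low_total += lower.sum()
        up_total += upper.sum()
    return low_total / up_total

y = [0, 0, 1]  # labeled subset: 0 = normal, 1 = outlier
R_separated = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
R_overlap = [[1.0, 0.9, 0.8], [0.9, 1.0, 0.0], [0.8, 0.0, 1.0]]
print(classification_accuracy(R_separated, y))  # 1.0
print(classification_accuracy(R_overlap, y))    # ≈ 0.304
```

When the labeled outlier is dissimilar to every normal object the approximations coincide and the accuracy is 1; similarity between the classes shrinks the lower approximations and lowers the score.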
Problem

Research questions and friction points this paper is trying to address.

Develops a semi-supervised method for outlier detection in mixed-attribute data.
Addresses uncertainty and heterogeneity using fuzzy approximations and relative entropy.
Proposes the FROD algorithm, which combines attribute classification accuracy with fuzzy relative entropy for detection.
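The fuzzy relative entropy that characterizes outliers on unlabeled data is not defined in this summary. The sketch assumes the fuzzy information entropy often paired with fuzzy rough sets, H = -(1/n) Σ_i log2(|[x_i]| / n) with granule size |[x_i]| = Σ_j R(x_i, x_j), and scores each object by a hypothetical leave-one-out rule: how much the entropy drops when that object is removed, clipped at zero. It illustrates the uncertainty-based intuition (an isolated object has a small granule and inflates the entropy), not FROD's actual formula; `fuzzy_entropy` and `relative_entropy_scores` are illustrative names.

```python
import numpy as np

def fuzzy_entropy(R):
    """Fuzzy information entropy of a universe under a reflexive fuzzy relation R."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    granule = R.sum(axis=1)  # |[x_i]| = sum_j R(x_i, x_j), at least 1 by reflexivity
    return -np.mean(np.log2(granule / n))

def relative_entropy_scores(R):
    """Hypothetical leave-one-out score: entropy drop when x_i is removed (clipped at 0)."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    full = fuzzy_entropy(R)
    scores = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        reduced = fuzzy_entropy(R[np.ix_(keep, keep)])
        scores[i] = max(full - reduced, 0.0)
    return scores

# Objects 0-2 form a tight cluster; object 3 is similar to nothing else.
R = [[1.00, 0.95, 0.90, 0.00],
     [0.95, 1.00, 0.95, 0.00],
     [0.90, 0.95, 1.00, 0.00],
     [0.00, 0.00, 0.00, 1.00]]
scores = relative_entropy_scores(R)
print(scores.round(3))
```

Removing a clustered object leaves the isolated one relatively more prominent and raises the entropy, so only object 3 receives a positive score in this toy relation.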
Innovation

Methods, ideas, or system contributions that make the work stand out.

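Fuzzy decision systems built from labeled data evaluate attribute sets through classification accuracy based on fuzzy approximations
Fuzzy relative entropy computed on unlabeled data characterizes outliers in terms of uncertainty
Combines attribute classification accuracy and fuzzy relative entropy for semi-supervised detection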
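How FROD fuses its two ingredients is not specified here. One plausible reading, sketched purely as an assumption, is an accuracy-weighted average: each attribute's relative-entropy scores are min-max rescaled to [0, 1], weighted by that attribute's classification accuracy from the labeled subset, and averaged, so attributes that separate labeled outliers poorly contribute little. `fuse_scores` and `top_k_outliers` are hypothetical helpers, and the input numbers are made up for illustration.

```python
import numpy as np

def fuse_scores(accuracy, entropy_scores, eps=1e-12):
    """Hypothetical fusion of per-attribute evidence into one outlier score per object.

    accuracy:       shape (m,)   attribute classification accuracy from labeled data
    entropy_scores: shape (m, n) per-attribute relative-entropy scores on unlabeled data
    """
    w = np.clip(np.asarray(accuracy, dtype=float), 0.0, None)
    E = np.asarray(entropy_scores, dtype=float)
    # Rescale each attribute's scores to [0, 1] so no attribute dominates by scale.
    low = E.min(axis=1, keepdims=True)
    span = E.max(axis=1, keepdims=True) - low
    E = (E - low) / (span + eps)
    # Accuracy-weighted average across attributes.
    return (w @ E) / (w.sum() + eps)

def top_k_outliers(scores, k):
    """Indices of the k highest scores, most anomalous first."""
    return np.argsort(scores)[::-1][:k]

accuracy = [0.9, 0.1]
entropy_scores = [
    [0.0, 0.1, 0.8, 0.1],  # attribute 0: reliable, points at object 2
    [0.9, 0.0, 0.0, 0.1],  # attribute 1: unreliable, points at object 0
]
scores = fuse_scores(accuracy, entropy_scores)
print(top_k_outliers(scores, 2))  # [2 3]
```

The reliable attribute (accuracy 0.9) flags object 2 while the unreliable one (accuracy 0.1) flags object 0; the weighting lets the labeled data decide which signal to trust, ranking object 2 first.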

Baiyang Chen
College of Computer Science, Sichuan University, Chengdu, 610065, China

Zhong Yuan
Penn State University
Deep Learning in Health Care; Diffusion Model

Zheng Liu
Sichuan National Innovation New Vision UHD Video Technology Co., Ltd., Chengdu, 610095, China

Dezhong Peng
Sichuan University
Multi-modal Learning; Multimedia Analysis; Neural Network

Yongxiang Li
Professor, RMIT University
Electronic Materials and Devices

Chang Liu
College of Computer Science, Sichuan University, Chengdu, 610065, China

Guiduo Duan
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China