Robust fuzzy clustering with cellwise outliers

📅 2025-08-05

🤖 AI Summary
This paper addresses the degradation of robustness in fuzzy clustering caused by cellwise outliers. We propose a robust fuzzy clustering method that integrates probabilistic modeling with cellwise outlier detection. Specifically, outlier cells are modeled as locally contaminated entries and treated as missing values within an EM algorithm framework, enabling joint estimation of cluster memberships, model parameters, and outlier locations. By introducing latent variables, the method achieves co-optimization of outlier detection and clustering, accompanied by an interpretable parameter-tuning strategy. Our key contribution is the first incorporation of cellwise robustness into a probabilistic fuzzy clustering framework, substantially enhancing resilience against localized data contamination. Experiments on obesity risk stratification and OECD country well-being analysis demonstrate that the proposed method outperforms existing approaches in both clustering stability and structural pattern discovery.

📝 Abstract
Fuzzy clustering is a technique for identifying subgroups in heterogeneous populations by quantifying unit membership degrees. The magnitude of the latter depends on the desired level of fuzzification, based on the purpose of the analysis. We combine the advantages of fuzzy clustering with a robust approach capable of detecting cellwise outliers, i.e., anomalous cells in a data matrix. The proposed methodology is formulated within a probabilistic framework and estimated via an Expectation-Maximization algorithm for missing data. It includes an additional step for flagging contaminated cells, which are then treated as missing information. The strengths of the model are illustrated through two real-world applications: the first one identifies individuals at potential risk of obesity based on their physiological measurements, while the second one analyzes well-being across regions of the OECD countries. We also explore the effects of the model's tuning parameters and provide guidance for users on how to set them suitably.
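
To make the role of the fuzzification level concrete, here is a minimal sketch of how membership degrees react to it. It uses the classical fuzzy c-means membership formula with a fuzzifier m, which is an assumption made for illustration only: the paper works in a probabilistic framework and its membership definition may differ. The function name `fcm_memberships` is hypothetical.

```python
def fcm_memberships(sq_dists, m=2.0):
    """Classical fuzzy c-means membership degrees of a single unit.

    sq_dists: squared distances from the unit to each cluster centre.
    m: fuzzifier (> 1); values close to 1 give almost crisp memberships,
       larger values spread the membership more evenly across clusters.
    NOTE: illustrative assumption, not the membership rule of the paper.
    """
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    # A unit lying exactly on one or more centres belongs only to those.
    on_centre = [d == 0.0 for d in sq_dists]
    if any(on_centre):
        n_hits = sum(on_centre)
        return [1.0 / n_hits if hit else 0.0 for hit in on_centre]
    # u_k is proportional to (d_k^2)^(-1 / (m - 1)), normalised to sum to 1.
    exponent = 1.0 / (m - 1.0)
    weights = [d ** (-exponent) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    sq_dists = [1.0, 4.0]  # the unit is closer to the first centre
    for m in (1.1, 2.0, 5.0):
        print(m, fcm_memberships(sq_dists, m))
```

Running the loop shows the membership in the closer cluster shrinking toward 0.5 as m grows, which is the sense in which the magnitude of the membership degrees depends on the chosen level of fuzzification.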
Problem

Research questions and friction points this paper is trying to address.

Detecting cellwise outliers in fuzzy clustering
Combining robust methods with fuzzy clustering techniques
Handling contaminated cells as missing data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Robust fuzzy clustering for cellwise outliers
Probabilistic framework with EM algorithm
Additional step flags contaminated cells
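
The idea of flagging contaminated cells and then treating them as missing can be illustrated with the standard Gaussian conditional-mean imputation that appears in EM algorithms for missing data. The sketch below is not the authors' procedure: it assumes a single Gaussian component with known mean `mu` and covariance `sigma`, takes the flags as given rather than estimating them, and uses the hypothetical function name `impute_flagged_cells`.

```python
import numpy as np


def impute_flagged_cells(x, flagged, mu, sigma):
    """Replace flagged cells of one row by their Gaussian conditional mean.

    x:       observed row, shape (p,)
    flagged: boolean mask, True where a cell is flagged as contaminated
    mu:      component mean, shape (p,)       (assumed known here)
    sigma:   component covariance, shape (p, p) (assumed known here)
    NOTE: generic missing-data E-step, used only as an illustration.
    """
    x = np.asarray(x, dtype=float)
    flagged = np.asarray(flagged, dtype=bool)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    x_hat = x.copy()
    observed = ~flagged
    if not flagged.any():
        return x_hat          # nothing to impute
    if not observed.any():
        x_hat[:] = mu         # no reliable cell left: fall back to the mean
        return x_hat

    # E[x_m | x_o] = mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o)
    sigma_oo = sigma[np.ix_(observed, observed)]
    sigma_mo = sigma[np.ix_(flagged, observed)]
    residual = x[observed] - mu[observed]
    x_hat[flagged] = mu[flagged] + sigma_mo @ np.linalg.solve(sigma_oo, residual)
    return x_hat


if __name__ == "__main__":
    mu = [0.0, 0.0]
    sigma = [[1.0, 0.5], [0.5, 1.0]]
    row = [2.0, 100.0]        # second cell is an obvious cellwise outlier
    print(impute_flagged_cells(row, [False, True], mu, sigma))
```

Only the flagged cell is replaced; the reliable cell in the same row is kept and still informs the imputation. In a clustering model with several components, such an imputation would presumably be computed per component and combined through the membership degrees.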
Giorgia Zaccaria
Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Via Bicocca degli Arcimboldi 8, Milan, 20100, Italy
Lorenzo Benzakour
Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Via Bicocca degli Arcimboldi 8, Milan, 20100, Italy
Luis A. García-Escudero
Department of Statistics and Operational Research, University of Valladolid, Paseo de Belén 7, Valladolid, 47011, Spain
Francesca Greselin
Department of Statistics and Quantitative Methods, University of Milano-Bicocca
Inference for inequality measures; robust estimation of mixture models for classification and
Agustín Mayo-Íscar
Department of Statistics and Operational Research, University of Valladolid, Paseo de Belén 7, Valladolid, 47011, Spain