A smooth multi-group Gaussian Mixture Model for cellwise robust covariance estimation

📅 2025-04-03

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This study addresses the mismatch between predefined groups, such as clinical diagnostic labels, and data-driven clusters. It proposes an interpretable multi-group Gaussian mixture model (GMM) that transitions smoothly between groups and is robust to cellwise outliers. The method combines prior group structure with a data-driven mixture through a likelihood-based smooth GMM formulation, cellwise robust estimation, and an EM-type optimization algorithm. The authors also derive the breakdown point of the robust estimator, which quantifies its robustness to cellwise outliers. On synthetic data, the model outperforms conventional group-mean estimators, independent GMMs, and non-robust covariance estimators. On real-world datasets from medicine and other domains, it identifies observations near ambiguous group boundaries as well as individual outlying cells, which makes the grouping easier to interpret.

📝 Abstract

Do data groups that are pre-defined by expert opinion or medical diagnosis correspond to groups obtained from statistical modeling? And why might individual observations be inconsistent with their assigned group? This contribution addresses both questions by proposing a novel multi-group Gaussian mixture model that accounts for the given group context while remaining highly flexible. This is achieved by assuming that the observations of a particular group originate not from a single distribution but from a Gaussian mixture of all group distributions. Moreover, the model is robust against cellwise outliers, that is, against atypical individual cells of the observations. The objective function can be formulated as a likelihood problem and optimized efficiently. We also derive the theoretical breakdown point of the estimators, a new result in this context that quantifies the degree of robustness to cellwise outliers. Simulations demonstrate excellent performance and advantages over alternative models and estimators. Applications from different areas illustrate the strength of the method, particularly for investigating observations that lie in the overlap of different groups.
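
The central modeling assumption, that an observation labelled as group g is drawn from a Gaussian mixture over all group distributions, can be illustrated with a minimal sketch. It is not the paper's estimator: it includes neither the cellwise robustness nor the estimation step, and it assumes the parameters are already known. The names `weights`, `means`, `covs`, and `responsibilities` are hypothetical, with `weights[g][k]` standing for an assumed mixing weight of group distribution k within labelled group g.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian, computed via the Cholesky factor."""
    l = cholesky(cov)
    n = len(x)
    # Forward substitution: solve L z = x - mean.
    z = []
    for i in range(n):
        s = x[i] - mean[i] - sum(l[i][k] * z[k] for k in range(i))
        z.append(s / l[i][i])
    log_det = 2.0 * sum(math.log(l[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))

def responsibilities(x, g, weights, means, covs):
    """Posterior probability that x, labelled as group g, came from each group distribution.

    weights[g][k] is the (assumed, hypothetical) mixing weight of distribution k
    inside labelled group g; each row of weights sums to 1.
    """
    logs = []
    for k in range(len(means)):
        w = weights[g][k]
        log_w = math.log(w) if w > 0 else float("-inf")
        logs.append(log_w + gaussian_logpdf(x, means[k], covs[k]))
    m = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two groups in two dimensions; all parameter values are illustrative only.
means = [[0.0, 0.0], [4.0, 0.0]]
covs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
weights = [[0.9, 0.1], [0.1, 0.9]]

typical = responsibilities([0.0, 0.0], 0, weights, means, covs)
overlap = responsibilities([2.0, 0.0], 0, weights, means, covs)
mislabelled = responsibilities([4.0, 0.0], 0, weights, means, covs)
```

In this toy setting, an observation halfway between the two centers keeps responsibilities equal to its group's mixing weights, while an observation labelled as group 0 but located at the center of group 1 is attributed almost entirely to group 1. This is the sense in which a model of this kind can point to observations that sit in the overlap of groups or disagree with their given label.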

Problem

Research questions and friction points this paper is trying to address.

Aligning expert-defined groups with statistical model clusters

Detecting discrepancies between predefined labels and data-driven groupings

Providing robust Gaussian mixture modeling with outlier resistance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-group Gaussian mixture modeling

Robust cellwise outlier detection

Penalized likelihood estimation approach
