Turtle shell clustering: A mixture approach to discriminative clustering with applications to flow cytometry and other data

📅 2026-04-24

🤖 AI Summary
This work addresses three challenges in unsupervised clustering: modeling cluster geometry and discriminative boundaries together, selecting the number of clusters automatically, and staying robust to noise and irregular cluster shapes. The authors combine generative and discriminative ideas by optimizing a regularized mutual information objective whose conditional model is a mixture of mixtures of Gaussian and uniform distributions. The regularizing term, together with a merge step similar to those used in reversible-jump Markov chain Monte Carlo, selects the number of components automatically, and the resulting method can estimate non-linear decision boundaries. The method is tested on simulated and real datasets commonly used in clustering research, including flow cytometry data, and is designed to capture intuitive clusters in the presence of noise and irregular cluster shapes.

📝 Abstract
Generative approaches to clustering provide information on geometric properties of clusters, whereas discriminative approaches provide boundaries between clusters. Ideas from both approaches are incorporated to present a fully unsupervised, probabilistic, and discriminative clustering method via a regularized mutual information objective function, wherein a mixture of mixtures of Gaussian and uniform distributions is used for formulation of the conditional model. Automatic selection of the number of components is established with the introduction of the regularizing term and a merge step, similar to those applied in reversible jump Markov chain Monte Carlo methods used in Bayesian clustering. Consequently, the turtle shell method -- a fully unsupervised clustering method capable of estimating non-linear boundary lines, automatically selecting the number of components, and capturing intuitive clusters in the presence of data abnormalities such as noise and/or irregular cluster shapes -- is introduced. We test this method on various simulated and real datasets commonly explored in clustering research, and extend the analysis to datasets arising from flow cytometry experiments.
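To make the objective in the abstract more concrete, the following is a minimal one-dimensional sketch of an unregularized mutual information score computed from soft cluster assignments. It is not the authors' implementation: the per-cluster density (a Gaussian blended with a uniform distribution through a hypothetical weight `eps`), the function names, and the toy data are all assumptions made for illustration, and the paper's regularizing term and merge step are omitted.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def uniform_pdf(x, low, high):
    """Density of a uniform distribution on [low, high] at x."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def cluster_density(x, mean, sd, low, high, eps=0.1):
    """Assumed cluster model: a Gaussian blended with a uniform component."""
    return (1.0 - eps) * gaussian_pdf(x, mean, sd) + eps * uniform_pdf(x, low, high)

def posterior(x, clusters, weights):
    """Soft assignment p(cluster | x) via Bayes' rule."""
    joint = [w * cluster_density(x, *c) for c, w in zip(clusters, weights)]
    total = sum(joint)
    if total == 0.0:
        # No cluster explains x; fall back to a uniform assignment.
        return [1.0 / len(joint)] * len(joint)
    return [j / total for j in joint]

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def mutual_information(posteriors):
    """Empirical I(x; y): entropy of the average assignment
    minus the average entropy of the individual assignments."""
    n = len(posteriors)
    k = len(posteriors[0])
    marginal = [sum(p[j] for p in posteriors) / n for j in range(k)]
    conditional = sum(entropy(p) for p in posteriors) / n
    return entropy(marginal) - conditional

# Toy data: two groups near 0 and 5 (illustrative values only).
xs = [-0.4, 0.1, 0.3, 4.7, 5.0, 5.2]
clusters = [(0.0, 1.0, -3.0, 3.0), (5.0, 1.0, 2.0, 8.0)]  # (mean, sd, low, high)
weights = [0.5, 0.5]
mi = mutual_information([posterior(x, clusters, weights) for x in xs])
print(f"mutual information (nats): {mi:.4f}")
```

The score is high when each point is assigned confidently to one cluster while the clusters stay balanced overall, which is the intuition behind using mutual information as a discriminative clustering criterion. With two clusters it cannot exceed log 2.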
Problem

Research questions and friction points this paper is trying to address.

discriminative clustering
unsupervised learning
cluster number selection
nonlinear boundaries
data abnormalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

discriminative clustering
mixture models
regularized mutual information
automatic cluster number selection
non-linear boundaries