On Model-Based Clustering With Entropic Optimal Transport

📅 2026-05-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the susceptibility of traditional model-based clustering to poor local optima that arise when the nonconvex log-likelihood is optimized. The authors propose a loss function based on entropic optimal transport to replace the log-likelihood objective. The new loss shares the same global optimum as the log-likelihood but has a better-behaved landscape that avoids many of the spurious local optima. In analogy with expectation-maximization (EM), the loss is optimized by the Sinkhorn-EM algorithm, which the authors show converges at a rate comparable to that of EM. In numerical experiments and two real-world applications, C. elegans microscopy image segmentation and spatial transcriptomics clustering, the method outperforms log-likelihood optimization.
📝 Abstract
We develop a new methodology for model-based clustering. Optimizing the log-likelihood provides a principled statistical framework for clustering, with solutions found via the EM algorithm. However, because the log-likelihood is nonconvex, only convergence to stationary points can be guaranteed, and practitioners often use multiple starting points in the hope that one will converge to the global solution. We consider a new loss function based on entropic optimal transport that shares the same global optimum as the log-likelihood but has a much better-behaved landscape, thereby avoiding spurious local-optima configurations that are pervasive with the log-likelihood. Similar to the EM algorithm for the log-likelihood, this new loss can be optimized by the Sinkhorn-EM algorithm, which we show converges at a rate comparable to that of EM. By analyzing extensive numerical experiments and two real-world applications in image segmentation in C. elegans microscopy and clustering in spatial transcriptomics, we show that this new loss outperforms log-likelihood optimization, indicating that it represents a valuable clustering methodology for practitioners.
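To make the Sinkhorn-EM idea concrete, here is a minimal sketch rather than the authors' implementation. It assumes a mixture of unit-variance spherical Gaussians with known, fixed mixing weights, and it assumes the E-step replaces the usual posterior responsibilities with an entropic optimal transport coupling whose column averages are forced to match the mixing weights. The function names (`sinkhorn_responsibilities`, `sinkhorn_em`) and all parameters are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log-sum-exp built from NumPy's logaddexp ufunc.
    return np.logaddexp.reduce(a, axis=axis)


def sinkhorn_responsibilities(log_lik, weights, n_sinkhorn=200):
    """Assumed E-step: rescale the kernel w_k * exp(log_lik[i, k]) so that
    every row sums to 1 and every column averages to weights[k]."""
    n, k = log_lik.shape
    log_w = np.log(weights)
    log_kernel = log_lik + log_w
    log_v = np.zeros(k)
    # With n_sinkhorn=0 this row normalisation gives the ordinary EM posterior.
    log_u = -_logsumexp(log_kernel + log_v, axis=1)
    for _ in range(n_sinkhorn):
        # Column step: total mass of cluster k becomes n * weights[k].
        log_v = np.log(n) + log_w - _logsumexp(log_kernel + log_u[:, None], axis=0)
        # Row step: each data point distributes exactly one unit of mass.
        log_u = -_logsumexp(log_kernel + log_v, axis=1)
    return np.exp(log_kernel + log_v + log_u[:, None])


def sinkhorn_em(x, weights, mu_init, n_iter=50, n_sinkhorn=200):
    """Alternate the Sinkhorn E-step with a weighted-mean M-step."""
    mu = np.array(mu_init, dtype=float)
    for _ in range(n_iter):
        # Log-density of unit-variance Gaussians, up to an additive constant
        # that the row scaling absorbs.
        sq_dist = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        resp = sinkhorn_responsibilities(-0.5 * sq_dist, weights, n_sinkhorn)
        # M-step: each mean is the responsibility-weighted average of the data.
        mu = resp.T @ x / resp.sum(axis=0)[:, None]
    return mu
```

The only difference from plain EM in this sketch is the column step inside `sinkhorn_responsibilities`; setting `n_sinkhorn=0` recovers the standard E-step, which makes side-by-side comparisons from the same starting points straightforward.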
Problem

Research questions and friction points this paper is trying to address.

model-based clustering
log-likelihood
nonconvex optimization
local optima
entropic optimal transport
Innovation

Methods, ideas, or system contributions that make the work stand out.

entropic optimal transport
model-based clustering
Sinkhorn-EM
nonconvex optimization
log-likelihood landscape