🤖 AI Summary
The EM algorithm is conventionally treated as a non-differentiable black box, impeding its integration into end-to-end differentiable learning frameworks, particularly those grounded in optimal transport. This paper presents a rigorous differentiable formulation of the EM algorithm, comparing strategies that range from full automatic differentiation to implicit-function-based gradient approximation, and constructs a fully differentiable pipeline for computing the mixture Wasserstein distance $\mathrm{MW}_2$. Theoretically, the paper establishes a stability result for $\mathrm{MW}_2$ between Gaussian mixture models and introduces an unbalanced variant of the distance. Empirically, the method enables stable gradient backpropagation across diverse tasks, including image barycentre estimation, colour transfer, image generation, and texture synthesis, providing a deep integration of latent-variable models with optimal transport theory.
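To make the "full automatic differentiation" strategy concrete: one EM iteration for a GMM is a composition of smooth operations, so a fixed number of unrolled iterations can be backpropagated through by any autodiff framework. Below is a minimal NumPy sketch for a 1-D mixture (the function name and the 1-D restriction are our illustration, not the paper's implementation); written with e.g. PyTorch tensors instead, the same steps would be differentiable end to end.

```python
import numpy as np

def em_step(x, w, mu, sigma2):
    """One unrolled EM iteration for a 1-D Gaussian mixture.

    Every operation here is smooth in (x, w, mu, sigma2), so an autodiff
    framework could backpropagate through a stack of such iterations.
    Returns updated parameters and the log-likelihood of the inputs.
    """
    # E-step: responsibilities r[n, k] = p(component k | x_n), via log-sum-exp
    diff = x[:, None] - mu[None, :]                               # (N, K)
    log_pdf = -0.5 * diff**2 / sigma2 - 0.5 * np.log(2 * np.pi * sigma2)
    log_num = np.log(w)[None, :] + log_pdf
    log_den = np.logaddexp.reduce(log_num, axis=1, keepdims=True)
    r = np.exp(log_num - log_den)                                 # rows sum to 1

    # M-step: closed-form responsibility-weighted updates
    nk = r.sum(axis=0)                                            # effective counts
    w_new = nk / len(x)
    mu_new = (r * x[:, None]).sum(axis=0) / nk
    sigma2_new = (r * (x[:, None] - mu_new[None, :])**2).sum(axis=0) / nk
    return w_new, mu_new, sigma2_new, log_den.sum()
```

Iterating `em_step` monotonically increases the log-likelihood, which is the standard EM guarantee; the unrolled chain of such steps is what the automatic-differentiation strategy differentiates.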
📝 Abstract
The Expectation-Maximisation (EM) algorithm is a central tool in statistics and machine learning, widely used for latent-variable models such as Gaussian Mixture Models (GMMs). Despite its ubiquity, EM is typically treated as a non-differentiable black box, preventing its integration into modern learning pipelines where end-to-end gradient propagation is essential. In this work, we present and compare several differentiation strategies for EM, from full automatic differentiation to approximate methods, assessing their accuracy and computational efficiency. As a key application, we leverage this differentiable EM in the computation of the Mixture Wasserstein distance $\mathrm{MW}_2$ between GMMs, allowing $\mathrm{MW}_2$ to be used as a differentiable loss in imaging and machine learning tasks. To complement our practical use of $\mathrm{MW}_2$, we contribute a novel stability result which provides theoretical justification for the use of $\mathrm{MW}_2$ with EM, and also introduce a novel unbalanced variant of $\mathrm{MW}_2$. Numerical experiments on barycentre computation, colour and style transfer, image generation, and texture synthesis illustrate the versatility and effectiveness of the proposed approach in different settings.
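The Mixture Wasserstein distance mentioned above restricts optimal transport between two GMMs to couplings that are themselves Gaussian mixtures; this reduces $\mathrm{MW}_2$ to a small discrete transport problem between the mixture weights, with ground cost $W_2^2$ between individual Gaussian components. The sketch below illustrates this for 1-D GMMs, where $W_2^2$ between two Gaussians is simply $(m_i - m_j)^2 + (\sigma_i - \sigma_j)^2$ (the helper name and the use of SciPy's LP solver are our choices for illustration, not the paper's pipeline, which additionally differentiates through this computation).

```python
import numpy as np
from scipy.optimize import linprog

def mw2_1d(a, mu_a, sig_a, b, mu_b, sig_b):
    """MW2 distance between two 1-D GMMs (a, mu_a, sig_a) and (b, mu_b, sig_b).

    MW2^2 is the optimal transport cost between the component weight vectors
    a and b, with ground cost W2^2 between Gaussian components, which in 1-D
    is (m_i - m_j)^2 + (s_i - s_j)^2.  Solved here as a small linear program.
    """
    K, L = len(a), len(b)
    cost = (mu_a[:, None] - mu_b[None, :])**2 + (sig_a[:, None] - sig_b[None, :])**2
    # Transport-polytope constraints on the flattened plan pi[i, j]:
    # row sums equal a, column sums equal b.
    A_eq = np.zeros((K + L, K * L))
    for i in range(K):
        A_eq[i, i * L:(i + 1) * L] = 1.0
    for j in range(L):
        A_eq[K + j, j::L] = 1.0
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return np.sqrt(max(res.fun, 0.0))
```

In higher dimensions the ground cost between components involves the Bures metric between covariance matrices rather than $(\sigma_i - \sigma_j)^2$, but the structure of the discrete transport problem is unchanged.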