🤖 AI Summary
The EM algorithm is conventionally treated as a non-differentiable black box, impeding its integration into end-to-end differentiable learning frameworks, particularly those grounded in optimal transport. This paper presents a rigorous differentiable formulation of the EM algorithm, comparing strategies that range from full automatic differentiation to implicit-function-based gradient approximation, and constructs a fully differentiable pipeline for computing the mixture Wasserstein distance $\mathrm{MW}_2$. Theoretically, the paper establishes a stability result for $\mathrm{MW}_2$ between Gaussian mixture models and introduces an unbalanced variant of the distance. Empirically, the method enables stable gradient backpropagation across diverse tasks, including image barycentre estimation, colour transfer, image generation, and texture synthesis, providing a deep integration of latent-variable models with optimal transport theory.
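To make the "full automatic differentiation" strategy concrete: one EM iteration for a GMM is a composition of smooth operations, so a fixed number of unrolled iterations can be backpropagated through by any autodiff framework. Below is a minimal NumPy sketch for a 1-D mixture (the function name and the 1-D restriction are our illustration, not the paper's implementation); written with e.g. PyTorch tensors instead, the same steps would be differentiable end to end.

```python
import numpy as np

def em_step(x, w, mu, sigma2):
    """One unrolled EM iteration for a 1-D Gaussian mixture.

    Every operation here is smooth in (x, w, mu, sigma2), so an autodiff
    framework could backpropagate through a stack of such iterations.
    Returns updated parameters and the log-likelihood of the inputs.
    """
    # E-step: responsibilities r[n, k] = p(component k | x_n), via log-sum-exp
    diff = x[:, None] - mu[None, :]                               # (N, K)
    log_pdf = -0.5 * diff**2 / sigma2 - 0.5 * np.log(2 * np.pi * sigma2)
    log_num = np.log(w)[None, :] + log_pdf
    log_den = np.logaddexp.reduce(log_num, axis=1, keepdims=True)
    r = np.exp(log_num - log_den)                                 # rows sum to 1

    # M-step: closed-form responsibility-weighted updates
    nk = r.sum(axis=0)                                            # effective counts
    w_new = nk / len(x)
    mu_new = (r * x[:, None]).sum(axis=0) / nk
    sigma2_new = (r * (x[:, None] - mu_new[None, :])**2).sum(axis=0) / nk
    return w_new, mu_new, sigma2_new, log_den.sum()
```

Iterating `em_step` monotonically increases the log-likelihood, which is the standard EM guarantee; the unrolled chain of such steps is what the automatic-differentiation strategy differentiates.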
📝 Abstract
The Expectation-Maximisation (EM) algorithm is a central tool in statistics and machine learning, widely used for latent-variable models such as Gaussian Mixture Models (GMMs). Despite its ubiquity, EM is typically treated as a non-differentiable black box, preventing its integration into modern learning pipelines where end-to-end gradient propagation is essential. In this work, we present and compare several differentiation strategies for EM, from full automatic differentiation to approximate methods, assessing their accuracy and computational efficiency. As a key application, we leverage this differentiable EM in the computation of the Mixture Wasserstein distance $\mathrm{MW}_2$ between GMMs, allowing $\mathrm{MW}_2$ to be used as a differentiable loss in imaging and machine learning tasks. To complement our practical use of $\mathrm{MW}_2$, we contribute a novel stability result which provides theoretical justification for the use of $\mathrm{MW}_2$ with EM, and also introduce a novel unbalanced variant of $\mathrm{MW}_2$. Numerical experiments on barycentre computation, colour and style transfer, image generation, and texture synthesis illustrate the versatility and effectiveness of the proposed approach in different settings.
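The Mixture Wasserstein distance mentioned above restricts optimal transport between two GMMs to couplings that are themselves Gaussian mixtures; this reduces $\mathrm{MW}_2$ to a small discrete transport problem between the mixture weights, with ground cost $W_2^2$ between individual Gaussian components. The sketch below illustrates this for 1-D GMMs, where $W_2^2$ between two Gaussians is simply $(m_i - m_j)^2 + (\sigma_i - \sigma_j)^2$ (the helper name and the use of SciPy's LP solver are our choices for illustration, not the paper's pipeline, which additionally differentiates through this computation).

```python
import numpy as np
from scipy.optimize import linprog

def mw2_1d(a, mu_a, sig_a, b, mu_b, sig_b):
    """MW2 distance between two 1-D GMMs (a, mu_a, sig_a) and (b, mu_b, sig_b).

    MW2^2 is the optimal transport cost between the component weight vectors
    a and b, with ground cost W2^2 between Gaussian components, which in 1-D
    is (m_i - m_j)^2 + (s_i - s_j)^2.  Solved here as a small linear program.
    """
    K, L = len(a), len(b)
    cost = (mu_a[:, None] - mu_b[None, :])**2 + (sig_a[:, None] - sig_b[None, :])**2
    # Transport-polytope constraints on the flattened plan pi[i, j]:
    # row sums equal a, column sums equal b.
    A_eq = np.zeros((K + L, K * L))
    for i in range(K):
        A_eq[i, i * L:(i + 1) * L] = 1.0
    for j in range(L):
        A_eq[K + j, j::L] = 1.0
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return np.sqrt(max(res.fun, 0.0))
```

In higher dimensions the ground cost between components involves the Bures metric between covariance matrices rather than $(\sigma_i - \sigma_j)^2$, but the structure of the discrete transport problem is unchanged.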