🤖 AI Summary
This work aims to clarify how the global monotonicity of the EM algorithm relates to its local spectral convergence behavior. By making explicit a latent-variable operator that bridges the global and local perspectives, the authors decompose the likelihood increment along EM trajectories into energy terms based on posterior relative entropy. Linearizing at a maximizer then yields an operator that jointly characterizes the local contraction rate, posterior rigidity, and geometric curvature. Drawing on information geometry, relative-entropy decomposition, operator linearization, and spectral analysis, the framework embeds EM's global monotonicity, local spectral properties, and optimal relaxation rule within a single dynamical system. Key contributions include a precise spectral characterization of local convergence, an optimal scalar relaxation rule for locally accelerating EM, and a demonstration that the operator coincides with both the missing-information ratio and the information-geometric Hessian.
📝 Abstract
The expectation–maximization (EM) algorithm combines global monotonicity, local linear convergence, and strong practical robustness, but these features are usually analyzed separately. Global descent is nonlinear, whereas local convergence is governed by the spectrum of the linearized EM map. How the two levels fit into a single dynamical picture has remained unclear.
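To make the two levels concrete, the following sketch runs EM on a hypothetical toy problem: a two-component, unit-variance Gaussian mixture with known means, where only the mixing weight $π$ is estimated. None of this setup comes from the paper. The sketch checks the global property (a nondecreasing log-likelihood) and the local one: near the fixed point $π^\ast$, the error shrinks by a factor close to the derivative $DT(π^\ast)$ of the scalar EM map. That derivative is computed here in closed form as $DT(π)=\frac1n\sum_i a_ib_i/(πa_i+(1-π)b_i)^2$, where $a_i,b_i$ are the component densities at $x_i$.

```python
import numpy as np

# Hypothetical toy model (not from the paper): x ~ pi*N(MU1, 1) + (1-pi)*N(MU2, 1)
# with known means, so the only parameter is the mixing weight pi and the
# EM map T is a scalar function on (0, 1).
MU1, MU2 = -1.0, 1.0


def component_densities(x):
    """Unit-variance normal densities of each component at the data points."""
    a = np.exp(-0.5 * (x - MU1) ** 2) / np.sqrt(2 * np.pi)
    b = np.exp(-0.5 * (x - MU2) ** 2) / np.sqrt(2 * np.pi)
    return a, b


def log_likelihood(pi, a, b):
    """Observed-data log-likelihood l(pi)."""
    return np.sum(np.log(pi * a + (1 - pi) * b))


def em_map(pi, a, b):
    """T(pi): E-step posterior responsibilities, then M-step average."""
    resp = pi * a / (pi * a + (1 - pi) * b)
    return resp.mean()


def em_derivative(pi, a, b):
    """DT(pi), the derivative of the scalar EM map (closed form)."""
    mix = pi * a + (1 - pi) * b
    return np.mean(a * b / mix ** 2)


rng = np.random.default_rng(0)
n = 500
from_first = rng.random(n) < 0.3
x = np.where(from_first, rng.normal(MU1, 1.0, n), rng.normal(MU2, 1.0, n))
a, b = component_densities(x)

traj = [0.9]                      # deliberately poor starting value
for _ in range(1000):
    traj.append(em_map(traj[-1], a, b))
traj = np.array(traj)
pi_star = traj[-1]                # numerically converged fixed point

# Global view: the likelihood is nondecreasing along the EM trajectory.
lls = np.array([log_likelihood(p, a, b) for p in traj])
monotone = bool(np.all(np.diff(lls) >= -1e-9))

# Local view: once the error is small, it shrinks by a factor close to DT(pi*).
err = np.abs(traj - pi_star)
k = int(np.argmax(err < 1e-4))    # first iterate inside a small neighborhood
ratio = err[k + 1] / err[k]
rate = em_derivative(pi_star, a, b)

print("likelihood monotone:", monotone)
print("observed error ratio:", ratio)
print("DT(pi*):", rate)
```

In this scalar case the "spectrum" of the linearized EM map is the single number $DT(π^\ast)\in(0,1)$. With vector parameters, the eigenvalues of the Jacobian play the same role.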
We make explicit the latent-variable operator that connects them. Along the EM trajectory, the likelihood increment admits a global energy decomposition in terms of posterior relative entropy. Linearizing the EM map $T$ at a nondegenerate maximizer $θ^\ast$ then reveals the local operator
\[ \mathcal G_{θ^\ast}=I-DT(θ^\ast), \]
where $DT(θ^\ast)$ is the Jacobian of $T$ at $θ^\ast$. This operator coincides with both the missing-information ratio and the information-geometric Hessian of the observed likelihood.
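For orientation, the two facts invoked above can be written in standard EM notation. This notation and its regularity assumptions (an interior maximizer, with the M-step characterized by its stationarity condition) are assumed here and may differ from the paper's own formulation. Write $Q(θ\mid θ_t)=\mathbb E_{p(z\mid x,θ_t)}[\log p(x,z\mid θ)]$ and $H(θ\mid θ_t)=-\mathbb E_{p(z\mid x,θ_t)}[\log p(z\mid x,θ)]$, so that $\ell(θ)=Q(θ\mid θ_t)+H(θ\mid θ_t)$. Over one step $θ_{t+1}=T(θ_t)$ of the EM map $T$, the likelihood increment then splits into two nonnegative parts:
\[
\ell(θ_{t+1})-\ell(θ_t)=\underbrace{Q(θ_{t+1}\mid θ_t)-Q(θ_t\mid θ_t)}_{\ge 0\ \text{(M-step)}}+\underbrace{\mathrm{KL}\big(p(z\mid x,θ_t)\,\big\|\,p(z\mid x,θ_{t+1})\big)}_{\ge 0}.
\]
At $θ^\ast$, define the complete, observed and missing information as $I_c=-\nabla^2_{θ}Q(θ\mid θ^\ast)\big|_{θ=θ^\ast}$, $I_o=-\nabla^2\ell(θ^\ast)$ and $I_m=I_c-I_o\succeq 0$. Differentiating the M-step stationarity condition then gives
\[
DT(θ^\ast)=I_c^{-1}I_m,\qquad \mathcal G_{θ^\ast}=I-DT(θ^\ast)=I_c^{-1}I_o .
\]
The last expression can be read as the Hessian of $-\ell$ at $θ^\ast$, written as an operator with respect to the metric $I_c$. When $I_o$ is positive definite, its eigenvalues lie in $(0,1]$.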
This operator provides a unified description of local contraction, posterior rigidity, and geometric curvature. Its spectrum yields a sharp characterization of local convergence and naturally leads to an optimal scalar relaxation rule for locally accelerated EM. These results place global descent, local spectral behavior, and optimal local relaxation within a common dynamical framework.
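To illustrate how the spectrum of $\mathcal G_{θ^\ast}$ leads to a scalar relaxation rule, the sketch below uses the classical Richardson-type choice for the relaxed iteration $θ_{t+1}=θ_t+ω\,(T(θ_t)-θ_t)$, whose linearization is $e_{t+1}=(I-ω\,\mathcal G_{θ^\ast})e_t$. Assume the eigenvalues of $\mathcal G_{θ^\ast}$ lie in $[λ_{\min},λ_{\max}]\subset(0,1)$. Then $\max_i|1-ωλ_i|$ is minimized by $ω^\ast=2/(λ_{\min}+λ_{\max})$, which gives the rate $(λ_{\max}-λ_{\min})/(λ_{\max}+λ_{\min})$ instead of the plain EM rate $1-λ_{\min}$. Whether this matches the paper's rule exactly is an assumption. The information matrices below are hypothetical random stand-ins, not derived from any model in the paper.

```python
import numpy as np


def optimal_relaxation(G):
    """Richardson-type scalar rule omega* = 2 / (lam_min + lam_max).

    Assumes G has real eigenvalues in (0, 1), as for G = I_c^{-1} I_o.
    Returns the relaxation factor and the predicted local contraction rate.
    """
    lam = np.linalg.eigvals(G).real   # eigenvalues are real up to round-off
    lam_min, lam_max = lam.min(), lam.max()
    omega = 2.0 / (lam_min + lam_max)
    rate = (lam_max - lam_min) / (lam_max + lam_min)
    return omega, rate


def spectral_radius(M):
    """Largest eigenvalue modulus: the asymptotic linear contraction factor."""
    return np.max(np.abs(np.linalg.eigvals(M)))


# Hypothetical information matrices (random stand-ins, not from the paper):
# complete information splits as I_c = I_o + I_m, both parts positive definite.
rng = np.random.default_rng(1)
d = 4
A = rng.normal(size=(d, d))
B = rng.normal(size=(d, d))
I_o = A @ A.T + 0.1 * np.eye(d)       # observed information
I_m = B @ B.T + 0.1 * np.eye(d)       # missing information
I_c = I_o + I_m                       # complete-data information

G = np.linalg.solve(I_c, I_o)         # G = I_c^{-1} I_o = I - DT(theta*)
DT = np.eye(d) - G                    # linearized plain EM map

omega, rate_pred = optimal_relaxation(G)
rho_em = spectral_radius(DT)                          # plain EM: 1 - lam_min
rho_relaxed = spectral_radius(np.eye(d) - omega * G)  # relaxed EM

print("omega* =", omega)
print("plain EM rate   :", rho_em)
print("relaxed EM rate :", rho_relaxed, "(predicted", rate_pred, ")")
```

Because $λ_{\min}+λ_{\max}<2$, the factor always satisfies $ω^\ast>1$, so this rule over-relaxes the plain EM step.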