🤖 AI Summary
This paper addresses the problem of approximating a high-dimensional target measure via a parametric mixture distribution, given only finitely many moment constraints, with approximation quality measured by either the 2-Wasserstein or total variation distance. The proposed method is a hierarchy of semidefinite programming (SDP) relaxations tailored to compact semi-algebraic parameter sets, requiring no prior assumption on the number of mixture components. The relaxation values converge asymptotically to the optimal value and, under a rank condition, the hierarchy converges in finitely many steps and exactly recovers an optimal mixing measure. The approach combines moment theory, semi-algebraic geometry, and optimization over Wasserstein/TV distances. Experiments demonstrate its effectiveness in clustering: it automatically determines the number of clusters and delivers high-quality initializations, significantly accelerating convergence of classical algorithms such as EM.
📝 Abstract
Mixture models, such as Gaussian mixture models, are widely used in machine learning to represent complex data distributions. A key challenge, especially in high-dimensional settings, is to determine the mixture order and estimate the mixture parameters. We study the problem of approximating a target measure, available only through finitely many of its moments, by a mixture of distributions from a parametric family (e.g., Gaussian, exponential, Poisson), with approximation quality measured by the 2-Wasserstein or the total variation distance. Unlike many existing approaches, the parameter set is not assumed to be finite; it is modeled as a compact basic semi-algebraic set. We introduce a hierarchy of semidefinite relaxations whose values converge asymptotically to the desired optimal value. Moreover, when a certain rank condition is satisfied, convergence is finite and an optimal mixing measure is recovered. We also present an application to clustering, where our framework serves either as a stand-alone method or as a preprocessing step that yields both the number of clusters and strong initial parameter estimates, thereby accelerating convergence of standard (local) clustering algorithms.
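To make the problem setting concrete, here is a minimal sketch (not the paper's SDP hierarchy) of the underlying idea: a target measure is known only through finitely many empirical moments, and we search a parametric family for a mixture whose moments match them. The sketch uses a brute-force grid search over a 2-component Gaussian mixture with unit variances; the parameter grid, the choice of four moments, and the least-squares mismatch criterion are all illustrative assumptions, whereas the paper optimizes over a compact semi-algebraic parameter set via semidefinite relaxations.

```python
import numpy as np
from itertools import product

# Target sample from a 2-component Gaussian mixture (means -2 and 3).
rng = np.random.default_rng(0)
target = np.concatenate([rng.normal(-2.0, 1.0, 5000),
                         rng.normal(3.0, 1.0, 5000)])

def gaussian_raw_moments(mu, sigma):
    """First four raw moments E[X^k] of N(mu, sigma^2)."""
    s2 = sigma * sigma
    return np.array([
        mu,
        mu**2 + s2,
        mu**3 + 3 * mu * s2,
        mu**4 + 6 * mu**2 * s2 + 3 * s2**2,
    ])

# Empirical moments of the target: the only information used.
emp = np.array([np.mean(target**k) for k in range(1, 5)])

# Brute-force search over means and mixing weight (unit variances assumed);
# the paper replaces this by a hierarchy of semidefinite relaxations.
best, best_err = None, np.inf
grid = np.linspace(-4.0, 4.0, 33)
for mu1, mu2 in product(grid, grid):
    if mu1 >= mu2:
        continue
    for w in np.linspace(0.1, 0.9, 9):
        mix = (w * gaussian_raw_moments(mu1, 1.0)
               + (1 - w) * gaussian_raw_moments(mu2, 1.0))
        err = float(np.sum((mix - emp) ** 2))
        if err < best_err:
            best, best_err = (mu1, mu2, w), err

print("recovered (mu1, mu2, w):", best)
```

The recovered parameters can then serve exactly as the abstract suggests: as the number of components and initial estimates for a local method such as EM.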