🤖 AI Summary
To address core challenges in federated learning (data heterogeneity, partial client participation, and communication constraints), this paper develops a federated optimization framework grounded in the Majorization-Minimization (MM) paradigm. It first introduces Stochastic Approximation Stochastic Surrogate MM (SSMM), a unifying stochastic algorithm over linearly parameterized families of majorizing surrogate functions that recovers (proximal) gradient-based methods, the Expectation Maximization algorithm, and variational surrogate MM as special cases. SSMM is then extended to the federated setting as QSMM, whose key idea is to construct, update, and aggregate the surrogate functions across clients, rather than aggregating model parameters as classical federated algorithms do. To showcase the flexibility of this methodology beyond the theoretical setting, the authors apply it to estimating optimal transport maps in the federated setting.
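The "aggregate the surrogate, not the parameter" idea can be illustrated with a minimal sketch (illustrative only, not the paper's QSMM). Here each hypothetical client holds a local least-squares loss, whose majorizing surrogate is a quadratic linearly parameterized by the pair `(A_k, b_k)`; clients send these surrogate statistics, and the server averages them and minimizes the pooled surrogate:

```python
import numpy as np

# Hedged sketch, assuming quadratic surrogates for local least-squares losses
# f_k(theta) = ||X_k theta - y_k||^2 / 2. Each local surrogate is fully
# described by the statistics (A_k, b_k) = (X_k^T X_k, X_k^T y_k), so clients
# communicate those instead of a parameter vector.

rng = np.random.default_rng(0)
d, n_clients = 3, 5
theta_true = rng.normal(size=d)

stats = []
for _ in range(n_clients):
    X = rng.normal(size=(20, d))       # heterogeneous local data
    y = X @ theta_true                 # noiseless labels for clarity
    stats.append((X.T @ X, X.T @ y))   # surrogate parameters, not theta

# Server: average the surrogate statistics, then minimize the pooled
# quadratic surrogate in closed form.
A = sum(a for a, _ in stats) / n_clients
b = sum(bb for _, bb in stats) / n_clients
theta = np.linalg.solve(A, b)

print(np.allclose(theta, theta_true))  # True: recovers the global minimizer
```

Because the surrogates are linear in their statistics, averaging them yields a valid surrogate of the pooled objective; this is the property the federated extension exploits.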
📝 Abstract
This paper proposes a unified approach for designing stochastic optimization algorithms that robustly scale to the federated learning setting. Our work studies a class of Majorization-Minimization (MM) problems that admit a linearly parameterized family of majorizing surrogate functions. This framework encompasses (proximal) gradient-based algorithms for (regularized) smooth objectives, the Expectation Maximization algorithm, and many problems that can be cast as variational surrogate MM. We show that our framework motivates a unifying algorithm called Stochastic Approximation Stochastic Surrogate MM (SSMM), which includes previous stochastic MM procedures as special instances. We then extend SSMM to the federated setting, while taking into consideration common bottlenecks such as data heterogeneity, partial participation, and communication constraints; this yields QSMM. The originality of QSMM is to learn locally and then aggregate information characterizing the *surrogate majorizing function*, contrary to classical algorithms which learn and aggregate the *original parameter*. Finally, to showcase the flexibility of this methodology beyond our theoretical setting, we use it to design an algorithm for computing optimal transport maps in the federated setting.
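The claim that gradient-based algorithms fit the MM template can be seen in a small example (a generic sketch, not the paper's SSMM): for an L-smooth function, the quadratic upper bound g(x | x_t) = f(x_t) + f'(x_t)(x - x_t) + (L/2)(x - x_t)^2 majorizes f, and minimizing it at each step is exactly a gradient step.

```python
import numpy as np

# MM on f(x) = log(1 + x^2), whose derivative is 2-Lipschitz, so the
# quadratic with curvature L = 2 is a valid majorizer. Minimizing the
# majorizer at x_t gives x_{t+1} = x_t - f'(x_t) / L, i.e. gradient descent.

def grad_f(x):
    return 2 * x / (1 + x ** 2)

L = 2.0   # Lipschitz constant of grad_f (|f''| <= 2)
x = 3.0   # arbitrary starting point
for _ in range(200):
    x = x - grad_f(x) / L  # argmin of the quadratic majorizer at x

print(abs(x) < 1e-6)  # True: iterates converge to the minimizer x* = 0
```

Each MM step decreases f by construction, since f(x_{t+1}) <= g(x_{t+1} | x_t) <= g(x_t | x_t) = f(x_t); the surrogate viewpoint is what the stochastic and federated variants generalize.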