A Federated Generalized Expectation-Maximization Algorithm for Mixture Models with an Unknown Number of Components

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses federated clustering in settings where the number of global clusters is unknown, local clustering structures are heterogeneous across clients, and clusters may overlap. To tackle these issues, the authors propose FedGEM, a federated clustering algorithm built on the Generalized Expectation-Maximization (GEM) framework. In FedGEM, each client performs local EM steps and constructs a component-wise uncertainty set, which the server leverages to infer the global number of clusters and to identify cross-client cluster overlaps via a closed-form solution. As the first federated clustering method to jointly handle unknown global cluster counts and heterogeneous overlapping structures, FedGEM achieves low-complexity local computation and efficient aggregation. Theoretical analysis establishes its probabilistic convergence, and experiments demonstrate that its performance closely matches that of centralized EM while significantly outperforming existing federated clustering approaches.

📝 Abstract
We study the problem of federated clustering when the total number of clusters $K$ across clients is unknown, and the clients have heterogeneous but potentially overlapping cluster sets in their local data. To that end, we develop FedGEM: a federated generalized expectation-maximization algorithm for the training of mixture models with an unknown number of components. Our proposed algorithm relies on each of the clients performing EM steps locally, and constructing an uncertainty set around the maximizer associated with each local component. The central server utilizes the uncertainty sets to learn potential cluster overlaps between clients, and infer the global number of clusters via closed-form computations. We perform a thorough theoretical study of our algorithm, presenting probabilistic convergence guarantees under common assumptions. Subsequently, we study the specific setting of isotropic GMMs, providing tractable, low-complexity computations to be performed by each client during each iteration of the algorithm, as well as rigorously verifying assumptions required for algorithm convergence. We perform various numerical experiments, where we empirically demonstrate that our proposed method achieves comparable performance to centralized EM, and that it outperforms various existing federated clustering methods.
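The server-side aggregation described above can be illustrated with a toy sketch. Here each client component is reduced to a ball (estimated mean, uncertainty radius), and the server merges balls that intersect; the number of resulting groups is the inferred global cluster count. This is an illustrative assumption, not the paper's method: FedGEM's closed-form overlap test over uncertainty sets is more refined than a pairwise ball-intersection check, and the function and variable names below are hypothetical.

```python
import numpy as np

def infer_global_clusters(means, radii):
    """Toy server-side aggregation: merge client components whose
    uncertainty balls intersect, then count the merged groups.
    Uses union-find with path halving over pairwise intersection tests.
    """
    n = len(means)
    parent = list(range(n))

    def find(i):
        # Path-halving union-find lookup.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Two balls overlap iff the distance between their centers
            # is at most the sum of their radii.
            if np.linalg.norm(means[i] - means[j]) <= radii[i] + radii[j]:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    return len({find(i) for i in range(n)})

# Two clients, each reporting two local component means with radii.
# Client 1's first component overlaps client 2's first component,
# so four local components collapse to three global clusters.
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0]),   # client 1
         np.array([0.2, -0.1]), np.array([9.0, 9.0])]  # client 2
radii = [0.5, 0.5, 0.5, 0.5]
print(infer_global_clusters(means, radii))  # 3
```

In the isotropic GMM setting studied in the paper, such radii would come from the per-component uncertainty sets each client constructs around its local maximizers; the sketch only conveys why overlap detection yields the global number of clusters.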
Problem

Research questions and friction points this paper is trying to address.

federated clustering
unknown number of components
mixture models
heterogeneous data
cluster overlap
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Clustering
Expectation-Maximization
Mixture Models
Uncertainty Sets
Unknown Number of Components