🤖 AI Summary
This paper addresses the highly heterogeneous independent component analysis (ICA) problem in distributed and federated learning settings, proposing the first one-shot robust aggregation framework. It tackles key challenges including extreme client-side data sparsity (up to half of clients possessing only a few observations), permutation ambiguity in estimated separation matrices, and severe statistical heterogeneity. Methodologically, it introduces a novel two-stage aggregation mechanism that jointly leverages k-means clustering and the geometric median: clustering first resolves permutation inconsistencies across clients’ separation matrices, followed by geometric-median-based robust aggregation. Theoretically, it establishes the first joint characterization of the geometric median’s estimation error bound and k-means’ maximum misclustering rate, rigorously guaranteeing convergence under extreme heterogeneity. Experiments demonstrate that the framework consistently outperforms both vanilla averaging and existing robust aggregation methods across diverse heterogeneous configurations.
📝 Abstract
This paper investigates a general, robust one-shot aggregation framework for the distributed and federated Independent Component Analysis (ICA) problem. We propose a geometric median-based aggregation algorithm that leverages $k$-means clustering to resolve the permutation ambiguity in local client estimates. Our method first applies $k$-means to partition the client-provided estimators into clusters and then aggregates the estimators within each cluster using the geometric median. This approach provably remains effective even in highly heterogeneous scenarios where up to half of the clients observe only a minimal number of samples. The key theoretical contribution lies in the combined analysis of the geometric median's error bound, aided by sample quantiles, and the maximum misclustering rate of this $k$-means solution. The effectiveness of the proposed approach is further supported by simulation studies conducted under various heterogeneous settings.
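The two-stage idea described above (cluster the rows of the clients' estimates, then take a geometric median within each cluster) can be illustrated with a minimal sketch. This is not the paper's algorithm. It makes several assumptions: each client sends a $k \times k$ matrix whose rows are the true rows in an unknown order, sign and scale are already fixed, a plain Lloyd-style $k$-means is initialised from the rows of one reference client assumed to be reliable, and the geometric median is computed with Weiszfeld iterations. The names `geometric_median`, `kmeans`, `aggregate`, `noisy_client`, and the toy matrix `W_true` are hypothetical.

```python
import math
import random


def geometric_median(points, iters=200, eps=1e-9):
    """Weiszfeld iterations for argmin_y sum_i ||x_i - y||_2."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    y = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iters):
        # Inverse-distance weights; the clamp avoids division by zero.
        weights = [1.0 / max(math.dist(p, y), eps) for p in points]
        total = sum(weights)
        new_y = [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(dim)]
        if math.dist(new_y, y) < eps:
            return new_y
        y = new_y
    return y


def kmeans(rows, init_centers, iters=50):
    """Plain Lloyd iterations; returns one cluster label per row."""
    centers = [list(c) for c in init_centers]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(len(centers)),
                      key=lambda j: math.dist(r, centers[j]))
                  for r in rows]
        for j in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def aggregate(client_mats, ref=0):
    """Stage 1: cluster all rows. Stage 2: geometric median per cluster."""
    k = len(client_mats[0])
    rows = [row for W in client_mats for row in W]
    # Assumption: client `ref` is reliable enough to seed the centers.
    labels = kmeans(rows, client_mats[ref])
    return [geometric_median([r for r, lab in zip(rows, labels) if lab == j])
            for j in range(k)]


# Toy data: 8 accurate clients and 4 noisy ones, each with shuffled rows.
random.seed(0)
W_true = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]]


def noisy_client(scale):
    rows = [[v + random.uniform(-scale, scale) for v in row] for row in W_true]
    random.shuffle(rows)  # simulates the permutation ambiguity
    return rows


clients = ([noisy_client(0.02) for _ in range(8)]
           + [noisy_client(1.0) for _ in range(4)])
W_hat = aggregate(clients)

# Largest distance from a true row to its closest aggregated row.
worst = max(min(math.dist(t, w) for w in W_hat) for t in W_true)
print(f"worst row error: {worst:.3f}")
```

The rows of `W_hat` come back in the reference client's row order, so they match `W_true` only up to a permutation. That is why the check above compares each true row with its nearest aggregated row.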