Robust Clustered Federated Learning for Heterogeneous High-dimensional Data

📅 2025-10-12
🤖 AI Summary
This paper addresses two coupled challenges in federated learning: the coexistence of subgroup structure with intra-group heterogeneity, and high-dimensional heavy-tailed data. Methodologically, the authors propose an adaptive clustered federated learning framework that jointly performs subgroup partitioning and sparse parameter estimation, integrating the Huber loss with iterative hard thresholding (IHT) compression inside a grouped federated architecture so as to capture between-group discrepancies while enabling within-group knowledge sharing. Theoretically, they establish the first non-asymptotic error bound and provide recovery guarantees for the underlying clustering structure. Empirically, the method improves convergence speed, parameter estimation accuracy, and clustering fidelity on both synthetic and real-world datasets, while remaining robust to heavy-tailed noise and computationally efficient.

📝 Abstract
Federated learning has attracted significant attention as a privacy-preserving framework for training personalised models on multi-source heterogeneous data. However, most existing approaches are unable to handle scenarios where subgroup structures coexist alongside within-group heterogeneity. In this paper, we propose a federated learning algorithm that addresses general heterogeneity through adaptive clustering. Specifically, our method partitions tasks into subgroups to address substantial between-group differences while enabling efficient information sharing among similar tasks within each group. Furthermore, we integrate the Huber loss and Iterative Hard Thresholding (IHT) to tackle the challenges of high dimensionality and heavy-tailed distributions. Theoretically, we establish convergence guarantees, derive non-asymptotic error bounds, and provide recovery guarantees for the latent cluster structure. Extensive simulation studies and real-data applications further demonstrate the effectiveness and adaptability of our approach.
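As a rough illustration of the adaptive-clustering idea in the abstract, the sketch below implements one IFCA-style communication round for a linear model with squared loss: each client joins the cluster whose model best fits its local data, takes a local gradient step, and the server averages the updates within each cluster. All names and hyperparameters here are illustrative assumptions; the authors' actual update rule additionally uses the Huber loss and IHT.

```python
import numpy as np

def cluster_assign(client_X, client_y, centers):
    # Assign the client to the cluster model with the smallest local
    # empirical loss (squared error, for simplicity).
    losses = [np.mean((client_y - client_X @ c) ** 2) for c in centers]
    return int(np.argmin(losses))

def clustered_fed_round(clients, centers, lr=0.1):
    # One communication round: every client picks its best-fitting
    # cluster model, takes a local gradient step on it, and the server
    # averages the resulting updates within each cluster.
    K = len(centers)
    sums = [np.zeros_like(c) for c in centers]
    counts = [0] * K
    for X, y in clients:
        k = cluster_assign(X, y, centers)
        grad = -X.T @ (y - X @ centers[k]) / len(y)
        sums[k] += centers[k] - lr * grad
        counts[k] += 1
    # Clusters with no members keep their previous model.
    return [sums[k] / counts[k] if counts[k] else centers[k]
            for k in range(K)]
```

Repeating such rounds lets clients with similar underlying parameters gravitate to the same cluster model, which is the "information sharing among similar tasks" the abstract describes.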
Problem

Research questions and friction points this paper is trying to address.

Subgroup structures coexist with within-group heterogeneity
Data are high-dimensional and heavy-tailed
Similar tasks need efficient information sharing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive clustering partitions tasks into subgroups
Huber loss handles heavy-tailed noise; IHT enforces sparsity in high dimensions
Within-group aggregation enables information sharing among similar tasks
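The two robustness ingredients above can be sketched for a sparse linear model: the Huber gradient caps the influence of heavy-tailed residuals, and a hard-thresholding step after each gradient update keeps only the s largest coefficients. This is a minimal single-machine sketch, not the paper's full federated algorithm; delta, lr, and n_iter are illustrative choices.

```python
import numpy as np

def huber_grad(residual, delta=1.345):
    # Derivative of the Huber loss w.r.t. the residual: linear inside
    # [-delta, delta], constant outside, so extreme residuals from
    # heavy-tailed noise exert only bounded influence.
    return np.clip(residual, -delta, delta)

def hard_threshold(beta, s):
    # IHT compression: keep only the s largest-magnitude coordinates.
    out = np.zeros_like(beta)
    idx = np.argsort(np.abs(beta))[-s:]
    out[idx] = beta[idx]
    return out

def huber_iht(X, y, s, delta=1.345, lr=0.1, n_iter=200):
    # Gradient descent on the Huber loss, hard-thresholded after every
    # step, yielding an s-sparse robust estimate.
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ beta
        grad = -X.T @ huber_grad(r, delta) / n
        beta = hard_threshold(beta - lr * grad, s)
    return beta
```

Combining the clipped gradient with hard thresholding is what lets the estimator tolerate heavy-tailed errors while still exploiting sparsity in high dimensions.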
Changxin Yang, Department of Statistics and Data Science, Fudan University
Zhongyi Zhu, Department of Statistics and Data Science, Fudan University
Heng Lian, Department of Mathematics, City University of Hong Kong
Learning theory, high-dimensional statistics, functional data analysis, non-/semi-parametric statistics