AI Summary
This work addresses the challenge of uncovering multi-granular, and even nested, hierarchical cluster structures in unlabeled, non-IID client data in federated learning. The authors propose an efficient hierarchical federated clustering framework that needs only a single round of communication. Each client locally partitions its data into fine-grained “clusterlets” and uploads only their prototypes to the server. The server then fuses these multi-granular clusterlets to reconstruct a global hierarchical cluster structure. The key innovation is achieving hierarchical clustering with just one round of communication, which preserves privacy, keeps computation efficient, and still captures complex cluster patterns. Extensive experiments on ten public datasets show that the proposed method significantly outperforms state-of-the-art approaches and reveals intricate cross-client cluster distributions.
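To make the one-way, prototype-level upload concrete, here is a minimal sketch of the client side. It assumes plain k-means with a deliberately large `k` as a stand-in for the fine partition mechanism, which the summary does not specify. It also assumes that clusterlet sizes are uploaded alongside the prototypes. The function name `make_clusterlets` and its parameters `k`, `n_iter`, and `seed` are hypothetical. Only the prototype means and sizes leave the client; raw samples never do.

```python
import numpy as np

def make_clusterlets(X, k, n_iter=50, seed=0):
    """Hypothetical client-side fine partition: over-cluster X into at most k
    clusterlets with plain k-means (a stand-in, not the paper's mechanism) and
    return only the clusterlet prototypes (means) and sizes."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X))
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every sample to its nearest current prototype.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a clusterlet empties out
                new_centers[j] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment; prototypes are the exact means of the non-empty clusterlets.
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    sizes = np.bincount(labels, minlength=k)
    kept = np.flatnonzero(sizes)
    protos = np.array([X[labels == j].mean(axis=0) for j in kept])
    return protos, sizes[kept]

# Toy client holding two well-separated groups of 2-D points.
rng = np.random.default_rng(1)
X_client = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
protos, sizes = make_clusterlets(X_client, k=6)
upload = (protos, sizes)  # the only payload sent to the server (assumed format)
```

Because the prototypes are exact clusterlet means, their size-weighted average equals the client's data mean. This is one way to see that the upload summarizes the local distribution without exposing individual samples.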
Abstract
Driven by the growth of Web-scale decentralized services, Federated Clustering (FC) aims to extract knowledge from heterogeneous clients in an unsupervised manner while preserving the clients' privacy. FC has emerged as a significant challenge due to the lack of label guidance and the Non-Independent and Identically Distributed (non-IID) nature of clients. In real scenarios such as personalized recommendation and cross-device user profiling, a global cluster may be fragmented and distributed among different clients, and clusters may exist at different granularities or even be nested. Although Hierarchical Clustering (HC) is considered promising for exploring such distributions, its sophisticated recursive clustering process makes it more computationally expensive and more vulnerable to privacy exposure, so it remains relatively unexplored in the federated learning scenario. This paper introduces an efficient one-shot hierarchical FC framework that performs client-end distribution exploration and server-end distribution aggregation through one-way, prototype-level communication from clients to the server. A fine partition mechanism is developed to generate successive clusterlets that describe the complex landscape of the clients' clusters. Then, a multi-granular learning mechanism on the server is proposed to fuse the clusterlets, even when clusterlets generated by different clients have inconsistent granularities. As a result, the complex cluster distributions across clients can be explored efficiently, and extensive experiments comparing the proposed method against state-of-the-art methods on ten public datasets demonstrate its superiority.
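The server-end aggregation can be sketched as follows under some stated assumptions. First, it assumes each client uploads (prototype, size) pairs. Second, it swaps in size-weighted Ward agglomerative clustering as a stand-in for the paper's multi-granular learning mechanism, which the abstract does not detail. The names `aggregate_prototypes` and `cut` are hypothetical. Clients may contribute different numbers of clusterlets, that is, inconsistent granularities. Because all pooled prototypes are merged bottom-up into one hierarchy, any coarser cut contains the finer ones, which is one simple way to expose nested clusters.

```python
import numpy as np

def aggregate_prototypes(client_uploads):
    """Pool (prototypes, sizes) from all clients and merge them bottom-up with
    size-weighted Ward linkage (a stand-in, not the paper's mechanism).
    Returns the merge history [(a, b, cost)]; the node created by the i-th
    merge gets id n_leaves + i."""
    protos = np.vstack([p for p, _ in client_uploads]).astype(float)
    sizes = np.concatenate([s for _, s in client_uploads]).astype(float)
    nodes = {i: (protos[i], sizes[i]) for i in range(len(protos))}  # id -> (centroid, size)
    merges, next_id = [], len(protos)
    while len(nodes) > 1:
        ids = list(nodes)
        best = None
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                (ca, na), (cb, nb) = nodes[a], nodes[b]
                # Ward cost: increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * float(((ca - cb) ** 2).sum())
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        (ca, na), (cb, nb) = nodes.pop(a), nodes.pop(b)
        nodes[next_id] = ((na * ca + nb * cb) / (na + nb), na + nb)  # weighted centroid
        merges.append((a, b, cost))
        next_id += 1
    return merges

def cut(n_leaves, merges, n_clusters):
    """Replay the first n_leaves - n_clusters merges and label each pooled prototype."""
    groups = {i: [i] for i in range(n_leaves)}
    next_id = n_leaves
    for a, b, _ in merges[: n_leaves - n_clusters]:
        groups[next_id] = groups.pop(a) + groups.pop(b)
        next_id += 1
    labels = np.empty(n_leaves, dtype=int)
    for lab, members in enumerate(groups.values()):
        labels[members] = lab
    return labels

# Two clients at different granularities (toy prototypes and sizes, not real data).
client_1 = (np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0]]), np.array([10, 10, 20]))
client_2 = (np.array([[10.0, 1.0], [0.0, 0.5]]), np.array([15, 30]))
merges = aggregate_prototypes([client_1, client_2])
coarse = cut(5, merges, n_clusters=2)
fine = cut(5, merges, n_clusters=3)
```

Cutting the same hierarchy at 2 and 3 clusters yields nested partitions of the pooled prototypes. The naive pairwise search is cubic in the number of prototypes. That cost is tolerable here only because each client uploads a small number of clusterlets rather than raw samples.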