🤖 AI Summary
This work addresses the poor performance and low efficiency of single-round federated learning on non-IID image data, such as medical images, by proposing a novel approach that integrates feature-aware hierarchical token sequence generation with knowledge distillation. Each client leverages a pretrained vision encoder to extract multi-scale semantic features and employs an autoregressive Transformer to generate synthetic token sequences, which are then uploaded to the server. The server aggregates these tokens and constructs a global model by combining local classifiers with knowledge distillation. This method is the first to incorporate hierarchical token sequences and knowledge distillation into single-round federated learning, substantially reducing reliance on precise data distribution modeling. Experiments demonstrate an average accuracy improvement of 9.58% over existing methods on both medical and natural image datasets.
📝 Abstract
One-shot federated learning (OSFL) reduces the communication cost and privacy risks of iterative federated learning by constructing a global model in a single round of communication. However, most existing methods struggle to achieve robust performance on real-world domains such as medical imaging, or are inefficient when handling non-IID (non-independent and identically distributed) data. To address these limitations, we introduce FALCON, a framework that enhances the effectiveness of OSFL over non-IID image data. The core idea of FALCON is to integrate feature-aware hierarchical token sequence generation and knowledge distillation into OSFL. First, each client leverages a pretrained visual encoder with hierarchical scale encoding to compress images into hierarchical token sequences that capture multi-scale semantics. Second, a multi-scale autoregressive Transformer generator models the distribution of these token sequences and generates synthetic sequences. Third, each client uploads its synthetic sequences, along with a local classifier trained on the real token sequences, to the server. Finally, the server incorporates knowledge distillation into global training to reduce reliance on precise distribution modeling. Experiments on medical and natural image datasets validate the effectiveness of FALCON in diverse non-IID scenarios, outperforming the best OSFL baselines by 9.58% in average accuracy.
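To make the first step concrete, the sketch below illustrates what a coarse-to-fine hierarchical token sequence could look like: an image is pooled at progressively finer scales and each pooled value is quantized against a codebook, so coarse global tokens precede fine local ones. This is a minimal illustrative assumption, not the paper's implementation — FALCON uses a pretrained visual encoder and a learned codebook, whereas here the "features" are plain block averages and the codebook is a fixed toy one.

```python
# Illustrative sketch of coarse-to-fine hierarchical tokenization.
# Assumptions (not from the paper): block-average "features", a fixed
# scalar codebook, and scales (1, 2, 4) over a square grayscale image.

def avg_pool(img, size):
    """Downsample a square image (list of lists of floats) to size x size
    by averaging non-overlapping blocks."""
    n = len(img)
    block = n // size
    out = []
    for r in range(size):
        row = []
        for c in range(size):
            vals = [img[r * block + i][c * block + j]
                    for i in range(block) for j in range(block)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def quantize(value, codebook):
    """Return the index of the nearest codebook entry (a toy VQ step)."""
    return min(range(len(codebook)), key=lambda k: abs(codebook[k] - value))

def hierarchical_tokens(img, scales=(1, 2, 4), codebook=(0.0, 0.5, 1.0)):
    """Build one token sequence: tokens from the coarsest scale come first,
    so an autoregressive generator conditions fine detail on global context."""
    seq = []
    for s in scales:                      # coarse-to-fine order
        for row in avg_pool(img, s):
            for v in row:
                seq.append(quantize(v, codebook))
    return seq

# A 4x4 all-bright image yields 1 + 4 + 16 = 21 tokens, all mapping to
# the brightest codebook entry (index 2).
tokens = hierarchical_tokens([[1.0] * 4 for _ in range(4)])
print(len(tokens), tokens[0])
```

In the full method, these per-scale token grids would be produced by the pretrained encoder, and the multi-scale autoregressive Transformer would be trained on such sequences so that sampling it yields the synthetic sequences the clients upload.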