🤖 AI Summary
In federated learning (FL), malicious servers can reconstruct clients’ private data from model updates; however, existing reconstruction attacks rely on strong assumptions about data distribution and fail once batch sizes exceed a few tens of samples. This work proposes a prior-free, distribution-agnostic reconstruction attack. We introduce a novel geometric perspective on fully connected layer weights—interpreting them as hyperplanes—and jointly design malicious parameters, gradient inversion mappings, and classification-feature decoupling to enable high-fidelity reconstruction of arbitrarily large batches (up to thousands of samples). Evaluated on image and tabular datasets, our method achieves significantly higher reconstruction fidelity than state-of-the-art approaches, scaling batch size by over two orders of magnitude. These results expose severe privacy vulnerabilities in FL under realistic, non-ideal server assumptions.
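The gradient-inversion principle the summary alludes to can be illustrated with a well-known observation (not this paper's specific construction): for a fully connected layer, the weight gradient of a single sample is rank one, so the input can be read off directly from the shared gradients. A minimal sketch, using an assumed quadratic loss and a single-sample batch for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)

# A private input x and a fully connected layer y = W @ x + b.
x = rng.normal(size=4)
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
y = W @ x + b

# Assume the loss is L = 0.5 * ||y||^2, so dL/dy = y (illustrative choice).
g_y = y
g_W = np.outer(g_y, x)   # dL/dW = (dL/dy) x^T  -- a rank-one matrix
g_b = g_y                # dL/db = dL/dy

# Any row i with g_b[i] != 0 reveals the private input exactly:
i = int(np.argmax(np.abs(g_b)))
x_rec = g_W[i] / g_b[i]
assert np.allclose(x_rec, x)
```

With batches larger than one, the shared gradient averages these rank-one terms and the inputs blend together; separating them at scale is exactly where prior attacks break down and where the paper's hyperplane-based parameter crafting comes in.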
📝 Abstract
Federated Learning (FL) enables collaborative training of machine learning models across distributed clients without sharing raw data, ostensibly preserving data privacy. Nevertheless, recent studies have revealed critical vulnerabilities in FL, showing that a malicious central server can manipulate model updates to reconstruct clients' private training data. Existing data reconstruction attacks have important limitations: they often rely on assumptions about the clients' data distribution or their efficiency significantly degrades when batch sizes exceed just a few tens of samples. In this work, we introduce a novel data reconstruction attack that overcomes these limitations. Our method leverages a new geometric perspective on fully connected layers to craft malicious model parameters, enabling the perfect recovery of arbitrarily large data batches in classification tasks without any prior knowledge of clients' data. Through extensive experiments on both image and tabular datasets, we demonstrate that our attack outperforms existing methods and achieves perfect reconstruction of data batches two orders of magnitude larger than the state of the art.