🤖 AI Summary
This paper identifies a novel stealthy attack in federated learning (FL), in which a malicious client deliberately injects bias by maximizing fairness losses (e.g., demographic parity, equalized odds) even under homogeneous data distributions, thereby degrading global model fairness while preserving high accuracy and evading existing Byzantine-robust and fairness-aware aggregation defenses.
Method: The authors propose the first fairness-targeted optimization attack framework, embedding adversarial fairness objectives into local client training to degrade fairness metrics (e.g., disparity measures) without compromising accuracy.
Contribution/Results: A single malicious client suffices to increase system-wide bias by up to 90%. Extensive experiments demonstrate strong evasion of state-of-the-art defenses, including FedAvg augmented with fairness-aware aggregation, revealing a critical vulnerability in the fairness guarantees of current FL systems. The work provides both a rigorous benchmark and urgent guidance for developing robust, fairness-preserving FL algorithms.
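To see why a single client can shift the global model, it helps to recall that plain FedAvg simply averages client updates. The toy sketch below is our own illustration, not the paper's setup: the vector size, the "bias direction" framing, and the magnitudes are all assumptions.

```python
import numpy as np

def fedavg(updates, weights=None):
    """Plain FedAvg: (weighted) average of client model-update vectors."""
    updates = np.stack(updates)
    if weights is None:
        weights = np.ones(len(updates)) / len(updates)  # uniform weighting
    return weights @ updates

# Toy illustration: 4 honest clients send zero updates, while one
# malicious client sends a large update along a hypothetical "bias"
# coordinate of the model.
honest = [np.zeros(3) for _ in range(4)]
malicious = np.array([0.0, 0.0, 5.0])
global_update = fedavg(honest + [malicious])  # -> array([0., 0., 1.])
```

Even after averaging with four honest clients, the attacker's contribution survives in the global update, which is the dilution the paper's stealthy attack exploits.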
📝 Abstract
Federated learning (FL) is a privacy-preserving machine learning technique that facilitates collaboration among participants across demographics. FL enables model sharing while restricting the movement of raw data. Because FL gives participants full control over their training data, it is susceptible to poisoning attacks. Collaboration can also propagate bias among participants, even unintentionally, due to differing data distributions or historical bias present in the data. This paper proposes an intentional fairness attack, in which a client maliciously sends a biased model by increasing the fairness loss during training, even under a homogeneous data distribution. The fairness loss is computed by solving an optimization problem over fairness metrics such as demographic parity and equalized odds. The attack is insidious and hard to detect, as it maintains global accuracy even while increasing bias. We evaluate the attack against state-of-the-art Byzantine-robust and fairness-aware aggregation schemes over different datasets and in various settings. The empirical results demonstrate the attack's efficacy, increasing bias by up to 90% even with a single malicious client in the FL system.
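The adversarial objective the abstract describes, rewarding fairness loss while preserving task loss, can be sketched as below. This is a minimal illustration under our own assumptions: the function names, the hard-prediction demographic parity gap, and the combined loss with weight `lam` are not taken from the paper.

```python
import numpy as np

def demographic_parity_gap(y_pred, sensitive):
    """|P(yhat=1 | s=0) - P(yhat=1 | s=1)| over hard predictions."""
    rate0 = y_pred[sensitive == 0].mean()
    rate1 = y_pred[sensitive == 1].mean()
    return abs(rate0 - rate1)

def malicious_client_loss(task_loss, dp_gap, lam=1.0):
    # An honest client would minimize task_loss (plus, perhaps, a
    # fairness penalty). The malicious client instead *rewards*
    # disparity: minimizing this objective drives the demographic
    # parity gap up, while the task_loss term keeps accuracy high,
    # which is what makes the attack hard to detect.
    return task_loss - lam * dp_gap
```

In an actual attack, `dp_gap` would be a differentiable surrogate computed on the client's local batch so that it can be maximized by gradient descent; the hard-prediction version above is only for clarity.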