Unveiling the Security Risks of Federated Learning in the Wild: From Research to Practice

📅 2026-03-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing research on poisoning attacks in federated learning often relies on idealized assumptions that fail to capture real-world deployment risks. This work closes that gap by building TFLlib, a unified evaluation framework supporting vision, text, and tabular tasks that incorporates heterogeneous client compositions, realistic participation patterns, and multidimensional evaluation metrics. By systematically reproducing representative attacks under these more practical settings, the study reveals three critical mismatches between current research and real-world scenarios: overestimated attack effectiveness, insufficient temporal stability, and significant utility degradation on benign tasks. To bridge these gaps, the paper advocates a holistic security evaluation paradigm that jointly considers attack efficacy, temporal stability, and utility loss.
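The joint evaluation idea above can be made concrete with a small sketch. The function below is an illustrative stand-in, not TFLlib's actual API: it takes per-round attack success rates (ASR), per-round benign accuracy, and a clean-run baseline, and reports the three axes the paper advocates. The specific formulas (ASR variance as an instability proxy, final-window accuracy drop as utility loss) are assumptions for illustration.

```python
from statistics import mean, pstdev

def holistic_attack_report(asr_per_round, acc_per_round, clean_baseline_acc):
    """Summarize a poisoning attack along three axes: effectiveness,
    temporal stability, and benign-task utility loss.

    The exact formulas here are illustrative, not the paper's definitions.
    """
    final_asr = asr_per_round[-1]          # peak-style metric used in prior work
    avg_asr = mean(asr_per_round)          # effectiveness averaged over rounds
    # Lower round-to-round spread in ASR -> more temporally stable attack.
    stability = 1.0 - pstdev(asr_per_round)
    # Benign-accuracy drop relative to a clean (unattacked) run,
    # measured over the last five rounds to smooth noise.
    utility_loss = clean_baseline_acc - mean(acc_per_round[-5:])
    return {
        "final_asr": final_asr,
        "avg_asr": avg_asr,
        "stability": round(stability, 4),
        "utility_loss": round(utility_loss, 4),
    }

# Hypothetical numbers: an attack whose ASR fluctuates across rounds and
# that costs a few points of benign accuracy versus a clean baseline.
report = holistic_attack_report(
    asr_per_round=[0.8, 0.6, 0.9, 0.7],
    acc_per_round=[0.70, 0.71, 0.72, 0.70, 0.69],
    clean_baseline_acc=0.75,
)
```

Reading only `final_asr` (0.7) would miss both the instability across rounds and the benign-accuracy cost, which is exactly the measurement gap the summary describes.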

📝 Abstract
Federated learning (FL) has attracted substantial attention in both academia and industry, yet its practical security posture remains poorly understood. In particular, a large body of poisoning research is evaluated under idealized assumptions about attacker participation, client homogeneity, and success metrics, which can substantially distort how security risks are perceived in deployed FL systems. This paper revisits FL security from a measurement perspective. We systematize three major sources of mismatch between research and practice: unrealistic poisoning threat models, the omission of hybrid heterogeneity, and incomplete metrics that overemphasize peak attack success while ignoring stability and utility cost. To study these gaps, we build TFLlib, a uniform evaluation framework that supports image, text, and tabular FL tasks and re-implements representative poisoning attacks under practical settings. Our empirical study shows that idealized evaluation often overstates security risk. Under practical settings, attack performance becomes markedly more dataset-dependent and unstable, and several attacks that appear consistently strong in idealized FL lose effectiveness or incur clear benign-task degradation once practical constraints are enforced. These findings further show that final-round attack success alone is insufficient for security assessment; practical measurement must jointly consider effectiveness, temporal stability, and collateral utility loss. Overall, this work argues that many conclusions in the FL poisoning literature are not directly transferable to real deployments. By tightening the threat model and using measurement protocols aligned with practice, we provide a more realistic view of the security risks faced by contemporary FL systems and distill concrete guidance for future FL security evaluation. Our code is available at https://github.com/xaddwell/TFLlib
Problem

Research questions and friction points this paper is trying to address.

Federated Learning
Security Risks
Poisoning Attacks
Threat Model
Practical Evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning Security
Poisoning Attacks
Practical Evaluation
Heterogeneity
Measurement Study