Can Federated Learning Safeguard Private Data in LLM Training? Vulnerabilities, Attacks, and Defense Evaluation

📅 2025-09-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the practical privacy guarantees of federated learning (FL) for fine-tuning large language models (LLMs) on private client data. We find that, despite avoiding raw-data sharing, the global model can still leak client training samples via parameter updates—a risk that intensifies with model scale. To quantify this threat, we propose an enhanced generative attack that reconstructs training data by tracking multiple rounds of global model updates. We systematically evaluate defenses including differential privacy, gradient regularization, and secure alignment of LLMs. Experiments demonstrate that standard FL poses substantial privacy risks; however, combining safety-aligned LLMs with constrained model updates significantly mitigates leakage. Our findings challenge the common “federated = secure” assumption and provide empirically grounded, deployable privacy-enhancement strategies for LLM federated fine-tuning.
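The round-tracking idea described above can be illustrated with a simple membership-scoring heuristic: candidate texts whose loss under the global model falls fastest across FL rounds are the most likely to be in some client's training set. This is a hedged, illustrative proxy only — the function `rank_by_loss_trajectory` and its scoring rule are assumptions for exposition, not the paper's actual attack.

```python
import numpy as np

def rank_by_loss_trajectory(losses_per_round):
    """Rank candidate texts by how much the global model's loss on them
    drops over federated training rounds.

    losses_per_round: array of shape (rounds, candidates) holding the
    global model's loss on each candidate after each FL round.
    Returns candidate indices, most suspicious (largest drop) first.
    This scoring rule is an illustrative stand-in for the paper's
    round-tracking attack, not its exact method.
    """
    drop = losses_per_round[0] - losses_per_round[-1]
    return np.argsort(-drop)  # descending by loss drop
```

In practice the attacker would generate candidate reconstructions from the global model itself and use a trajectory score like this to filter them; here the losses are assumed to be given.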

📝 Abstract
Fine-tuning large language models (LLMs) with local data is a widely adopted approach for organizations seeking to adapt LLMs to their specific domains. Given the shared characteristics in data across different organizations, the idea of collaboratively fine-tuning an LLM using data from multiple sources presents an appealing opportunity. However, organizations are often reluctant to share local data, making centralized fine-tuning impractical. Federated learning (FL), a privacy-preserving framework, enables clients to retain local data while sharing only model parameters for collaborative training, offering a potential solution. While fine-tuning LLMs on centralized datasets risks data leakage through next-token prediction, the iterative aggregation process in FL results in a global model that encapsulates generalized knowledge, which some believe protects client privacy. In this paper, however, we present contradictory findings through extensive experiments. We show that attackers can still extract training data from the global model, even using straightforward generation methods, with leakage increasing as the model size grows. Moreover, we introduce an enhanced attack strategy tailored to FL, which tracks global model updates during training to intensify privacy leakage. To mitigate these risks, we evaluate privacy-preserving techniques in FL, including differential privacy, regularization-constrained updates, and adopting LLMs with safety alignment. Our results provide valuable insights and practical guidelines for reducing privacy risks when training LLMs with FL.
Problem

Research questions and friction points this paper is trying to address.

Evaluating privacy vulnerabilities in federated learning for LLM fine-tuning
Developing enhanced attack methods to extract training data from global models
Assessing defense techniques to mitigate privacy risks in FL-based LLM training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated differential privacy for FL protection
Assessed regularization-constrained update strategies
Tested safety-aligned LLMs to reduce leakage
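A minimal sketch of how the first two defenses above can be combined on the server side: each client's update is clipped to an L2 bound (a regularization-style constraint) and Gaussian noise is added to the average (differential privacy), in the spirit of DP-FedAvg. The function name and the `clip_norm`/`noise_mult` defaults are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def dp_federated_aggregate(client_updates, clip_norm=1.0, noise_mult=0.5, rng=None):
    """Clip each client's parameter update to an L2 bound, average the
    clipped updates, and add Gaussian noise — a DP-FedAvg-style sketch.
    clip_norm and noise_mult are illustrative, not the paper's settings.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    # Scale each update so its L2 norm is at most clip_norm.
    clipped = [u * min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12))
               for u in client_updates]
    mean = np.mean(clipped, axis=0)
    # Noise scale is tied to the clip bound, as in DP-SGD-style analyses.
    sigma = noise_mult * clip_norm / len(client_updates)
    return mean + rng.normal(0.0, sigma, size=mean.shape)
```

Clipping bounds any single client's influence on the global model, which is what makes the added noise meaningful for a formal privacy guarantee.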
Wenkai Guo
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China
Xuefeng Liu
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China; Hangzhou Innovation Institute of Beihang University, Zhejiang Key Laboratory of Industrial Big Data and Robot Intelligent Systems, Hangzhou, China; Zhongguancun Laboratory, Beijing, China
Haolin Wang
Ph.D. Student, Georgia Institute of Technology
Infrastructure monitoring, asset management, AI/ML, computer vision
Jianwei Niu
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China; Hangzhou Innovation Institute of Beihang University, Zhejiang Key Laboratory of Industrial Big Data and Robot Intelligent Systems, Hangzhou, China; Zhongguancun Laboratory, Beijing, China
Shaojie Tang
University at Buffalo
Optimization, Machine Learning
Jing Yuan
University of North Texas, Denton, Texas, USA