🤖 AI Summary
This work addresses the security threat posed by malicious participants in federated fine-tuning of large language models, who can degrade the global model's performance by uploading manipulated updates. The paper proposes AugMP, a novel attack strategy that, for the first time, applies graph representation learning to model manipulation in federated fine-tuning. AugMP employs graph neural networks to model feature correlations among benign updates and uses them to generate malicious updates that are both effective and stealthy. It further uses an augmented Lagrangian dual optimization framework to embed adversarial objectives while preserving the statistical characteristics of benign parameter distributions. Experiments across multiple large language models show that AugMP reduces global accuracy by up to 26% and the average accuracy of local LLM agents by up to 22%, while evading mainstream distance- and similarity-based defenses.
📝 Abstract
Federated fine-tuning (FFT) has emerged as a privacy-preserving paradigm for collaboratively adapting large language models (LLMs). Built upon federated learning, FFT enables distributed agents to jointly refine a shared pretrained LLM by aggregating local LLM updates without sharing local raw data. However, FFT-based LLMs remain vulnerable to model manipulation threats, in which adversarial participants upload manipulated LLM updates that corrupt the aggregation process and degrade the performance of the global LLM. In this paper, we propose an Augmented Model maniPulation (AugMP) strategy against FFT-based LLMs. Specifically, we design a novel graph representation learning framework that captures feature correlations among benign LLM updates to guide the generation of malicious updates. To enhance manipulation effectiveness and stealthiness, we develop an iterative manipulation algorithm based on an augmented Lagrangian dual formulation. Through this formulation, malicious updates are optimized to embed adversarial objectives while preserving benign-like parameter characteristics. Experimental results across multiple LLM backbones demonstrate that the AugMP strategy achieves the strongest manipulation performance among all competing baselines, reducing the global LLM accuracy by up to 26% and degrading the average accuracy of local LLM agents by up to 22%. Meanwhile, AugMP maintains high statistical and geometric consistency with benign updates, enabling it to evade conventional distance- and similarity-based defense methods.
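To make the threat model concrete, the following is a minimal toy sketch of one aggregation round with a distance-based filter. It is not the AugMP algorithm: the filter, its threshold `tol`, the two-dimensional updates, and all function names are assumptions chosen for illustration. It only shows why an update that stays within the benign distance range can pass such a filter while a large outlier cannot.

```python
import math

def fedavg(updates):
    """Equal-weight, coordinate-wise mean of client updates."""
    return [sum(col) / len(updates) for col in zip(*updates)]

def l2(a, b):
    """Euclidean distance between two update vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_filter(updates, tol=1.5):
    """Return indices of updates close to the coordinate-wise median.

    Hypothetical defense for illustration: an update is kept when its
    distance to the median update is at most tol times the median distance.
    (Upper median is used when the number of updates is even.)
    """
    center = [sorted(col)[len(col) // 2] for col in zip(*updates)]
    dists = [l2(u, center) for u in updates]
    cutoff = tol * sorted(dists)[len(dists) // 2]
    return [i for i, d in enumerate(dists) if d <= cutoff]

def filtered_round(updates):
    """Filter the updates, then average the ones that were kept."""
    kept = distance_filter(updates)
    return kept, fedavg([updates[i] for i in kept])

# Assumed toy data: four benign clients scattered around (1.0, 1.0).
benign = [[1.0, 1.2], [1.2, 1.0], [0.8, 1.0], [1.0, 0.8]]
naive = [-5.0, -5.0]     # large, obvious manipulation
stealthy = [0.85, 0.85]  # small shift that stays within the benign spread

kept_naive, agg_naive = filtered_round(benign + [naive])
kept_stealthy, agg_stealthy = filtered_round(benign + [stealthy])

print("naive attacker kept indices:", kept_naive, "aggregate:", agg_naive)
print("stealthy attacker kept indices:", kept_stealthy, "aggregate:", agg_stealthy)
```

In this toy run the large outlier is dropped, while the small shift is accepted and pulls the aggregate from about (1.0, 1.0) to about (0.97, 0.97). The abstract describes AugMP as optimizing malicious updates under benign-like constraints; the hand-picked vector here merely stands in for the output of such an optimization.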