🤖 AI Summary
To address the vulnerability of sensitive data to membership inference attacks (MIAs) during large language model (LLM) fine-tuning, this paper proposes an influence-driven selective data obfuscation defense. First, it systematically uncovers the intrinsic mechanism by which MIAs exploit loss reduction patterns during fine-tuning to infer membership. Second, it models sample importance via gradient-based influence estimation and selectively injects controllable noise only into high-influence training samples—thereby enabling explicit privacy–utility trade-offs. Extensive experiments across six domains and multiple LLM scales demonstrate that the method reduces MIA success rates by 38.7% on average while degrading fine-tuning accuracy by less than 1.2%. It significantly outperforms existing defenses, achieving both strong privacy protection and high practical utility.
📝 Abstract
Large language models (LLMs) have achieved remarkable success and are widely adopted for diverse applications. However, fine-tuning these models often involves private or sensitive information, raising critical privacy concerns. In this work, we conduct the first comprehensive study evaluating the vulnerability of fine-tuned LLMs to membership inference attacks (MIAs). Our empirical analysis demonstrates that MIAs exploit the loss reduction during fine-tuning, making them highly effective in revealing membership information. These findings motivate the development of our defense. We propose SOFT ( extbf{S}elective data extbf{O}bfuscation in LLM extbf{F}ine- extbf{T}uning), a novel defense technique that mitigates privacy leakage by leveraging influential data selection with an adjustable parameter to balance utility preservation and privacy protection. Our extensive experiments span six diverse domains and multiple LLM architectures and scales. Results show that SOFT effectively reduces privacy risks while maintaining competitive model performance, offering a practical and scalable solution to safeguard sensitive information in fine-tuned LLMs.