🤖 AI Summary
To address the privacy and security risks posed by sensitive information that persists in fine-tuned large language models (LLMs), this paper proposes LLMEraser, an efficient instance-level unlearning framework. LLMEraser uses influence functions to localize sample-level influences in a class-aware manner, and combines LoRA adapter perturbation with reverse calibration to achieve fine-grained, task-adaptive parameter erasure without full retraining. It introduces the first unified, parameter-efficient unlearning paradigm compatible with mainstream PEFT architectures, supporting diverse unlearning tasks while preserving model utility. Empirical evaluation across multiple benchmark datasets shows that LLMEraser achieves an unlearning success rate above 92% with less than 1.5% degradation in downstream task accuracy, significantly outperforming existing baselines.
📝 Abstract
The advent of Large Language Models (LLMs) has revolutionized natural language processing, enabling advanced understanding and reasoning capabilities across a variety of tasks. Fine-tuning these models for specific domains, particularly through Parameter-Efficient Fine-Tuning (PEFT) strategies like LoRA, has become a prevalent practice due to its efficiency. However, this practice raises significant privacy and security concerns, as fine-tuned models may inadvertently retain and disseminate sensitive or undesirable information. To address these issues, we introduce a novel instance-wise unlearning framework, LLMEraser, which systematically categorizes unlearning tasks and applies precise parameter adjustments using influence functions. Unlike traditional unlearning techniques, which are often limited in scope and require extensive retraining, LLMEraser is designed to handle a broad spectrum of unlearning tasks without compromising model performance. Extensive experiments on benchmark datasets demonstrate that LLMEraser handles various unlearning scenarios efficiently while maintaining the overall integrity and efficacy of the models.
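The idea of adjusting parameters with influence functions instead of retraining can be illustrated on a toy problem. The sketch below is not LLMEraser's algorithm: it assumes a small ridge-regression model as a stand-in for the LoRA adapter parameters, and the helper names (`fit_ridge`, `influence_unlearn`) are hypothetical. It removes one training sample with the standard first-order influence update, theta + (1/n) H^{-1} grad_k, and compares the result with a model retrained without that sample.

```python
import numpy as np

def fit_ridge(X, y, lam, n_total):
    """Minimize (1/n_total) * sum_i 0.5*(x_i @ theta - y_i)**2 + 0.5*lam*||theta||^2."""
    d = X.shape[1]
    H = X.T @ X / n_total + lam * np.eye(d)   # Hessian of the objective
    return np.linalg.solve(H, X.T @ y / n_total)

def influence_unlearn(theta, X, y, lam, k):
    """Approximate removal of sample k via a first-order influence update.

    Illustrative stand-in only: the Hessian is formed and solved exactly,
    which is feasible here because the toy model has few parameters.
    """
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)          # Hessian on the full training set
    grad_k = (X[k] @ theta - y[k]) * X[k]      # loss gradient of the removed sample
    return theta + np.linalg.solve(H, grad_k) / n

# Synthetic data (assumed for illustration).
rng = np.random.default_rng(0)
n, d, lam, k = 200, 5, 0.1, 7
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

theta_full = fit_ridge(X, y, lam, n)

# Reference: retrain without sample k, keeping the same 1/n loss scaling.
keep = np.arange(n) != k
theta_retrained = fit_ridge(X[keep], y[keep], lam, n)

theta_unlearned = influence_unlearn(theta_full, X, y, lam, k)

err_before = np.linalg.norm(theta_full - theta_retrained)
err_after = np.linalg.norm(theta_unlearned - theta_retrained)
print(f"distance to retrained model before update: {err_before:.2e}")
print(f"distance to retrained model after update:  {err_after:.2e}")
```

In this quadratic setting the update leaves only a small residual gap to the retrained model, because the Hessian itself changes slightly once the sample is removed. For an actual LLM adapter the Hessian is far too large to form and solve directly, so any practical method has to approximate the inverse-Hessian-vector product; how LLMEraser does so is not described in this section.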