Representation-Guided Parameter-Efficient LLM Unlearning

📅 2026-04-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models are prone to memorizing sensitive or harmful information. Existing parameter-efficient unlearning methods struggle to erase target knowledge without degrading performance on unrelated tasks, primarily because their parameter importance metrics cannot reliably disentangle the parameters tied to the forget set from those tied to the retain set. This work proposes REGLU (Representation-Guided Low-rank Unlearning), which incorporates the geometric structure of representation spaces into parameter-efficient unlearning. REGLU uses a representation-guided LoRA initialization to identify a subspace suited to selective forgetting, and adds a regularization loss that constrains the outputs of the LoRA update to lie in the orthogonal complement of the retain set's representation subspace, limiting interference between forgetting and retention. On the TOFU and WMDP benchmarks, across multiple models, REGLU is reported to consistently outperform state-of-the-art baselines, achieving better unlearning quality while maintaining higher model utility.
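The summary does not spell out how the forgetting subspace is found, so the following is only an illustrative sketch of one way a representation-guided direction could be chosen, not the paper's procedure. It assumes (hypothetically) that an orthonormal basis of the retain-set representations is already available, removes that component from each forget-set representation, and takes the dominant remaining direction by power iteration. All names (`project_out`, `top_direction`, `retain_basis`, `forget_reps`) and the toy vectors are made up for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_out(v, basis):
    """Remove from v its components along an orthonormal basis."""
    w = list(v)
    for b in basis:
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return w


def top_direction(rows, iters=200):
    """Dominant direction of sum_i r_i r_i^T, found by power iteration."""
    d = len(rows[0])
    v = [1.0 / d ** 0.5] * d  # fixed start vector (assumed not orthogonal to the answer)
    for _ in range(iters):
        w = [0.0] * d
        for r in rows:
            c = dot(r, v)
            w = [wi + c * ri for wi, ri in zip(w, r)]
        norm = dot(w, w) ** 0.5
        if norm == 0.0:
            break
        v = [wi / norm for wi in w]
    return v


# Toy data (hypothetical): the retain set occupies the first axis only.
retain_basis = [[1.0, 0.0, 0.0]]
forget_reps = [[3.0, 2.0, 0.0], [1.0, 4.0, 0.0], [2.0, 0.0, 1.0]]

# Keep only the part of each forget representation outside the retain subspace.
residuals = [project_out(r, retain_basis) for r in forget_reps]

# Candidate direction for a rank-1 LoRA factor (assumption: rank 1 for brevity).
init_direction = top_direction(residuals)
```

In this toy case the forget representations share the first axis with the retain set, and the chosen direction falls on the second axis, where the forget set has the most energy that the retain set does not use. A higher-rank variant would need several directions (for example via repeated deflation), which this sketch omits.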

📝 Abstract

Large Language Models (LLMs) often memorize sensitive or harmful information, necessitating effective machine unlearning techniques. While existing parameter-efficient unlearning methods have shown promise, they still struggle with the forget-retain trade-off. This can be attributed to their reliance on parameter importance metrics to identify parameters that are important exclusively for the forget set, which is fundamentally limited by the superposition phenomenon. Due to the polysemantic nature of LLM parameters, such an importance metric may struggle to disentangle parameters associated with the forget and retain sets. In this work, we propose Representation-Guided Low-rank Unlearning (REGLU), a novel approach that leverages the geometric properties of representation spaces to achieve robust and precise unlearning. First, we develop a representation-guided initialization for LoRA that identifies the optimal subspace for selective forgetting. Second, we introduce a regularization loss that constrains the outputs of the LoRA update to lie in the orthogonal complement of the retain set's representation subspace, thereby minimizing interference with the model's performance on the retain set. We evaluate REGLU on the TOFU and WMDP benchmarks across multiple models. Our results demonstrate that REGLU consistently outperforms state-of-the-art baselines, achieving superior unlearning quality while maintaining higher model utility.
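To make the second component concrete, here is a minimal sketch of a penalty that is zero exactly when the LoRA update's outputs lie in the orthogonal complement of a retain subspace. The abstract gives no formula, so the specific form used here (mean squared projection of `B(Ax)` onto an orthonormal basis of retain representations) is an assumption, and the function names, shapes, and toy matrices are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def matvec(M, x):
    """Matrix-vector product with M given as a list of rows."""
    return [dot(row, x) for row in M]


def orthogonality_penalty(B, A, inputs, retain_basis):
    """Mean squared projection of the LoRA output B(Ax) onto the retain subspace."""
    total = 0.0
    for x in inputs:
        delta = matvec(B, matvec(A, x))  # low-rank update applied to x
        total += sum(dot(u, delta) ** 2 for u in retain_basis)
    return total / len(inputs)


# Toy setup (hypothetical): retain representations span the first two axes.
retain_basis = orthonormal_basis([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])

A = [[1.0, 0.0]]                   # rank-1 down-projection, shape (1, 2)
B_good = [[0.0], [0.0], [1.0]]     # writes only to the third axis
B_bad = [[1.0], [0.0], [0.0]]      # writes into the retain subspace
inputs = [[2.0, 1.0], [1.0, -1.0]]

penalty_good = orthogonality_penalty(B_good, A, inputs, retain_basis)
penalty_bad = orthogonality_penalty(B_bad, A, inputs, retain_basis)
```

Here `B_good` incurs no penalty because its outputs never touch the retain subspace, while `B_bad` is penalized in proportion to how much it writes there. In training, such a term would presumably be added to the unlearning objective with a weighting coefficient; that weighting is not specified in the abstract.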

Problem

Research questions and friction points this paper is trying to address.

machine unlearning

forget-retain trade-off

parameter-efficient fine-tuning

representation space

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

representation-guided unlearning

parameter-efficient unlearning

LoRA