LLM Unlearning Should Be Form-Independent

📅 2025-06-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM unlearning methods rely heavily on the specific surface forms of training samples, exhibiting poor generalization and failing to handle the diverse linguistic expressions of knowledge in real-world scenarios. This paper formally defines this "Form-Dependent Bias" and proposes Rank-one Concept Redirection (ROCR), a training-free unlearning method that runs in seconds. ROCR identifies hazardous concepts via concept activation analysis and redirects their representations through a rank-one parameter remapping, enabling form-agnostic unlearning. It requires no retraining, auxiliary data, or architectural modification, and is compatible with mainstream LLMs. Evaluated on the newly constructed ORT benchmark, ROCR significantly outperforms prior approaches across three key dimensions: unlearning effectiveness, output fluency, and robustness to lexical and syntactic variations in how the target concept is expressed.
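The rank-one parameter remapping described above can be sketched as a closed-form edit to a single weight matrix: after the update, the layer maps the target concept's key direction to the harmless concept's value vector, while directions orthogonal to the key are left unchanged. This is a minimal illustrative sketch only; the function name, the choice of layer, and the exact update rule are assumptions, not the paper's published ROCR formulation:

```python
import numpy as np

def rank_one_redirect(W, k_target, v_harmless):
    """Illustrative rank-one remap (not the paper's exact formulation).

    After the update, the layer maps the unit key direction of the
    target concept to v_harmless; any input orthogonal to k_target
    is mapped exactly as before.
    """
    k = k_target / np.linalg.norm(k_target)  # unit key direction
    residual = v_harmless - W @ k            # change needed along k
    return W + np.outer(residual, k)         # rank-one update

# Toy usage: a 4x4 "layer", redirect concept key k to harmless value v.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
k = rng.standard_normal(4)
v = rng.standard_normal(4)
W_new = rank_one_redirect(W, k, v)
```

Because the update is rank-one, it touches only the key's direction in input space, which is one plausible reason such edits can be applied in seconds without retraining.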

📝 Abstract
Large Language Model (LLM) unlearning aims to erase or suppress undesirable knowledge within the model, offering promise for controlling harmful or private information to prevent misuse. However, recent studies highlight its limited efficacy in real-world scenarios, hindering practical adoption. In this study, we identify a pervasive issue underlying many downstream failures: the effectiveness of existing unlearning methods heavily depends on the form of training samples and frequently fails to generalize to alternate expressions of the same knowledge. We formally characterize this problem as Form-Dependent Bias and systematically investigate its specific manifestation patterns across various downstream tasks. To quantify its prevalence and support future research, we introduce ORT, a novel benchmark designed to evaluate the robustness of unlearning methods against variations in knowledge expression. Results reveal that Form-Dependent Bias is both widespread and severe among current techniques. We argue that LLM unlearning should be form-independent to address the endless forms of downstream tasks encountered in real-world security-critical scenarios. Towards this goal, we introduce Rank-one Concept Redirection (ROCR), a novel training-free method, as a promising solution path. ROCR performs unlearning by targeting the invariants in downstream tasks, specifically the activated dangerous concepts. It is capable of modifying model parameters within seconds to redirect the model's perception of a specific unlearning target concept to another harmless concept. Extensive experiments demonstrate that ROCR significantly improves unlearning effectiveness compared to traditional methods while generating highly natural outputs.
Problem

Research questions and friction points this paper is trying to address.

LLM unlearning effectiveness depends on training sample form
Form-Dependent Bias limits generalization to alternate knowledge expressions
Current unlearning methods struggle with real-world security-critical scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Rank-one Concept Redirection (ROCR)
Targets invariants in downstream tasks
Modifies model parameters in seconds
Xiaotian Ye
Beijing University of Posts and Telecommunications
Natural Language Processing · Knowledge Representation · Large Language Models
Mengqi Zhang
Shandong University
Shu Wu
New Laboratory of Pattern Recognition (NLPR), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences