REACT: Representation Extraction And Controllable Tuning to Overcome Overfitting in LLM Knowledge Editing

📅 2025-05-25
📈 Citations: 0 (influential: 0)
🤖 AI Summary
To address pervasive overfitting in large language model (LLM) knowledge editing, where factual updates propagate beyond the target scope and activate excessively in irrelevant contexts, this paper proposes a two-stage controllable editing framework. The method first extracts direction vectors grounded in a "belief shift," using Principal Component Analysis with a learnable linear transformation to precisely identify the knowledge-representation subspace to edit. Second, it introduces a context-aware gating mechanism that scales hidden-state perturbations along these directions and applies them only when a pre-trained classifier judges the edit contextually necessary. Evaluated on standard benchmarks, the approach significantly suppresses overfitting on EVOKE while achieving high reliability, strong locality, and robust generalization on COUNTERFACT and MQuAKE. It unifies editing accuracy, necessity, and controllability, advancing both fidelity and interpretability in LLM knowledge modification.
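
A minimal PyTorch sketch of the extraction phase, assuming paired hidden states are collected by running the model on stimuli that state the old fact versus the edited fact; the class name, `d_model`, and the single-component PCA are illustrative assumptions, not the authors' released code:

```python
import torch
import torch.nn as nn

class BeliefShiftExtractor(nn.Module):
    """Sketch of REACT's first phase: PCA over paired hidden states plus a
    simple learnable linear map that refines the principal direction into a
    per-instance "belief shift" vector. Names and shapes are assumptions."""

    def __init__(self, d_model: int):
        super().__init__()
        self.refine = nn.Linear(d_model, d_model, bias=False)  # learnable transform

    def forward(self, h_old: torch.Tensor, h_new: torch.Tensor) -> torch.Tensor:
        # h_old, h_new: (n_stimuli, d_model) hidden states from tailored
        # stimuli expressing the original fact vs. the updated fact.
        deltas = h_new - h_old                               # per-stimulus shifts
        deltas = deltas - deltas.mean(dim=0, keepdim=True)   # center before PCA
        _, _, v = torch.pca_lowrank(deltas, q=1, center=False)
        direction = self.refine(v[:, 0])                     # refine top component
        return direction / direction.norm()                  # unit belief-shift vector
```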

📝 Abstract
Large language model editing methods frequently suffer from overfitting, wherein factual updates can propagate beyond their intended scope, overemphasizing the edited target even when it is contextually inappropriate. To address this challenge, we introduce REACT (Representation Extraction And Controllable Tuning), a unified two-phase framework designed for precise and controllable knowledge editing. In the initial phase, we utilize tailored stimuli to extract latent factual representations and apply Principal Component Analysis with a simple learnable linear transformation to compute a directional "belief shift" vector for each instance. In the second phase, we apply controllable perturbations to hidden states using the obtained vector with a magnitude scalar, gated by a pre-trained classifier that permits edits only when contextually necessary. Experiments on the EVOKE benchmark demonstrate that REACT significantly reduces overfitting across nearly all evaluation metrics, and experiments on COUNTERFACT and MQuAKE show that our method preserves balanced basic editing performance (reliability, locality, and generality) under diverse editing scenarios.
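
Read literally, the second phase amounts to adding the belief-shift vector to the hidden state, scaled by a magnitude scalar and gated by the classifier. A hedged sketch of that step, where `alpha` and `threshold` are assumed hyperparameters rather than values from the paper:

```python
import torch

def gated_edit(hidden: torch.Tensor, direction: torch.Tensor,
               gate_logit: torch.Tensor, alpha: float = 4.0,
               threshold: float = 0.5) -> torch.Tensor:
    """Perturb a hidden state along the belief-shift direction only when the
    pre-trained gate classifier deems the edit contextually necessary.
    `alpha` (magnitude scalar) and `threshold` are illustrative defaults."""
    gate = torch.sigmoid(gate_logit)          # P(edit needed | context)
    if gate.item() < threshold:
        return hidden                         # out-of-scope context: leave unedited
    return hidden + alpha * gate * direction  # scaled, gated perturbation
```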
Problem

Research questions and friction points this paper is trying to address.

Overfitting in large language model knowledge editing methods
Uncontrolled propagation of factual updates beyond intended scope
Balancing precision and controllability in knowledge edits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts latent representations using tailored stimuli
Applies controllable perturbations with belief shift vectors
Gates edits via a pre-trained classifier that checks contextual necessity (see the sketch below)
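
The gate itself is described only as a pre-trained classifier. A plausible stand-in is a small logistic head over the context hidden state, trained to separate in-scope prompts from out-of-scope (locality) prompts; the architecture and training labels below are assumptions, not the paper's specification:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EditGate(nn.Module):
    """Illustrative stand-in for the pre-trained gate classifier: a logistic
    head over the context hidden state predicting whether the edited fact
    should fire in this context."""

    def __init__(self, d_model: int):
        super().__init__()
        self.head = nn.Linear(d_model, 1)

    def forward(self, h_context: torch.Tensor) -> torch.Tensor:
        return self.head(h_context).squeeze(-1)  # logit; sigmoid gives the gate

def train_step(gate: EditGate, h_batch: torch.Tensor, labels: torch.Tensor,
               opt: torch.optim.Optimizer) -> float:
    # labels: 1.0 for in-scope prompts (edit should apply), 0.0 for locality prompts.
    loss = F.binary_cross_entropy_with_logits(gate(h_batch), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```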
Haitian Zhong
Institute of Automation, Chinese Academy of Sciences
Large Language Models · Trustworthy AI · AI for Science
Yuhuan Liu
Cuiying Honors College, Lanzhou University
Ziyang Xu
The Chinese University of Hong Kong
AI for Science · Bioinformatics · Medical Image Processing
Guofan Liu
NLPR, MAIS, Institute of Automation, Chinese Academy of Sciences; Tencent
Qiang Liu
NLPR, MAIS, Institute of Automation, Chinese Academy of Sciences
Shu Wu
NLPR, MAIS, Institute of Automation, Chinese Academy of Sciences
Zhe Zhao
Tencent
Liang Wang
NLPR, MAIS, Institute of Automation, Chinese Academy of Sciences
Tieniu Tan
Institute of Automation, Chinese Academy of Sciences