Large Scale Knowledge Washing

📅 2024-05-26
🏛️ arXiv.org
📈 Citations: 5
✨ Influential: 0
🤖 AI Summary
This work addresses the security and compliance risks arising from large language models' (LLMs) over-memorization of private, sensitive, or copyrighted content, formalizing the challenge as *large-scale knowledge washing*: the controlled forgetting of vast amounts of factual knowledge. To this end, we propose LAW (Large Scale Washing), a method grounded in the hypothesis that knowledge and reasoning can be disentangled in LLMs. LAW erases targeted facts by updating the weights of specific MLP layers in decoder-only models, without requiring downstream task data or full-model fine-tuning. Crucially, it preserves logical reasoning and linguistic fluency (>98% retention) while efficiently removing target facts. Experiments across multiple knowledge-forgetting benchmarks show that LAW outperforms existing methods at forgetting target knowledge while maintaining reasoning ability.

📝 Abstract
Large language models show impressive abilities in memorizing world knowledge, which leads to concerns regarding memorization of private information, toxic or sensitive knowledge, and copyrighted content. We introduce the problem of Large Scale Knowledge Washing, focusing on unlearning an extensive amount of factual knowledge. Previous unlearning methods usually define the reverse loss and update the model via backpropagation, which may affect the model's fluency and reasoning ability or even destroy the model due to extensive training with the reverse loss. Existing works introduce additional data from downstream tasks to prevent the model from losing capabilities, which requires downstream task awareness. Controlling the tradeoff of unlearning and maintaining existing capabilities is also challenging. To this end, we propose LAW (Large Scale Washing) to update the MLP layers in decoder-only large language models to perform knowledge washing, as inspired by model editing methods and based on the hypothesis that knowledge and reasoning are disentanglable. We derive a new objective with the knowledge to be unlearned to update the weights of certain MLP layers. Experimental results demonstrate the effectiveness of LAW in forgetting target knowledge while maintaining reasoning ability. The code will be open-sourced at https://github.com/wangyu-ustc/LargeScaleWashing.
Problem

Research questions and friction points this paper is trying to address.

Unlearning private and sensitive information
Maintaining model fluency and reasoning
Updating MLP layers for knowledge washing
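
The reverse-loss approach that the abstract attributes to earlier unlearning methods can be illustrated on a toy model. The following is a minimal sketch, not code from the paper: it assumes a linear softmax classifier in NumPy, and the function names, learning rate, step count, and forget/retain split are all hypothetical. It runs gradient ascent on the forget-set loss and prints the losses on both splits, so one can check whether the retained data drifts as well.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, X, y):
    """Mean negative log-likelihood of labels y under a linear softmax model."""
    p = softmax(X @ W)
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))

def ce_grad(W, X, y):
    """Gradient of the mean cross-entropy with respect to W."""
    p = softmax(X @ W)
    p[np.arange(len(y)), y] -= 1.0
    return X.T @ p / len(y)

def reverse_loss_unlearn(W, X_forget, y_forget, lr=0.5, steps=50):
    """Gradient ascent on the forget-set loss (descent on the reverse loss)."""
    W = W.copy()
    for _ in range(steps):
        W += lr * ce_grad(W, X_forget, y_forget)
    return W

# Toy data: labels come from a hidden linear model (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
W_true = rng.normal(size=(10, 4))
y = np.argmax(X @ W_true, axis=1)

# Hypothetical split: the first 20 examples play the role of facts to forget.
X_f, y_f = X[:20], y[:20]
X_r, y_r = X[20:], y[20:]

# "Pretrain" on everything with plain gradient descent.
W = np.zeros((10, 4))
for _ in range(300):
    W -= 0.5 * ce_grad(W, X, y)

W_unlearned = reverse_loss_unlearn(W, X_f, y_f)

print("forget loss before/after:",
      cross_entropy(W, X_f, y_f), cross_entropy(W_unlearned, X_f, y_f))
print("retain loss before/after:",
      cross_entropy(W, X_r, y_r), cross_entropy(W_unlearned, X_r, y_r))
```

Nothing in this update constrains the model's behavior on the retained examples, which is the tradeoff the abstract describes as hard to control.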
Innovation

Methods, ideas, or system contributions that make the work stand out.

MLP layer updates
Knowledge and reasoning disentanglement
Targeted knowledge unlearning
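
To make the idea of an MLP layer update concrete, here is a minimal sketch of a closed-form, regularized least-squares edit to a single linear projection, of the general kind used by model-editing methods. It is not the objective derived in the paper, which this summary does not state. Everything here is an assumption for illustration: the function name `wash_linear_layer`, the random keys, and the choice of zero vectors as the "washed" target values.

```python
import numpy as np

def wash_linear_layer(W, K_forget, V_target, K_preserve):
    """Return an updated weight matrix W + delta.

    delta minimizes
        ||delta @ K_preserve||^2 + ||(W + delta) @ K_forget - V_target||^2,
    so outputs for preserved keys move as little as possible while
    outputs for the keys to forget are pulled toward V_target.
    """
    # Normal-equation matrix: covariance of preserved keys plus forget keys.
    A = K_preserve @ K_preserve.T + K_forget @ K_forget.T
    # Residual between the desired and the current outputs on the forget keys.
    R = V_target - W @ K_forget
    # delta = R @ K_forget.T @ inv(A); A is symmetric, so solve and transpose.
    delta = np.linalg.solve(A, K_forget @ R.T).T
    return W + delta

rng = np.random.default_rng(0)
d_in, d_out = 16, 8
W = rng.normal(size=(d_out, d_in))          # stand-in for an MLP projection
K_preserve = rng.normal(size=(d_in, 200))   # keys whose outputs should stay put
K_forget = rng.normal(size=(d_in, 5))       # keys for the facts to wash
V_target = np.zeros((d_out, 5))             # hypothetical "washed" values

W_new = wash_linear_layer(W, K_forget, V_target, K_preserve)

before = np.linalg.norm(W @ K_forget - V_target)
after = np.linalg.norm(W_new @ K_forget - V_target)
drift = np.linalg.norm((W_new - W) @ K_preserve)
print("forget residual before/after:", before, after)
print("change on preserved keys:", drift)
```

Unlike gradient ascent on a reverse loss, this formulation has an explicit term that penalizes changes on the preserved keys, and the relative weighting of the two terms is what sets the tradeoff between forgetting and retention.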