Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning

📅 2024-10-01
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
To support context-sensitive isolation of sensitive knowledge when a large language model (LLM) serves multiple audiences, this paper introduces "in-context knowledge unlearning": the model selectively forgets target knowledge at test time based on the context of the query, so that confidential information can be provided to authorized users and withheld from unauthorized ones. The method fine-tunes pre-trained LLMs to unlearn the target knowledge named in the context while preserving other knowledge. An analysis of internal behavior shows that the fine-tuned models produce the correct prediction in the middle layers, maintain it up to the final layer, and only decide to forget at the last layer; in this sense the models "pretend to forget". On the TOFU and AGE datasets with Llama2-7B/13B and Mistral-7B, the method reaches up to 95% forgetting accuracy while retaining 80% of unrelated knowledge, outperforming baselines in both in-domain and out-of-domain scenarios.
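The layer-wise finding can be pictured with a logit-lens style probe, a common interpretability technique in which each layer's hidden state is projected through the output embedding to see which token that layer would predict. The sketch below is only an illustration under stated assumptions: the listing does not say which probing method the authors used, and the vocabulary, the `forgot` token, the identity unembedding matrix, and all numbers are hypothetical toy values, not measurements from the paper.

```python
import numpy as np

def per_layer_top_token(hidden_states, unembed):
    """Return the index of the highest-scoring token at each layer.

    hidden_states: array of shape (num_layers, hidden_dim), one position only.
    unembed: array of shape (hidden_dim, vocab_size), the output projection.
    """
    logits = hidden_states @ unembed  # shape (num_layers, vocab_size)
    return logits.argmax(axis=-1)

# Hypothetical toy setup: 3-token vocabulary and an identity unembedding.
vocab = ["Paris", "forgot", "the"]
unembed = np.eye(3)

# Toy hidden states, invented for illustration only.
hidden_states = np.array([
    [0.1, 0.0, 0.9],  # early layer: generic token
    [0.8, 0.1, 0.1],  # middle layer: correct answer appears
    [0.7, 0.2, 0.1],  # later layer: correct answer is maintained
    [0.2, 0.9, 0.0],  # last layer: prediction switches to the refusal
])

top_tokens = [vocab[i] for i in per_layer_top_token(hidden_states, unembed)]
print(top_tokens)
```

With a real model, the hidden states and unembedding matrix would come from the fine-tuned LLM rather than from hand-written arrays.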

📝 Abstract
As large language models (LLMs) are applied across diverse domains, the ability to selectively unlearn specific information has become increasingly essential. For instance, LLMs are expected to provide confidential information to authorized internal users, such as employees or trusted partners, while withholding it from external users, including the general public and unauthorized entities. In response to this challenge, we propose a novel method termed "in-context knowledge unlearning", which enables the model to selectively forget information at test time based on the context of the query. Our method fine-tunes pre-trained LLMs to enable prompt unlearning of target knowledge within the context, while preserving other knowledge. Experiments on the TOFU and AGE datasets using Llama2-7B/13B and Mistral-7B models show that our method achieves up to 95% forgetting accuracy while retaining 80% of unrelated knowledge, significantly outperforming baselines in both in-domain and out-of-domain scenarios. Further investigation into the model's internal behavior revealed that while fine-tuned LLMs generate correct predictions in the middle layers and maintain them up to the final layer, they make the decision to forget at the last layer, i.e., "LLMs pretend to forget". Our findings offer valuable insights into enhancing the robustness of unlearning mechanisms in LLMs, setting a foundation for future research in the field.
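To make the idea of context-dependent forgetting concrete, here is a minimal sketch of how fine-tuning pairs for such a method might be built. It is an assumption-laden illustration, not the authors' data pipeline: the prompt template, the `REFUSAL` string, and the names `Example` and `build_examples` are all hypothetical. The point it shows is that the same question receives a different target depending on whether its answer is the knowledge named in the context.

```python
from dataclasses import dataclass

REFUSAL = "forgot"  # hypothetical refusal string, not taken from the paper

@dataclass
class Example:
    prompt: str
    target: str

def build_examples(qa_pairs, forget_answer):
    """Pair each question with a context that names the knowledge to forget.

    If the gold answer is the knowledge to forget, the training target is the
    refusal string; otherwise the target is the original answer, so unrelated
    knowledge is preserved.
    """
    examples = []
    for question, answer in qa_pairs:
        # Hypothetical prompt template carrying the forget instruction in context.
        prompt = f"Forget: {forget_answer}\nQuestion: {question}\nAnswer:"
        target = REFUSAL if answer == forget_answer else answer
        examples.append(Example(prompt, target))
    return examples

qa = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]
examples = build_examples(qa, forget_answer="Paris")
for ex in examples:
    print(repr(ex.prompt), "->", ex.target)
```

A fine-tuning loop would then train the model to produce `target` given `prompt`; that loop is omitted here because the listing does not describe the training setup.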

Problem

Research questions and friction points this paper is trying to address.

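LLMs that serve both authorized and unauthorized users must withhold specific knowledge from some queries while still providing it to others
Forgetting has to be triggered at test time by the query context, without degrading unrelated knowledge
Unlearning mechanisms need to be robust, yet fine-tuned models may only "pretend to forget" at the last layer

Innovation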

Methods, ideas, or system contributions that make the work stand out.

In-context knowledge unlearning for selective forgetting
Fine-tunes LLMs to forget target knowledge contextually
Achieves high forget accuracy while preserving unrelated knowledge
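
The two headline quantities, forgetting accuracy and retention of unrelated knowledge, can be read as simple exact-match rates. The sketch below shows one plausible way to compute them; the paper's exact metric definitions are not given in this listing, so the function names, the `forgot` refusal string, and the toy outputs are assumptions made for illustration.

```python
def forgetting_accuracy(outputs, refusal="forgot"):
    """Fraction of forget-set outputs that equal the refusal string."""
    if not outputs:
        raise ValueError("outputs must not be empty")
    return sum(o.strip() == refusal for o in outputs) / len(outputs)

def retention_rate(outputs, gold):
    """Fraction of retain-set outputs that exactly match the gold answers."""
    if not gold or len(outputs) != len(gold):
        raise ValueError("outputs and gold must be non-empty and equal in length")
    return sum(o.strip() == g for o, g in zip(outputs, gold)) / len(gold)

# Toy model outputs, invented for illustration only.
forget_outputs = ["forgot", "forgot", "Paris", "forgot"]
retain_outputs = ["Tokyo", "Berlin"]
retain_gold = ["Tokyo", "Rome"]

print(forgetting_accuracy(forget_outputs))
print(retention_rate(retain_outputs, retain_gold))
```

Exact string match is the simplest choice; a real evaluation might normalize casing or use a softer match.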
Shota Takashiro
The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-8656 Japan
Takeshi Kojima
The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-8656 Japan
Andrew Gambardella
The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-8656 Japan
Qi Cao
The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-8656 Japan
Yusuke Iwasawa
The University of Tokyo
deep learning, transfer learning, foundation model, meta learning
Yutaka Matsuo
The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-8656 Japan