Learning-Time Encoding Shapes Unlearning in LLMs

๐Ÿ“… 2025-06-18
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This study investigates how knowledge encoding strategies during model training affect post-hoc controllable unlearning of factual knowledge in large language models (LLMs), addressing critical needs such as privacy compliance and correction of outdated or harmful content. We propose and empirically validate *paraphrased learning*โ€”a method that injects target knowledge via semantically equivalent but lexically diverse formulations during trainingโ€”which significantly improves unlearning efficacy. Experiments across multiple LLMs demonstrate a 23โ€“37% absolute gain in unlearning success rate compared to baseline methods; in contrast, coarse-grained text-block injection severely degrades unlearning precision. We establish a rigorously controlled experimental framework integrating standard unlearning benchmarks, knowledge injection protocols, and targeted evaluation metrics. To our knowledge, this is the first work to identify training-time encoding design as a fundamental prerequisite for reliable unlearning, thereby providing a concrete, actionable optimization pathway at the training stage for controllable knowledge deletion.

Technology Category

Application Category

๐Ÿ“ Abstract
As large language models (LLMs) are increasingly deployed in the real world, the ability to ``unlearn'', or remove specific pieces of knowledge post hoc, has become essential for a variety of reasons ranging from privacy regulations to correcting outdated or harmful content. Prior work has proposed unlearning benchmarks and algorithms, and has typically assumed that the training process and the target model are fixed. In this work, we empirically investigate how learning-time choices in knowledge encoding impact the effectiveness of unlearning factual knowledge. Our experiments reveal two key findings: (1) learning with paraphrased descriptions improves unlearning performance and (2) unlearning individual piece of knowledge from a chunk of text is challenging. Our results suggest that learning-time knowledge encoding may play a central role in enabling reliable post-hoc unlearning.
Problem

Research questions and friction points this paper is trying to address.

How learning-time encoding affects unlearning in LLMs
Impact of paraphrased descriptions on unlearning performance
Challenges in unlearning knowledge from text chunks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Paraphrased descriptions enhance unlearning performance
Learning-time encoding impacts unlearning effectiveness
Unlearning from text chunks remains challenging
๐Ÿ”Ž Similar Papers
No similar papers found.