🤖 AI Summary
This study investigates how knowledge encoding strategies during model training affect post-hoc controllable unlearning of factual knowledge in large language models (LLMs), addressing critical needs such as privacy compliance and correction of outdated or harmful content. We propose and empirically validate *paraphrased learning*, a method that injects target knowledge via semantically equivalent but lexically diverse formulations during training, which significantly improves unlearning efficacy. Experiments across multiple LLMs demonstrate a 23–37% absolute gain in unlearning success rate compared to baseline methods; in contrast, coarse-grained text-block injection severely degrades unlearning precision. We establish a rigorously controlled experimental framework integrating standard unlearning benchmarks, knowledge injection protocols, and targeted evaluation metrics. To our knowledge, this is the first work to identify training-time encoding design as a fundamental prerequisite for reliable unlearning, thereby providing a concrete, actionable optimization pathway at the training stage for controllable knowledge deletion.
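At its core, paraphrased learning is a data-augmentation step: each target fact is rendered in several lexically distinct but semantically equivalent forms before fine-tuning. A minimal sketch of this idea is below; the templates, fact, and function name are purely illustrative assumptions, not artifacts from the study itself.

```python
# Sketch of paraphrase-based knowledge injection (illustrative only).
# In practice a paraphrase model would generate the variants; here a few
# hand-written templates stand in for it.

def build_paraphrased_examples(subject: str, obj: str, templates: list[str]) -> list[str]:
    """Render one (subject, object) fact as several lexically diverse training strings."""
    return [t.format(s=subject, o=obj) for t in templates]

# Hypothetical paraphrase templates for a "birthplace" fact.
TEMPLATES = [
    "{s} was born in {o}.",
    "The birthplace of {s} is {o}.",
    "{o} is where {s} was born.",
]

examples = build_paraphrased_examples("Ada Lovelace", "London", TEMPLATES)
for e in examples:
    print(e)
```

Each rendered string would then be added to the fine-tuning corpus as an independent training example, so the fact is encoded across varied surface forms rather than tied to a single phrasing.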
📄 Abstract
As large language models (LLMs) are increasingly deployed in the real world, the ability to "unlearn", or remove specific pieces of knowledge post hoc, has become essential for reasons ranging from privacy regulations to correcting outdated or harmful content. Prior work has proposed unlearning benchmarks and algorithms, but has typically assumed that the training process and the target model are fixed. In this work, we empirically investigate how learning-time choices in knowledge encoding impact the effectiveness of unlearning factual knowledge. Our experiments reveal two key findings: (1) learning with paraphrased descriptions improves unlearning performance, and (2) unlearning an individual piece of knowledge from a chunk of text is challenging. Our results suggest that learning-time knowledge encoding may play a central role in enabling reliable post-hoc unlearning.