AI Summary
This work addresses selective unlearning in large language models (LLMs): removing specific information without access to the original training data while preserving non-target knowledge. The authors propose Geometric Unlearning (GU), which builds a low-rank geometric representation of safe model behavior from a small set of safe reference prompts. GU then uses lightweight anchor-in-context synthetic prompts to trigger localized, projection-based alignment of the model's prompt-time hidden planning states to this safe geometry, suppressing the target content. A teacher-distillation regularizer on synthetic non-target anchors limits collateral drift, and the whole procedure requires no original training data. On the privacy-oriented benchmarks ToFU and UnlearnPII, GU achieves strong target suppression with minimal impact on non-target performance, easing the tension among unlearning strength, non-target preservation, and data availability.
Abstract
As large language models (LLMs) are increasingly deployed in real-world systems, they must support post-hoc removal of specific content to meet privacy and governance requirements. This motivates selective unlearning, which suppresses information about a particular entity or topic while preserving the LLM's general utility. However, most existing LLM unlearning methods require access to the original training corpus and rely on output-level refusal tuning or broad gradient updates, creating a tension among unlearning strength, non-target preservation, and data availability. We propose Geometric Unlearning (GU), an approach that operates directly on the model's prompt-time planning states without access to the original training corpus. GU distills a compact, low-rank geometry of desired safe behavior from a small set of safe reference prompts, and uses lightweight anchor-in-context synthetic prompts to trigger localized, projection-based alignment of hidden planning representations to this safe geometry. A teacher-distillation regularizer on synthetic non-target anchors further reduces collateral drift. Across privacy-oriented unlearning benchmarks (ToFU and UnlearnPII), GU achieves strong target suppression with minimal impact on non-target performance, demonstrating that effective unlearning can be achieved with minimal synthetic data.
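The abstract does not spell out GU's exact formulation, so the following is only a minimal NumPy sketch of the general idea under stated assumptions: the "safe geometry" is taken to be a PCA subspace fitted to hidden states from safe reference prompts, alignment is a squared distance to that subspace, and the teacher-distillation regularizer is a KL divergence on non-target anchors. All function names, the choice of PCA, and both loss forms are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def fit_safe_subspace(safe_states: np.ndarray, rank: int):
    """Fit a rank-`rank` affine subspace to safe hidden states (assumed: PCA).

    safe_states has shape (n_prompts, hidden_dim). Returns the mean vector
    and a (rank, hidden_dim) basis with orthonormal rows.
    """
    mean = safe_states.mean(axis=0)
    centered = safe_states - mean
    # Rows of vt are principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:rank]


def project_to_subspace(hidden: np.ndarray, mean: np.ndarray, basis: np.ndarray):
    """Orthogonally project hidden states onto the safe affine subspace."""
    return mean + (hidden - mean) @ basis.T @ basis


def alignment_loss(hidden: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> float:
    """Mean squared distance from hidden states to the safe subspace (assumed form)."""
    residual = hidden - project_to_subspace(hidden, mean, basis)
    return float(np.mean(np.sum(residual ** 2, axis=-1)))


def _log_softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def distillation_loss(student_logits: np.ndarray, teacher_logits: np.ndarray) -> float:
    """KL(teacher || student) averaged over the batch (assumed regularizer form)."""
    log_p = _log_softmax(teacher_logits)
    log_q = _log_softmax(student_logits)
    return float(np.mean(np.sum(np.exp(log_p) * (log_p - log_q), axis=-1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for hidden states; a real setup would read them from the model.
    safe_states = rng.normal(size=(32, 16))
    target_states = rng.normal(size=(8, 16))
    mean, basis = fit_safe_subspace(safe_states, rank=4)
    print("alignment loss on target prompts:", alignment_loss(target_states, mean, basis))
```

In a training loop, one would presumably minimize the alignment term on hidden states from target-related synthetic prompts while adding the distillation term on non-target anchors, with the original model acting as a frozen teacher. The weighting between the two terms, and which layers and token positions supply the hidden states, are not specified in the abstract and are left open here.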