🤖 AI Summary
Problem: Existing unlearning methods for large language models treat forget samples as isolated instances, neglecting their structural interdependencies (such as logical entailments in knowledge graphs, batch-level correlations, and domain-level shifts), which leads to incomplete or cascading forgetting.
Method: We propose *structural unlearning*, a paradigm grounded in knowledge graph semantics. We introduce PISTOL, a knowledge graph-driven pipeline for compiling structural unlearning benchmark datasets, featuring multi-hop fact perturbation, domain shift simulation, and batch dependency modeling, supported by reproducible synthetic data generation. We establish a cross-model evaluation framework using Llama2-7B and Mistral-7B.
Contributions/Results: Our systematic evaluation exposes critical failure modes of four mainstream unlearning approaches when the facts to be forgotten are interconnected. We provide empirical evidence that the choice of pre-trained model can affect forgetting robustness. We also characterize the trade-off between utility preservation and complete forgetting, and identify structural dependencies as a key determinant.
📝 Abstract
Recently, machine unlearning, which seeks to erase specific data stored in pre-trained or fine-tuned models, has emerged as a crucial protective measure for LLMs. However, the unlearning approaches considered for LLMs thus far have focused on removing independent data points and have not taken into account that the stored facts are logically connected to one another and form an implicit knowledge graph. To facilitate the development of structural unlearning methods, which are essential for the practical application of unlearning, we propose PISTOL, a pipeline for compiling multi-scenario datasets for benchmarking structural LLM unlearning. Additionally, leveraging sample datasets synthesized using PISTOL, we conducted benchmarks with four distinct unlearning methods on both the Llama2-7B and Mistral-7B models. This analysis helps to illustrate the prevailing challenges in effectively and robustly removing highly interconnected data, batched data, or data skewed towards a specific domain. It also highlights that the choice of pre-trained model can impact unlearning performance. This work not only advances our understanding of the limitations of current LLM unlearning methods and proposes future research directions, but also provides a replicable framework for ongoing exploration and validation in the field.