🤖 AI Summary
Existing machine unlearning (MU) methods for large language models (LLMs) only support sentence-level target removal and cannot achieve fine-grained forgetting of broader concepts such as persons or events. To address this limitation, the authors propose *Concept Unlearning* (CU), a new paradigm for concept-level unlearning. CU represents the LLM's internal knowledge as a knowledge graph and defines unlearning as removing the target concept's nodes and associated edges. The method prompts the LLM to generate knowledge triplets and explanatory sentences about the forgetting target and applies unlearning to these representations, aligning the process with the model's internal knowledge and supporting privacy and copyright compliance. Experiments on real-world and synthetic datasets demonstrate that CU achieves concept-level forgetting while preserving unrelated knowledge.
📝 Abstract
Machine Unlearning (MU) has recently attracted considerable attention as a solution to privacy and copyright issues in large language models (LLMs). Existing MU methods aim to remove specific target sentences from an LLM while minimizing damage to unrelated knowledge. However, these approaches require explicit target sentences and do not support removing broader concepts, such as persons or events. To address this limitation, we introduce Concept Unlearning (CU) as a new requirement for LLM unlearning. We leverage knowledge graphs to represent the LLM's internal knowledge and define CU as removing the forgetting target nodes and their associated edges. This graph-based formulation enables more intuitive unlearning and facilitates the design of more effective methods. We propose a novel method that prompts the LLM to generate knowledge triplets and explanatory sentences about the forgetting target and applies the unlearning process to these representations. Our approach enables more precise and comprehensive concept removal by aligning the unlearning process with the LLM's internal knowledge representations. Experiments on real-world and synthetic datasets demonstrate that our method effectively achieves concept-level unlearning while preserving unrelated knowledge.
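The graph-based formulation in the abstract can be illustrated with a minimal sketch: if the model's knowledge is represented as (head, relation, tail) triplets, then the forget set is every triplet incident to the target concept node, and everything else is the retain set. The function and variable names below are illustrative, not from the paper, and this omits the actual unlearning step applied to the LLM.

```python
# Hypothetical sketch of the graph-based CU formulation: knowledge is a set of
# (head, relation, tail) triplets; forgetting a concept removes its node and
# all incident edges. Names here are illustrative, not the paper's API.

def forget_set(triplets, target):
    """Triplets to unlearn: every edge incident to the target concept node."""
    return [(h, r, t) for (h, r, t) in triplets if h == target or t == target]

def retain_set(triplets, target):
    """Unrelated knowledge to preserve: edges not touching the target node."""
    return [(h, r, t) for (h, r, t) in triplets if h != target and t != target]

# Toy knowledge graph (contents invented for illustration).
kg = [
    ("Alice", "works_at", "Acme"),
    ("Acme", "located_in", "Paris"),
    ("Bob", "knows", "Alice"),
]

print(forget_set(kg, "Alice"))  # edges incident to the concept "Alice"
print(retain_set(kg, "Alice"))  # unrelated knowledge, preserved
```

In the paper's method, triplets like these would be self-generated by the LLM via prompting, and the unlearning objective would then be applied to the forget-set triplets and their explanatory sentences.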