🤖 AI Summary
Existing machine unlearning (MU) methods for large language models (LLMs) only support sentence-level target removal and cannot achieve fine-grained forgetting of broader concepts such as persons or events. To address this limitation, the authors propose *Concept Unlearning* (CU), a new paradigm for concept-level unlearning. CU represents the LLM's internal knowledge as a knowledge graph and defines unlearning as removing the target concept's nodes and associated edges. The method prompts the LLM to generate knowledge triplets and explanatory sentences about the forgetting target and applies unlearning to these representations, aligning the process with the model's internal knowledge and supporting privacy and copyright compliance. Experiments on real-world and synthetic datasets demonstrate that CU achieves concept-level forgetting while preserving unrelated knowledge.
📝 Abstract
Machine Unlearning (MU) has recently attracted considerable attention as a solution to privacy and copyright issues in large language models (LLMs). Existing MU methods aim to remove specific target sentences from an LLM while minimizing damage to unrelated knowledge. However, these approaches require explicit target sentences and do not support removing broader concepts, such as persons or events. To address this limitation, we introduce Concept Unlearning (CU) as a new requirement for LLM unlearning. We leverage knowledge graphs to represent the LLM's internal knowledge and define CU as removing the forgetting target nodes and their associated edges. This graph-based formulation enables more intuitive unlearning and facilitates the design of more effective methods. We propose a novel method that prompts the LLM to generate knowledge triplets and explanatory sentences about the forgetting target and applies the unlearning process to these representations. Our approach enables more precise and comprehensive concept removal by aligning the unlearning process with the LLM's internal knowledge representations. Experiments on real-world and synthetic datasets demonstrate that our method effectively achieves concept-level unlearning while preserving unrelated knowledge.
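The graph-based formulation in the abstract can be illustrated with a minimal sketch: if the model's knowledge is represented as (head, relation, tail) triplets, then the forget set is every triplet incident to the target concept node, and everything else is the retain set. The function and variable names below are illustrative, not from the paper, and this omits the actual unlearning step applied to the LLM.

```python
# Hypothetical sketch of the graph-based CU formulation: knowledge is a set of
# (head, relation, tail) triplets; forgetting a concept removes its node and
# all incident edges. Names here are illustrative, not the paper's API.

def forget_set(triplets, target):
    """Triplets to unlearn: every edge incident to the target concept node."""
    return [(h, r, t) for (h, r, t) in triplets if h == target or t == target]

def retain_set(triplets, target):
    """Unrelated knowledge to preserve: edges not touching the target node."""
    return [(h, r, t) for (h, r, t) in triplets if h != target and t != target]

# Toy knowledge graph (contents invented for illustration).
kg = [
    ("Alice", "works_at", "Acme"),
    ("Acme", "located_in", "Paris"),
    ("Bob", "knows", "Alice"),
]

print(forget_set(kg, "Alice"))  # edges incident to the concept "Alice"
print(retain_set(kg, "Alice"))  # unrelated knowledge, preserved
```

In the paper's method, triplets like these would be self-generated by the LLM via prompting, and the unlearning objective would then be applied to the forget-set triplets and their explanatory sentences.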