🤖 AI Summary
This survey addresses the need for selective forgetting in large language models (LLMs), motivated by privacy, copyright, security, and bias risks. It gives a structured overview and taxonomy of machine unlearning approaches, grouping them into data-centric, parameter-centric, architecture-centric, hybrid, and other strategies. It also reviews the evaluation ecosystem (benchmarks, metrics, and datasets) along three dimensions: forgetting effectiveness, knowledge retention, and robustness. Finally, it identifies open challenges, including scalable efficiency, formal guarantees, cross-lingual and multimodal unlearning, and robustness against adversarial relearning, and aims to serve as a roadmap for developing reliable and responsible unlearning techniques for LLMs.
📝 Abstract
Large language models (LLMs) have achieved remarkable success across natural language processing tasks, yet their widespread deployment raises pressing concerns around privacy, copyright, security, and bias. Machine unlearning has emerged as a promising paradigm for selectively removing knowledge or data from trained models without full retraining. In this survey, we provide a structured overview of unlearning methods for LLMs, categorizing existing approaches into data-centric, parameter-centric, architecture-centric, hybrid, and other strategies. We also review the evaluation ecosystem, including benchmarks, metrics, and datasets designed to measure forgetting effectiveness, knowledge retention, and robustness. Finally, we outline key challenges and open problems, such as scalable efficiency, formal guarantees, cross-language and multimodal unlearning, and robustness against adversarial relearning. By synthesizing current progress and highlighting open directions, this paper aims to serve as a roadmap for developing reliable and responsible unlearning techniques in large language models.