Exclusive Unlearning

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of harmful content generation by large language models in sensitive domains such as healthcare and education, where existing unlearning methods struggle to comprehensively eliminate the many forms of unsafe knowledge. The authors propose a paradigm termed "exclusive unlearning," which departs from conventional approaches that enumerate individual forgetting targets. Instead, it adopts a "retain-only-safe" principle: it preserves only the knowledge and expressive capabilities essential to a specific domain, such as medicine or mathematics, and broadly unlearns all other content. Through a reverse training framework centered on knowledge retention, combined with domain-specific instruction tuning under safety constraints, the resulting model resists a wide range of harmful inputs and jailbreak attacks while maintaining strong instruction-following performance in its designated professional domain.
📝 Abstract
When introducing Large Language Models (LLMs) into industrial applications, such as healthcare and education, the risk of generating harmful content becomes a significant challenge. While existing machine unlearning methods can erase specific harmful knowledge and expressions, diverse harmful content makes comprehensive removal difficult. In this study, instead of individually listing targets for forgetting, we propose Exclusive Unlearning (EU), which aims for broad harm removal by extensively forgetting everything except for the knowledge and expressions we wish to retain. We demonstrate that through Exclusive Unlearning, it is possible to obtain a model that ensures safety against a wide range of inputs, including jailbreaks, while maintaining the ability to respond to diverse instructions related to specific domains such as medicine and mathematics.
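The core idea, forgetting everything outside a retained domain rather than enumerating forgetting targets, lends itself to a simple two-term training objective. Below is a minimal sketch, assuming a Hugging Face-style causal LM whose forward pass returns a `.loss` when `labels` are supplied; the gradient-ascent forgetting term and the `lambda_forget` weight are illustrative assumptions, not the paper's exact reverse-training formulation.

```python
# A minimal sketch of a "retain-only-safe" objective. Assumes tokenized
# batches with "input_ids" and "attention_mask", as produced by a Hugging
# Face tokenizer. The ascent term and weighting are illustrative only.

def exclusive_unlearning_loss(model, retain_batch, other_batch, lambda_forget=1.0):
    # Standard language-modeling loss on the domain data we wish to keep
    # (e.g., medical or mathematical instructions and responses).
    retain_out = model(
        input_ids=retain_batch["input_ids"],
        attention_mask=retain_batch["attention_mask"],
        labels=retain_batch["input_ids"],
    )

    # Reverse term on a broad sample of everything else: negating the LM
    # loss turns gradient descent into gradient ascent, raising perplexity
    # outside the retained domain instead of listing forgetting targets.
    other_out = model(
        input_ids=other_batch["input_ids"],
        attention_mask=other_batch["attention_mask"],
        labels=other_batch["input_ids"],
    )

    # Total objective: retain the safe domain, exclusively unlearn the rest.
    return retain_out.loss - lambda_forget * other_out.loss
```

In such a setup, the "other" batches would be drawn from a broad general corpus so that the forgetting pressure covers diverse unsafe content, including phrasings used in jailbreak prompts, without ever listing it explicitly.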
Problem

Research questions and friction points this paper is trying to address.

Exclusive Unlearning
harmful content
machine unlearning
large language models
safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exclusive Unlearning
machine unlearning
harmful content removal
LLM safety
knowledge retention