Exclusive Unlearning

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of harmful content generation by large language models in sensitive domains such as healthcare and education, where existing unlearning methods struggle to comprehensively eliminate the many forms of unsafe knowledge. The authors propose a paradigm termed "exclusive unlearning," which departs from conventional approaches that enumerate individual forgetting targets. Instead, it adopts a "retain-only-safe" principle: it preserves only the knowledge and expressive capabilities essential to a specific domain, such as medicine or mathematics, and broadly unlearns all other content. Through a reverse training framework centered on knowledge retention, combined with domain-specific instruction tuning under safety constraints, the resulting model resists a wide range of harmful inputs and jailbreak attacks while maintaining strong instruction-following performance in its designated professional domain.
📝 Abstract
When introducing Large Language Models (LLMs) into industrial applications, such as healthcare and education, the risk of generating harmful content becomes a significant challenge. While existing machine unlearning methods can erase specific harmful knowledge and expressions, diverse harmful content makes comprehensive removal difficult. In this study, instead of individually listing targets for forgetting, we propose Exclusive Unlearning (EU), which aims for broad harm removal by extensively forgetting everything except for the knowledge and expressions we wish to retain. We demonstrate that through Exclusive Unlearning, it is possible to obtain a model that ensures safety against a wide range of inputs, including jailbreaks, while maintaining the ability to respond to diverse instructions related to specific domains such as medicine and mathematics.
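The core idea, forgetting everything outside a retained domain rather than enumerating forgetting targets, lends itself to a simple two-term training objective. Below is a minimal sketch, assuming a Hugging Face-style causal LM whose forward pass returns a `.loss` when `labels` are supplied; the gradient-ascent forgetting term and the `lambda_forget` weight are illustrative assumptions, not the paper's exact reverse-training formulation.

```python
# A minimal sketch of a "retain-only-safe" objective. Assumes tokenized
# batches with "input_ids" and "attention_mask", as produced by a Hugging
# Face tokenizer. The ascent term and weighting are illustrative only.

def exclusive_unlearning_loss(model, retain_batch, other_batch, lambda_forget=1.0):
    # Standard language-modeling loss on the domain data we wish to keep
    # (e.g., medical or mathematical instructions and responses).
    retain_out = model(
        input_ids=retain_batch["input_ids"],
        attention_mask=retain_batch["attention_mask"],
        labels=retain_batch["input_ids"],
    )

    # Reverse term on a broad sample of everything else: negating the LM
    # loss turns gradient descent into gradient ascent, raising perplexity
    # outside the retained domain instead of listing forgetting targets.
    other_out = model(
        input_ids=other_batch["input_ids"],
        attention_mask=other_batch["attention_mask"],
        labels=other_batch["input_ids"],
    )

    # Total objective: retain the safe domain, exclusively unlearn the rest.
    return retain_out.loss - lambda_forget * other_out.loss
```

In such a setup, the "other" batches would be drawn from a broad general corpus so that the forgetting pressure covers diverse unsafe content, including phrasings used in jailbreak prompts, without ever listing it explicitly.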
Problem

Research questions and friction points this paper is trying to address.

Exclusive Unlearning
harmful content
machine unlearning
large language models
safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exclusive Unlearning
machine unlearning
harmful content removal
LLM safety
knowledge retention