SEPS: A Separability Measure for Robust Unlearning in LLMs

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing machine unlearning methods for LLMs are evaluated on forget and retain queries in isolation, which misses mixed-query scenarios where forget and retain targets coexist within a single prompt. In such settings, untargeted unlearning indiscriminately erases retain content alongside forget content, while targeted unlearning overfits to single-query prompts and fails when multiple queries appear together. Method: This paper introduces SEPS, a separability measure and evaluation framework that explicitly models a model's ability to forget and retain information within one prompt, and proposes Mixed Prompt (MP) unlearning, which integrates forget and retain queries into a unified training objective. Contribution/Results: Across three benchmarks, MP unlearning significantly improves unlearning effectiveness over existing methods and remains robust in complex settings with up to eight mixed forget and retain queries in a single prompt, enhancing the practicality of selective unlearning in LLMs.

📝 Abstract
Machine unlearning aims to selectively remove targeted knowledge from Large Language Models (LLMs), ensuring they forget specified content while retaining essential information. Existing unlearning metrics assess whether a model correctly answers retain queries and rejects forget queries, but they fail to capture real-world scenarios where forget queries rarely appear in isolation. In fact, forget and retain queries often coexist within the same prompt, making mixed-query evaluation crucial. We introduce SEPS, an evaluation framework that explicitly measures a model's ability to both forget and retain information within a single prompt. Through extensive experiments across three benchmarks, we identify two key failure modes in existing unlearning methods: (1) untargeted unlearning indiscriminately erases both forget and retain content once a forget query appears, and (2) targeted unlearning overfits to single-query scenarios, leading to catastrophic failures when handling multiple queries. To address these issues, we propose Mixed Prompt (MP) unlearning, a strategy that integrates both forget and retain queries into a unified training objective. Our approach significantly improves unlearning effectiveness, demonstrating robustness even in complex settings with up to eight mixed forget and retain queries in a single prompt.
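The Mixed Prompt idea described in the abstract can be viewed as a data-construction step: within one multi-turn prompt, forget queries are paired with a refusal target while retain queries keep their ground-truth answers, so a single training objective supervises both behaviors at once. A minimal Python sketch under that reading — the function, field names, and refusal string are illustrative assumptions, not the paper's actual implementation:

```python
# Hedged sketch of building a mixed-prompt training example for MP-style
# unlearning. All names here are illustrative, not from the paper.

REFUSAL = "I'm not able to provide that information."  # assumed refusal target

def build_mixed_prompt(forget_qas, retain_qas):
    """Combine forget and retain (question, answer) pairs into one
    multi-query training example. Forget questions get a refusal target;
    retain questions keep their ground-truth answers, so one cross-entropy
    loss over all targets optimizes forgetting and retention jointly."""
    tagged = ([("forget", q, a) for q, a in forget_qas]
              + [("retain", q, a) for q, a in retain_qas])
    turns = []
    for role, question, answer in tagged:
        target = REFUSAL if role == "forget" else answer
        turns.append({"question": question, "target": target, "role": role})
    return turns

# Example: one forget query mixed with two retain queries in a single prompt.
example = build_mixed_prompt(
    forget_qas=[("Where was author X born?", "Springfield")],
    retain_qas=[("What is 2 + 2?", "4"), ("Capital of France?", "Paris")],
)
```

In an actual training loop, each turn's target tokens would be supervised with the model's usual language-modeling loss; the point of the sketch is only that forget and retain supervision coexist in one example rather than in separate batches.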
Problem

Research questions and friction points this paper is trying to address.

Measure an LLM's ability to forget and retain information within a single prompt
Diagnose the failure modes of existing unlearning methods under mixed queries
Propose an unlearning strategy that stays robust on complex multi-query prompts
Innovation

Methods, ideas, or system contributions that make the work stand out.

SEPS measures unlearning quality in mixed-query prompts
Mixed Prompt (MP) unlearning integrates forget and retain queries into one training objective
Robustness with up to eight mixed queries in a single prompt