Modeling LLM Unlearning as an Asymmetric Two-Task Learning Problem

📅 2026-04-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the degradation in general capabilities that large language models often suffer when selectively forgetting specific knowledge, a trade-off between forgetting and retention. The paper recasts unlearning as an asymmetric two-task problem in which retention is the primary objective and forgetting is auxiliary. It proposes a retention-prioritized gradient synthesis framework that decouples task-specific gradient extraction from conflict-aware combination, and instantiates it in two ways: an adaptation of PCGrad for resolving gradient conflicts, and SAGO, a new method that uses constructive sign-constrained synthesis to align more tightly with the retain gradient. On WMDP Bio (SimNPO+GD), recovery of the target model's MMLU performance rises from 44.6% (naive) to 94.0% with PCGrad and 96.0% with SAGO, while forgetting strength stays comparable.

📝 Abstract

Machine unlearning for large language models (LLMs) aims to remove targeted knowledge while preserving general capability. In this paper, we recast LLM unlearning as an asymmetric two-task problem: retention is the primary objective and forgetting is an auxiliary one. From this perspective, we propose a retention-prioritized gradient synthesis framework that decouples task-specific gradient extraction from conflict-aware combination. Instantiating the framework, we adapt the established PCGrad method to resolve gradient conflicts, and introduce SAGO, a novel retention-prioritized gradient synthesis method. Theoretically, both variants ensure non-negative cosine similarity with the retain gradient, while SAGO achieves strictly tighter alignment through constructive sign-constrained synthesis. Empirically, on WMDP Bio/Cyber and RWKU benchmarks, SAGO consistently pushes the Pareto frontier: e.g., on WMDP Bio (SimNPO+GD), recovery of target model MMLU performance progresses from 44.6% (naive) to 94.0% (+PCGrad) and further to 96.0% (+SAGO), while maintaining comparable forgetting strength. Our results show that re-shaping gradient geometry, rather than re-balancing losses, is the key to mitigating unlearning-retention trade-offs.
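
To make the non-negative cosine similarity claim concrete, here is a minimal sketch of a one-sided, PCGrad-style projection in which only the forget gradient is ever modified. It is an illustration under assumptions, not the paper's implementation: the function name `retain_prioritized_pcgrad`, the flat-list gradient representation, and the simple sum used to combine gradients are all hypothetical, and SAGO's sign-constrained synthesis is not reproduced because the abstract does not specify it.

```python
import math


def dot(a, b):
    """Inner product of two equal-length gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is zero."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0


def retain_prioritized_pcgrad(g_retain, g_forget):
    """Hypothetical asymmetric PCGrad variant (an assumption, not the paper's code).

    If the forget gradient conflicts with the retain gradient (negative
    inner product), its component along the retain gradient is removed.
    The retain gradient itself is never altered, so the combined update
    has non-negative inner product with it.
    """
    inner = dot(g_forget, g_retain)
    retain_sq = dot(g_retain, g_retain)
    if inner < 0 and retain_sq > 0:
        coef = inner / retain_sq
        g_forget = [f - coef * r for f, r in zip(g_forget, g_retain)]
    return [r + f for r, f in zip(g_retain, g_forget)]


# Conflicting case: the naive sum points against the retain gradient.
g_r, g_f = [1.0, 0.0], [-2.0, 1.0]
naive = [r + f for r, f in zip(g_r, g_f)]
combined = retain_prioritized_pcgrad(g_r, g_f)
print(cosine(naive, g_r) < 0)      # naive update hurts retention
print(cosine(combined, g_r) >= 0)  # projected update does not
```

In the conflicting case the forget gradient keeps only its component orthogonal to the retain gradient, which is why the combined update cannot have negative cosine similarity with it. When the two gradients already agree, the function reduces to a plain sum.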

Problem

Research questions and friction points this paper is trying to address.

machine unlearning

large language models

knowledge removal

capability retention

unlearning-retention trade-off

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM unlearning

asymmetric two-task learning

gradient synthesis