🤖 AI Summary
To address privacy risks arising from the memorization of sensitive information during large language model (LLM) training, and the performance degradation commonly induced by existing machine unlearning methods, this paper proposes a two-stage constrained unlearning framework. First, leveraging causal mediation analysis, the authors identify the critical role of the MLP modules in the lower Transformer layers (layers 0-5) in encoding subject-attribute knowledge associations. Second, they introduce a layer-specific optimization paradigm that jointly applies a cross-entropy penalty on the forget set and adaptive regularization on the retain set, combining hierarchical parameter freezing with knowledge-isolated training. Evaluated on SemEval-2025 Task 4, the method achieves second place in the 1B-model track, demonstrating effective selective forgetting while preserving 88% of the original MMLU accuracy and thereby balancing unlearning efficacy with model capability.
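The causal tracing step can be illustrated with a toy sketch: corrupt the "subject" token's embedding, then restore the clean hidden state at one layer at a time and measure how much of the clean output is recovered. Everything below (the linear blocks, the mean-pooling mixing step standing in for attention, and the tensor sizes) is an illustrative assumption, not the paper's actual OLMo setup:

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a transformer: per-token linear transforms plus a
# mean-pooling "mixing" step that crudely imitates attention between tokens.
N_LAYERS, D, TOKENS = 8, 16, 3
weights = [torch.randn(D, D) / D**0.5 for _ in range(N_LAYERS)]

def block(x, w):
    h = torch.tanh(x @ w)
    return h + h.mean(dim=0, keepdim=True)  # spread information across tokens

def run(x, patch_layer=None, clean_states=None):
    states = []
    for i, w in enumerate(weights):
        x = block(x, w)
        if i == patch_layer:        # restore the clean hidden state of the
            x = x.clone()           # "subject" token (token 0) at this layer
            x[0] = clean_states[i][0]
        states.append(x)
    return x, states

clean_x = torch.randn(TOKENS, D)
clean_out, clean_states = run(clean_x)

# Corrupt the subject token's embedding, as in causal mediation analysis.
corrupt_x = clean_x.clone()
corrupt_x[0] += torch.randn(D)

# Indirect effect per layer: how close patching that layer brings the final
# token's state back to the clean run (smaller distance = stronger mediator).
effects = []
for i in range(N_LAYERS):
    out, _ = run(corrupt_x, patch_layer=i, clean_states=clean_states)
    effects.append(float(torch.norm(out[-1] - clean_out[-1])))
```

In the paper's experiments this restoration targets OLMo's MLP sublayer outputs; the layers whose restoration most reduces the output deviation are the ones identified as mediating the subject-attribute association.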
📝 Abstract
Large language models (LLMs) frequently memorize sensitive information during training, posing risks when publicly accessible models are deployed. Current machine unlearning methods struggle to selectively remove specific data associations without degrading overall model capabilities. This paper presents our solution to SemEval-2025 Task 4 on targeted unlearning, a two-stage methodology that combines causal mediation analysis with layer-specific optimization. Through systematic causal tracing experiments on OLMo architectures (1B and 7B parameters), we identify the critical role of the first few transformer layers (layers 0-5) in storing subject-attribute associations within MLP modules. Building on this insight, we develop a constrained optimization approach that freezes the upper layers while applying a novel joint loss function to the lower layers: it simultaneously maximizes forget-set loss via output-token cross-entropy penalties and minimizes retain-set deviation through adaptive regularization. Our method achieves 2nd place in the 1B-model track, demonstrating strong task performance while maintaining 88% of baseline MMLU accuracy. These results establish causally informed layer optimization as a promising paradigm for efficient, precise unlearning in LLMs, offering a significant step toward addressing data privacy concerns in AI systems.
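A minimal sketch of the constrained optimization stage, using a toy linear stack in place of OLMo. The module layout, layer count, regularizer choice (MSE on logits), and the `lam` weighting are illustrative assumptions; the paper's adaptive regularizer may differ:

```python
import torch
import torch.nn.functional as F

class ToyLM(torch.nn.Module):
    """Stand-in for an OLMo-style stack: a list of layers plus an LM head."""
    def __init__(self, n_layers=16, d=32, vocab=100):
        super().__init__()
        self.layers = torch.nn.ModuleList(
            torch.nn.Linear(d, d) for _ in range(n_layers))
        self.head = torch.nn.Linear(d, vocab)

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x))
        return self.head(x)

model = ToyLM()

# Freeze everything above the causally identified lower layers (0-5).
TRAINABLE = 6
for i, layer in enumerate(model.layers):
    for p in layer.parameters():
        p.requires_grad = i < TRAINABLE
for p in model.head.parameters():
    p.requires_grad = False

def joint_loss(model, x_forget, y_forget, x_retain, ref_retain_logits, lam=1.0):
    # Maximize forget-set cross-entropy: gradient ascent, i.e. negate the loss.
    forget_ce = F.cross_entropy(model(x_forget), y_forget)
    # Penalize deviation from the pre-unlearning model's outputs on the
    # retain set (plain MSE here; adaptively weighted in the paper).
    retain_dev = F.mse_loss(model(x_retain), ref_retain_logits)
    return -forget_ce + lam * retain_dev
```

One optimizer step would then be `loss = joint_loss(...); loss.backward(); opt.step()`, with `opt` built only from the parameters that still require gradients.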