NegMerge: Consensual Weight Negation for Strong Machine Unlearning

📅 2024-10-08

🏛️ arXiv.org

📈 Citations: 3

✨ Influential: 1

🤖 AI Summary

Task-arithmetic unlearning is sensitive to hyperparameters, so its efficacy is unstable and depends on careful validation. This paper proposes a consensus-based task vector approach that reuses the fine-tuned models existing methods already produce during hyperparameter search, so it adds no training cost beyond that search: multiple models are fine-tuned on the forget set under diverse hyperparameter configurations; a task vector is computed for each; only the components whose signs agree across all configurations are merged into a consensus vector; and that vector is negated from the original model. By avoiding the bias of a single hyperparameter choice, the method strengthens unlearning while keeping retain-set performance more stable. Evaluated on vision-language models and image classification models, it outperforms state-of-the-art methods: retain-set accuracy degradation is reduced by up to 42%.

📝 Abstract

Machine unlearning aims to selectively remove specific knowledge from a model. Current methods, such as task arithmetic, rely on fine-tuning models on the forget set, generating a task vector, and subtracting it from the original model. However, we argue the effectiveness of this approach is highly sensitive to hyperparameter selection, necessitating careful validation to identify the best model among many fine-tuned candidates. In this paper, we propose a novel method that leverages all given fine-tuned models rather than selecting a single one. By constructing task vectors from models trained with varied hyperparameters and merging only the components of the task vectors with consistent signs, we perform unlearning by negating the merged task vector from the original model. Given that existing methods also utilize multiple fine-tuned models, our approach delivers more effective unlearning without incurring additional computational costs. We demonstrate the effectiveness of our method on both vision-language models and standard image classification models, showing improved unlearning performance with minimal degradation on the retain set, outperforming state-of-the-art techniques.
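The procedure in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes model weights are flattened into plain lists of floats, that consistent-sign components are merged by averaging (the abstract does not specify the merge operator), and that a hypothetical `scale` coefficient controls how strongly the merged vector is negated. All function names are illustrative.

```python
from typing import List, Sequence

Vector = List[float]


def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def task_vector(finetuned: Sequence[float], original: Sequence[float]) -> Vector:
    """Task vector = fine-tuned weights minus original weights."""
    return [f - o for f, o in zip(finetuned, original)]


def sign_consensus_merge(task_vectors: Sequence[Sequence[float]]) -> Vector:
    """Keep a component only if every task vector agrees on its (nonzero) sign.

    Assumption: agreeing components are averaged; all others are set to zero.
    """
    merged: Vector = []
    for components in zip(*task_vectors):
        first = sign(components[0])
        if first != 0 and all(sign(c) == first for c in components):
            merged.append(sum(components) / len(components))
        else:
            merged.append(0.0)
    return merged


def negate_merged(
    original: Sequence[float],
    finetuned_models: Sequence[Sequence[float]],
    scale: float = 1.0,  # hypothetical scaling coefficient
) -> Vector:
    """Subtract the sign-consensus task vector from the original weights."""
    vectors = [task_vector(m, original) for m in finetuned_models]
    merged = sign_consensus_merge(vectors)
    return [o - scale * m for o, m in zip(original, merged)]


# Two models fine-tuned on the forget set with different hyperparameters.
original = [0.0, 0.0, 0.0]
models = [[1.0, -2.0, 3.0], [3.0, 2.0, 1.0]]
unlearned = negate_merged(original, models)
```

In this toy example the second component is dropped because the two task vectors disagree on its sign, so only the first and third weights are changed by the negation.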

Problem

Research questions and friction points this paper is trying to address.

Selectively remove knowledge from trained models

Reduce hyperparameter sensitivity in unlearning methods

Enhance unlearning performance with sign-consensual merging

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aggregates task vectors with consistent signs

Negates merged vector for effective unlearning

Outperforms state-of-the-art techniques without additional computational cost
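One way to write the points above symbolically, under stated assumptions: the symbols $\theta_{\text{ori}}$, $\theta_{\text{ft}}^{(k)}$, $K$, and $\lambda$ are notation introduced here for illustration, averaging is assumed as the merge operator, and the scaling coefficient $\lambda$ is the usual task-arithmetic convention rather than something stated in the abstract.

```latex
\begin{align}
  \tau_k &= \theta_{\text{ft}}^{(k)} - \theta_{\text{ori}}, \qquad k = 1, \dots, K \\
  m_i &= \mathbb{1}\!\left[\operatorname{sgn}(\tau_{1,i}) = \dots = \operatorname{sgn}(\tau_{K,i}) \neq 0\right] \\
  \tau_{\text{merged}} &= m \odot \frac{1}{K} \sum_{k=1}^{K} \tau_k \\
  \theta_{\text{unlearn}} &= \theta_{\text{ori}} - \lambda \, \tau_{\text{merged}}
\end{align}
```

Here $\tau_k$ is the task vector of the $k$-th fine-tuned model, $m$ is a binary mask that keeps only the coordinates on which all $K$ task vectors share a sign, and $\odot$ denotes element-wise multiplication.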

🔎 Similar Papers

Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models