🤖 AI Summary
This work addresses the vulnerability of text-guided diffusion models to misuse for generating harmful content, and the limitations of existing concept erasure methods, which are computationally expensive and struggle with complex or compositional concepts. To overcome these challenges, the authors propose TRUST, a novel approach that dynamically identifies neurons associated with target concepts and applies Hessian-regularized selective fine-tuning to them. TRUST enables efficient erasure of individual, compositional, and conditional concepts while significantly improving forgetting efficacy and unlearning speed. Moreover, it preserves high generation quality and improves robustness against adversarial prompts, outperforming multiple state-of-the-art baselines.
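To make the "dynamically identifies neurons" idea concrete, here is a minimal, hypothetical sketch of one common localisation heuristic: score each weight by the gradient magnitude of a concept-specific loss and keep the top-k entries per tensor. The function name `concept_neuron_masks` and the top-k saliency criterion are illustrative assumptions, not the paper's actual estimator.

```python
import torch
import torch.nn as nn

def concept_neuron_masks(model: nn.Module, concept_loss: torch.Tensor, top_k: int = 512):
    """Per weight tensor, select the top-k entries whose gradient w.r.t. a
    concept-specific loss is largest in magnitude.

    A generic saliency-based localisation heuristic (an assumption here,
    not TRUST's exact estimator). `concept_loss` must be a scalar computed
    from the model's outputs on prompts for the target concept.
    """
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    grads = torch.autograd.grad(concept_loss, [p for _, p in named], allow_unused=True)
    masks = {}
    for (name, param), grad in zip(named, grads):
        if grad is None:  # parameter not involved in this loss
            continue
        flat = grad.abs().flatten()
        k = min(top_k, flat.numel())
        cutoff = flat.topk(k).values.min()  # k-th largest gradient magnitude
        masks[name] = grad.abs() >= cutoff  # boolean mask over the tensor
    return masks
```

Because the masks are recomputed from the current loss and weights, the selection can adapt as unlearning proceeds, in contrast to the static localisation the authors criticise.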
📝 Abstract
Text-guided diffusion models are used by millions of users, but can be easily exploited to produce harmful content. Concept unlearning methods aim to reduce a model's likelihood of generating harmful content. Traditionally, this has been tackled at the level of individual concepts, with only a handful of recent works considering more realistic concept combinations. However, state-of-the-art (SOTA) methods depend on full fine-tuning, which is computationally expensive. Concept localisation methods can facilitate selective fine-tuning, but existing techniques are static, resulting in suboptimal utility. To tackle these challenges, we propose TRUST (Targeted Robust Selective fine Tuning), a novel approach that dynamically estimates target concept neurons and unlearns them through selective fine-tuning, empowered by a Hessian-based regularization. We show experimentally, against a number of SOTA baselines, that TRUST is robust against adversarial prompts, preserves generation quality to a significant degree, and is also significantly faster than the SOTA. Our method achieves unlearning of not only individual concepts but also combinations of concepts and conditional concepts, without any specific regularization.
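As a rough illustration of how a Hessian-based regularizer can be combined with selective fine-tuning, the sketch below uses an EWC-style diagonal Fisher matrix as a cheap stand-in for the Hessian and zeroes out gradients for all weights outside the selected masks. All inputs (`masks`, `fisher_diag`, `anchor_params`, `unlearn_loss`) are assumed to be provided by the caller; none of this is confirmed as TRUST's exact procedure.

```python
import torch
import torch.nn as nn

def selective_unlearning_step(model: nn.Module, masks: dict, fisher_diag: dict,
                              anchor_params: dict, unlearn_loss: torch.Tensor,
                              optimizer: torch.optim.Optimizer, lam: float = 1.0):
    """One update: minimise the unlearning loss plus a diagonal-Hessian
    (Fisher) penalty that anchors weights important for retained concepts,
    then restrict the update to the masked concept neurons.

    `masks` maps parameter names to boolean tensors (e.g. from
    concept_neuron_masks above); `fisher_diag` and `anchor_params` map
    names to precomputed Fisher diagonals and reference weights.
    """
    # Quadratic drift penalty, weighted by per-weight importance.
    reg = 0.0
    for name, p in model.named_parameters():
        if name in fisher_diag:
            reg = reg + (fisher_diag[name] * (p - anchor_params[name]).pow(2)).sum()
    loss = unlearn_loss + lam * reg

    optimizer.zero_grad()
    loss.backward()

    # Selective fine-tuning: only masked (concept) neurons receive updates.
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        if name in masks:
            p.grad.mul_(masks[name].to(p.grad.dtype))
        else:
            p.grad.zero_()

    optimizer.step()
    return loss.detach()
```

The diagonal Fisher is only one tractable approximation of second-order curvature; it keeps the per-step cost linear in the number of parameters, which is consistent with the abstract's emphasis on speed, but the paper's actual regularizer may differ.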