Erasure or Erosion? Evaluating Compositional Degradation in Unlearned Text-To-Image Diffusion Models

📅 2026-04-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of systematic evaluation of how post-hoc unlearning methods affect compositional generation when they remove undesirable concepts, such as explicit content, from text-to-image diffusion models. Focusing on nudity removal in Stable Diffusion 1.4, the work presents a systematic empirical analysis of state-of-the-art unlearning approaches through the lens of compositional generation, using T2I-CompBench++ and GenEval alongside established unlearning benchmarks. The investigation examines attribute binding, spatial reasoning, and counting. Results reveal a consistent trade-off between concept removal and compositional integrity: methods that erase the target concept strongly often degrade compositional abilities substantially, whereas methods that preserve compositional structure often fail to erase the concept robustly. This points to a limitation of current unlearning paradigms and evaluation practices, which do not yet achieve targeted suppression and semantic fidelity at the same time.

📝 Abstract

Post-hoc unlearning has emerged as a practical mechanism for removing undesirable concepts from large text-to-image diffusion models. However, prior work primarily evaluates unlearning through erasure success; its impact on broader generative capabilities remains poorly understood. In this work, we conduct a systematic empirical study of concept unlearning through the lens of compositional text-to-image generation. Focusing on nudity removal in Stable Diffusion 1.4, we evaluate a diverse set of state-of-the-art unlearning methods using T2I-CompBench++ and GenEval, alongside established unlearning benchmarks. Our results reveal a consistent trade-off between unlearning effectiveness and compositional integrity: methods that achieve strong erasure frequently incur substantial degradation in attribute binding, spatial reasoning, and counting. Conversely, approaches that preserve compositional structure often fail to provide robust erasure. These findings highlight limitations of current evaluation practices and underscore the need for unlearning objectives that explicitly account for semantic preservation beyond targeted suppression.
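
The trade-off described above can be made concrete with a small sketch. It is not the paper's evaluation code: the `MethodScore` record, the two score fields, and both helper functions are hypothetical names introduced here, and it assumes each method has already been reduced to one erasure score and one compositional score where higher is better. Under those assumptions, it computes the fractional drop in compositional score relative to the unmodified model and lists the methods that no other method beats on both axes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodScore:
    """Hypothetical per-method summary; both scores assume higher is better."""
    name: str
    erasure: float      # how reliably the target concept is suppressed
    composition: float  # aggregate compositional score (e.g. a benchmark mean)


def relative_degradation(baseline: float, score: float) -> float:
    """Fractional drop in compositional score relative to the unmodified model."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - score) / baseline


def pareto_front(methods: list[MethodScore]) -> list[MethodScore]:
    """Return the methods that are not dominated on both axes by another method."""
    front = []
    for m in methods:
        dominated = any(
            o.erasure >= m.erasure
            and o.composition >= m.composition
            and (o.erasure > m.erasure or o.composition > m.composition)
            for o in methods
        )
        if not dominated:
            front.append(m)
    return front
```

If the trade-off holds as the abstract reports, the front would contain several methods, each giving up one axis for the other, rather than a single method that wins on both.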

Problem

Research questions and friction points this paper is trying to address.

unlearning

text-to-image diffusion models

compositional generation

concept erasure

semantic preservation

Innovation

Methods, ideas, or system contributions that make the work stand out.

concept unlearning

compositional generation

text-to-image diffusion models