ICED: Concept-level Machine Unlearning via Interpretable Concept Decomposition

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing unlearning methods for vision-language models typically operate at the image or instance level, which makes it difficult to remove specific knowledge without impairing unrelated semantics, especially when a single image contains multiple entangled concepts. This work proposes an interpretable concept-level unlearning framework: a multimodal large language model constructs a compact, task-specific concept vocabulary from the forgetting set, and visual representations are decomposed into sparse, non-negative combinations of those semantic concepts. The decomposition provides an explicit interface for fine-grained knowledge manipulation, so target concepts can be selectively suppressed. Experiments in both in-domain and out-of-domain forgetting settings show more comprehensive removal of target concepts, better preservation of non-target semantics within the same image, and model utility that remains competitive with existing VLM unlearning methods.

📝 Abstract

Machine unlearning in Vision-Language Models (VLMs) is typically performed at the image or instance level, making it difficult to precisely remove target knowledge without affecting unrelated semantics. This issue is especially pronounced since a single image often contains multiple entangled concepts, including both target concepts to be forgotten and contextual information that should be preserved. In this paper, we propose an interpretable concept-level unlearning framework for VLMs, which constructs a compact task-specific concept vocabulary from the forgetting set using a multimodal large language model. In addition to modality alignment, visual representations are decomposed into sparse, nonnegative combinations of semantic concepts, providing an explicit interface for fine-grained knowledge manipulation. Based on this decomposition, our method formulates unlearning as concept-level optimization, where target concepts are selectively suppressed while intra-instance non-target semantics and global cross-modal knowledge are preserved. Extensive experiments across both in-domain and out-of-domain forgetting settings demonstrate that our method enables more comprehensive target forgetting, better preserves non-target knowledge within the same image, and maintains competitive model utility compared with existing VLM unlearning methods.
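
To make the idea of a "sparse, nonnegative combination of semantic concepts" concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify the solver, the loss, or how the vocabulary is embedded, so everything here is an assumption for illustration: the concept names, the toy 4-dimensional embeddings, the L1 penalty, and the helper names `decompose`, `suppress`, and `reconstruct` are all hypothetical. The paper formulates unlearning as an optimization over the model; zeroing a coefficient, as done below, only illustrates what "suppressing a target concept while keeping the others" means at the representation level.

```python
from typing import List, Sequence, Set


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def reconstruct(concepts: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Weighted sum of concept vectors: sum_k w_k * c_k."""
    dim = len(concepts[0])
    return [sum(w * c[i] for w, c in zip(weights, concepts))
            for i in range(dim)]


def decompose(embedding: Sequence[float],
              concepts: Sequence[Sequence[float]],
              l1: float = 0.01,
              step: float = 0.5,
              iters: int = 200) -> List[float]:
    """Non-negative sparse coding by projected gradient descent.

    Minimizes 0.5 * ||x - sum_k w_k c_k||^2 + l1 * sum_k w_k
    subject to w_k >= 0. The step size must be small relative to the
    scale of the concept vectors (0.5 is safe for the orthonormal toy
    vocabulary used below; this is an assumption, not a general rule).
    """
    weights = [0.0] * len(concepts)
    for _ in range(iters):
        recon = reconstruct(concepts, weights)
        residual = [r - x for r, x in zip(recon, embedding)]
        grads = [dot(c, residual) for c in concepts]
        # Gradient step on the smooth part plus the L1 term, then
        # projection onto the non-negative orthant.
        weights = [max(0.0, w - step * (g + l1))
                   for w, g in zip(weights, grads)]
    return weights


def suppress(weights: Sequence[float], target_ids: Set[int]) -> List[float]:
    """Zero the coefficients of target concepts, keep the rest unchanged."""
    return [0.0 if k in target_ids else w for k, w in enumerate(weights)]


# Hypothetical vocabulary; in the paper it is built from the forgetting
# set by a multimodal large language model.
concept_names = ["dog", "beach", "car"]
concepts = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

# Toy image embedding that mixes "dog" and "beach" but contains no "car".
embedding = [0.8, 0.5, 0.0, 0.0]

weights = decompose(embedding, concepts)
edited = suppress(weights, target_ids={0})  # forget "dog", keep "beach"
edited_embedding = reconstruct(concepts, edited)

for name, before, after in zip(concept_names, weights, edited):
    print(f"{name}: {before:.3f} -> {after:.3f}")
```

In this toy setting the coefficient for the unused concept stays at zero (sparsity), all coefficients are non-negative, and removing the target concept leaves the contextual concept's contribution to the reconstructed embedding untouched.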

Problem

Research questions and friction points this paper is trying to address.

machine unlearning

vision-language models

concept-level forgetting

semantic preservation

knowledge removal

Innovation

Methods, ideas, or system contributions that make the work stand out.

concept-level unlearning

interpretable concept decomposition

vision-language models