ICED: Concept-level Machine Unlearning via Interpretable Concept Decomposition

📅 2026-05-13
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing unlearning methods for vision-language models (VLMs) typically operate at the image or instance level, which makes it difficult to remove specific knowledge precisely without impairing unrelated semantics, especially when a single image contains multiple entangled concepts. This work proposes an interpretable concept-level unlearning framework: a multimodal large language model constructs a compact, task-specific concept vocabulary from the forgetting set, and visual representations are decomposed into sparse, non-negative combinations of those semantic concepts. The decomposition provides an explicit interface for fine-grained knowledge manipulation, so target concepts can be selectively suppressed. Experiments in both in-domain and out-of-domain forgetting settings indicate more comprehensive removal of target concepts, better preservation of non-target semantics within the same image and of global cross-modal knowledge, and model utility competitive with existing VLM unlearning methods.
📝 Abstract
Machine unlearning in Vision-Language Models (VLMs) is typically performed at the image or instance level, making it difficult to precisely remove target knowledge without affecting unrelated semantics. This issue is especially pronounced since a single image often contains multiple entangled concepts, including both target concepts to be forgotten and contextual information that should be preserved. In this paper, we propose an interpretable concept-level unlearning framework for VLMs, which constructs a compact task-specific concept vocabulary from the forgetting set using a multimodal large language model. In addition to modality alignment, visual representations are decomposed into sparse, nonnegative combinations of semantic concepts, providing an explicit interface for fine-grained knowledge manipulation. Based on this decomposition, our method formulates unlearning as concept-level optimization, where target concepts are selectively suppressed while intra-instance non-target semantics and global cross-modal knowledge are preserved. Extensive experiments across both in-domain and out-of-domain forgetting settings demonstrate that our method enables more comprehensive target forgetting, better preserves non-target knowledge within the same image, and maintains competitive model utility compared with existing VLM unlearning methods.
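To make the idea of a "sparse, nonnegative combination of semantic concepts" concrete, the following is a minimal sketch and not the authors' implementation. It assumes the concept vocabulary is available as a matrix of unit-norm embeddings and solves a non-negative lasso with projected proximal gradient steps. The function names `decompose` and `suppress`, the penalty `l1`, and the three-concept toy vocabulary are all hypothetical, and the abstract describes unlearning as an optimization over the model, whereas this sketch only edits a single embedding for illustration.

```python
import numpy as np

def decompose(v, C, l1=0.05, steps=500):
    """Approximate v by C.T @ w with w >= 0 and sparse.

    Minimizes 0.5 * ||v - C.T @ w||^2 + l1 * sum(w) subject to w >= 0
    using proximal gradient descent (an assumed solver, not the paper's).
    C has shape (num_concepts, dim); v has shape (dim,).
    """
    G = C @ C.T                                # Gram matrix of the concepts
    b = C @ v                                  # similarity of v to each concept
    step = 1.0 / np.linalg.eigvalsh(G).max()   # 1 / Lipschitz constant
    w = np.zeros(C.shape[0])
    for _ in range(steps):
        grad = G @ w - b                       # gradient of the squared error
        # Gradient step, L1 shrinkage, then projection onto w >= 0.
        w = np.maximum(w - step * (grad + l1), 0.0)
    return w

def suppress(v, C, w, target_idx):
    """Subtract the target concepts' contribution from v (illustrative edit).

    Non-target coefficients and the unexplained residual are left untouched.
    """
    w_edit = w.copy()
    w_edit[target_idx] = 0.0
    v_edit = v - C[target_idx].T @ w[target_idx]
    return v_edit, w_edit

# Toy vocabulary (made up): "dog", "puppy" (correlated with "dog"), "grass".
C = np.array([
    [1.0, 0.0, 0.0, 0.0],   # dog
    [0.6, 0.8, 0.0, 0.0],   # puppy
    [0.0, 0.0, 1.0, 0.0],   # grass
])
v = 0.8 * C[0] + 0.5 * C[2]  # an "image" mixing dog and grass

w = decompose(v, C)
v_edit, w_edit = suppress(v, C, w, target_idx=[0])  # forget "dog"
print("coefficients:", np.round(w, 3))
print("edited embedding:", np.round(v_edit, 3))
```

In this toy case the decomposition assigns weight to "dog" and "grass" and none to the correlated "puppy" concept, and suppressing "dog" leaves the "grass" component of the embedding intact, which mirrors the goal of forgetting a target concept while preserving non-target semantics in the same image.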
Problem

Research questions and friction points this paper is trying to address.

machine unlearning
vision-language models
concept-level forgetting
semantic preservation
knowledge removal
Innovation

Methods, ideas, or system contributions that make the work stand out.

concept-level unlearning
interpretable concept decomposition
vision-language models
semantic concept vocabulary
machine unlearning