Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning

📅 2026-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing training-free unlearning methods for vision-language models (VLMs) lack systematic evaluation, making it difficult to distinguish genuine concept forgetting from mere instruction following. To address this gap, this work introduces VLM-UnBench, the first benchmark dedicated to evaluating training-free visual concept unlearning. It features multi-granularity forgetting levels, three probing mechanisms, and five evaluation conditions, enabling large-scale assessment across seven datasets, eleven concept axes, and thirteen VLM configurations. Experiments reveal that standard unlearning prompts barely suppress target concepts; only under an “oracle” condition—where the target is known a priori—do they show limited efficacy, with object- and scene-related concepts proving most resistant to removal. This study is the first to clearly differentiate instruction compliance from true unlearning, exposing the practical limitations of current approaches in real-world scenarios.
📝 Abstract
VLMs trained on web-scale data retain sensitive and copyrighted visual concepts that deployment may require removing. Training-based unlearning methods share a structural flaw: fine-tuning on a narrow forget set degrades general capabilities before unlearning begins, making it impossible to attribute subsequent performance drops to the unlearning procedure itself. Training-free approaches sidestep this by suppressing concepts through prompts or system instructions, but no rigorous benchmark exists for evaluating them on visual tasks. We introduce VLM-UnBench, the first benchmark for training-free visual concept unlearning in VLMs. It covers four forgetting levels, seven source datasets, and eleven concept axes, and pairs a three-level probe taxonomy with five evaluation conditions to separate genuine forgetting from instruction compliance. Across eight evaluation settings and thirteen VLM configurations, realistic unlearning prompts leave forget accuracy near the no-instruction baseline; meaningful reductions appear only under oracle conditions that disclose the target concept to the model. Object and scene concepts are the most resistant to suppression, and stronger instruction-tuned models remain capable despite explicit forget instructions. These results expose a clear gap between prompt-level suppression and true visual concept erasure.
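The evaluation protocol described above can be sketched in a few lines. The code below is an illustrative approximation, not the benchmark's actual implementation: the prompt wordings, the `build_prompt`/`forget_accuracy` names, and the three condition labels are hypothetical stand-ins for the paper's five evaluation conditions, and `model` is any callable mapping a prompt string to an answer string.

```python
def build_prompt(question, condition, target_concept=None):
    """Compose a probe prompt under one of three illustrative conditions.

    "baseline"  - no unlearning instruction (the no-instruction control)
    "realistic" - a generic forget instruction that does not name the target
    "oracle"    - the target concept is disclosed a priori
    """
    if condition == "baseline":
        return question
    if condition == "realistic":
        return ("You must behave as if certain restricted visual concepts "
                "have been removed from your training data.\n" + question)
    if condition == "oracle":
        return (f"You have forgotten everything about '{target_concept}'. "
                f"Never identify or describe it.\n" + question)
    raise ValueError(f"unknown condition: {condition}")


def forget_accuracy(model, probes, condition, target_concept=None):
    """Fraction of probes the model still answers correctly.

    Lower is better for the unlearning goal; a value near the baseline
    means the prompt failed to suppress the concept.
    """
    correct = 0
    for question, answer in probes:
        prompt = build_prompt(question, condition, target_concept)
        if model(prompt) == answer:
            correct += 1
    return correct / len(probes)
```

The paper's headline finding maps directly onto this harness: `forget_accuracy(model, probes, "realistic")` stays close to the `"baseline"` value, and only the `"oracle"` condition yields a meaningful drop.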
Problem

Research questions and friction points this paper is trying to address.

visual concept unlearning
training-free unlearning
vision-language models
forgetting benchmark
concept suppression
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free unlearning
visual concept forgetting
VLM benchmark
instruction compliance
vision-language models