Before Forgetting, Learn to Remember: Revisiting Foundational Learning Failures in LVLM Unlearning Benchmarks

📅 2026-05-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current unlearning evaluations for large vision-language models (LVLMs) are often unreliable because they do not verify that the model actually memorized the target information in the first place. To address this, the work introduces ReMem, a benchmark that secures reliable memorization before unlearning is measured, using principled data scaling, reasoning-aware question-answer pairs, and multi-image visual contexts. It also proposes an Exposure metric that quantifies how deeply information has been erased from the model's internal probability distribution. Experiments show that ReMem diagnoses under-memorization during the initial learning stage, which makes subsequent unlearning assessments more rigorous and trustworthy.

📝 Abstract

While Large Vision-Language Models (LVLMs) offer powerful capabilities, they pose privacy risks by unintentionally memorizing sensitive personal information. Current unlearning benchmarks attempt to mitigate this using fictitious identities but overlook a critical stage 1 failure: models fail to effectively memorize target information initially, rendering subsequent unlearning evaluations unreliable. Diagnosing under-memorization and the multi-hop curse as root causes, we introduce ReMem, a Reliable Multi-hop and Multi-image Memorization Benchmark. ReMem ensures robust foundational learning through principled data scaling, reasoning-aware QA pairs, and diverse visual contexts. Additionally, we propose a novel Exposure metric to quantify the depth of information erasure from the model's internal probability distribution. Extensive experiments demonstrate that ReMem provides a rigorous and trustworthy framework for diagnosing both learning and unlearning behaviors in LVLMs.
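
The abstract does not define the Exposure metric, so the following is only an illustrative sketch of one well-known rank-based formulation from the memorization literature (the "exposure" measure of Carlini et al.), not necessarily the paper's definition. It assumes we already have the model's log-probability for the target answer and for a set of alternative candidate answers; the function name `exposure` and its arguments are hypothetical.

```python
import math


def exposure(target_log_prob, candidate_log_probs):
    """Rank-based exposure in bits: log2(N) - log2(rank).

    Assumption: this follows the generic rank-based definition, not
    necessarily the metric proposed in the paper.
    N counts the target plus all alternative candidates; rank is the
    target's position when sorted by model log-probability (1 = most likely).
    """
    # Every candidate the model prefers over the target pushes the rank down.
    rank = 1 + sum(1 for lp in candidate_log_probs if lp > target_log_prob)
    total = len(candidate_log_probs) + 1
    return math.log2(total) - math.log2(rank)


# Target is the most likely of 4 answers: maximal exposure (2 bits).
memorized = exposure(-1.0, [-2.0, -3.0, -4.0])
# Target is the least likely of 4 answers: exposure drops to 0 bits.
erased = exposure(-5.0, [-2.0, -3.0, -4.0])
```

Under this formulation, a well-memorized answer ranks near the top and has high exposure, while effective unlearning should push the answer down the ranking and drive exposure toward zero. That is one way to read "depth of erasure" at the level of the probability distribution rather than the generated text alone.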

Problem

Research questions and friction points this paper is trying to address.

unlearning benchmarks

foundational learning failures

under-memorization

Large Vision-Language Models

information memorization

Innovation

Methods, ideas, or system contributions that make the work stand out.

unlearning benchmark

foundational learning

multi-hop memorization