OFFSIDE: Benchmarking Unlearning Misinformation in Multimodal Large Language Models

📅 2025-10-26
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing machine unlearning (MU) benchmarks for multimodal large language models (MLLMs) suffer from low image diversity, potentially inaccurate annotations, and narrow evaluation scenarios, so they fail to reflect the complexity of unlearning multimodal misinformation in real-world applications. To address this, we propose OFFSIDE, a novel MLLM unlearning benchmark built on football transfer rumors. It comprises 15.68K manually curated records covering 80 players, with four test sets that assess forgetting efficacy, generalization, utility preservation, and robustness, and it supports selective unlearning, text-only (unimodal) unlearning, and corrective relearning. A systematic evaluation of multiple baselines yields five key findings, including the persistence of visual misinformation, the easy recovery of unlearned rumors, and vulnerability to prompt attacks. The results indicate that the unlearning efficacy of current methods is largely driven by catastrophic forgetting and that their robustness is insufficient, highlighting the need for more robust multimodal unlearning methods.

📝 Abstract
Advances in Multimodal Large Language Models (MLLMs) intensify concerns about data privacy, making Machine Unlearning (MU), the selective removal of learned information, a critical necessity. However, existing MU benchmarks for MLLMs are limited by a lack of image diversity, potential inaccuracies, and insufficient evaluation scenarios, and therefore fail to capture the complexity of real-world applications. To facilitate the development of MLLM unlearning and alleviate these limitations, we introduce OFFSIDE, a novel benchmark for evaluating misinformation unlearning in MLLMs based on football transfer rumors. This manually curated dataset contains 15.68K records for 80 players, providing a comprehensive framework with four test sets to assess forgetting efficacy, generalization, utility, and robustness. OFFSIDE supports advanced settings such as selective unlearning and corrective relearning and, crucially, unimodal unlearning (forgetting only text data). Our extensive evaluation of multiple baselines reveals key findings: (1) unimodal methods (erasing text-based knowledge) fail on multimodal rumors; (2) unlearning efficacy is largely driven by catastrophic forgetting; (3) all methods struggle with "visual rumors" (rumors that appear in the image); (4) unlearned rumors can be easily recovered; and (5) all methods are vulnerable to prompt attacks. These results expose significant vulnerabilities in current approaches, highlighting the need for more robust multimodal unlearning solutions. The code is available at https://github.com/zh121800/OFFSIDE.

Problem

Research questions and friction points this paper is trying to address.

Benchmarking misinformation unlearning in multimodal language models
Addressing limitations in image diversity and evaluation scenarios
Evaluating forgetting efficacy and robustness against prompt attacks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces OFFSIDE benchmark for misinformation unlearning
Manually curated dataset with 15.68K football transfer records
Supports selective unlearning and corrective relearning settings
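
To make the evaluation idea concrete, here is a minimal Python sketch of how per-split scores might be tallied for a benchmark of this kind. It is not the OFFSIDE code or the paper's metric: the `Record` fields, the split names, and the use of case-insensitive exact match are all assumptions made for illustration. Under these assumptions, a low score on a "forget" split alongside a high score on a "retain" split would suggest that rumors were removed without collapsing general utility, whereas low scores on both would be consistent with catastrophic forgetting.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class Record:
    """One hypothetical benchmark item (field names are assumptions)."""
    split: str                        # e.g. "forget" or "retain" (assumed names)
    question: str                     # text prompt shown to the model
    answer: str                       # reference answer
    image_path: Optional[str] = None  # None models a text-only item


def accuracy_by_split(
    records: Iterable[Record],
    predict: Callable[[Record], str],
) -> Dict[str, float]:
    """Return exact-match accuracy per split.

    Exact match (case-insensitive, whitespace-trimmed) is a stand-in
    metric chosen for simplicity, not the metric used in the paper.
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for record in records:
        totals[record.split] += 1
        prediction = predict(record).strip().lower()
        if prediction == record.answer.strip().lower():
            hits[record.split] += 1
    # Every split in `totals` has at least one record, so no division by zero.
    return {split: hits[split] / totals[split] for split in totals}
```

Here `predict` is any callable that maps a record to the model's answer string, so the same tally can be run before and after unlearning and the per-split scores compared.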