🤖 AI Summary
Existing machine unlearning (MU) benchmarks for multimodal large language models (MLLMs) suffer from low image diversity, inaccurate annotations, and narrow evaluation scenarios, and therefore fail to reflect the complexity of unlearning multimodal misinformation in real-world applications. To address this, we propose OFFSIDE, the first MLLM unlearning benchmark built around football transfer rumors, comprising 15.68K human-annotated records for 80 players and four test sets that assess unlearning efficacy, generalization, utility preservation, and robustness; it also supports selective unlearning, text-only unlearning, and corrective relearning. Our systematic evaluation identifies five critical challenges, including the persistence of visual misinformation and susceptibility to prompt-based recovery attacks, and shows that current methods rely heavily on catastrophic forgetting for multimodal unlearning and lack robustness. OFFSIDE establishes a new benchmark and delivers key insights for developing trustworthy MLLMs.
📝 Abstract
Advances in Multimodal Large Language Models (MLLMs) intensify concerns about data privacy, making Machine Unlearning (MU), the selective removal of learned information, a critical necessity. However, existing MU benchmarks for MLLMs suffer from a lack of image diversity, potential annotation inaccuracies, and insufficient evaluation scenarios, and thus fail to capture the complexity of real-world applications. To facilitate the development of MLLM unlearning and alleviate these limitations, we introduce OFFSIDE, a novel benchmark for evaluating misinformation unlearning in MLLMs based on football transfer rumors. This manually curated dataset contains 15.68K records for 80 players, providing a comprehensive framework with four test sets to assess forgetting efficacy, generalization, utility, and robustness. OFFSIDE supports advanced settings like selective unlearning and corrective relearning, and, crucially, unimodal unlearning (forgetting only text data). Our extensive evaluation of multiple baselines reveals key findings: (1) Unimodal methods (erasing text-based knowledge) fail on multimodal rumors; (2) Unlearning efficacy is largely driven by catastrophic forgetting; (3) All methods struggle with "visual rumors" (rumors that appear in the image); (4) Unlearned rumors can easily be recovered; and (5) All methods are vulnerable to prompt attacks. These results expose significant vulnerabilities in current approaches, highlighting the need for more robust multimodal unlearning solutions. The code is available at https://github.com/zh121800/OFFSIDE.
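For readers unfamiliar with the baseline family evaluated in benchmarks like this, the sketch below illustrates a gradient-ascent (GA) unlearning step, one of the standard MU baselines whose reliance on catastrophic forgetting finding (2) points to. It is a minimal, hypothetical example, not OFFSIDE's evaluation code: the model is assumed to follow the common Hugging Face convention of returning a loss when labels are included in the batch, and the forget-set loader and its field names are assumptions.

```python
# Minimal sketch of gradient-ascent (GA) unlearning, a common MU baseline.
# Assumptions (not from the OFFSIDE repo): `model(**batch)` returns an object
# with a `.loss` attribute, and `forget_loader` yields dict batches of tensors
# (e.g. input_ids / attention_mask / pixel_values / labels for an MLLM).
import torch

def unlearn_ga(model, forget_loader, lr=1e-5, steps=100, device="cuda"):
    """Ascend the loss on the forget set so the model 'forgets' those samples."""
    model.train().to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    it = iter(forget_loader)
    for _ in range(steps):
        try:
            batch = next(it)
        except StopIteration:  # cycle through the forget set if it is small
            it = iter(forget_loader)
            batch = next(it)
        batch = {k: v.to(device) for k, v in batch.items()}
        # Negate the loss: minimizing -loss performs gradient *ascent* on the
        # forget data, degrading the model's memory of those samples.
        loss = -model(**batch).loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

Because GA updates all parameters indiscriminately, it tends to damage unrelated knowledge as well, which is exactly the catastrophic-forgetting behavior that OFFSIDE's utility-preservation test set is designed to expose.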