🤖 AI Summary
Large vision-language models (LVLMs) trained on web-sourced data pose significant privacy risks, yet existing machine unlearning methods often neglect post-unlearning response quality, leading to degradation, hallucination, or excessive refusal. Method: We propose PUBG, the first generative unlearning framework that explicitly optimizes for post-unlearning response quality. PUBG jointly models vision-language constraints via contrastive learning and output-distribution alignment, and introduces a controllable response-guidance mechanism that enforces privacy safety, informativeness, and visual grounding in unlearned outputs. Contribution/Results: Experiments demonstrate that PUBG achieves zero privacy leakage while improving response accuracy on unlearned samples by 32% over baselines; it also attains state-of-the-art performance in visual relevance and information richness.
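Since the summary only names the ingredients, a minimal sketch may help illustrate what output-distribution alignment with a response-guidance signal could look like. This is an assumption-laden illustration, not PUBG's actual formulation: the function name `guided_unlearning_loss`, the single trade-off weight `alpha`, and the choice of a KL term are all hypothetical.

```python
# Minimal sketch of a guidance-based unlearning objective in the spirit described
# above: instead of naively suppressing forget targets, align the model's output
# distribution on forget samples with a "guide" distribution derived from a
# privacy-safe, visually grounded substitute response, while a standard loss on
# retain samples preserves general utility. All names and the weighting scheme
# are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn.functional as F


def guided_unlearning_loss(
    forget_logits: torch.Tensor,   # (B, T, V) model logits on forget samples
    guide_logits: torch.Tensor,    # (B, T, V) logits of the substitute (guide) response
    retain_logits: torch.Tensor,   # (B, T, V) model logits on retain samples
    retain_labels: torch.Tensor,   # (B, T) ground-truth token ids for retain samples
    alpha: float = 0.5,            # trade-off between alignment and retention (assumed)
) -> torch.Tensor:
    # Push the post-unlearning output distribution toward the guide distribution:
    # KL divergence over the vocabulary, averaged over the batch.
    forget_logp = F.log_softmax(forget_logits, dim=-1)
    guide_p = F.softmax(guide_logits, dim=-1)
    align = F.kl_div(forget_logp, guide_p, reduction="batchmean")

    # Standard language-modeling loss on retain data keeps general ability intact.
    retain = F.cross_entropy(
        retain_logits.reshape(-1, retain_logits.size(-1)),
        retain_labels.reshape(-1),
    )
    return alpha * align + (1 - alpha) * retain
```

In this sketch the guide distribution stands in for the controllable response guidance: rather than erasing the forget target outright, the model is steered toward a substitute response that remains informative and grounded in the image while avoiding the private content.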
📄 Abstract
Machine unlearning is used to mitigate the privacy risks of Large Vision-Language Models (LVLMs) arising from training on large-scale web data. However, existing unlearning methods often fail to carefully select substitute outputs for forget targets, resulting in Unlearning Aftermaths: undesirable behaviors such as degenerate, hallucinated, or excessively refused responses. We highlight that, especially for generative LVLMs, it is crucial to consider the quality and informativeness of post-unlearning responses rather than relying solely on naive suppression. To address this, we introduce a new unlearning task for LVLMs that requires models to provide privacy-preserving yet informative and visually grounded responses. We also propose PUBG, a novel unlearning method that explicitly guides post-unlearning behavior toward a desirable output distribution. Experiments show that, while existing methods suffer from Unlearning Aftermaths despite successfully preventing privacy violations, PUBG effectively mitigates these issues, generating visually grounded and informative responses without privacy leakage for forgotten targets.