🤖 AI Summary
This study investigates the redistribution of bias during visual machine unlearning: specifically, whether forgetting a particular demographic group, such as young women, neutralizes the associated bias or shifts it onto correlated groups and degrades fairness. Using the CelebA dataset, the authors evaluate three unlearning methods (Prompt Erasure, Prompt Reweighting, and Refusal Vector) under zero-shot classification with CLIP models (ViT-B/32, ViT-L/14, ViT-B/16) across age–gender intersectional groups. The work reveals, for the first time, a gender-dominant structure in CLIP's embedding space: unlearning young women consistently shifts performance toward older women, indicating that bias moves along the gender dimension rather than the age dimension. The findings show that current unlearning approaches do not eliminate bias but redistribute it within gender groups. Refusal Vector reduces this redistribution, but it achieves only incomplete forgetting and significantly degrades performance on retained groups.
📝 Abstract
Machine unlearning, motivated by privacy regulations such as GDPR and CCPA, enables models to selectively forget training data. However, its fairness implications remain underexplored: when a model forgets a demographic group, does it neutralize that concept or redistribute it to correlated groups, potentially amplifying bias? We investigate this bias redistribution phenomenon on CelebA using CLIP models (ViT-B/32, ViT-L/14, ViT-B/16) under a zero-shot classification setting across intersectional groups defined by age and gender. We evaluate three unlearning methods (Prompt Erasure, Prompt Reweighting, and Refusal Vector) using per-group accuracy shifts, demographic parity gaps, and a redistribution score. Our results show that unlearning does not eliminate bias but redistributes it primarily along gender rather than age boundaries. In particular, removing the dominant Young Female group consistently transfers performance to Old Female across all model scales, revealing a gender-dominant structure in CLIP's embedding space. While the Refusal Vector method reduces redistribution, it fails to achieve complete forgetting and significantly degrades retained performance. These findings highlight a fundamental limitation of current unlearning methods: without accounting for embedding geometry, they risk amplifying bias in retained groups.
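To make two of the evaluation metrics named above concrete, the following is a minimal sketch of per-group accuracy shift and a demographic parity gap. It uses the standard textbook definitions, which may differ from the exact formulas in the paper. The function names, the group labels, and the input layout (parallel lists of labels, predictions, and group ids) are assumptions made for illustration. The redistribution score is left out because the abstract does not define it.

```python
from collections import defaultdict


def per_group_accuracy(labels, preds, groups):
    """Return {group: accuracy} for parallel lists of labels, predictions, group ids."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        total[g] += 1
        correct[g] += int(y == p)
    return {g: correct[g] / total[g] for g in total}


def accuracy_shift(acc_before, acc_after):
    """Per-group change in accuracy (after unlearning minus before)."""
    return {g: acc_after[g] - acc_before[g] for g in acc_before}


def demographic_parity_gap(preds, groups, positive=1):
    """Largest difference in positive-prediction rate between any two groups."""
    pos = defaultdict(int)
    total = defaultdict(int)
    for p, g in zip(preds, groups):
        total[g] += 1
        pos[g] += int(p == positive)
    rates = [pos[g] / total[g] for g in total]
    return max(rates) - min(rates)


if __name__ == "__main__":
    # Toy, hand-made data (hypothetical; not taken from CelebA or the paper).
    groups = ["young_female", "young_female", "old_female", "old_female"]
    labels = [1, 1, 0, 0]
    preds_before = [1, 0, 0, 0]
    preds_after = [0, 0, 0, 1]

    before = per_group_accuracy(labels, preds_before, groups)
    after = per_group_accuracy(labels, preds_after, groups)
    print("accuracy shift:", accuracy_shift(before, after))
    print("parity gap before:", demographic_parity_gap(preds_before, groups))
    print("parity gap after:", demographic_parity_gap(preds_after, groups))
```

Under these definitions, a positive shift for a retained group alongside a negative shift for the forgotten group is the kind of pattern the abstract describes as performance transferring from Young Female to Old Female.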