🤖 AI Summary
Models trained on real-world data often implicitly encode and amplify societal biases, yet existing debiasing methods rely on predefined bias types and labeled protected attributes—limiting their generalizability and practical applicability. Method: We propose a novel multi-layer adversarial debiasing framework that requires neither prior bias knowledge nor demographic annotations. It dynamically deploys unsupervised auxiliary bias detectors on feature maps across multiple layers of the main model and employs multi-scale adversarial training for fine-grained, adaptive identification and suppression of implicit biases. Contribution/Results: Our approach introduces the first “bias-agnostic + annotation-free” synergistic mechanism, enabling end-to-end bias representation learning. Experiments on sentiment and occupation classification tasks demonstrate significant reductions in gender and racial bias—achieving performance on par with or superior to state-of-the-art supervised debiasing methods, while operating entirely without protected attribute labels.
📝 Abstract
Models trained on real-world data often mirror and exacerbate existing social biases. Traditional methods for mitigating these biases typically require prior knowledge of the specific biases to be addressed and of the social groups associated with each instance. In this paper, we introduce a novel adversarial training strategy that operates without relying on prior bias-type knowledge (e.g., gender or racial bias) or protected attribute labels. Our approach dynamically identifies biases during model training by utilizing auxiliary bias detectors. These detected biases are simultaneously mitigated through adversarial training. Crucially, we attach these bias detectors to the feature maps at multiple levels of the main model, enabling the detection of a broader and more nuanced range of bias features. Through experiments on racial and gender biases in sentiment and occupation classification tasks, our method effectively reduces social biases without the need for demographic annotations. Moreover, our approach not only matches but often surpasses the efficacy of methods that require detailed demographic insights, marking a significant advancement in bias mitigation techniques.
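The core mechanism the abstract describes (an auxiliary detector attached to an intermediate feature map, trained adversarially against the encoder) can be sketched as follows. This is a minimal toy illustration with manual gradients on a single linear "layer", not the paper's method: the variable names (`W`, `w_y`, `w_z`, `lam`) and the supervised protected signal `z` are assumptions for demonstration only; the actual framework uses unsupervised detectors at multiple layers.

```python
import numpy as np

# Toy adversarial-debiasing sketch with a gradient-reversal update.
# x[:, 0] drives the task label y; x[:, 1] leaks a "protected" signal z.
rng = np.random.default_rng(0)
n, d, k = 512, 4, 8
x = rng.normal(size=(n, d))
y = (x[:, 0] > 0).astype(float)        # main-task label
z = (x[:, 1] > 0).astype(float)        # leaked protected signal (toy)

W = rng.normal(scale=0.1, size=(d, k))   # shared encoder layer
w_y = rng.normal(scale=0.1, size=k)      # main task head
w_z = rng.normal(scale=0.1, size=k)      # auxiliary bias detector head

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

lr, lam = 1.0, 1.0  # lam scales the reversed detector gradient
task_losses = []
for _ in range(500):
    h = x @ W                            # feature map the detector sees
    p_y, p_z = sigmoid(h @ w_y), sigmoid(h @ w_z)
    task_losses.append(
        -np.mean(y * np.log(p_y + 1e-9) + (1 - y) * np.log(1 - p_y + 1e-9))
    )
    gy = (p_y - y)[:, None] / n          # dL_task / d(task logit)
    gz = (p_z - z)[:, None] / n          # dL_det  / d(detector logit)
    # Heads descend their own losses; the encoder descends the task loss
    # but *ascends* the detector loss (gradient reversal), erasing bias info.
    dW = x.T @ (gy * w_y[None, :] - lam * gz * w_z[None, :])
    w_y -= lr * (h.T @ gy).ravel()
    w_z -= lr * (h.T @ gz).ravel()
    W -= lr * dW
```

The sign flip in `dW` is the whole trick: the detector keeps trying to predict the protected signal from `h`, while the encoder is pushed to make that prediction impossible without hurting the main task.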