🤖 AI Summary
Models trained on real-world data often implicitly encode and amplify societal biases, yet existing debiasing methods rely on predefined bias types and labeled protected attributes—limiting their generalizability and practical applicability. Method: We propose a novel multi-layer adversarial debiasing framework that requires neither prior bias knowledge nor demographic annotations. It dynamically deploys unsupervised auxiliary bias detectors on feature maps across multiple layers of the main model and employs multi-scale adversarial training for fine-grained, adaptive identification and suppression of implicit biases. Contribution/Results: Our approach introduces the first “bias-agnostic + annotation-free” synergistic mechanism, enabling end-to-end bias representation learning. Experiments on sentiment and occupation classification tasks demonstrate significant reductions in gender and racial bias—achieving performance on par with or superior to state-of-the-art supervised debiasing methods, while operating entirely without protected attribute labels.
📝 Abstract
Models trained on real-world data often mirror and exacerbate existing social biases. Traditional methods for mitigating these biases typically require prior knowledge of the specific biases to be addressed and of the social groups associated with each instance. In this paper, we introduce a novel adversarial training strategy that operates without relying on prior bias-type knowledge (e.g., gender or racial bias) or protected attribute labels. Our approach dynamically identifies biases during model training by utilizing auxiliary bias detectors. These detected biases are simultaneously mitigated through adversarial training. Crucially, we attach these bias detectors to the feature maps at multiple levels of the main model, enabling the detection of a broader and more nuanced range of bias features. Through experiments on racial and gender biases in sentiment and occupation classification tasks, our method effectively reduces social biases without the need for demographic annotations. Moreover, our approach not only matches but often surpasses the efficacy of methods that require detailed demographic insights, marking a significant advancement in bias mitigation techniques.
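The core mechanism the abstract describes (an auxiliary detector attached to an intermediate feature map, trained adversarially against the encoder) can be sketched as follows. This is a minimal toy illustration with manual gradients on a single linear "layer", not the paper's method: the variable names (`W`, `w_y`, `w_z`, `lam`) and the supervised protected signal `z` are assumptions for demonstration only; the actual framework uses unsupervised detectors at multiple layers.

```python
import numpy as np

# Toy adversarial-debiasing sketch with a gradient-reversal update.
# x[:, 0] drives the task label y; x[:, 1] leaks a "protected" signal z.
rng = np.random.default_rng(0)
n, d, k = 512, 4, 8
x = rng.normal(size=(n, d))
y = (x[:, 0] > 0).astype(float)        # main-task label
z = (x[:, 1] > 0).astype(float)        # leaked protected signal (toy)

W = rng.normal(scale=0.1, size=(d, k))   # shared encoder layer
w_y = rng.normal(scale=0.1, size=k)      # main task head
w_z = rng.normal(scale=0.1, size=k)      # auxiliary bias detector head

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

lr, lam = 1.0, 1.0  # lam scales the reversed detector gradient
task_losses = []
for _ in range(500):
    h = x @ W                            # feature map the detector sees
    p_y, p_z = sigmoid(h @ w_y), sigmoid(h @ w_z)
    task_losses.append(
        -np.mean(y * np.log(p_y + 1e-9) + (1 - y) * np.log(1 - p_y + 1e-9))
    )
    gy = (p_y - y)[:, None] / n          # dL_task / d(task logit)
    gz = (p_z - z)[:, None] / n          # dL_det  / d(detector logit)
    # Heads descend their own losses; the encoder descends the task loss
    # but *ascends* the detector loss (gradient reversal), erasing bias info.
    dW = x.T @ (gy * w_y[None, :] - lam * gz * w_z[None, :])
    w_y -= lr * (h.T @ gy).ravel()
    w_z -= lr * (h.T @ gz).ravel()
    W -= lr * dW
```

The sign flip in `dW` is the whole trick: the detector keeps trying to predict the protected signal from `h`, while the encoder is pushed to make that prediction impossible without hurting the main task.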