🤖 AI Summary
In text classification, language models often exhibit bias by over-relying on linguistic cues correlated with protected attributes (e.g., gender, race). To address this, we propose a novel unsupervised bias mitigation paradigm: zero-shot general-purpose text simplification using ChatGPT. Without requiring sensitive attribute annotations or architectural modifications to downstream models, our method standardizes linguistic expressions across subpopulations while preserving semantic fidelity, thereby attenuating statistical associations between protected attributes and textual features. Experiments demonstrate that simplification reduces the predictability of protected attributes by up to 17%, significantly weakening discriminatory correlations. To our knowledge, this is the first work to systematically employ large language model–driven general text simplification for bias mitigation in NLP. Our approach offers a lightweight, annotation-free, and fine-tuning–free pathway toward fairer language processing.
📝 Abstract
Language models can pick up linguistic signals particular to a certain sub-group of people during training. If a model begins to associate specific language with a distinct group, any decision made on the basis of that language will correlate strongly with a decision based on the group's protected characteristic, leading to possible discrimination. We explore text simplification as a potential bias mitigation technique. The intuition is that simplifying text should standardise the language of different sub-groups into one way of speaking while preserving meaning. Our experiment shows promising results: classifier accuracy for predicting the sensitive attribute drops by up to 17% on the simplified data.
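The evaluation idea described above — measure how well a classifier can recover the protected attribute from the text before and after simplification — can be illustrated with a toy sketch. Everything below is hypothetical: the corpus is invented, the "simplified" texts are hand-written stand-ins for what an LLM such as ChatGPT might produce, and the word-vote classifier is a crude proxy for the paper's actual attribute classifier, used only to show how standardised phrasing removes the statistical signal.

```python
from collections import Counter, defaultdict

def attribute_accuracy(dataset):
    """Crude proxy for an attribute classifier: each word votes for the
    sub-group it co-occurred with in the data; a text's attribute is
    predicted by summing its words' votes. A tied vote means the
    attribute is not recoverable and counts as a miss."""
    word_votes = defaultdict(Counter)
    for text, group in dataset:
        for w in set(text.lower().split()):
            word_votes[w][group] += 1
    correct = 0
    for text, group in dataset:
        tally = Counter()
        for w in set(text.lower().split()):
            tally.update(word_votes[w])
        top = tally.most_common(2)
        tied = len(top) == 2 and top[0][1] == top[1][1]
        if top and not tied and top[0][0] == group:
            correct += 1
    return correct / len(dataset)

# Hypothetical toy corpus: two sub-groups ("A" and "B") express the
# same meanings with systematically different phrasing.
original = [
    ("gonna grab a bite real quick", "A"),
    ("gonna watch the game tonight", "A"),
    ("I shall have a quick meal", "B"),
    ("I shall watch the match this evening", "B"),
]
# Hand-written stand-ins for simplified output: one standardised
# phrasing shared by both groups, same meaning as the originals.
simplified = [
    ("I will eat a quick meal", "A"),
    ("I will watch the game tonight", "A"),
    ("I will eat a quick meal", "B"),
    ("I will watch the game tonight", "B"),
]

print(attribute_accuracy(original))    # 1.0 — attribute fully recoverable
print(attribute_accuracy(simplified))  # 0.0 — standardised text leaks nothing
```

In this toy setup the drop is total rather than the paper's reported "up to 17%", because the corpus is tiny and the simplification perfectly merges the two groups' phrasings; the point is only the direction of the effect, not its magnitude.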