🤖 AI Summary
Large language models (LLMs) often amplify societal biases, and existing debiasing methods independently model biased and unbiased samples, overlooking their implicit negative coupling—leading to inter-group performance trade-offs and residual bias. To address this, we propose a triplet-based contrastive learning framework for disentangled debiasing. Our method introduces a disentanglement loss that explicitly decouples representational associations between biased and unbiased samples, and jointly optimizes contrastive learning with the language modeling objective to prevent interference from positive/negative sample misalignment. This end-to-end approach significantly reduces discriminatory outputs across multiple bias evaluation benchmarks—achieving an average 23.6% improvement in fairness metrics—while preserving or even enhancing downstream task performance, outperforming state-of-the-art debiasing methods. The core contribution lies in explicitly modeling and breaking the adverse coupling between biased and unbiased samples, thereby enabling synergistic optimization of fairness and linguistic capability.
📝 Abstract
The increasing utilization of large language models raises significant concerns about the propagation of social biases, which may result in harmful and unfair outcomes. However, existing debiasing methods treat biased and unbiased samples independently, ignoring their mutual relationship. This oversight enables a hidden negative-positive coupling, where improvements for one group inadvertently compromise the other, allowing residual social bias to persist. In this paper, we introduce TriCon-Fair, a contrastive learning framework whose decoupled loss combines a triplet term with a language modeling (LM) term. TriCon-Fair assigns each anchor an explicitly biased negative and an unbiased positive, separating the push-pull dynamics that cause positive-negative coupling, and jointly optimizes the LM objective to preserve general capability. Experimental results demonstrate that TriCon-Fair reduces discriminatory output more than existing debiasing baselines while maintaining strong downstream performance. This suggests that TriCon-Fair offers a practical and ethical solution for sensitive NLP applications.
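The joint objective described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual implementation: the margin, the weighting coefficient `lam`, and the use of a Euclidean triplet distance are all assumptions for the sake of the example.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    # Pull the anchor toward the unbiased positive and push it away
    # from the explicitly biased negative (margin is an assumed hyperparameter).
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

def lm_cross_entropy(logits, target_ids):
    # Token-level cross-entropy for the language modeling term.
    # logits: (seq_len, vocab_size); target_ids: (seq_len,)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(target_ids)), target_ids])

def joint_loss(anchor, positive, negative, logits, target_ids,
               lam=0.5, margin=1.0):
    # Jointly optimize the fairness (triplet) term and the LM term;
    # lam is a hypothetical trade-off weight, not a value from the paper.
    return (lam * triplet_margin_loss(anchor, positive, negative, margin)
            + (1.0 - lam) * lm_cross_entropy(logits, target_ids))
```

With well-separated embeddings (positive close to the anchor, negative far away) the triplet term vanishes, so only the LM term drives the gradient; when the biased negative drifts near the anchor, the triplet term reactivates and pushes it back out.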