🤖 AI Summary
Large language models (LLMs) often amplify societal biases, and existing debiasing methods independently model biased and unbiased samples, overlooking their implicit negative coupling—leading to inter-group performance trade-offs and residual bias. To address this, we propose a triplet-based contrastive learning framework for disentangled debiasing. Our method introduces a disentanglement loss that explicitly decouples representational associations between biased and unbiased samples, and jointly optimizes contrastive learning with the language modeling objective to prevent interference from positive/negative sample misalignment. This end-to-end approach significantly reduces discriminatory outputs across multiple bias evaluation benchmarks—achieving an average 23.6% improvement in fairness metrics—while preserving or even enhancing downstream task performance, outperforming state-of-the-art debiasing methods. The core contribution lies in explicitly modeling and breaking the adverse coupling between biased and unbiased samples, thereby enabling synergistic optimization of fairness and linguistic capability.
📝 Abstract
The increasing utilization of large language models raises significant concerns about the propagation of social biases, which may result in harmful and unfair outcomes. However, existing debiasing methods treat biased and unbiased samples independently, ignoring their mutual relationship. This oversight enables a hidden negative-positive coupling, where improvements for one group inadvertently compromise the other, allowing residual social bias to persist. In this paper, we introduce TriCon-Fair, a contrastive learning framework whose decoupled loss combines a triplet term with a language modeling (LM) term. TriCon-Fair assigns each anchor an explicitly biased negative and an unbiased positive, separating the push-pull dynamics that cause positive-negative coupling, and jointly optimizes the LM objective to preserve general capability. Experimental results demonstrate that TriCon-Fair reduces discriminatory output more than existing debiasing baselines while maintaining strong downstream performance. This suggests that TriCon-Fair offers a practical and ethical solution for sensitive NLP applications.
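The joint objective described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual implementation: the margin, the weighting coefficient `lam`, and the use of a Euclidean triplet distance are all assumptions for the sake of the example.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    # Pull the anchor toward the unbiased positive and push it away
    # from the explicitly biased negative (margin is an assumed hyperparameter).
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

def lm_cross_entropy(logits, target_ids):
    # Token-level cross-entropy for the language modeling term.
    # logits: (seq_len, vocab_size); target_ids: (seq_len,)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(target_ids)), target_ids])

def joint_loss(anchor, positive, negative, logits, target_ids,
               lam=0.5, margin=1.0):
    # Jointly optimize the fairness (triplet) term and the LM term;
    # lam is a hypothetical trade-off weight, not a value from the paper.
    return (lam * triplet_margin_loss(anchor, positive, negative, margin)
            + (1.0 - lam) * lm_cross_entropy(logits, target_ids))
```

With well-separated embeddings (positive close to the anchor, negative far away) the triplet term vanishes, so only the LM term drives the gradient; when the biased negative drifts near the anchor, the triplet term reactivates and pushes it back out.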