🤖 AI Summary
Traditional software fairness research predominantly addresses ethical concerns, overlooking fairness as a core software quality attribute: performance disparities across sensitive groups. Existing bias-mitigation techniques struggle to balance generality and effectiveness. Method: This paper formally integrates fairness into the software quality framework and proposes CoT, a multi-objective correlation-tuning framework based on the Phi coefficient. CoT performs pre-processing bias correction by explicitly modeling and regulating the statistical correlation between sensitive attributes and prediction labels. Contribution/Results: CoT effectively mitigates proxy bias, improving the true positive rate for unprivileged groups by 17.5% on average, and reduces three key fairness metrics, statistical parity difference (SPD), average odds difference (AOD), and equal opportunity difference (EOD), by over 50% on average. In single- and multi-sensitive-attribute settings, CoT outperforms state-of-the-art methods by 3 and 10 percentage points, respectively.
📝 Abstract
Traditional software fairness research typically emphasizes ethical and social imperatives, neglecting that fairness fundamentally represents a core software quality issue arising directly from performance disparities across sensitive user groups. Recognizing fairness explicitly as a software quality dimension yields practical benefits beyond ethical considerations, notably improved predictive performance for unprivileged groups, enhanced out-of-distribution generalization, and increased geographic transferability in real-world deployments. Nevertheless, existing bias mitigation methods face a critical dilemma: while pre-processing methods offer broad applicability across model types, they generally fall short in effectiveness compared to post-processing techniques. To overcome this challenge, we propose Correlation Tuning (CoT), a novel pre-processing approach designed to mitigate bias by adjusting data correlations. Specifically, CoT introduces the Phi coefficient, an intuitive correlation measure, to systematically quantify the correlation between sensitive attributes and labels, and employs multi-objective optimization to address proxy bias. Extensive evaluations demonstrate that CoT increases the true positive rate of unprivileged groups by an average of 17.5% and reduces three key bias metrics, statistical parity difference (SPD), average odds difference (AOD), and equal opportunity difference (EOD), by more than 50% on average. CoT outperforms state-of-the-art methods by three and ten percentage points in single-attribute and multi-attribute scenarios, respectively. We will publicly release our experimental results and source code to facilitate future research.
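For intuition, the Phi coefficient the abstract refers to is the standard association measure for two binary variables, computed from their 2x2 contingency table. The sketch below (an illustration, not the paper's CoT implementation) shows how one might quantify the correlation between a binary sensitive attribute and a binary label:

```python
def phi_coefficient(a, y):
    """Phi coefficient between two binary (0/1) sequences.

    Ranges from -1 (perfect negative association) to +1 (perfect
    positive association); 0 means no linear association.
    """
    # Build the 2x2 contingency table of (attribute, label) counts.
    n11 = sum(1 for ai, yi in zip(a, y) if ai == 1 and yi == 1)
    n10 = sum(1 for ai, yi in zip(a, y) if ai == 1 and yi == 0)
    n01 = sum(1 for ai, yi in zip(a, y) if ai == 0 and yi == 1)
    n00 = sum(1 for ai, yi in zip(a, y) if ai == 0 and yi == 0)
    # Denominator: product of the four marginal totals.
    denom = ((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)) ** 0.5
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# A dataset where the sensitive attribute perfectly predicts the label
# yields phi = 1.0; an unrelated attribute yields phi = 0.0.
print(phi_coefficient([1, 1, 0, 0], [1, 1, 0, 0]))  # → 1.0
print(phi_coefficient([1, 0, 1, 0], [1, 1, 0, 0]))  # → 0.0
```

A pre-processing method in this spirit would adjust the training data until such correlations (including those carried by proxy features) drop below a target threshold, which is the role the abstract assigns to CoT's multi-objective optimization.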