🤖 AI Summary
Prior work on fairness testing for regression models remains limited, lacking systematic frameworks grounded in rigorous statistical fairness criteria.
Method: This paper proposes the first systematic fairness testing framework for regression models based on an expectation-based fairness criterion. It introduces the Wasserstein projection distance as a fairness metric; its dual reformulation yields an analytically tractable test statistic whose asymptotic distribution and bounds are rigorously derived, achieving substantially higher specificity than permutation tests. The framework integrates optimal transport theory, statistical hypothesis testing, and optimal data perturbation to jointly detect and mitigate unfairness while preserving predictive accuracy.
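To make the expectation-based criterion concrete, here is a minimal sketch (not the paper's actual Wasserstein projection statistic) of the quantity such a criterion constrains: the gap between group-conditional mean predictions of a regression model. The function name and toy data are illustrative assumptions.

```python
import numpy as np

def expectation_fairness_gap(preds, group):
    """Gap between group-conditional mean predictions.

    Illustrative simplification of an expectation-based fairness
    criterion for regression: the model is considered fair when its
    predictions have equal expectation across the groups defined by
    a protected attribute, i.e. this gap is (near) zero.
    """
    preds = np.asarray(preds, dtype=float)
    group = np.asarray(group)
    means = [preds[group == g].mean() for g in np.unique(group)]
    return max(means) - min(means)

# Toy regression predictions with a deliberate 0.5 shift between groups.
rng = np.random.default_rng(0)
preds = np.concatenate([rng.normal(10.0, 1.0, 500),   # group 0
                        rng.normal(10.5, 1.0, 500)])  # group 1
group = np.array([0] * 500 + [1] * 500)
print(expectation_fairness_gap(preds, group))  # close to the true shift of 0.5
```

The paper's contribution is to turn a criterion of this kind into a rigorous hypothesis test: rather than eyeballing the gap, it measures the Wasserstein projection distance from the observed data distribution to the set of distributions satisfying the criterion, and derives the statistic's asymptotic behavior.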
Results: Evaluated on synthetic data and real-world tasks, including student performance prediction and house price forecasting, the method effectively reduces bias while maintaining model performance, demonstrating both statistical soundness and practical efficacy.
📝 Abstract
Fairness in machine learning is a critical concern, yet most research has focused on classification tasks, leaving regression models underexplored. This paper introduces a Wasserstein projection-based framework for fairness testing in regression models, focusing on expectation-based criteria. We propose a hypothesis-testing approach and an optimal data perturbation method to improve fairness while balancing accuracy. Theoretical results include a detailed categorization of fairness criteria for regression, a dual reformulation of the Wasserstein projection test statistic, and the derivation of asymptotic bounds and limiting distributions. Experiments on synthetic and real-world datasets demonstrate that the proposed method offers higher specificity compared to permutation-based tests, and effectively detects and mitigates biases in real applications such as student performance and housing price prediction.
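For context on the baseline the abstract compares against, here is a sketch of a standard permutation test on the group mean-prediction gap. This is a generic baseline construction under assumed toy data, not the paper's exact experimental procedure; its lower specificity relative to the proposed asymptotic test is the comparison the abstract reports.

```python
import numpy as np

def permutation_pvalue(preds, group, n_perm=2000, seed=1):
    """Permutation test on the gap in group-conditional mean predictions.

    Generic baseline (sketch): shuffle the group labels to build a null
    distribution of gaps, then report the fraction of permuted gaps at
    least as large as the observed one (with the usual +1 correction).
    """
    rng = np.random.default_rng(seed)
    preds = np.asarray(preds, dtype=float)
    group = np.asarray(group)
    observed = abs(preds[group == 0].mean() - preds[group == 1].mean())
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(group)
        gap = abs(preds[perm == 0].mean() - preds[perm == 1].mean())
        if gap >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Toy predictions with a deliberate 0.5 shift between two groups:
# the test should reject the no-bias null with a small p-value.
rng = np.random.default_rng(0)
preds = np.concatenate([rng.normal(10.0, 1.0, 500),
                        rng.normal(10.5, 1.0, 500)])
group = np.array([0] * 500 + [1] * 500)
print(permutation_pvalue(preds, group))
```

Permutation tests like this are distribution-free but can flag spurious violations; the paper's dual reformulation of the Wasserstein projection statistic, with derived asymptotic bounds, is what yields the higher specificity claimed above.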