Whence Is A Model Fair? Fixing Fairness Bugs via Propensity Score Matching

📅 2025-04-23
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper identifies a critical issue: when training and test data are drawn from the same source via random sampling, biases inherent in the data systematically distort fairness evaluation, leading to erroneous assessments for protected subgroups (e.g., gender, race). To address this, the authors propose FairMatch, the first fairness diagnostic method to integrate propensity score matching (PSM). FairMatch constructs comparable inter-subgroup sample pairs on the test set, dynamically optimizes subgroup-specific decision thresholds, and applies fairness-aware probability calibration to unmatched samples. By enabling precise bias localization and hierarchical mitigation, FairMatch preserves the model's predictive performance while significantly improving both the reliability of fairness assessment and the effectiveness of bias mitigation.

📝 Abstract
Fairness-aware learning aims to mitigate discrimination against specific protected social groups (e.g., those categorized by gender, ethnicity, age) while minimizing predictive performance loss. Despite efforts to improve fairness in machine learning, prior studies have shown that many models remain unfair when measured against various fairness metrics. In this paper, we examine whether the way training and testing data are sampled affects the reliability of reported fairness metrics. Since training and test sets are often randomly sampled from the same population, bias present in the training data may still exist in the test data, potentially skewing fairness assessments. To address this, we propose FairMatch, a post-processing method that applies propensity score matching to evaluate and mitigate bias. FairMatch identifies control and treatment pairs with similar propensity scores in the test set and adjusts decision thresholds for different subgroups accordingly. For samples that cannot be matched, we perform probabilistic calibration using fairness-aware loss functions. Experimental results demonstrate that our approach can (a) precisely locate subsets of the test data where the model is unbiased, and (b) significantly reduce bias on the remaining data. Overall, propensity score matching offers a principled way to improve both fairness evaluation and mitigation, without sacrificing predictive performance.
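The matching step described in the abstract can be illustrated with a generic propensity-score-matching sketch. The snippet below is a minimal illustration of the standard PSM technique, not the authors' implementation: it fits a logistic model predicting protected-group membership from the features, then greedily pairs each protected-group sample with the closest unmatched control within a caliper. Function and parameter names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_match(X, group, caliper=0.05):
    """Pair each protected-group sample with the closest control
    sample by propensity score, matching without replacement.

    X: (n, d) feature matrix; group: (n,) binary array, 1 marks the
    protected subgroup. Returns (treated_idx, control_idx) pairs
    whose propensity scores differ by at most `caliper`.
    """
    # Propensity score = estimated P(group = 1 | X)
    ps = LogisticRegression(max_iter=1000).fit(X, group).predict_proba(X)[:, 1]
    treated = np.where(group == 1)[0]
    controls = np.where(group == 0)[0]
    used, pairs = set(), []
    for i in treated:
        dists = np.abs(ps[controls] - ps[i])
        for k in np.argsort(dists):           # nearest control first
            j = int(controls[k])
            if j not in used and dists[k] <= caliper:
                used.add(j)                    # each control used once
                pairs.append((int(i), j))
                break
    return pairs

# Toy data: the protected group's features are shifted
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(0.5, 1.0, (50, 3))])
g = np.array([0] * 50 + [1] * 50)
pairs = propensity_match(X, g)
```

Fairness metrics computed only on the matched pairs compare like with like, which is the intuition behind using PSM to locate where a model's apparent bias is attributable to the model rather than to covariate imbalance in the test sample.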
Problem

Research questions and friction points this paper is trying to address.

Examines how data sampling affects fairness metric reliability
Proposes FairMatch to mitigate bias via propensity score matching
Improves fairness evaluation without sacrificing predictive performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses propensity score matching for bias evaluation
Adjusts decision thresholds for matched subgroups
Applies probabilistic calibration for unmatched samples
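The per-subgroup threshold adjustment in the second bullet can be sketched generically. The snippet below is a hypothetical illustration, not the paper's optimizer: it grid-searches one decision threshold per subgroup, trading off the gap in positive-prediction rates (a demographic-parity proxy) against drifting too far from the default threshold of 0.5.

```python
import numpy as np

def subgroup_thresholds(scores, group, lam=0.1):
    """Grid-search a decision threshold per subgroup, minimizing the
    positive-rate gap between groups plus a penalty (weight `lam`)
    for moving thresholds away from 0.5."""
    grid = np.linspace(0.05, 0.95, 19)
    best = (0.5, 0.5, np.inf)
    for t0 in grid:
        rate0 = np.mean(scores[group == 0] >= t0)
        for t1 in grid:
            rate1 = np.mean(scores[group == 1] >= t1)
            # fairness gap plus a penalty for extreme thresholds
            cost = abs(rate0 - rate1) + lam * (abs(t0 - 0.5) + abs(t1 - 0.5))
            if cost < best[2]:
                best = (t0, t1, cost)
    return best[0], best[1]

# Toy scores: the protected group's score distribution is shifted upward
rng = np.random.default_rng(1)
scores = np.concatenate([rng.beta(2, 3, 200), rng.beta(3, 2, 200)])
g = np.array([0] * 200 + [1] * 200)
t0, t1 = subgroup_thresholds(scores, g)
gap = abs(np.mean(scores[g == 0] >= t0) - np.mean(scores[g == 1] >= t1))
```

In FairMatch the thresholds are reportedly tuned on matched pairs rather than on the raw test set; this sketch only shows the shape of the optimization.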