🤖 AI Summary
This paper identifies sampling bias as a systematic confounder in machine learning fairness evaluation, characterizing it through two asymmetric mechanisms: Sample Size Bias (SSB) and Under-Representation Bias (URB). In doing so, it challenges the common “bias homogenization” assumption. Using Logistic Regression, XGBoost, and DNNs on benchmark datasets, including UCI, Adult, and COMPAS, we empirically isolate and formally define SSB and URB for the first time. Results demonstrate that URB severely distorts key fairness metrics, particularly Demographic Parity (DP) and Equal Opportunity (EO), leading to substantial misestimation of discrimination against minority groups. In contrast, SSB has a comparatively weaker impact. To address this, we propose an actionable, end-to-end framework covering data collection, fairness assessment, and bias mitigation. This work establishes a methodological foundation for empirical fairness research, offering both theoretical clarification and practical guidance for robust fairness evaluation.
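For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how they are commonly computed for a binary classifier and a binary sensitive attribute. It uses the standard textbook definitions (difference in positive prediction rates for DP, difference in true positive rates for EO), which may differ in sign convention or normalization from the paper's own formulation. The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
# Standard definitions, assumed here; the paper's exact formulation may differ.
# DP difference: P(Y_hat=1 | A=privileged) - P(Y_hat=1 | A=protected)
# EO difference: TPR(privileged) - TPR(protected)

def selection_rate(y_pred, group, g):
    """Share of positive predictions within group g."""
    preds = [p for p, a in zip(y_pred, group) if a == g]
    return sum(preds) / len(preds)

def true_positive_rate(y_true, y_pred, group, g):
    """Share of positive predictions among truly positive members of group g."""
    preds = [p for t, p, a in zip(y_true, y_pred, group) if a == g and t == 1]
    return sum(preds) / len(preds)

def demographic_parity_diff(y_pred, group, privileged=1, protected=0):
    return (selection_rate(y_pred, group, privileged)
            - selection_rate(y_pred, group, protected))

def equal_opportunity_diff(y_true, y_pred, group, privileged=1, protected=0):
    return (true_positive_rate(y_true, y_pred, group, privileged)
            - true_positive_rate(y_true, y_pred, group, protected))

# Hypothetical toy data: 4 individuals per group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group  = [1, 1, 1, 1, 0, 0, 0, 0]

dp = demographic_parity_diff(y_pred, group)
eo = equal_opportunity_diff(y_true, y_pred, group)
print(dp, eo)  # 0.5 0.5
```

A value of zero for either difference indicates parity under that metric. Because both quantities are estimated from a finite sample, the composition of that sample directly affects the estimate, which is the concern the summary raises.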
📝 Abstract
Accurately measuring discrimination is crucial to faithfully assessing the fairness of trained machine learning (ML) models. Any bias in measuring discrimination leads to either amplification or underestimation of the existing disparity. Several sources of bias exist, and it is assumed that bias resulting from machine learning is borne equally by different groups (e.g., females vs. males, whites vs. blacks). If, however, bias is borne differently by different groups, it may exacerbate discrimination against specific sub-populations. The term "sampling bias", in particular, is used inconsistently in the literature to describe bias due to the sampling procedure. In this paper, we attempt to disambiguate this term by introducing clearly defined variants of sampling bias, namely sample size bias (SSB) and underrepresentation bias (URB). Through an extensive set of experiments on benchmark datasets and using mainstream learning algorithms, we expose relevant observations in several model training scenarios. The observations are finally framed as actionable recommendations for practitioners.
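The distinction between the two variants can be illustrated with a small sampling sketch. The abstract does not give the formal definitions, so the code below rests on an assumed reading: the sample-size scenario draws a small uniform subsample (group proportions preserved in expectation), while the underrepresentation scenario draws a sample in which one group's share is forced below its population share. The function names, the `group` key, and the toy population are hypothetical.

```python
import random

def sample_small(records, n, seed=0):
    """Sample-size scenario (assumed reading): uniform subsample of n records."""
    rng = random.Random(seed)
    return rng.sample(records, n)

def sample_underrepresented(records, n, minority, minority_share,
                            group_key="group", seed=0):
    """Underrepresentation scenario (assumed reading): n records in which the
    minority group makes up only `minority_share` of the sample."""
    rng = random.Random(seed)
    minority_pool = [r for r in records if r[group_key] == minority]
    majority_pool = [r for r in records if r[group_key] != minority]
    n_min = round(n * minority_share)
    n_maj = n - n_min
    return rng.sample(minority_pool, n_min) + rng.sample(majority_pool, n_maj)

# Hypothetical population: 1000 records, 30% of them in group 0.
population = [{"id": i, "group": 0 if i < 300 else 1} for i in range(1000)]

small = sample_small(population, 100)
under = sample_underrepresented(population, 100, minority=0, minority_share=0.05)

share_small = sum(r["group"] == 0 for r in small) / len(small)
share_under = sum(r["group"] == 0 for r in under) / len(under)
print(share_small, share_under)
```

Training the same model on `small` and on `under`, then comparing the measured disparity against the value obtained on the full population, is one way to observe how each sampling scenario shifts a discrimination estimate.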