Reliable fairness auditing with semi-supervised inference

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Biomedical machine learning models frequently exhibit subgroup bias, and fairness auditing typically requires large labeled datasets that are costly to obtain. This paper proposes Infairness, a semi-supervised fairness auditing framework that leverages a small number of labeled samples alongside abundant unlabeled data. It imputes missing outcomes via regression with carefully chosen nonlinear basis functions and unifies the estimation of multiple fairness metrics, including statistical parity and equal opportunity, within a single modeling framework. The authors establish that the proposed estimator is consistent regardless of whether the imputation model is correctly specified and, when the imputation model is well specified, asymptotically more efficient than fully supervised alternatives. Empirical evaluation demonstrates (i) consistently higher precision in synthetic experiments and (ii) up to a 64% reduction in estimation variance on a real-world electronic health record dataset for depression phenotyping, substantially improving the reliability and precision of fairness assessment under limited labeling budgets.

📝 Abstract
Machine learning (ML) models often exhibit bias that can exacerbate inequities in biomedical applications. Fairness auditing, the process of evaluating a model's performance across subpopulations, is critical for identifying and mitigating these biases. However, such audits typically rely on large volumes of labeled data, which are costly and labor-intensive to obtain. To address this challenge, we introduce *Infairness*, a unified framework for auditing a wide range of fairness criteria using semi-supervised inference. Our approach combines a small labeled dataset with a large unlabeled dataset by imputing missing outcomes via regression with carefully selected nonlinear basis functions. We show that our proposed estimator is (i) consistent regardless of whether the ML or imputation models are correctly specified and (ii) more efficient than standard supervised estimation with the labeled data when the imputation model is correctly specified. Through extensive simulations, we also demonstrate that Infairness consistently achieves higher precision than supervised estimation. In a real-world application of phenotyping depression from electronic health records data, Infairness reduces variance by up to 64% compared to supervised estimation, underscoring its value for reliable fairness auditing with limited labeled data.
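The imputation-based idea in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' exact estimator: it fits a flexible regression (polynomial basis functions standing in for the paper's "carefully selected" basis) on a small labeled set, imputes the missing outcomes on the unlabeled set, and then evaluates a fairness metric (here the true-positive-rate gap, i.e. equal opportunity) on the full sample. All data and variable names are simulated for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Illustrative setup: X features, A binary sensitive attribute,
# Y true outcome (observed only on a small labeled subset),
# S the audited model's binary prediction.
n_lab, n_unlab = 200, 5000
n = n_lab + n_unlab
X = rng.normal(size=(n, 3))
A = rng.integers(0, 2, size=n)
logit = X[:, 0] + 0.5 * A
Y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
S = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

labeled = np.zeros(n, dtype=bool)
labeled[:n_lab] = True  # only the first n_lab outcomes are observed

# Step 1: impute Y with a nonlinear-basis regression fit on labeled data.
basis = PolynomialFeatures(degree=2, include_bias=False)
Z = basis.fit_transform(np.column_stack([X, A, S]))
imp = LogisticRegression(max_iter=1000).fit(Z[labeled], Y[labeled])
Y_hat = imp.predict_proba(Z)[:, 1]  # imputed outcome probabilities

# Step 2: estimate equal opportunity (TPR gap between groups) using
# imputed outcomes on the full sample vs. labeled data alone.
def tpr_gap(y, s, a):
    tpr = lambda g: np.sum(s * y * (a == g)) / np.sum(y * (a == g))
    return tpr(1) - tpr(0)

semi_sup = tpr_gap(Y_hat, S, A)                          # uses all n samples
sup_only = tpr_gap(Y[labeled], S[labeled], A[labeled])   # labeled subset only
```

The paper's contribution is the theory around this construction: consistency even under imputation-model misspecification and an efficiency gain over the supervised estimate when the imputation model is correct; this sketch shows only the mechanics.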
Problem

Research questions and friction points this paper is trying to address.

Auditing ML model fairness with limited labeled data
Reducing bias in biomedical applications via semi-supervised inference
Improving efficiency and precision in fairness evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-supervised inference for fairness auditing
Imputing missing outcomes via nonlinear regression
Higher precision with limited labeled data
Jianhui Gao
University of Toronto
Statistics
Jessica Gronsbell
Department of Statistics, University of Toronto, Toronto, ON, Canada