Size-adaptive Hypothesis Testing for Fairness

📅 2025-06-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing fairness evaluation methods rely on point estimates compared against fixed thresholds, ignoring sampling uncertainty and applying uniform criteria across subgroup sizes. They are particularly unreliable in intersectional settings, where sparse subgroup samples yield overly wide confidence intervals and weak statistical inference. This paper proposes a size-adaptive, two-regime statistical testing framework for fairness assessment: Wald tests with theoretical guarantees for large subgroups, and calibrated Bayesian Dirichlet–multinomial estimation for small subgroups. The framework supports interpretable and verifiable fairness decisions across the full spectrum of subgroup sizes. Through central-limit-theorem analysis, Monte Carlo credible-interval estimation, and experiments on multiple benchmark datasets, we demonstrate that the method improves statistical power and the reliability of fairness decisions, especially under data scarcity and in high-dimensional intersectional settings.
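
A minimal Python sketch of this two-regime decision rule, assuming a binary favourable outcome and two subgroups. The function names, the switching rule (per-cell counts of at least five), and the default tolerance `delta` are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a size-adaptive fairness test; names and defaults are assumptions.
import numpy as np
from scipy import stats

def spd_wald_test(pos_a, n_a, pos_b, n_b, delta=0.1, alpha=0.05):
    """Large-group regime: CLT-based Wald test on the statistical parity difference."""
    p_a, p_b = pos_a / n_a, pos_b / n_b
    spd = p_a - p_b
    se = np.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    ci = (spd - z_crit * se, spd + z_crit * se)
    # One-sided p-value for H0: |SPD| <= delta vs H1: |SPD| > delta.
    p_value = 1 - stats.norm.cdf((abs(spd) - delta) / se)
    return {"spd": spd, "ci": ci, "p_value": p_value, "flag_unfair": p_value < alpha}

def spd_bayes_test(pos_a, n_a, pos_b, n_b, delta=0.1, alpha=0.05,
                   prior=1.0, draws=100_000, seed=0):
    """Small-group regime: Monte Carlo credible interval from the conjugate posterior.

    With a binary outcome and two groups, the Dirichlet-multinomial posterior
    factorises into independent Beta posteriors per group.
    """
    rng = np.random.default_rng(seed)
    samp_a = rng.beta(prior + pos_a, prior + n_a - pos_a, draws)
    samp_b = rng.beta(prior + pos_b, prior + n_b - pos_b, draws)
    diff = samp_a - samp_b
    lo, hi = np.quantile(diff, [alpha / 2, 1 - alpha / 2])
    prob_unfair = float(np.mean(np.abs(diff) > delta))
    return {"spd": float(diff.mean()), "ci": (float(lo), float(hi)),
            "prob_unfair": prob_unfair, "flag_unfair": prob_unfair > 1 - alpha}

def size_adaptive_test(pos_a, n_a, pos_b, n_b, min_count=5, **kwargs):
    """Dispatch on subgroup size: Wald when the normal approximation is safe, Bayes otherwise."""
    counts = (pos_a, n_a - pos_a, pos_b, n_b - pos_b)
    use_wald = all(c >= min_count for c in counts)
    test = spd_wald_test if use_wald else spd_bayes_test
    return test(pos_a, n_a, pos_b, n_b, **kwargs)

if __name__ == "__main__":
    # Large subgroups -> Wald regime; sparse intersectional subgroup -> Bayesian regime.
    print(size_adaptive_test(480, 1000, 350, 900))
    print(size_adaptive_test(3, 12, 45, 110))
```

The dispatcher mirrors the size-adaptive idea: both branches report an interval and a binary fairness flag, so downstream decisions have the same form regardless of which regime produced them.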

📝 Abstract
Determining whether an algorithmic decision-making system discriminates against a specific demographic typically involves comparing a single point estimate of a fairness metric against a predefined threshold. This practice is statistically brittle: it ignores sampling error and treats small demographic subgroups the same as large ones. The problem intensifies in intersectional analyses, where multiple sensitive attributes are considered jointly, giving rise to a larger number of smaller groups. As these groups become more granular, the data representing them becomes too sparse for reliable estimation, and fairness metrics yield excessively wide confidence intervals, precluding meaningful conclusions about potential unfair treatment. In this paper, we introduce a unified, size-adaptive, hypothesis-testing framework that turns fairness assessment into an evidence-based statistical decision. Our contribution is twofold. (i) For sufficiently large subgroups, we prove a Central-Limit result for the statistical parity difference, leading to analytic confidence intervals and a Wald test whose type-I (false positive) error is guaranteed at level $\alpha$. (ii) For the long tail of small intersectional groups, we derive a fully Bayesian Dirichlet-multinomial estimator; Monte-Carlo credible intervals are calibrated for any sample size and naturally converge to Wald intervals as more data becomes available. We validate our approach empirically on benchmark datasets, demonstrating how our tests provide interpretable, statistically rigorous decisions under varying degrees of data availability and intersectionality.
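
To make contribution (i) concrete, the standard two-proportion form of the statistical parity difference and its Wald interval is sketched below; this is the usual construction implied by the abstract, and the paper's exact variance estimator and null formulation may differ. Here $\hat p_A, \hat p_B$ are positive-decision rates in subgroups of sizes $n_A, n_B$, and $\delta_0$ is the fairness tolerance under the null.

```latex
\widehat{\mathrm{SPD}} = \hat p_A - \hat p_B,
\qquad
\widehat{\mathrm{SE}} = \sqrt{\frac{\hat p_A(1-\hat p_A)}{n_A} + \frac{\hat p_B(1-\hat p_B)}{n_B}},
\qquad
\widehat{\mathrm{SPD}} \pm z_{1-\alpha/2}\,\widehat{\mathrm{SE}} .
```

Under $H_0\!:\mathrm{SPD}=\delta_0$, the Wald statistic $Z = (\widehat{\mathrm{SPD}} - \delta_0)/\widehat{\mathrm{SE}}$ is asymptotically standard normal by the central limit theorem, so rejecting when $|Z| > z_{1-\alpha/2}$ controls the type-I error at level $\alpha$.
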
Problem

Research questions and friction points this paper is trying to address.

Addresses statistical brittleness in fairness metric comparisons
Improves fairness assessment for small intersectional demographic groups
Provides adaptive hypothesis testing for varying data availability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Size-adaptive hypothesis testing for fairness assessment
Central-Limit result for statistical parity difference
Bayesian Dirichlet-multinomial estimator for small groups (see the posterior sketch below)
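
A minimal statement of the small-group estimator named above, assuming a symmetric conjugate prior over the $K$ joint (group, outcome) cells; the paper's exact parameterisation may differ.

```latex
\boldsymbol{\theta} \sim \mathrm{Dirichlet}(\alpha_1,\dots,\alpha_K),
\quad
\mathbf{n}\mid\boldsymbol{\theta} \sim \mathrm{Multinomial}(n,\boldsymbol{\theta})
\;\;\Longrightarrow\;\;
\boldsymbol{\theta}\mid\mathbf{n} \sim \mathrm{Dirichlet}(\alpha_1+n_1,\dots,\alpha_K+n_K) .
```

Credible intervals for a fairness metric $g(\boldsymbol{\theta})$, such as the statistical parity difference, follow from posterior draws $\boldsymbol{\theta}^{(1)},\dots,\boldsymbol{\theta}^{(M)}$ by taking the empirical $\alpha/2$ and $1-\alpha/2$ quantiles of $g(\boldsymbol{\theta}^{(m)})$; as cell counts grow, these intervals contract onto the Wald intervals of the large-group regime.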