Federated Measurement of Demographic Disparities from Quantile Sketches

📅 2026-02-21

🤖 AI Summary

This work addresses how to audit group fairness jointly across data silos under privacy constraints, a setting in which local and global fairness metrics often diverge. The authors propose an auditing method for the horizontal federated setting that estimates the global Wasserstein–Fréchet variance without sharing raw data: each participant sends only group counts and quantile summaries of its score distributions. An ANOVA-style decomposition of the squared Wasserstein distance separates the contributions of silo-specific selection and cross-silo heterogeneity to the observed disparity. The resulting one-shot federated estimator has $O(1/k)$ discretization bias for $k$ quantiles and comes with finite-sample guarantees. Experiments on synthetic data and COMPAS show that a few dozen quantiles suffice to recover the global disparity and diagnose its sources, at low communication cost.

📝 Abstract

Many fairness goals are defined at a population level that misaligns with siloed data collection, which remains unsharable due to privacy regulations. Horizontal federated learning (FL) enables collaborative modeling across clients with aligned features without sharing raw data. We study federated auditing of demographic parity through score distributions, measuring disparity as a Wasserstein–Fréchet variance between sensitive-group score laws, and expressing the population metric in a federated form that makes explicit how silo-specific selection drives local-global mismatch. For the squared Wasserstein distance, we prove an ANOVA-style decomposition that separates (i) selection-induced mixture effects from (ii) cross-silo heterogeneity, yielding tight bounds linking local and global metrics. We then propose a one-shot, communication-efficient protocol in which each silo shares only group counts and a quantile summary of its local score distributions, enabling the server to estimate global disparity and its decomposition, with $O(1/k)$ discretization bias ($k$ quantiles) and finite-sample guarantees. Experiments on synthetic data and COMPAS show that a few dozen quantiles suffice to recover global disparity and diagnose its sources.
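
The one-shot protocol described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' estimator, only an illustration under stated assumptions: scores are one-dimensional, each silo reports $k$ quantiles at mid-point levels $(i + 0.5)/k$, the server treats each reported quantile as an atom of mass $1/k$ weighted by the silo's group count, and the Wasserstein–Fréchet variance is taken as the count-weighted mean squared $W_2$ distance of each group's quantile function to their weighted average (the one-dimensional barycenter). The function names `silo_summary`, `pooled_quantiles`, and `global_disparity` are hypothetical.

```python
import numpy as np

def silo_summary(scores, groups, k):
    """Runs locally in a silo: per-group count and k mid-point quantiles."""
    levels = (np.arange(k) + 0.5) / k
    summary = {}
    for g in np.unique(groups):
        s = scores[groups == g]
        summary[g] = (len(s), np.quantile(s, levels))
    return summary

def pooled_quantiles(summaries, g, k):
    """Server side: approximate the global quantile function of group g.

    Each silo's reported quantiles are treated as equally weighted atoms,
    scaled by that silo's count for group g (an assumption of this sketch).
    """
    levels = (np.arange(k) + 0.5) / k
    parts = [s[g] for s in summaries if g in s]
    atoms = np.concatenate([q for _, q in parts])
    weights = np.concatenate([np.full(len(q), n / len(q)) for n, q in parts])
    order = np.argsort(atoms)
    cum = np.cumsum(weights[order]) / weights.sum()
    # Invert the pooled (mixture) CDF at the requested levels.
    idx = np.minimum(np.searchsorted(cum, levels), len(atoms) - 1)
    return atoms[order][idx]

def global_disparity(summaries, k):
    """Count-weighted squared-W2 spread of group score laws around their barycenter."""
    groups = sorted({g for s in summaries for g in s})
    counts = np.array(
        [sum(s[g][0] for s in summaries if g in s) for g in groups], dtype=float
    )
    p = counts / counts.sum()
    Q = np.stack([pooled_quantiles(summaries, g, k) for g in groups])
    barycenter = p @ Q  # in 1-D, the barycenter's quantile function is the average
    return float(p @ np.mean((Q - barycenter) ** 2, axis=1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two silos with opposite group selection (synthetic, for illustration only).
    silos = []
    for n0, n1 in [(8000, 2000), (2000, 8000)]:
        scores = np.concatenate([rng.normal(0, 1, n0), rng.normal(1, 1, n1)])
        groups = np.concatenate([np.zeros(n0, int), np.ones(n1, int)])
        silos.append(silo_summary(scores, groups, k=50))
    print(global_disparity(silos, k=50))
```

Only the per-group counts and the $k$ quantile values leave each silo, so the communication cost is on the order of $k$ numbers per group per silo. Pooling weighted atoms, rather than averaging silo quantile functions directly, matters because the global group law is a mixture across silos, and the quantile function of a mixture is not the average of the component quantile functions.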

Problem

Research questions and friction points this paper is trying to address.

federated learning

demographic disparity

fairness auditing

data silos

privacy-preserving

Innovation

Methods, ideas, or system contributions that make the work stand out.

federated auditing

demographic parity

Wasserstein distance