🤖 AI Summary
This work addresses the fairness generalization failure problem in machine learning, where fairness constraints satisfied on training data do not necessarily hold on unseen test data. We propose the first information-theoretic framework for bounding fairness generalization error, formalizing fairness overfitting via mutual information (MI) and conditional mutual information (CMI) between model parameters, training data, and sensitive attributes. Leveraging the Efron–Stein inequality, we derive a tight, computationally verifiable upper bound on fairness generalization error. The bound is algorithm-agnostic, requires no distributional assumptions, and applies to diverse fairness-aware learners, including models trained for demographic parity, equalized odds, and counterfactual fairness. Empirical evaluation across multiple benchmark datasets and fairness algorithms demonstrates both the tightness of the bound and its utility in guiding the design of fair models with provable generalization guarantees. Our framework establishes a new theoretical foundation and a practical criterion for developing fairness-aware learning algorithms with rigorous generalization assurances.
📝 Abstract
Despite substantial progress in promoting fairness in high-stakes applications of machine learning, existing methods often modify the training process, for example through regularizers or other interventions, but lack formal guarantees that fairness achieved during training will generalize to unseen data. Although overfitting with respect to prediction performance has been studied extensively, overfitting in terms of fairness loss has received far less attention. This paper proposes a theoretical framework for analyzing fairness generalization error through an information-theoretic lens. Our novel bounding technique is based on the Efron–Stein inequality, which allows us to derive tight information-theoretic fairness generalization bounds in terms of both Mutual Information (MI) and Conditional Mutual Information (CMI). Our empirical results validate the tightness and practical relevance of these bounds across diverse fairness-aware learning algorithms. Our framework offers valuable insights for designing algorithms that improve fairness generalization.
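To make the object of study concrete, the sketch below illustrates the quantity being bounded: the fairness generalization gap, i.e., the difference between a model's fairness loss on test data and on training data. This is not the paper's bound or method; it is a minimal, hypothetical illustration assuming demographic-parity violation as the fairness loss and using synthetic binary predictions and sensitive attributes.

```python
import numpy as np

def dp_violation(y_pred, sensitive):
    """Demographic-parity violation: |P(Yhat=1 | A=0) - P(Yhat=1 | A=1)|."""
    rate_0 = y_pred[sensitive == 0].mean()
    rate_1 = y_pred[sensitive == 1].mean()
    return abs(rate_0 - rate_1)

def fairness_generalization_gap(train_pred, train_a, test_pred, test_a):
    """Empirical fairness generalization gap: test fairness loss minus
    train fairness loss. Information-theoretic bounds of the kind the
    paper derives upper-bound (in expectation) quantities of this form."""
    return dp_violation(test_pred, test_a) - dp_violation(train_pred, train_a)

# Toy data: a model that looks nearly fair on training data but is
# noticeably less fair on test data (all values are synthetic).
rng = np.random.default_rng(0)
train_a = rng.integers(0, 2, size=1000)
test_a = rng.integers(0, 2, size=1000)
train_pred = (rng.random(1000) < np.where(train_a == 1, 0.52, 0.50)).astype(int)
test_pred = (rng.random(1000) < np.where(test_a == 1, 0.65, 0.50)).astype(int)

gap = fairness_generalization_gap(train_pred, train_a, test_pred, test_a)
print(f"fairness generalization gap: {gap:.3f}")
```

A positive gap indicates fairness overfitting: the model appears fairer during training than it is on unseen data, which is exactly the failure mode the proposed MI/CMI bounds are meant to control.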