🤖 AI Summary
This work addresses the fairness generalization failure problem in machine learning, where fairness constraints satisfied on training data do not necessarily hold on unseen test data. We propose the first information-theoretic framework for bounding fairness generalization error, formalizing fairness overfitting via mutual information (MI) and conditional mutual information (CMI) between model parameters, training data, and sensitive attributes. Leveraging the Efron–Stein inequality, we derive a tight, computationally verifiable upper bound on fairness generalization error. The bound is algorithm-agnostic, requires no distributional assumptions, and applies to diverse fairness-aware learners, including models trained for demographic parity, equalized odds, and counterfactual fairness. Empirical evaluation across multiple benchmark datasets and fairness algorithms demonstrates both the tightness of the bound and its utility in guiding the design of fair models with provable generalization guarantees. Our framework establishes a new theoretical foundation and a practical criterion for developing fairness-aware learning algorithms with rigorous generalization assurances.
📝 Abstract
Despite substantial progress in promoting fairness in high-stakes applications of machine learning, existing methods often modify the training process, for example through regularizers or other interventions, but lack formal guarantees that fairness achieved during training will generalize to unseen data. Although overfitting with respect to prediction performance has been studied extensively, overfitting in terms of fairness loss has received far less attention. This paper proposes a theoretical framework for analyzing fairness generalization error through an information-theoretic lens. Our novel bounding technique is based on the Efron–Stein inequality, which allows us to derive tight information-theoretic fairness generalization bounds in terms of both Mutual Information (MI) and Conditional Mutual Information (CMI). Our empirical results validate the tightness and practical relevance of these bounds across diverse fairness-aware learning algorithms. Our framework offers valuable insights for designing algorithms that improve fairness generalization.
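To make the object of study concrete, the sketch below illustrates the quantity being bounded: the fairness generalization gap, i.e., the difference between a model's fairness loss on test data and on training data. This is not the paper's bound or method; it is a minimal, hypothetical illustration assuming demographic-parity violation as the fairness loss and using synthetic binary predictions and sensitive attributes.

```python
import numpy as np

def dp_violation(y_pred, sensitive):
    """Demographic-parity violation: |P(Yhat=1 | A=0) - P(Yhat=1 | A=1)|."""
    rate_0 = y_pred[sensitive == 0].mean()
    rate_1 = y_pred[sensitive == 1].mean()
    return abs(rate_0 - rate_1)

def fairness_generalization_gap(train_pred, train_a, test_pred, test_a):
    """Empirical fairness generalization gap: test fairness loss minus
    train fairness loss. Information-theoretic bounds of the kind the
    paper derives upper-bound (in expectation) quantities of this form."""
    return dp_violation(test_pred, test_a) - dp_violation(train_pred, train_a)

# Toy data: a model that looks nearly fair on training data but is
# noticeably less fair on test data (all values are synthetic).
rng = np.random.default_rng(0)
train_a = rng.integers(0, 2, size=1000)
test_a = rng.integers(0, 2, size=1000)
train_pred = (rng.random(1000) < np.where(train_a == 1, 0.52, 0.50)).astype(int)
test_pred = (rng.random(1000) < np.where(test_a == 1, 0.65, 0.50)).astype(int)

gap = fairness_generalization_gap(train_pred, train_a, test_pred, test_a)
print(f"fairness generalization gap: {gap:.3f}")
```

A positive gap indicates fairness overfitting: the model appears fairer during training than it is on unseen data, which is exactly the failure mode the proposed MI/CMI bounds are meant to control.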