Learning Representational Disparities

📅 2025-05-23

📈 Citations: 0

✨ Influential: 0

career value

196K/year

🤖 AI Summary

Existing fair machine learning methods overlook the impact of human decision-making on downstream outcomes, thus failing to mitigate real-world outcome inequality arising from representational bias. To address this, we introduce the novel concept of “representation mismatch,” formally modeling the causal discrepancy between actual human decisions and idealized counterfactual decisions. We cast this mismatch as an intervenable multi-objective optimization problem within a neural network framework. Theoretically, our learned interpretable weights provably eliminate downstream outcome inequality, enabling decision-maker–oriented explainable interventions (e.g., behavioral nudges). Our method integrates causal simplifying assumptions with weight-based representation modeling. Empirical evaluation on German Credit, Adult, and Heritage Health datasets confirms both the identifiability and remediability of representation mismatch, achieving complete mitigation of downstream inequality.

Technology Category

Application Category

📝 Abstract

We propose a fair machine learning algorithm to model interpretable differences between observed and desired human decision-making, with the latter aimed at reducing disparity in a downstream outcome impacted by the human decision. Prior work learns fair representations without considering the outcome in the decision-making process. We model the outcome disparities as arising due to the different representations of the input seen by the observed and desired decision-maker, which we term representational disparities. Our goal is to learn interpretable representational disparities which could potentially be corrected by specific nudges to the human decision, mitigating disparities in the downstream outcome; we frame this as a multi-objective optimization problem using a neural network. Under reasonable simplifying assumptions, we prove that our neural network model of the representational disparity learns interpretable weights that fully mitigate the outcome disparity. We validate objectives and interpret results using real-world German Credit, Adult, and Heritage Health datasets.

Problem

Research questions and friction points this paper is trying to address.

Model interpretable differences between observed and desired human decision-making

Mitigate outcome disparities via representational disparities in decision-making

Learn interpretable neural network weights to reduce outcome disparities

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fair machine learning algorithm modeling disparities

Multi-objective optimization using neural networks

Interpretable weights mitigating outcome disparities

🔎 Similar Papers

Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models