🤖 AI Summary
This work addresses the challenge of achieving long-term fairness in selective-label settings—such as hiring or lending—where outcomes are observed only for positively selected individuals, rendering conventional fairness algorithms inapplicable. The paper formally defines the long-term fairness problem under this partial observability and introduces a novel framework that decomposes true fairness into observable fairness and prediction bias. By leveraging the confidence of a label prediction model, the authors derive sufficient conditions to guarantee true fairness without access to ground-truth labels. Building on this insight, they develop a reinforcement learning algorithm capable of making fair sequential decisions in the absence of complete outcome data. Empirical evaluations on semi-synthetic environments demonstrate that the proposed method achieves fairness and performance nearly matching those of an oracle agent with full label access.
📝 Abstract
Long-term fairness algorithms aim to satisfy fairness beyond static and short-term notions by accounting for the dynamics between decision-making policies and population behavior. Most previous approaches evaluate performance and fairness measures from observable features and a label, which is assumed to be fully observed. However, in scenarios such as hiring or lending, the labels (e.g., ability to repay the loan) are selective labels as they are only revealed based on positive decisions (e.g., when a loan is granted). In this paper, we study long-term fairness in the selective labels setting and analytically show that naive solutions do not guarantee fairness. To address this gap, we then introduce a novel framework that leverages both the observed data and a label predictor model to estimate the true fairness measure value by decomposing it into the observed fairness and bias from label predictions. This allows us to derive sufficient conditions to satisfy true fairness from observable quantities by using the confidence in the predictor model. Finally, we rely on our theoretical results to propose a novel reinforcement learning algorithm for effective long-term fair decision-making with selective labels. In semisynthetic environments, the proposed algorithm reached comparable fairness and performance to an agent with oracle access to the true labels.