🤖 AI Summary
In policy learning, action fairness—defined as equitable allocation of decisions—and outcome fairness—referring to equal downstream consequences—often conflict due to heterogeneous group responses. This work proposes the first double fairness learning framework that jointly models both notions while maximizing overall policy value. The problem is formalized as a multi-objective optimization and solved via a lexicographic weighted Tchebyshev method to navigate the non-convex Pareto frontier. The approach accommodates diverse fairness metrics and provides theoretical regret bounds. Experiments on synthetic data and real-world datasets from insurance and entrepreneurship training demonstrate that the framework substantially improves both forms of fairness with only minor degradation in overall policy value.
📝 Abstract
Fairness is a central pillar of trustworthy machine learning, especially in domains where accuracy- or profit-driven optimization is insufficient. While most fairness research focuses on supervised learning, fairness in policy learning remains less explored. Because policy learning is interventional, it induces two distinct fairness targets: action fairness (equitable action assignments) and outcome fairness (equitable downstream consequences). Crucially, equalizing actions does not generally equalize outcomes when groups face different constraints or respond differently to the same action. We propose a novel double fairness learning (DFL) framework that explicitly manages the trade-off among three objectives: action fairness, outcome fairness, and value maximization. We integrate fairness directly into a multi-objective optimization problem for policy learning and employ a lexicographic weighted Tchebyshev method that recovers Pareto solutions beyond convex settings, with theoretical regret bounds. Our framework is flexible and accommodates various commonly used fairness notions. Extensive simulations demonstrate improved performance relative to competing methods. In applications to a motor third-party liability insurance dataset and an entrepreneurship training dataset, DFL substantially improves both action and outcome fairness while incurring only a modest reduction in overall value.
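To make the scalarization concrete: a weighted Tchebyshev method turns a multi-objective problem into a single score by measuring each candidate's worst weighted deviation from an ideal point, which can recover Pareto-optimal points even on non-convex frontiers (where weighted sums fail). The sketch below is a generic augmented weighted Tchebyshev scalarization, not the paper's DFL algorithm; the function name, the `rho` augmentation term, and the toy candidate values are illustrative assumptions.

```python
import numpy as np

def weighted_tchebyshev(objectives, weights, ideal, rho=0.1):
    """Augmented weighted Tchebyshev score (lower is better).

    objectives : per-objective losses of one candidate policy,
                 e.g. (value shortfall, action unfairness, outcome unfairness)
    weights    : relative importance of each objective
    ideal      : the (utopian) ideal point, best value of each objective alone
    rho        : small augmentation coefficient; the added weighted-sum term
                 breaks ties among points with equal Chebyshev distance,
                 mimicking the lexicographic refinement step
    """
    dev = weights * np.abs(np.asarray(objectives) - np.asarray(ideal))
    return dev.max() + rho * dev.sum()

def select_policy(candidates, weights, ideal, rho=0.1):
    """Pick the candidate minimizing the scalarized score."""
    scores = [weighted_tchebyshev(c, weights, ideal, rho) for c in candidates]
    return int(np.argmin(scores)), scores

# Toy example: two candidate policies scored on two objectives.
candidates = [np.array([2.0, 1.0]),   # low unfairness on objective 2, high on 1
              np.array([1.5, 1.6])]   # more balanced across objectives
best, scores = select_policy(candidates, np.array([1.0, 1.0]), np.array([0.0, 0.0]))
# The balanced candidate wins: its worst-case deviation (1.6) beats 2.0.
```

A plain weighted sum would rank these two candidates as roughly tied (3.0 vs 3.1), illustrating why the Chebyshev (max-deviation) view is the natural tool when the trade-off surface is non-convex.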