Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces

📅 2025-02-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses multi-objective optimization in reinforcement learning for cross-group fairness, aiming to maximize the expected return of the worst-off group—formally defined as the minimum state-reweighted return. To tackle challenges arising from large state spaces and exponentially many overlapping group constraints, we propose the first oracle-efficient algorithmic framework, applicable uniformly to both tabular and structured MDPs. Our approach integrates state reweighting, dual optimization, and online mirror descent, and leverages structural assumptions—such as low-dimensional embeddings or sparse group-specific features—to ensure scalability. We establish sublinear regret and constraint violation bounds. Experiments on preferential-attachment graph MDPs demonstrate that our method significantly improves the worst-group return while preserving overall performance.
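The max-min scheme the summary describes (dual weights over groups, updated by online mirror descent, with a planning oracle best-responding to the reweighted reward) can be sketched on a toy tabular MDP. This is an illustrative assumption-laden sketch, not the paper's algorithm: the random MDP, the group-weight matrix `w`, the value-iteration oracle, and the step size `eta` are all our own choices.

```python
import numpy as np

# Sketch (not the paper's exact method): max-min group return via
# exponentiated gradient (online mirror descent on the simplex) over
# group weights, with a best-response planning oracle on the
# lambda-reweighted reward. The toy MDP below is randomly generated.

rng = np.random.default_rng(0)
S, A, G, H = 6, 2, 3, 10          # states, actions, groups, horizon

P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = distribution over next states
r = rng.uniform(size=(S, A))                # scalar base reward
w = rng.uniform(size=(G, S))                # w[g, s]: share of reward at s accruing to group g

def best_response(lam):
    """Finite-horizon value iteration on the lambda-reweighted reward."""
    r_lam = (lam @ w)[:, None] * r          # reweight reward by sum_g lam_g * w[g, s]
    V = np.zeros(S)
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = r_lam + P @ V                   # Q[s, a] = reward + expected value
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi

def group_returns(pi, s0=0):
    """Expected per-group (state-reweighted) return of policy pi from s0."""
    d = np.zeros(S); d[s0] = 1.0            # state occupancy distribution
    ret = np.zeros(G)
    for h in range(H):
        ra = r[np.arange(S), pi[h]]         # reward under the chosen actions
        ret += w @ (d * ra)                 # each group's reweighted reward this step
        d = np.einsum('s,sp->p', d, P[np.arange(S), pi[h]])  # push occupancy forward
    return ret

lam = np.ones(G) / G                        # dual weights on the simplex
eta = 0.5
history = []
for t in range(200):
    pi = best_response(lam)
    ret = group_returns(pi)
    history.append(ret)
    lam *= np.exp(-eta * ret)               # upweight the worst-off groups
    lam /= lam.sum()

avg = np.mean(history, axis=0)              # group returns of the averaged (mixture) policy
print("worst-group return:", avg.min().round(3))
```

The min-player's exponentiated-gradient update concentrates weight on the group with the lowest return, so the averaged sequence of best-response policies approximates the max-min solution; the paper's contribution is making this oracle-efficient even with exponentially many overlapping groups.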

📝 Abstract
In traditional reinforcement learning (RL), the learner aims to solve a single-objective optimization problem: find the policy that maximizes expected reward. However, in many real-world settings, it is important to optimize over multiple objectives simultaneously. For example, when we are interested in fairness, states might have feature annotations corresponding to multiple (intersecting) demographic groups to whom reward accrues, and our goal might be to maximize the reward of the group receiving the minimal reward. In this work, we consider a multi-objective optimization problem in which each objective is defined by a state-based reweighting of a single scalar reward function. This generalizes the problem of maximizing the reward of the minimum reward group. We provide oracle-efficient algorithms to solve these multi-objective RL problems even when the number of objectives is exponentially large, for tabular MDPs as well as for large MDPs when the group functions have additional structure. Finally, we experimentally validate our theoretical results and demonstrate applications on a preferential attachment graph MDP.
Problem

Research questions and friction points this paper is trying to address.

Formulate intersectional fairness as multi-objective optimization in reinforcement learning.
Maximize the expected return of the worst-off among intersecting demographic groups.
Design oracle-efficient algorithms that scale to large MDPs and exponentially many group constraints.
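The max-min goal in the bullets above can be stated formally; the notation here is our own sketch, not necessarily the paper's exact formulation:

```latex
\max_{\pi}\ \min_{g \in \mathcal{G}}\
\mathbb{E}_{\pi}\!\left[\sum_{h=1}^{H} w_g(s_h)\, r(s_h, a_h)\right]
```

where $\mathcal{G}$ is the set of (possibly exponentially many, intersecting) groups and $w_g(s)$ reweights the single scalar reward $r$ by how much of the reward at state $s$ accrues to group $g$. Each choice of $g$ yields one objective, which is why the problem is multi-objective even though there is only one underlying reward function.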
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-objective RL optimization
Oracle-efficient algorithms
Intersectional fairness constraints