Fairness Aware Reinforcement Learning via Proximal Policy Optimization

📅 2025-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multi-agent systems (MAS) often exhibit unfair reward allocation tied to sensitive attributes (e.g., gender, race), undermining equity and trust. Method: This paper proposes a proximal policy optimization (PPO)-based RL framework with dual-stage fairness constraints, and is the first to jointly embed demographic parity, counterfactual fairness, and conditional statistical parity into the PPO objective function. A retrospective penalty rectifies disparities in past outcomes, while a prospective penalty promotes fairness in future decisions; both are differentiable, enabling end-to-end fair optimization. Contribution/Results: Evaluated in the Allelopathic Harvest environment, the method improves all fairness metrics over classic PPO, and the resulting reward loss (the Price of Fairness) is distributed comparably across agents with and without the sensitive attribute. Behavioral analysis confirms that the dual-penalty mechanism steers policy convergence toward fairer solutions. The paper presents this as a theoretically grounded and practically implementable two-dimensional fairness-constrained paradigm for fair RL in MAS.

📝 Abstract
Fairness in multi-agent systems (MAS) focuses on equitable reward distribution among agents in scenarios involving sensitive attributes such as race, gender, or socioeconomic status. This paper introduces fairness in Proximal Policy Optimization (PPO) with a penalty term derived from demographic parity, counterfactual fairness, and conditional statistical parity. The proposed method balances reward maximisation with fairness by integrating two penalty components: a retrospective component that minimises disparities in past outcomes and a prospective component that ensures fairness in future decision-making. We evaluate our approach in the Allelopathic Harvest game, a cooperative and competitive MAS focused on resource collection, where some agents possess a sensitive attribute. Experiments demonstrate that fair-PPO achieves fairer policies across all fairness metrics than classic PPO. Fairness comes at the cost of reduced rewards, namely the Price of Fairness, although agents with and without the sensitive attribute renounce comparable amounts of rewards. Additionally, the retrospective and prospective penalties effectively change the agents' behaviour and improve fairness. These findings underscore the potential of fair-PPO to address fairness challenges in MAS.
Problem

Research questions and friction points this paper is trying to address.

Enhance fairness in multi-agent systems.
Balance reward maximization with fairness.
Address sensitive attributes in resource distribution.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates fairness into Proximal Policy Optimization.
Uses retrospective and prospective penalty components.
Evaluates fairness in Allelopathic Harvest game.
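The dual-penalty idea described above can be sketched as a penalty-augmented PPO loss: the standard clipped surrogate objective minus a retrospective term (observed reward disparity between agent groups) and a prospective term (expected disparity under the current policy). This is a minimal illustrative sketch, not the paper's implementation; the function name, penalty weights `lam_retro`/`lam_pro`, and the scalar gap inputs are all assumptions made for clarity.

```python
def fair_ppo_loss(ratios, advantages, retro_gap, pro_gap,
                  clip_eps=0.2, lam_retro=0.5, lam_pro=0.5):
    """Hypothetical fair-PPO loss (sketch, not the paper's exact formulation).

    ratios      -- per-sample policy probability ratios pi_new/pi_old
    advantages  -- per-sample advantage estimates
    retro_gap   -- retrospective penalty: measured reward disparity between
                   agents with and without the sensitive attribute (>= 0)
    pro_gap     -- prospective penalty: estimated fairness gap of future
                   decisions under the current policy (>= 0)
    """
    def clip(r):
        # PPO clipping keeps the ratio inside [1 - eps, 1 + eps]
        return max(1.0 - clip_eps, min(1.0 + clip_eps, r))

    # Standard PPO clipped surrogate objective (to be maximized)
    surrogate = sum(min(r * a, clip(r) * a)
                    for r, a in zip(ratios, advantages)) / len(ratios)

    # Subtract both fairness penalties, then negate to obtain a loss
    return -(surrogate - lam_retro * retro_gap - lam_pro * pro_gap)
```

Because both penalty terms enter the objective additively, larger fairness gaps raise the loss, so gradient descent trades some reward for reduced disparity, which is consistent with the Price of Fairness the abstract describes.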