Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning

📅 2025-02-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multi-stakeholder reinforcement learning deployments, the choice of social welfare function (e.g., Utilitarian, Nash, or Egalitarian) is highly sensitive to stakeholder preferences: optimal policies depend strongly on the exponent $p$ in the generalized $p$-means ($p \in [-\infty, 1]$), which hinders consensus on which policy to deploy. Method: The paper proposes the $\alpha$-approximate policy portfolio framework, the first systematic approach to mitigating this $p$-sensitivity. Combining multi-objective RL, convex optimization, and $p$-mean generalization theory, the authors design scalable algorithms for constructing policy sets, with theoretical guarantees on approximation accuracy, portfolio size, and computational efficiency. Contribution/Results: The framework covers the entire welfare frontier without pre-specifying $p$, letting decision-makers explore the fairness–utility trade-off space intuitively. Experiments on synthetic and real-world datasets demonstrate robust coverage across $p$, substantially improving the interpretability and practicality of policy selection.

📝 Abstract
In many real-world applications of reinforcement learning (RL), deployed policies have varied impacts on different stakeholders, creating challenges in reaching consensus on how to effectively aggregate their preferences. Generalized $p$-means form a widely used class of social welfare functions for this purpose, with broad applications in fair resource allocation, AI alignment, and decision-making. This class includes well-known welfare functions such as Egalitarian, Nash, and Utilitarian welfare. However, selecting the appropriate social welfare function is challenging for decision-makers, as the structure and outcomes of optimal policies can be highly sensitive to the choice of $p$. To address this challenge, we study the concept of an $\alpha$-approximate portfolio in RL, a set of policies that are approximately optimal across the family of generalized $p$-means for all $p \in [-\infty, 1]$. We propose algorithms to compute such portfolios and provide theoretical guarantees on the trade-offs among approximation factor, portfolio size, and computational efficiency. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of our approach in summarizing the policy space induced by varying $p$ values, empowering decision-makers to navigate this landscape more effectively.
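The generalized $p$-means family in the abstract ($M_p(u) = (\frac{1}{n}\sum_i u_i^p)^{1/p}$, with the geometric mean as the $p \to 0$ limit and the minimum at $p = -\infty$) can be sketched numerically. A minimal Python illustration; the function name and sample utilities are hypothetical, not from the paper:

```python
import math

def p_mean(utils, p):
    """Generalized p-mean welfare of a vector of positive stakeholder utilities.

    p = 1    -> Utilitarian (arithmetic mean)
    p -> 0   -> Nash (geometric mean, taken as the limit)
    p = -inf -> Egalitarian (minimum)
    """
    n = len(utils)
    if p == float("-inf"):
        return min(utils)
    if p == 0:
        # Limit case: geometric mean, computed via log for numerical stability.
        return math.exp(sum(math.log(u) for u in utils) / n)
    return (sum(u ** p for u in utils) / n) ** (1 / p)

# For utilities (1, 4): Utilitarian 2.5, Nash 2.0, Egalitarian 1 --
# the ranking of policies can flip as p moves, which is the sensitivity
# the paper's portfolios are designed to summarize.
```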
Problem

Research questions and friction points this paper is trying to address.

Selecting optimal social welfare functions
Balancing stakeholder preferences in RL
Computing efficient policy portfolios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-objective Reinforcement Learning
Generalized p-means welfare
Alpha-approximate portfolio algorithms
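As a rough illustration of what an $\alpha$-approximate portfolio guarantees, the sketch below computes the worst-case welfare ratio of a candidate portfolio against a full set of attainable utility vectors across several $p$ values. All names and sample vectors are hypothetical; this checks the approximation property but is not the paper's construction algorithm:

```python
import math

def p_mean(utils, p):
    """Generalized p-mean (p=1 Utilitarian, p->0 Nash, p=-inf Egalitarian)."""
    n = len(utils)
    if p == float("-inf"):
        return min(utils)
    if p == 0:
        return math.exp(sum(math.log(u) for u in utils) / n)
    return (sum(u ** p for u in utils) / n) ** (1 / p)

def portfolio_approx_factor(portfolio, frontier, ps):
    """Worst-case ratio, over the exponents in ps, of the best welfare
    attainable by a policy in the portfolio versus the best over the
    full frontier. A portfolio is alpha-approximate iff this >= alpha."""
    alpha = 1.0
    for p in ps:
        best_overall = max(p_mean(u, p) for u in frontier)
        best_in_portfolio = max(p_mean(u, p) for u in portfolio)
        alpha = min(alpha, best_in_portfolio / best_overall)
    return alpha
```

For example, a portfolio holding only the two extreme policies `[4, 1]` and `[1, 4]` from a frontier that also contains the balanced `[2.5, 2.5]` scores well at $p = 1$ but poorly at $p = -\infty$, so its overall factor is driven by the Egalitarian case.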