Pareto-Optimal Learning from Preferences with Hidden Context

📅 2024-06-21
🏛️ arXiv.org
🤖 AI Summary
This paper addresses the fairness–performance trade-off in AI value alignment when preferences are sourced from multiple groups. It proposes Pareto-Optimal Preference Learning (POPL), a framework that achieves group-fair alignment without requiring explicit group labels. Unlike conventional single-objective approaches, POPL frames discrepant group preferences as objectives with potential trade-offs and combines lexicase selection with preference learning to recover sets of reward functions and policies that are Pareto-optimal on the preference dataset. The method is evaluated on a stateless preference learning setting, the Minigrid RL domain, Metaworld robotics benchmarks, and LLM fine-tuning. Its core contribution is pluralistic alignment under implicit group structure: POPL improves cross-group policy performance and fairness simultaneously, without access to group annotations. Empirical results show that POPL outperforms baseline methods across these benchmarks and can serve as a foundation for techniques that optimize specific notions of group fairness.
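To make the multi-objective framing concrete, the sketch below shows a minimal Pareto-front filter over candidate reward models. It assumes hypothetical per-objective scores (for example, agreement with different subsets of preference comparisons); the helper names and example numbers are illustrative and are not taken from the paper, which notably does not assume group labels are available.

```python
# Minimal sketch (not the authors' code): keep only candidates that are
# Pareto-non-dominated across several preference objectives.
from typing import Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if scores `a` Pareto-dominate scores `b`: at least as good on
    every objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scores) -> list:
    """Indices of candidates not dominated by any other candidate."""
    return [
        i for i, s_i in enumerate(scores)
        if not any(dominates(s_j, s_i) for j, s_j in enumerate(scores) if j != i)
    ]

# Hypothetical example: each row is one candidate reward model's agreement
# with two different preference objectives; candidates 0 and 2 form the front.
scores = [(0.9, 0.6), (0.7, 0.5), (0.6, 0.9)]
print(pareto_front(scores))  # -> [0, 2]
```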

📝 Abstract
Ensuring AI models align with human values is essential for their safety and functionality. Reinforcement learning from human feedback (RLHF) leverages human preferences to achieve this alignment. However, when preferences are sourced from diverse populations, point estimates of reward can result in suboptimal performance or be unfair to specific groups. We propose Pareto Optimal Preference Learning (POPL), which enables pluralistic alignment by framing discrepant group preferences as objectives with potential trade-offs, aiming for policies that are Pareto-optimal on the preference dataset. POPL utilizes lexicase selection, an iterative process that selects diverse and Pareto-optimal solutions. Our theoretical and empirical evaluations demonstrate that POPL surpasses baseline methods in learning sets of reward functions and policies, effectively catering to distinct groups without access to group numbers or membership labels. We verify the performance of POPL on a stateless preference learning setting, a Minigrid RL domain, Metaworld robotics benchmarks, as well as large language model (LLM) fine-tuning. We illustrate that POPL can also serve as a foundation for techniques optimizing specific notions of group fairness, ensuring safe and equitable AI model alignment.
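As a rough illustration of the lexicase selection step mentioned in the abstract (a sketch under my own assumptions, not the authors' implementation), each preference comparison can be treated as a test case, and candidates are filtered case by case in a random order. The `passes` scoring function is a hypothetical stand-in for checking whether a candidate reward function agrees with a given comparison.

```python
import random

def lexicase_select(candidates, cases, passes, rng=random):
    """Pick one candidate via lexicase selection.

    candidates: candidate reward functions or policies (any objects)
    cases:      preference comparisons used as test cases
    passes:     passes(candidate, case) -> float, higher is better
                (e.g. 1.0 if the candidate's reward agrees with the comparison)
    """
    pool = list(candidates)
    order = list(cases)
    rng.shuffle(order)  # fresh random case ordering for each selection event
    for case in order:
        if len(pool) == 1:
            break
        best = max(passes(c, case) for c in pool)
        # keep only candidates that are elite on the current case
        pool = [c for c in pool if passes(c, case) == best]
    return rng.choice(pool)
```

Repeating such selection events with different random case orderings tends to return a diverse set of candidates, and lexicase selection is known to favor individuals on the Pareto boundary with respect to the test cases, which is consistent with the Pareto-optimality goal stated in the abstract.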
Problem

Research questions and friction points this paper is trying to address.

How to align AI models with preferences sourced from diverse populations
Point-estimate reward models can be suboptimal or unfair to specific groups
How to learn Pareto-optimal, equitable policies without group membership labels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pareto Optimal Preference Learning (POPL) framework
Lexicase selection for diverse, Pareto-optimal reward functions and policies
Pluralistic alignment that frames discrepant group preferences as trade-off objectives