Learning the Value Systems of Societies with Preference-based Multi-objective Reinforcement Learning

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key challenge in human-AI alignment: modeling diverse, socially situated human value systems in sequential decision-making while keeping value alignment interpretable and adaptive. The authors propose a preference-based multi-objective reinforcement learning (PbMORL) framework that combines clustering with preference-driven learning to jointly infer the distribution of value systems across distinct user groups and an approximately Pareto-optimal policy for each group. This improves both the interpretability of value identification and adaptability to environmental dynamics, while preserving policy diversity. Evaluated on two Markov decision processes (MDPs) that embed human values, the method significantly outperforms an existing PbMORL algorithm and baseline approaches, capturing heterogeneous group-level value systems and generating behavioural policies aligned with them.
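
As a rough illustration only (the paper's exact algorithm is not reproduced here), the sketch below alternates between assigning users to the cluster whose value system best explains their pairwise trajectory preferences under a Bradley-Terry model, and refitting each cluster's weight vector. Linear scalarisation and all names (`bt_log_lik`, `fit_weights`, `cluster_value_systems`) are assumptions made for the example, not the authors' implementation.

```python
import numpy as np

# Illustrative assumptions: each trajectory is summarised by per-value
# alignment features phi, each user supplies pairwise preferences
# [(phi_a, phi_b), ...] meaning "A preferred to B", and a cluster's
# value system is a weight vector w on the probability simplex.

def bt_log_lik(w, prefs):
    """Bradley-Terry log-likelihood of one user's preferences under w."""
    ll = 0.0
    for phi_a, phi_b in prefs:
        logit = w @ (phi_a - phi_b)       # difference of scalarised returns
        ll += -np.log1p(np.exp(-logit))   # log sigmoid(logit)
    return ll

def fit_weights(member_prefs, w, lr=0.1, steps=100):
    """Projected gradient ascent on the pooled BT log-likelihood."""
    w = w.copy()
    for _ in range(steps):
        grad = np.zeros_like(w)
        for prefs in member_prefs:
            for phi_a, phi_b in prefs:
                d = phi_a - phi_b
                grad += d / (1.0 + np.exp(w @ d))  # d * sigmoid(-w @ d)
        w = np.clip(w + lr * grad, 1e-6, None)
        w /= w.sum()                               # project back onto simplex
    return w

def cluster_value_systems(users_prefs, k, n_values, iters=50, seed=0):
    """K-means-style alternation over value systems and user assignments."""
    rng = np.random.default_rng(seed)
    W = rng.dirichlet(np.ones(n_values), size=k)   # k initial value systems
    assign = [0] * len(users_prefs)
    for _ in range(iters):
        # Assignment step: each user joins the best-explaining cluster.
        assign = [max(range(k), key=lambda c: bt_log_lik(W[c], prefs))
                  for prefs in users_prefs]
        # Refit step: update each cluster's value system on its members.
        for c in range(k):
            members = [p for p, a in zip(users_prefs, assign) if a == c]
            if members:
                W[c] = fit_weights(members, W[c])
    return W, assign
```

Each fitted row of `W` plays the role of one cluster's value system; the paper additionally learns value groundings and per-cluster policies, which this sketch deliberately omits.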

📝 Abstract
Value-aware AI should recognise human values and adapt to the value systems (value-based preferences) of different users. This requires operationalising values, which is prone to misspecification. The social nature of values demands that their representation serve multiple users: value systems are diverse, yet exhibit patterns across groups. In sequential decision making, prior work has pursued personalisation for different goals or values from demonstrations of diverse agents; however, these approaches demand manually designed features or lack value-based interpretability and/or adaptability to diverse user preferences. We propose algorithms for learning models of value alignment and value systems for a society of agents in Markov Decision Processes (MDPs), based on clustering and preference-based multi-objective reinforcement learning (PbMORL). We jointly learn socially derived value alignment models (groundings) and a set of value systems that concisely represent different groups of users (clusters) in a society. Each cluster consists of a value system representing the value-based preferences of its members and an approximately Pareto-optimal policy that reflects behaviours aligned with this value system. We evaluate our method against a state-of-the-art PbMORL algorithm and baselines on two MDPs with human values.
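
For concreteness, here is a minimal sketch of how a fixed value system could drive per-cluster policy learning: wrapping a multi-objective MDP so that any standard single-objective RL algorithm optimises the linearly scalarised reward, which for a fixed weight vector yields an approximately Pareto-optimal policy under linear preferences. The `ScalarisedEnv` wrapper and its gym-style `step()` signature are assumptions for illustration, not the paper's API.

```python
import numpy as np

class ScalarisedEnv:
    """Wraps a multi-objective env (vector rewards, one component per
    human value) so that single-objective RL code can be reused."""

    def __init__(self, mo_env, w):
        self.env = mo_env
        self.w = np.asarray(w, dtype=float)
        self.w /= self.w.sum()          # keep the value system on the simplex

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # Assumed gym-style step returning a vector-valued reward.
        obs, vec_reward, done, info = self.env.step(action)
        return obs, float(self.w @ np.asarray(vec_reward)), done, info
```

Training one such wrapped environment per cluster, with that cluster's learned weight vector, is one simple way to realise "one approximately Pareto-optimal policy per value system".
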
Problem

Research questions and friction points this paper is trying to address.

value systems
preference-based multi-objective reinforcement learning
value alignment
socially-derived preferences
Markov Decision Processes
Innovation

Methods, ideas, or system contributions that make the work stand out.

preference-based multi-objective reinforcement learning
value alignment
clustering
Markov Decision Processes
Pareto-optimal policy