Pareto Set Learning for Multi-Objective Reinforcement Learning

📅 2025-01-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address incomplete Pareto-frontier coverage and the lack of preference-customized policies in multi-objective reinforcement learning (MORL), this paper proposes PSL-MORL: a decomposition-based framework introducing the first hypernetwork-driven Pareto-set learning paradigm, which efficiently generates dedicated policy-network parameters for arbitrary scalarization weights. The framework is algorithm-agnostic, seamlessly integrating with diverse RL backbones. We provide theoretical guarantees on its enhanced model capacity and Pareto optimality. Empirical evaluation across multiple benchmark tasks demonstrates substantial improvements—+12.7% in Hypervolume and −38.5% in Sparsity—enabling high-density, preference-aware, and scalable Pareto-policy generation. PSL-MORL consistently outperforms state-of-the-art methods across all metrics, establishing new performance frontiers in MORL.

Technology Category

Application Category

📝 Abstract
Multi-objective decision-making problems have emerged in numerous real-world scenarios, such as video games, navigation and robotics. Considering the clear advantages of Reinforcement Learning (RL) in optimizing decision-making processes, researchers have delved into the development of Multi-Objective RL (MORL) methods for solving multi-objective decision problems. However, previous methods either cannot obtain the entire Pareto front, or employ only a single policy network for all the preferences over multiple objectives, which may not produce personalized solutions for each preference. To address these limitations, we propose a novel decomposition-based framework for MORL, Pareto Set Learning for MORL (PSL-MORL), that harnesses the generation capability of hypernetwork to produce the parameters of the policy network for each decomposition weight, generating relatively distinct policies for various scalarized subproblems with high efficiency. PSL-MORL is a general framework, which is compatible for any RL algorithm. The theoretical result guarantees the superiority of the model capacity of PSL-MORL and the optimality of the obtained policy network. Through extensive experiments on diverse benchmarks, we demonstrate the effectiveness of PSL-MORL in achieving dense coverage of the Pareto front, significantly outperforming state-of-the-art MORL methods in the hypervolume and sparsity indicators.
Problem

Research questions and friction points this paper is trying to address.

Multi-Objective Reinforcement Learning
Pareto Optimal Solutions
Adaptive Strategy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Objective Reinforcement Learning
PSL-MORL
Optimized Decision Strategies
E
Erlong Liu
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; School of Artificial Intelligence, Nanjing University, Nanjing 210023, China
Yu-Chang Wu
Yu-Chang Wu
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; School of Artificial Intelligence, Nanjing University, Nanjing 210023, China
Xiaobin Huang
Xiaobin Huang
Nanjing University
machine learningbayesian optimization
C
Chengrui Gao
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; School of Artificial Intelligence, Nanjing University, Nanjing 210023, China
Ren-Jian Wang
Ren-Jian Wang
Nanjing University
Quality-DiversityEvolutionary AlgorithmsReinforcement LearningMachine Learning
Ke Xue
Ke Xue
Nanjing University
Black-Box OptimizationMachine Learning
Chao Qian
Chao Qian
Nanjing University
Artificial intelligenceevolutionary algorithmsmachine learning