🤖 AI Summary
To address incomplete Pareto-front coverage and the lack of preference-customized policies in multi-objective reinforcement learning (MORL), this paper proposes PSL-MORL, a decomposition-based framework that introduces the first hypernetwork-driven Pareto-set-learning paradigm, efficiently generating dedicated policy-network parameters for arbitrary scalarization weights. The framework is algorithm-agnostic and integrates with diverse RL backbones. The authors provide theoretical guarantees on its enhanced model capacity and on the Pareto optimality of the resulting policies. Empirical evaluation across multiple benchmark tasks demonstrates substantial improvements (+12.7% in hypervolume and −38.5% in sparsity), enabling dense, preference-aware, and scalable Pareto-policy generation, with PSL-MORL consistently outperforming state-of-the-art MORL methods on both metrics.
📝 Abstract
Multi-objective decision-making problems arise in numerous real-world scenarios, such as video games, navigation, and robotics. Given the clear advantages of Reinforcement Learning (RL) in optimizing decision-making processes, researchers have developed Multi-Objective RL (MORL) methods for solving multi-objective decision problems. However, previous methods either cannot obtain the entire Pareto front, or employ a single policy network for all preferences over the multiple objectives, which may fail to produce personalized solutions for each preference. To address these limitations, we propose a novel decomposition-based framework for MORL, Pareto Set Learning for MORL (PSL-MORL), which harnesses the generation capability of a hypernetwork to produce the parameters of the policy network for each decomposition weight, efficiently generating relatively distinct policies for the various scalarized subproblems. PSL-MORL is a general framework compatible with any RL algorithm. Theoretical results guarantee the superior model capacity of PSL-MORL and the optimality of the obtained policy network. Through extensive experiments on diverse benchmarks, we demonstrate the effectiveness of PSL-MORL in achieving dense coverage of the Pareto front, significantly outperforming state-of-the-art MORL methods on the hypervolume and sparsity indicators.
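The core mechanism described above, a hypernetwork that maps each preference (decomposition weight) vector to a dedicated set of policy-network parameters, can be illustrated with a minimal sketch. This is not the paper's implementation: the network sizes, the single-hidden-layer hypernetwork, and the linear policy are all simplifying assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, ACT_DIM, N_OBJ = 4, 2, 2
# Target policy: one linear layer, so the hypernetwork must emit
# ACT_DIM * OBS_DIM weights plus ACT_DIM biases, flattened.
POLICY_PARAM_DIM = ACT_DIM * OBS_DIM + ACT_DIM

# Hypothetical hypernetwork: one hidden layer mapping a preference
# (scalarization weight) vector to the policy's parameter vector.
HIDDEN = 16
W1 = rng.normal(scale=0.1, size=(HIDDEN, N_OBJ))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(POLICY_PARAM_DIM, HIDDEN))
b2 = np.zeros(POLICY_PARAM_DIM)

def hypernet(pref):
    """Map a preference vector on the simplex to flat policy parameters."""
    h = np.tanh(W1 @ pref + b1)
    return W2 @ h + b2

def policy_act(params, obs):
    """Decode the flat parameter vector into a linear policy and act."""
    W = params[: ACT_DIM * OBS_DIM].reshape(ACT_DIM, OBS_DIM)
    b = params[ACT_DIM * OBS_DIM:]
    return np.tanh(W @ obs + b)

# Each sampled preference yields its own dedicated policy, so distinct
# scalarized subproblems get distinct behavior from one shared model.
pref_a = np.array([0.8, 0.2])
pref_b = np.array([0.2, 0.8])
obs = rng.normal(size=OBS_DIM)
act_a = policy_act(hypernet(pref_a), obs)
act_b = policy_act(hypernet(pref_b), obs)
```

In training, the hypernetwork's parameters (here `W1`, `b1`, `W2`, `b2`) would be updated by the chosen RL backbone against the scalarized reward for each sampled preference, which is what makes the framework algorithm-agnostic.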