Multi-Policy Pareto Front Tracking Based Online and Offline Multi-Objective Reinforcement Learning

📅 2025-08-04

🤖 AI Summary
Traditional multi-objective reinforcement learning (MORL) methods that rely on large-population evolutionary search suffer from low sample efficiency and high environment-interaction overhead. To address this, we propose a population-free Pareto front tracking framework that accommodates both online and offline MORL algorithms. The approach operates in four stages: Pareto-vertex policy initialization, Pareto front tracking from each vertex policy, objective weight adjustment to fill sparse regions, and combination of all tracked policies into the final front approximation. This removes the dependence on large-scale population evolution. The Pareto tracking mechanism, together with the weight adjustment for sparse regions, is intended to improve front coverage and sample efficiency. Evaluated on seven continuous-action robotic control tasks, the method achieves higher hypervolume than state-of-the-art approaches while requiring fewer agent-environment interactions and lower hardware requirements.

📝 Abstract
Multi-objective reinforcement learning (MORL) plays a pivotal role in addressing real-world multi-criteria decision-making problems. Multi-policy (MP) methods are widely used to obtain high-quality Pareto front approximations for MORL problems. However, traditional MP methods rely only on online reinforcement learning (RL) and adopt an evolutionary framework with a large policy population. This may lead to sample inefficiency and/or excessive agent-environment interactions in practice. By forsaking the evolutionary framework, we propose the novel Multi-policy Pareto Front Tracking (MPFT) framework, which does not maintain any policy population and to which both online and offline MORL algorithms can be applied. The proposed MPFT framework includes four stages. Stage 1 approximates all the Pareto-vertex policies, whose mappings to the objective space fall on the vertices of the Pareto front. Stage 2 designs a new Pareto tracking mechanism to track the Pareto front, starting from each of the Pareto-vertex policies. Stage 3 identifies the sparse regions in the tracked Pareto front and introduces a new objective weight adjustment method to fill them. Finally, by combining all the policies tracked in Stages 2 and 3, Stage 4 approximates the Pareto front. Experiments are conducted on seven different continuous-action robotic control tasks with both online and offline MORL algorithms, and demonstrate the superior hypervolume performance of the proposed MPFT approach over state-of-the-art benchmarks, with significantly reduced agent-environment interactions and hardware requirements.
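To make the Stage 4 combination step and the hypervolume metric concrete, here is a minimal two-objective sketch. It is an illustration under stated assumptions, not the paper's implementation: both objectives are maximized, every point dominates the reference point, and the names `pareto_front` and `hypervolume_2d` are hypothetical helpers introduced only for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (objective 1 return, objective 2 return)


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the non-dominated points (maximization), sorted by objective 1."""
    front = []
    for p in points:
        # p is dominated if a different point is at least as good on both objectives.
        dominated = any(
            q != p and q[0] >= p[0] and q[1] >= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def hypervolume_2d(front: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded below by the reference point.

    Assumes `front` is non-dominated and every point dominates `ref`.
    """
    area = 0.0
    prev_x = ref[0]
    # Ascending in objective 1 means descending in objective 2 on a front,
    # so each point contributes one vertical strip of the dominated region.
    for x, y in sorted(front):
        area += (x - prev_x) * (y - ref[1])
        prev_x = x
    return area


# Hypothetical returns of policies collected from the tracking stages.
returns = [(1.0, 5.0), (3.0, 4.0), (2.0, 2.0), (5.0, 1.0), (4.0, 3.0)]
front = pareto_front(returns)
hv = hypervolume_2d(front, ref=(0.0, 0.0))
```

A larger hypervolume means the approximated front covers more of the objective space relative to the reference point, which is why it serves as a single-number quality measure for a set of policies.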
Problem

Research questions and friction points this paper is trying to address.

Improves sample efficiency in multi-objective reinforcement learning
Tracks Pareto front without maintaining large policy populations
Supports both online and offline MORL algorithms within a single framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-policy Pareto Front Tracking framework
Support for both online and offline MORL algorithms
Objective weight adjustment for sparse regions
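
The idea of adjusting objective weights to fill sparse regions can be sketched for two objectives as follows. The summary above does not specify the paper's adjustment rule, so this example substitutes a common heuristic as an assumption: find the widest gap between neighboring front points, then pick the weight vector perpendicular to the segment joining them, so that both endpoints score equally under linear scalarization. The function names `largest_gap` and `gap_weight` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def largest_gap(front: List[Point]) -> Tuple[Point, Point]:
    """Return the pair of neighboring front points that are farthest apart."""
    ordered = sorted(front)
    neighbors = zip(ordered, ordered[1:])
    return max(neighbors, key=lambda ab: math.dist(ab[0], ab[1]))


def gap_weight(a: Point, b: Point) -> Tuple[float, float]:
    """Weights normal to segment a-b, normalized to sum to 1.

    Assumed heuristic, not the paper's rule: with these weights,
    w . a == w . b, so a policy whose return lies outward of the
    segment scores higher than either endpoint.
    """
    dx = abs(b[0] - a[0])
    dy = abs(b[1] - a[1])
    total = dx + dy
    return (dy / total, dx / total)


# Hypothetical tracked front with a sparse region between (1, 9) and (6, 2).
front = [(0.0, 10.0), (1.0, 9.0), (6.0, 2.0), (7.0, 0.0)]
a, b = largest_gap(front)
w = gap_weight(a, b)
```

Training a policy under the scalarized reward `w[0] * r1 + w[1] * r2` would then target the region between `a` and `b`. Extending this to more than two objectives needs a different notion of neighbor and gap, which this sketch does not cover.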