🤖 AI Summary
This work addresses the scalability bottleneck that multi-objective reinforcement learning (MORL) faces as the number of objectives grows, proposing a Pareto-preserving reward dimensionality reduction method tailored to online learning. The authors introduce, for the first time in MORL, a dynamic and adaptive dimensionality reduction mechanism that captures the evolving reward structure in real time: a reward mapping module that integrates manifold learning with online principal component analysis, tightly coupled with multi-objective policy gradient optimization and Pareto-frontier constraints. They also establish the first dedicated training and evaluation framework for online dimensionality reduction in MORL. In a 16-objective setting, the proposed method achieves a 3.2× training speedup and a 47% improvement in Pareto coverage over existing online dimensionality reduction approaches, improving both policy learning efficiency and the preservation of Pareto-optimality.
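The summary describes the mechanism only at a high level, so below is a minimal sketch of what an online-PCA reward mapper could look like given the stated design (streaming updates that track an evolving reward structure). It uses Sanger's generalized Hebbian rule for incremental PCA; the class and parameter names (`OnlineRewardPCA`, `latent_dim`, `lr`) are hypothetical and not taken from the paper, and the manifold-learning and Pareto-constraint components are omitted.

```python
import numpy as np

class OnlineRewardPCA:
    """Illustrative streaming PCA of reward vectors via Sanger's rule (GHA).

    Keeps a running mean and a basis W that is refined one reward vector
    at a time, so it can follow a reward structure that drifts as the
    policy trains. Hypothetical sketch, not the paper's implementation.
    """

    def __init__(self, n_objectives, latent_dim, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Rows of W approximate the top principal directions of the rewards.
        self.W = rng.normal(scale=0.1, size=(latent_dim, n_objectives))
        self.mean = np.zeros(n_objectives)
        self.count = 0
        self.lr = lr

    def update(self, r):
        """Consume one raw reward vector and refine the basis."""
        self.count += 1
        self.mean += (r - self.mean) / self.count  # running mean, no replay buffer
        x = r - self.mean
        y = self.W @ x                             # latent projection
        # Sanger's rule: dW_i = lr * y_i * (x - sum_{j<=i} y_j W_j)
        lower = np.tril(np.ones((len(y), len(y)))) @ (y[:, None] * self.W)
        self.W += self.lr * y[:, None] * (x[None, :] - lower)

    def transform(self, r):
        """Map a raw reward vector to the reduced reward fed to the learner."""
        return self.W @ (r - self.mean)

# Usage: stream 16-D rewards, maintain a 3-D reduced reward on the fly.
rng = np.random.default_rng(1)
mapper = OnlineRewardPCA(n_objectives=16, latent_dim=3)
for _ in range(1000):
    mapper.update(rng.normal(size=16))  # stand-in for environment rewards
z = mapper.transform(np.ones(16))       # 3-D reward the agent would see
```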
📝 Abstract
In this paper, we introduce a simple yet effective reward dimension reduction method to tackle the scalability challenges of multi-objective reinforcement learning algorithms. While most existing approaches focus on optimizing two to four objectives, their ability to scale to environments with more objectives remains uncertain. Our method reduces the dimensionality of the reward space to enhance learning efficiency and policy performance in multi-objective settings. Whereas most traditional dimension reduction methods are designed for static datasets, our approach is tailored to online learning and preserves Pareto-optimality after the transformation. We propose a new training and evaluation framework for reward dimension reduction in multi-objective reinforcement learning and demonstrate the superiority of our method in environments including one with sixteen objectives, significantly outperforming existing online dimension reduction methods.
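The abstract does not spell out how Pareto-optimality survives the transformation, but a standard sufficient condition (given here purely as an illustration, not as the paper's exact construction) is that the reduction map be monotone in every objective. For a linear map, a matrix with nonnegative entries and no all-zero column never reverses elementwise dominance between reward vectors, as the small check below demonstrates; the matrix `A` and the reward vectors are invented for the example.

```python
import numpy as np

def dominates(a, b):
    """Pareto dominance: a is at least as good everywhere, strictly better somewhere."""
    return bool(np.all(a >= b) and np.any(a > b))

# Hypothetical 4-D -> 2-D reduction: nonnegative entries, every column
# nonzero, so improving any raw objective can never hurt a reduced one.
A = np.array([[0.6, 0.4, 0.0, 0.0],
              [0.0, 0.0, 0.7, 0.3]])

r1 = np.array([1.0, 2.0, 0.5, 1.0])
r2 = np.array([0.5, 2.0, 0.5, 0.8])  # r1 dominates r2 in the raw 4-D space

assert dominates(r1, r2)
assert np.all(A @ r1 >= A @ r2)      # dominance is not reversed after reduction
print(A @ r1, A @ r2)                # [1.4 0.65] vs. [1.1 0.59]
```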