🤖 AI Summary
This work addresses the scalability bottleneck that multi-objective reinforcement learning (MORL) faces as the number of objectives grows, proposing a Pareto-preserving reward dimensionality reduction method tailored to online learning. The authors introduce, for the first time in MORL, a dynamic and adaptive dimensionality reduction mechanism that captures the evolving reward structure in real time: a reward mapping module that integrates manifold learning with online principal component analysis, tightly coupled with multi-objective policy gradient optimization and Pareto-frontier constraints. They also establish the first dedicated training and evaluation framework for online dimensionality reduction in MORL. In a 16-objective setting, the proposed method achieves a 3.2× training speedup and a 47% improvement in Pareto coverage over existing online dimensionality reduction approaches, improving both policy learning efficiency and the preservation of Pareto-optimality.
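The summary describes the mechanism only at a high level, so below is a minimal sketch of what an online-PCA reward mapper could look like given the stated design (streaming updates that track an evolving reward structure). It uses Sanger's generalized Hebbian rule for incremental PCA; the class and parameter names (`OnlineRewardPCA`, `latent_dim`, `lr`) are hypothetical and not taken from the paper, and the manifold-learning and Pareto-constraint components are omitted.

```python
import numpy as np

class OnlineRewardPCA:
    """Illustrative streaming PCA of reward vectors via Sanger's rule (GHA).

    Keeps a running mean and a basis W that is refined one reward vector
    at a time, so it can follow a reward structure that drifts as the
    policy trains. Hypothetical sketch, not the paper's implementation.
    """

    def __init__(self, n_objectives, latent_dim, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Rows of W approximate the top principal directions of the rewards.
        self.W = rng.normal(scale=0.1, size=(latent_dim, n_objectives))
        self.mean = np.zeros(n_objectives)
        self.count = 0
        self.lr = lr

    def update(self, r):
        """Consume one raw reward vector and refine the basis."""
        self.count += 1
        self.mean += (r - self.mean) / self.count  # running mean, no replay buffer
        x = r - self.mean
        y = self.W @ x                             # latent projection
        # Sanger's rule: dW_i = lr * y_i * (x - sum_{j<=i} y_j W_j)
        lower = np.tril(np.ones((len(y), len(y)))) @ (y[:, None] * self.W)
        self.W += self.lr * y[:, None] * (x[None, :] - lower)

    def transform(self, r):
        """Map a raw reward vector to the reduced reward fed to the learner."""
        return self.W @ (r - self.mean)

# Usage: stream 16-D rewards, maintain a 3-D reduced reward on the fly.
rng = np.random.default_rng(1)
mapper = OnlineRewardPCA(n_objectives=16, latent_dim=3)
for _ in range(1000):
    mapper.update(rng.normal(size=16))  # stand-in for environment rewards
z = mapper.transform(np.ones(16))       # 3-D reward the agent would see
```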
📝 Abstract
In this paper, we introduce a simple yet effective reward dimension reduction method to tackle the scalability challenges of multi-objective reinforcement learning algorithms. While most existing approaches focus on optimizing two to four objectives, their ability to scale to environments with more objectives remains uncertain. Our method reduces the dimensionality of the reward space to enhance learning efficiency and policy performance in multi-objective settings. Whereas most traditional dimension reduction methods are designed for static datasets, our approach is tailored to online learning and preserves Pareto-optimality after the transformation. We propose a new training and evaluation framework for reward dimension reduction in multi-objective reinforcement learning and demonstrate the superiority of our method in environments including one with sixteen objectives, significantly outperforming existing online dimension reduction methods.
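The abstract does not spell out how Pareto-optimality survives the transformation, but a standard sufficient condition (given here purely as an illustration, not as the paper's exact construction) is that the reduction map be monotone in every objective. For a linear map, a matrix with nonnegative entries and no all-zero column never reverses elementwise dominance between reward vectors, as the small check below demonstrates; the matrix `A` and the reward vectors are invented for the example.

```python
import numpy as np

def dominates(a, b):
    """Pareto dominance: a is at least as good everywhere, strictly better somewhere."""
    return bool(np.all(a >= b) and np.any(a > b))

# Hypothetical 4-D -> 2-D reduction: nonnegative entries, every column
# nonzero, so improving any raw objective can never hurt a reduced one.
A = np.array([[0.6, 0.4, 0.0, 0.0],
              [0.0, 0.0, 0.7, 0.3]])

r1 = np.array([1.0, 2.0, 0.5, 1.0])
r2 = np.array([0.5, 2.0, 0.5, 0.8])  # r1 dominates r2 in the raw 4-D space

assert dominates(r1, r2)
assert np.all(A @ r1 >= A @ r2)      # dominance is not reversed after reduction
print(A @ r1, A @ r2)                # [1.4 0.65] vs. [1.1 0.59]
```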