🤖 AI Summary
This work addresses the challenge of maximizing nonlinear concave aggregate rewards in multi-objective reinforcement learning (MORL), introducing general nonlinear concave combination objectives into the MORL framework for the first time. We propose a model-free policy gradient algorithm featuring a biased yet convergent gradient estimator, and rigorously establish its sample complexity for reaching an ε-optimal policy as O(M⁴σ²/((1−γ)⁸ε⁴)), where the ε-dependence matches that of single-objective policy gradient methods. Our approach overcomes fundamental limitations of conventional linear scalarization and Pareto-frontier optimization in MORL, ensuring both strong modeling expressivity (particularly for naturally concave structures) and provable convergence. It provides a novel paradigm for long-horizon cooperative optimization problems in engineering applications such as resource allocation and energy-efficiency balancing.
📝 Abstract
Many engineering problems involve multiple objectives, where the overall aim is to optimize a non-linear function of these objectives. In this paper, we formulate the problem of maximizing a non-linear concave function of multiple long-term objectives, and propose a policy-gradient-based model-free algorithm for it. To estimate the gradient, a biased estimator is proposed. The proposed algorithm is shown to converge to within $\epsilon$ of the global optimum after sampling $\mathcal{O}\!\left(\frac{M^4\sigma^2}{(1-\gamma)^8\epsilon^4}\right)$ trajectories, where $\gamma$ is the discount factor and $M$ is the number of agents, thus achieving the same dependence on $\epsilon$ as the policy gradient algorithm for standard single-objective reinforcement learning.
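To make the setting concrete, here is a minimal sketch of the core idea: maximize a concave function $f$ of $M$ long-term discounted returns by chaining $\nabla f$ through per-objective REINFORCE gradient estimates. Everything below is illustrative and not from the paper: the toy MDP, the choice $f(J) = \sum_m \log J_m$ as an example concave scalarization, and the plain score-function estimator. Note how plugging sample-mean return estimates into the nonlinear $\nabla f$ yields a *biased* gradient estimate, which is exactly the kind of estimator the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state, 2-action MDP with M = 2 reward objectives (purely illustrative).
n_states, n_actions, M = 2, 2, 2
gamma, horizon = 0.9, 30
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])                 # P[s, a, s']
R = rng.uniform(0.1, 1.0, size=(M, n_states, n_actions))  # one reward function per objective

def softmax_policy(theta, s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def sample_trajectory(theta):
    """Return the M truncated discounted returns and the summed score function."""
    s, returns, score = 0, np.zeros(M), np.zeros_like(theta)
    for t in range(horizon):
        pi = softmax_policy(theta, s)
        a = rng.choice(n_actions, p=pi)
        score[s] += np.eye(n_actions)[a] - pi   # accumulate grad log pi(a|s)
        returns += (gamma ** t) * R[:, s, a]
        s = rng.choice(n_states, p=P[s, a])
    return returns, score

def estimate_gradient(theta, n_traj=32):
    # Sample-mean estimates of J_m and (REINFORCE-style) grad J_m, then chain rule
    # through f(J) = sum_m log J_m. Because f is nonlinear, plugging in the sample
    # mean of J makes this estimator biased (though the bias shrinks with batch size).
    J = np.zeros(M)
    gradJ = np.zeros((M,) + theta.shape)
    for _ in range(n_traj):
        G, score = sample_trajectory(theta)
        J += G / n_traj
        gradJ += G[:, None, None] * score / n_traj
    df = 1.0 / J                                # partial f / partial J_m for f = sum log J_m
    return np.tensordot(df, gradJ, axes=1), J

theta = np.zeros((n_states, n_actions))         # tabular softmax policy parameters
for step in range(50):
    g, J = estimate_gradient(theta)
    theta += 0.5 * g                            # plain gradient ascent step
```

The log-sum scalarization encodes a fairness-like trade-off (no objective's return can be driven to zero without unbounded penalty), which is one of the naturally concave structures that linear scalarization cannot express.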