Multi-objective Reinforcement Learning with Nonlinear Preferences: Provable Approximation for Maximizing Expected Scalarized Return

📅 2023-11-05

📈 Citations: 0

✨ Influential: 0

career value

191K/year

🤖 AI Summary

This work addresses trajectory-level nonlinear preference optimization in multi-objective reinforcement learning—specifically, maximizing the expected scalarized return (ESR) under a smooth, nonlinear aggregation function over cumulative rewards in multi-objective MDPs. To overcome the limitation of linear scalarization in capturing time-coupled optimality, we introduce the first extended Bellman optimality principle for nonlinear scalarization. Based on this principle, we propose the first falsifiable, pseudo-polynomial-time approximation algorithm capable of computing nonstationary policies under smooth scalarizers and fixed-dimensional reward vectors. We establish a bounded approximation ratio for the algorithm. Empirical evaluation across multiple benchmark tasks demonstrates that our ESR-based approach improves performance by 37%–62% over linear-weighted baselines, significantly enhancing the expressiveness and fidelity of nonlinear preference modeling.

📝 Abstract

We study multi-objective reinforcement learning with nonlinear preferences over trajectories. That is, we maximize the expected value of a nonlinear function over accumulated rewards (expected scalarized return or ESR) in a multi-objective Markov Decision Process (MOMDP). We derive an extended form of Bellman optimality for nonlinear optimization that explicitly considers time and current accumulated reward. Using this formulation, we describe an approximation algorithm for computing an approximately optimal non-stationary policy in pseudopolynomial time for smooth scalarization functions with a constant number of rewards. We prove the approximation analytically and demonstrate the algorithm experimentally, showing that there can be a substantial gap between the optimal policy computed by our algorithm and alternative baselines.

Problem

Research questions and friction points this paper is trying to address.

Maximize expected nonlinear scalarized return

Extend Bellman optimality for nonlinear preferences

Compute approximately optimal non-stationary policy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Nonlinear optimization in MOMDP

Extended Bellman optimality formulation

Pseudopolynomial time approximation algorithm

🔎 Similar Papers

No similar papers found.