Enabling Pareto-Stationarity Exploration in Multi-Objective Reinforcement Learning: A Multi-Objective Weighted-Chebyshev Actor-Critic Approach

📅 2025-07-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for non-convex multi-objective reinforcement learning (MORL) suffer from difficulty in exploring Pareto-stationary policies and lack finite-time theoretical guarantees. Method: We propose MOCHA, the first algorithm to deeply integrate weighted-Chebyshev scalarization with the actor-critic framework, incorporating dynamic weight adaptation and gradient-based policy updates to systematically explore the Pareto-stationary policy set. Contributions/Results: Theoretically, we establish the first finite-time sample complexity analysis for Pareto stationarity in non-convex MORL, proving a convergence rate of $\tilde{\mathcal{O}}(\varepsilon^{-2})$ that explicitly depends on the minimum component $p_{\min}$ of the weight vector. Empirically, MOCHA achieves significant improvements over state-of-the-art MORL baselines on the large-scale offline KuaiRand dataset, demonstrating both theoretical rigor and practical effectiveness.

📝 Abstract
In many multi-objective reinforcement learning (MORL) applications, being able to systematically explore Pareto-stationary solutions under multiple non-convex reward objectives with a theoretical finite-time sample complexity guarantee is an important and yet under-explored problem. This motivates us to take the first step and fill this important gap in MORL. Specifically, in this paper, we propose a Multi-Objective weighted-CHebyshev Actor-critic (MOCHA) algorithm for MORL, which judiciously integrates the weighted-Chebyshev (WC) scalarization and actor-critic frameworks to enable systematic Pareto-stationarity exploration with a finite-time sample complexity guarantee. The sample complexity result of the MOCHA algorithm reveals an interesting dependency on $p_{\min}$ in finding an $\varepsilon$-Pareto-stationary solution, where $p_{\min}$ denotes the minimum entry of a given weight vector $\mathbf{p}$ in the WC-scalarization. By carefully choosing learning rates, the sample complexity for each exploration can be $\tilde{\mathcal{O}}(\varepsilon^{-2})$. Furthermore, simulation studies on a large KuaiRand offline dataset show that the MOCHA algorithm significantly outperforms other baseline MORL approaches.
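The weighted-Chebyshev (WC) scalarization at the heart of MOCHA collapses a vector of per-objective returns into one scalar by taking the worst weighted deviation from a reference (utopia) point; sweeping the weight vector $\mathbf{p}$ then traces out different Pareto-stationary solutions. The snippet below is a minimal sketch of the scalarization idea only, not the paper's actual algorithm; the reference point and return values are purely illustrative:

```python
import numpy as np

def weighted_chebyshev(values, weights, ref_point):
    """Weighted-Chebyshev scalarization of a multi-objective return vector.

    Returns the largest weighted absolute deviation of the per-objective
    returns from a reference (utopia) point; minimizing this scalar pulls
    the policy toward the Pareto front in the direction set by `weights`.
    """
    return float(np.max(weights * np.abs(ref_point - values)))

# Illustrative numbers (assumed, not from the paper):
ref = np.array([1.0, 1.0])    # utopia point, e.g. best-known per-objective returns
vals = np.array([0.6, 0.9])   # per-objective returns of some candidate policy

# Different weight vectors emphasize different objectives.
for p in (np.array([0.5, 0.5]), np.array([0.9, 0.1])):
    print(p, weighted_chebyshev(vals, p, ref))
```

Note the dependence on the smallest weight entry: as $p_{\min} \to 0$, the corresponding objective's deviation is almost ignored by the max, which is consistent with the abstract's observation that the sample complexity depends on $p_{\min}$.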
Problem

Research questions and friction points this paper is trying to address.

Explores Pareto-stationary solutions in MORL
Addresses non-convex reward objectives with finite-time sample complexity guarantees
Proposes MOCHA algorithm for systematic exploration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Objective Weighted-Chebyshev Actor-Critic
Finite-time sample complexity guarantee
Systematic Pareto-stationarity exploration
Fnu Hairi
Department of Computer Science, University of Wisconsin-Whitewater, Whitewater, WI, USA
Jiao Yang
Amazon, Seattle, USA
Tianchen Zhou
Amazon
Reinforcement Learning, Multi-Armed Bandit, Multi-Objective Optimization
Haibo Yang
Rochester Institute of Technology
Federated Learning, Optimization, Machine Learning
Chaosheng Dong
Amazon
Optimization, Machine Learning, Information Retrieval
Fan Yang
Amazon, Seattle, USA
Michinari Momma
Principal Applied Scientist at Amazon.com
Yan Gao
Amazon, Seattle, USA
Jia Liu
Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA