🤖 AI Summary
Existing methods for non-convex multi-objective reinforcement learning (MORL) suffer from difficulty in exploring Pareto-stationary policies and lack finite-time theoretical guarantees.
Method: We propose MOCHA, the first algorithm to deeply integrate weighted Chebyshev scalarization with the Actor-Critic framework, incorporating dynamic weight adaptation and gradient-based policy updates to systematically explore the Pareto-stationary policy set.
Contributions/Results: Theoretically, we establish the first finite-time sample complexity analysis for Pareto stationarity in non-convex MORL, proving a convergence rate of $\tilde{\mathcal{O}}(\varepsilon^{-2})$ that explicitly depends on the minimum component $p_{\min}$ of the weight vector. Empirically, MOCHA achieves significant improvements over state-of-the-art MORL baselines on the large-scale offline KuaiRand dataset, demonstrating both theoretical rigor and practical effectiveness.
📝 Abstract
In many multi-objective reinforcement learning (MORL) applications, being able to systematically explore the Pareto-stationary solutions under multiple non-convex reward objectives with a theoretical finite-time sample complexity guarantee is an important and yet under-explored problem. This motivates us to take the first step and fill this important gap in MORL. Specifically, in this paper, we propose a \underline{M}ulti-\underline{O}bjective weighted-\underline{CH}ebyshev \underline{A}ctor-critic (MOCHA) algorithm for MORL, which judiciously integrates the weighted-Chebyshev (WC) scalarization and the actor-critic framework to enable systematic Pareto-stationarity exploration with a finite-time sample complexity guarantee. The sample complexity result of the MOCHA algorithm reveals an interesting dependency on $p_{\min}$ in finding an $\varepsilon$-Pareto-stationary solution, where $p_{\min}$ denotes the minimum entry of a given weight vector $\mathbf{p}$ in the WC-scalarization. By carefully choosing learning rates, the sample complexity for each exploration can be $\tilde{\mathcal{O}}(\varepsilon^{-2})$. Furthermore, simulation studies on a large KuaiRand offline dataset show that the MOCHA algorithm significantly outperforms other baseline MORL approaches.
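To make the core scalarization concrete, here is a minimal sketch of a weighted-Chebyshev (WC) scalarization. This is an illustrative toy, not MOCHA itself: the ideal point `z_star`, the example return vectors, and the function name are all assumptions for demonstration. The WC value of a policy is its worst weighted gap to the ideal point, so minimizing it pushes every objective toward the ideal simultaneously, and sweeping the weight vector `p` traces out different Pareto-stationary trade-offs.

```python
def wc_scalarize(returns, weights, z_star):
    """Weighted-Chebyshev scalarization: the largest weighted gap between
    a policy's per-objective returns and the ideal point z_star.
    Smaller is better; minimizing this balances all objectives at once."""
    assert len(returns) == len(weights) == len(z_star)
    return max(p * (z - j) for p, z, j in zip(weights, z_star, returns))


# Two hypothetical policies' per-objective returns (e.g., two reward signals):
policy_a = [0.8, 0.3]   # strong on objective 1, weak on objective 2
policy_b = [0.6, 0.6]   # balanced across both objectives

z_star = [1.0, 1.0]     # assumed ideal point (best achievable per objective)
p = [0.5, 0.5]          # weight vector; here p_min = 0.5

# Policy A's worst weighted gap: max(0.5*0.2, 0.5*0.7) = 0.35
# Policy B's worst weighted gap: max(0.5*0.4, 0.5*0.4) = 0.20
# Under WC scalarization the balanced policy B is preferred.
print(wc_scalarize(policy_a, p, z_star))  # 0.35
print(wc_scalarize(policy_b, p, z_star))  # 0.2
```

Unlike a linear scalarization $\sum_i p_i J_i$, the max over weighted gaps lets WC reach Pareto points in non-convex regions of the objective space, which is why the abstract emphasizes the non-convex setting and the dependence on $p_{\min}$.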