🤖 AI Summary
This work addresses the challenge of optimizing under the max-min criterion in multi-objective reinforcement learning (MORL). We formulate the problem as a two-player zero-sum regularized continuous game and develop an efficient policy-update algorithm based on mirror descent, which yields the first global last-iterate convergence guarantee for this setting. A key innovation is an adaptive regularization mechanism, coupled with a unified analytical framework that jointly handles exact and approximate policy evaluation and delivers tight sample complexity bounds. The theoretical analysis is established for tabular MDPs, while empirical evaluation demonstrates substantial improvements over existing baselines in deep RL settings. Overall, our method bridges rigorous theoretical foundations with practical efficacy for max-min MORL.
📝 Abstract
In this paper, we propose a provably convergent and practical framework for multi-objective reinforcement learning (MORL) under the max-min criterion. From a game-theoretic perspective, we reformulate max-min MORL as a two-player zero-sum regularized continuous game and introduce an efficient algorithm based on mirror descent. Our approach simplifies the policy update while ensuring global last-iterate convergence. We provide a comprehensive theoretical analysis of our algorithm, including iteration complexity under both exact and approximate policy evaluation, as well as sample complexity bounds. To further enhance performance, we equip the proposed algorithm with adaptive regularization. Our experiments demonstrate the convergence behavior of the proposed algorithm in tabular settings, and our deep reinforcement learning implementation significantly outperforms previous baselines in many MORL environments.
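To make the game-theoretic reformulation concrete, here is a minimal sketch of the core idea in a toy setting: the max player maintains a policy over actions and the min player maintains a weight vector over objectives, and both take entropy-regularized mirror-descent (exponentiated-gradient) steps on the bilinear payoff. This is an illustrative one-state (bandit) simplification under assumed step size `eta` and regularization strength `tau`, not the paper's actual algorithm; the reward matrix `R` stands in for exact policy evaluation.

```python
import numpy as np

def maxmin_mirror_descent(R, eta=0.1, tau=0.01, iters=2000):
    """Toy last-iterate mirror-descent sketch for max_pi min_w pi^T R w.

    R: (A, m) matrix of per-action vector rewards, a one-state stand-in
    for policy evaluation. tau weights the entropy regularization that
    stabilizes the last iterate (hypothetical illustrative values).
    """
    A, m = R.shape
    pi = np.full(A, 1.0 / A)   # max player: policy over actions
    w = np.full(m, 1.0 / m)    # min player: weights over objectives
    for _ in range(iters):
        # Exponentiated-gradient steps (entropy mirror map) on the
        # regularized payoff pi^T R w + tau*H(pi) - tau*H(w).
        pi = pi * np.exp(eta * (R @ w - tau * np.log(pi)))
        pi /= pi.sum()
        w = w * np.exp(-eta * (R.T @ pi + tau * np.log(w)))
        w /= w.sum()
    return pi, w
```

For example, with `R = [[1, 0], [0, 1], [0.6, 0.6]]` the last iterate concentrates on the third action, whose worst-case objective value (0.6) beats any mixture of the first two (at most 0.5); the regularization is what makes the last iterate, rather than only the running average, approach the equilibrium.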