🤖 AI Summary
This work addresses the challenge of optimizing under the max-min criterion in multi-objective reinforcement learning (MORL). We formulate the problem as a two-player zero-sum regularized continuous game and develop an efficient policy-update algorithm based on mirror descent, which yields the first global last-iterate convergence guarantee for this setting. A key innovation is an adaptive regularization mechanism, coupled with a unified analytical framework that jointly handles exact and approximate policy evaluation and delivers tight sample complexity bounds. The theoretical analysis is established for tabular MDPs, while empirical evaluation demonstrates substantial improvements over existing baselines in deep RL settings. Overall, our method bridges rigorous theoretical foundations with practical efficacy for max-min MORL.
📝 Abstract
In this paper, we propose a provably convergent and practical framework for multi-objective reinforcement learning (MORL) under the max-min criterion. From a game-theoretic perspective, we reformulate max-min MORL as a two-player zero-sum regularized continuous game and introduce an efficient algorithm based on mirror descent. Our approach simplifies the policy update while ensuring global last-iterate convergence. We provide a comprehensive theoretical analysis of our algorithm, including iteration complexity under both exact and approximate policy evaluation, as well as sample complexity bounds. To further enhance performance, we equip the proposed algorithm with adaptive regularization. Our experiments demonstrate the convergence behavior of the proposed algorithm in tabular settings, and our deep reinforcement learning implementation significantly outperforms previous baselines in many MORL environments.
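To make the game-theoretic reformulation concrete, here is a minimal sketch of the core idea in a toy setting: the max player maintains a policy over actions and the min player maintains a weight vector over objectives, and both take entropy-regularized mirror-descent (exponentiated-gradient) steps on the bilinear payoff. This is an illustrative one-state (bandit) simplification under assumed step size `eta` and regularization strength `tau`, not the paper's actual algorithm; the reward matrix `R` stands in for exact policy evaluation.

```python
import numpy as np

def maxmin_mirror_descent(R, eta=0.1, tau=0.01, iters=2000):
    """Toy last-iterate mirror-descent sketch for max_pi min_w pi^T R w.

    R: (A, m) matrix of per-action vector rewards, a one-state stand-in
    for policy evaluation. tau weights the entropy regularization that
    stabilizes the last iterate (hypothetical illustrative values).
    """
    A, m = R.shape
    pi = np.full(A, 1.0 / A)   # max player: policy over actions
    w = np.full(m, 1.0 / m)    # min player: weights over objectives
    for _ in range(iters):
        # Exponentiated-gradient steps (entropy mirror map) on the
        # regularized payoff pi^T R w + tau*H(pi) - tau*H(w).
        pi = pi * np.exp(eta * (R @ w - tau * np.log(pi)))
        pi /= pi.sum()
        w = w * np.exp(-eta * (R.T @ pi + tau * np.log(w)))
        w /= w.sum()
    return pi, w
```

For example, with `R = [[1, 0], [0, 1], [0.6, 0.6]]` the last iterate concentrates on the third action, whose worst-case objective value (0.6) beats any mixture of the first two (at most 0.5); the regularization is what makes the last iterate, rather than only the running average, approach the equilibrium.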