A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning

📅 2026-04-27
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the challenge in multi-objective reinforcement learning (MORL) of generalizing to arbitrary preference weightings when user preferences are unknown. It introduces reward-free reinforcement learning (RFRL) into MORL for the first time in a systematic manner, leveraging RFRL objectives as auxiliary tasks to enhance policy generalization to unseen preference-weighted rewards. The proposed approach integrates a conditional policy network with a preference-guided exploration mechanism, enabling effective knowledge transfer across reward functions and sample-efficient learning. Evaluated on multiple MO-Gymnasium benchmarks, the method significantly outperforms existing MORL algorithms, achieving state-of-the-art performance in both final policy quality and data efficiency.

📝 Abstract
Many sequential decision-making tasks involve optimizing multiple conflicting objectives, requiring policies that adapt to different user preferences. In multi-objective reinforcement learning (MORL), one widely studied approach addresses this by training a single policy network conditioned on preference-weighted rewards. In this paper, we explore a novel algorithmic perspective: leveraging reward-free reinforcement learning (RFRL) for MORL. While RFRL has historically been studied independently of MORL, it learns optimal policies for any possible reward function, making it a natural fit for MORL's challenge of handling unknown user preferences. We propose using RFRL's training objective as an auxiliary task to enhance MORL, enabling more effective knowledge sharing beyond the multi-objective reward function given at training time. To this end, we adapt a state-of-the-art RFRL algorithm to the MORL setting and introduce a preference-guided exploration strategy that focuses learning on relevant parts of the environment. Through extensive experiments and ablation studies, we demonstrate that our approach significantly outperforms state-of-the-art MORL methods across diverse MO-Gymnasium tasks, achieving superior performance and data efficiency. This work provides the first systematic adaptation of RFRL to MORL, demonstrating its potential as a scalable and empirically effective solution to multi-objective policy learning.
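
To make "a single policy network conditioned on preference-weighted rewards" concrete, here is a minimal sketch of the general idea, not the paper's implementation. It assumes linear scalarization (a weighted sum of per-objective rewards) and Dirichlet sampling of preferences; the abstract does not specify either choice. The function names `sample_preference`, `scalarize`, and `policy_input` are hypothetical.

```python
import numpy as np


def sample_preference(num_objectives, rng):
    """Draw a preference vector from the probability simplex.

    Assumption: a flat Dirichlet prior; the paper's sampling scheme is not stated.
    """
    return rng.dirichlet(np.ones(num_objectives))


def scalarize(reward_vector, preference):
    """Linear scalarization (assumed): weighted sum of per-objective rewards."""
    return float(np.dot(preference, reward_vector))


def policy_input(observation, preference):
    """Concatenate observation and preference so one network can serve all preferences."""
    return np.concatenate([observation, preference])


rng = np.random.default_rng(0)
w = sample_preference(2, rng)           # e.g. trade-off between two objectives
r = np.array([1.0, -0.5])               # vector reward from a two-objective task
obs = np.zeros(4)                       # placeholder observation
scalar_reward = scalarize(r, w)         # the scalar signal the policy is trained on
x = policy_input(obs, w)                # network input of size obs_dim + num_objectives
```

In this setup the same transition yields a different scalar reward for each sampled preference, which is why an objective that covers arbitrary reward functions, as RFRL does, is a plausible auxiliary signal.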
Problem

Research questions and friction points this paper is trying to address.

multi-objective reinforcement learning
reward-free reinforcement learning
user preferences
policy adaptation
sequential decision-making
Innovation

Methods, ideas, or system contributions that make the work stand out.

reward-free reinforcement learning
multi-objective reinforcement learning
preference-guided exploration
auxiliary task
policy generalization