AI Summary
Existing evaluation metrics in multi-objective reinforcement learning (MORL) struggle to assess how well preference-conditioned agents respond to user intent and offer no quantitative measure of controllability. This work establishes controllability as a critical property of MORL systems and introduces a metric specifically designed to evaluate the controllability of preference-conditioned policies, along with a complementary evaluation protocol. The proposed approach reveals that current agents may perform well on standard benchmarks yet still be insensitive to preference inputs, a limitation obscured by prevailing evaluation practices. These findings highlight a significant gap in the dominant MORL assessment paradigm and motivate the community to re-examine how agent performance is evaluated in preference-based settings.
Abstract
Multi-objective reinforcement learning (MORL) allows a user to express preferences over outcomes in terms of the relative importance of the objectives, but standard metrics cannot capture whether changes in preference reliably change the agent's behavior in the intended way, a property termed controllability. As a result, preference-conditioned agents can score well on standard MORL metrics while being insensitive to the preference input. If the ability to control agents cannot be reliably assessed, the symbolic interface that MORL provides between user intent and agent behavior is broken. Mainstream MORL metrics alone fail to measure the controllability of preference-conditioned agents, motivating a complementary metric designed specifically for that purpose. We hope the results spur discussion in the community on existing evaluation protocols and help consolidate advances in preference adaptation in MORL as they scale to larger and more complex problems.