TREX: Trajectory Explanations for Multi-Objective Reinforcement Learning

📅 2026-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited interpretability of policy decisions in multi-objective reinforcement learning, which obscures the underlying trade-off mechanisms among competing objectives. To this end, the paper proposes TREX, a novel framework that uniquely integrates trajectory attribution with semantic behavior clustering. TREX generates expert trajectories, clusters temporally coherent behavioral segments, and trains complementary policies to quantitatively assess each behavior’s contribution to the Pareto front. By doing so, it overcomes the constraints of single-reward interpretable reinforcement learning and enables user-preference-driven analysis of multi-objective trade-offs. Empirical evaluation on multi-objective MuJoCo benchmarks—including HalfCheetah, Ant, and Swimmer—demonstrates that TREX effectively identifies and quantifies the specific contributions of key behavioral patterns to multi-objective optimization trade-offs.

📝 Abstract
Reinforcement Learning (RL) has demonstrated its ability to solve complex decision-making problems in a variety of domains by optimizing reward signals obtained through interaction with an environment. However, many real-world scenarios involve multiple, potentially conflicting objectives that cannot be easily represented by a single scalar reward. Multi-Objective Reinforcement Learning (MORL) addresses this limitation by enabling agents to optimize several objectives simultaneously, explicitly reasoning about trade-offs between them. However, the "black box" nature of RL models makes the decision process behind chosen objective trade-offs unclear. Current Explainable Reinforcement Learning (XRL) methods are typically designed for single scalar rewards and do not account for explanations with respect to distinct objectives or user preferences. To address this gap, in this paper we propose TREX, a Trajectory-based Explainability framework for explaining Multi-objective Reinforcement Learning policies, based on trajectory attribution. TREX generates trajectories directly from the learned expert policy across different user preferences, and clusters them into semantically meaningful temporal segments. We quantify the influence of these behavioural segments on the Pareto trade-off by training complementary policies that exclude specific clusters, measuring the resulting relative deviation in the observed rewards and actions compared to the original expert policy. Experiments on multi-objective MuJoCo environments (HalfCheetah, Ant, and Swimmer) demonstrate the framework's ability to isolate and quantify the behavioural patterns that shape multi-objective trade-offs.
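The core attribution step in the abstract (train complementary policies that exclude specific behaviour clusters, then measure relative deviation from the expert's rewards) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `attribution_scores`, the toy cluster labels, and the averaged relative-deviation formula are illustrative assumptions standing in for the paper's actual measure.

```python
import numpy as np

def attribution_scores(expert_rewards, ablated_rewards):
    """Relative deviation of each cluster-ablated policy's multi-objective
    returns from the expert returns. A simplified stand-in for the
    cluster-attribution measure described in the abstract."""
    expert = np.asarray(expert_rewards, dtype=float)  # shape: (n_objectives,)
    scores = {}
    for cluster_id, rewards in ablated_rewards.items():
        ablated = np.asarray(rewards, dtype=float)
        # Per-objective relative deviation, averaged across objectives.
        # The epsilon guards against division by a zero expert return.
        scores[cluster_id] = float(
            np.mean(np.abs(expert - ablated) / (np.abs(expert) + 1e-8))
        )
    return scores

# Toy example: two objectives (e.g. speed vs. energy efficiency) and three
# hypothetical behaviour clusters found in HalfCheetah-like trajectories.
expert = [10.0, 5.0]
ablated = {
    "gallop":  [4.0, 5.0],   # removing this cluster collapses the speed objective
    "coast":   [10.0, 2.5],  # removing this cluster halves the energy objective
    "neutral": [9.9, 4.9],   # near-irrelevant cluster
}
scores = attribution_scores(expert, ablated)
print(max(scores, key=scores.get))  # cluster with the largest influence
```

A high score marks a behaviour cluster whose removal most distorts the Pareto trade-off achieved by the expert policy; in the toy data above, ablating the "gallop" cluster produces the largest relative deviation.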
Problem

Research questions and friction points this paper is trying to address.

Multi-Objective Reinforcement Learning
Explainable AI
Trajectory Explanation
Pareto Trade-off
Reinforcement Learning Interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Objective Reinforcement Learning
Explainable AI
Trajectory Attribution
Pareto Trade-off
Behavioral Clustering
Dilina Rajapakse
Trinity College Dublin, Ireland
Juan C. Rosero
Trinity College Dublin, Ireland
Ivana Dusparic
Professor in Computer Science, Trinity College Dublin
reinforcement learning, self-adaptive systems, multi-agent systems, intelligent mobility