🤖 AI Summary
This work extends the Rashomon effect to sequential decision-making, investigating the phenomenon in which multiple structurally heterogeneous policies (differing in feature attributions or policy parameters) induce identical state-action trajectory distributions, i.e., are behaviorally equivalent. A key challenge lies in rigorously defining and verifying this behavioral equivalence under stochastic dynamics.
Method: We propose a formal definition of policy equivalence grounded in probabilistic trajectory modeling, along with a verification framework for behavioral equivalence in Markov decision processes (MDPs).
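The verification idea can be sketched for a small tabular MDP: two stochastic policies induce the same state-action trajectory distribution exactly when they agree on every state that is reachable under either of them. The sketch below is an assumption-laden illustration of that criterion, not the paper's verification framework (which builds on formal methods); the helper names `induced_chain`, `reachable`, and `behaviorally_equivalent` are invented here.

```python
import numpy as np

def induced_chain(P, pi):
    """Markov chain induced by policy pi on MDP transitions P.

    P:  (S, A, S) tensor, P[s, a, t] = Pr(t | s, a).
    pi: (S, A) stochastic policy, pi[s, a] = Pr(a | s).
    Returns (S, S) matrix P_pi[s, t] = sum_a pi[s, a] * P[s, a, t].
    """
    return np.einsum("sa,sat->st", pi, P)

def reachable(P_pi, s0):
    """Boolean mask of states reachable from s0 under chain P_pi."""
    S = P_pi.shape[0]
    mask = np.zeros(S, dtype=bool)
    mask[s0] = True
    for _ in range(S):  # fixed point is reached after at most S sweeps
        mask = mask | ((mask.astype(float) @ P_pi) > 0)
    return mask

def behaviorally_equivalent(P, pi1, pi2, s0):
    """Equal trajectory distributions from s0 iff the policies agree
    on every state reachable under either induced chain."""
    m = reachable(induced_chain(P, pi1), s0) | reachable(induced_chain(P, pi2), s0)
    return np.allclose(pi1[m], pi2[m])
```

Two policies that differ only on unreachable states would thus count as behaviorally equivalent while remaining structurally distinct, which is the Rashomon phenomenon the paper studies.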
Contribution/Results: We theoretically characterize the Rashomon set of equivalent policies in MDPs and empirically demonstrate its ubiquity across mainstream reinforcement learning algorithms. Leveraging this set, we design a robust ensemble policy that achieves significant performance gains under distribution shift. Moreover, we synthesize permissive policies that preserve optimality while substantially reducing formal verification overhead.
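A minimal sketch of the ensemble idea (the plurality-vote rule and the function name `ensemble_action` are assumptions for illustration, not the paper's construction): members of the Rashomon set behave identically on the training distribution, so where they disagree after a distribution shift, voting over their greedy actions can damp any single member's failure mode.

```python
import numpy as np

def ensemble_action(policies, state):
    """Plurality vote over the greedy actions of each Rashomon-set member.

    policies: list of (S, A) arrays of action probabilities.
    Returns the most commonly chosen action index in `state`.
    """
    votes = [int(np.argmax(pi[state])) for pi in policies]
    return max(set(votes), key=votes.count)
```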
📝 Abstract
The Rashomon effect describes the phenomenon where multiple models trained on the same data produce identical predictions while differing in which features they rely on internally. This effect has been studied extensively in classification tasks, but not in sequential decision-making, where an agent learns a policy to achieve an objective by taking actions in an environment. In this paper, we translate the Rashomon effect to sequential decision-making. We define it as multiple policies that exhibit identical behavior, visiting the same states and selecting the same actions, while differing in their internal structure, such as feature attributions. Verifying identical behavior in sequential decision-making differs from verifying it in classification. In classification, predictions can be directly compared to ground-truth labels. In sequential decision-making with stochastic transitions, the same policy may succeed or fail on any single trajectory due to randomness. We address this using formal verification methods that construct and compare the complete probabilistic behavior of each policy in the environment. Our experiments demonstrate that the Rashomon effect exists in sequential decision-making. We further show that ensembles constructed from the Rashomon set exhibit greater robustness to distribution shifts than individual policies. Additionally, permissive policies derived from the Rashomon set reduce computational requirements for verification while maintaining optimal performance.
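The permissive-policy idea can be illustrated with a small sketch. A permissive policy maps each state to a *set* of admissible actions rather than a single one; taking, per state, the union of actions that any Rashomon-set member plays with positive probability yields one nondeterministic object whose verification covers every member at once. The function name `permissive_policy` and the representation are assumptions for illustration, not the paper's synthesis procedure.

```python
def permissive_policy(policies, n_states, tol=1e-9):
    """Union of supports of the Rashomon-set policies, per state.

    policies: iterable of (S, A) action-probability tables (nested lists
    or arrays). Returns {state: set of admissible action indices}.
    """
    allowed = {}
    for s in range(n_states):
        acts = set()
        for pi in policies:
            acts |= {a for a, p in enumerate(pi[s]) if p > tol}
        allowed[s] = acts
    return allowed
```

Verifying this single permissive policy once is cheaper than model-checking each Rashomon-set member separately, which is the computational saving the abstract refers to.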