🤖 AI Summary
Existing bounded-rational decision models—based on entropy, KL divergence, or mutual information—fail to capture action-space locality, prior biases, and zero-support distributions in ordinal action spaces. Method: This paper introduces the Wasserstein distance into bounded-rational reinforcement learning, proposing a novel decision-theoretic framework explicitly designed for ordinal action spaces. The framework naturally encodes action “stickiness” and geometric proximity, accommodates asymmetric supports, zero-probability priors, and asymmetric ground distances, and enables tractable optimization via constrained variational inference. Contribution/Results: Experiments demonstrate that the model significantly improves policy interpretability and robustness. It accurately captures action-switching inertia and sensitivity to prior beliefs in simulation, offering a more empirically grounded, optimal-transport alternative to information-theoretic paradigms for ordinal decision-making under bounded rationality.
📝 Abstract
Modelling bounded-rational decision-making through information-constrained processing provides a principled approach for representing departures from rationality within a reinforcement learning framework, while still treating decision-making as an optimization process. However, existing approaches are generally based on entropy, Kullback-Leibler (KL) divergence, or mutual information. In this work, we highlight issues with these approaches when dealing with ordinal action spaces. Specifically, entropy assumes uniform prior beliefs, missing the impact of a priori biases on decision-making. KL divergence addresses this but has no notion of "nearness" of actions; it also has several well-known, potentially undesirable properties, such as a lack of symmetry, and furthermore requires the distributions to have the same support (i.e. positive probability for all actions). Mutual information is often difficult to estimate. Here, we propose an alternative approach for modelling bounded-rational RL agents utilising Wasserstein distances. This approach overcomes the aforementioned issues. Crucially, it accounts for the nearness of ordinal actions, modelling "stickiness" in agent decisions and the unlikeliness of rapidly switching to far-away actions, while also supporting low-probability actions and zero-support prior distributions, and it is simple to calculate directly.
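The contrast the abstract draws can be illustrated numerically. The sketch below (a minimal illustration, not code from the paper) compares a switch to an adjacent ordinal action against a switch to a distant one: the Wasserstein-1 distance scales with the ground distance between actions, whereas KL divergence is undefined here because the distributions have disjoint supports, and would treat both switches identically even if they overlapped.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Ordinal action space: actions 0..4 with natural ground distance |i - j|.
actions = np.arange(5)

# A deterministic prior on action 0 (zero probability elsewhere is allowed),
# and two candidate policies: one switching to the adjacent action, one to
# the farthest action.
prior = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
near  = np.array([0.0, 1.0, 0.0, 0.0, 0.0])
far   = np.array([0.0, 0.0, 0.0, 0.0, 1.0])

# Wasserstein-1 distance reflects how far probability mass must move,
# so nearby switches are cheaper than distant ones.
w_near = wasserstein_distance(actions, actions, near, prior)
w_far  = wasserstein_distance(actions, actions, far, prior)
print(w_near, w_far)  # 1.0 4.0

# KL divergence, by contrast, is infinite for both policies (disjoint
# supports), and carries no notion of the ordinal distance between actions.
```

Used as a policy-deviation cost, this distance penalises jumps to far-away actions more heavily than moves to neighbouring ones, which is the "stickiness" the abstract describes.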