🤖 AI Summary
This study investigates why humans prefer suboptimal “frequency matching”—selecting options with probabilities approximating their target occurrence frequencies—in stochastic reward environments.
Method: Using a “hide-and-seek” behavioral experiment, we construct a vector representation model based on choice-frequency histograms and formally propose and characterize “probability inverse matching”—a strategy implemented via vector reflection.
Contribution/Results: We demonstrate that only two orthogonal basis strategies—frequency matching versus inverse matching (exploratory responses), and reward maximization versus minimization (exploitative responses)—suffice to unify individual choice patterns across varying numbers of rooms and opponent distributions. A mixture model combining these strategies significantly improves behavioral prediction accuracy. Our findings reveal that human decision-making in such environments can be effectively captured within a low-dimensional strategy space, providing a concise, computationally tractable framework for understanding the cognitive mechanisms underlying adaptive decision-making.
📝 Abstract
When people pursue rewards in stochastic environments, they often match their choice frequencies to the observed target frequencies, even when this policy is demonstrably suboptimal. We used a "hide-and-seek" task to evaluate this behavior under conditions where pursuit (seeking) could be toggled to avoidance (hiding) while the probability distribution was held fixed, or where complexity was varied by changing the number of possible choices. We developed a model of participant choice built from choice-frequency histograms treated as vectors. We posited a probability-antimatching strategy for avoidance (hiding) rounds and formalized it as a vector reflection of probability matching. We found that only two basis policies, matching/antimatching and maximizing/minimizing, were sufficient to account for participant choices across a range of room numbers and opponent probability distributions. This schema requires only that people remember the relative frequencies of the different outcomes; given this knowledge, simple operations can construct the maximizing and minimizing policies as well as the matching and antimatching strategies. A mixture of these two policies captures human choice patterns in a stochastic environment.
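The basis policies described above can be sketched numerically. This is an illustrative sketch, not the paper's exact formalization: in particular, the choice of reflecting the matching vector about the uniform distribution (with clipping and renormalization) for antimatching, and the mixing weight, are assumptions made here for demonstration.

```python
import numpy as np

def matching(p):
    """Probability matching: choose each room with the opponent's
    observed frequency."""
    return p

def antimatching(p):
    """Illustrative antimatching: reflect the matching vector about
    the uniform distribution (q = 2u - p), then clip and renormalize
    to keep a valid probability vector. The paper's exact reflection
    may differ."""
    u = np.full_like(p, 1.0 / len(p))
    q = np.clip(2 * u - p, 0.0, None)
    return q / q.sum()

def maximizing(p):
    """Exploitative seeking: deterministically pick the most
    frequent room."""
    q = np.zeros_like(p)
    q[np.argmax(p)] = 1.0
    return q

def minimizing(p):
    """Exploitative hiding: deterministically pick the least
    frequent room."""
    q = np.zeros_like(p)
    q[np.argmin(p)] = 1.0
    return q

def mixture(p, w_explore, explore, exploit):
    """Convex mixture of an exploratory policy (matching or
    antimatching) and an exploitative one (maximizing or minimizing)."""
    return w_explore * explore(p) + (1.0 - w_explore) * exploit(p)

# Hypothetical 4-room environment: opponent favors room 0.
p = np.array([0.50, 0.25, 0.15, 0.10])
seek_policy = mixture(p, 0.6, matching, maximizing)      # seeking rounds
hide_policy = mixture(p, 0.6, antimatching, minimizing)  # hiding rounds
```

Seeking rounds concentrate choice on the opponent's most frequent room, while hiding rounds shift it toward the least frequent room; both policies remain valid probability distributions.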