🤖 AI Summary
This study investigates whether the “confirmation bias” and “positivity bias” observed in human performance on a two-armed Bernoulli bandit task reflect genuine cognitive biases or are merely epiphenomena arising from decaying learning rates. Using Bayesian inference modeling, stochastic dynamical analysis, and master equation theory, we formally demonstrate that symmetric yet decaying learning rates alone suffice to reproduce the canonical statistical signatures of these biases; moreover, Bayesian updating is mathematically equivalent to Q-learning with a decaying learning rate. We show that standard asymmetric Q-learning models readily misattribute learning-rate decay to cognitive asymmetry. To resolve this confound, we propose a novel experimental paradigm and a parameter-identifiability framework enabling empirical discrimination between true cognitive biases and learning-rate dynamics. Our results establish rigorous model-selection criteria for causal interpretation of cognitive biases, advancing computational psychiatry and decision neuroscience.
📝 Abstract
Recent studies claim that human behavior in a two-armed Bernoulli bandit (TABB) task is described by positivity and confirmation biases, implying that humans do not integrate new information objectively. However, we find that even if the agent updates its belief via objective Bayesian inference, fitting the standard Q-learning model with asymmetric learning rates still recovers both biases. Bayesian inference cast as an effective Q-learning algorithm has symmetric, though decreasing, learning rates. We explain this by analyzing the stochastic dynamics of these learning systems using master equations. We find that both confirmation bias and unbiased but decreasing learning rates yield the same behavioral signatures. Finally, we propose experimental protocols to disentangle true cognitive biases from artifacts of decreasing learning rates.
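The claim that Bayesian inference acts as Q-learning with a symmetric, decreasing learning rate can be illustrated with a minimal sketch. It assumes a single Bernoulli arm with a conjugate Beta(a, b) prior, which the abstract does not specify; the function names and the reward probability of 0.7 are hypothetical choices for illustration only. Under that assumption the posterior mean obeys Q ← Q + α(r − Q) with α = 1/(a + b + n + 1) after n observations, the same rate for positive and negative prediction errors.

```python
import random


def bayes_posterior_means(rewards, a=1.0, b=1.0):
    """Posterior mean of a Beta(a, b) belief after each Bernoulli reward."""
    means = []
    for r in rewards:
        a += r          # count of rewarded outcomes (plus prior)
        b += 1 - r      # count of unrewarded outcomes (plus prior)
        means.append(a / (a + b))
    return means


def decaying_q_learning(rewards, a=1.0, b=1.0):
    """Q-learning with one learning rate shared by both error signs."""
    q = a / (a + b)     # start from the prior mean
    n = a + b           # effective number of prior observations
    values = []
    for r in rewards:
        alpha = 1.0 / (n + 1.0)   # symmetric, shrinks with every observation
        q += alpha * (r - q)
        n += 1.0
        values.append(q)
    return values


# Assumed setup: one arm paying out with probability 0.7 (illustrative only).
rng = random.Random(0)
rewards = [1 if rng.random() < 0.7 else 0 for _ in range(50)]

bayes = bayes_posterior_means(rewards)
qvals = decaying_q_learning(rewards)

# Largest disagreement between the two trajectories (floating-point noise only).
max_gap = max(abs(x - y) for x, y in zip(bayes, qvals))
print(max_gap)
```

The two trajectories coincide up to rounding, so any asymmetry recovered by fitting a constant, asymmetric learning-rate model to such an agent would have to come from the fitted model rather than from the agent's update rule.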