🤖 AI Summary
This study examines how attention guides choice in contextual bandit tasks under dimensional shifts (where new feature values are introduced and feature-outcome relationships may stay fixed or change), delayed feedback, and counterfactual feedback. Motivated by questions about how well reward prediction error (RPE)-based attention models generalize, the authors compare two modeling approaches: one that computes an information-theoretic metric over a memory of past experiences, and one that iteratively updates attention from RPEs. The comparison uses simulations of a contextual bandit task with intradimensional and extradimensional shifts and with immediate, delayed, and counterfactual feedback. In these simulations, the information-theoretic model best accounts for human-like behavior when dimensions shift and feedback presentation changes. The results suggest that information-theoretic metrics may be better suited than RPEs to predicting human attention in decision making, although further studies of human behavior are needed to confirm this.
📝 Abstract
Attention can be used to inform choice selection in contextual bandit tasks even when context features have not been previously experienced. One example is a dimensional shift, where additional feature values are introduced and the relationship between features and outcomes can be either static or variable. Attentional mechanisms have been extensively studied in contextual bandit tasks where feedback on choices is provided immediately, but less research has addressed tasks where feedback is delayed or counterfactual. Some methods have successfully modeled human attention with immediate feedback based on reward prediction errors (RPEs), though recent research raises questions about the applicability of RPEs to more general attentional mechanisms. Alternative models suggest that information-theoretic metrics can be used to model human attention, with broader applications to novel stimuli. In this paper, we compare two methods for modeling how humans attend to specific features of decision-making tasks: one that calculates an information-theoretic metric using a memory of past experiences, and another that iteratively updates attention from reward prediction errors. We compare these models using simulations in a contextual bandit task with both intradimensional and extradimensional domain shifts, as well as immediate, delayed, and counterfactual feedback. We find that calculating an information-theoretic metric over a history of experiences is best able to account for human-like behavior in tasks that shift dimensions and alter feedback presentation. These results indicate that information-theoretic metrics of attentional mechanisms may be better suited than RPEs to predict human attention in decision making, though further studies of human behavior are necessary to support these results.
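The abstract does not give equations for either model, so the following Python sketch is only an illustration of the two general ideas, not the authors' implementation. It assumes that the information-theoretic metric is the information gain (mutual information) between each feature and the reward, computed over a stored history, and that the RPE model takes a simple gradient-style step on the attention weights. The function names, the learning rate `lr`, and the normalization scheme are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain_attention(history, n_features):
    """Attention weights computed from a memory of (features, reward) pairs.

    Assumption: each feature is weighted by how much knowing its value
    reduces uncertainty about the reward, and weights are normalized to sum to 1.
    """
    rewards = [r for _, r in history]
    h_reward = entropy(Counter(rewards).values())
    gains = []
    for i in range(n_features):
        by_value = defaultdict(list)
        for feats, r in history:
            by_value[feats[i]].append(r)
        # Conditional entropy of reward given the value of feature i.
        h_cond = sum(
            len(rs) / len(history) * entropy(Counter(rs).values())
            for rs in by_value.values()
        )
        gains.append(max(0.0, h_reward - h_cond))  # clamp float round-off
    total = sum(gains)
    if total <= 0:
        return [1.0 / n_features] * n_features  # no feature is informative
    return [g / total for g in gains]


def rpe_attention_step(weights, values, features, reward, lr=0.1):
    """One iterative attention update driven by a reward prediction error.

    Assumption: the prediction is an attention-weighted sum of learned
    feature values, and attention moves along the gradient of that prediction.
    `values` is a list of dicts mapping feature value -> estimated reward.
    """
    feature_values = [values[i].get(f, 0.0) for i, f in enumerate(features)]
    prediction = sum(w * v for w, v in zip(weights, feature_values))
    rpe = reward - prediction
    # Update the value estimate of each observed feature value.
    for i, f in enumerate(features):
        values[i][f] = feature_values[i] + lr * weights[i] * rpe
    # Shift attention toward features whose values explain the error.
    raw = [max(0.0, w + lr * rpe * v) for w, v in zip(weights, feature_values)]
    total = sum(raw)
    if total <= 0:
        return [1.0 / len(raw)] * len(raw), rpe
    return [x / total for x in raw], rpe
```

The design difference the abstract points to is visible here: the first function recomputes attention from the whole memory, so it can respond as soon as new feature values or late feedback enter the history, while the second changes attention only in small steps on trials where feedback arrives.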