🤖 AI Summary
This study addresses the problem of inferring human implicit (covert) attention patterns solely from Atari gameplay action sequences, without eye-tracking or neural data. To this end, we propose the Contextualized Task-Relevant (CTR) attention network, the first method capable of generating sparse, decision-oriented implicit attention maps directly from offline reinforcement learning (RL) trajectories. The maps are validated via attention distillation and compared against the eye-tracking-based temporally integrated overt attention (TIOA) model. Human CTR maps align spatially, temporally, and at the policy level with the TIOA maps more closely than the maps of standard RL agents do, which supports their physiological plausibility. Our key contributions are threefold: (1) establishing the first implicit attention modeling framework that relies exclusively on action sequences; (2) revealing fundamental differences between how humans and RL agents allocate attention; and (3) introducing a novel, non-invasive paradigm for cognitive modeling grounded in behavioral traces.
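A temporally integrated map turns discrete gaze samples into attention that persists across frames. The sketch below is one plausible way to build such a map, offered as an assumption and not as the TIOA model itself: it places an isotropic Gaussian around each gaze point and sums the per-frame maps with exponential decay, so older fixations fade. The function names `gaussian_fixation_map` and `temporally_integrated_map` and the parameters `sigma` and `decay` are hypothetical.

```python
import math


def gaussian_fixation_map(height, width, gaze_row, gaze_col, sigma):
    """Return a height x width grid with an isotropic Gaussian centered on one gaze point."""
    return [
        [math.exp(-((r - gaze_row) ** 2 + (c - gaze_col) ** 2) / (2 * sigma ** 2))
         for c in range(width)]
        for r in range(height)
    ]


def temporally_integrated_map(gaze_points, height, width, sigma=1.0, decay=0.5):
    """Hypothetical TIOA-style map: exponentially decayed sum of per-frame fixation maps.

    gaze_points: list of (row, col) gaze positions, oldest first.
    decay: assumed per-frame weight multiplier; the newest frame gets weight 1.
    """
    integrated = [[0.0] * width for _ in range(height)]
    n = len(gaze_points)
    for t, (gaze_row, gaze_col) in enumerate(gaze_points):
        weight = decay ** (n - 1 - t)  # older frames are down-weighted geometrically
        fixation = gaussian_fixation_map(height, width, gaze_row, gaze_col, sigma)
        for r in range(height):
            for c in range(width):
                integrated[r][c] += weight * fixation[r][c]
    # Normalize to sum to 1 so the map is comparable with other attention maps.
    total = sum(sum(row) for row in integrated)
    return [[v / total for v in row] for row in integrated] if total > 0 else integrated


# Usage: two gaze samples on a 5x5 grid; the later one at (4, 4) dominates the map.
tioa_like = temporally_integrated_map([(0, 0), (4, 4)], height=5, width=5)
```

The exponential decay is just one simple weighting choice; the temporal integration actually used by TIOA may differ.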
📝 Abstract
This study introduces a novel method for revealing human covert attention patterns from gameplay data alone, using offline attention techniques from reinforcement learning (RL). We propose the contextualized, task-relevant (CTR) attention network, which generates attention maps from both human and RL agent gameplay in Atari environments. These maps are sparse yet retain the information necessary for the current player's decision-making. We compare the CTR-derived attention maps with a temporally integrated overt attention (TIOA) model based on eye-tracking data, which serves as a point of comparison and discussion. Visual inspection reveals distinct attention patterns: human CTR maps focus on the player and on nearby opponents, occasionally shifting between a stronger, narrower focus and broader views, and sometimes even attending to empty space ahead. In contrast, agent maps maintain a consistently broad focus on most objects, including distant ones and the player. Quantitative analysis further shows that human CTR maps align more closely with TIOA than agent maps do. Our findings indicate that the CTR attention network can effectively reveal human covert attention patterns from gameplay alone, without requiring additional data such as brain activity recordings. This work contributes to understanding how human and agent attention differ and enables the development of RL agents augmented with human covert attention.
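The abstract does not name the metrics behind its quantitative comparison. As a hedged sketch, two common saliency-map metrics could express "aligns more closely": Pearson correlation (CC, higher is closer) and Kullback-Leibler divergence (KL, lower is closer). The functions `pearson_cc` and `kl_divergence` and the 3x3 toy maps are hypothetical illustrations chosen to mimic the qualitative patterns described above; they are not the study's metrics or data.

```python
import math


def flatten(grid):
    """Flatten a 2-D list-of-lists attention map into a 1-D list."""
    return [v for row in grid for v in row]


def pearson_cc(map_a, map_b):
    """Pearson correlation (CC) between two same-sized maps; 1.0 is a perfect positive linear match."""
    a, b = flatten(map_a), flatten(map_b)
    if len(a) != len(b):
        raise ValueError("maps must have the same number of cells")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # CC is undefined for a constant map; treated here as no alignment
    return cov / math.sqrt(var_a * var_b)


def kl_divergence(reference, candidate, eps=1e-12):
    """KL(reference || candidate) after normalizing both maps to sum to 1; lower means closer."""
    p, q = flatten(reference), flatten(candidate)
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("maps must have positive total mass")
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    # eps keeps the log finite where the candidate assigns zero attention
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical 3x3 toy maps (illustrative values, not data from the study).
tioa = [[0.0, 0.1, 0.0], [0.1, 0.6, 0.1], [0.0, 0.1, 0.0]]   # gaze-based reference centered on the player
human = [[0.0, 0.2, 0.0], [0.1, 0.5, 0.1], [0.0, 0.1, 0.0]]  # player plus nearby cells
agent = [[0.2, 0.0, 0.2], [0.0, 0.2, 0.0], [0.2, 0.0, 0.2]]  # spread over distant corners and the player

print(pearson_cc(tioa, human) > pearson_cc(tioa, agent))        # True
print(kl_divergence(tioa, human) < kl_divergence(tioa, agent))  # True
```

CC is insensitive to the overall scale of a map, while KL heavily penalizes a candidate that assigns near-zero attention where the reference has mass, so reporting both gives complementary views of alignment.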