🤖 AI Summary
Existing research lacks foundation models that predict consumer choice behavior directly from raw eye-tracking time series (e.g., gaze coordinates). Method: We propose STARE, a novel framework that tokenizes continuous gaze coordinates into spatiotemporal token sequences and, for the first time in eye-tracking analysis, adapts the temporal foundation model Chronos. STARE integrates co-attention to model monocular spatiotemporal dynamics and cross-attention to capture binocular coordination, enabling end-to-end choice prediction without handcrafted features. Contribution/Results: STARE learns dynamic visual attention patterns directly from raw gaze data. Evaluated across multiple consumer decision-making datasets, it significantly outperforms state-of-the-art methods, demonstrating both effectiveness in modeling visual attention and strong generalization for choice prediction.
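As a rough illustration of the binocular cross-attention idea (the embedding dimensions, layer placement, and how the outputs feed into Chronos are assumptions here, not the paper's specification), each eye's token embeddings can attend to the other eye's sequence:

```python
import torch
import torch.nn as nn

class BinocularCrossAttention(nn.Module):
    """Illustrative cross-attention between left- and right-eye token embeddings.

    Each eye's sequence queries the other's, loosely mirroring the
    interocular-coordination mechanism described in the summary.
    """
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.left_to_right = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.right_to_left = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, left, right):
        # left, right: (batch, seq_len, d_model) embeddings, one stream per eye
        left_out, _ = self.left_to_right(query=left, key=right, value=right)
        right_out, _ = self.right_to_left(query=right, key=left, value=left)
        return left_out, right_out
```

Letting each stream query the other is one straightforward way to expose interocular dependencies to a downstream choice predictor.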
📝 Abstract
The present work proposes a deep learning architecture for predicting various consumer choice behaviors from time series of raw gaze or eye fixations on images of the decision environment, a task for which no foundation models are currently available. The architecture, called STARE (Spatio-Temporal Attention Representation for Eye Tracking), uses a new tokenization strategy that maps the x- and y-pixel coordinates of eye-movement time series onto predefined, contiguous Regions of Interest. This tokenization makes the spatio-temporal eye-movement data available to Chronos, a time-series foundation model based on the T5 architecture, to which co-attention and/or cross-attention is added to capture directional and/or interocular influences of eye movements. We compare STARE with several state-of-the-art alternatives on multiple datasets, with the goal of predicting consumer choice behaviors from eye movements. We thus take a first step toward developing and testing deep learning architectures that represent visual attention dynamics rooted in the neurophysiology of eye movements.
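As a minimal sketch of such a tokenization (grid resolution, screen size, and cell layout are illustrative assumptions; the paper's actual Region-of-Interest definitions may differ), raw pixel coordinates can be discretized into contiguous cells whose indices serve as the discrete tokens fed to the time-series model:

```python
import numpy as np

def tokenize_gaze(xy, screen_w=1920, screen_h=1080, grid_w=32, grid_h=18):
    """Map raw (x, y) gaze coordinates to discrete grid-cell tokens.

    xy: array of shape (T, 2) with pixel coordinates over time.
    Returns an integer token sequence of shape (T,), one token per sample,
    with token ids in [0, grid_w * grid_h).
    """
    x = np.clip(xy[:, 0], 0, screen_w - 1)
    y = np.clip(xy[:, 1], 0, screen_h - 1)
    col = (x / screen_w * grid_w).astype(int)   # horizontal cell index
    row = (y / screen_h * grid_h).astype(int)   # vertical cell index
    return row * grid_w + col

# Example: two gaze samples mapped to their grid-cell tokens
tokens = tokenize_gaze(np.array([[960.0, 540.0], [100.0, 50.0]]))
```

Discretizing into a fixed vocabulary of spatial cells is what lets a text-style foundation model such as Chronos consume the eye-movement sequence directly.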