🤖 AI Summary
Existing models of reading behavior rely heavily on aggregated eye-tracking measurements and strong assumptions, and so fail to capture the fine-grained spatiotemporal dynamics of fixations and saccades. To address this, the authors propose a marked spatiotemporal point process model with two components: (1) a Hawkes process models saccades, capturing how each fixation excites the probability of a new fixation nearby in time and space; (2) fixation durations are modeled as a function of fixation-specific predictors convolved across time, which captures spillover effects. Cognitive predictors such as contextual surprisal enter the model within this single probabilistic framework. Empirically, the Hawkes process model fits human saccades better than baselines. For fixation durations, adding contextual surprisal as a predictor yields only a marginal improvement in predictive accuracy, which suggests that surprisal theory struggles to explain fine-grained eye movements. The work offers a more general probabilistic framework for modeling where fixations land, when they occur, and how long they last.
📝 Abstract
Reading is a process that unfolds across space and time, alternating between fixations, where a reader focuses on a specific point in space, and saccades, where a reader rapidly shifts their focus to a new point. An ansatz of psycholinguistics is that modeling a reader's fixations and saccades yields insight into their online sentence processing. However, standard approaches to such modeling rely on aggregated eye-tracking measurements and models that impose strong assumptions, ignoring much of the spatio-temporal dynamics that occur during reading. In this paper, we propose a more general probabilistic model of reading behavior, based on a marked spatio-temporal point process, that captures not only how long fixations last, but also where they land in space and when they take place in time. The saccades are modeled using a Hawkes process, which captures how each fixation excites the probability of a new fixation occurring near it in time and space. The duration of each fixation is modeled as a function of fixation-specific predictors convolved across time, thus capturing spillover effects. Empirically, our Hawkes process model exhibits a better fit to human saccades than baselines. With respect to fixation durations, we observe that incorporating contextual surprisal as a predictor results in only a marginal improvement in the model's predictive accuracy. This finding suggests that surprisal theory struggles to explain fine-grained eye movements.
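To make the two ingredients of the abstract concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a one-dimensional reading position, an exponential temporal kernel, and a Gaussian spatial kernel. The function names (`hawkes_intensity`, `convolved_predictor`) and all parameter values (`mu`, `alpha`, `beta`, `sigma`, `rate`) are hypothetical choices made for illustration only; the abstract does not specify the paper's actual parameterization.

```python
import numpy as np


def hawkes_intensity(t, s, event_times, event_locs,
                     mu=0.1, alpha=0.5, beta=2.0, sigma=1.0):
    """Conditional intensity of a new fixation at time t and position s.

    Illustrative assumptions (not taken from the paper):
    - positions are 1-D (e.g. a character index along a line),
    - excitation decays exponentially in time with rate beta,
    - excitation is Gaussian in space with width sigma.
    """
    event_times = np.asarray(event_times, dtype=float)
    event_locs = np.asarray(event_locs, dtype=float)
    past = event_times < t                      # only earlier fixations excite
    dt = t - event_times[past]
    ds = s - event_locs[past]
    temporal = beta * np.exp(-beta * dt)        # normalized exponential kernel
    spatial = np.exp(-0.5 * (ds / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return mu + alpha * np.sum(temporal * spatial)


def convolved_predictor(onsets, values, rate=1.0):
    """Convolve a per-fixation predictor (e.g. surprisal) across time.

    Entry i sums the predictor values of fixation i and all earlier
    fixations, each weighted by exp(-rate * elapsed time), so that
    earlier fixations "spill over" onto later ones.
    """
    onsets = np.asarray(onsets, dtype=float)
    values = np.asarray(values, dtype=float)
    lags = onsets[:, None] - onsets[None, :]    # lags[i, j] = t_i - t_j
    weights = np.exp(-rate * np.clip(lags, 0.0, None))
    kernel = np.where(lags >= 0, weights, 0.0)  # ignore future fixations
    return kernel @ values
```

Under these assumptions, the intensity is highest just after a fixation and close to where it landed, then falls back toward the background rate `mu`. The convolved predictor could serve as an input to a regression on fixation duration, which is one simple way to express spillover from earlier words.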