🤖 AI Summary
This paper introduces and addresses the novel problem of "open-ended text-specific information-seeking goal decoding" from eye-tracking data: can fine-grained, unconstrained reading intentions, unspecified a priori, be automatically inferred from natural reading eye-movement trajectories? To this end, the authors propose a dual-task framework that integrates multimodal large language models (LLMs), sequential eye-movement modeling, text–eye-movement alignment representations, and generative goal reconstruction. The contributions are threefold: (1) a formal definition of the first open-ended reading goal decoding task; (2) the first benchmark for evaluating eye-movement-based, text-specific goal inference; and (3) joint modeling of discriminative classification and generative reconstruction. Evaluated on a large-scale English eye-tracking dataset, the framework performs strongly on both tasks, demonstrating that eye-movement signals encode rich, decodable information about readers' underlying cognitive goals.
📝 Abstract
When reading, we often have specific information that interests us in a text. For example, you might be reading this paper because you are curious about LLMs for eye movements in reading, about the experimental design, or perhaps you only care about the question "but does it work?". More broadly, in daily life, people approach texts with any number of text-specific goals that guide their reading behavior. In this work, we ask, for the first time, whether open-ended reading goals can be automatically decoded from eye movements in reading. To address this question, we introduce goal classification and goal reconstruction tasks with corresponding evaluation frameworks, and use large-scale eye-tracking data for reading in English, covering hundreds of text-specific information-seeking tasks. We develop and compare several discriminative and generative multimodal LLMs that combine eye movements with text for goal classification and goal reconstruction. Our experiments show considerable success on both tasks, suggesting that LLMs can extract valuable information about readers' text-specific goals from eye movements.