Why We Look Where We Look: Emergent Human-like Fixations of a Foveated Visual Language Model Maximizing Scene Understanding

Date: 2026-05-17
Citations: 0
Influential: 0

AI Summary
This study investigates the origins of human eye movement patterns during free viewing and proposes that they emerge as a natural byproduct of optimizing scene understanding under the constraints of foveated vision. To test this hypothesis, we developed a computational agent equipped with a foveated visual system and trained it, via reinforcement learning or self-supervised strategies, to perform scene understanding tasks without any exposure to human gaze data. The agent spontaneously developed fixation behaviors highly consistent with those of humans, and it predicted human fixations significantly better than control models trained for search or classification tasks. This work provides the first computational modeling evidence of an intrinsic link between gaze patterns and perceptual goals, suggesting that human-like fixations arise not from task-specific tuning but from general principles of efficient visual processing under biological constraints.
๐Ÿ“ Abstract
When humans view scenes without a specific task (free-viewing), they initially direct their eye movements toward the scene center and then fixate on people, text, objects being gazed at or grasped, and semantically meaningful regions. What these signature fixation patterns reflect and whether they optimize an underlying perceptual task remain unknown. We show that a computational agent with simulated foveation, trained to optimize scene comprehension, exhibits emergent human fixation signature patterns. In contrast, versions of the agent trained to search or classify scenes, or equipped with peripheral vision that was better or worse than human vision, predicted human fixation patterns less accurately. Thus, human free-viewing fixation patterns may emerge as a functional byproduct of optimizing scene comprehension under the biological constraints of foveated vision.
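To make "simulated foveation" concrete, here is a minimal sketch of one common way to approximate it: each pixel is replaced by the average of a window whose size grows with distance (eccentricity) from the current fixation point. The abstract does not describe how the paper's agent implements foveation, so this is not the authors' method. The function names `pooling_half_width` and `foveate`, and the parameters `fovea_radius` and `growth`, are hypothetical choices made for this illustration.

```python
import math


def pooling_half_width(row, col, fix_row, fix_col, fovea_radius=2.0, growth=0.5):
    """Half-width of the averaging window at (row, col).

    Returns 0 inside the assumed fovea (full resolution) and grows
    linearly with eccentricity outside it. Both parameters are
    illustrative assumptions, not values taken from the paper.
    """
    eccentricity = math.hypot(row - fix_row, col - fix_col)
    if eccentricity <= fovea_radius:
        return 0
    return math.ceil((eccentricity - fovea_radius) * growth)


def foveate(image, fix_row, fix_col, fovea_radius=2.0, growth=0.5):
    """Return a copy of a grayscale image (list of rows of floats)
    blurred more strongly the farther a pixel is from the fixation."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            k = pooling_half_width(r, c, fix_row, fix_col, fovea_radius, growth)
            # Clip the averaging window to the image bounds.
            rows = range(max(0, r - k), min(height, r + k + 1))
            cols = range(max(0, c - k), min(width, c + k + 1))
            total = sum(image[i][j] for i in rows for j in cols)
            out[r][c] = total / (len(rows) * len(cols))
    return out


# Usage: a 9x9 checkerboard viewed with the fixation at its center.
checkerboard = [[float((r + c) % 2) for c in range(9)] for r in range(9)]
view = foveate(checkerboard, fix_row=4, fix_col=4)
```

In this sketch the pixels near the fixation keep their original values, while the fine checkerboard pattern in the periphery is averaged toward gray. Under such a constraint, an agent has to move its fixation to resolve detail elsewhere in the scene, which is the setting the abstract describes.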
Problem

Research questions and friction points this paper is trying to address.

human fixation patterns
scene understanding
foveated vision
free-viewing
perceptual task
Innovation

Methods, ideas, or system contributions that make the work stand out.

foveated vision
scene understanding
human-like fixation
emergent behavior
computational modeling