AI Summary
This study investigates the origins of human eye movement patterns during free viewing and proposes that they emerge as a natural byproduct of optimizing scene understanding under foveal visual constraints. To test this hypothesis, we developed a computational agent equipped with a foveated visual system and trained it, via reinforcement learning or self-supervised strategies, to perform scene understanding tasks without any exposure to human gaze data. Remarkably, the agent spontaneously developed fixation behaviors highly consistent with those of humans, and it predicted human fixations more accurately than control versions trained for search or classification tasks or given peripheral vision better or worse than human vision. This work provides computational modeling evidence for an intrinsic link between gaze patterns and perceptual goals, suggesting that human-like fixations arise not from task-specific tuning but from general principles of efficient visual processing under biological constraints.
Abstract
When humans view scenes without a specific task (free-viewing), they initially direct their eye movements toward the scene center and then fixate on people, text, objects being gazed at or grasped, and semantically meaningful regions. What these signature fixation patterns reflect and whether they optimize an underlying perceptual task remain unknown. We show that a computational agent with simulated foveation, trained to optimize scene comprehension, exhibits the signature patterns of human fixations as emergent behavior. In contrast, versions of the agent trained to search or classify scenes, or equipped with peripheral vision that was better or worse than human vision, predicted human fixation patterns less accurately. Thus, human free-viewing fixation patterns may emerge as a functional byproduct of optimizing scene comprehension under the biological constraints of foveated vision.
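The abstract does not say how foveation is simulated. A common way to approximate it is to blur each pixel by an amount that grows with its distance (eccentricity) from the current fixation. The sketch below assumes that approach; it is not the authors' implementation, and the names `foveate`, `sigmas`, and `falloff` are hypothetical. The `falloff` parameter loosely mirrors the "better or worse peripheral vision" manipulation: a smaller value keeps the periphery sharper, and a larger value degrades it faster.

```python
import numpy as np


def gaussian_kernel1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel; sigma <= 0 yields the identity kernel."""
    if sigma <= 0:
        return np.array([1.0])
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D grayscale image with edge padding."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    padded = np.pad(image, r, mode="edge")
    # Convolve every row, then every column; 'valid' mode trims the padding off.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def foveate(image: np.ndarray, fixation: tuple[int, int],
            sigmas=(0.0, 1.0, 2.0, 4.0), falloff: float = 0.05) -> np.ndarray:
    """Hypothetical foveation: blur increases with eccentricity from `fixation`.

    `falloff` (blur levels per pixel of eccentricity) is an illustrative
    stand-in for peripheral acuity; larger values mean worse peripheral vision.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Eccentricity of each pixel, in pixels, from the fixation point (row, col).
    ecc = np.hypot(ys - fixation[0], xs - fixation[1])
    # Fractional index into the stack of progressively blurrier images.
    level = np.clip(ecc * falloff, 0, len(sigmas) - 1)
    stack = np.stack([blur(image, s) for s in sigmas])  # shape (L, H, W)
    lo = np.floor(level).astype(int)
    hi = np.minimum(lo + 1, len(sigmas) - 1)
    t = level - lo
    # Linearly interpolate between the two nearest blur levels per pixel.
    return (1 - t) * stack[lo, ys, xs] + t * stack[hi, ys, xs]
```

For example, `foveate(img, (32, 32))` leaves the pixel at the fixation point unchanged while smoothing the corners of a 64×64 image, and `falloff=0.0` returns the image untouched (a fovea that covers everything). Interpolating between a small fixed set of blur levels keeps the cost to a few full-image convolutions rather than a separate blur per pixel.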