Gazing at Rewards: Eye Movements as a Lens into Human and AI Decision-Making in Hybrid Visual Foraging

📅 2024-11-14
🏛️ arXiv.org
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This study investigates how target value and target frequency jointly shape eye-movement decisions and behavioral strategies in human hybrid visual foraging. Combining psychophysical experiments with computational modeling, the authors introduce a Transformer-based Visual Forager (VF) model that integrates foveated vision and reinforcement learning. The VF model captures the value–frequency trade-off, closely reproducing human gaze biases, fixation-duration adjustments, and item-selection preferences under time constraints. Empirically, the model's cumulative reward matches human performance, and it generalizes robustly to out-of-distribution foraging tasks. All experimental data and source code are publicly released. This work establishes a computational framework and empirical benchmark for studying reward-guided visual decision-making, advancing our understanding of how value and statistical regularity jointly shape oculomotor behavior.
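As a rough formalization of the value–frequency trade-off (an illustrative equation, not one quoted from the paper), the expected reward of fixating a region $R$ containing $K$ target types can be written as

```latex
\mathbb{E}[r \mid R] = \sum_{i=1}^{K} p_i(R)\, v_i
```

where $p_i(R)$ is the local prevalence of target type $i$ within $R$ and $v_i$ is its value. A forager maximizing cumulative reward under a time limit should bias fixations toward regions with high $\mathbb{E}[r \mid R]$, which is the reward-weighted gaze pattern the summary describes.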

📝 Abstract
Imagine searching a collection of coins for quarters ($0.25), dimes ($0.10), nickels ($0.05), and pennies ($0.01): a hybrid foraging task where observers look for multiple instances of multiple target types. In such tasks, how do target values and their prevalence influence foraging and eye movement behaviors (e.g., should you prioritize rare quarters or common nickels)? To explore this, we conducted human psychophysics experiments, revealing that humans are proficient reward foragers. Their eye fixations are drawn to regions with higher average rewards, fixation durations are longer on more valuable targets, and their cumulative rewards exceed chance, approaching the upper bound of optimal foragers. To probe these human decision-making processes, we developed a transformer-based Visual Forager (VF) model trained via reinforcement learning. Our VF model takes a series of targets, their corresponding values, and the search image as inputs, processes the images using foveated vision, and produces a sequence of eye movements along with decisions on whether to collect each fixated item. Our model outperforms all baselines, achieves cumulative rewards comparable to those of humans, and approximates human foraging behavior in eye movements and foraging biases within time-limited environments. Furthermore, stress tests on out-of-distribution tasks with novel targets, unseen values, and varying set sizes demonstrate the VF model's effective generalization. Our work offers valuable insights into the relationship between eye movements and decision-making, with our model serving as a powerful tool for further exploration of this connection. All data, code, and models are available at https://github.com/ZhangLab-DeepNeuroCogLab/visual-forager.
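To make the described input–output interface concrete, here is a minimal PyTorch-style sketch of a model that takes target templates, their values, and a foveated glimpse, and emits a fixation location plus a collect/skip decision. All class names, layer sizes, and tokenization choices are assumptions for illustration; the authors' actual implementation is in the linked repository.

```python
# Minimal sketch of a Visual-Forager-style interface (hypothetical names/sizes).
import torch
import torch.nn as nn

class VisualForagerSketch(nn.Module):
    """Maps (target templates, target values, current glimpse) to a next
    fixation and a collect-vs-skip decision, as the abstract describes."""

    def __init__(self, d_model: int = 256, n_heads: int = 8, n_layers: int = 4):
        super().__init__()
        # Shared CNN encoder for target templates and foveated glimpses.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, d_model),
        )
        self.value_embed = nn.Linear(1, d_model)  # inject each target's reward
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.fixation_head = nn.Linear(d_model, 2)  # next (x, y) fixation
        self.collect_head = nn.Linear(d_model, 2)   # collect vs. skip logits

    def forward(self, targets, values, glimpse):
        # targets: (B, T, 3, H, W) templates; values: (B, T, 1);
        # glimpse: (B, 3, H, W) foveated crop at the current fixation.
        B, T = targets.shape[:2]
        tgt_tokens = self.encoder(targets.flatten(0, 1)).view(B, T, -1)
        tgt_tokens = tgt_tokens + self.value_embed(values)
        glimpse_token = self.encoder(glimpse).unsqueeze(1)
        h = self.transformer(torch.cat([tgt_tokens, glimpse_token], dim=1))
        ctx = h[:, -1]  # token corresponding to the current glimpse
        return self.fixation_head(ctx), self.collect_head(ctx)
```

A policy with this interface could be trained with a standard policy-gradient method, with reward equal to the total value collected within the time limit, consistent with the reinforcement-learning setup the abstract names; see the repository for the authors' actual design.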
Problem

Research questions and friction points this paper is trying to address.

How target values and prevalence influence human foraging behavior
Developing a transformer model to mimic human eye movement decisions
Exploring the model's generalization to novel foraging scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based Visual Forager model
Decision-making trained via reinforcement learning
Foveated vision for processing search images (sketched below)
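The foveated-vision component can be sketched as a multi-resolution glimpse: concentric crops around the fixation point, downsampled more aggressively with eccentricity to mimic the acuity falloff of human vision. The function name, patch size, and scales below are hypothetical choices, not the paper's parameters.

```python
# Hypothetical foveated glimpse extraction (illustrative scales, not the paper's).
import torch
import torch.nn.functional as F

def foveated_glimpse(image, fx, fy, base=32, scales=(1, 2, 4)):
    """image: (3, H, W) tensor; (fx, fy): integer fixation in pixels.
    Returns a (len(scales), 3, base, base) stack: every level has the same
    output size, so coarser levels cover a wider field at lower resolution."""
    _, H, W = image.shape
    patches = []
    for s in scales:
        r = base * s // 2
        x0, x1 = max(fx - r, 0), min(fx + r, W)
        y0, y1 = max(fy - r, 0), min(fy + r, H)
        crop = image[:, y0:y1, x0:x1].unsqueeze(0)  # (1, 3, h, w) window
        patches.append(F.interpolate(crop, size=(base, base),
                                     mode="bilinear", align_corners=False))
    return torch.cat(patches, dim=0)

# Example: glimpse = foveated_glimpse(img, 128, 96) for a (3, 192, 256) image.
```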
Bo Wang
College of Computing and Data Science, Nanyang Technological University, Singapore; Deep NeuroCognition Lab, I2R and CFAR, Agency for Science, Technology and Research, Singapore; Harbin Institute of Technology, Harbin, China
Dingwei Tan
College of Computing and Data Science, Nanyang Technological University, Singapore; Deep NeuroCognition Lab, I2R and CFAR, Agency for Science, Technology and Research, Singapore; Beijing Institute of Technology, Beijing, China
Yen-Ling Kuo
University of Virginia
Artificial Intelligence, Robotics, Human-AI/Robot Interaction
Zhao-Yu Sun
Harbin Institute of Technology, Harbin, China
Jeremy M. Wolfe
Brigham and Women’s Hospital, USA; Harvard Medical School, USA
Tat-Jen Cham
Nanyang Technological University
Computer Vision
Mengmi Zhang
Assistant Professor and PI of the Deep NeuroCognition Lab, Nanyang Technological University, Singapore
neuroscience-inspired AI, computer vision, computational neuroscience, cognitive science