🤖 AI Summary
This study investigates how target value and frequency jointly modulate eye-movement decisions and behavioral strategies in human hybrid visual foraging. Combining psychophysical experiments with computational modeling, we introduce a transformer-based Visual Forager (VF) model that integrates foveated vision and reinforcement learning. The VF model encodes a value–frequency trade-off, enabling it to replicate human gaze biases, dynamic fixation-duration adjustments, and time-constrained item-selection preferences. Empirical results show that the model's cumulative reward closely matches human performance, and it generalizes robustly to out-of-distribution foraging tasks. All experimental data and source code are publicly released. This work establishes a computational framework and empirical benchmark for studying reward-guided visual decision-making, advancing our understanding of how value and statistical regularity jointly shape oculomotor behavior.
📝 Abstract
Imagine searching a collection of coins for quarters ($0.25), dimes ($0.10), nickels ($0.05), and pennies ($0.01): a hybrid foraging task where observers look for multiple instances of multiple target types. In such tasks, how do target values and their prevalence influence foraging and eye-movement behaviors (e.g., should you prioritize rare quarters or common nickels)? To explore this, we conducted human psychophysics experiments, revealing that humans are proficient reward foragers. Their eye fixations are drawn to regions with higher average rewards, their fixation durations are longer on more valuable targets, and their cumulative rewards exceed chance, approaching the upper bound of optimal foragers. To probe these decision-making processes, we developed a transformer-based Visual Forager (VF) model trained via reinforcement learning. Our VF model takes as input a set of targets, their corresponding values, and the search image; processes the image using foveated vision; and produces a sequence of eye movements along with decisions on whether to collect each fixated item. Our model outperforms all baselines, achieves cumulative rewards comparable to those of humans, and approximates human foraging behavior in eye movements and foraging biases within time-limited environments. Furthermore, stress tests on out-of-distribution tasks with novel targets, unseen values, and varying set sizes demonstrate the VF model's effective generalization. Our work offers valuable insights into the relationship between eye movements and decision-making, and our model can serve as a powerful tool for further exploration of this connection. All data, code, and models are available at https://github.com/ZhangLab-DeepNeuroCogLab/visual-forager.
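To make the task setup concrete, the sketch below illustrates the two ingredients the abstract describes: a foveated observation (full resolution at the fixation, degraded periphery) and a time-limited collect-or-skip loop over valued items. All names (`foveate`, `forage`, the average-pool periphery, the greedy value-ranked policy) are illustrative assumptions for exposition; the actual VF model is a transformer policy trained with reinforcement learning, not this hand-written heuristic.

```python
import numpy as np

def foveate(image, fixation, radius):
    """Toy foveation: keep full resolution within `radius` of the
    fixation and replace the periphery with its mean intensity.
    A crude stand-in for a foveated-vision front end (assumption)."""
    h, w = image.shape
    out = np.full_like(image, image.mean())  # degraded periphery
    y, x = fixation
    y0, y1 = max(0, y - radius), min(h, y + radius + 1)
    x0, x1 = max(0, x - radius), min(w, x + radius + 1)
    out[y0:y1, x0:x1] = image[y0:y1, x0:x1]  # sharp foveal window
    return out

def forage(items, values, time_budget, cost_per_fixation=1.0):
    """Greedy value-ranked foraging loop (illustrative, not the trained
    VF policy): fixate remaining items in descending value order and
    collect each one while the time budget allows."""
    ranked = sorted(items, key=lambda i: values[i], reverse=True)
    reward, elapsed, collected = 0.0, 0.0, []
    for item in ranked:
        elapsed += cost_per_fixation       # each fixation costs time
        if elapsed > time_budget:          # time-limited environment
            break
        collected.append(item)
        reward += values[item]
    return reward, collected

# Coin example from the abstract: with time for only two fixations,
# the greedy policy takes the quarter and the dime.
coin_values = {"quarter": 0.25, "dime": 0.10, "nickel": 0.05, "penny": 0.01}
reward, collected = forage(["penny", "quarter", "dime"], coin_values, time_budget=2)
```

Under this toy policy, value alone drives priority; the human data and the VF model instead trade off value against target frequency, which a learned policy can capture but this greedy ranking cannot.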