🤖 AI Summary
This study investigates how target value and frequency jointly modulate eye-movement decisions and behavioral strategies in human hybrid visual foraging. Combining psychophysical experiments with computational modeling, we introduce a transformer-based Visual Forager (VF) model that integrates foveated vision and reinforcement learning. The VF model encodes a value–frequency trade-off, enabling it to replicate human gaze biases, dynamic fixation-duration adjustments, and time-constrained item-selection preferences. Empirical results show that the model's cumulative reward closely matches human performance, and it generalizes robustly to out-of-distribution foraging tasks. All experimental data and source code are publicly released. This work establishes a computational framework and empirical benchmark for studying reward-guided visual decision-making, advancing our understanding of how value and statistical regularity jointly shape oculomotor behavior.
📝 Abstract
Imagine searching a collection of coins for quarters ($0.25), dimes ($0.10), nickels ($0.05), and pennies ($0.01): a hybrid foraging task where observers look for multiple instances of multiple target types. In such tasks, how do target values and their prevalence influence foraging and eye-movement behaviors (e.g., should you prioritize rare quarters or common nickels)? To explore this, we conducted human psychophysics experiments, revealing that humans are proficient reward foragers. Their eye fixations are drawn to regions with higher average rewards, their fixation durations are longer on more valuable targets, and their cumulative rewards exceed chance, approaching the upper bound of optimal foragers. To probe these decision-making processes, we developed a transformer-based Visual Forager (VF) model trained via reinforcement learning. Our VF model takes as input a set of targets, their corresponding values, and the search image; processes the image using foveated vision; and produces a sequence of eye movements along with decisions on whether to collect each fixated item. Our model outperforms all baselines, achieves cumulative rewards comparable to those of humans, and approximates human foraging behavior in eye movements and foraging biases within time-limited environments. Furthermore, stress tests on out-of-distribution tasks with novel targets, unseen values, and varying set sizes demonstrate the VF model's effective generalization. Our work offers valuable insights into the relationship between eye movements and decision-making, and our model can serve as a powerful tool for further exploration of this connection. All data, code, and models are available at https://github.com/ZhangLab-DeepNeuroCogLab/visual-forager.
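To make the task setup concrete, the sketch below illustrates the two ingredients the abstract describes: a foveated observation (full resolution at the fixation, degraded periphery) and a time-limited collect-or-skip loop over valued items. All names (`foveate`, `forage`, the average-pool periphery, the greedy value-ranked policy) are illustrative assumptions for exposition; the actual VF model is a transformer policy trained with reinforcement learning, not this hand-written heuristic.

```python
import numpy as np

def foveate(image, fixation, radius):
    """Toy foveation: keep full resolution within `radius` of the
    fixation and replace the periphery with its mean intensity.
    A crude stand-in for a foveated-vision front end (assumption)."""
    h, w = image.shape
    out = np.full_like(image, image.mean())  # degraded periphery
    y, x = fixation
    y0, y1 = max(0, y - radius), min(h, y + radius + 1)
    x0, x1 = max(0, x - radius), min(w, x + radius + 1)
    out[y0:y1, x0:x1] = image[y0:y1, x0:x1]  # sharp foveal window
    return out

def forage(items, values, time_budget, cost_per_fixation=1.0):
    """Greedy value-ranked foraging loop (illustrative, not the trained
    VF policy): fixate remaining items in descending value order and
    collect each one while the time budget allows."""
    ranked = sorted(items, key=lambda i: values[i], reverse=True)
    reward, elapsed, collected = 0.0, 0.0, []
    for item in ranked:
        elapsed += cost_per_fixation       # each fixation costs time
        if elapsed > time_budget:          # time-limited environment
            break
        collected.append(item)
        reward += values[item]
    return reward, collected

# Coin example from the abstract: with time for only two fixations,
# the greedy policy takes the quarter and the dime.
coin_values = {"quarter": 0.25, "dime": 0.10, "nickel": 0.05, "penny": 0.01}
reward, collected = forage(["penny", "quarter", "dime"], coin_values, time_budget=2)
```

Under this toy policy, value alone drives priority; the human data and the VF model instead trade off value against target frequency, which a learned policy can capture but this greedy ranking cannot.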