Beyond Scanpaths: Graph-Based Gaze Simulation in Dynamic Scenes

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing attention models struggle to explicitly capture the temporal dynamics of human gaze in dynamic scenes, often relying on saliency maps or scanpaths for implicit modeling. This work proposes an autoregressive dynamical system that formulates gaze trajectories as a generative process jointly driven by gaze history and the evolving environment, centered around a novel gaze-centric heterogeneous graph structure. To this end, we introduce the Affinity Relation Transformer (ART), a heterogeneous graph transformer, and the Object Density Network (ODN), enabling, for the first time, unified modeling of natural gaze trajectories, scanpaths, and saliency maps directly from unfiltered raw gaze data. Evaluated on the newly released Focus100 driving gaze dataset, our model, trained end-to-end, generates more naturalistic gaze trajectories while significantly improving the dynamic fidelity of scanpaths and the accuracy of saliency predictions, thereby achieving precise modeling of the temporal characteristics of human attention in dynamic environments.
📝 Abstract
Accurately modelling human attention is essential for numerous computer vision applications, particularly in the domain of automotive safety. Existing methods typically collapse gaze into saliency maps or scanpaths, treating gaze dynamics only implicitly. We instead formulate gaze modelling as an autoregressive dynamical system and explicitly unroll raw gaze trajectories over time, conditioned on both gaze history and the evolving environment. Driving scenes are represented as gaze-centric graphs processed by the Affinity Relation Transformer (ART), a heterogeneous graph transformer that models interactions between driver gaze, traffic objects, and road structure. We further introduce the Object Density Network (ODN) to predict next-step gaze distributions, capturing the stochastic and object-centric nature of attentional shifts in complex environments. We also release Focus100, a new dataset of raw gaze data from 30 participants viewing egocentric driving footage. Trained directly on raw gaze, without fixation filtering, our unified approach produces more natural gaze trajectories, scanpath dynamics, and saliency maps than existing attention models, offering valuable insights for the temporal modelling of human attention in dynamic environments.
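The abstract describes an autoregressive rollout in which the next gaze point is sampled from a predicted distribution and fed back as history. The paper's actual architecture is not given here, so the sketch below is a toy illustration of that loop only: a random linear head stands in for the learned ODN, the mixture component count, feature sizes, and the zero-padded context are all assumptions, and no scene graph is modelled.

```python
import numpy as np

# Toy sketch of autoregressive gaze rollout with a Gaussian mixture density
# head. All shapes and the linear "head" below are illustrative assumptions,
# not the paper's Object Density Network.

rng = np.random.default_rng(0)
K = 3   # mixture components (loosely, candidate objects)
D = 2   # gaze is a 2-D image coordinate
H = 8   # size of the context feature (gaze history + scene, stand-in)

# Random linear maps standing in for a learned density head.
W_pi = rng.normal(size=(K, H))
W_mu = rng.normal(size=(K * D, H))
W_sigma = rng.normal(size=(K, H))

def mixture_params(ctx):
    """Map a context vector to mixture weights, means, and isotropic stds."""
    logits = W_pi @ ctx
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()                              # softmax -> valid weights
    mu = (W_mu @ ctx).reshape(K, D)
    sigma = np.exp(0.1 * (W_sigma @ ctx))       # positive scales
    return pi, mu, sigma

def rollout(gaze0, steps=5):
    """Unroll a gaze trajectory: sample the next gaze point from the
    predicted mixture, then feed it back as history for the next step."""
    traj = [np.asarray(gaze0, dtype=float)]
    for _ in range(steps):
        # Stand-in context: last gaze point, zero-padded to length H.
        ctx = np.concatenate([traj[-1], np.zeros(H - D)])
        pi, mu, sigma = mixture_params(ctx)
        k = rng.choice(K, p=pi)                 # pick a mixture component
        traj.append(rng.normal(mu[k], sigma[k]))  # sample next gaze point
    return np.stack(traj)

traj = rollout([0.5, 0.5], steps=5)
print(traj.shape)  # (6, 2): initial gaze plus five autoregressive steps
```

The feedback of each sampled point into the next step's context is what makes the process autoregressive; a trained model would replace the random head with one conditioned on the gaze-centric scene graph.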
Problem

Research questions and friction points this paper is trying to address.

gaze modeling
dynamic scenes
attention dynamics
scanpaths
saliency maps
Innovation

Methods, ideas, or system contributions that make the work stand out.

gaze modeling
graph transformer
autoregressive dynamics
object-centric attention
dynamic scenes
Luke Palmer
GlimpseML
Petar Palasek
Founding Research Engineer at GlimpseML
Computer Vision, Machine Learning
Hazem Abdelkawy
Toyota Motor Europe