🤖 AI Summary
This study addresses the lack of real-world datasets and models for predicting visual attention during outdoor navigation. To bridge this gap, we introduce EgoCampus, the first large-scale egocentric eye-tracking dataset captured in realistic campus environments, comprising more than 80 participants and 6 km of diverse outdoor routes. The dataset combines multimodal sensor data (gaze trajectories, forward-facing RGB video, IMU, and GPS) recorded with Meta Project Aria smart glasses. Building on this dataset, we propose EgoCampusNet, a deep learning architecture that models task-driven gaze behavior by fusing spatiotemporal visual features with inertial and geospatial sensor modalities. In the reported experiments, EgoCampusNet outperforms existing methods in complex, dynamic outdoor scenes. The work provides both a benchmark dataset and a scalable framework for modeling eye movements in naturalistic outdoor navigation, with applications to embodied intelligence and human–robot collaborative navigation.
📝 Abstract
We address the challenge of predicting human visual attention during real-world navigation by measuring and modeling egocentric pedestrian eye gaze in an outdoor campus setting. We introduce the EgoCampus dataset, which spans 25 unique outdoor paths over 6 km across a university campus with recordings from more than 80 distinct human pedestrians, resulting in a diverse set of gaze-annotated videos. The system used for collection, Meta's Project Aria glasses, integrates eye tracking, front-facing RGB cameras, inertial sensors, and GPS to provide rich data from the human perspective. Unlike many prior egocentric datasets that focus on indoor tasks or exclude eye gaze information, our work emphasizes visual attention while subjects walk along outdoor campus paths. Using this data, we develop EgoCampusNet, a novel method to predict the eye gaze of pedestrians as they navigate outdoor environments. Our contributions provide both a new resource for studying real-world attention and a foundation for future work on gaze prediction models for navigation. Dataset and code are available upon request and will be made publicly available at https://github.com/ComputerVisionRutgers/EgoCampus at a later date.