Modeling Subjective Urban Perception with Human Gaze

📅 2026-05-01
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing approaches to urban perception modeling often overlook the role of visual attention mechanisms in shaping human subjective judgments, limiting their ability to authentically capture individuals’ experiential responses to urban environments. To address this gap, this work introduces Place Pulse-Gaze, the first multimodal dataset integrating street-view images, eye-tracking data, and individual perceptual labels. Building upon this resource, we propose a gaze-guided multimodal framework for urban perception modeling that, for the first time in urban computing, incorporates eye-tracking data to jointly leverage explicit semantic cues and implicit visual representations. Our experiments demonstrate that gaze behavior alone carries predictive signals sufficient for estimating subjective urban perceptions, and that combining gaze with scene-level features further enhances model performance. These findings underscore the critical value of incorporating human perceptual processes to improve the ecological validity of urban computational models.
📝 Abstract
Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computational approaches primarily model urban perception directly from street view images, but largely ignore the human perceptual process through which such judgments are formed. In this paper, we introduce Place Pulse-Gaze, an urban perception dataset that augments street view images with synchronized eye-tracking recordings and individual perception labels. Based on this dataset, we propose a Gaze-Guided Urban Perception Framework to study how gaze behavior contributes to the modeling of subjective urban perception. The framework systematically investigates three complementary settings: gaze-only modeling, gaze fusion with explicit semantic scene representations, and gaze fusion with richer implicit visual representations. Experiments show that gaze alone already carries useful predictive signals for subjective urban perception, and that integrating gaze with scene representations further improves prediction for both the semantic and the richer visual representations. Overall, our findings highlight the importance of incorporating human perceptual processes into urban scene understanding and open a direction for gaze-guided multimodal urban computing.
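To make the idea of fusing gaze with an explicit semantic scene representation concrete, here is a minimal sketch. It is not the paper's method, and the abstract does not specify the architecture. It assumes fixations arrive as hypothetical `(x, y, duration)` triples in pixel coordinates and that a per-pixel semantic label map is already available from some segmentation model. All function names, the Gaussian smoothing, and the `sigma` parameter are illustrative choices.

```python
import numpy as np

def gaze_heatmap(fixations, height, width, sigma=25.0):
    """Build a normalized fixation density map from (x, y, duration) triples.

    Assumption: each fixation is smoothed with an isotropic Gaussian and
    weighted by its duration. This is a common convention, not a detail
    taken from the paper.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width), dtype=np.float64)
    for x, y, duration in fixations:
        heat += duration * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    total = heat.sum()
    if total == 0:
        # No gaze recorded: fall back to a uniform map.
        return np.full((height, width), 1.0 / (height * width))
    return heat / total

def gaze_weighted_class_shares(heatmap, seg_labels, num_classes):
    """Fraction of total gaze mass that lands on each semantic class."""
    return np.bincount(seg_labels.ravel(), weights=heatmap.ravel(), minlength=num_classes)

def fused_features(fixations, seg_labels, num_classes, sigma=25.0):
    """Concatenate plain pixel-area shares with gaze-weighted shares per class."""
    height, width = seg_labels.shape
    heat = gaze_heatmap(fixations, height, width, sigma)
    gaze_shares = gaze_weighted_class_shares(heat, seg_labels, num_classes)
    area_shares = np.bincount(seg_labels.ravel(), minlength=num_classes) / seg_labels.size
    return np.concatenate([area_shares, gaze_shares])

# Toy scene: left half is class 0, right half is class 1; the viewer looks right.
seg = np.zeros((60, 80), dtype=np.int64)
seg[:, 40:] = 1
fixations = [(60, 30, 0.4), (65, 28, 0.2)]
features = fused_features(fixations, seg, num_classes=2, sigma=5.0)
```

In this toy case the area shares are equal for both classes, while the gaze-weighted shares concentrate on the class the viewer looked at. That difference is the kind of extra signal a downstream perception predictor could use alongside scene-level features.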
Problem

Research questions and friction points this paper is trying to address.

urban perception
human gaze
street view images
perceptual process
subjective evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

gaze-guided modeling
urban perception
eye-tracking
multimodal fusion
Place Pulse-Gaze