🤖 AI Summary
Predicting users’ subjective aesthetic experience of residential interior design is challenging because it varies strongly between individuals and depends on complex visual perception. This work proposes a dual-branch CNN-LSTM framework that fuses visual features extracted from interior design videos with eye-tracking signals (gaze and pupil responses), treating the eye-tracking data as privileged information during training. The model achieves accuracies of 72.2% on objective dimensions (e.g., light) and 66.8% on subjective dimensions (e.g., relaxation), outperforming state-of-the-art video baselines. Models trained with eye-tracking retain comparable performance when deployed with visual input alone. Ablation studies further show that pupil responses contribute most to objective assessments, while combining gaze and visual cues improves subjective evaluations.
📝 Abstract
Understanding how people perceive and evaluate interior spaces is essential for designing environments that promote well-being. However, predicting aesthetic experiences remains difficult due to the subjective nature of perception and the complexity of visual responses. This study introduces a dual-branch CNN-LSTM framework that fuses visual features with eye-tracking signals to predict aesthetic evaluations of residential interiors. We collected a dataset of 224 interior design videos paired with synchronized gaze data from 28 participants who rated 15 aesthetic dimensions. The proposed model attains 72.2% accuracy on objective dimensions (e.g., light) and 66.8% on subjective dimensions (e.g., relaxation), outperforming state-of-the-art video baselines and showing clear gains on subjective evaluation tasks. Notably, models trained with eye-tracking retain comparable performance when deployed with visual input alone. Ablation experiments further reveal that pupil responses contribute most to objective assessments, while the combination of gaze and visual cues enhances subjective evaluations. These findings highlight the value of incorporating eye-tracking as privileged information during training, enabling more practical tools for aesthetic assessment in interior design.
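The abstract does not say how the model copes with the missing eye-tracking branch at deployment. One common way to train with a privileged modality is modality dropout: the second branch is randomly hidden during training and zero-filled at inference. The sketch below illustrates only that idea on plain feature vectors. It is an assumption rather than the authors' method, and `GAZE_DIM`, `fuse`, `training_input`, and `inference_input` are hypothetical names introduced for illustration.

```python
import random
from typing import List, Optional, Sequence

GAZE_DIM = 3  # hypothetical size of the eye-tracking feature vector


def fuse(visual: Sequence[float], gaze: Optional[Sequence[float]]) -> List[float]:
    """Concatenate both branch outputs; zero-fill the gaze slot if it is missing."""
    if gaze is None:
        gaze = [0.0] * GAZE_DIM
    if len(gaze) != GAZE_DIM:
        raise ValueError(f"expected {GAZE_DIM} gaze features, got {len(gaze)}")
    return list(visual) + list(gaze)


def training_input(
    visual: Sequence[float],
    gaze: Sequence[float],
    drop_prob: float,
    rng: random.Random,
) -> List[float]:
    """Modality dropout (assumed): hide the gaze branch with probability drop_prob."""
    if rng.random() < drop_prob:
        return fuse(visual, None)
    return fuse(visual, gaze)


def inference_input(visual: Sequence[float]) -> List[float]:
    """Deployment path: only visual features are available."""
    return fuse(visual, None)


if __name__ == "__main__":
    rng = random.Random(0)
    visual_feat = [0.2, 0.7]       # stand-in for the visual branch output
    gaze_feat = [0.1, 0.4, 0.9]    # stand-in for the eye-tracking branch output
    print(training_input(visual_feat, gaze_feat, drop_prob=0.5, rng=rng))
    print(inference_input(visual_feat))
```

Because the fused vector has the same length on both paths, whatever classifier sits downstream sees an identical input shape during training and at visual-only deployment.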