Incorporating Eye-Tracking Signals Into Multimodal Deep Visual Models For Predicting User Aesthetic Experience In Residential Interiors

📅 2026-01-23
📈 Citations: 1
Influential: 0
🤖 AI Summary
Predicting users’ subjective aesthetic experience of residential interior design is challenging because it varies strongly between individuals and depends on complex visual perception. This work proposes a dual-branch CNN-LSTM framework that, for the first time, incorporates eye-tracking signals (gaze fixations and pupillary responses) as privileged information, enabling end-to-end multimodal fusion with visual features extracted from interior design videos. The model achieves accuracies of 72.2% on objective dimensions (e.g., lighting) and 66.8% on subjective dimensions (e.g., perceived relaxation), outperforming existing video-based baselines. Notably, it retains comparable performance when deployed with only visual inputs at inference time. Ablation studies further show that pupillary responses contribute most to the evaluation of objective attributes.

📝 Abstract
Understanding how people perceive and evaluate interior spaces is essential for designing environments that promote well-being. However, predicting aesthetic experiences remains difficult due to the subjective nature of perception and the complexity of visual responses. This study introduces a dual-branch CNN-LSTM framework that fuses visual features with eye-tracking signals to predict aesthetic evaluations of residential interiors. We collected a dataset of 224 interior design videos paired with synchronized gaze data from 28 participants who rated 15 aesthetic dimensions. The proposed model attains 72.2% accuracy on objective dimensions (e.g., light) and 66.8% on subjective dimensions (e.g., relaxation), outperforming state-of-the-art video baselines and showing clear gains on subjective evaluation tasks. Notably, models trained with eye-tracking retain comparable performance when deployed with visual input alone. Ablation experiments further reveal that pupil responses contribute most to objective assessments, while the combination of gaze and visual cues enhances subjective evaluations. These findings highlight the value of incorporating eye-tracking as privileged information during training, enabling more practical tools for aesthetic assessment in interior design.
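The abstract states that eye-tracking serves as privileged information during training while deployment can use visual input alone, but it does not say how the fusion handles the missing modality. The sketch below illustrates one common approach, modality dropout with zero-filling, and is an assumption rather than the authors' confirmed method. The names `fuse`, `training_input`, `gaze_dim`, and `drop_prob` are hypothetical, and plain Python lists stand in for the features each branch would produce.

```python
import random


def fuse(visual_feat, gaze_feat=None, gaze_dim=4):
    """Concatenate visual and eye-tracking features into one vector.

    Assumption: when no eye tracker is available (gaze_feat is None),
    the gaze slot is filled with zeros so the input shape stays fixed.
    """
    if gaze_feat is None:
        gaze_feat = [0.0] * gaze_dim
    if len(gaze_feat) != gaze_dim:
        raise ValueError("gaze_feat must have length gaze_dim")
    return list(visual_feat) + list(gaze_feat)


def training_input(visual_feat, gaze_feat, drop_prob=0.5, rng=random):
    """Build a training input, hiding the gaze features with probability drop_prob.

    Assumption: randomly hiding the privileged modality during training
    is one way to keep a model usable with visual input alone.
    """
    if rng.random() < drop_prob:
        return fuse(visual_feat, None, gaze_dim=len(gaze_feat))
    return fuse(visual_feat, gaze_feat, gaze_dim=len(gaze_feat))


# Training-time input with both modalities, then inference without gaze.
with_gaze = fuse([0.2, 0.9], [0.5, 0.25], gaze_dim=2)
visual_only = fuse([0.2, 0.9], None, gaze_dim=2)
```

Both calls return vectors of the same length, so a downstream classifier could consume either one unchanged. Whether the paper uses zero-filling, distillation, or another scheme is not stated in the abstract.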
Problem

Research questions and friction points this paper is trying to address.

aesthetic experience
residential interiors
eye-tracking
subjective perception
visual response
Innovation

Methods, ideas, or system contributions that make the work stand out.

eye-tracking
multimodal deep learning
aesthetic prediction
CNN-LSTM
privileged information
Chen-Ying Chien
Institute of Information Systems and Applications, National Tsing Hua University
Po-Chih Kuo
National Tsing Hua University
Machine learning, Medical image analysis, Biomedical signal processing