ESIQA: Perceptual Quality Assessment of Vision-Pro-based Egocentric Spatial Images

📅 2024-07-31
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses perceptual quality assessment for egocentric spatial images (ESIs) viewed on head-mounted displays (HMDs). To this end, we introduce ESIQAD, to our knowledge the first benchmark database dedicated to ESIs. It comprises 500 ESIs, each annotated with subjective quality scores under three display modes: 2D display, 3D-window display, and 3D-immersive display. We also propose ESIQAnet, a multi-stage quality prediction model built on Mamba2 that combines visual state space duality (VSSD) blocks, cross-view cross-attention, and transposed attention to predict quality under all three display modes with a single architecture. Experiments on ESIQAD show that ESIQAnet outperforms 22 state-of-the-art image quality assessment (IQA) methods under all three display modes. Both the ESIQAD database and the implementation code are publicly released to support further research.
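Each image carries a subjective score collected separately for each display mode. As a minimal sketch, assuming the raw ratings for one mode are stored as a subjects-by-images array (the array layout, the mode labels, and both function names below are hypothetical, and the summary does not say whether the authors screened subjects or z-score normalized their ratings), mean opinion scores could be computed in Python with NumPy like this:

```python
import numpy as np

# Hypothetical labels for the three display modes described above.
DISPLAY_MODES = ("2D", "3D-window", "3D-immersive")


def mean_opinion_scores(ratings: np.ndarray) -> np.ndarray:
    """Average raw ratings over subjects.

    ratings: array of shape (num_subjects, num_images); NaN marks an image
    that a subject did not rate. Returns one MOS per image.
    """
    return np.nanmean(ratings, axis=0)


def z_scored_mos(ratings: np.ndarray) -> np.ndarray:
    """Optional variant (an assumption, not confirmed for ESIQAD): remove each
    subject's personal offset and scale before averaging over subjects."""
    mu = np.nanmean(ratings, axis=1, keepdims=True)
    sd = np.nanstd(ratings, axis=1, keepdims=True)
    sd = np.where(sd == 0, 1.0, sd)  # guard subjects who gave a single constant score
    return np.nanmean((ratings - mu) / sd, axis=0)


# Toy example with made-up numbers: 3 subjects rate 4 images per display mode.
rng = np.random.default_rng(0)
raw = {mode: rng.integers(1, 6, size=(3, 4)).astype(float) for mode in DISPLAY_MODES}
mos = {mode: mean_opinion_scores(r) for mode, r in raw.items()}
```

Keeping one MOS vector per display mode mirrors the database design, where the same image can receive a different score in 2D, 3D-window, and 3D-immersive viewing.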

πŸ“ Abstract
With the development of eXtended Reality (XR), photo capturing and display technologies based on head-mounted displays (HMDs) have advanced significantly and gained considerable attention. Egocentric spatial images and videos are emerging as a compelling form of stereoscopic XR content. Assessing the Quality of Experience (QoE) of XR content is important for ensuring a high-quality viewing experience. Unlike traditional 2D images, egocentric spatial images pose challenges for perceptual quality assessment because of their distinctive capture and processing methods and their stereoscopic characteristics. However, image quality assessment (IQA) research for egocentric spatial images is still lacking. In this paper, we establish the Egocentric Spatial Images Quality Assessment Database (ESIQAD), to our knowledge the first IQA database dedicated to egocentric spatial images. ESIQAD includes 500 egocentric spatial images and the corresponding mean opinion scores (MOSs) under three display modes: 2D display, 3D-window display, and 3D-immersive display. Based on ESIQAD, we propose a novel Mamba2-based multi-stage feature fusion model, termed ESIQAnet, which predicts the perceptual quality of egocentric spatial images under all three display modes. Specifically, we first extract features from multiple visual state space duality (VSSD) blocks, then apply cross attention to fuse binocular view information, and finally use transposed attention to further refine the features. The multi-stage features are concatenated and fed into a quality regression network to predict the quality score. Extensive experimental results demonstrate that ESIQAnet outperforms 22 state-of-the-art IQA models on ESIQAD under all three display modes. The database and code are available at https://github.com/IntMeGroup/ESIQA.
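The binocular fusion step can be illustrated with standard scaled dot-product cross attention, in which tokens from one view act as queries over tokens from the other view. This is a minimal NumPy sketch under stated assumptions, not ESIQAnet's implementation: the token shapes, the random projection weights, the residual connection, and the symmetric averaging inside the hypothetical `fuse_binocular` helper are all illustrative choices.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Scaled dot-product cross attention.

    query_tokens: (n_q, d) tokens from one view.
    context_tokens: (n_k, d) tokens from the other view.
    Returns (n_q, d_v): each query token becomes a weighted mix of the
    projected context tokens.
    """
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_q, n_k) similarity map
    return softmax(scores, axis=-1) @ v


def fuse_binocular(left, right, params):
    """Hypothetical symmetric fusion: each view queries the other, the result
    is added back residually, and the two enhanced views are averaged."""
    left_enh = left + cross_attention(left, right, *params["l2r"])
    right_enh = right + cross_attention(right, left, *params["r2l"])
    return 0.5 * (left_enh + right_enh)


rng = np.random.default_rng(0)
n_tokens, dim = 16, 32  # e.g. a flattened 4x4 feature map with 32 channels (made up)
left = rng.standard_normal((n_tokens, dim))
right = rng.standard_normal((n_tokens, dim))
# Hypothetical (w_q, w_k, w_v) triples; square so the residual shapes match.
params = {
    name: tuple(rng.standard_normal((dim, dim)) * dim**-0.5 for _ in range(3))
    for name in ("l2r", "r2l")
}
fused = fuse_binocular(left, right, params)
```

Letting each view query the other keeps the fusion symmetric, so neither eye's image is treated as the reference; the fusion order and layer layout actually used in ESIQAnet may differ.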
Problem

Research questions and friction points this paper is trying to address.

Assessing the perceptual quality of egocentric spatial images.
Developing a database for quality assessment.
Proposing a model for perceptual quality prediction.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mamba2-based multi-stage fusion
Cross attention for binocular views
Transposed attention for feature refinement
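Transposed attention, in the sense used by Restormer's Multi-Dconv Head Transposed Attention (MDTA), computes a channels-by-channels attention map instead of a tokens-by-tokens one, so its cost grows linearly with the number of tokens. The NumPy sketch below illustrates that idea together with multi-stage fusion under several assumptions: it omits Restormer's depthwise convolutions and multiple heads, uses random tensors as stand-ins for the VSSD stage outputs, and replaces the unspecified quality regression network with a hypothetical two-layer MLP over globally pooled, concatenated stage features. Whether ESIQAnet follows this exact design is not confirmed here.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def transposed_attention(tokens, w_q, w_k, w_v, temperature=1.0):
    """Channel-wise ("transposed") self-attention with a residual connection.

    tokens: (n, c). The attention map is (c, c): channels attend to channels.
    """
    q = (tokens @ w_q).T  # (c, n)
    k = (tokens @ w_k).T  # (c, n)
    v = (tokens @ w_v).T  # (c, n)
    # L2-normalize each channel's vector over the token axis before comparing.
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + 1e-6)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + 1e-6)
    attn = softmax(q @ k.T / temperature, axis=-1)  # (c, c)
    return tokens + (attn @ v).T  # back to (n, c)


def predict_quality(stage_features, w1, b1, w2, b2):
    """Hypothetical regression head: average-pool each stage over tokens,
    concatenate the pooled vectors, and apply a small ReLU MLP."""
    pooled = np.concatenate([f.mean(axis=0) for f in stage_features])
    hidden = np.maximum(pooled @ w1 + b1, 0.0)  # ReLU
    return float(hidden @ w2 + b2)


rng = np.random.default_rng(0)
dims = (16, 32, 64)      # hypothetical channel widths per stage
n_tokens = (64, 16, 4)   # hypothetical token counts per stage
stages = []
for n, c in zip(n_tokens, dims):
    x = rng.standard_normal((n, c))  # stand-in for one stage's fused features
    w = [rng.standard_normal((c, c)) * c**-0.5 for _ in range(3)]
    stages.append(transposed_attention(x, *w))

total = sum(dims)
head = (rng.standard_normal((total, 8)) * total**-0.5, np.zeros(8),
        rng.standard_normal(8) * 8**-0.5, 0.0)
score = predict_quality(stages, *head)
```

Pooling each stage before concatenation lets stages with different spatial resolutions and channel widths contribute to one fixed-length vector, which is one simple way to realize the "concatenate multi-stage features, then regress" step.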