🤖 AI Summary
In holographic video streaming, users typically view only a localized viewport, yet existing viewport prediction methods suffer from low accuracy and the field remains in its infancy. Method: This paper presents the first systematic investigation of viewport prediction for volumetric video. We propose a dynamic prediction framework that integrates spatiotemporal saliency (incorporating geometric structure, chromatic contrast, and motion cues) with historical trajectory modeling, and introduce Uniform Random Sampling (URS) to substantially reduce computational overhead. An adaptive multi-source information fusion mechanism further improves prediction robustness. Results: Extensive experiments on mainstream holographic video datasets demonstrate that our method achieves significantly higher prediction accuracy than state-of-the-art baselines, improving bandwidth utilization and Quality of Experience (QoE) for VR/AR/MR applications over 5G networks. The source code and dataset will be made publicly available.
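As a rough illustration of the URS idea named above, the sketch below partitions a point cloud into uniform cells and randomly retains at most a fixed number of points per cell, reducing the point count while keeping coverage of the geometry. This is a minimal reading of "uniform random sampling", not the paper's implementation; the function name, `cell_size`, and `points_per_cell` values are illustrative assumptions.

```python
import numpy as np

def uniform_random_sampling(points, cell_size=0.05, points_per_cell=8, seed=0):
    """Downsample an (N, 3+) point cloud: uniform cells + random picks per cell.

    cell_size and points_per_cell are hypothetical parameters for this
    sketch, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Map each point to the uniform grid cell containing it.
    cell_ids = np.floor(points[:, :3] / cell_size).astype(np.int64)
    # Group points by cell; `inverse` gives each point's cell index.
    _, inverse = np.unique(cell_ids, axis=0, return_inverse=True)
    kept = []
    for cell in range(inverse.max() + 1):
        idx = np.flatnonzero(inverse == cell)
        # Randomly keep at most `points_per_cell` points in this cell.
        if idx.size > points_per_cell:
            idx = rng.choice(idx, size=points_per_cell, replace=False)
        kept.append(idx)
    return points[np.concatenate(kept)]

# Example: downsample a synthetic 100k-point cloud.
cloud = np.random.rand(100_000, 3).astype(np.float32)
print(uniform_random_sampling(cloud).shape)
```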
📝 Abstract
Volumetric video, also known as hologram video, is a novel medium that portrays natural content in Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR). It is expected to be the next-generation video technology and a prevalent use case for 5G and beyond wireless communication. Since each user typically watches only a section of a volumetric video, known as the viewport, precise viewport prediction is essential for optimal performance. However, research on this topic is still in its infancy. To this end, this paper proposes a novel approach, named Saliency and Trajectory Viewport Prediction (STVP), which aims to improve the precision of viewport prediction in volumetric video streaming. STVP makes extensive use of video saliency information and viewport trajectory. To our knowledge, this is the first comprehensive study of viewport prediction in volumetric video streaming. In particular, we introduce a novel sampling method, Uniform Random Sampling (URS), that reduces computational complexity while efficiently preserving video features. We then present a saliency detection technique that incorporates both spatial and temporal information to detect static and dynamic geometric salient regions as well as color salient regions. Finally, we intelligently fuse saliency and trajectory information to achieve more accurate viewport prediction. We conduct extensive simulations to evaluate the effectiveness of the proposed method on state-of-the-art volumetric video sequences. The experimental results demonstrate the superiority of the proposed method over existing schemes. The dataset and source code will be made publicly accessible after acceptance.
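The final fusion step described in the abstract combines two heterogeneous predictors. As a conceptual sketch only: the snippet below blends a saliency-derived viewport-likelihood map with a trajectory-derived one using a scalar weight. The paper's fusion is adaptive rather than a fixed constant, so `alpha`, the tile-map shapes, and the function name here are hypothetical placeholders.

```python
import numpy as np

def fuse_viewport_likelihoods(saliency_map, trajectory_map, alpha=0.5):
    """Blend two viewport-likelihood maps over the same tiling of the scene.

    `alpha` is a hypothetical fixed weight; STVP's actual fusion adapts
    the weighting to the inputs rather than using a constant.
    """
    eps = 1e-8
    s = saliency_map / (saliency_map.sum() + eps)    # normalize to a distribution
    t = trajectory_map / (trajectory_map.sum() + eps)
    return alpha * s + (1.0 - alpha) * t

# Example: pick the most likely viewport tile from the fused map.
sal = np.random.rand(16, 16)   # saliency-based likelihood per tile (synthetic)
traj = np.random.rand(16, 16)  # trajectory-based likelihood per tile (synthetic)
fused = fuse_viewport_likelihoods(sal, traj)
print(np.unravel_index(fused.argmax(), fused.shape))
```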