🤖 AI Summary
Existing volumetric video representations—such as point clouds and neural radiance fields—entail prohibitive computational and storage overheads, hindering real-time rendering and scalable distribution. To address this, we propose Content-Promoted Scene Layers (CPSL), a lightweight 2.5D video representation. CPSL decomposes each frame into geometrically consistent, multi-layer 2D assets via depth-guided segmentation and saliency-aware layering. It introduces soft alpha bands and edge-aligned depth caching to preserve occlusion relationships and boundary continuity, and employs motion-guided propagation for inter-frame consistency. Rendering leverages depth-weighted warping, bidirectional alpha compositing, and per-layer encoding—ensuring full compatibility with standard video codecs. Evaluated on multiple benchmarks, CPSL outperforms both layered and neural field-based methods: it delivers superior visual fidelity, sharper object boundaries, and reduces storage and rendering costs by an order of magnitude—enabling real-time playback.
📝 Abstract
Volumetric video enables immersive and interactive visual experiences by supporting free viewpoint exploration and realistic motion parallax. However, existing volumetric representations from explicit point clouds to implicit neural fields, remain costly in capture, computation, and rendering, which limits their scalability for on-demand video and reduces their feasibility for real-time communication. To bridge this gap, we propose Content-Promoted Scene Layers (CPSL), a compact 2.5D video representation that brings the perceptual benefits of volumetric video to conventional 2D content. Guided by per-frame depth and content saliency, CPSL decomposes each frame into a small set of geometry-consistent layers equipped with soft alpha bands and an edge-depth cache that jointly preserve occlusion ordering and boundary continuity. These lightweight, 2D-encodable assets enable parallax-corrected novel-view synthesis via depth-weighted warping and front-to-back alpha compositing, bypassing expensive 3D reconstruction. Temporally, CPSL maintains inter-frame coherence using motion-guided propagation and per-layer encoding, supporting real-time playback with standard video codecs. Across multiple benchmarks, CPSL achieves superior perceptual quality and boundary fidelity compared with layer-based and neural-field baselines while reducing storage and rendering cost by several folds. Our approach offer a practical path from 2D video to scalable 2.5D immersive media.