🤖 AI Summary
Omnidirectional videos projected via equirectangular projection (ERP) suffer from severe geometric distortion, which prevents existing video inpainting methods from preserving spatiotemporal consistency and geometric continuity. To address this, we propose a distortion-aware deep learning inpainting framework. First, we design a temporal motion modeling module based on spherical geodesic distance to explicitly capture the nonlinear motion inherent in panoramic video. Second, we introduce a depth-estimation-guided feature propagation mechanism that compensates for ERP-induced distortion directly in feature space. Finally, we jointly optimize both modules in an end-to-end framework. Our method is the first to holistically reconcile geometric distortion, depth structure, and temporal dynamics at the feature level, significantly improving inpainting quality, especially in wide-field-of-view distorted regions. Experiments demonstrate superior performance over state-of-the-art methods in PSNR, LPIPS, and visual realism, with enhanced robustness and spatiotemporal coherence.
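The ERP-induced distortion mentioned above comes from latitude-dependent stretching: every ERP row has the same pixel width, but the latitude band it covers shrinks in true solid angle by cos(latitude), so polar regions are heavily over-sampled. A minimal sketch of this per-pixel area weight (the same cos-latitude factor used by spherical quality metrics such as WS-PSNR) is shown below; it is an illustration of the distortion being compensated, not the paper's actual propagation mechanism, and the function name is our own.

```python
import numpy as np

def erp_area_weights(height, width):
    """Per-pixel spherical area weights for an ERP frame.

    Each ERP row maps to one latitude band whose true solid angle
    shrinks by cos(latitude); polar rows are therefore over-sampled.
    Illustrative only -- not DAOVI's propagation module.
    """
    # Latitude at each row center, from +pi/2 (top row) to -pi/2 (bottom row).
    v = (np.arange(height) + 0.5) / height      # normalized row position in (0, 1)
    lat = (0.5 - v) * np.pi                     # latitude in (-pi/2, pi/2)
    w = np.cos(lat)                             # per-row area weight
    return np.repeat(w[:, None], width, axis=1) # (H, W) weight map
```

Weighting features or losses by such a map is one simple way to keep equatorial (low-distortion) content from being dominated by the inflated pixel count near the poles.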
📝 Abstract
Omnidirectional videos, which capture the entire surroundings, are employed in a variety of fields such as VR applications and remote sensing. However, their wide field of view means that unwanted objects often appear in the footage. This problem can be addressed by video inpainting, which removes such objects naturally while preserving both spatial and temporal consistency. Nevertheless, most existing methods assume ordinary videos with a narrow field of view and do not tackle the distortion introduced by the equirectangular projection of omnidirectional videos. To address this issue, this paper proposes a novel deep learning model for omnidirectional video inpainting, called Distortion-Aware Omnidirectional Video Inpainting (DAOVI). DAOVI introduces a module that evaluates temporal motion information in the image space by taking geodesic distance into account, as well as a depth-aware feature propagation module in the feature space designed to address the geometric distortion inherent to omnidirectional videos. The experimental results demonstrate that our proposed method outperforms existing methods both quantitatively and qualitatively.
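To see why geodesic distance matters for motion modeling on ERP frames: the same pixel displacement corresponds to very different angular motion at different latitudes, so plain pixel distance misrepresents motion near the poles. The sketch below maps ERP pixel coordinates to the unit sphere and computes great-circle (geodesic) distance via the haversine formula; it illustrates the geometric idea only, and the function names are our own, not DAOVI's API.

```python
import math

def erp_to_sphere(u, v, width, height):
    """Map an ERP pixel (u, v) to spherical coordinates (longitude, latitude).

    Longitude spans [-pi, pi), latitude spans [-pi/2 (bottom), pi/2 (top)].
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    return lon, lat

def geodesic_distance(p, q, width, height):
    """Great-circle distance (radians) between two ERP pixels on the unit sphere."""
    lon1, lat1 = erp_to_sphere(*p, width, height)
    lon2, lat2 = erp_to_sphere(*q, width, height)
    # Haversine formula: numerically stable for small angular separations.
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2.0 * math.asin(min(1.0, math.sqrt(a)))
```

For example, a 100-pixel horizontal shift near the top of a 2048x1024 ERP frame is a far smaller angular motion than the same shift at the equator, which is exactly the nonlinearity a geodesic-aware motion module accounts for.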