🤖 AI Summary
Head-mounted displays (HMDs) occlude the upper face, which degrades video-based facial expression and gaze estimation in social XR and impairs visual communication. To address this, we propose a geometry-aware framework that jointly removes HMD occlusions and reconstructs complete 3D facial geometry from single-view RGB video. Our method combines video inpainting with 3D face modeling: dense facial landmarks serve as geometric priors that guide a GAN-based inpainting network; a single occlusion-free reference frame preserves identity; and a SynergyNet-based module regresses 3D Morphable Model (3DMM) parameters from the inpainted frames. Dense landmark optimization is applied throughout the pipeline to improve both inpainting quality and the fidelity of the recovered geometry. Experiments show that the framework removes HMDs while maintaining facial identity and realism, and ablations show that it remains robust across landmark densities, with only minor degradation under sparse landmark configurations. The approach targets more immersive and natural interaction in social XR applications.
📝 Abstract
Head-mounted displays (HMDs) are essential for experiencing extended reality (XR) environments and observing virtual content. However, they obscure the upper part of the user's face, complicating external video recording and significantly impacting social XR applications such as teleconferencing, where facial expressions and eye gaze details are crucial for creating an immersive experience. This study introduces a geometry-aware learning-based framework to jointly remove HMD occlusions and reconstruct complete 3D facial geometry from RGB frames captured from a single viewpoint. The method integrates a GAN-based video inpainting network, guided by dense facial landmarks and a single occlusion-free reference frame, to restore missing facial regions while preserving identity. Subsequently, a SynergyNet-based module regresses 3D Morphable Model (3DMM) parameters from the inpainted frames, enabling accurate 3D face reconstruction. Dense landmark optimization is incorporated throughout the pipeline to improve both the inpainting quality and the fidelity of the recovered geometry. Experimental results demonstrate that the proposed framework can successfully remove HMDs from RGB facial videos while maintaining facial identity and realism, producing photorealistic 3D face geometry outputs. Ablation studies further show that the framework remains robust across different landmark densities, with only minor quality degradation under sparse landmark configurations.
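To make the 3DMM step concrete, the following is a minimal sketch of how regressed parameters map to face geometry under the standard linear 3DMM formulation: a mean shape plus weighted identity and expression bases. It is an illustration only, not the paper's implementation or SynergyNet's API; the function names, the tiny two-vertex bases, and the coefficient names `alpha` and `beta` are all assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def reconstruct_3dmm(
    mean_shape: Sequence[float],
    id_basis: Sequence[Sequence[float]],
    exp_basis: Sequence[Sequence[float]],
    alpha: Sequence[float],
    beta: Sequence[float],
) -> Vector:
    """Linear 3DMM: mean + sum_i alpha[i]*id_basis[i] + sum_j beta[j]*exp_basis[j].

    All shapes are flattened as [x0, y0, z0, x1, y1, z1, ...].
    Names and layout are hypothetical, chosen for illustration.
    """
    if len(id_basis) != len(alpha) or len(exp_basis) != len(beta):
        raise ValueError("each basis needs exactly one coefficient")

    shape = [float(v) for v in mean_shape]
    # Identity and expression terms are both plain weighted sums of basis vectors.
    for coeffs, bases in ((alpha, id_basis), (beta, exp_basis)):
        for coeff, basis in zip(coeffs, bases):
            if len(basis) != len(shape):
                raise ValueError("basis length must match the mean shape")
            for k, component in enumerate(basis):
                shape[k] += coeff * component
    return shape


def to_vertices(flat: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Group a flattened shape vector into (x, y, z) vertices."""
    if len(flat) % 3 != 0:
        raise ValueError("flattened shape length must be a multiple of 3")
    return [(flat[i], flat[i + 1], flat[i + 2]) for i in range(0, len(flat), 3)]


# Toy example: a two-vertex "face" with one identity and one expression basis.
mean = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
identity = [[1.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
expression = [[0.0, 1.0, 0.0, 0.0, -1.0, 0.0]]
vertices = to_vertices(reconstruct_3dmm(mean, identity, expression, [0.5], [2.0]))
```

In a real pipeline the bases would come from a fitted morphable model with thousands of vertices, and the coefficients would be regressed by the network from each inpainted frame rather than set by hand.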