🤖 AI Summary
This work addresses the challenge of reconstructing high-fidelity 3D hand models from egocentric (first-person) monocular videos. The task is hampered by insufficient geometric detail, complex hand-object interactions, and high computational cost. The authors propose an approach based on a deformable 2D Gaussian surfel representation. Surfels are initialized from mesh-aligned Steiner inellipses, and fractal densification raises the geometric resolution. An opacity mask, optimized without adaptive density control, jointly refines geometry and texture, and a residual prediction mechanism for surfel attributes captures dynamic and personalized features. A two-stage training strategy combined with a binding loss further improves robustness. Evaluated on the ARCTIC, Hand Appearance, and InterHand2.6M datasets, the method outperforms existing approaches, achieving high-quality, efficient hand reconstruction and photorealistic rendering.
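The mesh-to-surfel conversion can be illustrated with a short geometric sketch. It rests on two assumptions that the summary does not state: that fractal densification means recursive 1-to-4 midpoint subdivision of each mesh triangle, and that each resulting triangle yields one surfel fitted to its Steiner inellipse (the ellipse centered at the centroid G and tangent to all three sides at their midpoints). The helper names `fractal_densify` and `inellipse_surfel` are hypothetical. The inellipse's conjugate semi-diameters (C - G)/2 and (A - B)/(2*sqrt(3)) are rotated into principal axes, which become the surfel's tangent directions and scales.

```python
import math

Vec3 = tuple[float, float, float]
Tri = tuple[Vec3, Vec3, Vec3]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def mul(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def subdivide(tri: Tri) -> list[Tri]:
    """Split a triangle into four via edge midpoints, preserving winding."""
    a, b, c = tri
    ab, bc, ca = mul(add(a, b), 0.5), mul(add(b, c), 0.5), mul(add(c, a), 0.5)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def fractal_densify(tri: Tri, levels: int) -> list[Tri]:
    """Hypothetical densification: recursive midpoint subdivision (4**levels triangles)."""
    tris = [tri]
    for _ in range(levels):
        tris = [child for parent in tris for child in subdivide(parent)]
    return tris


def inellipse_surfel(tri: Tri) -> dict:
    """Fit a 2D surfel to the Steiner inellipse of a non-degenerate triangle."""
    a, b, c = tri
    center = mul(add(add(a, b), c), 1.0 / 3.0)          # centroid G
    f1 = mul(sub(c, center), 0.5)                        # conjugate semi-diameter (C - G)/2
    f2 = mul(sub(a, b), 1.0 / (2.0 * math.sqrt(3.0)))    # conjugate semi-diameter (A - B)/(2*sqrt(3))
    # Parameter angle of the major axis, maximizing |cos(t)*f1 + sin(t)*f2|^2.
    t0 = 0.5 * math.atan2(2.0 * dot(f1, f2), dot(f1, f1) - dot(f2, f2))
    ct, st = math.cos(t0), math.sin(t0)
    major = add(mul(f1, ct), mul(f2, st))
    minor = add(mul(f1, -st), mul(f2, ct))
    s_u, s_v = math.sqrt(dot(major, major)), math.sqrt(dot(minor, minor))
    n = cross(major, minor)                              # same direction as (B - A) x (C - A)
    return {
        "center": center,
        "tangent_u": mul(major, 1.0 / s_u),
        "tangent_v": mul(minor, 1.0 / s_v),
        "scale": (s_u, s_v),
        "normal": mul(n, 1.0 / math.sqrt(dot(n, n))),
    }


if __name__ == "__main__":
    right = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    print(inellipse_surfel(right)["scale"])
    surfels = [inellipse_surfel(t) for t in fractal_densify(right, 2)]
    print(len(surfels))  # 16 surfels
```

For a unit right triangle the semi-axes come out near 0.408 and 0.236, matching the closed-form Steiner inellipse lengths sqrt(6)/6 and sqrt(2)/6. Because midpoint subdivision produces four triangles similar to the parent at ratio 1/2, each densification level quadruples the surfel count and halves both scales.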
📝 Abstract
Reconstructing high-fidelity 3D hands from egocentric monocular videos remains challenging due to limitations in capturing high-resolution geometry, hand-object interactions, and complex objects on hands. Additionally, existing methods often incur high computational costs, making them impractical for real-time applications. In this work, we propose Mesh-inellipse Aligned deformable Surfel Splatting (MASS), which addresses these challenges by leveraging a deformable 2D Gaussian surfel representation. First, we introduce the mesh-aligned Steiner inellipse and fractal densification for mesh-to-surfel conversion, which initializes high-resolution 2D Gaussian surfels from coarse parametric hand meshes and provides a surface representation capable of photorealistic rendering. Second, we propose Gaussian Surfel Deformation, which efficiently models hand deformations and personalized features by predicting residual updates to surfel attributes, and introduces an opacity mask that refines geometry and texture without adaptive density control. In addition, we propose a two-stage training strategy and a novel binding loss to improve optimization robustness and reconstruction quality. Extensive experiments on the ARCTIC, Hand Appearance, and InterHand2.6M datasets demonstrate that our model achieves superior reconstruction performance compared to state-of-the-art methods.
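The Gaussian Surfel Deformation step can be sketched in the same spirit. The abstract does not specify the predictor or the parameterization, so this sketch assumes three things: a per-surfel residual (center offset, log-scale offset, color offset) is supplied by some upstream predictor; scales are updated multiplicatively through `exp` so they stay positive; and the opacity mask is a separate learnable logit multiplied into the surfel's opacity. The names `Surfel`, `Residual`, `apply_residual`, and `effective_opacity` are hypothetical. Because the mask only attenuates existing surfels, the surfel count stays fixed, which is one way to refine geometry and texture without adaptive density control. The binding loss and two-stage schedule are not modeled here.

```python
import math
from dataclasses import dataclass, replace

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class Surfel:
    center: Vec3                 # from the mesh-to-surfel conversion
    scale: tuple[float, float]   # tangent-plane extents (s_u, s_v)
    color: Vec3                  # RGB in [0, 1]
    opacity_logit: float
    mask_logit: float            # hypothetical per-surfel opacity-mask parameter


@dataclass(frozen=True)
class Residual:
    """Per-surfel update, assumed to come from some pose/identity-conditioned predictor."""
    d_center: Vec3
    d_log_scale: tuple[float, float]
    d_color: Vec3


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def apply_residual(s: Surfel, r: Residual) -> Surfel:
    """Add residuals to the base attributes without changing the surfel count."""
    center = tuple(c + d for c, d in zip(s.center, r.d_center))
    scale = tuple(v * math.exp(d) for v, d in zip(s.scale, r.d_log_scale))  # stays positive
    color = tuple(min(1.0, max(0.0, c + d)) for c, d in zip(s.color, r.d_color))  # clamp to [0, 1]
    return replace(s, center=center, scale=scale, color=color)


def effective_opacity(s: Surfel) -> float:
    """Base opacity attenuated by the (assumed multiplicative) opacity mask."""
    return sigmoid(s.opacity_logit) * sigmoid(s.mask_logit)


if __name__ == "__main__":
    base = Surfel((0.0, 0.0, 0.0), (0.1, 0.2), (0.5, 0.5, 0.5), 0.0, 0.0)
    res = Residual((0.01, 0.0, 0.0), (math.log(2.0), 0.0), (0.1, 0.0, 0.0))
    print(apply_residual(base, res))
    print(effective_opacity(base))  # 0.25
```

A zero residual leaves the surfel unchanged, and a strongly negative `mask_logit` hides a surfel while keeping it in the fixed-size set, so no cloning or pruning step is needed.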