WildGHand: Learning Anti-Perturbation Gaussian Hand Avatars from Monocular In-the-Wild Videos

📅 2026-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods suffer significant degradation in 3D hand reconstruction quality on real-world monocular videos due to perturbations such as hand-object interactions, extreme poses, illumination variations, and motion blur. This work proposes an optimization-based 3D Gaussian splatting framework for hand reconstruction that explicitly models these perturbations as time-varying biases on the attributes of 3D Gaussians. To mitigate their impact, the authors introduce a dynamic perturbation disentanglement module and a perturbation-aware optimization strategy, complemented by per-frame anisotropic weighted masks that adaptively suppress perturbations in both spatial and temporal dimensions. Evaluated on a newly collected dataset and two public benchmarks, the method achieves state-of-the-art performance, yielding up to a 15.8% relative improvement in PSNR and a 23.1% relative reduction in LPIPS.
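The summary describes the core mechanism only at a high level. As a rough illustration, here is a minimal PyTorch sketch of one way "time-varying biases on the attributes of 3D Gaussians" could be parameterized; the module name `PerturbationBias`, the embedding design, and the attribute layout are assumptions made for this sketch, not the authors' implementation (see their repository for the actual code).

```python
# Illustrative sketch only (assumed design, not the authors' code): a tiny
# network maps a frame time and a learnable per-Gaussian code to additive
# offsets on position, rotation, scale, and opacity, leaving the canonical
# Gaussians untouched so perturbations stay disentangled from the avatar.
import torch
import torch.nn as nn

class PerturbationBias(nn.Module):  # hypothetical name
    def __init__(self, num_gaussians: int, time_dim: int = 16, hidden: int = 64):
        super().__init__()
        self.time_embed = nn.Linear(1, time_dim)                  # embed frame time t in [0, 1]
        self.gauss_embed = nn.Embedding(num_gaussians, time_dim)  # per-Gaussian code
        self.head = nn.Sequential(
            nn.Linear(2 * time_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 3 + 4 + 3 + 1),  # Δposition, Δrotation (quat), Δscale, Δopacity
        )

    def forward(self, t: torch.Tensor, gauss_ids: torch.Tensor) -> torch.Tensor:
        te = self.time_embed(t.view(1, 1)).expand(gauss_ids.shape[0], -1)
        ge = self.gauss_embed(gauss_ids)
        return self.head(torch.cat([te, ge], dim=-1))  # (N, 11) per-frame attribute biases

# Usage: biases for frame t = 0.3 over all Gaussians.
deltas = PerturbationBias(num_gaussians=20000)(torch.tensor(0.3), torch.arange(20000))
delta_xyz = deltas[:, :3]  # would be added to canonical positions for this frame
```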

📝 Abstract
Despite recent progress in 3D hand reconstruction from monocular videos, most existing methods rely on data captured in well-controlled environments and therefore degrade in real-world settings with severe perturbations, such as hand-object interactions, extreme poses, illumination changes, and motion blur. To tackle these issues, we introduce WildGHand, an optimization-based framework that enables self-adaptive 3D Gaussian splatting on in-the-wild videos and produces high-fidelity hand avatars. WildGHand incorporates two key components: (i) a dynamic perturbation disentanglement module that explicitly represents perturbations as time-varying biases on 3D Gaussian attributes during optimization, and (ii) a perturbation-aware optimization strategy that generates per-frame anisotropic weighted masks to guide optimization. Together, these components allow the framework to identify and suppress perturbations across both spatial and temporal dimensions. We further curate a dataset of monocular hand videos captured under diverse perturbations to benchmark in-the-wild hand avatar reconstruction. Extensive experiments on this dataset and two public datasets demonstrate that WildGHand achieves state-of-the-art performance and substantially improves over its base model across multiple metrics (e.g., up to a $15.8\%$ relative gain in PSNR and a $23.1\%$ relative reduction in LPIPS). Our implementation and dataset are available at https://github.com/XuanHuang0/WildGHand.
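The abstract's second component, per-frame anisotropic weighted masks that guide optimization, amounts to weighting the reconstruction loss per pixel. A minimal sketch follows, assuming the mask is realized as a per-pixel weight map that down-weights unreliable regions; the function name and weighting scheme are illustrative, not the paper's actual formulation.

```python
# Illustrative sketch only: a per-frame weighted photometric loss. The
# "anisotropic weighted mask" is assumed here to be a per-pixel weight map in
# [0, 1] that suppresses unreliable regions (e.g., occlusion or motion blur);
# how the paper actually constructs the mask is not reproduced.
import torch

def weighted_photometric_loss(rendered: torch.Tensor,
                              target: torch.Tensor,
                              weight_mask: torch.Tensor) -> torch.Tensor:
    # rendered, target: (H, W, 3) images; weight_mask: (H, W)
    per_pixel = (rendered - target).abs().mean(dim=-1)  # L1 error per pixel
    return (weight_mask * per_pixel).sum() / weight_mask.sum().clamp(min=1e-6)
```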
Problem

Research questions and friction points this paper is trying to address.

3D hand reconstruction
in-the-wild videos
perturbations
hand avatars
monocular video
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting
in-the-wild hand reconstruction
perturbation disentanglement
monocular video
hand avatar
Authors

Hanhui Li
Sun Yat-sen University
Deep Learning · Computer Vision

Xuan Huang
School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, P.R. China, 518107

Wanquan Liu
Sun Yat-sen University
Computer Vision · Intelligent Control · Pattern Recognition

Yuhao Cheng
Lenovo Research
LLM · Computer Vision · AI

Long Chen
Lenovo Research Group, Shenzhen, P.R. China, 518038

Yiqiang Yan
Lenovo

Xiaodan Liang
Professor of Computer Science, Sun Yat-sen University, MBZUAI, CMU, NUS
Computer Vision · Embodied AI · Machine Learning

Chenqiang Gao
School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, P.R. China, 518107