WildGHand: Learning Anti-Perturbation Gaussian Hand Avatars from Monocular In-the-Wild Videos

📅 2026-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods suffer significant degradation in 3D hand reconstruction quality on real-world monocular videos due to perturbations such as hand-object interactions, extreme poses, illumination variations, and motion blur. This work proposes an optimization-based 3D Gaussian splatting framework for hand reconstruction that explicitly models these perturbations as time-varying biases on the attributes of 3D Gaussians. To mitigate their impact, the authors introduce a dynamic perturbation disentanglement module and a perturbation-aware optimization strategy, complemented by per-frame anisotropic weighted masks that adaptively suppress perturbations in both spatial and temporal dimensions. Evaluated on a newly collected dataset and two public benchmarks, the method achieves state-of-the-art performance, yielding up to a 15.8% relative improvement in PSNR and a 23.1% relative reduction in LPIPS.
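The summary describes the core mechanism only at a high level. As a rough illustration, here is a minimal PyTorch sketch of one way "time-varying biases on the attributes of 3D Gaussians" could be parameterized; the module name `PerturbationBias`, the embedding design, and the attribute layout are assumptions made for this sketch, not the authors' implementation (see their repository for the actual code).

```python
# Illustrative sketch only (assumed design, not the authors' code): a tiny
# network maps a frame time and a learnable per-Gaussian code to additive
# offsets on position, rotation, scale, and opacity, leaving the canonical
# Gaussians untouched so perturbations stay disentangled from the avatar.
import torch
import torch.nn as nn

class PerturbationBias(nn.Module):  # hypothetical name
    def __init__(self, num_gaussians: int, time_dim: int = 16, hidden: int = 64):
        super().__init__()
        self.time_embed = nn.Linear(1, time_dim)                  # embed frame time t in [0, 1]
        self.gauss_embed = nn.Embedding(num_gaussians, time_dim)  # per-Gaussian code
        self.head = nn.Sequential(
            nn.Linear(2 * time_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 3 + 4 + 3 + 1),  # Δposition, Δrotation (quat), Δscale, Δopacity
        )

    def forward(self, t: torch.Tensor, gauss_ids: torch.Tensor) -> torch.Tensor:
        te = self.time_embed(t.view(1, 1)).expand(gauss_ids.shape[0], -1)
        ge = self.gauss_embed(gauss_ids)
        return self.head(torch.cat([te, ge], dim=-1))  # (N, 11) per-frame attribute biases

# Usage: biases for frame t = 0.3 over all Gaussians.
deltas = PerturbationBias(num_gaussians=20000)(torch.tensor(0.3), torch.arange(20000))
delta_xyz = deltas[:, :3]  # would be added to canonical positions for this frame
```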

📝 Abstract
Despite recent progress in 3D hand reconstruction from monocular videos, most existing methods rely on data captured in well-controlled environments and therefore degrade in real-world settings with severe perturbations, such as hand-object interactions, extreme poses, illumination changes, and motion blur. To tackle these issues, we introduce WildGHand, an optimization-based framework that enables self-adaptive 3D Gaussian splatting on in-the-wild videos and produces high-fidelity hand avatars. WildGHand incorporates two key components: (i) a dynamic perturbation disentanglement module that explicitly represents perturbations as time-varying biases on 3D Gaussian attributes during optimization, and (ii) a perturbation-aware optimization strategy that generates per-frame anisotropic weighted masks to guide optimization. Together, these components allow the framework to identify and suppress perturbations across both spatial and temporal dimensions. We further curate a dataset of monocular hand videos captured under diverse perturbations to benchmark in-the-wild hand avatar reconstruction. Extensive experiments on this dataset and two public datasets demonstrate that WildGHand achieves state-of-the-art performance and substantially improves over its base model across multiple metrics (e.g., up to a $15.8\%$ relative gain in PSNR and a $23.1\%$ relative reduction in LPIPS). Our implementation and dataset are available at https://github.com/XuanHuang0/WildGHand.
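The abstract's second component, per-frame anisotropic weighted masks that guide optimization, amounts to weighting the reconstruction loss per pixel. A minimal sketch follows, assuming the mask is realized as a per-pixel weight map that down-weights unreliable regions; the function name and weighting scheme are illustrative, not the paper's actual formulation.

```python
# Illustrative sketch only: a per-frame weighted photometric loss. The
# "anisotropic weighted mask" is assumed here to be a per-pixel weight map in
# [0, 1] that suppresses unreliable regions (e.g., occlusion or motion blur);
# how the paper actually constructs the mask is not reproduced.
import torch

def weighted_photometric_loss(rendered: torch.Tensor,
                              target: torch.Tensor,
                              weight_mask: torch.Tensor) -> torch.Tensor:
    # rendered, target: (H, W, 3) images; weight_mask: (H, W)
    per_pixel = (rendered - target).abs().mean(dim=-1)  # L1 error per pixel
    return (weight_mask * per_pixel).sum() / weight_mask.sum().clamp(min=1e-6)
```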
Problem

Research questions and friction points this paper is trying to address.

3D hand reconstruction
in-the-wild videos
perturbations
hand avatars
monocular video
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting
in-the-wild hand reconstruction
perturbation disentanglement
monocular video
hand avatar
Authors

Hanhui Li
Sun Yat-sen University
Deep Learning · Computer Vision

Xuan Huang
School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, P.R. China, 518107

Wanquan Liu
Sun Yat-sen University
Computer Vision · Intelligent Control · Pattern Recognition

Yuhao Cheng
Lenovo Research
LLM · Computer Vision · AI

Long Chen
Lenovo Research Group, Shenzhen, P.R. China, 518038

Yiqiang Yan
Lenovo

Xiaodan Liang
Professor of Computer Science, Sun Yat-sen University, MBZUAI, CMU, NUS
Computer Vision · Embodied AI · Machine Learning

Chenqiang Gao
School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, P.R. China, 518107