AnyHand: A Large-Scale Synthetic Dataset for RGB(-D) Hand Pose Estimation

πŸ“… 2026-03-26
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing hand pose datasets are limited in scale, diversity, occlusion handling, arm geometry representation, and RGB-D alignment, hindering model performance and generalization. To address these limitations, this work introduces AnyHand, a large-scale synthetic dataset of physically realistic RGB-D images, comprising 2.5 million single-hand and 4.1 million hand–object interaction samples, which uniquely provides occlusion annotations, full-arm geometry, and precisely aligned depth data. The authors also propose a lightweight, plug-and-play depth fusion module that integrates multimodal features without requiring fine-tuning. Experiments show that the proposed approach significantly outperforms existing methods on FreiHAND and HO-3D while generalizing strongly to the unseen HO-Cap domain; notably, the RGB-D model achieves state-of-the-art performance on HO-3D.
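
To make the annotation types named above concrete (pixel-aligned depth, full-arm geometry, occlusion), here is a minimal sketch of what a single sample record could look like. Every field name and shape below is an illustrative assumption; the dataset's actual schema is not given in this summary.

```python
# Hypothetical per-sample record for an AnyHand-style RGB-D dataset.
# All field names and shapes are assumptions for illustration only.
from dataclasses import dataclass
import numpy as np

@dataclass
class HandSample:
    rgb: np.ndarray             # (H, W, 3) uint8 color image
    depth: np.ndarray           # (H, W) float32 depth map, pixel-aligned with rgb
    joints_3d: np.ndarray       # (21, 3) hand keypoints in camera coordinates
    arm_mesh: np.ndarray        # (V, 3) vertices of the full-arm geometry
    occlusion_mask: np.ndarray  # (H, W) bool, True where the hand is occluded
    intrinsics: np.ndarray      # (3, 3) camera matrix used for depth alignment
```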

πŸ“ Abstract
We present AnyHand, a large-scale synthetic dataset designed to advance the state of the art in 3D hand pose estimation from both RGB-only and RGB-D inputs. While recent foundation-style approaches have shown that increasing the quantity and diversity of training data can markedly improve performance and robustness in hand pose estimation, existing real-world datasets for this task are limited in coverage, and prior synthetic datasets rarely provide occlusions, arm details, and aligned depth together at scale. To address this bottleneck, AnyHand contains 2.5M single-hand and 4.1M hand-object interaction RGB-D images with rich geometric annotations. In the RGB-only setting, we show that extending the original training sets of existing baselines with AnyHand yields significant gains on multiple benchmarks (FreiHAND and HO-3D), even when the architecture and training scheme are kept fixed. More impressively, the model trained with AnyHand generalizes more strongly to the out-of-domain HO-Cap dataset without any fine-tuning. We also contribute a lightweight depth fusion module that can be easily integrated into existing RGB-based models. Trained with AnyHand, the resulting RGB-D model achieves superior performance on the HO-3D benchmark, demonstrating the benefits of depth integration and the effectiveness of our synthetic data.
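
The abstract does not specify the fusion module's architecture; the sketch below is one plausible reading of a "lightweight, plug-and-play" design, where a small depth encoder is gated into a frozen RGB backbone's feature map. The class name, the gated residual design, and all layer sizes are illustrative assumptions, not the paper's actual method.

```python
# A minimal PyTorch sketch of a plug-and-play depth fusion module.
# This gated-fusion design is an assumption, not the paper's architecture.
import torch
import torch.nn as nn

class DepthFusion(nn.Module):
    def __init__(self, rgb_channels: int, depth_channels: int = 32):
        super().__init__()
        # Small depth encoder: cheap enough to bolt onto a frozen RGB backbone.
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, depth_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(depth_channels, rgb_channels, 3, padding=1),
        )
        # A 1x1 conv gate decides, per location, how much depth evidence to mix in.
        self.gate = nn.Conv2d(rgb_channels * 2, rgb_channels, 1)

    def forward(self, rgb_feat: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Resize the raw depth map (B, 1, H, W) to the RGB feature resolution.
        depth = nn.functional.interpolate(
            depth, size=rgb_feat.shape[-2:], mode="bilinear", align_corners=False
        )
        depth_feat = self.depth_encoder(depth)
        gate = torch.sigmoid(self.gate(torch.cat([rgb_feat, depth_feat], dim=1)))
        # Residual fusion leaves the RGB pathway intact when depth is uninformative.
        return rgb_feat + gate * depth_feat

# Usage: wrap an existing RGB backbone's feature map without retraining it, e.g.
# fused = DepthFusion(rgb_channels=256)(rgb_features, depth_map)
```

A residual gate is one common way to satisfy the "no fine-tuning required" claim: when the gate output is near zero, the module reduces to an identity over the pretrained RGB features.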
Problem

Research questions and friction points this paper is trying to address.

hand pose estimation
synthetic dataset
RGB-D
occlusion
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic dataset
hand pose estimation
RGB-D fusion
occlusion modeling
domain generalization