REACH: Hand Pose Estimation from Room Corners

📅 2026-05-21
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses 3D hand pose estimation under the extreme conditions typical of corner-mounted room cameras: long distances, low resolution, and frequent occlusions. The authors propose REACH-Net, a Transformer-based model that jointly leverages multi-view observations, temporal autoregressive modeling, and hand-body coordination. It models the relationship between hand and body configurations through correlations between their visual features, expressed as per-view tokens, and exploits their temporal coordination autoregressively. To train and test the method, the authors also introduce REACH, a first-of-its-kind large-scale 3D hand pose dataset covering 50 participants performing a wide variety of daily activities; concealed chest cameras provide accurate hand shape and pose annotations without interfering with natural movement. Experiments, including comparisons with existing methods, show that REACH-Net achieves highly accurate 3D hand pose estimation from afar, a step toward continuous "in-the-wild" human behavior analysis.
📝 Abstract
We introduce a novel 3D hand pose estimator that can accurately recover the shape and pose of people's hands in a room from afar, typically from fixed cameras at room corners, in extremely low-resolution and frequently occluded views. Our key idea is to fully leverage hand-body coordination, its temporal progression, and multiview observations. We achieve this with a novel Transformer-based model, in which hand and body configurations are modeled through correlations between their visual features expressed as per-view tokens, and their temporal coordination is exploited in an autoregressive manner. We introduce a novel dataset, which we refer to as REACH, Room-Environment dataset Annotated with Chest cameras for Hand pose estimation, to train and test our method. REACH is a first-of-its-kind large-scale hand pose dataset that captures accurate hand movements of 50 participants across a wide variety of daily activities. In order to avoid interfering with natural movements while annotating the hands with accurate shape and pose, we leverage concealed chest cameras. Through extensive experiments, including comparative studies with existing methods, we show that our model, REACH-Net, achieves highly accurate 3D hand pose estimation from afar. These results broaden the horizon of 3D hand pose estimation, especially towards "in-the-wild" continuous human behavior analysis.
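The abstract does not give architectural details, so the following is only a toy sketch of the general idea it describes: per-view hand and body tokens are fused by attention, and the previous frame's estimate is fed back in autoregressively. It is not REACH-Net. The function names (`attend`, `rollout`), the two-dimensional tokens, and the absence of learned weights are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, tokens):
    """Scaled dot-product attention of one query over a list of tokens.

    Hypothetical simplification: no learned projections, single head.
    Returns the attention-weighted average of the tokens and the weights.
    """
    d = len(query)
    scores = [
        sum(q * t for q, t in zip(query, tok)) / math.sqrt(d)
        for tok in tokens
    ]
    weights = softmax(scores)
    fused = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]
    return fused, weights


def rollout(frames, init_state):
    """Autoregressive pass over a sequence of multi-view frames.

    frames: list of frames; each frame is a list of (hand_token, body_token)
            pairs, one pair per camera view (assumed layout).
    init_state: starting estimate, same dimension as the tokens.
    """
    state = init_state
    outputs = []
    for views in frames:
        # The previous estimate is itself a token, so a frame with weak
        # hand evidence can still lean on temporal context.
        tokens = [state]
        for hand_tok, body_tok in views:
            tokens.append(hand_tok)
            tokens.append(body_tok)
        # The previous estimate also acts as the query for this frame.
        state, _ = attend(state, tokens)
        outputs.append(state)
    return outputs


if __name__ == "__main__":
    # Two frames, two camera views each, 2-D toy tokens.
    frames = [
        [([1.0, 0.0], [0.0, 1.0]), ([0.5, 0.5], [0.2, 0.8])],
        [([0.9, 0.1], [0.1, 0.9]), ([0.0, 0.0], [0.3, 0.7])],
    ]
    for t, estimate in enumerate(rollout(frames, [0.5, 0.5])):
        print(t, estimate)
```

In a trained model the tokens would come from image encoders and the attention would use learned projections; the sketch only shows how hand tokens, body tokens, and the previous estimate can share one attention step per frame.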
Problem

Research questions and friction points this paper is trying to address.

hand pose estimation
low-resolution
occlusion
3D reconstruction
in-the-wild
Innovation

Methods, ideas, or system contributions that make the work stand out.

hand pose estimation
Transformer-based model
multiview fusion
temporal autoregressive modeling
in-the-wild dataset