EgoHandICL: Egocentric 3D Hand Reconstruction with In-Context Learning

📅 2026-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited generalization of 3D hand reconstruction from first-person views, which is hindered by depth ambiguity, self-occlusion, and complex hand-object interactions. To this end, we propose the first in-context learning (ICL)-based reconstruction framework, featuring a vision-language model (VLM)-guided exemplar retrieval mechanism, a dedicated tokenizer for multimodal context, a Masked Autoencoder (MAE)-based architecture, and a hand geometry-aware training objective. These components jointly enable semantic alignment and visual consistency modeling. Evaluated on the ARCTIC and EgoExo4D datasets, our method significantly outperforms existing state-of-the-art approaches and demonstrates strong generalization to unseen real-world scenarios. Furthermore, it enhances EgoVLM’s reasoning capability regarding hand-object interactions.
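
This page carries no pseudocode, so as a rough illustration, here is a minimal sketch of what the VLM-guided exemplar retrieval step could look like: a nearest-neighbor lookup over a bank of annotated exemplars in a shared embedding space. The function name, the `encoder` argument, and the CLIP-style cosine similarity are assumptions for illustration, not the paper's released code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve_exemplars(query_img, bank_imgs, bank_labels, encoder, k=3):
    # encoder: any VLM image encoder mapping a batch of images to (N, D)
    # embeddings, e.g. a CLIP-style backbone (assumption, not the paper's model).
    q = F.normalize(encoder(query_img.unsqueeze(0)), dim=-1)  # (1, D) query embedding
    b = F.normalize(encoder(bank_imgs), dim=-1)               # (N, D) bank embeddings
    sim = (q @ b.T).squeeze(0)                                # cosine similarities (N,)
    top = sim.topk(k).indices                                 # k most similar exemplars
    # Return the top-k (image, annotation) pairs to serve as in-context exemplars.
    return bank_imgs[top], [bank_labels[i] for i in top.tolist()]
```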

📝 Abstract
Robust 3D hand reconstruction in egocentric vision is challenging due to depth ambiguity, self-occlusion, and complex hand-object interactions. Prior methods mitigate these issues by scaling training data or adding auxiliary cues, but they often struggle in unseen contexts. We present EgoHandICL, the first in-context learning (ICL) framework for 3D hand reconstruction that improves semantic alignment, visual consistency, and robustness under challenging egocentric conditions. EgoHandICL introduces complementary exemplar retrieval guided by vision-language models (VLMs), an ICL-tailored tokenizer for multimodal context, and a masked autoencoder (MAE)-based architecture trained with hand-guided geometric and perceptual objectives. Experiments on ARCTIC and EgoExo4D show consistent gains over state-of-the-art methods. We also demonstrate real-world generalization and improve EgoVLM's hand-object interaction reasoning by using reconstructed hands as visual prompts. Code and data: https://github.com/Nicous20/EgoHandICL
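
The abstract's "hand-guided geometric and perceptual objectives" suggests a loss defined directly on hand geometry. A common instantiation in 3D hand reconstruction combines errors on 3D joints and MANO mesh vertices; the sketch below shows that generic form. The function name and weights are hypothetical, and the paper's actual objective may compose terms differently and also includes perceptual terms not shown here.

```python
import torch.nn.functional as F

def hand_geometric_loss(pred_joints, gt_joints, pred_verts, gt_verts,
                        w_joint=1.0, w_vert=1.0):
    # pred_joints/gt_joints: (B, 21, 3) 3D hand keypoints.
    # pred_verts/gt_verts:   (B, 778, 3) MANO mesh vertices.
    # Generic form only; treating this as EgoHandICL's exact loss is an assumption.
    joint_loss = F.l1_loss(pred_joints, gt_joints)  # per-joint position error
    vert_loss = F.l1_loss(pred_verts, gt_verts)     # dense mesh surface error
    return w_joint * joint_loss + w_vert * vert_loss
```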
Problem

Research questions and friction points this paper is trying to address.

egocentric vision
3D hand reconstruction
depth ambiguity
self-occlusion
hand-object interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

In-Context Learning
Egocentric Vision
3D Hand Reconstruction
Vision-Language Models
Masked Autoencoder
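
Among the listed contributions, the MAE-based architecture implies masked-token reconstruction over the multimodal context sequence. For reference, the sketch below shows standard MAE-style random masking of a token sequence; treating this as EgoHandICL's exact masking scheme is an assumption.

```python
import torch

def random_mask_tokens(tokens, mask_ratio=0.75):
    # tokens: (B, N, D) patch/context tokens. Standard MAE random masking;
    # whether EgoHandICL masks exactly this way is an assumption.
    B, N, D = tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=tokens.device)  # one random score per token
    ids_shuffle = noise.argsort(dim=1)              # random permutation of tokens
    ids_restore = ids_shuffle.argsort(dim=1)        # inverse, to unshuffle at decode time
    ids_keep = ids_shuffle[:, :n_keep]
    kept = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return kept, ids_keep, ids_restore
```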
👥 Authors
Binzhu Xie (The Chinese University of Hong Kong) - CV, Multimodal AI
Shi Qiu (Department of Computer Science and Engineering, The Chinese University of Hong Kong; Institute of Medical Intelligence and XR, The Chinese University of Hong Kong)
Sicheng Zhang (Khalifa University) - Artificial Intelligence, Computer Vision
Yinqiao Wang (Department of Computer Science and Engineering, The Chinese University of Hong Kong; Institute of Medical Intelligence and XR, The Chinese University of Hong Kong)
Hao Xu (CUHK) - Computer Graphics, Computer Vision
Muzammal Naseer (Asst. Professor, Khalifa University) - Multi-modal Learning, AI Safety and Reliability
Chi-Wing Fu (Department of Computer Science and Engineering, The Chinese University of Hong Kong; Institute of Medical Intelligence and XR, The Chinese University of Hong Kong)
Pheng-Ann Heng (Department of Computer Science and Engineering, The Chinese University of Hong Kong; Institute of Medical Intelligence and XR, The Chinese University of Hong Kong)