Fast-HaMeR: Boosting Hand Mesh Reconstruction using Knowledge Distillation

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of deploying high-accuracy 3D hand mesh reconstruction models such as HaMeR on resource-constrained devices like VR/AR headsets and smartphones, where their computational cost is prohibitive. The authors replace HaMeR's ViT-H backbone with lightweight alternatives (MobileNet, MobileViT, ConvNeXt, and ResNet) and evaluate three knowledge distillation strategies (output-level, feature-level, and a hybrid of both) for transferring knowledge from HaMeR to the compact student models. With backbones only 35% the size of the original, the distilled models run inference 1.5× faster while reconstruction error increases by only 0.4 mm, which supports deployment on low-power devices.

📝 Abstract
Fast and accurate 3D hand reconstruction is essential for real-time applications in VR/AR, human-computer interaction, robotics, and healthcare. Most state-of-the-art methods rely on heavy models, limiting their use on resource-constrained devices like headsets, smartphones, and embedded systems. In this paper, we investigate how the use of lightweight neural networks, combined with Knowledge Distillation, can accelerate complex 3D hand reconstruction models by making them faster and lighter, while maintaining comparable reconstruction accuracy. While our approach is suited for various hand reconstruction frameworks, we focus primarily on boosting the HaMeR model, currently the leading method in terms of reconstruction accuracy. We replace its original ViT-H backbone with lighter alternatives, including MobileNet, MobileViT, ConvNeXt, and ResNet, and evaluate three knowledge distillation strategies: output-level, feature-level, and a hybrid of both. Our experiments show that using lightweight backbones that are only 35% the size of the original achieves 1.5x faster inference speed while preserving similar performance quality with only a minimal accuracy difference of 0.4mm. More specifically, we show how output-level distillation notably improves student performance, while feature-level distillation proves more effective for higher-capacity students. Overall, the findings pave the way for efficient real-world applications on low-power devices. The code and models are publicly available under https://github.com/hunainahmedj/Fast-HaMeR.
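The three distillation strategies named in the abstract can be illustrated with a small, dependency-free sketch. The abstract does not give the exact loss formulation, so everything here is an assumption made for illustration: the use of mean squared error, the linear `adapter` that maps student features to the teacher's feature width, and the weights `alpha` and `beta` are hypothetical and are not taken from the Fast-HaMeR code.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def project(features: Sequence[float],
            adapter: Sequence[Sequence[float]]) -> list:
    """Hypothetical linear adapter: one output value per row of `adapter`.

    A smaller student backbone usually has narrower features than the
    teacher, so some mapping is needed before the two can be compared.
    """
    return [sum(w * f for w, f in zip(row, features)) for row in adapter]


def distillation_loss(student_out, teacher_out,
                      student_feat, teacher_feat, adapter,
                      alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted sum of an output-level and a feature-level term.

    beta = 0       -> output-level distillation only
    alpha = 0      -> feature-level distillation only
    both non-zero  -> hybrid distillation
    """
    output_term = mse(student_out, teacher_out)
    feature_term = mse(project(student_feat, adapter), teacher_feat)
    return alpha * output_term + beta * feature_term


# Toy values standing in for model predictions and backbone features.
teacher_out, student_out = [1.0, 2.0], [1.0, 4.0]
teacher_feat, student_feat = [1.0, 2.0, 5.0], [1.0, 2.0]
adapter = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # maps width 2 -> width 3

output_only = distillation_loss(student_out, teacher_out,
                                student_feat, teacher_feat, adapter, beta=0.0)
feature_only = distillation_loss(student_out, teacher_out,
                                 student_feat, teacher_feat, adapter, alpha=0.0)
hybrid = distillation_loss(student_out, teacher_out,
                           student_feat, teacher_feat, adapter,
                           alpha=1.0, beta=0.5)
```

In a real training loop the teacher would be frozen and these terms would typically be added to the usual supervised reconstruction loss; how the authors weight and combine them is documented in their repository rather than in this abstract.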
Problem

Research questions and friction points this paper is trying to address.

Hand Mesh Reconstruction
Real-time Applications
Resource-constrained Devices
Model Efficiency
3D Hand Reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge Distillation
Hand Mesh Reconstruction
Lightweight Neural Networks
Model Compression
Real-time 3D Hand Tracking
Hunain Ahmed Jillani
RPTU, Kaiserslautern, Germany
Ahmed Tawfik Aboukhadra
RPTU, Kaiserslautern, Germany; DFKI-AV, Kaiserslautern, Germany
Ahmed Elhayek
UPM, Saudi Arabia
Jameel Malik
NUST-SEECS, Islamabad, Pakistan
Nadia Robertini
DFKI-AV, Kaiserslautern, Germany
Didier Stricker
Professor for Computer Science, University Kaiserslautern
augmented reality, computer vision, image processing, body sensor networks, HCI