Fast-HaMeR: Boosting Hand Mesh Reconstruction using Knowledge Distillation

📅 2026-03-17

🤖 AI Summary

This work addresses the challenge of deploying high-accuracy 3D hand mesh reconstruction models such as HaMeR on resource-constrained devices like VR/AR headsets and smartphones, where their computational demands are prohibitive. It systematically evaluates lightweight backbones (MobileNet, MobileViT, ConvNeXt, and ResNet) as replacements for HaMeR's ViT-H backbone. These are combined with knowledge distillation at the output level, the feature level, or both (hybrid) to transfer knowledge from HaMeR to the compact student models. The result is a favorable accuracy-efficiency trade-off: with backbones only 35% the size of the original, inference runs 1.5× faster at the cost of a 0.4 mm increase in reconstruction error, bringing real-time, low-power deployment within reach.
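The three distillation strategies can be illustrated as loss terms. The sketch below is a minimal, framework-free illustration under stated assumptions, not the paper's implementation. It assumes the teacher and student each emit a flat vector of predictions (for HaMeR, hand-model parameters) and an intermediate feature vector of equal width, and it uses a plain mean-squared error. The function names and the weights `alpha` and `beta` are hypothetical.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def output_kd_loss(student_out: Sequence[float], teacher_out: Sequence[float]) -> float:
    # Output-level KD (assumed form): the student regresses the teacher's
    # final predictions instead of (or in addition to) ground truth.
    return mse(student_out, teacher_out)


def feature_kd_loss(student_feat: Sequence[float], teacher_feat: Sequence[float]) -> float:
    # Feature-level KD (assumed form): match intermediate backbone features.
    # This toy assumes both feature vectors already have the same width.
    return mse(student_feat, teacher_feat)


def hybrid_kd_loss(student_out: Sequence[float], teacher_out: Sequence[float],
                   student_feat: Sequence[float], teacher_feat: Sequence[float],
                   alpha: float = 1.0, beta: float = 1.0) -> float:
    # Hybrid KD: a hypothetical weighted sum of the output and feature terms.
    return (alpha * output_kd_loss(student_out, teacher_out)
            + beta * feature_kd_loss(student_feat, teacher_feat))
```

For example, `hybrid_kd_loss([1.0, 2.0], [1.0, 0.0], [0.5], [1.5])` combines an output term of 2.0 and a feature term of 1.0 into 3.0. In practice these terms would typically be added to the usual supervised losses against ground-truth annotations, with a deep-learning framework supplying the gradients.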

📝 Abstract

Fast and accurate 3D hand reconstruction is essential for real-time applications in VR/AR, human-computer interaction, robotics, and healthcare. Most state-of-the-art methods rely on heavy models, limiting their use on resource-constrained devices like headsets, smartphones, and embedded systems. In this paper, we investigate how lightweight neural networks, combined with Knowledge Distillation, can make complex 3D hand reconstruction models faster and lighter while maintaining comparable reconstruction accuracy. While our approach is suited to various hand reconstruction frameworks, we focus primarily on boosting the HaMeR model, currently the leading method in terms of reconstruction accuracy. We replace its original ViT-H backbone with lighter alternatives, including MobileNet, MobileViT, ConvNeXt, and ResNet, and evaluate three knowledge distillation strategies: output-level, feature-level, and a hybrid of both. Our experiments show that lightweight backbones only 35% the size of the original achieve 1.5× faster inference while preserving comparable accuracy, with a difference of only 0.4 mm. More specifically, we show that output-level distillation notably improves student performance, while feature-level distillation proves more effective for higher-capacity students. Overall, the findings pave the way for efficient real-world applications on low-power devices. The code and models are publicly available at https://github.com/hunainahmedj/Fast-HaMeR.
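Because a lightweight backbone such as MobileNet usually produces features of a different width than ViT-H, feature-level distillation needs some way to compare the two. One common approach, assumed here rather than confirmed by the abstract, is a small learned projection (`LinearAdapter`, a hypothetical name) that maps student features into the teacher's feature space before the loss is applied. This toy trains only the adapter on a single fixed feature pair. In real distillation, gradients would typically also flow into the student backbone.

```python
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class LinearAdapter:
    """Hypothetical learned projection from student width to teacher width."""

    def __init__(self, in_dim: int, out_dim: int) -> None:
        # Zero initialisation keeps the toy example deterministic.
        self.w = [[0.0] * in_dim for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def __call__(self, x: Sequence[float]) -> List[float]:
        # y = W x + b
        return [sum(wi * xi for wi, xi in zip(row, x)) + bj
                for row, bj in zip(self.w, self.b)]

    def sgd_step(self, x: Sequence[float], target: Sequence[float], lr: float) -> None:
        """One gradient-descent step on mse(self(x), target) w.r.t. W and b."""
        y = self(x)
        n = len(y)
        for j in range(n):
            g = 2.0 * (y[j] - target[j]) / n  # d(loss) / d(y_j)
            for i, xi in enumerate(x):
                self.w[j][i] -= lr * g * xi
            self.b[j] -= lr * g


def feature_kd_loss(adapter: LinearAdapter,
                    student_feat: Sequence[float],
                    teacher_feat: Sequence[float]) -> float:
    """Feature-level KD: projected student features vs. frozen teacher features."""
    return mse(adapter(student_feat), teacher_feat)


if __name__ == "__main__":
    student_feat = [1.0, 2.0, 0.5]        # toy 3-dim student feature
    teacher_feat = [0.2, -0.1, 0.4, 0.0]  # toy 4-dim teacher feature
    adapter = LinearAdapter(in_dim=3, out_dim=4)
    print("before:", feature_kd_loss(adapter, student_feat, teacher_feat))
    for _ in range(50):
        adapter.sgd_step(student_feat, teacher_feat, lr=0.1)
    print("after:", feature_kd_loss(adapter, student_feat, teacher_feat))
```

The projection lives only on the student side, so the frozen teacher is untouched and the adapter can be discarded after training, leaving the student's inference cost unchanged.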

Problem

Research questions and friction points this paper is trying to address.

Hand Mesh Reconstruction

Real-time Applications

Resource-constrained Devices

Model Efficiency

3D Hand Reconstruction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge Distillation

Hand Mesh Reconstruction

Lightweight Neural Networks

Model Compression

Real-time 3D Hand Tracking
