🤖 AI Summary
This work addresses the challenging problem of hand mesh reconstruction from a single RGB image under severe occlusion (e.g., hand-object interaction). To this end, we introduce state space models (SSMs) to this task for the first time, proposing a state-space-driven spatial-channel joint attention module that simultaneously captures long-range spatial dependencies and channel-wise responses, thereby significantly enlarging the effective receptive field. Our method employs a lightweight encoder-decoder architecture with end-to-end differentiable mesh regression, balancing accuracy and efficiency. Evaluated on strong-occlusion benchmarks—including FREIHAND, DEXYCB, and HO3D—our approach achieves state-of-the-art performance in both quantitative metrics and qualitative fidelity. It attains the smallest parameter count and fastest inference speed among comparable methods, while consistently producing complete, geometrically accurate, and detail-rich hand reconstructions.
📝 Abstract
Reconstructing the hand mesh from a single RGB image is a challenging task because hands are often occluded by other objects. Most previous works exploit additional information and adopt attention mechanisms to improve 3D reconstruction performance, but this simultaneously increases computational complexity. To achieve a performance-preserving architecture with high computational efficiency, in this work we propose a simple but effective 3D hand mesh reconstruction network (i.e., HandS3C), which is the first to incorporate a state space model into the task of hand mesh reconstruction. In the network, we design a novel state-space spatial-channel attention module that extends the effective receptive field, extracts hand features along the spatial dimension, and enhances regional features of hands along the channel dimension. This helps to reconstruct a complete and detailed hand mesh. Extensive experiments conducted on well-known datasets featuring heavy occlusions (such as FREIHAND, DEXYCB, and HO3D) demonstrate that our proposed HandS3C achieves state-of-the-art performance while maintaining a minimal parameter count.
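To give a concrete sense of what joint spatial-channel attention means, here is a minimal, illustrative NumPy sketch. It is not the actual HandS3C module (which builds its attention on state space models); it only shows the general pattern of reweighting a feature map with a spatial attention map and a channel gate, plus a residual connection. All function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_channel_attention(feat):
    """Toy spatial-channel joint attention (hypothetical, not the paper's SSM module).

    feat: (C, H, W) feature map.
    Spatial branch: one attention weight per location, from the
    channel-averaged response. Channel branch: one sigmoid gate per
    channel, from the spatially pooled response. The two reweightings
    are applied jointly, followed by a residual connection.
    """
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)                        # (C, N) with N = H*W
    spatial = softmax(x.mean(axis=0))                 # (N,) location weights
    channel = 1.0 / (1.0 + np.exp(-x.mean(axis=1)))  # (C,) channel gates
    out = (x * spatial[None, :] * channel[:, None]).reshape(C, H, W)
    return feat + out                                 # residual connection

feat = np.random.default_rng(0).normal(size=(8, 4, 4))
out = spatial_channel_attention(feat)
print(out.shape)  # (8, 4, 4)
```

The residual form means the module can only add occlusion-aware emphasis on top of the backbone features rather than replace them, which is the usual design choice for attention blocks inserted into an encoder-decoder.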