HandS3C: 3D Hand Mesh Reconstruction with State Space Spatial Channel Attention from RGB images

📅 2024-05-02
🏛️ IEEE International Conference on Acoustics, Speech, and Signal Processing
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenging problem of hand mesh reconstruction from a single RGB image under severe occlusion (e.g., hand-object interaction). To this end, it introduces state space models (SSMs) to this task for the first time, proposing a state-space-driven spatial-channel joint attention module that simultaneously captures long-range spatial dependencies and channel-wise responses, thereby significantly enlarging the effective receptive field. The method employs a lightweight encoder-decoder architecture with end-to-end differentiable mesh regression, balancing accuracy and efficiency. Evaluated on heavy-occlusion benchmarks (FREIHAND, DEXYCB, and HO3D), the approach achieves state-of-the-art performance in both quantitative metrics and qualitative fidelity. It attains the smallest parameter count and fastest inference speed among comparable methods, while consistently producing complete, geometrically accurate, and detail-rich hand reconstructions.
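The long-range dependency behaviour the summary attributes to state space models can be illustrated with a minimal diagonal linear SSM scan. This is a toy sketch of the generic recurrence underlying SSMs, not the paper's actual module; all names, shapes, and parameter choices below are illustrative assumptions.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal diagonal linear state space scan over a token sequence.

    Toy illustration of the SSM recurrence (not the paper's module):
        h_t = A * h_{t-1} + B * x_t,    y_t = C . h_t
    x: (L, D) flattened spatial tokens with D channels.
    A, B, C: (N,) diagonal state/input/output parameters, shared across channels.
    """
    L, D = x.shape
    N = A.shape[0]
    h = np.zeros((D, N))                 # one N-dim hidden state per channel
    y = np.empty((L, D))
    for t in range(L):
        h = A * h + np.outer(x[t], B)    # recurrent state update, shape (D, N)
        y[t] = h @ C                     # read out one value per channel
    return y
```

With A = 0 the scan degenerates to a pointwise map scaled by B·C, while A close to 1 lets each output token aggregate context from all earlier tokens, which is the enlarged effective receptive field the summary refers to, at linear cost in sequence length.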

📝 Abstract
Reconstructing the hand mesh from a single RGB image is a challenging task because hands are often occluded by other objects. Most previous works attempt to exploit additional information and adopt attention mechanisms to improve 3D reconstruction performance, but this simultaneously increases computational complexity. To achieve a performance-preserving architecture with high computational efficiency, in this work we propose a simple but effective 3D hand mesh reconstruction network (i.e., HandS3C), which is the first to incorporate a state space model into the task of hand mesh reconstruction. In the network, we design a novel state-space spatial-channel attention module that extends the effective receptive field, extracts hand features in the spatial dimension, and enhances regional features of hands in the channel dimension. This helps to reconstruct a complete and detailed hand mesh. Extensive experiments conducted on well-known datasets featuring heavy occlusions (such as FREIHAND, DEXYCB, and HO3D) demonstrate that our proposed HandS3C achieves state-of-the-art performance while maintaining minimal parameters.
Problem

Research questions and friction points this paper is trying to address.

Reconstructing 3D hand mesh from single RGB image with occlusions
Balancing computational efficiency and reconstruction performance
Enhancing hand feature extraction via spatial-channel attention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Incorporates state space model for hand mesh reconstruction
Uses spatial-channel attention for feature enhancement
Achieves high efficiency with minimal parameters
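The channel-dimension enhancement listed above can be sketched as a squeeze-and-excitation style gate: pool each channel to a scalar, pass the vector through a small MLP, and reweight the feature map with sigmoid gates. This is a generic stand-in under assumed shapes; the paper's exact channel branch is not specified here, and `w1`/`w2` are illustrative weight matrices.

```python
import numpy as np

def channel_gate(feat, w1, w2):
    """Gate each channel of a feature map by a learned scalar in (0, 1).

    Squeeze-and-excitation style sketch of channel attention; w1 (R, C) and
    w2 (C, R) are hypothetical weights, not the paper's actual parameters.
    feat: (H, W, C) feature map.
    """
    s = feat.mean(axis=(0, 1))               # squeeze: global average per channel
    z = np.maximum(w1 @ s, 0.0)              # excitation MLP with ReLU, shape (R,)
    g = 1.0 / (1.0 + np.exp(-(w2 @ z)))      # sigmoid gate per channel, shape (C,)
    return feat * g                          # reweight channels
```

Because the gate depends only on per-channel global statistics, it adds a negligible number of parameters relative to the backbone, which is consistent with the efficiency claim in the bullets.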
Zixun Jiao
Xi’an Polytechnic University
Xihan Wang
Xi’an Polytechnic University
Zhaoqiang Xia
Northwestern Polytechnical University
Visual Computing · Information Processing
Lianhe Shao
Xi’an Polytechnic University
Quanli Gao
Xi’an Polytechnic University