Attention-Enhanced Lightweight Hourglass Network for Human Pose Estimation

📅 2024-12-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing pose estimation methods tend to be computationally expensive or architecturally complex. To address this, we propose a lightweight two-stack hourglass network that combines depthwise separable convolutions with the Convolutional Block Attention Module (CBAM), so that the model is compressed while its feature representation is strengthened. The model has only about 10% of the parameters of the original eight-stack Hourglass network (2.3M parameters) and requires 3.7G FLOPs. In experiments on the COCO and MPII datasets it reaches an average precision of 72.07 and performs well in comparison to six other lightweight pose estimation models. The design targets a favorable trade-off between accuracy, parameter count, and computational cost.

📝 Abstract
Pose estimation is a critical task in computer vision with a wide range of applications, from activity monitoring to human-robot interaction. However, most existing methods are computationally expensive or have complex architectures. Here we propose a lightweight attention-based pose estimation network that uses depthwise separable convolution and the Convolutional Block Attention Module on an hourglass backbone. The network significantly reduces the computational complexity (floating point operations) and the model size (number of parameters), containing only about 10% of the parameters of the original eight-stack Hourglass network. Experiments were conducted on the COCO and MPII datasets using a two-stack hourglass backbone. The results show that our model performs well in comparison to six other lightweight pose estimation models, with an average precision of 72.07. The model achieves this performance with only 2.3M parameters and 3.7G FLOPs.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational cost in human pose estimation
Simplifying architecture of pose estimation networks
Maintaining accuracy with fewer model parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight attention-based hourglass network
Depthwise separable convolution reduces complexity
Convolutional Block Attention Module enhances performance
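
To illustrate why depthwise separable convolution reduces model size, the following is a minimal sketch that counts the weights of a single layer in both forms. The layer shape (3x3 kernel, 256 input and 256 output channels) and the function names are assumptions chosen for illustration and are not taken from the paper; biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias omitted)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution (one filter per input
    channel) followed by a 1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer shape, assumed for illustration only.
k, c_in, c_out = 3, 256, 256

standard = standard_conv_params(k, c_in, c_out)
separable = separable_conv_params(k, c_in, c_out)
ratio = separable / standard  # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 3))  # 589824 67840 0.115
```

For this assumed layer the separable form needs roughly 11.5% of the weights. The multiply-accumulate count of the layer shrinks by the same factor, since both forms are applied at every output position. The network-level figures reported above (2.3M parameters, 3.7G FLOPs) depend on the full architecture and cannot be derived from this single-layer calculation.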
Marsha Mariya Kappan
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
E. B. Sandoval
School of Art and Design, Creative Robotics Lab, University of New South Wales, Sydney, Australia
Erik Meijering
Professor of Biomedical Image Computing
Artificial Intelligence, Computer Vision, Deep Learning, Image Analysis, Biomedical Imaging
Francisco Cruz
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia; Escuela de Ingeniería, Universidad Central de Chile, Santiago, Chile