Attention-Enhanced Lightweight Hourglass Network for Human Pose Estimation

📅 2024-12-09

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing pose estimation methods often carry high computational overhead and complex architectures. To address this, the paper proposes a lightweight two-stack hourglass network. The design integrates depthwise separable convolutions and the Convolutional Block Attention Module (CBAM) into the hourglass architecture, aiming to compress the model while strengthening its feature representation. The model has 2.3M parameters, about 10% of the original eight-stack Hourglass network, and requires 3.7G FLOPs. In experiments on the COCO and MPII datasets it reaches an average precision of 72.07 and compares well against six other lightweight pose estimation models. The result is a favorable trade-off among accuracy, parameter count, and computational cost.

📝 Abstract

Pose estimation is a critical task in computer vision with a wide range of applications, from activity monitoring to human-robot interaction. However, most existing methods are computationally expensive or have complex architectures. Here we propose a lightweight attention-based pose estimation network that uses depthwise separable convolution and the Convolutional Block Attention Module on an hourglass backbone. The network significantly reduces the computational complexity (floating point operations) and the model size (number of parameters), containing only about 10% of the parameters of the original eight-stack Hourglass network. Experiments were conducted on the COCO and MPII datasets using a two-stack hourglass backbone. The results show that our model performs well in comparison to six other lightweight pose estimation models, with an average precision of 72.07. The model achieves this performance with only 2.3M parameters and 3.7G FLOPs.
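As a rough illustration of why depthwise separable convolution shrinks a model, the sketch below counts the weights in one standard k x k convolution and in its depthwise separable replacement. The channel counts and kernel size are illustrative assumptions, not values taken from the paper, and bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes chosen only for illustration.
c_in, c_out, k = 256, 256, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
ratio = sep / std  # equals 1 / c_out + 1 / k**2

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.3f}")
```

For these assumed sizes the separable layer keeps roughly 11.5% of the weights of the standard layer. The network-level figure of about 10% quoted in the abstract is measured against an eight-stack baseline, so it presumably also reflects the smaller number of stacks and is not directly comparable to this per-layer ratio.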

Problem

Research questions and friction points this paper is trying to address.

Reducing computational cost in human pose estimation

Simplifying architecture of pose estimation networks

Maintaining accuracy with fewer model parameters

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight attention-based hourglass network

Depthwise separable convolution reduces complexity

Convolutional Block Attention Module enhances performance
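To make the attention component concrete, here is a minimal NumPy sketch of a CBAM-style block: channel attention from a shared MLP over average-pooled and max-pooled descriptors, followed by spatial attention from a convolution over channel-pooled maps. This follows the general CBAM formulation rather than the paper's exact implementation; the tensor sizes, reduction ratio `r`, 7 x 7 kernel, and random weights are all assumptions for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """feat: (C, H, W). w1: (C // r, C) and w2: (C, C // r) form the shared MLP."""
    avg = feat.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))    # global max pooling -> (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # hidden layer with ReLU

    return sigmoid(mlp(avg) + mlp(mx))  # per-channel weights in (0, 1)


def spatial_attention(feat, kernel):
    """kernel: (2, k, k), applied to the stacked [mean, max] maps over channels."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = feat.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)  # per-location weights in (0, 1)


def cbam(feat, w1, w2, kernel):
    """Apply channel attention first, then spatial attention on the refined map."""
    feat = feat * channel_attention(feat, w1, w2)[:, None, None]
    return feat * spatial_attention(feat, kernel)[None, :, :]


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 4
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = cbam(feat, w1, w2, kernel)
```

The output has the same shape as the input, and every value is scaled by two factors in (0, 1), so the block only reweights features. In a trained network the MLP and kernel weights would be learned rather than drawn at random.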
