A Coarse-to-Fine Human Pose Estimation Method based on Two-stage Distillation and Progressive Graph Neural Network

📅 2025-08-15
🤖 AI Summary
Existing high-precision human pose estimation models incur excessive computational cost, which makes it difficult to achieve both high accuracy and model efficiency. To address this, we propose a coarse-to-fine two-stage knowledge distillation framework. Our key contributions are: (1) a structure-aware joint loss that explicitly models geometric and semantic contextual relationships among keypoints; (2) an image-guided progressive graph convolutional network (IGP-GCN) that fuses visual features for fine-grained pose refinement; and (3) a progressive supervision training strategy that enhances the student model's representational capacity and generalization. Evaluated on the COCO and CrowdPose benchmarks, our method outperforms state-of-the-art lightweight approaches, with the clearest gains on the challenging CrowdPose dataset, which features severe occlusion and high crowd density, and achieves a favorable trade-off between accuracy and inference efficiency.
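The summary does not give the exact form of the structure-aware joint loss. As a minimal sketch, assuming keypoints have already been decoded to (x, y) coordinates, one way to encode relations among joints is to match the displacement vector of each connected joint pair (limb) between the student and the teacher. The `COCO_LIMBS` edge list and the `structure_loss` function are hypothetical illustrations, not the authors' formulation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical subset of skeleton edges over the 17 COCO keypoints
# (0=nose, 5/6=shoulders, 7/8=elbows, 9/10=wrists, 11/12=hips,
# 13/14=knees, 15/16=ankles). Not taken from the paper.
COCO_LIMBS: List[Tuple[int, int]] = [
    (5, 7), (7, 9), (6, 8), (8, 10),          # arms
    (11, 13), (13, 15), (12, 14), (14, 16),   # legs
    (5, 6), (11, 12), (5, 11), (6, 12),       # torso
    (0, 5), (0, 6),                           # head to shoulders
]


def structure_loss(student: Sequence[Point], teacher: Sequence[Point],
                   limbs: Sequence[Tuple[int, int]] = COCO_LIMBS) -> float:
    """Mean squared error between student and teacher limb vectors.

    Instead of comparing joints independently, this compares the
    displacement between every connected pair of joints, so the student
    is pushed to reproduce the teacher's relative joint geometry.
    """
    total = 0.0
    for a, b in limbs:
        # Limb vector a -> b for the student and for the teacher.
        sdx, sdy = student[b][0] - student[a][0], student[b][1] - student[a][1]
        tdx, tdy = teacher[b][0] - teacher[a][0], teacher[b][1] - teacher[a][1]
        total += (sdx - tdx) ** 2 + (sdy - tdy) ** 2
    return total / len(limbs)
```

Because only relative displacements enter this loss, shifting every student joint by the same offset costs nothing, while moving a single wrist is penalized through the limb that contains it. In practice such a term would presumably be combined with an ordinary per-joint or heatmap distillation loss.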

📝 Abstract
Human pose estimation has been widely applied in human-centric understanding and generation, but most existing state-of-the-art human pose estimation methods require heavy computational resources to make accurate predictions. To obtain an accurate, robust, yet lightweight human pose estimator, one feasible approach is to transfer pose knowledge from a powerful teacher model to a less-parameterized student model via knowledge distillation. However, the traditional knowledge distillation framework does not fully exploit the contextual information among human joints. Therefore, in this paper, we propose a novel coarse-to-fine two-stage knowledge distillation framework for human pose estimation. In the first-stage distillation, we introduce a human joint structure loss that mines the structural information among human joints in order to transfer high-level semantic knowledge from the teacher model to the student model. In the second-stage distillation, we use an Image-Guided Progressive Graph Convolutional Network (IGP-GCN) to refine the initial human pose obtained from the first-stage distillation, and we supervise the training of the IGP-GCN progressively with the final output pose of the teacher model. Extensive experiments on two benchmark datasets, COCO keypoint and CrowdPose, show that our proposed method performs favorably against many existing state-of-the-art human pose estimation methods, and the performance improvement is especially significant on the more complex CrowdPose dataset.
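
The abstract states that the IGP-GCN is supervised progressively by the teacher's final output pose but does not specify the schedule. The sketch below assumes the refinement network emits one pose per stage and that every stage is compared with the teacher's final pose, using a hypothetical linearly increasing weight k/K so that later, finer stages are held more tightly to the target. The L1 distance, the weighting, and both function names are illustrative choices, not the paper's definition.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def l1_pose_distance(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean per-joint L1 distance between two poses with the same joint count."""
    if len(pred) != len(target):
        raise ValueError("poses must have the same number of joints")
    return sum(abs(px - tx) + abs(py - ty)
               for (px, py), (tx, ty) in zip(pred, target)) / len(pred)


def progressive_supervision_loss(stage_outputs: Sequence[Sequence[Point]],
                                 teacher_pose: Sequence[Point]) -> float:
    """Supervise every refinement stage with the teacher's final pose.

    Assumed weighting: stage k of K gets weight k / K, so coarse early
    stages are penalized less than fine late stages.
    """
    num_stages = len(stage_outputs)
    total = 0.0
    for k, pose in enumerate(stage_outputs, start=1):
        total += (k / num_stages) * l1_pose_distance(pose, teacher_pose)
    return total
```

Ramping the weights up over the stages is one way to express "coarse-to-fine": early stages only need to move roughly toward the teacher, while the last stage carries the full penalty.
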
Problem

Research questions and friction points this paper is trying to address.

Develops a lightweight human pose estimator via knowledge distillation
Enhances joint context learning with a two-stage distillation framework
Improves accuracy on complex datasets like CrowdPose
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage knowledge distillation framework
Human joints structure loss
Image-Guided Progressive Graph Convolutional Network
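To make the graph-refinement idea concrete, the following pure-Python sketch shows a single image-guided graph convolution step over a skeleton graph. It assumes each joint carries its coarse (x, y) coordinate plus an image feature vector sampled at that location (the sampling itself is not shown), uses the symmetric normalization D^-1/2 (A + I) D^-1/2 common in GCN formulations, and predicts a per-joint offset that is added to the coarse pose. The function names, the feature layout, and the single linear layer are hypothetical; the paper's IGP-GCN architecture may differ.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]


def normalized_adjacency(num_nodes: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # node degree including the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product x @ y."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def image_guided_gcn_step(coarse_pose: Sequence[Point],
                          joint_features: Sequence[Sequence[float]],
                          adj_norm: Matrix,
                          weight: Matrix) -> List[Point]:
    """One refinement step: fuse coordinates with image features, propagate
    them over the skeleton graph, and predict a per-joint (dx, dy) offset.

    joint_features[i] is a hypothetical image feature vector sampled at joint i;
    weight has shape (2 + feature_dim) x 2 and maps fused features to offsets.
    """
    fused = [[x, y] + list(f) for (x, y), f in zip(coarse_pose, joint_features)]
    offsets = matmul(matmul(adj_norm, fused), weight)  # (A_hat @ H) @ W
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(coarse_pose, offsets)]
```

A real model would stack several such steps with nonlinearities between hidden layers and learn `weight` by gradient descent. The output layer is kept linear here because the offsets it predicts must be able to take either sign.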
Zhangjian Ji
Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
Wenjin Zhang
Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
Shaotong Qiao
Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
Kai Feng
Northwestern Polytechnical University
Computational imaging, spectral imaging, deep learning
Yuhua Qian
Institute of Big Data Science and Industry, Shanxi University
Machine learning, data mining, complex networks