SKEL-CF: Coarse-to-Fine Biomechanical Skeleton and Surface Mesh Recovery

📅 2025-11-25

🤖 AI Summary

Existing parametric human models (e.g., SMPL) use simplified kinematics that limit biomechanical realism. SKEL re-rigs SMPL with an anatomically accurate skeleton, but estimating its parameters directly is hindered by limited training data, perspective ambiguities, and the inherent complexity of human articulation. SKEL-CF addresses this with a Transformer-based coarse-to-fine encoder-decoder: the encoder predicts coarse camera and SKEL parameters, and the decoder refines them progressively across successive layers. Camera geometry is modeled explicitly to mitigate depth and scale ambiguities. For anatomically consistent supervision, the existing SMPL-based 4DHuman dataset is converted into a SKEL-aligned version, 4DHuman-SKEL. On MOYO, SKEL-CF achieves 85.0 mm MPJPE and 51.4 mm PA-MPJPE, substantially outperforming the previous SKEL-based state of the art, HSMR (104.5 mm / 79.6 mm).

📝 Abstract

Parametric 3D human models such as SMPL have driven significant advances in human pose and shape estimation, yet their simplified kinematics limit biomechanical realism. The recently proposed SKEL model addresses this limitation by re-rigging SMPL with an anatomically accurate skeleton. However, estimating SKEL parameters directly remains challenging due to limited training data, perspective ambiguities, and the inherent complexity of human articulation. We introduce SKEL-CF, a coarse-to-fine framework for SKEL parameter estimation. SKEL-CF employs a transformer-based encoder-decoder architecture, where the encoder predicts coarse camera and SKEL parameters, and the decoder progressively refines them in successive layers. To ensure anatomically consistent supervision, we convert the existing SMPL-based dataset 4DHuman into a SKEL-aligned version, 4DHuman-SKEL, providing high-quality training data for SKEL estimation. In addition, to mitigate depth and scale ambiguities, we explicitly incorporate camera modeling into the SKEL-CF pipeline and demonstrate its importance across diverse viewpoints. Extensive experiments validate the effectiveness of the proposed design. On the challenging MOYO dataset, SKEL-CF achieves 85.0 MPJPE / 51.4 PA-MPJPE, significantly outperforming the previous SKEL-based state-of-the-art HSMR (104.5 / 79.6). These results establish SKEL-CF as a scalable and anatomically faithful framework for human motion analysis, bridging the gap between computer vision and biomechanics. Our implementation is available on the project page: https://pokerman8.github.io/SKEL-CF/.
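
The two metrics quoted in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: it assumes joints are given as `(J, 3)` NumPy arrays, and the function names `mpjpe`, `procrustes_align`, and `pa_mpjpe` are hypothetical. MPJPE is the mean per-joint Euclidean distance; PA-MPJPE is the same error after a similarity (Procrustes) alignment of the prediction to the ground truth, which removes global scale, rotation, and translation.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error; pred and gt have shape (J, 3)."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())

def procrustes_align(pred, gt):
    """Similarity-align pred to gt (optimal scale, rotation, translation)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation comes from the SVD of the 3x3 cross-covariance.
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)

# Toy check: a prediction that differs from the ground truth only by a
# similarity transform has a nonzero MPJPE but a near-zero PA-MPJPE.
rng = np.random.default_rng(0)
gt = rng.normal(size=(24, 3))            # 24 joints is an arbitrary choice
angle = np.deg2rad(30.0)
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0,            0.0,           1.0]])
pred = 1.2 * gt @ Rz.T + np.array([0.1, -0.2, 0.3])
raw_error = mpjpe(pred, gt)
aligned_error = pa_mpjpe(pred, gt)
```

Both functions return errors in the units of their inputs, so coordinates in millimetres give errors in millimetres.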

Problem

Research questions and friction points this paper is trying to address.

Estimating anatomically accurate skeleton parameters from limited training data

Addressing perspective ambiguities in human pose and shape estimation

Refining coarse biomechanical parameters progressively for realistic articulation
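
The depth and scale ambiguity behind the second point can be shown with a small sketch. It assumes a plain pinhole camera; the focal length, principal point, and joint coordinates are made-up values, and this is not the camera model used in the paper. Scaling every 3D point by the same factor about the camera centre (a larger body placed proportionally farther away) leaves the 2D projection unchanged, so a single image cannot separate the two without additional camera or scale information.

```python
def project(points, focal, center):
    """Pinhole projection of camera-frame points (x, y, z), z > 0, to pixels."""
    cx, cy = center
    return [(focal * x / z + cx, focal * y / z + cy) for x, y, z in points]

def scale_about_camera(points, s):
    """Scale points about the camera origin: a body s times larger, s times farther."""
    return [(s * x, s * y, s * z) for x, y, z in points]

# Hypothetical joints in metres and hypothetical intrinsics in pixels.
joints = [(0.0, -0.8, 3.0), (0.2, 0.0, 3.1), (-0.2, 0.9, 2.9)]
focal, center = 1000.0, (512.0, 512.0)

near = project(joints, focal, center)
far = project(scale_about_camera(joints, 1.5), focal, center)

# Largest pixel difference between the two projections (zero up to rounding).
max_pixel_gap = max(
    max(abs(u1 - u2), abs(v1 - v2)) for (u1, v1), (u2, v2) in zip(near, far)
)
```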

Innovation

Methods, ideas, or system contributions that make the work stand out.

Coarse-to-fine framework for SKEL parameter estimation

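Transformer encoder predicts coarse camera and SKEL parameters; the decoder refines them across successive layers

Converts the SMPL-based 4DHuman dataset into a SKEL-aligned version (4DHuman-SKEL) for anatomically consistent supervision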
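
The coarse-to-fine idea in the points above can be sketched as a control-flow skeleton. This is an illustration only: the residual update (`estimate + delta`), the function names, and the toy stand-ins for the learned encoder head and decoder layers are all assumptions, not the published architecture.

```python
from typing import Callable, List, Sequence

Vector = List[float]

def coarse_to_fine(
    features: Vector,
    coarse_head: Callable[[Vector], Vector],
    refine_layers: Sequence[Callable[[Vector, Vector], Vector]],
) -> List[Vector]:
    """Return the coarse estimate followed by the estimate after each layer."""
    estimate = coarse_head(features)          # coarse prediction
    history = [estimate]
    for layer in refine_layers:
        delta = layer(features, estimate)     # layer sees features and current estimate
        estimate = [e + d for e, d in zip(estimate, delta)]  # residual update (assumption)
        history.append(estimate)
    return history

# Toy stand-ins for learned modules: the "features" are simply the true
# parameters, the coarse head guesses zeros, and each refinement layer
# closes half of the remaining gap.
target = [0.3, -1.2, 0.8]

def toy_coarse_head(features: Vector) -> Vector:
    return [0.0 for _ in features]

def toy_layer(features: Vector, estimate: Vector) -> Vector:
    return [0.5 * (f - e) for f, e in zip(features, estimate)]

history = coarse_to_fine(target, toy_coarse_head, [toy_layer] * 4)
errors = [max(abs(t - e) for t, e in zip(target, est)) for est in history]
```

In this toy setup the worst-case error halves after every layer. A real model would replace the stand-ins with trained networks, and keeping every intermediate estimate in `history` is one way to allow supervision at each refinement stage.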

🔎 Similar Papers

Recognizing Identities From Human Skeletons: A Survey on 3D Skeleton Based Person Re-Identification