🤖 AI Summary
Existing 3D human pose estimation methods supervise each joint independently and therefore fail to model local and global structural dependencies among joints. To address this limitation, we propose SEAL-pose, a novel framework that introduces, for the first time, a learnable loss network (loss-net) to replace hand-crafted structural constraints. Trained end-to-end, the loss-net captures complex joint-graph dependencies directly from data. Our approach integrates a joint-graph-based loss network architecture, structural consistency learning, and differentiable training. Evaluated on three standard benchmarks with eight diverse backbone networks, SEAL-pose consistently reduces per-joint error, outperforms methods that rely on explicit structural constraints, and remains robust in cross-dataset and in-the-wild scenarios.
📝 Abstract
3D human pose estimation (HPE) is characterized by intricate local and global dependencies among joints. Conventional supervised losses are limited in capturing these correlations because they treat each joint independently. Previous studies have attempted to promote structural consistency through hand-crafted priors or rule-based constraints; however, these typically must be specified manually and are often non-differentiable, which limits their use as end-to-end training objectives. We propose SEAL-pose, a data-driven framework in which a learnable loss-net trains a pose-net by evaluating structural plausibility. Rather than relying on hand-crafted priors, our joint-graph-based design enables the loss-net to learn complex structural dependencies directly from data. Extensive experiments on three 3D HPE benchmarks with eight backbones show that SEAL-pose reduces per-joint errors and improves pose plausibility compared with the corresponding backbones across all settings. Beyond improving each backbone, SEAL-pose also outperforms models with explicit structural constraints, despite not enforcing any such constraints. Finally, we analyze the relationship between the loss-net and structural consistency, and evaluate SEAL-pose in cross-dataset and in-the-wild settings.
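To make the limitation of per-joint supervision concrete, the following minimal sketch compares two hypothetical predictions for a toy three-joint leg (hip, knee, ankle). It is an illustration only and not part of SEAL-pose: the joint coordinates, the `BONES` list, and the helper names `mpjpe` and `bone_length_error` are assumptions chosen for this example. Both predictions have the same mean per-joint position error, yet only one of them preserves the bone lengths, which is the kind of structural difference a loss that treats joints independently cannot see.

```python
import math

# Toy skeleton (assumed for illustration): joint indices 0=hip, 1=knee, 2=ankle.
BONES = [(0, 1), (1, 2)]  # hip-knee and knee-ankle

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance per joint."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def bone_lengths(pose, bones=BONES):
    """Length of each bone, given a pose as a list of (x, y, z) joints."""
    return [math.dist(pose[i], pose[j]) for i, j in bones]

def bone_length_error(pred, gt, bones=BONES):
    """Mean absolute difference between predicted and true bone lengths."""
    pred_len = bone_lengths(pred, bones)
    gt_len = bone_lengths(gt, bones)
    return sum(abs(a - b) for a, b in zip(pred_len, gt_len)) / len(bones)

# Ground-truth leg, in metres: a straight vertical chain with 0.4 m bones.
gt = [(0.0, 0.0, 0.0), (0.0, -0.4, 0.0), (0.0, -0.8, 0.0)]

# Prediction A: every joint shifted 5 cm sideways (rigid translation).
pred_a = [(0.05, 0.0, 0.0), (0.05, -0.4, 0.0), (0.05, -0.8, 0.0)]

# Prediction B: each joint is also 5 cm off, but in alternating directions
# along the leg, so the thigh is stretched and the shin is compressed.
pred_b = [(0.0, 0.05, 0.0), (0.0, -0.45, 0.0), (0.0, -0.75, 0.0)]

if __name__ == "__main__":
    for name, pred in (("A", pred_a), ("B", pred_b)):
        print(name, "MPJPE:", round(mpjpe(pred, gt), 4),
              "bone-length error:", round(bone_length_error(pred, gt), 4))
```

Here both predictions score an MPJPE of 0.05 m, while the bone-length error is 0 for prediction A and about 0.1 m for prediction B. A fixed bone-length term is one example of a hand-crafted structural measure; the abstract's proposal is instead to let a loss-net learn such dependencies from data.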