🤖 AI Summary
Monocular 3D pose estimation yields camera-centered skeletal representations whose strong viewpoint dependency hinders cross-view kinematic analysis in health and sports science. To address this, we propose 3DPCNet, a model-agnostic pose normalization module that fuses graph convolution (encoding local bone topology) with Transformer-based global context modeling through gated cross-attention, and predicts a continuous 6D rotation that maps each input to an SO(3)-aligned, body-centered canonical pose. Training is self-supervised, requiring only synthetic rotational augmentations and a composite loss rather than ground-truth 3D annotations. On MM-Fi, the method reduces mean rotation error from over 20° to 3.4° and mean per-joint position error from 64 mm to 47 mm. On TotalCapture, normalized poses yield acceleration signals highly consistent with ground-truth IMU measurements, markedly improving the physical plausibility and cross-view comparability of kinematic analysis.
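For concreteness, the continuous 6D rotation output mentioned above is conventionally decoded into a valid $SO(3)$ matrix by Gram-Schmidt orthonormalization of two predicted 3-vectors. The PyTorch sketch below shows that standard mapping; since the summary gives no implementation details, assuming this common construction (and the 17-joint pose shape in the usage example) is ours, not the authors'.

```python
import torch
import torch.nn.functional as F

def rotation_6d_to_matrix(d6: torch.Tensor) -> torch.Tensor:
    # d6: (..., 6) raw network output; the two 3-vectors are orthonormalized
    # into the first two rows of the rotation matrix via Gram-Schmidt.
    a1, a2 = d6[..., :3], d6[..., 3:]
    b1 = F.normalize(a1, dim=-1)
    b2 = F.normalize(a2 - (b1 * a2).sum(-1, keepdim=True) * b1, dim=-1)
    b3 = torch.cross(b1, b2, dim=-1)  # completes a right-handed orthonormal basis
    return torch.stack((b1, b2, b3), dim=-2)  # (..., 3, 3), det = +1

# Usage: canonicalize a batch of hypothetical 17-joint poses.
rot6d = torch.randn(8, 6)                       # stand-in for network output
R = rotation_6d_to_matrix(rot6d)                # (8, 3, 3)
poses = torch.randn(8, 17, 3)                   # camera-centered joints
canonical = torch.einsum('bij,bkj->bki', R, poses)
```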
📝 Abstract
Monocular 3D pose estimators produce camera-centered skeletons, creating view-dependent kinematic signals that complicate comparative analysis in applications such as health and sports science. We present 3DPCNet, a compact, estimator-agnostic module that operates directly on 3D joint coordinates to rectify any input pose into a consistent, body-centered canonical frame. Its hybrid encoder fuses local skeletal features from a graph convolutional network with global context from a transformer via a gated cross-attention mechanism. From this representation, the model predicts a continuous 6D rotation that is mapped to an $SO(3)$ matrix to align the pose. We train the model in a self-supervised manner on the MM-Fi dataset using synthetically rotated poses, guided by a composite loss that enforces both accurate rotation prediction and faithful pose reconstruction. On the MM-Fi benchmark, 3DPCNet reduces the mean rotation error from over 20$^{\circ}$ to 3.4$^{\circ}$ and the Mean Per Joint Position Error from approximately 64 mm to 47 mm compared to a geometric baseline. Qualitative evaluations on the TotalCapture dataset further demonstrate that our method produces acceleration signals from video that show strong visual correspondence to ground-truth IMU sensor data, confirming that the module removes viewpoint variability and enables physically plausible motion analysis.
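The hybrid encoder is described only at a high level, so the following PyTorch sketch illustrates one plausible shape of the gated cross-attention fusion: per-joint GCN features query the transformer's global context, and a learned gate controls how much of that context is mixed back in. The class name, feature dimension, head count, and gated-residual form are all illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class GatedCrossAttentionFusion(nn.Module):
    # Hypothetical fusion block: local (GCN) features attend over global
    # (transformer) features; a sigmoid gate modulates the attended context.
    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, gcn_feats: torch.Tensor, tr_feats: torch.Tensor) -> torch.Tensor:
        # gcn_feats, tr_feats: (batch, joints, dim) per-joint feature sequences.
        ctx, _ = self.attn(query=gcn_feats, key=tr_feats, value=tr_feats)
        g = self.gate(torch.cat([gcn_feats, ctx], dim=-1))  # per-joint, per-channel gate
        return self.norm(gcn_feats + g * ctx)               # gated residual fusion
```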
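Likewise, the self-supervised recipe (rotate a canonical pose by a known random rotation, predict that rotation, and reconstruct the input) can be sketched as a single training-step loss. The geodesic rotation term, the joint-wise reconstruction term, and their equal weighting are assumptions about the composite loss; `rotation_6d_to_matrix` is the helper from the earlier sketch, and `model` is any hypothetical network mapping a (batch, joints, 3) pose to a 6D rotation.

```python
import torch

def random_rotation(batch: int) -> torch.Tensor:
    # Random proper rotations via QR decomposition of Gaussian matrices.
    q, _ = torch.linalg.qr(torch.randn(batch, 3, 3))
    q[:, :, 0] *= torch.linalg.det(q).sign().unsqueeze(-1)  # force det = +1
    return q

def composite_loss(model, canonical_pose: torch.Tensor) -> torch.Tensor:
    # canonical_pose: (batch, joints, 3) body-centered poses.
    R_gt = random_rotation(canonical_pose.shape[0])
    rotated = torch.einsum('bij,bkj->bki', R_gt, canonical_pose)  # synthetic view
    R_pred = rotation_6d_to_matrix(model(rotated))  # helper from the sketch above
    # Geodesic rotation error: angle of R_pred^T @ R_gt.
    rel = torch.einsum('bji,bjk->bik', R_pred, R_gt)
    tr = rel.diagonal(dim1=-2, dim2=-1).sum(-1)
    rot_loss = torch.acos(((tr - 1) / 2).clamp(-1 + 1e-6, 1 - 1e-6)).mean()
    # Reconstruction: un-rotating with the prediction should recover the input.
    recon = torch.einsum('bji,bkj->bki', R_pred, rotated)  # applies R_pred^T
    pose_loss = (recon - canonical_pose).norm(dim=-1).mean()
    return rot_loss + pose_loss  # equal weighting is an assumption
```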