SteerPose: Simultaneous Extrinsic Camera Calibration and Matching from Articulation

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses extrinsic calibration of uncalibrated multi-camera systems that lack artificial markers. It uses freely moving humans or animals as natural calibration targets and jointly estimates the camera extrinsics and the cross-view keypoint correspondences. Methodologically, it combines a differentiable pose rotation module and a differentiable matching mechanism in an end-to-end neural framework, together with an explicit geometric consistency loss that requires the estimated rotations and correspondences to yield a valid translation. The architecture is class-agnostic and accepts both human and animal 2D poses. The key contribution is treating the articulated body pose itself as the geometric calibration signal, which unifies extrinsic parameter estimation and correspondence search in a single model. Experiments on several in-the-wild human and animal datasets support the method's effectiveness and robustness, and the method reconstructs the 3D poses of novel animals in multi-camera setups from off-the-shelf 2D pose estimates, without prior extrinsic calibration.
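
The summary describes the matching step only as differentiable. A common way to make a one-to-one assignment differentiable is Sinkhorn normalization, which turns a score matrix into a soft, nearly doubly stochastic assignment. The sketch below assumes that choice, and further assumes person-level matching scored by mean squared joint distance; `sinkhorn`, `log_normalize`, the temperature `tau`, and the synthetic poses are hypothetical illustrations, not SteerPose's actual matching layer.

```python
import numpy as np


def log_normalize(log_p: np.ndarray, axis: int) -> np.ndarray:
    """Subtract a numerically stable log-sum-exp along `axis` (normalize in log space)."""
    m = np.max(log_p, axis=axis, keepdims=True)
    lse = m + np.log(np.sum(np.exp(log_p - m), axis=axis, keepdims=True))
    return log_p - lse


def sinkhorn(scores: np.ndarray, tau: float = 0.1, n_iters: int = 50) -> np.ndarray:
    """Soft assignment from an (N, M) score matrix (hypothetical helper).

    Alternately normalizing rows and columns drives exp(log_p) toward a
    doubly stochastic matrix; a small temperature `tau` makes it close to a
    permutation while every operation stays smooth.
    """
    log_p = scores / tau
    for _ in range(n_iters):
        log_p = log_normalize(log_p, axis=1)  # rows sum to 1
        log_p = log_normalize(log_p, axis=0)  # columns sum to 1
    return np.exp(log_p)


# Hypothetical example: 3 people with 17 joints each. `pred` stands in for
# view-A poses already rotated into view B; `obs` holds the view-B poses in
# shuffled order with a little noise.
rng = np.random.default_rng(0)
pred = rng.normal(size=(3, 17, 2))
perm = np.array([2, 0, 1])
obs = pred[perm] + 0.01 * rng.normal(size=pred.shape)

# Score = negative mean squared joint distance for every (pred, obs) pair.
scores = -((pred[:, None] - obs[None]) ** 2).mean(axis=(2, 3))
P = sinkhorn(scores)
match = P.argmax(axis=1)  # most likely observed pose for each predicted pose
print(match)  # expected: [1 2 0]
```

Lowering `tau` sharpens the assignment toward a hard permutation at the cost of less informative gradients; a hard match can always be read off with `argmax`.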

📝 Abstract
Can freely moving humans or animals themselves serve as calibration targets for multi-camera systems while simultaneously estimating their correspondences across views? We humans can solve this problem by mentally rotating the observed 2D poses and aligning them with those in the target views. Inspired by this cognitive ability, we propose SteerPose, a neural network that performs this rotation of 2D poses into another view. By integrating differentiable matching, SteerPose simultaneously performs extrinsic camera calibration and correspondence search within a single unified framework. We also introduce a novel geometric consistency loss that explicitly ensures that the estimated rotation and correspondences result in a valid translation estimation. Experimental results on diverse in-the-wild datasets of humans and animals validate the effectiveness and robustness of the proposed method. Furthermore, we demonstrate that our method can reconstruct the 3D poses of novel animals in multi-camera setups by leveraging off-the-shelf 2D pose estimators and our class-agnostic model.
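
The abstract's consistency requirement rests on a classical fact of epipolar geometry: once the relative rotation R is fixed, the epipolar constraint x2^T [t]_x R x1 = 0 is linear in the translation t, because each match contributes one row (R x1 × x2) · t = 0. Stacking the rows and taking the smallest singular vector gives t up to scale, and the smallest singular value is near zero only when a single translation satisfies every match. The sketch below illustrates that property under stated assumptions (normalized, intrinsics-free coordinates; synthetic data; hypothetical names `rot_y` and `translation_from_rotation`); it is not SteerPose's loss.

```python
import numpy as np


def rot_y(theta: float) -> np.ndarray:
    """Rotation matrix about the camera y axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def translation_from_rotation(R: np.ndarray, x1: np.ndarray, x2: np.ndarray):
    """Translation direction given a rotation and matches (hypothetical helper).

    x1, x2: (N, 3) matched points in normalized homogeneous camera coordinates.
    Returns the unit-norm t (sign is ambiguous) and the smallest singular
    value of the constraint matrix, a measure of how consistent R and the
    matches are with *some* translation.
    """
    A = np.cross(x1 @ R.T, x2)  # row i: (R x1_i) x x2_i
    _, s, vt = np.linalg.svd(A)
    return vt[-1], s[-1]  # rows of vt are unit vectors


# Synthetic scene: 20 points seen by two cameras related by (R_true, t_true).
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(-1, 1, 20), rng.uniform(-1, 1, 20), rng.uniform(4, 8, 20)])
R_true, t_true = rot_y(0.3), np.array([1.0, 0.0, 0.2])
X2 = X @ R_true.T + t_true  # the same points in camera-2 coordinates
x1 = X / X[:, 2:3]
x2 = X2 / X2[:, 2:3]

t_est, res_true = translation_from_rotation(R_true, x1, x2)
_, res_wrong = translation_from_rotation(rot_y(0.5), x1, x2)  # deliberately wrong R
cos_sim = abs(t_est @ t_true) / np.linalg.norm(t_true)  # 1.0 means same direction
```

With the correct rotation the residual is at machine precision and t is recovered up to sign and scale; with a wrong rotation no translation fits all matches and the residual grows. A residual of this kind is one natural quantity a consistency loss could penalize, though the exact form used in the paper is not given here.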

Problem

Simultaneous extrinsic camera calibration and correspondence matching
Using moving humans/animals as calibration targets
3D pose reconstruction from 2D views
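
Once the cameras are calibrated, a 3D pose can be recovered by triangulating each joint across views. The Direct Linear Transform (DLT) is a standard linear method for this; the page does not say which triangulation SteerPose uses. The sketch below assumes known intrinsics K and uses made-up cameras and a random 17-joint skeleton; `triangulate_dlt`, `triangulate_pose`, and `project` are hypothetical names.

```python
import numpy as np


def rot_y(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one joint from V >= 2 calibrated views.

    projections: list of (3, 4) camera matrices K [R | t]
    points_2d:   (V, 2) pixel coordinates of the same joint in each view
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])  # from u = (P[0] . X) / (P[2] . X)
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X_h = vt[-1]  # null vector = homogeneous 3D point
    return X_h[:3] / X_h[3]


def triangulate_pose(projections, poses_2d):
    """poses_2d: (V, J, 2) -> (J, 3), triangulating each joint independently."""
    return np.stack([triangulate_dlt(projections, poses_2d[:, j]) for j in range(poses_2d.shape[1])])


def project(P, X):
    """Project (J, 3) world points with a (3, 4) camera matrix to (J, 2) pixels."""
    X_h = np.hstack([X, np.ones((len(X), 1))])
    x = X_h @ P.T
    return x[:, :2] / x[:, 2:3]


# Synthetic check with hypothetical intrinsics K and two views.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([rot_y(0.4), np.array([[-2.0], [0.0], [0.5]])])

rng = np.random.default_rng(2)
joints = 0.3 * rng.normal(size=(17, 3)) + np.array([0.0, 0.0, 5.0])
poses_2d = np.stack([project(P1, joints), project(P2, joints)])  # (2, 17, 2)
recon = triangulate_pose([P1, P2], poses_2d)
print(np.abs(recon - joints).max())  # tiny, since the 2D input is noise-free
```

Because a translation recovered from 2D correspondences alone is only known up to scale, a reconstruction built this way is likewise determined only up to a global scale unless some metric reference is added.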

Innovation

Neural network rotates 2D poses across views
Differentiable matching integrates calibration and correspondence
Geometric consistency loss ensures valid translation
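
The idea of rotating a 2D pose into another view can be made concrete geometrically: the 2D pose in one view and the 2D pose after a relative rotation R are two projections of the same 3D body. Depth is lost in projection, so a network mapping (source pose, R) to the target pose presumably has to rely on learned priors about body shape. The sketch below generates such (source, rotation, target) triples from a 3D skeleton under assumed unit-focal pinhole cameras at a fixed depth; the helpers `random_rotation`, `project`, and `make_rotation_pair`, and the idea of producing training pairs this way, are hypothetical and not described on this page.

```python
import numpy as np


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Random 3x3 rotation from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # scale columns so the factorization is unique
    if np.linalg.det(q) < 0:  # flip one axis to get a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q


def project(joints_cam: np.ndarray) -> np.ndarray:
    """Pinhole projection with unit focal length: (J, 3) -> (J, 2)."""
    return joints_cam[:, :2] / joints_cam[:, 2:3]


def make_rotation_pair(joints_3d, R, depth=5.0):
    """Return (source 2D pose, target 2D pose) for a root-centered (J, 3) skeleton.

    Both virtual cameras look at the root from `depth` along +z; the target
    view sees the skeleton rotated by R about its root.
    """
    offset = np.array([0.0, 0.0, depth])
    src = project(joints_3d + offset)
    tgt = project(joints_3d @ R.T + offset)
    return src, tgt


rng = np.random.default_rng(3)
skeleton = 0.3 * rng.normal(size=(17, 3))
skeleton -= skeleton.mean(axis=0)  # root-center, using the mean joint as root
R = random_rotation(rng)
src, tgt = make_rotation_pair(skeleton, R)  # one (source, R, target) triple
```

A rotation about the optical axis is the one case where the target follows from the source alone (it is a plain 2D rotation of the keypoints); any out-of-plane component of R requires the missing depth, which is what makes the learned rotation nontrivial.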