🤖 AI Summary
Existing multi-view 3D human pose estimation methods rely heavily on accurate camera calibration, which limits their applicability in real-world uncalibrated settings. This work proposes what it describes as the first calibration-free framework that achieves both geometric consistency and temporal coherence in 3D pose reconstruction. The approach integrates multi-view algebraic geometric constraints into the learning pipeline via a Gröbner basis corrector, and combines a Transformer-based triangulation regression module with an equivariant temporal refinement module to resolve scale ambiguity and geometric inconsistency. Evaluated on standard benchmarks, the method outperforms existing uncalibrated approaches, establishing a new state of the art and substantially narrowing the performance gap with fully calibrated methods.
📝 Abstract
Recovering 3D human pose from multi-view imagery typically relies on precise camera calibration, which is often unavailable in real-world scenarios and therefore severely limits the applicability of existing methods. To overcome this challenge, we propose an unconstrained framework that combines deep neural networks, algebraic priors, and temporal dynamics for uncalibrated multi-view human pose estimation. First, we introduce the Triangulation with Transformer Regressor (TTR), which reformulates classical triangulation as a data-driven token fusion process, removing the dependency on explicit camera parameters. Second, to explicitly embed the algebraic relations of the multi-view variety into the learning process, we propose the Gröbner basis Corrector (GC). This loss formulation enforces constraints derived from the multi-view variety so that the neural predictions adhere to the laws of projective geometry. Finally, we devise the Temporal Equivariant Rectifier (TER), which exploits the equivariance property of human motion to impose temporal coherence and structural consistency, mitigating scale ambiguity in uncalibrated settings. Extensive evaluations on standard benchmarks demonstrate that our framework establishes a new state of the art for uncalibrated multi-view human pose estimation. Notably, our approach substantially narrows the performance gap between calibration-free methods and fully calibrated oracles.
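For context, the "classical triangulation" that TTR is said to replace can be sketched as the standard linear (DLT) method, which needs a known 3x4 projection matrix for every view. The snippet below is an illustrative baseline only, not the paper's method: the function names, the two camera matrices, and the test point are hypothetical values chosen for the example.

```python
import numpy as np

def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one joint from N calibrated views.

    projections: sequence of 3x4 camera projection matrices (assumed known).
    points_2d:   sequence of (u, v) pixel observations, one per view.
    Returns the 3D point in non-homogeneous coordinates.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        P = np.asarray(P, dtype=float)
        # Each view contributes two linear equations in the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 camera matrix to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical two-camera rig: shared intrinsics, second camera shifted along x.
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 500.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 4.0])          # hypothetical joint position
obs = [project(P1, X_true), project(P2, X_true)]
X_hat = triangulate_dlt([P1, P2], obs)       # recovers X_true for noise-free input
```

Every row of the linear system depends on a projection matrix, which is why this step is unavailable when the cameras are uncalibrated and why a learned regressor is proposed in its place.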