AHAP: Reconstructing Arbitrary Humans from Arbitrary Perspectives with Geometric Priors

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of multi-view 3D human reconstruction methods: they typically require precise camera calibration, which hinders their use in real-world scenarios. The paper proposes a calibration-free, feed-forward framework that reconstructs SMPL parameters and world-coordinate positions of arbitrary humans from uncalibrated multi-view images. Its key components are learnable identity queries with soft assignment for cross-view person association, and multi-view geometric triangulation to resolve depth ambiguity. The Cross-View Identity Association module is supervised with contrastive learning, and SMPL prediction is guided by a cross-view reprojection loss. Evaluated on the EgoHumans and EgoExo4D datasets, the method reports competitive performance in both world-space 3D reconstruction and camera pose estimation, while running 180× faster than optimization-based approaches.
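To make the idea of soft assignment to identity queries concrete, here is a minimal sketch. It is not the paper's implementation: the function names, the cosine-similarity scoring, the temperature value, and the toy 2-D features are all assumptions chosen for illustration. Each per-view detection feature is compared against every identity query, and a softmax turns the similarities into assignment probabilities.

```python
import math

def l2_normalize(v):
    # Guard against a zero vector so the division is always defined.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def soft_assign(features, queries, temperature=0.1):
    """Return P where P[i][k] is the probability that detection i
    belongs to identity query k (each row sums to 1).

    Hypothetical helper: cosine similarity plus softmax is an assumed
    scoring rule, not one confirmed by the source.
    """
    feats = [l2_normalize(f) for f in features]
    qs = [l2_normalize(q) for q in queries]
    probs = []
    for f in feats:
        logits = [sum(a * b for a, b in zip(f, q)) / temperature for q in qs]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        probs.append([e / z for e in exps])
    return probs

# Toy data (made up): two identity queries, three detections pooled across views.
queries = [[1.0, 0.0], [0.0, 1.0]]
detections = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.05]]

assignment = soft_assign(detections, queries)
# Hard identity per detection, taken as the most probable query.
hard_ids = [max(range(len(row)), key=row.__getitem__) for row in assignment]
```

In a trained model the queries would be learned parameters and the features would come from the network; a lower temperature makes the assignment closer to a hard one-hot match.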

📝 Abstract
Reconstructing 3D humans from images captured from multiple perspectives typically requires pre-calibration, for example with checkerboards or MVS algorithms, which limits scalability and applicability in diverse real-world scenarios. In this work, we present AHAP (Reconstructing Arbitrary Humans from Arbitrary Perspectives), a feed-forward framework for reconstructing arbitrary humans from arbitrary camera perspectives without requiring camera calibration. The core of our approach is the effective fusion of multi-view geometry to assist human association, reconstruction, and localization. Specifically, a Cross-View Identity Association module uses learnable person queries and soft assignment, supervised by contrastive learning, to resolve cross-view human identity association. A Human Head fuses cross-view features and scene context for SMPL prediction, guided by cross-view reprojection losses that enforce body pose consistency. Additionally, multi-view geometry eliminates the depth ambiguity inherent in monocular methods, providing more precise 3D human localization through multi-view triangulation. Experiments on EgoHumans and EgoExo4D demonstrate that AHAP achieves competitive performance on both world-space human reconstruction and camera pose estimation, while being 180× faster than optimization-based approaches.
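The abstract does not spell out how the triangulation is formulated, so the following is only a generic sketch of why a second view removes monocular depth ambiguity. It assumes that each view supplies a camera center and a viewing ray toward the same body point in a shared world frame (in a calibration-free setting these would have to come from predicted cameras). The helper names and the toy numbers are hypothetical. The point is recovered as the least-squares location closest to all rays.

```python
import math

def det3(m):
    # Determinant of a 3x3 matrix by cofactor expansion along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(A, b):
    # Solve the 3x3 system A x = b with Cramer's rule.
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("rays are (nearly) parallel; the point is not determined")
    x = []
    for col in range(3):
        M = [[b[r] if s == col else A[r][s] for s in range(3)] for r in range(3)]
        x.append(det3(M) / d)
    return x

def triangulate_rays(centers, directions):
    """Least-squares 3D point closest to all rays (center c_i, direction d_i).

    Solves sum_i (I - d_i d_i^T) X = sum_i (I - d_i d_i^T) c_i.
    Assumed generic formulation, not taken from the source.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(centers, directions):
        n = math.sqrt(sum(x * x for x in d))
        d = [x / n for x in d]  # unit direction
        for r in range(3):
            for s in range(3):
                m = (1.0 if r == s else 0.0) - d[r] * d[s]
                A[r][s] += m
                b[r] += m * c[s]
    return solve3(A, b)

# Toy setup (made up): two cameras observing the point (1, 2, 5).
centers = [[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
directions = [[1.0, 2.0, 5.0], [-3.0, 2.0, 5.0]]  # target minus center

point = triangulate_rays(centers, directions)
```

With a single ray the system is singular, which is the depth ambiguity of a monocular view; a second non-parallel ray makes the solution unique.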
Problem

Research questions and friction points this paper is trying to address.

3D human reconstruction
multi-view images
camera calibration
real-world scenarios
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-view geometry
camera calibration-free
cross-view association
3D human reconstruction
feed-forward framework
Xiaozhen Qiao
Institute of Artificial Intelligence (TeleAI), China Telecom, P. R. China; University of Science and Technology of China
Wenjia Wang
University of Hong Kong
Computer Vision, Computer Graphics
Zhiyuan Zhao
Institute of Artificial Intelligence (TeleAI), China Telecom, P. R. China
Jiacheng Sun
Huawei Technologies Co., Ltd.
Ping Luo
National University of Defense Technology
Distributed Computing
Hongyuan Zhang
The University of Hong Kong
Representation Learning, Multimodal Learning, Graph Neural Networks, Optimization
Xuelong Li
Institute of Artificial Intelligence (TeleAI), China Telecom, P. R. China