🤖 AI Summary
Existing multi-view human mesh reconstruction methods rely heavily on cumbersome camera calibration or multi-view training data, limiting their generalization capability. This work proposes a training-free, test-time optimization framework that, for the first time, leverages pretrained single-view human mesh recovery models (e.g., HMR) as strong priors, integrating multi-view consistency and anatomical constraints to achieve high-fidelity, calibration-free reconstruction under arbitrary camera configurations. By eliminating the need for multi-view supervised training, the method generalizes far more readily and matches or exceeds state-of-the-art approaches that explicitly require multi-view training data, as demonstrated on standard benchmarks.
📝 Abstract
Multi-view human mesh recovery (HMR) is widely deployed in diverse domains where high accuracy and strong generalization are essential. Existing approaches fall broadly into geometry-based and learning-based methods. However, geometry-based methods (e.g., triangulation) rely on cumbersome camera calibration, while learning-based approaches often generalize poorly to unseen camera configurations due to the scarcity of multi-view training data, limiting both classes in real-world scenarios. To enable calibration-free reconstruction that generalizes to arbitrary camera setups, we propose a training-free framework that leverages pretrained single-view HMR models as strong priors, eliminating the need for multi-view training data. Our method first constructs a robust, consistent multi-view initialization from the single-view predictions, then refines it via test-time optimization guided by multi-view consistency and anatomical constraints. Extensive experiments demonstrate state-of-the-art performance on standard benchmarks, surpassing multi-view models trained with explicit multi-view supervision.
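To make the test-time optimization idea concrete, here is a minimal toy sketch of the two-term objective the abstract describes: a shared pose is initialized from per-view single-view predictions and then refined to agree across views while respecting anatomical limits. All names, the simplified quadratic objective, and the scalar joint-limit penalty are illustrative assumptions, not the paper's actual formulation or code.

```python
import numpy as np

def refine_pose(view_estimates, joint_limit=2.0, lam=0.1, lr=0.1, steps=200):
    """Toy test-time refinement (illustrative, not the paper's method).

    Minimizes  sum_v ||theta - theta_v||^2            (multi-view consistency)
             + lam * sum(max(|theta| - joint_limit, 0)^2)  (anatomical prior)
    by plain gradient descent over a shared pose vector theta.
    """
    # Initialize from the single-view predictions (here: their mean).
    theta = np.mean(view_estimates, axis=0)
    for _ in range(steps):
        # Consistency term: pull theta toward every view's estimate.
        g = sum(2.0 * (theta - tv) for tv in view_estimates)
        # Anatomical term: penalize joint angles beyond the limit.
        excess = np.maximum(np.abs(theta) - joint_limit, 0.0)
        g += lam * 2.0 * excess * np.sign(theta)
        theta -= lr * g
    return theta

# Three hypothetical per-view pose estimates (e.g., a few SMPL joint angles);
# the second angle exceeds the assumed anatomical limit of 2.0.
views = [np.array([0.5, 2.6, -0.1]),
         np.array([0.7, 2.4,  0.1]),
         np.array([0.6, 2.8,  0.0])]
refined = refine_pose(views)
```

The refined pose settles near the cross-view consensus, with the out-of-range angle pulled slightly back toward the anatomical limit; in the actual framework the consistency term would compare reprojected meshes across views rather than raw parameter vectors.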