🤖 AI Summary
Existing multi-view human mesh reconstruction methods rely heavily on cumbersome camera calibration or multi-view training data, limiting their generalization capability. This work proposes a training-free, test-time optimization framework that, for the first time, leverages pretrained single-view human mesh recovery models (e.g., HMR) as strong priors, integrating multi-view consistency and anatomical constraints to achieve high-fidelity, calibration-free reconstruction under arbitrary camera configurations. By eliminating the need for multi-view supervised training, the method generalizes far more readily and matches or exceeds state-of-the-art approaches that explicitly require multi-view training data, as demonstrated on standard benchmarks.
📝 Abstract
Multi-view human mesh recovery (HMR) is widely deployed in diverse domains where high accuracy and strong generalization are essential. Existing approaches fall broadly into geometry-based and learning-based methods. However, geometry-based methods (e.g., triangulation) rely on cumbersome camera calibration, while learning-based approaches often generalize poorly to unseen camera configurations due to the scarcity of multi-view training data, limiting both classes in real-world scenarios. To enable calibration-free reconstruction that generalizes to arbitrary camera setups, we propose a training-free framework that leverages pretrained single-view HMR models as strong priors, eliminating the need for multi-view training data. Our method first constructs a robust, consistent multi-view initialization from the single-view predictions, then refines it via test-time optimization guided by multi-view consistency and anatomical constraints. Extensive experiments demonstrate state-of-the-art performance on standard benchmarks, surpassing multi-view models trained with explicit multi-view supervision.
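To make the test-time optimization idea concrete, here is a minimal toy sketch of the two-term objective the abstract describes: a shared pose is initialized from per-view single-view predictions and then refined to agree across views while respecting anatomical limits. All names, the simplified quadratic objective, and the scalar joint-limit penalty are illustrative assumptions, not the paper's actual formulation or code.

```python
import numpy as np

def refine_pose(view_estimates, joint_limit=2.0, lam=0.1, lr=0.1, steps=200):
    """Toy test-time refinement (illustrative, not the paper's method).

    Minimizes  sum_v ||theta - theta_v||^2            (multi-view consistency)
             + lam * sum(max(|theta| - joint_limit, 0)^2)  (anatomical prior)
    by plain gradient descent over a shared pose vector theta.
    """
    # Initialize from the single-view predictions (here: their mean).
    theta = np.mean(view_estimates, axis=0)
    for _ in range(steps):
        # Consistency term: pull theta toward every view's estimate.
        g = sum(2.0 * (theta - tv) for tv in view_estimates)
        # Anatomical term: penalize joint angles beyond the limit.
        excess = np.maximum(np.abs(theta) - joint_limit, 0.0)
        g += lam * 2.0 * excess * np.sign(theta)
        theta -= lr * g
    return theta

# Three hypothetical per-view pose estimates (e.g., a few SMPL joint angles);
# the second angle exceeds the assumed anatomical limit of 2.0.
views = [np.array([0.5, 2.6, -0.1]),
         np.array([0.7, 2.4,  0.1]),
         np.array([0.6, 2.8,  0.0])]
refined = refine_pose(views)
```

The refined pose settles near the cross-view consensus, with the out-of-range angle pulled slightly back toward the anatomical limit; in the actual framework the consistency term would compare reprojected meshes across views rather than raw parameter vectors.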