From Camera to World: A Plug-and-Play Module for Human Mesh Transformation

📅 2025-12-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
3D human mesh reconstruction from in-the-wild images suffers from inaccurate orientation estimation in the world coordinate system, primarily due to the absence of ground-truth camera rotation—especially pitch angle—leading to substantial errors under the common zero-rotation assumption. To address this, we propose a human-centered strategy that estimates camera pitch solely from RGB images and synthetic depth maps. We further introduce a plug-and-play Mesh-Plug module that jointly optimizes root joint orientation and full-body pose. Additionally, we design a camera rotation prediction network grounded in human spatial configuration. Our method achieves significant improvements over state-of-the-art approaches on the SPEC-SYN and SPEC-MTP benchmarks, enabling more accurate and robust world-coordinate human mesh reconstruction without requiring real camera calibration.

📝 Abstract
Reconstructing accurate 3D human meshes in the world coordinate system from in-the-wild images remains challenging due to the lack of camera rotation information. While existing methods achieve promising results in the camera coordinate system by assuming zero camera rotation, this simplification leads to significant errors when transforming the reconstructed mesh to the world coordinate system. To address this challenge, we propose Mesh-Plug, a plug-and-play module that accurately transforms human meshes from camera coordinates to world coordinates. Our key innovation lies in a human-centered approach that leverages both RGB images and depth maps rendered from the initial mesh to estimate camera rotation parameters, eliminating the dependency on environmental cues. Specifically, we first train a camera rotation prediction module that focuses on the human body's spatial configuration to estimate camera pitch angle. Then, by integrating the predicted camera parameters with the initial mesh, we design a mesh adjustment module that simultaneously refines the root joint orientation and body pose. Extensive experiments demonstrate that our framework outperforms state-of-the-art methods on the benchmark datasets SPEC-SYN and SPEC-MTP.
Problem

Research questions and friction points this paper is trying to address.

Camera rotation, especially the pitch angle, is unknown for in-the-wild images, so mesh orientation in the world coordinate system is inaccurate
The common zero-rotation assumption introduces large errors when transforming reconstructed meshes from camera to world coordinates
Root joint orientation and body pose must be corrected jointly once camera rotation is estimated
Innovation

Methods, ideas, or system contributions that make the work stand out.

Plug-and-play module transforms camera to world coordinates
Uses human-centered RGB and depth maps to estimate camera rotation
Refines root joint orientation and body pose simultaneously
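The camera-to-world transformation at the core of the module can be illustrated with a minimal sketch. The function names below are illustrative, not from the paper; it assumes, as in the paper's setup, that roll and yaw are zero and only the predicted pitch must be undone.

```python
import numpy as np

def pitch_to_rotation(pitch_rad: float) -> np.ndarray:
    """Rotation about the camera x-axis by the predicted pitch angle."""
    c, s = np.cos(pitch_rad), np.sin(pitch_rad)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

def camera_to_world(vertices_cam: np.ndarray, pitch_rad: float) -> np.ndarray:
    """Map mesh vertices (N, 3) from camera to world coordinates by
    applying the inverse of the world-to-camera pitch rotation."""
    R = pitch_to_rotation(pitch_rad)   # world -> camera rotation
    # Right-multiplying row vectors by R applies R^T, i.e. the inverse.
    return vertices_cam @ R
```

In the full method this rotation is not applied in isolation: the mesh adjustment module also refines the SMPL root joint orientation and body pose so the transformed mesh stays consistent with the image evidence.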
Authors

Changhai Ma, University of Science and Technology of China
Ziyu Wu, USTC (smart textile applications; pressure sensing mattress; human pose and shape estimation)
Yunkang Zhang, University of Science and Technology of China
Qijun Ying, University of Science and Technology of China (Ubiquitous Compute; Human Mesh Reconstruction; Human Computer Interaction; E-health)
Boyan Liu, University of Science and Technology of China
Xiaohui Cai, University of Science and Technology of China