PersPose: 3D Human Pose Estimation with Perspective Encoding and Perspective Rotation

📅 2025-08-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing monocular 3D human pose estimation methods typically feed only the cropped image to the model and ignore the camera intrinsics of that crop, so the relative depths of joints cannot be estimated accurately. In addition, the further the human subject lies from the image center, the stronger the perspective distortion in the crop, which makes the model harder to fit. To address these issues, this work introduces Perspective Encoding (PE), which encodes the camera intrinsics of the cropped image, and Perspective Rotation (PR), a transformation applied to the original image that centers the human subject and thereby reduces perspective distortion. PE and PR are combined in a single framework, PersPose. The method achieves state-of-the-art performance on 3DPW, MPI-INF-3DHP, and Human3.6M; on the in-the-wild dataset 3DPW it attains an MPJPE of 60.1 mm, 7.54% lower than the previous best approach.

📝 Abstract
Monocular 3D human pose estimation (HPE) methods estimate the 3D positions of joints from individual images. Existing 3D HPE approaches often use the cropped image alone as input for their models. However, the relative depths of joints cannot be accurately estimated from cropped images without the corresponding camera intrinsics, which determine the perspective relationship between 3D objects and the cropped images. In this work, we introduce Perspective Encoding (PE) to encode the camera intrinsics of the cropped images. Moreover, since the human subject can appear anywhere within the original image, the perspective relationship between the 3D scene and the cropped image varies significantly, which complicates model fitting. Additionally, the further the human subject deviates from the image center, the greater the perspective distortions in the cropped image. To address these issues, we propose Perspective Rotation (PR), a transformation applied to the original image that centers the human subject, thereby reducing perspective distortions and alleviating the difficulty of model fitting. By incorporating PE and PR, we propose a novel 3D HPE framework, PersPose. Experimental results demonstrate that PersPose achieves state-of-the-art (SOTA) performance on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets. For example, on the in-the-wild dataset 3DPW, PersPose achieves an MPJPE of 60.1 mm, 7.54% lower than the previous SOTA approach. Code is available at: https://github.com/KenAdamsJoseph/PersPose.
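To make the point about cropped images concrete, the sketch below shows how the pinhole intrinsics of a crop follow from the intrinsics of the original image and the crop box. It is only an illustration of standard pinhole geometry, not the paper's Perspective Encoding itself; the function names `crop_intrinsics` and `project` and all numbers are hypothetical, and half-pixel offset conventions are ignored.

```python
def crop_intrinsics(fx, fy, cx, cy, box, out_size):
    """Intrinsics of a crop that is resized to out_size.

    box = (x0, y0, w, h) in original-image pixels, out_size = (out_w, out_h).
    Cropping shifts the principal point; resizing scales focal length and
    principal point by the same factor.
    """
    x0, y0, w, h = box
    out_w, out_h = out_size
    sx, sy = out_w / w, out_h / h
    return fx * sx, fy * sy, (cx - x0) * sx, (cy - y0) * sy


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a 3D point given in camera coordinates."""
    X, Y, Z = point
    return fx * X / Z + cx, fy * Y / Z + cy


# Hypothetical camera and an off-center person box.
full = (1000.0, 1000.0, 960.0, 540.0)   # fx, fy, cx, cy of the original image
box = (1200.0, 300.0, 400.0, 400.0)     # x0, y0, w, h of the crop
crop = crop_intrinsics(*full, box, (256, 256))

point = (1.2, 0.0, 4.0)                 # a 3D point in camera coordinates
u, v = project(point, *full)

# Route 1: project into the original image, then apply crop and resize.
via_crop_pixels = ((u - box[0]) * 256 / box[2], (v - box[1]) * 256 / box[3])
# Route 2: project directly with the crop's own intrinsics.
via_crop_intrinsics = project(point, *crop)
```

Both routes land on the same pixel, while the crop's principal point falls outside the crop for this off-center box. Two crops that look alike can therefore correspond to different perspective relationships, which is the information a model loses when it sees the crop alone.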
Problem

Research questions and friction points this paper is trying to address.

Estimating 3D joint positions from cropped monocular images
Addressing perspective distortions without camera intrinsics
Reducing model fitting difficulty from off-center human subjects
Innovation

Methods, ideas, or system contributions that make the work stand out.

Perspective Encoding for camera intrinsics integration
Perspective Rotation to center subject and reduce distortion
Novel framework combining both techniques for SOTA performance
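One common way to realize a subject-centering transformation like the one listed above is a pure camera rotation that turns the viewing ray through the subject onto the optical axis; pixels then move according to the homography K R K^-1. The sketch below follows that assumption and is not taken from the paper's code. The names `centering_rotation` and `rotate_pixel` and the camera values are hypothetical.

```python
import math


def centering_rotation(fx, fy, cx, cy, u, v):
    """Rotation (3x3 nested lists) that turns the viewing ray through
    pixel (u, v) onto the optical axis (0, 0, 1)."""
    x, y = (u - cx) / fx, (v - cy) / fy
    n = math.sqrt(x * x + y * y + 1.0)
    dx, dy, c = x / n, y / n, 1.0 / n    # unit ray d; c = cos(angle to axis)
    k = 1.0 / (1.0 + c)
    # Rodrigues form R = I + [a]x + [a]x^2 / (1 + c), with a = d x z = (dy, -dx, 0).
    return [
        [1.0 - k * dx * dx, -k * dx * dy, -dx],
        [-k * dx * dy, 1.0 - k * dy * dy, -dy],
        [dx, dy, c],
    ]


def rotate_pixel(fx, fy, cx, cy, R, u, v):
    """Map a pixel through the homography K R K^-1 induced by rotation R."""
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    rx, ry, rz = (sum(R[i][j] * ray[j] for j in range(3)) for i in range(3))
    return fx * rx / rz + cx, fy * ry / rz + cy


# Hypothetical camera and subject center (for example, the bounding-box center).
fx, fy, cx, cy = 1000.0, 1000.0, 960.0, 540.0
subject = (1400.0, 500.0)

R = centering_rotation(fx, fy, cx, cy, *subject)
centered = rotate_pixel(fx, fy, cx, cy, R, *subject)  # approximately (cx, cy)
```

Because the transformation is a rotation about the camera center, it changes no depth ordering and needs no scene geometry, only the intrinsics and the subject's pixel location. In practice the whole image would be warped with the same homography rather than a single pixel.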
Xiaoyang Hao
Tencent
speech synthesis
Han Li
Southern University of Science and Technology, China