Monocular Person Localization under Camera Ego-motion

📅 2025-03-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing monocular human 3D localization methods for mobile platforms suffer from severe degradation under aggressive ego-motion and often rely on restrictive static-camera assumptions or exhibit poor generalization. To address this, we propose a joint optimization framework that reformulates localization as pose estimation. Leveraging a minimal four-point human geometric model, our method simultaneously optimizes the camera’s 2D pose and the subject’s 3D position, integrating monocular geometric constraints while abandoning both conventional regression-based paradigms and the static-camera assumption. The approach enables real-time embedded deployment and achieves substantial accuracy improvements—reducing average localization error by 37.2%—on public benchmarks and real-world quadruped robot experiments. It further enables a robust autonomous following system. Our core contribution is the first end-to-end, nonlinear joint optimization of monocular human localization and camera motion estimation, establishing a novel paradigm for human–robot interaction (HRI) on dynamic platforms.

Technology Category

Application Category

📝 Abstract
Localizing a person from a moving monocular camera is critical for Human-Robot Interaction (HRI). To estimate the 3D human position from a 2D image, existing methods either depend on the geometric assumption of a fixed camera or use a position regression model trained on datasets containing little camera ego-motion. These methods are vulnerable to fierce camera ego-motion, resulting in inaccurate person localization. We consider person localization as a part of a pose estimation problem. By representing a human with a four-point model, our method jointly estimates the 2D camera attitude and the person's 3D location through optimization. Evaluations on both public datasets and real robot experiments demonstrate our method outperforms baselines in person localization accuracy. Our method is further implemented into a person-following system and deployed on an agile quadruped robot.
Problem

Research questions and friction points this paper is trying to address.

Localizing a person from a moving monocular camera.
Estimating 3D human position from 2D images with camera motion.
Improving accuracy in person localization under camera ego-motion.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses four-point model for human representation
Jointly estimates camera attitude and 3D location
Implemented in agile quadruped robot system
🔎 Similar Papers
No similar papers found.
Yu Zhan
Yu Zhan
Southern University of Science and Technology
robot person followinghuman pose estimationomnidirectional image
Hanjing Ye
Hanjing Ye
PhD Student at Southern University of Science and Technology
Robot Person FollowingPlace Recognition
H
Hong Zhang
Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology (SUSTech), and the Department of Electronic and Electrical Engineering, SUSTech