🤖 AI Summary
To address the low accuracy of monocular fisheye-based 3D human pose estimation caused by severe radial distortion, this paper systematically evaluates four projection models (pinhole, equidistant, double-sphere, and cylindrical) in terms of undistortion quality and 3D pose reconstruction performance. It proposes a geometry-aware, adaptive heuristic that selects the projection model from the human detection bounding box, eliminating manual preselection. It also introduces FISHnCHIPS, the first real-world fisheye dataset featuring extreme viewpoints and precise 3D annotations. Experiments demonstrate that the double-sphere model significantly improves absolute pose accuracy, especially in close-range and wide-field-of-view scenarios, reducing mean joint error by 12.7% over the pinhole model. These results underscore the importance of adapting the projection model for wide-FOV monocular 3D pose estimation.
📝 Abstract
Fisheye cameras offer robots the ability to capture human movements across a wider field of view (FOV) than standard pinhole cameras, making them particularly useful for applications in human-robot interaction and automotive contexts. However, accurately detecting human poses in fisheye images is challenging due to the curved distortions inherent to fisheye optics. While various methods for undistorting fisheye images have been proposed, their effectiveness and limitations for poses that cover a wide FOV have not been systematically evaluated in the context of absolute human pose estimation from monocular fisheye images. To address this gap, we evaluate the impact of pinhole, equidistant, and double sphere camera models, as well as cylindrical projection methods, on 3D human pose estimation accuracy. We find that in close-up scenarios, pinhole projection is inadequate, and the optimal projection method varies with the FOV covered by the human pose. Using advanced fisheye models such as the double sphere model significantly enhances 3D human pose estimation accuracy. We propose a heuristic for selecting the appropriate projection model based on the detection bounding box to enhance prediction quality. Additionally, we introduce and evaluate on our novel dataset FISHnCHIPS, which features 3D human skeleton annotations in fisheye images, including images from unconventional angles, such as extreme close-ups, ground-mounted cameras, and wide-FOV poses. The dataset is available at: https://www.vision.rwth-aachen.de/fishnchips
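To make the idea of a bounding-box-driven choice of projection concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes an ideal equidistant fisheye (image radius r = f·θ) with a known principal point and focal length, estimates the angular extent of a detection box from its four corners, and falls back from pinhole to a wide-FOV model above a threshold. The function names, the 60° threshold, and the two-way choice between "pinhole" and "double_sphere" are all hypothetical; the abstract does not specify the actual rule.

```python
import math
from itertools import combinations


def pinhole_radius(theta, f):
    """Image radius of a ray at angle theta (rad) from the optical axis, pinhole model."""
    return f * math.tan(theta)


def equidistant_radius(theta, f):
    """Image radius of the same ray under the ideal equidistant fisheye model."""
    return f * theta


def unproject_equidistant(u, v, cx, cy, f):
    """Map a fisheye pixel to a unit viewing ray, assuming r = f * theta."""
    dx, dy = u - cx, v - cy
    theta = math.hypot(dx, dy) / f   # angle from the optical axis
    phi = math.atan2(dy, dx)         # azimuth around the optical axis
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def bbox_angular_extent(bbox, cx, cy, f):
    """Largest angle (rad) between the viewing rays of the box corners.

    This is only a corner-based approximation of the FOV covered by the box.
    """
    x0, y0, x1, y1 = bbox
    rays = [unproject_equidistant(u, v, cx, cy, f)
            for u in (x0, x1) for v in (y0, y1)]
    best = 0.0
    for a, b in combinations(rays, 2):
        dot = sum(p * q for p, q in zip(a, b))
        best = max(best, math.acos(max(-1.0, min(1.0, dot))))
    return best


def select_projection(bbox, cx, cy, f, max_pinhole_fov_deg=60.0):
    """Hypothetical rule: pinhole for narrow boxes, a wide-FOV model otherwise."""
    extent_deg = math.degrees(bbox_angular_extent(bbox, cx, cy, f))
    return "pinhole" if extent_deg <= max_pinhole_fov_deg else "double_sphere"


if __name__ == "__main__":
    cx, cy, f = 500.0, 500.0, 300.0  # made-up intrinsics for illustration
    print(select_projection((450, 450, 550, 550), cx, cy, f))  # small, distant person
    print(select_projection((100, 100, 900, 900), cx, cy, f))  # close-up filling the image
```

The two radius functions illustrate why pinhole reprojection breaks down for close-ups: `pinhole_radius` grows without bound as the ray angle approaches 90°, whereas `equidistant_radius` stays finite, so a person spanning a wide FOV cannot be mapped onto a reasonably sized pinhole image.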