FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera

📅 2024-09-23
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses three key challenges in fisheye monocular depth estimation: the absence of ground-truth depth annotations, severe distortion-induced scale ambiguity, and training instability. We propose a real-scale self-supervised monocular depth estimation method specifically designed for fisheye cameras. Our approach explicitly embeds a differentiable fisheye camera model into the reprojection pipeline, marking the first such integration. To resolve scale ambiguity, we replace network-predicted poses with geometrically calibrated, metric-scale poses derived from intrinsic and extrinsic calibration. Furthermore, we introduce a multi-scale adaptive feature fusion module to suppress pose estimation noise. By unifying fisheye geometric modeling, real-scale geometric constraints, and a self-supervised learning framework, our method achieves significant improvements in depth accuracy and robustness on public benchmarks and real-world fisheye sequences. It produces physically interpretable, metric-scale depth maps while simplifying both training and inference pipelines.
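The summary's central piece, a differentiable fisheye camera model inside the projection and reprojection stages, is not spelled out on this page. A minimal sketch under the assumption of an equidistant fisheye model (r = f·θ); real lenses typically add a polynomial in θ, and all function names here are illustrative, not the paper's API:

```python
import numpy as np

def fisheye_project(points, fx, fy, cx, cy):
    """Project 3D camera-frame points to pixels under the equidistant
    fisheye model r = f * theta, where theta is the angle between the
    ray and the optical axis (a simplification of real fisheye lenses)."""
    X, Y, Z = points[:, 0], points[:, 1], points[:, 2]
    r_xy = np.sqrt(X**2 + Y**2)
    theta = np.arctan2(r_xy, Z)          # angle from the optical axis
    # theta / r_xy -> 1/Z as r_xy -> 0; on-axis points map to the center
    scale = np.where(r_xy > 1e-9, theta / np.maximum(r_xy, 1e-9), 0.0)
    u = fx * X * scale + cx
    v = fy * Y * scale + cy
    return np.stack([u, v], axis=1)

def fisheye_unproject(pixels, depth, fx, fy, cx, cy):
    """Invert the equidistant projection: pixels plus per-ray depth
    (distance along the ray, in metres) back to 3D points."""
    mx = (pixels[:, 0] - cx) / fx
    my = (pixels[:, 1] - cy) / fy
    theta = np.sqrt(mx**2 + my**2)       # since r = f * theta
    # unit ray direction from the incidence angle
    s = np.where(theta > 1e-9, np.sin(theta) / np.maximum(theta, 1e-9), 1.0)
    dirs = np.stack([mx * s, my * s, np.cos(theta)], axis=1)
    return dirs * depth[:, None]
```

Both operations are smooth in their inputs, which is what makes the model usable inside a self-supervised photometric loss: gradients can flow from the reprojection error back to the predicted depth.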

📝 Abstract
Accurate depth estimation is crucial for 3D scene comprehension in robotics and autonomous vehicles. Fisheye cameras, known for their wide field of view, have inherent geometric benefits. However, their use in depth estimation is restricted by a scarcity of ground truth data and image distortions. We present FisheyeDepth, a self-supervised depth estimation model tailored for fisheye cameras. We incorporate a fisheye camera model into the projection and reprojection stages during training to handle image distortions, thereby improving depth estimation accuracy and training stability. Furthermore, we incorporate real-scale pose information into the geometric projection between consecutive frames, replacing the poses estimated by the conventional pose network. Essentially, this method offers the necessary physical depth for robotic tasks, and also streamlines the training and inference procedures. Additionally, we devise a multi-channel output strategy to improve robustness by adaptively fusing features at various scales, which reduces the noise from real pose data. We demonstrate the superior performance and robustness of our model in fisheye image depth estimation through evaluations on public datasets and real-world scenarios. The project website is available at: https://github.com/guoyangzhao/FisheyeDepth.
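The abstract's key move, feeding a calibrated metric-scale pose into the frame-to-frame geometric projection instead of a pose regressed by a network, can be sketched as below. This is a minimal illustration that uses a pinhole projection for brevity (the paper's pipeline uses its fisheye model), and the function name is hypothetical:

```python
import numpy as np

def reproject_with_calibrated_pose(depth, K, R, t):
    """Warp target-frame pixel coordinates into the source frame using a
    predicted depth map and a *known*, metric-scale relative pose (R, t)
    from calibration, rather than a network-estimated pose.
    depth: (H, W) metric z-depth for the target frame
    K:     (3, 3) camera intrinsics
    R, t:  rotation (3, 3) and translation (3,), target -> source
    returns (H, W, 2) corresponding source-frame pixel coordinates."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    # back-project: X = depth * K^-1 * [u, v, 1]^T
    rays = pix @ np.linalg.inv(K).T
    pts = rays * depth.reshape(-1, 1)
    # rigid transform with the calibrated metric pose, then project
    pts_src = pts @ R.T + t
    proj = pts_src @ K.T
    uv = proj[:, :2] / proj[:, 2:3]
    return uv.reshape(H, W, 2)
```

Because (R, t) carries real-world scale, the depth map that minimizes the resulting photometric error is forced to be metric as well, which is the "physical depth for robotic tasks" the abstract refers to.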
Problem

Research questions and friction points this paper is trying to address.

Accurate depth estimation for fisheye cameras in robotics and autonomous vehicles.
Handling fisheye image distortions and scarcity of ground truth data.
Improving depth estimation accuracy and training stability with real-scale pose information.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised depth estimation for fisheye cameras
Real-scale pose integration for accurate depth
Multi-channel output for robust feature fusion
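The "multi-channel output" point above, adaptively fusing features at several scales to damp noise from the real pose data, is not detailed on this page. One plausible minimal reading, with entirely hypothetical names, is a per-pixel softmax blend of per-scale depth predictions:

```python
import numpy as np

def fuse_multiscale(depths, weight_logits):
    """Blend per-scale depth maps (already resized to full resolution)
    with per-pixel softmax weights, so no single noisy scale dominates.
    depths:        list of (H, W) depth maps, one per scale
    weight_logits: list of (H, W) unnormalized weight maps, one per scale
    returns a single (H, W) fused depth map."""
    logits = np.stack(weight_logits, axis=0)
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w = w / w.sum(axis=0, keepdims=True)                 # per-pixel softmax
    return (np.stack(depths, axis=0) * w).sum(axis=0)
```

In a learned system the weight maps would come from the decoder; here they are plain arrays so the mechanics are visible.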
👥 Authors
Guoyang Zhao, Robotics and Autonomous Systems Thrust, The Hong Kong University of Science and Technology (Guangzhou), China
Yuxuan Liu, Department of ECE, The Hong Kong University of Science and Technology, Hong Kong SAR, China
Weiqing Qi, Robotics and Autonomous Systems Thrust, The Hong Kong University of Science and Technology (Guangzhou), China
Fulong Ma, Robotics and Autonomous Systems Thrust, The Hong Kong University of Science and Technology (Guangzhou), China
Ming Liu, Robotics and Autonomous Systems Thrust, The Hong Kong University of Science and Technology (Guangzhou), China
Jun Ma, Robotics and Autonomous Systems Thrust, The Hong Kong University of Science and Technology (Guangzhou), China