UM-Depth : Uncertainty Masked Self-Supervised Monocular Depth Estimation with Visual Odometry

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Self-supervised monocular depth estimation suffers significant performance degradation in low-texture and dynamic regions. To address this, we propose an uncertainty-aware teacher-student framework that integrates visual odometry and optical-flow-guided motion modeling into self-supervised training. The teacher network leverages optical flow to strengthen geometric constraints in weak-texture regions, while the student network employs uncertainty-aware masking to suppress interference from dynamic or unreliable pixels during joint depth and pose optimization. Our method requires no ground-truth depth labels, additional annotations, or inference-time overhead, enabling end-to-end robust depth and pose estimation. Evaluated on KITTI and Cityscapes, it achieves state-of-the-art performance—particularly improving depth accuracy at dynamic object boundaries and textureless regions—and concurrently enhances pose estimation accuracy.

📝 Abstract
Monocular depth estimation has been increasingly adopted in robotics and autonomous driving for its ability to infer scene geometry from a single camera. In self-supervised monocular depth estimation frameworks, the network jointly generates and exploits depth and pose estimates during training, thereby eliminating the need for depth labels. However, these methods remain challenged by uncertainty in the input data, such as low-texture or dynamic regions, which can reduce depth accuracy. To address this, we introduce UM-Depth, a framework that combines motion- and uncertainty-aware refinement to enhance depth accuracy at dynamic object boundaries and in textureless regions. Specifically, we develop a teacher-student training strategy that embeds uncertainty estimation into both the training pipeline and the network architecture, thereby strengthening supervision where photometric signals are weak. Unlike prior motion-aware approaches that incur inference-time overhead and rely on additional labels or auxiliary networks to generate motion cues at run time, our method uses optical flow exclusively within the teacher network during training, eliminating extra labeling demands and any runtime cost. Extensive experiments on the KITTI and Cityscapes datasets demonstrate the effectiveness of our uncertainty-aware refinement. Overall, UM-Depth achieves state-of-the-art results in both self-supervised depth and pose estimation on the KITTI dataset.
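The abstract's uncertainty-aware masking can be illustrated with the common aleatoric-uncertainty weighting used in self-supervised depth work: each pixel's photometric residual is divided by a predicted uncertainty, with a log penalty so the network cannot inflate uncertainty everywhere. This is a minimal NumPy sketch of that general formulation, not the paper's exact loss; the function name, the toy error map, and the hand-set uncertainty values are all illustrative assumptions.

```python
import numpy as np

def uncertainty_masked_loss(photometric_error, log_sigma):
    """Mean photometric loss with per-pixel learned uncertainty.

    Uses the standard aleatoric formulation loss = err / sigma + log(sigma);
    the paper's actual masking rule may differ in detail.
    """
    sigma = np.exp(log_sigma)
    return float(np.mean(photometric_error / sigma + log_sigma))

# Toy 4x4 error map: one "dynamic" pixel has a large residual that would
# otherwise dominate the photometric loss.
err = np.full((4, 4), 0.1)
err[1, 2] = 2.0  # outlier pixel, e.g. a moving object or textureless patch

# Uniform certainty (sigma = 1 everywhere): the outlier contributes fully.
uniform = uncertainty_masked_loss(err, np.zeros((4, 4)))

# Flag the outlier as uncertain (sigma equal to its residual, the optimum
# of err/sigma + log sigma): its contribution is damped.
adaptive_log_sigma = np.zeros((4, 4))
adaptive_log_sigma[1, 2] = np.log(err[1, 2])
adaptive = uncertainty_masked_loss(err, adaptive_log_sigma)

assert adaptive < uniform  # masking the unreliable pixel lowers the loss
```

In training, `log_sigma` would be a second decoder output optimized jointly with depth, so the network itself learns where photometric supervision is unreliable.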
Problem

Research questions and friction points this paper is trying to address.

Addressing uncertainty in self-supervised monocular depth estimation
Enhancing depth accuracy in textureless and dynamic regions
Eliminating inference-time overhead for motion-aware refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncertainty-aware teacher-student training strategy
Motion-aware refinement without inference overhead
Optical flow used only during teacher training
Tae-Wook Um
School of Mechanical and Robotics Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, South Korea
Ki-Hyeon Kim
School of Mechanical and Robotics Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, South Korea
Hyun-Duck Choi
Department of Smart ICT Convergence Engineering, Seoul National University of Science and Technology, Seoul 01811, South Korea
Hyo-Sung Ahn
Professor, School of Mechanical Eng., GIST
Formation Control · Distributed Coordination · Networked Control Systems · Iterative Learning Control · Autonomous Systems