SA4Depth: Consistent Pose-Depth Scale Alignment for Self-Supervised Monocular Depth Estimation

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the scale inconsistency between the depth and pose predictions in self-supervised monocular depth estimation, a mismatch that limits depth accuracy. The authors propose a differentiable pose refinement that, during training, uses the predicted depth to reproject learnable visual features between consecutive frames and updates the pose to reduce the resulting feature alignment residuals, which implicitly enforces scale consistency. The refinement is trained end to end and adds no inference overhead. Experiments report substantial improvements in depth accuracy on outdoor and indoor benchmarks (KITTI, Cityscapes, and NYUv2). Evaluations on KITTI Odometry further validate that the pose refinement improves trajectory consistency.
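
The scale coupling described above can be made explicit with the standard view-synthesis warp used in self-supervised depth pipelines. The notation below (K, D, R, t, s) is assumed for illustration and is not taken from the paper.

```latex
% p    : homogeneous pixel coordinate in the source frame
% D(p) : predicted depth at p
% K    : camera intrinsics;  (R, t) : predicted relative pose
p' \sim K \left( R \, D(p) \, K^{-1} p + t \right)

% Scaling depth and translation by the same factor s > 0 leaves p' unchanged,
% because homogeneous coordinates are defined only up to scale:
K \left( R \, s D(p) \, K^{-1} p + s \, t \right)
  = s \, K \left( R \, D(p) \, K^{-1} p + t \right) \sim p'
```

A depth map at scale s is therefore consistent only with a translation at the same scale s. Any mismatch between the two shifts p' and shows up as a reprojection error.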

📝 Abstract

Self-supervised depth estimation from monocular sequences relies on the joint learning of a depth network and a pose network. Despite abundant research on improving the depth network, work on the pose network remains limited. In this context, even when depth is estimated only up to scale, we highlight the importance of aligning the scene scales estimated by the pose and depth networks. We then introduce SA4Depth, an approach that improves this alignment and boosts the depth predictions while keeping the inference time unchanged. Our method uses the depth estimated during training to reproject learnable visual features across consecutive frames and refines the pose estimates by reducing the feature alignment residuals. With our method, the scene scales estimated by the separate depth and pose networks are aligned, and the scale consistency of the predictions is improved across different sequences. Our differentiable refinement integrates seamlessly into existing self-supervised pipelines and substantially improves their depth estimates. We demonstrate this with extensive experiments both outdoors and indoors on KITTI, Cityscapes, and NYUv2. Additionally, results on KITTI Odometry confirm the effectiveness of our pose refinement. Our code is available at https://github.com/Runningchauncey/SA4Depth.
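
To make the idea of refining a pose by reducing alignment residuals concrete, here is a minimal NumPy sketch on synthetic data. It is not the SA4Depth implementation: it assumes known target pixel locations in place of learned feature alignment, refines only a scalar rescaling of the translation, and uses a grid search instead of a differentiable refinement. All names (`reproject`, `residual`, `depth_pred`, `t_pred`) and all numeric values are hypothetical.

```python
import numpy as np

def reproject(pixels, depth, K, R, t):
    """Warp source pixels (N, 2) with per-pixel depth (N,) into the target frame."""
    homog = np.hstack([pixels, np.ones((pixels.shape[0], 1))])  # (N, 3)
    rays = homog @ np.linalg.inv(K).T      # back-projected rays K^-1 p
    points = rays * depth[:, None]         # 3D points in the source camera
    moved = points @ R.T + t               # same points in the target camera
    proj = moved @ K.T                     # project with the intrinsics
    return proj[:, :2] / proj[:, 2:3]      # perspective divide

# Synthetic scene (all values are illustrative assumptions).
rng = np.random.default_rng(0)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t_true = np.array([0.5, 0.0, 0.1])
pixels = rng.uniform([0.0, 0.0], [640.0, 480.0], size=(50, 2))
depth_true = rng.uniform(5.0, 20.0, size=50)
targets = reproject(pixels, depth_true, K, R, t_true)

# Simulated network outputs whose scene scales disagree:
# depth is at scale 0.1, translation is at scale 0.3.
depth_pred = 0.1 * depth_true
t_pred = 0.3 * t_true

def residual(alpha):
    """Mean pixel residual after rescaling the predicted translation by alpha."""
    warped = reproject(pixels, depth_pred, K, R, alpha * t_pred)
    return np.linalg.norm(warped - targets, axis=1).mean()

# Toy refinement: pick the translation rescaling with the smallest residual.
alphas = np.linspace(0.01, 1.0, 1000)
best_alpha = alphas[np.argmin([residual(a) for a in alphas])]

print(f"residual before refinement: {residual(1.0):.3f} px")
print(f"best translation rescaling: {best_alpha:.3f}")
print(f"residual after refinement:  {residual(best_alpha):.3f} px")
```

In this toy setup the residual vanishes only when the rescaled translation matches the depth scale (0.1 / 0.3, about one third), which is the sense in which minimizing alignment residuals pulls the pose scale toward the depth scale.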

Problem

Research questions and friction points this paper is trying to address.

self-supervised depth estimation

monocular depth

pose-depth alignment

scale consistency

scene scale

Innovation

Methods, ideas, or system contributions that make the work stand out.

scale alignment

self-supervised depth estimation

pose refinement