🤖 AI Summary
Existing self-supervised multi-camera depth estimation methods struggle with geometric inconsistencies across views caused by the structural complexity and motion coupling of articulated vehicles. This work proposes ArticuSurDepth, a novel framework that, for the first time, integrates multi-view spatial context enhancement, cross-view surface normal constraints, ground-aware camera height regularization, and cross-body pose consistency mechanisms. Leveraging structural priors from vision foundation models, the approach enables self-supervised learning of surround-view depth for articulated vehicles. The method substantially improves structural coherence and metric accuracy of depth estimates, achieving state-of-the-art performance on both a newly curated articulated vehicle dataset and established public benchmarks including DDAD, nuScenes, and KITTI.
📝 Abstract
Surround depth estimation provides a cost-effective alternative to LiDAR for 3D perception in autonomous driving. While recent self-supervised methods explore multi-camera settings to improve scale awareness and scene coverage, they are primarily designed for passenger vehicles and rarely consider articulated vehicles or robotics platforms. The articulated structure introduces complex cross-segment geometry and motion coupling, making consistent depth reasoning across views more challenging. In this work, we propose \textbf{ArticuSurDepth}, a self-supervised framework for surround-view depth estimation on articulated vehicles that enhances depth learning through cross-view and cross-vehicle geometric consistency guided by structural priors from vision foundation models. Specifically, we introduce a multi-view spatial context enrichment strategy and a cross-view surface normal constraint to improve structural coherence across spatial and temporal contexts. We further incorporate ground-plane-aware camera height regularization to encourage metric depth estimation, together with cross-vehicle pose consistency that bridges motion estimation between articulated segments. To validate the proposed method, we built an articulated-vehicle experimental platform and collected a dataset on it. Experimental results demonstrate state-of-the-art (SoTA) depth estimation performance on our self-collected dataset as well as on the DDAD, nuScenes, and KITTI benchmarks.