Cross-Vehicle 3D Geometric Consistency for Self-Supervised Surround Depth Estimation on Articulated Vehicles

📅 2026-04-02
🤖 AI Summary
Existing self-supervised multi-camera depth estimation methods struggle with geometric inconsistencies across views caused by the structural complexity and motion coupling of articulated vehicles. This work proposes ArticuSurDepth, a novel framework that, for the first time, integrates multi-view spatial context enhancement, cross-view surface normal constraints, ground-aware camera height regularization, and cross-body pose consistency mechanisms. Leveraging structural priors from vision foundation models, the approach enables self-supervised learning of surround-view depth for articulated vehicles. The method substantially improves structural coherence and metric accuracy of depth estimates, achieving state-of-the-art performance on both a newly curated articulated vehicle dataset and established public benchmarks including DDAD, nuScenes, and KITTI.
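The summary does not spell out how ground-aware camera height regularization works. The NumPy sketch below shows one common way such a term can be computed; it is not the authors' implementation. It assumes the following: a ground mask is available for the image, the camera follows an OpenCV-style pinhole convention (x right, y down, z forward) with intrinsics `K`, and the camera's mounting height is known. Under those assumptions, the ground pixels are back-projected using the predicted depth, a plane is fitted to them by SVD, and the camera's distance to that plane is compared with the known height. The function names (`backproject`, `fit_plane`, `camera_height_regularization`) and the L1 penalty are hypothetical choices made for illustration.

```python
import numpy as np


def backproject(depth, K, mask):
    """Lift masked pixels to 3D camera coordinates (x right, y down, z forward)."""
    v, u = np.nonzero(mask)
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)


def fit_plane(points):
    """Least-squares plane n . p + d = 0 with unit normal n, via SVD."""
    centroid = points.mean(axis=0)
    # The right-singular vector with the smallest singular value of the
    # centred points is the direction of least variance, i.e. the plane normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    d = -float(normal @ centroid)
    return normal, d


def camera_height_regularization(depth, K, ground_mask, known_height):
    """Return (L1 height penalty, scale factor to metric) for one camera (hypothetical)."""
    points = backproject(depth, K, ground_mask)
    _, d = fit_plane(points)
    # The camera sits at the origin of its own frame, so with a unit normal
    # its distance to the fitted ground plane is |d|.
    est_height = abs(d)
    penalty = abs(est_height - known_height)
    scale = known_height / est_height
    return penalty, scale


# Synthetic flat ground seen by a camera mounted 1.5 m high, with the
# predicted depth deliberately shrunk by half (a pure scale error).
H, W, f, cx, cy = 48, 64, 50.0, 32.0, 24.0
K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
rows = np.repeat(np.arange(H)[:, None], W, axis=1)
ground = rows > cy + 2                          # pixels below the horizon
depth = np.full((H, W), 10.0)
depth[ground] = 1.5 * f / (rows[ground] - cy)   # exact ground-plane depth
penalty, scale = camera_height_regularization(0.5 * depth, K, ground, 1.5)
# penalty is approximately 0.75 and scale approximately 2.0
```

Scaling the depth by a factor s scales every back-projected point by s, and therefore also scales the fitted height by s. The ratio `known_height / est_height` therefore recovers the missing scale. In training, the same computation would be written in an autodiff framework so that the penalty can be back-propagated into the depth network. NumPy is used here only to keep the sketch short.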
📝 Abstract
Surround depth estimation provides a cost-effective alternative to LiDAR for 3D perception in autonomous driving. While recent self-supervised methods explore multi-camera settings to improve scale awareness and scene coverage, they are primarily designed for passenger vehicles and rarely consider articulated vehicles or robotic platforms. The articulated structure introduces complex cross-segment geometry and motion coupling, making consistent depth reasoning across views more challenging. In this work, we propose ArticuSurDepth, a self-supervised framework for surround-view depth estimation on articulated vehicles. It enhances depth learning through cross-view and cross-vehicle geometric consistency, guided by structural priors from a vision foundation model. Specifically, we introduce a multi-view spatial context enrichment strategy and a cross-view surface normal constraint to improve structural coherence across spatial and temporal contexts. We further incorporate ground-plane-aware camera height regularization to encourage metric depth estimation, together with a cross-vehicle pose consistency constraint that bridges motion estimation between articulated segments. To validate the proposed method, we built an articulated-vehicle experimental platform and collected a dataset with it. Experimental results demonstrate state-of-the-art (SoTA) depth estimation performance on our self-collected dataset as well as on the DDAD, nuScenes, and KITTI benchmarks.
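Cross-vehicle pose consistency can be illustrated with rigid-body algebra. The sketch below is not the paper's formulation. It assumes that the pose of the rear segment in the front segment's frame is known at two timestamps, for example from an estimated or measured hitch angle; this articulation transform is the hypothetical `C_t` and `C_t1`. Let W_F and W_R be the world poses of the front and rear segments. Then W_R = W_F C at each time step, and the rear segment's ego-motion must equal inv(C_t) M_F C_t1, where M_F is the front segment's ego-motion. The residual between this implied motion and an independently predicted rear motion yields a rotation error and a translation error. All function names are illustrative.

```python
import numpy as np


def se3(yaw, t):
    """4x4 rigid transform p' = R p + t, with R a rotation by `yaw` about z."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t
    return T


def inv_se3(T):
    """Closed-form inverse of a rigid transform."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti


def implied_rear_motion(M_front, C_t, C_t1):
    """Rear-segment motion implied by front motion and articulation.

    With W_R = W_F @ C at each time step:
    M_R = inv(W_R(t)) @ W_R(t+1) = inv(C_t) @ M_front @ C_t1.
    """
    return inv_se3(C_t) @ M_front @ C_t1


def pose_consistency(M_rear_pred, M_front, C_t, C_t1):
    """Return (rotation error in radians, translation error) of the residual."""
    E = inv_se3(implied_rear_motion(M_front, C_t, C_t1)) @ M_rear_pred
    cos_angle = np.clip((np.trace(E[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(cos_angle)), float(np.linalg.norm(E[:3, 3]))


# Synthetic check: build consistent world poses for both segments.
W_F_t, W_F_t1 = se3(0.10, [0.0, 0.0, 0.0]), se3(0.15, [1.0, 0.2, 0.0])
C_t, C_t1 = se3(0.20, [-4.0, 0.0, 0.0]), se3(0.25, [-4.0, 0.1, 0.0])
M_front = inv_se3(W_F_t) @ W_F_t1
M_rear_true = inv_se3(W_F_t @ C_t) @ (W_F_t1 @ C_t1)
rot_err, trans_err = pose_consistency(M_rear_true, M_front, C_t, C_t1)
# both errors are close to zero for a consistent pair of motions
```

The two errors are kept separate because they have different units. How to weight them against each other, and whether they are applied as a loss or used to fuse the two motion estimates, are design decisions this sketch leaves open.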
Problem

Research questions and friction points this paper is trying to address.

articulated vehicles
surround depth estimation
cross-view consistency
self-supervised learning
3D geometric consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-supervised depth estimation
articulated vehicles
cross-view geometric consistency
vision foundation model
metric depth
Weimin Liu
Assistant Professor, School of Physical Science and Technology, ShanghaiTech University
ultrafast spectroscopy
Jiyuan Qiu
Remote Sensing and Earth Observation Laboratory, University of Copenhagen, Copenhagen K, Denmark
Wenjun Wang
Tianjin University
Data Mining, Social Network, Complex Network, Smart City
Joshua H. Meng
California PATH, University of California, Berkeley, CA, USA