🤖 AI Summary
Monocular 3D lane detection is difficult when neither depth supervision nor accurate camera intrinsic parameters are available. To address this, we propose Depth3DLane, a dual-pathway framework built on self-supervised monocular depth estimation: (1) a self-supervised depth network generates a point cloud that explicitly models scene geometry; and (2) features are extracted jointly from front-view and bird's-eye-view representations and sampled with 3D lane anchors. The framework is further extended to predict camera parameters on a per-frame basis, with a theoretically motivated per-segment fitting procedure that stabilizes those predictions. This removes the reliance on ground-truth camera parameters and on ground-truth depth data. Evaluated on the OpenLane benchmark, our approach achieves competitive performance, and the experiments show that learned camera parameters can substitute for ground-truth ones. This makes the method applicable where calibration is infeasible, for example in crowdsourced high-definition map construction.
📝 Abstract
Monocular 3D lane detection is essential for autonomous driving but challenging due to the inherent lack of explicit spatial information. Multi-modal approaches rely on expensive depth sensors, while methods incorporating fully supervised depth networks require ground-truth depth data that is impractical to collect at scale. Additionally, existing methods assume that camera parameters are available, limiting their applicability in scenarios like crowdsourced high-definition (HD) lane mapping. To address these limitations, we propose Depth3DLane, a novel dual-pathway framework that integrates self-supervised monocular depth estimation to provide explicit structural information, without the need for expensive sensors or additional ground-truth depth data. Leveraging a self-supervised depth network to obtain a point cloud representation of the scene, our bird's-eye view pathway extracts explicit spatial information, while our front view pathway simultaneously extracts rich semantic information. Depth3DLane then uses 3D lane anchors to sample features from both pathways and infer accurate 3D lane geometry. Furthermore, we extend the framework to predict camera parameters on a per-frame basis and introduce a theoretically motivated fitting procedure to enhance stability on a per-segment basis. Extensive experiments demonstrate that Depth3DLane achieves competitive performance on the OpenLane benchmark dataset. Finally, experimental results show that using learned parameters instead of ground-truth parameters allows Depth3DLane, unlike previous methods, to be applied in scenarios where camera calibration is infeasible.
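The abstract does not spell out how the depth map becomes a point cloud or how that point cloud feeds a bird's-eye view. As a rough illustration only, the sketch below assumes a standard pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy` (ground-truth or learned), a camera frame with x right, y down, z forward, and a simple per-cell point count as the bird's-eye-view representation. The function names `backproject` and `bev_occupancy` and the occupancy grid are hypothetical choices made for this example, not Depth3DLane's actual implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject(depth: Sequence[Sequence[float]],
                fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Lift a depth map (rows of depths along the optical axis) to 3D points.

    Assumed camera frame: x right, y down, z forward (pinhole model).
    """
    points: List[Point] = []
    for v, row in enumerate(depth):          # v: pixel row
        for u, d in enumerate(row):          # u: pixel column
            if d <= 0.0:                     # skip invalid / missing depth
                continue
            x = (u - cx) * d / fx            # inverse of u = fx * x / z + cx
            y = (v - cy) * d / fy            # inverse of v = fy * y / z + cy
            points.append((x, y, d))
    return points


def bev_occupancy(points: Sequence[Point],
                  x_range: Tuple[float, float],
                  z_range: Tuple[float, float],
                  cell: float) -> List[List[int]]:
    """Count points per cell of a top-down grid (rows: forward z, cols: lateral x)."""
    x_min, x_max = x_range
    z_min, z_max = z_range
    n_cols = int(round((x_max - x_min) / cell))
    n_rows = int(round((z_max - z_min) / cell))
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, _, z in points:                   # height (y) is dropped in this toy grid
        if x_min <= x < x_max and z_min <= z < z_max:
            col = min(int((x - x_min) / cell), n_cols - 1)
            row = min(int((z - z_min) / cell), n_rows - 1)
            grid[row][col] += 1
    return grid
```

Because the back-projection divides by `fx` and `fy`, any error in the intrinsics directly distorts the lateral and vertical placement of every point, which is one way to see why the quality of per-frame predicted camera parameters matters when calibration is unavailable. A practical pipeline would typically do the same arithmetic with a vectorized array library and learn features from the grid rather than counting points.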