AI Summary
To address the limited robustness of road segmentation in winter rural and suburban scenes, this paper proposes a trajectory-driven cross-modal self-supervised road annotation method. The approach jointly leverages temporally aligned lidar and camera data, integrating multimodal feature alignment, trajectory-constrained spatiotemporal consistency modeling, and self-supervised contrastive learning to generate high-confidence road masks without manual annotations. It introduces the first trajectory-guided lidar-camera joint self-supervised learning framework, significantly enhancing generalization under out-of-distribution conditions such as snow-covered surfaces and low-contrast scenes. On a dedicated winter rural/suburban test set, the method achieves a 6.2% improvement in mean Intersection-over-Union (mIoU) over state-of-the-art single-modal approaches. The implementation code is publicly available.
Abstract
Robust road segmentation in all road conditions is required for safe autonomous driving and advanced driver assistance systems. Supervised deep learning methods provide accurate road segmentation within the domain of their training data but cannot be trusted in out-of-distribution scenarios. Covering the whole distribution in the training set is challenging because each sample must be labeled by hand. Trajectory-based self-supervised methods offer a potential solution, as they can learn from the traversed route without manual labels. However, existing trajectory-based methods use learning schemes that rely only on the camera or only on the lidar. In this paper, trajectory-based learning is implemented jointly with lidar and camera for increased performance. Our method outperforms recent standalone camera- and lidar-based methods when evaluated on a challenging winter driving dataset that includes countryside and suburban driving scenes. The source code is available at https://github.com/eerik98/lidar-camera-road-autolabeling.git
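The core idea behind trajectory-based autolabeling is that the path the vehicle actually drove must have been road, so projecting the traversed route into the camera image yields free "road" labels. The sketch below illustrates this for the camera branch only, under simplifying assumptions not taken from the paper: trajectory points are already expressed in the camera frame, a pinhole model with known intrinsics `K` is used, and each projected wheel-path point is dilated into a small square patch. The function name `trajectory_road_mask` and the `radius` parameter are hypothetical; the paper's actual pipeline additionally fuses lidar cues.

```python
import numpy as np

def trajectory_road_mask(traj_pts_cam, K, img_h, img_w, radius=8):
    """Rasterize future trajectory points into a binary road mask.

    traj_pts_cam: (N, 3) points in the camera frame (z forward, in meters).
    K: (3, 3) pinhole camera intrinsics.
    Returns a (img_h, img_w) uint8 mask with 1 on the projected wheel path.
    Hypothetical illustration of trajectory-based autolabeling, not the
    paper's full lidar-camera method.
    """
    mask = np.zeros((img_h, img_w), dtype=np.uint8)
    # Keep only points in front of the camera (positive depth).
    pts = traj_pts_cam[traj_pts_cam[:, 2] > 0]
    # Pinhole projection: u = fx*x/z + cx, v = fy*y/z + cy.
    uv = (K @ pts.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    for u, v in uv:
        u, v = int(round(u)), int(round(v))
        if 0 <= u < img_w and 0 <= v < img_h:
            # Dilate each projected point into a small square patch so the
            # sparse trajectory samples form a connected label region.
            y0, y1 = max(v - radius, 0), min(v + radius + 1, img_h)
            x0, x1 = max(u - radius, 0), min(u + radius + 1, img_w)
            mask[y0:y1, x0:x1] = 1
    return mask
```

Masks produced this way are sparse and only cover the driven lane, which is why such labels are typically used as self-supervision targets (e.g. for contrastive or consistency losses) rather than as dense ground truth.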