AI Summary
This work proposes a semi-supervised method for online vectorized high-definition (HD) map construction, grounded in geospatial consistency, to reduce the reliance on large-scale annotated data that limits scalability. A contrastive loss over overlapping regions from multiple traversals improves the bird's-eye-view (BEV) feature representation, and a dataset-splitting strategy with adjustable multi-traversal requirements combines a small labeled set with a larger unlabeled set. In the reported experiments, the method outperforms the supervised baseline on vectorized map perception. PCA visualizations of the BEV feature space also show clearer segmentation, which supports the benefit of enforcing geospatial consistency.
Abstract
Autonomous vehicles rely on map information to understand the world around them. However, the creation and maintenance of offline high-definition (HD) maps remain costly. A more scalable alternative is online HD map construction, which requires map annotations only at training time. Self-supervised training can further reduce the amount of training data that must be annotated. This work focuses on improving the latent bird's-eye-view (BEV) feature grid representation within a vectorized online HD map construction model by enforcing geospatial consistency between overlapping BEV feature grids as part of a contrastive loss function. To ensure geospatial overlap for contrastive pairs, we introduce an approach that analyzes the overlap between traversals within a given dataset and generates subsidiary dataset splits following adjustable multi-traversal requirements. We train the same model with supervision on a reduced set of single-traversal labeled data and with self-supervision on a broader unlabeled set that satisfies our multi-traversal requirements, effectively implementing a semi-supervised approach. Our approach outperforms the supervised baseline across the board, both quantitatively, in terms of vectorized map perception performance on the downstream task, and qualitatively, in terms of segmentation in the principal component analysis (PCA) visualization of the BEV feature space.
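The idea of enforcing geospatial consistency between overlapping BEV feature grids can be illustrated with a minimal NumPy sketch. It assumes an InfoNCE-style contrastive loss, which the abstract does not specify. The function name `info_nce_bev`, the temperature value, and the premise that the two overlapping grids have already been resampled into N matched cells are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def info_nce_bev(feats_a, feats_b, temperature=0.1):
    """InfoNCE-style loss over N geospatially matched BEV cells (hypothetical helper).

    feats_a, feats_b: arrays of shape (N, C). Row i of each array is assumed to
    describe the same world location, observed in two different traversals.
    """
    # L2-normalise each feature so the dot product is a cosine similarity.
    a = feats_a / (np.linalg.norm(feats_a, axis=1, keepdims=True) + 1e-8)
    b = feats_b / (np.linalg.norm(feats_b, axis=1, keepdims=True) + 1e-8)

    # Pairwise similarities between every cell in A and every cell in B.
    logits = a @ b.T / temperature  # shape (N, N)

    # Numerically stable log-softmax along each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positives are the matched cells on the diagonal; all other cells in the
    # row act as negatives.
    return float(-np.mean(np.diag(log_prob)))
```

Under these assumptions the loss is small when features at the same world location agree across traversals and large when they do not, which is the behaviour a geospatial-consistency objective is meant to reward.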