Abstract
Vectorized high-definition (HD) maps are essential for autonomous driving systems. Recent state-of-the-art map vectorization methods are mainly built on DETR-like frameworks that generate HD maps in an end-to-end manner. In this paper, we propose InteractionMap, which improves on previous map vectorization methods by fully leveraging local-to-global information interaction in both time and space. First, because map elements carry strong shape priors, we enhance DETR-like detectors with explicit positional relation priors ranging from the point level to the instance level. Second, we propose a key-frame-based hierarchical temporal fusion module that fuses temporal information from local to global. Finally, because classification and regression are handled by separate branches, their output distributions can become misaligned. We make semantic and geometric information interact by introducing a novel geometric-aware classification loss for optimization and a geometric-aware matching cost for label assignment. InteractionMap achieves state-of-the-art performance on both the nuScenes and Argoverse2 benchmarks.
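The abstract does not give formulas for the geometric-aware classification loss or matching cost. As a rough illustration only, the sketch below assumes one common way to couple the two branches. A geometric quality score q = exp(-d / tau), where d is the Chamfer distance between a predicted and a ground-truth polyline, serves as the soft classification target for matched predictions (in the spirit of IoU-aware soft targets such as varifocal loss). The matching cost rewards predictions that are both confident and geometrically accurate through the product p^alpha * q^beta (similar to the alignment metric used in TOOD). All function names, the weights `alpha`, `beta`, `reg_weight` and `tau`, and the list-of-points polyline representation are hypothetical. This is not InteractionMap's actual formulation, and the brute-force assignment only stands in for the Hungarian algorithm on tiny inputs.

```python
import itertools
import math

# Hypothetical representation: a polyline is a list of (x, y) points.
Point = tuple[float, float]
Polyline = list[Point]


def chamfer(a: Polyline, b: Polyline) -> float:
    """Symmetric Chamfer distance: average of the two directional mean nearest-point distances."""
    def one_way(src: Polyline, dst: Polyline) -> float:
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return 0.5 * (one_way(a, b) + one_way(b, a))


def geometric_quality(pred: Polyline, gt: Polyline, tau: float = 1.0) -> float:
    """Assumed quality score in (0, 1]: 1.0 for a perfect match, decaying with Chamfer distance."""
    return math.exp(-chamfer(pred, gt) / tau)


def matching_cost(prob: float, pred: Polyline, gt: Polyline,
                  alpha: float = 0.5, beta: float = 2.0, reg_weight: float = 1.0) -> float:
    """Assumed geometry-aware cost: low only when the prediction is confident AND well-shaped."""
    q = geometric_quality(pred, gt)
    return -(prob ** alpha) * (q ** beta) + reg_weight * chamfer(pred, gt)


def match(pred_probs, pred_lines, gt_labels, gt_lines) -> list[int]:
    """Brute-force one-to-one assignment; a stand-in for the Hungarian algorithm on tiny inputs.

    Assumes at least as many predictions as ground-truth elements.
    Returns best[j] = index of the prediction assigned to ground-truth element j.
    """
    best, best_cost = None, math.inf
    for perm in itertools.permutations(range(len(pred_lines)), len(gt_lines)):
        cost = sum(matching_cost(pred_probs[i][gt_labels[j]], pred_lines[i], gt_lines[j])
                   for j, i in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best)


def geometry_aware_cls_loss(pred_probs, pred_lines, gt_labels, gt_lines, assignment) -> float:
    """Binary cross-entropy against soft targets.

    A matched (prediction, class) pair gets target = geometric quality instead of 1.0;
    every other entry (including unmatched predictions) gets target 0.0.
    """
    eps = 1e-7
    targets = [[0.0] * len(row) for row in pred_probs]
    for j, i in enumerate(assignment):
        targets[i][gt_labels[j]] = geometric_quality(pred_lines[i], gt_lines[j])
    total, count = 0.0, 0
    for probs, tgt in zip(pred_probs, targets):
        for p, t in zip(probs, tgt):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count


if __name__ == "__main__":
    gt_lines = [[(0, 0), (1, 0), (2, 0)], [(0, 5), (0, 6), (0, 7)]]
    gt_labels = [0, 1]  # hypothetical classes, e.g. 0 = divider, 1 = boundary
    pred_lines = [[(0, 5.1), (0, 6.1), (0, 7.1)],
                  [(5, 5), (6, 6), (7, 7)],
                  [(0, 0.1), (1, 0.1), (2, 0.1)]]
    pred_probs = [[0.1, 0.8], [0.5, 0.5], [0.9, 0.1]]
    assignment = match(pred_probs, pred_lines, gt_labels, gt_lines)
    print(assignment)  # [2, 0]
    print(geometry_aware_cls_loss(pred_probs, pred_lines, gt_labels, gt_lines, assignment))
```

Because the classification target shrinks as the Chamfer distance grows, a confidently classified but poorly shaped prediction is pushed toward a lower score, which is one way to tie the two branch outputs together. In a real DETR-style pipeline the cost matrix would be computed in batch and solved with an assignment solver such as `scipy.optimize.linear_sum_assignment`; the brute-force loop only keeps the sketch dependency-free.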