🤖 AI Summary
This work addresses the challenge of temporal inconsistency and instability in existing query-based online HD map construction methods, which stem from random query initialization and implicit temporal modeling. To overcome these limitations, the authors propose an end-to-end framework that jointly achieves consistent vectorized map construction and instance tracking through semantic-aware query initialization, an explicit memory mechanism for historical raster maps, a history-guided module, and a short-term future motion prediction module. The proposed method significantly outperforms state-of-the-art approaches on both the nuScenes and Argoverse2 datasets, achieving substantial improvements in temporal consistency and geometric accuracy while maintaining efficient inference.
📝 Abstract
High-definition (HD) maps are crucial to autonomous driving, providing structured representations of road elements to support navigation and planning. However, existing query-based methods often employ random query initialization and depend on implicit temporal modeling, which lead to temporal inconsistencies and instabilities during the construction of a global map. To overcome these challenges, we introduce a novel end-to-end framework for consistent online HD vectorized map construction, which jointly performs map instance tracking and short-term prediction. First, we propose a Semantic-Aware Query Generator that initializes queries with spatially aligned semantic masks to capture scene-level context globally. Next, we design a History Rasterized Map Memory to store fine-grained instance-level maps for each tracked instance, enabling explicit historical priors. A History-Map Guidance Module then integrates rasterized map information into track queries, improving temporal continuity. Finally, we propose a Short-Term Future Guidance module to forecast the immediate motion of map instances based on the stored history trajectories. These predicted future locations serve as hints for tracked instances to further avoid implausible predictions and keep temporal consistency. Extensive experiments on the nuScenes and Argoverse2 datasets demonstrate that our proposed method outperforms state-of-the-art (SOTA) methods with good efficiency.