🤖 AI Summary
Existing Transformer-based online high-definition (HD) map construction methods overlook intrinsic spatial and semantic relationships among map elements, limiting their accuracy and generalization. This paper proposes RelMap, the first approach to introduce a class-aware spatial relation encoder that explicitly models geometric constraints between elements of different categories. Additionally, the authors design a semantic-aware Mixture-of-Experts (MoE) decoder to enable fine-grained, class-adaptive feature decoding. RelMap supports both single-frame and multi-frame temporal inputs and integrates seamlessly with mainstream Transformer backbones. Evaluated on nuScenes and Argoverse 2, it achieves state-of-the-art performance, improving both map-element detection accuracy and the robustness of topology reconstruction.
📝 Abstract
Online high-definition (HD) map construction plays an increasingly important role in scaling autonomous driving systems. Transformer-based methods have become prevalent in online HD map construction; however, existing approaches often neglect the inherent spatial and semantic relationships among map elements, which limits their accuracy and generalization. To address this, we propose RelMap, an end-to-end framework that enhances online map construction by incorporating spatial relations and semantic priors. We introduce a Class-aware Spatial Relation Prior, which explicitly encodes relative positional dependencies between map elements using a learnable class-aware relation encoder. Additionally, we propose a Mixture-of-Experts (MoE)-based Semantic Prior, which routes features to class-specific experts based on predicted class probabilities, refining instance feature decoding. Our method is compatible with both single-frame and temporal perception backbones, achieving state-of-the-art performance on both the nuScenes and Argoverse 2 datasets.
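The MoE-based semantic prior described above routes instance features to class-specific experts weighted by predicted class probabilities. A minimal sketch of such soft routing is shown below; all shapes, names, and the choice of linear experts are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

num_classes, feat_dim = 3, 8   # e.g. divider / boundary / pedestrian crossing
num_queries = 5                # instance queries from the decoder

# One tiny linear "expert" per map-element class (hypothetical; the paper's
# experts may be deeper networks).
experts = [rng.standard_normal((feat_dim, feat_dim)) * 0.1
           for _ in range(num_classes)]

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_decode(features, class_logits):
    """Soft-route each query feature through the class experts,
    weighted by its predicted class probabilities."""
    probs = softmax(class_logits)                                   # (Q, C)
    expert_out = np.stack([features @ W for W in experts], axis=1)  # (Q, C, D)
    return (probs[..., None] * expert_out).sum(axis=1)              # (Q, D)

features = rng.standard_normal((num_queries, feat_dim))
class_logits = rng.standard_normal((num_queries, num_classes))
refined = moe_decode(features, class_logits)
print(refined.shape)  # (5, 8)
```

Soft routing keeps the decoder fully differentiable, since every expert receives gradient in proportion to the predicted class probability; a hard top-1 routing variant would instead dispatch each query to a single expert.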