Unified Map Prior Encoder for Mapping and Planning

📅 2026-05-04

🤖 AI Summary
This work addresses a limitation of online mapping and end-to-end autonomous driving systems: they rely heavily on sensor inputs and struggle to fuse heterogeneous map priors (vector maps, raster maps, and satellite imagery), which suffer from pose drift and are inconsistently available at test time. To address this, the authors propose a Unified Map Prior Encoder (UMPE) with a dual-branch architecture for geometry-aware alignment and fusion. The vector branch applies SE(2) pre-alignment followed by confidence-biased cross-attention, while the raster branch uses a FiLM-conditioned ResNet-18 with zero-initialized residual fusion. UMPE encodes any combination of the available map priors within a single model and exhibits power-set robustness: a model trained with all priors outperforms single-prior models even when only one prior is available at test time. On nuScenes, UMPE improves mAP by +5.9 for MapTRv2 and +5.3 for MapQR; on Argoverse2, it adds +4.1 mAP over strong baselines. In end-to-end planning with the VAD backbone on nuScenes, it reduces average L2 trajectory error by 0.30 m and lowers the collision rate from 0.22% to 0.12%.
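The raster-branch mechanism summarized above (FiLM conditioning plus zero-initialized residual fusion) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the per-prior-type FiLM table (`FILM_PARAMS`), the `ZeroInitResidualFusion` class, the 1x1 projection, and all tensor shapes are hypothetical, and the ResNet-18 convolutions and SE(2) micro-alignment are omitted entirely. What it shows is the do-no-harm property: because the fusion projection starts at zero, the fused output equals the BEV features until training moves the weights.

```python
import numpy as np

def film(features, gamma, beta):
    """FiLM: per-channel scale (gamma) and shift (beta) of a (C, H, W) feature map."""
    return gamma[:, None, None] * features + beta[:, None, None]

# Hypothetical FiLM parameters per raster-prior type, so that one shared
# backbone stage can be modulated differently for each prior source.
FILM_PARAMS = {
    "sd_raster": (np.full(16, 1.5), np.full(16, -0.2)),
    "satellite": (np.full(16, 0.8), np.full(16, 0.1)),
}

class ZeroInitResidualFusion:
    """Adds prior features to BEV features through a zero-initialized 1x1 projection."""

    def __init__(self, prior_channels, bev_channels):
        # A 1x1 convolution written as a (C_bev, C_prior) matrix, initialized to zero.
        self.weight = np.zeros((bev_channels, prior_channels))

    def __call__(self, bev, prior):
        # Project prior channels to BEV channels at every (h, w) location, then add.
        projected = np.einsum("oc,chw->ohw", self.weight, prior)
        return bev + projected

rng = np.random.default_rng(0)
bev = rng.standard_normal((8, 4, 4))           # hypothetical BEV features: 8 channels, 4x4 grid
prior_feat = rng.standard_normal((16, 4, 4))   # hypothetical raster-prior features: 16 channels

gamma, beta = FILM_PARAMS["satellite"]
conditioned = film(prior_feat, gamma, beta)

fusion = ZeroInitResidualFusion(prior_channels=16, bev_channels=8)
fused_at_init = fusion(bev, conditioned)       # equals bev exactly: the do-no-harm start

# Simulate a learned update: once the projection is non-zero, prior evidence flows in.
fusion.weight = 0.01 * rng.standard_normal(fusion.weight.shape)
fused_after = fusion(bev, conditioned)
print(np.allclose(fused_at_init, bev), np.allclose(fused_after, bev))  # prints: True False
```

With the projection at zero, the gradient of the loss with respect to its weights is still generally non-zero (it depends on the prior features), so training can move it away from zero on the first update while the initial prediction is unchanged.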
📝 Abstract
Online mapping and end-to-end (E2E) planning in autonomous driving remain largely sensor-centric, leaving rich map priors, including HD/SD vector maps, rasterized SD maps, and satellite imagery, underused because of heterogeneity, pose drift, and inconsistent availability at test time. We present UMPE, a Unified Map Prior Encoder that can ingest any subset of four priors and fuse them with BEV features for both mapping and planning. UMPE has two branches. The vector encoder pre-aligns HD/SD polylines with a frame-wise SE(2) correction, encodes points via multi-frequency sinusoidal features, and produces polyline tokens with confidence scores. BEV queries then cross-attend to these tokens with a confidence bias, followed by normalized channel-wise gating that avoids length imbalance and softly down-weights uncertain sources. The raster encoder uses a shared ResNet-18 backbone conditioned by FiLM (feature-wise scale and shift) at every stage, performs SE(2) micro-alignment, and injects priors through zero-initialized residual fusion, so the network starts from a do-no-harm baseline and learns to add only useful prior evidence. A vector-then-raster fusion order reflects the inductive bias of geometry first, appearance second. On nuScenes mapping, UMPE lifts MapTRv2 from 61.5 to 67.4 mAP (+5.9) and MapQR from 66.4 to 71.7 mAP (+5.3). On Argoverse2, UMPE adds 4.1 mAP over strong baselines. UMPE is compositional: when trained with all priors, it outperforms single-prior models even when only one prior is available at test time, demonstrating power-set robustness. For E2E planning with the VAD backbone on nuScenes, UMPE reduces average L2 trajectory error from 0.72 m to 0.42 m (−0.30 m) and collision rate from 0.22% to 0.12% (−0.10%), surpassing recent prior-injection methods. These results show that a unified, alignment-aware treatment of heterogeneous map priors yields better mapping and better planning.
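The vector-branch steps described above (SE(2) pre-alignment, multi-frequency sinusoidal point encoding, and confidence-biased cross-attention from BEV queries to polyline tokens) can be sketched as follows. This is a hedged illustration, not the paper's code: it assumes the confidence bias is added to the attention logits as log-confidence, pools each polyline into one token by averaging its point encodings, uses the tokens directly as keys and values without learned projections, and omits the normalized channel-wise gating. The helper names (`se2_apply`, `sinusoidal_encode`, `confidence_biased_attention`), the SE(2) correction values, and the confidence scores are hypothetical.

```python
import numpy as np

def se2_apply(points, theta, tx, ty):
    """Rotate (N, 2) points by theta, then translate by (tx, ty)."""
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    return points @ rotation.T + np.array([tx, ty])

def sinusoidal_encode(points, num_freqs=4):
    """Multi-frequency sin/cos features: (N, 2) points -> (N, 4 * num_freqs)."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi            # pi, 2pi, 4pi, ...
    scaled = points[:, :, None] * freqs                      # (N, 2, num_freqs)
    feats = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return feats.reshape(points.shape[0], -1)

def confidence_biased_attention(queries, keys, values, confidence, eps=1e-6):
    """Scaled dot-product attention with log(confidence) added to each token's logit."""
    logits = queries @ keys.T / np.sqrt(queries.shape[-1])   # (num_queries, num_tokens)
    logits = logits + np.log(confidence + eps)[None, :]
    logits = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights

rng = np.random.default_rng(0)
num_freqs = 4

# Two hypothetical polylines (5 points each) in the prior-map frame.
polylines = [
    np.stack([np.linspace(0.0, 1.0, 5), np.zeros(5)], axis=1),
    np.stack([np.zeros(5), np.linspace(0.0, 1.0, 5)], axis=1),
]

# Hypothetical frame-wise SE(2) correction compensating for pose drift.
aligned = [se2_apply(p, theta=0.05, tx=0.1, ty=-0.05) for p in polylines]

# One token per polyline: the mean of its point encodings (an assumed pooling choice).
tokens = np.stack([sinusoidal_encode(p, num_freqs).mean(axis=0) for p in aligned])
confidence = np.array([0.9, 0.01])                           # hypothetical confidence scores

bev_queries = rng.standard_normal((3, tokens.shape[1]))      # 3 hypothetical BEV queries
out, attn = confidence_biased_attention(bev_queries, tokens, tokens, confidence)
_, attn_equal = confidence_biased_attention(bev_queries, tokens, tokens, np.array([0.9, 0.9]))
```

Adding log(c) to a logit is equivalent to multiplying that token's unnormalized attention weight by c, so a low-confidence polyline is softly down-weighted rather than dropped: in this sketch, every query puts less weight on the second polyline in `attn` than in `attn_equal`.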
Problem

map priors
autonomous driving
heterogeneity
pose drift
inconsistent availability
Innovation

Unified Map Prior Encoder
heterogeneous map fusion
SE(2)-aware alignment
compositional robustness
end-to-end planning
Authors

Zongzheng Zhang: Institute for AI Industry Research (AIR), Tsinghua University
Sizhe Zou: Institute for AI Industry Research (AIR), Tsinghua University
Guantian Zheng: Institute for AI Industry Research (AIR), Tsinghua University
Zhenxin Zhu: Xiaomi AD; interests: AIGC, NeRF
Yu Gao: Unknown affiliation; interests: Algorithms, Data structures
Guoxuan Chi: Tsinghua University; interests: Mobile Computing, Wireless Sensing, Spatial Intelligence
Shuo Wang: Bosch Corporate Research, China
Yuwen Heng: Bosch Corporate Research, China
Zhigang Sun: Bosch Corporate Research, China
Yiru Wang: University of Pittsburgh; interests: Econometrics
Hao Sun: Bosch Corporate Research, China
Chao Ma: Professor, Shanghai Jiao Tong University; interests: Computer vision, Machine learning, Image processing
Zhen Li: Assistant Professor, the Chinese University of Hong Kong, Shenzhen (CUHKSZ); interests: Deep Learning, 3D Vision, Point Cloud Analysis, Protein Structure Prediction, Computational Biology
Anqing Jiang: Bosch Corporate Research, China
Hao Zhao: Tsinghua University; interests: Computer Vision