FlexMap: Generalized HD Map Construction from Flexible Camera Configurations

📅 2026-01-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited robustness of existing high-definition (HD) map construction methods, which rely on fixed multi-camera calibrations and explicit or implicit 2D-to-bird’s-eye-view transformations, often failing under sensor degradation or inconsistent cross-vehicle camera setups. To overcome this, the authors propose a novel mapping framework that adapts to arbitrary camera configurations without retraining. By eschewing explicit geometric projection, the method leverages a geometry-aware foundation model and cross-frame attention mechanisms to implicitly model 3D structure in feature space. A spatiotemporal disentanglement module and a decoder augmented with latent camera tokens further enable adaptive handling of missing views and varying camera arrangements. Experiments demonstrate consistent superiority over current approaches across diverse configurations, significantly enhancing robustness and facilitating practical deployment of HD maps in heterogeneous vehicle fleets.

📝 Abstract

High-definition (HD) maps provide essential semantic information about road structures for autonomous driving systems, yet current HD map construction methods require calibrated multi-camera setups and either implicit or explicit 2D-to-BEV transformations, making them fragile when sensors fail or camera configurations vary across vehicle fleets. We introduce FlexMap. Unlike prior methods that are fixed to a specific N-camera rig, our approach adapts to variable camera configurations without any architectural changes or per-configuration retraining. Our key innovation eliminates explicit geometric projections by using a geometry-aware foundation model with cross-frame attention to implicitly encode 3D scene understanding in feature space. FlexMap features two core components: a spatial-temporal enhancement module that separates cross-view spatial reasoning from temporal dynamics, and a camera-aware decoder with latent camera tokens, enabling view-adaptive attention without the need for projection matrices. Experiments demonstrate that FlexMap outperforms existing methods across multiple configurations while maintaining robustness to missing views and sensor variations, enabling more practical real-world deployment.
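To make the idea of view-adaptive attention without projection matrices more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `view_adaptive_attention`, the additive use of camera tokens on the keys, and the random (rather than learned) tokens are all assumptions made for illustration. It only shows how a masked attention layer can accept any number of views and ignore missing ones.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def view_adaptive_attention(queries, view_feats, camera_tokens, view_mask):
    """Hypothetical single-head cross-attention over a variable set of views.

    queries:       (Q, D) map-element queries
    view_feats:    (V, T, D) image tokens for each of V camera views
    camera_tokens: (V, D) one latent token per view (assumed learned; random here)
    view_mask:     (V,) bool, True where the view is available
    """
    if not view_mask.any():
        raise ValueError("at least one camera view must be available")
    V, T, D = view_feats.shape
    # Assumption: camera identity is injected by adding the latent token to the
    # keys, so no calibration or projection matrix is needed.
    keys = (view_feats + camera_tokens[:, None, :]).reshape(V * T, D)
    values = view_feats.reshape(V * T, D)
    scores = queries @ keys.T / np.sqrt(D)            # (Q, V*T)
    token_mask = np.repeat(view_mask, T)              # expand view mask to tokens
    scores = np.where(token_mask[None, :], scores, -np.inf)
    weights = softmax(scores, axis=-1)                # zero weight on missing views
    return weights @ values                           # (Q, D)

# Usage: a six-camera rig in which the third camera has dropped out.
rng = np.random.default_rng(0)
Q, T, D = 4, 5, 8
queries = rng.normal(size=(Q, D))
feats = rng.normal(size=(6, T, D))
tokens = rng.normal(size=(6, D))
mask = np.array([True, True, False, True, True, True])
out = view_adaptive_attention(queries, feats, tokens, mask)
```

Because the softmax runs over whatever tokens are unmasked, the output shape depends only on the number of queries, and masking a view gives the same result as removing it from the input altogether.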

Problem

Research questions and friction points this paper is trying to address.

HD map construction

camera configuration variability

sensor robustness

autonomous driving

BEV transformation

Innovation

Methods, ideas, or system contributions that make the work stand out.

FlexMap

geometry-aware foundation model

cross-frame attention