SEF-MAP: Subspace-Decomposed Expert Fusion for Robust Multimodal HD Map Prediction

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation of multimodal high-definition map prediction in challenging conditions—such as low illumination, occlusion, or sparse point clouds—where inconsistencies between camera and LiDAR modalities impair accuracy. To mitigate this, the authors propose SEF-MAP, a framework that decomposes bird’s-eye-view (BEV) features into four semantic subspaces: LiDAR-private, image-private, shared, and interactive, each processed by a dedicated expert network. An uncertainty-aware gating mechanism at the BEV level adaptively fuses these expert outputs, while a balancing regularizer prevents expert collapse. Additionally, a distribution-aware masking strategy enhances both robustness and specialization. Evaluated on nuScenes and Argoverse2, SEF-MAP achieves state-of-the-art results, surpassing existing methods by 4.2% and 4.8% mAP, respectively.
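The uncertainty-aware gating described above can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the function name, tensor shapes, and the specific variance-to-weight mapping (softmax over negative log-variance) are assumptions consistent with the summary's description of down-weighting unreliable experts per BEV cell.

```python
import numpy as np

def uncertainty_gated_fusion(expert_outputs, expert_variances, eps=1e-6):
    """Fuse per-cell expert outputs, down-weighting high-variance experts.

    expert_outputs:   (E, H, W, C) predictions from E subspace experts
    expert_variances: (E, H, W)    predictive variance per expert and BEV cell
    Returns a fused (H, W, C) feature map. Illustrative sketch only.
    """
    # Low variance -> high reliability score.
    logits = -np.log(expert_variances + eps)              # (E, H, W)
    # Numerically stable softmax over the expert axis, per BEV cell.
    weights = np.exp(logits - logits.max(axis=0))
    weights /= weights.sum(axis=0, keepdims=True)
    # Weighted sum of expert outputs at every cell.
    return (weights[..., None] * expert_outputs).sum(axis=0)
```

With two experts, one confident (variance 0.01) and one unreliable (variance 100), the fused output is dominated by the confident expert's prediction at every cell.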

📝 Abstract
High-definition (HD) maps are essential for autonomous driving, yet multi-modal fusion often suffers from inconsistency between camera and LiDAR modalities, leading to performance degradation under low-light conditions, occlusions, or sparse point clouds. To address this, we propose SEF-MAP, a Subspace-Expert Fusion framework for robust multimodal HD map prediction. The key idea is to explicitly disentangle BEV features into four semantic subspaces: LiDAR-private, Image-private, Shared, and Interaction. Each subspace is assigned a dedicated expert, thereby preserving modality-specific cues while capturing cross-modal consensus. To adaptively combine expert outputs, we introduce an uncertainty-aware gating mechanism at the BEV-cell level, where unreliable experts are down-weighted based on predictive variance, complemented by a usage balance regularizer to prevent expert collapse. To enhance robustness in degraded conditions and promote role specialization, we further propose distribution-aware masking: during training, modality-drop scenarios are simulated using EMA-statistical surrogate features, and a specialization loss enforces distinct behaviors of private, shared, and interaction experts across complete and masked inputs. Experiments on the nuScenes and Argoverse2 benchmarks demonstrate that SEF-MAP achieves state-of-the-art performance, surpassing prior methods by +4.2% and +4.8% mAP, respectively. SEF-MAP provides a robust and effective solution for multi-modal HD map prediction under diverse and degraded conditions.
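The distribution-aware masking idea, simulating a dropped modality with EMA-statistical surrogate features during training, can be sketched as below. This is a minimal sketch under assumptions: the class name, the choice of tracking only an EMA mean, and the per-sample drop probability are illustrative, not details from the paper.

```python
import numpy as np

class EMASurrogate:
    """Track an EMA of a modality's BEV features and substitute that
    surrogate when the modality is dropped during training (sketch)."""

    def __init__(self, feat_shape, momentum=0.99):
        self.momentum = momentum
        self.ema_mean = np.zeros(feat_shape)  # (H, W, C) running statistic

    def update(self, feats):
        # feats: (B, H, W, C) batch of this modality's BEV features.
        batch_mean = feats.mean(axis=0)
        self.ema_mean = (self.momentum * self.ema_mean
                         + (1.0 - self.momentum) * batch_mean)

    def mask(self, feats, drop_prob=0.3, rng=None):
        # Randomly replace whole samples with the EMA surrogate to
        # simulate a missing modality (e.g., LiDAR dropout).
        rng = rng or np.random.default_rng()
        out = feats.copy()
        for b in range(feats.shape[0]):
            if rng.random() < drop_prob:
                out[b] = self.ema_mean
        return out
```

In training, `update` would run on clean features each step, while `mask` feeds the experts a mix of real and surrogate inputs so the private, shared, and interaction experts learn distinct behaviors under degradation.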
Problem

Research questions and friction points this paper is trying to address.

multimodal fusion
HD map prediction
modality inconsistency
robustness
autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Subspace-Decomposed Expert Fusion
Uncertainty-Aware Gating
Distribution-Aware Masking
Multimodal HD Map Prediction
BEV Feature Disentanglement
👥 Authors
Haoxiang Fu (National University of Singapore)
Lingfeng Zhang (PhD student at Tsinghua University; embodied AI)
Hao Li (Independent Researcher)
Ruibing Hu (Chinese University of Hong Kong)
Zhengrong Li (The University of Manchester)
Guanjing Liu (Renmin University of China)
Zimu Tan (Independent Researcher)
Long Chen (Xiaomi EV)
Hangjun Ye (Xiaomi EV)
Xiaoshuai Hao (Beijing Academy of Artificial Intelligence, BAAI; vision and language)