🤖 AI Summary
This work addresses the performance degradation of multimodal high-definition map prediction in challenging conditions—such as low illumination, occlusion, or sparse point clouds—where inconsistencies between camera and LiDAR modalities impair accuracy. To mitigate this, the authors propose SEF-MAP, a framework that decomposes bird’s-eye-view (BEV) features into four semantic subspaces: LiDAR-private, image-private, shared, and interaction, each processed by a dedicated expert network. An uncertainty-aware gating mechanism at the BEV level adaptively fuses these expert outputs, while a balancing regularizer prevents expert collapse. Additionally, a distribution-aware masking strategy enhances both robustness and expert specialization. Evaluated on nuScenes and Argoverse2, SEF-MAP achieves state-of-the-art results, surpassing existing methods by 4.2% and 4.8% mAP, respectively.
📝 Abstract
High-definition (HD) maps are essential for autonomous driving, yet multi-modal fusion often suffers from inconsistency between camera and LiDAR modalities, leading to performance degradation under low-light conditions, occlusions, or sparse point clouds. To address this, we propose SEF-MAP, a Subspace-Expert Fusion framework for robust multimodal HD map prediction. The key idea is to explicitly disentangle BEV features into four semantic subspaces: LiDAR-private, image-private, shared, and interaction. Each subspace is assigned a dedicated expert, thereby preserving modality-specific cues while capturing cross-modal consensus. To adaptively combine expert outputs, we introduce an uncertainty-aware gating mechanism at the BEV-cell level, where unreliable experts are down-weighted based on predictive variance, complemented by a usage-balance regularizer that prevents expert collapse. To enhance robustness under degraded conditions and promote role specialization, we further propose distribution-aware masking: during training, modality-drop scenarios are simulated using surrogate features derived from EMA statistics, and a specialization loss enforces distinct behaviors of the private, shared, and interaction experts across complete and masked inputs. Experiments on the nuScenes and Argoverse2 benchmarks demonstrate that SEF-MAP achieves state-of-the-art performance, surpassing prior methods by +4.2% and +4.8% mAP, respectively. SEF-MAP provides a robust and effective solution for multi-modal HD map prediction under diverse and degraded conditions.
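The abstract's uncertainty-aware gating and usage-balance regularizer can be sketched in a few lines. The shapes, variable names, and the exact form of the variance penalty below are illustrative assumptions, not the paper's implementation: per BEV cell, each of the four experts emits a feature vector and a predictive log-variance, gate logits are penalized by that log-variance before a per-cell softmax, and mean expert usage is regularized toward uniform.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: E=4 experts (LiDAR-private, image-private, shared,
# interaction), each producing a C-channel feature per BEV cell, plus a
# per-cell predictive log-variance. All names here are assumptions.
E, H, W, C = 4, 8, 8, 16
expert_feats = rng.normal(size=(E, H, W, C))
gate_logits = rng.normal(size=(E, H, W))   # raw per-cell gating scores
log_var = rng.normal(size=(E, H, W))       # per-expert predictive log-variance

# Down-weight unreliable experts: subtract the log-variance from the gate
# logits so high-variance (uncertain) experts get lower softmax weight.
adj = gate_logits - log_var
weights = np.exp(adj - adj.max(axis=0, keepdims=True))
weights /= weights.sum(axis=0, keepdims=True)  # per-cell softmax over experts

# Fuse: weighted sum of expert features at every BEV cell.
fused = (weights[..., None] * expert_feats).sum(axis=0)  # shape (H, W, C)

# Usage-balance regularizer: penalize deviation of the mean per-expert
# load from uniform (1/E), discouraging expert collapse.
usage = weights.mean(axis=(1, 2))          # average load per expert
balance_loss = ((usage - 1.0 / E) ** 2).sum()
```

In a trained model the gate logits and log-variances would come from small prediction heads on the BEV features; subtracting log-variance inside the softmax is one simple way to realize "unreliable experts are down-weighted based on predictive variance."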