🤖 AI Summary
This work addresses the performance degradation of multimodal high-definition map prediction in challenging conditions—such as low illumination, occlusion, or sparse point clouds—where inconsistencies between camera and LiDAR modalities impair accuracy. To mitigate this, the authors propose SEF-MAP, a framework that decomposes bird’s-eye-view (BEV) features into four semantic subspaces: LiDAR-private, image-private, shared, and interaction, each processed by a dedicated expert network. An uncertainty-aware gating mechanism at the BEV level adaptively fuses these expert outputs, while a balancing regularizer prevents expert collapse. Additionally, a distribution-aware masking strategy enhances both robustness and expert specialization. Evaluated on nuScenes and Argoverse2, SEF-MAP achieves state-of-the-art results, surpassing existing methods by 4.2% and 4.8% mAP, respectively.
📝 Abstract
High-definition (HD) maps are essential for autonomous driving, yet multi-modal fusion often suffers from inconsistency between camera and LiDAR modalities, leading to performance degradation under low-light conditions, occlusions, or sparse point clouds. To address this, we propose SEF-MAP, a Subspace-Expert Fusion framework for robust multimodal HD map prediction. The key idea is to explicitly disentangle BEV features into four semantic subspaces: LiDAR-private, image-private, shared, and interaction. Each subspace is assigned a dedicated expert, thereby preserving modality-specific cues while capturing cross-modal consensus. To adaptively combine expert outputs, we introduce an uncertainty-aware gating mechanism at the BEV-cell level, where unreliable experts are down-weighted based on predictive variance, complemented by a usage-balance regularizer that prevents expert collapse. To enhance robustness under degraded conditions and promote role specialization, we further propose distribution-aware masking: during training, modality-drop scenarios are simulated using surrogate features derived from EMA statistics, and a specialization loss enforces distinct behaviors of the private, shared, and interaction experts across complete and masked inputs. Experiments on the nuScenes and Argoverse2 benchmarks demonstrate that SEF-MAP achieves state-of-the-art performance, surpassing prior methods by +4.2% and +4.8% mAP, respectively. SEF-MAP provides a robust and effective solution for multi-modal HD map prediction under diverse and degraded conditions.
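The abstract's uncertainty-aware gating and usage-balance regularizer can be sketched in a few lines. The shapes, variable names, and the exact form of the variance penalty below are illustrative assumptions, not the paper's implementation: per BEV cell, each of the four experts emits a feature vector and a predictive log-variance, gate logits are penalized by that log-variance before a per-cell softmax, and mean expert usage is regularized toward uniform.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: E=4 experts (LiDAR-private, image-private, shared,
# interaction), each producing a C-channel feature per BEV cell, plus a
# per-cell predictive log-variance. All names here are assumptions.
E, H, W, C = 4, 8, 8, 16
expert_feats = rng.normal(size=(E, H, W, C))
gate_logits = rng.normal(size=(E, H, W))   # raw per-cell gating scores
log_var = rng.normal(size=(E, H, W))       # per-expert predictive log-variance

# Down-weight unreliable experts: subtract the log-variance from the gate
# logits so high-variance (uncertain) experts get lower softmax weight.
adj = gate_logits - log_var
weights = np.exp(adj - adj.max(axis=0, keepdims=True))
weights /= weights.sum(axis=0, keepdims=True)  # per-cell softmax over experts

# Fuse: weighted sum of expert features at every BEV cell.
fused = (weights[..., None] * expert_feats).sum(axis=0)  # shape (H, W, C)

# Usage-balance regularizer: penalize deviation of the mean per-expert
# load from uniform (1/E), discouraging expert collapse.
usage = weights.mean(axis=(1, 2))          # average load per expert
balance_loss = ((usage - 1.0 / E) ** 2).sum()
```

In a trained model the gate logits and log-variances would come from small prediction heads on the BEV features; subtracting log-variance inside the softmax is one simple way to realize "unreliable experts are down-weighted based on predictive variance."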