Learning Ego-Centric BEV Representations from a Perspective-Privileged View: Cross-View Supervision for Online HD Map Construction

📅 2026-05-12

🤖 AI Summary
This work addresses a limitation of existing multi-camera bird’s-eye-view (BEV) methods for online HD map construction: they rely solely on ego-centric supervision and struggle to reconstruct large-scale map structure reliably under occlusion, sparse observations, and perspective distortion. The authors propose Cross-View Supervision (CVS), a training paradigm in which an ego-aligned overhead view serves as a perspective-privileged teacher. Feature-level knowledge distillation aligns teacher and student representations in a shared BEV feature space, injecting geometric and topological priors into the ego-centric BEV encoder and improving structural consistency. The approach leaves the inference architecture unchanged and needs no overhead input or additional sensors at test time. On the nuScenes benchmark, the method outperforms StreamMapNet by 3.9 mAP in the 60×30 m region and by 9.9 mAP in the extended 100×50 m region, a 44% relative improvement at long range.
📝 Abstract
Bird's-eye-view (BEV) representations derived from multi-camera input have become a central interface for online high-definition (HD) map construction. However, most approaches rely solely on ego-centric supervision, requiring large-scale scene structure to be inferred from incomplete observations, occlusions, and diminishing information density at long range, where perspective effects and spatial sparsity hinder consistent structural reasoning. We introduce Cross-View Supervision (CVS), a representation learning paradigm that transfers geometric and topological priors from an ego-aligned overhead perspective into camera-based BEV encoders. Rather than adding auxiliary semantic losses, CVS aligns representations in a shared BEV feature space and distills globally consistent structural knowledge from a perspective-privileged teacher into the ego-centric backbone. This supervision enhances structural coherence without modifying the inference architecture or requiring overhead input at test time. Experiments on nuScenes using ego-aligned aerial imagery from the AID4AD cross-view extension demonstrate consistent improvements over StreamMapNet while maintaining identical camera-only inference. CVS yields +3.9 mAP in the standard 60×30 m region and +9.9 mAP in the extended 100×50 m setting, corresponding to a 44% relative gain at long range. These results highlight perspective-privileged structural supervision as a promising training principle for improving BEV representation learning in HD map construction.
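The feature-level alignment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the loss, so the normalized squared-distance objective, the optional validity mask, and every name here (`feature_distillation_loss`, `total_training_loss`, `weight`) are assumptions chosen for illustration. Features are plain Python lists, one channel vector per BEV grid cell; a practical version would use a tensor library and keep the teacher frozen.

```python
import math
from typing import List, Optional, Sequence

Cell = Sequence[float]  # channel vector of a single BEV grid cell


def l2_normalize(vec: Cell, eps: float = 1e-8) -> List[float]:
    """Scale a channel vector to (approximately) unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def feature_distillation_loss(
    student: Sequence[Cell],
    teacher: Sequence[Cell],
    valid: Optional[Sequence[bool]] = None,
) -> float:
    """Mean squared distance between normalized student and teacher cells.

    Both inputs are assumed to live on the same ego-aligned BEV grid,
    flattened to a list of cells. `valid` is a hypothetical mask that
    skips cells without usable overhead coverage.
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher must share one BEV grid")
    total, count = 0.0, 0
    for i, (s_vec, t_vec) in enumerate(zip(student, teacher)):
        if valid is not None and not valid[i]:
            continue
        s = l2_normalize(s_vec)
        t = l2_normalize(t_vec)
        total += sum((a - b) ** 2 for a, b in zip(s, t))
        count += 1
    return total / count if count else 0.0


def total_training_loss(task_loss: float, distill_loss: float,
                        weight: float = 1.0) -> float:
    """Combine the map-construction loss with the distillation term.

    The weighting scheme is an assumption; it applies during training only.
    """
    return task_loss + weight * distill_loss


# Toy example: two BEV cells with two channels each.
student_bev = [[1.0, 0.0], [0.0, 2.0]]
teacher_bev = [[1.0, 0.0], [0.0, -2.0]]
loss = feature_distillation_loss(student_bev, teacher_bev)
```

Because the distillation term only enters the training objective, dropping it at test time leaves the camera-only inference path untouched, which matches the abstract's claim that no overhead input is needed at inference.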
Problem

Research questions and friction points this paper is trying to address.

BEV representation
online HD map construction
ego-centric supervision
structural reasoning
perspective-privileged view
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-View Supervision
BEV representation learning
HD map construction
perspective-privileged teacher
structural distillation