KD360-VoxelBEV: LiDAR and 360-degree Camera Cross Modality Knowledge Distillation for Bird's-Eye-View Segmentation

📅 2025-12-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the limited performance of single-360°-camera BEV segmentation, this paper proposes the first cross-modal knowledge distillation framework that transfers spatial and semantic knowledge from a LiDAR–360°-camera fusion teacher model to a lightweight camera-only student model. Methodologically, we introduce a novel LiDAR multi-channel (range/intensity/ambient) image fusion representation and a voxel-aligned view transformer to enable joint feature-level and output-level distillation. Key contributions include: (1) the first cross-modal distillation paradigm tailored for single-360°-camera BEV segmentation; (2) generalizability across diverse multi-sensor configurations; (3) on Dur360BEV, the teacher achieves a +25.6% IoU gain, while the student attains an +8.5% IoU improvement and runs at 31.2 FPS; and (4) validated cross-sensor generalization on KITTI-360.

πŸ“ Abstract
We present the first cross-modality distillation framework specifically tailored for single-panoramic-camera Bird's-Eye-View (BEV) segmentation. Our approach leverages a novel LiDAR image representation fused from range, intensity and ambient channels, together with a voxel-aligned view transformer that preserves spatial fidelity while enabling efficient BEV processing. During training, a high-capacity LiDAR–camera fusion Teacher network extracts rich spatial and semantic features for cross-modality knowledge distillation into a lightweight Student network that relies solely on a single 360-degree panoramic camera image. Extensive experiments on the Dur360BEV dataset demonstrate that our teacher model significantly outperforms existing camera-based BEV segmentation methods, achieving a 25.6% IoU improvement. Meanwhile, the distilled Student network attains competitive performance with an 8.5% IoU gain and a state-of-the-art inference speed of 31.2 FPS. Moreover, evaluations on KITTI-360 (two fisheye cameras) confirm that our distillation framework generalises to diverse camera setups, underscoring its feasibility and robustness. This approach reduces sensor complexity and deployment costs while providing a practical solution for efficient, low-cost BEV segmentation in real-world autonomous driving.
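The joint feature-level and output-level distillation described above can be sketched as a two-term loss: an MSE term that matches the student's BEV features to the teacher's, plus a KL-divergence term on temperature-softened segmentation logits. This is a minimal NumPy illustration, not the paper's implementation; the loss weights, temperature, and tensor shapes are assumptions.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(t_feat, s_feat, t_logits, s_logits, alpha=0.5, tau=2.0):
    """Combine feature-level (MSE on BEV feature maps) and output-level
    (KL divergence on softened per-cell class distributions) terms.
    alpha and tau are hypothetical hyperparameters, not from the paper."""
    # Feature-level: pull the student's BEV features toward the teacher's.
    feat_loss = np.mean((t_feat - s_feat) ** 2)
    # Output-level: KL(teacher || student) on temperature-softened outputs.
    p_t = softmax(t_logits / tau)
    p_s = softmax(s_logits / tau)
    kl = np.sum(p_t * (np.log(p_t + 1e-8) - np.log(p_s + 1e-8)), axis=-1)
    out_loss = (tau ** 2) * np.mean(kl)
    return alpha * feat_loss + (1 - alpha) * out_loss

# Toy BEV grids: 4x4 cells, 8-dim features, 3 semantic classes.
rng = np.random.default_rng(0)
t_feat, s_feat = rng.normal(size=(4, 4, 8)), rng.normal(size=(4, 4, 8))
t_logits, s_logits = rng.normal(size=(4, 4, 3)), rng.normal(size=(4, 4, 3))
loss = distillation_loss(t_feat, s_feat, t_logits, s_logits)
print(float(loss))
```

In practice a framework like this would backpropagate only through the student's tensors, with the teacher frozen after its own supervised training.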
Problem

Research questions and friction points this paper is trying to address.

Develops cross-modality distillation for single-panoramic-camera BEV segmentation
Enhances lightweight student network performance using LiDAR-camera teacher features
Reduces sensor cost and complexity for efficient autonomous driving systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-modality distillation for single-panoramic-camera BEV segmentation
Voxel-aligned view transformer preserves spatial fidelity
LiDAR image representation fused from range, intensity, ambient channels
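The fused LiDAR image representation listed above amounts to stacking three per-pixel channels of a spherically projected scan into a single image the view transformer can consume. A minimal sketch follows; the per-channel min-max normalisation and the 64x1024 projection size are assumptions for illustration, not details from the paper.

```python
import numpy as np

def lidar_multichannel_image(range_img, intensity_img, ambient_img):
    """Fuse range/intensity/ambient projections into one HxWx3 image,
    normalising each channel to [0, 1] independently (assumed scheme)."""
    def norm(ch):
        lo, hi = ch.min(), ch.max()
        return (ch - lo) / (hi - lo) if hi > lo else np.zeros_like(ch)
    return np.stack(
        [norm(range_img), norm(intensity_img), norm(ambient_img)], axis=-1
    )

# Toy 64x1024 projections, e.g. a 64-beam spinning LiDAR unrolled to an image.
h, w = 64, 1024
rng = np.random.default_rng(1)
img = lidar_multichannel_image(
    rng.uniform(0, 80, (h, w)),   # range in metres
    rng.uniform(0, 1, (h, w)),    # return intensity
    rng.uniform(0, 1, (h, w)),    # ambient (near-infrared) signal
)
print(img.shape)  # (64, 1024, 3)
```

Treating the scan as a 3-channel image lets the teacher reuse standard 2D image backbones on LiDAR data before features are lifted into the BEV grid.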