🤖 AI Summary
Pure-vision bird’s-eye-view (BEV) map segmentation lags behind LiDAR-camera (LC) fusion models, and existing knowledge distillation (KD) methods often enhance student performance by enlarging the student architecture—sacrificing inference efficiency.
Method: We propose a lightweight Teacher Assistant (TA) network that bridges the representational gap between an LC teacher and a pure-vision student without modifying the student’s architecture or increasing its inference cost. The TA establishes a shared latent space and introduces a theoretically grounded dual-path distillation loss derived from Young’s inequality, enabling stable and efficient joint transfer of feature-level and output-level knowledge.
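The dual-path loss described above can be illustrated with a minimal NumPy sketch. The function name, the squared-error form, the weighting constant `eps`, and the feature arrays are our own illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def dual_path_distill_loss(f_teacher, f_ta, f_student, eps=1.0):
    """Hypothetical dual-path distillation loss.

    By Young's inequality, (1+eps)*||f_T - f_A||^2 + (1+1/eps)*||f_A - f_S||^2
    upper-bounds the direct discrepancy ||f_T - f_S||^2, so minimizing the
    teacher-TA and TA-student terms also tightens the teacher-student gap.
    """
    path_teacher_ta = np.sum((f_teacher - f_ta) ** 2)   # teacher -> TA path
    path_ta_student = np.sum((f_ta - f_student) ** 2)   # TA -> student path
    return (1.0 + eps) * path_teacher_ta + (1.0 + 1.0 / eps) * path_ta_student
```

Because the two path terms upper-bound the direct term, the student never needs to match the teacher's representation directly, which is what lets its architecture stay unchanged.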
Results: On nuScenes, our method improves the pure-vision baseline by +4.2% mIoU, an improvement up to 45% larger than that of other state-of-the-art KD methods, significantly advancing the practicality of efficient pure-vision BEV perception.
📝 Abstract
Bird's-Eye-View (BEV) map segmentation is one of the most important and challenging tasks in autonomous driving. Camera-only approaches have drawn attention as cost-effective alternatives to LiDAR, but they still fall behind LiDAR-Camera (LC) fusion-based methods. Knowledge Distillation (KD) has been explored to narrow this gap, but existing methods mainly enlarge the student model by mimicking the teacher's architecture, leading to higher inference cost. To address this issue, we introduce BridgeTA, a cost-effective distillation framework to bridge the representation gap between LC fusion and Camera-only models through a Teacher Assistant (TA) network while keeping the student's architecture and inference cost unchanged. A lightweight TA network combines the BEV representations of the teacher and student, creating a shared latent space that serves as an intermediate representation. To ground the framework theoretically, we derive a distillation loss using Young's Inequality, which decomposes the direct teacher-student distillation path into teacher-TA and TA-student dual paths, stabilizing optimization and strengthening knowledge transfer. Extensive experiments on the challenging nuScenes dataset demonstrate the effectiveness of our method, achieving an improvement of 4.2% mIoU over the Camera-only baseline, up to 45% higher than the improvement of other state-of-the-art KD methods.
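The Young's-inequality decomposition mentioned in the abstract can be sketched as follows; the notation ($f_T$, $f_A$, $f_S$ for teacher, TA, and student BEV features, and the weight $\varepsilon$) is our own assumption, and the paper's exact loss may differ. For any $\varepsilon > 0$, Young's inequality $2\langle x, y\rangle \le \varepsilon\|x\|^2 + \varepsilon^{-1}\|y\|^2$ gives

```latex
\|x + y\|^2 \le (1+\varepsilon)\|x\|^2 + (1+\varepsilon^{-1})\|y\|^2 .
```

Setting $x = f_T - f_A$ and $y = f_A - f_S$ yields

```latex
\|f_T - f_S\|^2 \le (1+\varepsilon)\,\|f_T - f_A\|^2 + (1+\varepsilon^{-1})\,\|f_A - f_S\|^2 ,
```

so minimizing the teacher-TA and TA-student path losses upper-bounds the direct teacher-student discrepancy, which is consistent with the claimed stabilization of optimization.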