🤖 AI Summary
To address the high parameter count and computational overhead that multi-task autonomous driving panoptic perception models (joint object detection, drivable area segmentation, and lane line segmentation) impose on onboard deployment, this paper proposes a lightweight framework integrating task-aware safe pruning and head-free feature distillation. We introduce a gradient conflict penalty to guide Taylor-expansion-based channel importance estimation, enabling precise retention of critical channels, and apply task-agnostic intermediate-layer feature distillation to mitigate the performance degradation induced by pruning. Evaluated on BDD100K, our method reduces model parameters by 32.7% with negligible accuracy loss in drivable area and lane line segmentation and only a marginal drop in detection mAP, while running at 32.7 FPS, satisfying real-time deployment requirements.
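The summary describes channel scoring as Taylor-expansion importance modulated by a gradient conflict penalty. As a minimal, hedged sketch of that idea (the function names `channel_score` and `lam`, the treatment of channels as flat vectors, and the exact penalty form are assumptions for illustration, not the paper's implementation), one might score a channel by its first-order Taylor saliency averaged over tasks, then down-weight channels whose per-task gradients point in opposing directions:

```python
from itertools import combinations

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cosine(u, v):
    # Cosine similarity; 0.0 for degenerate (zero-norm) vectors.
    nu, nv = dot(u, u) ** 0.5, dot(v, v) ** 0.5
    return dot(u, v) / (nu * nv) if nu > 0 and nv > 0 else 0.0

def channel_score(act, task_grads, lam=0.5):
    """Toy channel-importance score (illustrative only).

    act        -- channel activations flattened to a list of floats
    task_grads -- one gradient vector per task, same length as act
    lam        -- assumed weight of the gradient-conflict penalty
    """
    # First-order Taylor saliency |sum_i a_i * g_i|, averaged over tasks.
    taylor = sum(abs(dot(act, g)) for g in task_grads) / len(task_grads)
    # Conflict penalty: mean negative pairwise cosine between task gradients,
    # so channels where tasks pull in opposite directions score lower
    # and are pruned first.
    pairs = list(combinations(task_grads, 2))
    conflict = sum(max(0.0, -cosine(u, v)) for u, v in pairs) / len(pairs)
    return taylor * (1.0 - lam * conflict)
```

Channels would then be ranked by this score and the lowest-scoring fraction removed; a channel whose detection and segmentation gradients are exactly opposed receives half the score (at `lam=0.5`) of an otherwise identical conflict-free channel.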
📝 Abstract
Autonomous driving systems rely on panoptic perception to jointly handle object detection, drivable area segmentation, and lane line segmentation. Although multi-task learning is an effective way to integrate these tasks, the resulting growth in model parameters and complexity makes deployment on on-board devices difficult. To address this challenge, we propose a multi-task model compression framework that combines task-aware safe pruning with feature-level knowledge distillation. Our safe pruning strategy integrates Taylor-based channel importance with a gradient conflict penalty to retain important channels while removing redundant and conflicting ones. To mitigate performance degradation after pruning, we further design a task-head-agnostic distillation method that transfers intermediate backbone and encoder features from a teacher to a student model as guidance. Experiments on the BDD100K dataset demonstrate that our compressed model achieves a 32.7% reduction in parameters, with negligible accuracy loss in segmentation and only a minor decrease in detection (-1.2% Recall and -1.8% mAP50) compared to the teacher, while still running in real time at 32.7 FPS. These results show that combining pruning and knowledge distillation provides an effective compression solution for multi-task panoptic perception.
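The abstract's task-head-agnostic distillation matches intermediate backbone and encoder features between teacher and student rather than task-head outputs. A minimal sketch of such a feature-matching loss, assuming features are already projected to the same shape and flattened to lists (the function name and the plain L2 formulation are illustrative assumptions, not the paper's exact loss):

```python
def feature_distill_loss(student_feats, teacher_feats):
    """Toy head-agnostic distillation loss (illustrative only).

    student_feats, teacher_feats -- lists of per-layer feature vectors
    (one flat list of floats per distilled backbone/encoder layer),
    assumed to have matching shapes.

    Returns the mean squared error between corresponding features,
    averaged over layers; zero iff the student matches the teacher.
    """
    total = 0.0
    for s, t in zip(student_feats, teacher_feats):
        total += sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
    return total / len(student_feats)
```

In training, this term would be added to the pruned student's multi-task loss so the teacher's intermediate representations guide recovery of the accuracy lost to pruning, independently of any particular task head.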