RMT-PPAD: Real-time Multi-task Learning for Panoptic Perception in Autonomous Driving

📅 2025-08-02
🤖 AI Summary
To address the demand for holistic perception in autonomous driving, this paper proposes a lightweight, real-time multi-task Transformer model that jointly performs object detection, drivable area segmentation, and lane line segmentation. To mitigate feature coupling across tasks, the authors introduce a gated adaptation module that adaptively fuses shared and task-specific features, and design an adaptive multi-scale segmentation decoder with dynamic feature weighting. A structured lane line loss is further incorporated to resolve the training-inference inconsistency in lane line labels. Evaluated on BDD100K, the model achieves 84.9% mAP₅₀ for detection, 92.6% mIoU for drivable area segmentation, and 56.8% IoU for lane line segmentation, while running at 32.6 FPS on standard hardware. Extensive experiments demonstrate strong robustness in real-world driving scenarios.

📝 Abstract
Autonomous driving systems rely on panoptic driving perception that requires both precision and real-time performance. In this work, we propose RMT-PPAD, a real-time, transformer-based multi-task model that jointly performs object detection, drivable area segmentation, and lane line segmentation. We introduce a lightweight gate-controlled adapter module that adaptively fuses shared and task-specific features, effectively alleviating negative transfer between tasks. Additionally, we design an adaptive segmentation decoder that automatically learns weights over multi-scale features during training, avoiding manually designed task-specific structures for the different segmentation tasks. We also identify and resolve an inconsistency between training and testing labels in lane line segmentation, enabling fairer evaluation. Experiments on the BDD100K dataset demonstrate that RMT-PPAD achieves state-of-the-art results: mAP50 of 84.9% and recall of 95.4% for object detection, mIoU of 92.6% for drivable area segmentation, and IoU of 56.8% and accuracy of 84.7% for lane line segmentation, with an inference speed of 32.6 FPS. Moreover, we evaluate RMT-PPAD in real-world scenarios, where it consistently delivers stable performance. The source code and pre-trained models are released at https://github.com/JiayuanWang-JW/RMT-PPAD.
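The gate-controlled adapter described in the abstract can be pictured as a lightweight bottleneck that produces task-specific features, with a learned per-channel gate deciding how much of that signal to blend back into the shared backbone features. The sketch below is a hypothetical illustration, not the released implementation; the class name `GatedAdapter`, the bottleneck ratio, and the gating layout are all assumptions.

```python
import torch
import torch.nn as nn


class GatedAdapter(nn.Module):
    """Hypothetical sketch of a gate-controlled adapter that adaptively
    fuses shared backbone features with task-specific features."""

    def __init__(self, channels: int):
        super().__init__()
        # lightweight bottleneck adapter producing task-specific features
        self.adapter = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
        )
        # gate predicts per-channel mixing weights in [0, 1]
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, shared: torch.Tensor) -> torch.Tensor:
        specific = self.adapter(shared)
        g = self.gate(shared)
        # adaptive fusion: the gate controls how much task-specific
        # signal is injected, limiting negative transfer between tasks
        return g * specific + (1 - g) * shared


x = torch.randn(2, 64, 32, 32)   # shared backbone feature map
out = GatedAdapter(64)(x)        # same shape as the input
```

In this reading, a gate near zero leaves the shared features untouched, so each task head can opt out of task-specific adaptation where it would hurt.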
Problem

Research questions and friction points this paper is trying to address.

Real-time panoptic perception for autonomous driving
Multi-task learning for object detection and segmentation
Addressing negative transfer between tasks adaptively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based multi-task model for panoptic perception
Lightweight gate control with adaptive feature fusion
Adaptive segmentation decoder for multi-scale features
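The adaptive segmentation decoder is described as learning weights over multi-scale features automatically during training, rather than hand-designing a fusion structure per task. A minimal way to realize that idea is a learnable softmax-weighted sum of upsampled scales; the sketch below is an assumption about the mechanism, and the name `AdaptiveMultiScaleFusion` is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveMultiScaleFusion(nn.Module):
    """Hypothetical sketch: softmax-normalized learnable weights over
    multi-scale feature maps, trained jointly with the network."""

    def __init__(self, num_scales: int):
        super().__init__()
        # one logit per scale; softmax keeps the weights positive
        self.logits = nn.Parameter(torch.zeros(num_scales))

    def forward(self, feats):
        # resize every scale to the finest (first) resolution
        target = feats[0].shape[-2:]
        feats = [
            F.interpolate(f, size=target, mode="bilinear", align_corners=False)
            for f in feats
        ]
        w = torch.softmax(self.logits, dim=0)
        # weighted sum replaces a hand-designed fusion structure
        return sum(wi * fi for wi, fi in zip(w, feats))


# three feature maps at decreasing resolutions, as from an FPN-style neck
feats = [
    torch.randn(1, 32, 64, 64),
    torch.randn(1, 32, 32, 32),
    torch.randn(1, 32, 16, 16),
]
fused = AdaptiveMultiScaleFusion(num_scales=3)(feats)
```

Because the weights are ordinary parameters, gradient descent settles on a different scale mixture for drivable area versus lane line segmentation without any manual tuning.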
Jiayuan Wang
University of Windsor
Multi-task Learning · Medical Imaging · Autonomous Driving · Connected Vehicles

Q. M. Jonathan Wu
Department of Electrical and Computer Engineering, University of Windsor, Windsor, ON N9B 3P4, Canada

Katsuya Suto
Graduate School of Information Science and Technology, Hokkaido University

Ning Zhang
Department of Electrical and Computer Engineering, University of Windsor, Windsor, ON N9B 3P4, Canada