JointDistill: Adaptive Multi-Task Distillation for Joint Depth Estimation and Scene Segmentation

📅 2025-05-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address model redundancy and excessive storage and training overhead when jointly deploying depth estimation and scene segmentation in intelligent transportation systems, this paper proposes a unified modeling framework based on adaptive multi-teacher knowledge distillation. Methodologically: (1) an adaptive distillation mechanism is designed to dynamically weight and fuse task-specific knowledge from multiple teachers; (2) knowledge trajectory modeling and a trajectory distillation loss are introduced to mitigate gradient conflict and catastrophic forgetting. The approach integrates multi-task learning, gradient direction optimization, and trajectory-based memory modeling. Evaluated on Cityscapes and NYU-v2, the method significantly outperforms state-of-the-art approaches: it achieves higher accuracy while reducing model parameters by 32%, training memory consumption by 27%, and inference latency by 21%.
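The adaptive weighting of multiple teachers described above can be sketched roughly as follows. This is an illustrative stand-in, not the paper's actual formulation: the idea is that a teacher on which the student still lags (higher per-teacher distillation loss) receives a larger weight, and the fused loss is the weighted sum. The softmax-over-loss-gaps scheme and the function names are assumptions.

```python
import numpy as np

def adaptive_teacher_weights(student_losses, temperature=1.0):
    """Weight each teacher by how much the student still lags behind it.

    student_losses[i] is the student's current distillation loss against
    teacher i; a larger loss yields a larger weight. This softmax scheme is
    one plausible reading of the paper's adaptive mechanism, not its exact
    method.
    """
    gaps = np.asarray(student_losses, dtype=float) / temperature
    exp = np.exp(gaps - gaps.max())  # numerically stable softmax
    return exp / exp.sum()

def fused_distillation_loss(per_teacher_losses, weights):
    """Weighted sum of per-teacher distillation losses."""
    return float(np.dot(weights, np.asarray(per_teacher_losses, dtype=float)))

# Example: the student lags more on teacher 0 than on teacher 1,
# so teacher 0 receives the larger weight.
losses = [0.8, 0.3]
w = adaptive_teacher_weights(losses)
total = fused_distillation_loss(losses, w)
```

The temperature parameter controls how sharply the weighting concentrates on the hardest teacher; a large temperature approaches uniform (static) weighting.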

📝 Abstract
Depth estimation and scene segmentation are two important tasks in intelligent transportation systems. Jointly modeling these two tasks reduces both storage and training requirements. This work explores how multi-task distillation can improve such unified modeling. While existing solutions transfer multiple teachers' knowledge in a static way, we propose a self-adaptive distillation method that dynamically adjusts the amount of knowledge taken from each teacher according to the student's current learning ability. Furthermore, with multiple teachers present, the student's gradient update direction during distillation is more prone to error, and knowledge forgetting may occur. To avoid this, we propose a knowledge trajectory that records the most essential information a model has learnt in the past, based on which a trajectory-based distillation loss is designed to guide the student along a similar learning curve in a cost-effective way. We evaluate our method on multiple benchmark datasets including Cityscapes and NYU-v2. Compared to state-of-the-art solutions, our method achieves a clear improvement. The code is provided in the supplementary materials.
Problem

Research questions and friction points this paper is trying to address.

Adaptive distillation for joint depth and segmentation tasks
Dynamic teacher knowledge transfer based on student ability
Trajectory-based loss to prevent knowledge forgetting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-adaptive distillation adjusts teacher knowledge dynamically
Knowledge trajectory records essential past learning information
Trajectory-based distillation loss guides student learning effectively
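The trajectory idea in the bullets above can be sketched as follows. The paper's actual trajectory representation is not specified on this page, so this sketch makes an assumption: each snapshot is a compact feature summary (a mean feature vector over a probe batch) recorded periodically during training, and the trajectory loss penalizes the student for drifting from the teacher's recorded learning path. All class and function names here are hypothetical.

```python
import numpy as np

class KnowledgeTrajectory:
    """Record compact snapshots of what a model has learnt over training.

    Each snapshot is the mean feature vector over a probe batch; this is an
    illustrative stand-in for the paper's (unspecified) trajectory encoding.
    """
    def __init__(self):
        self.snapshots = []

    def record(self, features):
        # features: array of shape (batch, dim)
        self.snapshots.append(np.asarray(features, dtype=float).mean(axis=0))

def trajectory_distillation_loss(student_traj, teacher_traj):
    """Mean squared distance between aligned trajectory snapshots."""
    n = min(len(student_traj.snapshots), len(teacher_traj.snapshots))
    if n == 0:
        return 0.0
    diffs = [np.mean((student_traj.snapshots[i] - teacher_traj.snapshots[i]) ** 2)
             for i in range(n)]
    return float(np.mean(diffs))

# Usage: record both models' snapshots at the same checkpoints, then
# add the trajectory loss to the student's training objective.
student, teacher = KnowledgeTrajectory(), KnowledgeTrajectory()
probe = np.ones((4, 3))
student.record(probe)
teacher.record(probe)
loss = trajectory_distillation_loss(student, teacher)
```

Because each snapshot is a single vector rather than full model weights, comparing trajectories stays cheap, which matches the paper's stated goal of guiding the student "in a cost-effective way".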
Tiancong Cheng
Northwestern Polytechnical University
Ying Zhang
Northwestern Polytechnical University
Yuxuan Liang
Assistant Professor, Hong Kong University of Science and Technology (Guangzhou)
Spatio-Temporal Data Mining, Urban Computing, Urban AI, Foundation Models, Time Series
Roger Zimmermann
Professor of Computer Science, National University of Singapore
Multimedia, Multimedia Systems, Streaming Media, Geospatial Analytics, Machine Learning Applications
Zhiwen Yu
Northwestern Polytechnical University, Harbin Engineering University
Bin Guo
Northwestern Polytechnical University