JointDistill: Adaptive Multi-Task Distillation for Joint Depth Estimation and Scene Segmentation

📅 2025-05-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address model redundancy and excessive storage and training overhead when jointly deploying depth estimation and scene segmentation in intelligent transportation systems, this paper proposes a unified modeling framework based on adaptive multi-teacher knowledge distillation. Methodologically: (1) an adaptive distillation mechanism is designed to dynamically weight and fuse task-specific knowledge from multiple teachers; (2) knowledge trajectory modeling and a trajectory distillation loss are introduced to mitigate gradient conflict and catastrophic forgetting. The approach integrates multi-task learning, gradient direction optimization, and trajectory-based memory modeling. Evaluated on Cityscapes and NYU-v2, the method significantly outperforms state-of-the-art approaches: it achieves higher accuracy while reducing model parameters by 32%, training memory consumption by 27%, and inference latency by 21%.
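The adaptive weighting of multiple teachers described above can be sketched roughly as follows. This is an illustrative stand-in, not the paper's actual formulation: the idea is that a teacher on which the student still lags (higher per-teacher distillation loss) receives a larger weight, and the fused loss is the weighted sum. The softmax-over-loss-gaps scheme and the function names are assumptions.

```python
import numpy as np

def adaptive_teacher_weights(student_losses, temperature=1.0):
    """Weight each teacher by how much the student still lags behind it.

    student_losses[i] is the student's current distillation loss against
    teacher i; a larger loss yields a larger weight. This softmax scheme is
    one plausible reading of the paper's adaptive mechanism, not its exact
    method.
    """
    gaps = np.asarray(student_losses, dtype=float) / temperature
    exp = np.exp(gaps - gaps.max())  # numerically stable softmax
    return exp / exp.sum()

def fused_distillation_loss(per_teacher_losses, weights):
    """Weighted sum of per-teacher distillation losses."""
    return float(np.dot(weights, np.asarray(per_teacher_losses, dtype=float)))

# Example: the student lags more on teacher 0 than on teacher 1,
# so teacher 0 receives the larger weight.
losses = [0.8, 0.3]
w = adaptive_teacher_weights(losses)
total = fused_distillation_loss(losses, w)
```

The temperature parameter controls how sharply the weighting concentrates on the hardest teacher; a large temperature approaches uniform (static) weighting.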

📝 Abstract
Depth estimation and scene segmentation are two important tasks in intelligent transportation systems. Jointly modeling these two tasks reduces both storage and training requirements. This work explores how multi-task distillation can improve such unified modeling. While existing solutions transfer multiple teachers' knowledge in a static way, we propose a self-adaptive distillation method that dynamically adjusts the amount of knowledge taken from each teacher according to the student's current learning ability. Furthermore, with multiple teachers present, the student's gradient update direction during distillation is more prone to error, and knowledge forgetting may occur. To avoid this, we propose a knowledge trajectory that records the most essential information a model has learnt in the past, based on which a trajectory-based distillation loss is designed to guide the student along a similar learning curve in a cost-effective way. We evaluate our method on multiple benchmark datasets including Cityscapes and NYU-v2. Compared to state-of-the-art solutions, our method achieves a clear improvement. The code is provided in the supplementary materials.
Problem

Research questions and friction points this paper is trying to address.

Adaptive distillation for joint depth and segmentation tasks
Dynamic teacher knowledge transfer based on student ability
Trajectory-based loss to prevent knowledge forgetting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-adaptive distillation adjusts teacher knowledge dynamically
Knowledge trajectory records essential past learning information
Trajectory-based distillation loss guides student learning effectively
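The trajectory idea in the bullets above can be sketched as follows. The paper's actual trajectory representation is not specified on this page, so this sketch makes an assumption: each snapshot is a compact feature summary (a mean feature vector over a probe batch) recorded periodically during training, and the trajectory loss penalizes the student for drifting from the teacher's recorded learning path. All class and function names here are hypothetical.

```python
import numpy as np

class KnowledgeTrajectory:
    """Record compact snapshots of what a model has learnt over training.

    Each snapshot is the mean feature vector over a probe batch; this is an
    illustrative stand-in for the paper's (unspecified) trajectory encoding.
    """
    def __init__(self):
        self.snapshots = []

    def record(self, features):
        # features: array of shape (batch, dim)
        self.snapshots.append(np.asarray(features, dtype=float).mean(axis=0))

def trajectory_distillation_loss(student_traj, teacher_traj):
    """Mean squared distance between aligned trajectory snapshots."""
    n = min(len(student_traj.snapshots), len(teacher_traj.snapshots))
    if n == 0:
        return 0.0
    diffs = [np.mean((student_traj.snapshots[i] - teacher_traj.snapshots[i]) ** 2)
             for i in range(n)]
    return float(np.mean(diffs))

# Usage: record both models' snapshots at the same checkpoints, then
# add the trajectory loss to the student's training objective.
student, teacher = KnowledgeTrajectory(), KnowledgeTrajectory()
probe = np.ones((4, 3))
student.record(probe)
teacher.record(probe)
loss = trajectory_distillation_loss(student, teacher)
```

Because each snapshot is a single vector rather than full model weights, comparing trajectories stays cheap, which matches the paper's stated goal of guiding the student "in a cost-effective way".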
Tiancong Cheng
Northwestern Polytechnical University
Ying Zhang
Northwestern Polytechnical University
Yuxuan Liang
Assistant Professor, Hong Kong University of Science and Technology (Guangzhou)
Spatio-Temporal Data Mining, Urban Computing, Urban AI, Foundation Models, Time Series
Roger Zimmermann
Professor of Computer Science, National University of Singapore
Multimedia, Multimedia Systems, Streaming Media, Geospatial Analytics, Machine Learning Applications
Zhiwen Yu
Northwestern Polytechnical University, Harbin Engineering University
Bin Guo
Northwestern Polytechnical University