HAD: Heterogeneity-Aware Distillation for Lifelong Heterogeneous Learning

📅 2026-03-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses catastrophic forgetting across heterogeneous dense prediction tasks, i.e., tasks with differing output structures, in lifelong learning scenarios. It formally introduces the "lifelong heterogeneous learning" setting and proposes a heterogeneity-aware self-distillation approach that requires no sample replay. The method employs a dual-branch distillation loss: one branch preserves the global output distribution to maintain task-agnostic knowledge, while the other uses Sobel operators to capture edge saliency and retain local structural information. Extensive experiments show that the proposed method significantly outperforms existing techniques on multiple lifelong heterogeneous dense prediction benchmarks, effectively mitigating forgetting across structurally diverse tasks.
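The summary does not include the paper's formulas, but the edge-saliency idea can be illustrated with a minimal sketch, assuming standard 3x3 Sobel kernels applied to the old model's dense output. The function name sobel_saliency and the per-image normalization are hypothetical choices for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): extract an edge-saliency map
# with Sobel filters, as described for the local-structure branch.
import torch
import torch.nn.functional as F

def sobel_saliency(pred: torch.Tensor) -> torch.Tensor:
    """Return a per-pixel edge-saliency map in [0, 1] for a dense prediction.

    pred: (B, C, H, W) output of the old (teacher) model; channels are
    averaged so a single saliency map weights all channels alike.
    """
    gray = pred.mean(dim=1, keepdim=True)                        # (B, 1, H, W)
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]],
                      device=pred.device, dtype=pred.dtype).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                                      # Sobel kernel in y
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    mag = torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)                  # gradient magnitude
    # Normalize per image so the mask is comparable across samples.
    return mag / (mag.amax(dim=(2, 3), keepdim=True) + 1e-12)
```

Normalizing the gradient magnitude per image keeps the mask in [0, 1], so it can directly re-weight a per-pixel distillation term.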
📝 Abstract
Lifelong learning aims to preserve knowledge acquired from previous tasks while incorporating knowledge from a sequence of new tasks. However, most prior work explores only streams of homogeneous tasks (e.g., only classification tasks) and neglects learning across heterogeneous tasks that possess different output structures. In this work, we formalize this broader setting as lifelong heterogeneous learning (LHL). Departing from conventional lifelong learning, the task sequence in LHL spans different task types, and the learner needs to retain heterogeneous knowledge for different output space structures. To instantiate LHL, we focus on LHL for dense prediction (LHL4DP), a realistic and challenging scenario. To this end, we propose the Heterogeneity-Aware Distillation (HAD) method, an exemplar-free approach that preserves previously acquired heterogeneous knowledge via self-distillation in each training phase. HAD comprises two complementary components: a distribution-balanced heterogeneity-aware distillation loss that alleviates the global imbalance of the prediction distribution, and a salience-guided heterogeneity-aware distillation loss that concentrates learning on informative edge pixels extracted with the Sobel operator. Extensive experiments demonstrate that HAD significantly outperforms existing methods in this new scenario.
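The abstract does not give the loss definitions, so the following is only a minimal sketch of how the two branches might be combined, assuming a KL term on spatially pooled outputs for the distribution-level branch and a saliency-weighted L2 term for the edge branch. The name had_distillation_loss and the weights alpha and beta are illustrative assumptions, not the paper's formulation.

```python
# Hypothetical sketch of a dual-branch self-distillation loss; the exact
# distribution-balancing scheme of HAD is not reproduced here.
import torch
import torch.nn.functional as F

def had_distillation_loss(student_out: torch.Tensor,
                          teacher_out: torch.Tensor,
                          saliency: torch.Tensor,
                          alpha: float = 1.0,
                          beta: float = 1.0) -> torch.Tensor:
    """Combine a global, distribution-level term with a salience-weighted
    pixel-level term. Shapes: outputs are (B, C, H, W); saliency is (B, 1, H, W).
    """
    teacher_out = teacher_out.detach()  # old model's outputs are fixed targets

    # Global branch: match the spatially pooled output distributions so
    # task-agnostic knowledge of the old model is preserved.
    s_global = student_out.mean(dim=(2, 3))                      # (B, C)
    t_global = teacher_out.mean(dim=(2, 3))
    global_loss = F.kl_div(F.log_softmax(s_global, dim=1),
                           F.softmax(t_global, dim=1),
                           reduction="batchmean")

    # Local branch: per-pixel L2 distillation, re-weighted by the Sobel
    # edge-saliency map so informative edge pixels dominate the loss.
    per_pixel = (student_out - teacher_out).pow(2).mean(dim=1, keepdim=True)
    local_loss = (saliency * per_pixel).sum() / (saliency.sum() + 1e-12)

    return alpha * global_loss + beta * local_loss
```

In each training phase, teacher_out would come from a frozen copy of the model taken before learning the new task, and saliency from a Sobel-based map such as the one sketched above.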
Problem

Research questions and friction points this paper is trying to address.

lifelong learning
heterogeneous tasks
output space structures
knowledge retention
dense prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lifelong Heterogeneous Learning
Heterogeneity-Aware Distillation
Exemplar-Free
Self-Distillation
Dense Prediction