🤖 AI Summary
Continual Multitask Learning (CMTL)—where a model learns multiple tasks sequentially over a shared data stream, as in autonomous driving and medical image analysis—suffers from task interference and catastrophic forgetting because models learn fragmented, task-specific representations that interfere with one another.
Method: We propose Learning with Preserving (LwP), a framework built around a Dynamically Weighted Distance Preservation (DWDP) loss. DWDP regularizes the pairwise distances between latent data representations, preserving the geometric structure of the shared representation space and mitigating representational drift—without requiring replay buffers or task-specific heads.
Contribution/Results: LwP supports diverse tasks on a shared representation and achieves state-of-the-art performance on time-series and image benchmarks. It significantly alleviates catastrophic forgetting, consistently outperforms existing CMTL baselines, and is the only approach to surpass the strong independent single-task training baseline. Moreover, it demonstrates superior robustness to distribution shifts.
📝 Abstract
Artificial intelligence systems in critical fields like autonomous driving and medical image analysis often continually learn new tasks using a shared stream of input data. For instance, after learning to detect traffic signs, a model may later need to learn to classify traffic lights or different types of vehicles using the same camera feed. This scenario introduces a challenging setting we term Continual Multitask Learning (CMTL), where a model sequentially learns new tasks on an underlying data distribution without forgetting previously learned abilities. Existing continual learning methods often fail in this setting because they learn fragmented, task-specific features that interfere with one another. To address this, we introduce Learning with Preserving (LwP), a novel framework that shifts the focus from preserving task outputs to maintaining the geometric structure of the shared representation space. The core of LwP is a Dynamically Weighted Distance Preservation (DWDP) loss that prevents representation drift by regularizing the pairwise distances between latent data representations. This mechanism of preserving the underlying geometric structure allows the model to retain implicit knowledge and support diverse tasks without requiring a replay buffer, making it suitable for privacy-conscious applications. Extensive evaluations on time-series and image benchmarks show that LwP not only mitigates catastrophic forgetting but also consistently outperforms state-of-the-art baselines in CMTL tasks. Notably, our method shows superior robustness to distribution shifts and is the only approach to surpass the strong single-task learning baseline, underscoring its effectiveness for real-world dynamic environments.
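To make the core idea concrete, here is a minimal NumPy sketch of a pairwise distance-preservation penalty of the kind DWDP describes: the current encoder's latent distances are regularized toward those of a frozen reference encoder. The Euclidean formulation, the `weights` argument, and all function names are illustrative assumptions, not the paper's actual implementation (which is end-to-end and dynamically weighted).

```python
import numpy as np

def pairwise_distances(Z):
    # Euclidean distance matrix for a batch of latent vectors, shape (n, d)
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def distance_preservation_loss(Z_new, Z_ref, weights=None):
    """Illustrative sketch: penalize changes in pairwise latent distances
    relative to a frozen reference encoder's embeddings.

    Z_new -- latents from the model currently being trained, shape (n, d)
    Z_ref -- latents from the frozen reference model, shape (n, d)
    weights -- optional per-pair weights (uniform here; the paper's
               dynamic weighting scheme is not reproduced)
    """
    D_new = pairwise_distances(Z_new)
    D_ref = pairwise_distances(Z_ref)
    if weights is None:
        weights = np.ones_like(D_ref)
    n = Z_new.shape[0]
    # Weighted mean squared deviation over the n*(n-1) off-diagonal pairs
    return float((weights * (D_new - D_ref) ** 2).sum() / (n * (n - 1)))
```

If the current encoder preserves the reference geometry exactly, the penalty is zero; any drift in relative distances increases it, which is the mechanism the abstract credits with retaining implicit knowledge without a replay buffer.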