🤖 AI Summary
The core challenge in lifelong learning is to mitigate catastrophic forgetting of previously learned tasks while also improving backward transfer (better performance on past tasks) and forward transfer (better performance on unseen future tasks). This paper proposes a representation ensembling method that requires no memory replay, explicit regularization, or architectural expansion, yet, for the first time, systematically demonstrates both forward and backward transfer. By sharing representations across tasks and reweighting features between them, the approach can run with or without a fixed computational budget. Evaluations on benchmarks spanning several modalities, including CIFAR-100, Split Mini-ImageNet, Food1K, the 5-dataset suite, and spoken-digit recognition, show consistent and significant improvements over state-of-the-art continual learning methods in both forward and backward transfer.
📝 Abstract
In lifelong learning, data are used to improve performance not only on the present task, but also on past and future (unencountered) tasks. While typical transfer learning algorithms can improve performance on future tasks, their performance on prior tasks degrades upon learning new tasks (called forgetting). Many recent approaches for continual or lifelong learning have attempted to maintain performance on old tasks given new tasks. But striving to avoid forgetting sets the goal unnecessarily low. The goal of lifelong learning should be to use data to improve performance on both future tasks (forward transfer) and past tasks (backward transfer). In this paper, we show that a simple approach -- representation ensembling -- demonstrates both forward and backward transfer in a variety of simulated and benchmark data scenarios, including tabular, vision (CIFAR-100, 5-dataset, Split Mini-ImageNet, and Food1k), and speech (spoken digit), in contrast to various reference algorithms, which typically fail to transfer forward, backward, or both. Moreover, our proposed approach can flexibly operate with or without a computational budget.
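To make the terms forward transfer, backward transfer, and forgetting concrete, the sketch below computes them from a table of accuracies. It is only an illustration under stated assumptions: the difference-of-accuracies definitions, the function name `transfer_scores`, and every number in it are hypothetical and are not taken from the paper, which may quantify transfer differently.

```python
def transfer_scores(acc, single_task_acc):
    """Per-task forward and backward transfer from an accuracy table.

    acc[i][j]: accuracy on task j after training on tasks 0..i (defined for j <= i).
    single_task_acc[j]: accuracy of a learner trained on task j alone.
    Both definitions below are assumptions made for this sketch.
    """
    n_tasks = len(acc)
    forward, backward = [], []
    for j in range(n_tasks):
        # Forward transfer: did earlier tasks help when task j was first learned?
        forward.append(acc[j][j] - single_task_acc[j])
        # Backward transfer: did later tasks help (positive) or hurt (negative) task j?
        backward.append(acc[n_tasks - 1][j] - acc[j][j])
    return forward, backward


# Made-up accuracies for three tasks, for illustration only (not results from the paper).
acc = [
    [0.750, None, None],
    [0.625, 0.750, None],
    [0.875, 0.500, 0.750],
]
single_task_acc = [0.750, 0.625, 0.625]

forward, backward = transfer_scores(acc, single_task_acc)
for j, (f, b) in enumerate(zip(forward, backward)):
    label = "forgetting" if b < 0 else "no forgetting"
    print(f"task {j}: forward={f:+.3f} backward={b:+.3f} ({label})")
```

In this toy table, task 1 shows forgetting (negative backward transfer) while task 0 improves after later tasks are learned. A method that only avoids forgetting would aim for zero backward transfer, whereas the abstract argues for aiming above zero in both directions.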