PMF-CL: Pareto-Minimal-Forgetting Continual Learner for Conflicting Tasks

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses catastrophic forgetting in continual learning caused by conflicting tasks, focusing on the realistic setting where no minimizer is common to all tasks. It introduces Pareto optimality into continual learning for the first time, proposing a “Pareto-minimal forgetting” principle. The framework covers linear regression, basis-function regression, and losses that admit a quadratic upper bound, such as logistic regression. For quadratic problems, it uses iterative updates with a static memory footprint of only O(d²) for a model with d parameters. Experimental results demonstrate that the method significantly mitigates forgetting on sequences of conflicting tasks and achieves near-theoretically-optimal performance retention with low memory cost.
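
For reference, the standard multi-objective definition of Pareto optimality can be written as below. The notation is assumed here for illustration and is not taken from the paper: $w$ denotes the model parameters, $L_t$ the loss of task $t$, and $T$ the number of tasks seen so far. The paper's own formalization may differ in detail.

```latex
w^\star \text{ is Pareto-optimal}
\iff
\text{there is no } w \text{ such that }
L_t(w) \le L_t(w^\star) \ \ \forall t \in \{1,\dots,T\}
\ \text{ and } \
L_s(w) < L_s(w^\star) \ \text{ for some } s .
```

Under this definition, moving away from a Pareto-optimal $w^\star$ to improve one task necessarily worsens at least one other task, which is the sense in which forgetting is minimal.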

📝 Abstract

Many continual learning (CL) algorithms have been proposed to address catastrophic forgetting in ML models (i.e., learning new tasks leads to a loss of performance on previously learned tasks). Although all CL approaches use some form of memory to retain information about past tasks, a grounded understanding of what information needs to be stored to minimize catastrophic forgetting remains elusive. Recently, it has been recognized that under the strong assumption that a common global minimizer exists over all tasks, catastrophic forgetting can be completely avoided. In practice, however, tasks rarely have a common global minimizer, and a certain amount of forgetting is inevitable. In this paper, we propose a foundational framework for principled and systematic CL of conflicting tasks using a multi-task learning (MTL) perspective. The approach is based on finding Pareto-optimal solutions, i.e., the solutions which, by definition, minimally forget the previous tasks in the Pareto sense. We derive Pareto-minimal-forgetting CL algorithms for linear and basis-function regression, and for general loss functions that have a quadratic upper bound, e.g., logistic regression. For quadratic problems, PMF-CL uses memory-efficient iterative updates with a static memory footprint of $\mathcal{O}(d^2)$ for models with $d$ parameters.
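
To make the $\mathcal{O}(d^2)$ memory claim concrete, here is a minimal sketch for least-squares tasks. It is not the paper's PMF-CL algorithm: it assumes a fixed, positive weight per task and uses plain weighted-sum scalarization, which is one standard way to reach a Pareto-optimal point. The class `QuadraticTaskMemory`, its methods, and the synthetic tasks are hypothetical names introduced only for this illustration.

```python
import numpy as np


class QuadraticTaskMemory:
    """Fixed-size summary of a stream of least-squares tasks (illustrative only).

    Stores A = sum_t a_t X_t^T X_t (d x d) and b = sum_t a_t X_t^T y_t (d,),
    so memory stays O(d^2) no matter how many tasks or samples are seen.
    """

    def __init__(self, d):
        self.A = np.zeros((d, d))
        self.b = np.zeros(d)

    def absorb(self, X, y, weight=1.0):
        # Assumption: strictly positive weights, so that a minimizer of the
        # weighted sum of task losses is Pareto-optimal for those losses.
        if weight <= 0:
            raise ValueError("weight must be positive")
        self.A += weight * (X.T @ X)
        self.b += weight * (X.T @ y)

    def solve(self):
        # Solve the normal equations A w = b; lstsq also handles singular A.
        return np.linalg.lstsq(self.A, self.b, rcond=None)[0]


def task_loss(w, X, y):
    """Squared-error loss of parameters w on one task."""
    r = X @ w - y
    return float(r @ r)


# Two synthetic tasks with different ground-truth parameters, so they conflict.
rng = np.random.default_rng(0)
d = 3
w1 = np.array([1.0, 0.0, -1.0])
w2 = np.array([-1.0, 2.0, 0.5])
X1 = rng.normal(size=(50, d))
X2 = rng.normal(size=(50, d))
y1 = X1 @ w1
y2 = X2 @ w2

mem = QuadraticTaskMemory(d)
mem.absorb(X1, y1)
w_after_1 = mem.solve()   # fits task 1 alone
mem.absorb(X2, y2)        # the raw data of task 1 is no longer needed
w_after_2 = mem.solve()   # trade-off between task 1 and task 2

print("task-1 loss before/after task 2:",
      task_loss(w_after_1, X1, y1), task_loss(w_after_2, X1, y1))
```

Because the tasks conflict, the task-1 loss rises after task 2 is absorbed. The point of the sketch is only that the compromise is computed from a $d \times d$ matrix and a length-$d$ vector rather than from stored samples. How the weights or the trade-off point should be chosen is exactly what a principled method has to specify, and this sketch does not address it.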

Problem

Research questions and friction points this paper is trying to address.

catastrophic forgetting

continual learning

conflicting tasks

Pareto optimality

multi-task learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pareto-optimal

continual learning

catastrophic forgetting