Forward-Only Continual Learning

📅 2025-09-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Catastrophic forgetting severely hinders the application of pretrained models in continual learning. Existing approaches rely on gradient-based backpropagation, entailing high computational overhead and resource consumption. This paper proposes FoRo, a forward-only, gradient-free continual learning framework that enables efficient incremental learning via lightweight prompt tuning and knowledge encoding, without modifying the backbone model. FoRo introduces, for the first time, a gradient-free optimization paradigm that integrates the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) with nonlinear random projection and recursive least squares. It supports both incremental knowledge encoding and low-forgetting parameter updates. Experiments demonstrate that FoRo significantly reduces average forgetting rates and improves accuracy, while substantially decreasing memory footprint and runtime. Moreover, it maintains superior knowledge retention over long task sequences.
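To make the gradient-free prompt tuning described above concrete, here is a minimal sketch in which a simple (μ, λ) evolution strategy stands in for full CMA-ES (no covariance adaptation; FoRo uses the complete algorithm). The frozen "backbone" is just a fixed random linear map, and all dimensions and names are invented for illustration; in FoRo the prompt is prepended at the input layer of a frozen pre-trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen backbone: a fixed random linear map from
# (prompt + input) to class logits. Purely illustrative.
D_PROMPT, D_IN, N_CLASSES = 8, 16, 3
W_frozen = rng.standard_normal((N_CLASSES, D_PROMPT + D_IN))

def loss(prompt, X, y):
    """Cross-entropy of the frozen model with `prompt` prepended to each input."""
    Z = np.hstack([np.tile(prompt, (len(X), 1)), X]) @ W_frozen.T
    Z -= Z.max(axis=1, keepdims=True)            # numerical stability
    p = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()

# Synthetic task data for one incremental task.
X = rng.standard_normal((64, D_IN))
y = rng.integers(0, N_CLASSES, size=64)

# Simple (mu, lambda) evolution strategy: sample candidate prompts, keep the
# elites, recombine. Forward passes only -- no backpropagation anywhere.
mean, sigma, lam, mu = np.zeros(D_PROMPT), 0.5, 16, 4
for gen in range(50):
    pop = mean + sigma * rng.standard_normal((lam, D_PROMPT))
    fitness = np.array([loss(p, X, y) for p in pop])
    elite = pop[np.argsort(fitness)[:mu]]
    mean = elite.mean(axis=0)                    # move toward the best prompts

# `mean` is now the tuned prompt; the backbone was never modified.
```

Only the prompt vector is optimized, so the per-candidate cost is one forward pass over the batch, which is what makes the approach attractive under resource constraints.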

📝 Abstract
Catastrophic forgetting remains a central challenge in continual learning (CL) with pre-trained models. While existing approaches typically freeze the backbone and fine-tune a small number of parameters to mitigate forgetting, they still rely on iterative error backpropagation and gradient-based optimization, which can be computationally intensive and less suitable for resource-constrained environments. To address this, we propose FoRo, a forward-only, gradient-free continual learning method. FoRo consists of a lightweight prompt tuning strategy and a novel knowledge encoding mechanism, both designed without modifying the pre-trained model. Specifically, prompt embeddings are inserted at the input layer and optimized using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which mitigates distribution shifts and extracts high-quality task representations. Subsequently, task-specific knowledge is encoded into a knowledge encoding matrix via nonlinear random projection and recursive least squares, enabling incremental updates to the classifier without revisiting prior data. Experiments show that FoRo significantly reduces average forgetting and improves accuracy. Thanks to forward-only learning, FoRo reduces memory usage and run time while maintaining high knowledge retention across long task sequences. These results suggest that FoRo could serve as a promising direction for exploring continual learning with pre-trained models, especially in real-world multimedia applications where both efficiency and effectiveness are critical.
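The knowledge-encoding step in the abstract (nonlinear random projection plus recursive least squares, with no revisiting of prior data) can be sketched as follows. This is an assumption-laden illustration: the ReLU projection, all dimensions, the synthetic Gaussian tasks, and the helper names are made up for the example, and the paper's exact nonlinearity and update schedule may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
D_FEAT, D_PROJ, N_CLASSES = 10, 64, 4

# Fixed nonlinear random projection (assumed form: random weights + ReLU).
P = rng.standard_normal((D_PROJ, D_FEAT)) / np.sqrt(D_FEAT)
def project(X):
    return np.maximum(X @ P.T, 0.0)

# Recursive least squares (RLS) state: classifier weights W and the inverse
# regularized covariance R, updated per sample via Sherman-Morrison.
W = np.zeros((D_PROJ, N_CLASSES))
R = np.eye(D_PROJ) / 1e-2                       # ridge term lambda = 0.01

def rls_update(X, Y):
    """Fold a batch into (W, R) one sample at a time; old data never revisited."""
    global W, R
    for h, t in zip(project(X), Y):
        h = h[:, None]                          # column vector
        k = R @ h / (1.0 + h.T @ R @ h)         # gain vector
        W += k @ (t[None, :] - h.T @ W)         # correct toward new target
        R -= k @ (h.T @ R)                      # downdate inverse covariance

# Two sequential synthetic tasks, each a pair of well-separated Gaussian classes.
means = 2.0 * rng.standard_normal((N_CLASSES, D_FEAT))
def make_batch(classes, n):
    labels = rng.choice(classes, size=n)
    return means[labels] + 0.3 * rng.standard_normal((n, D_FEAT)), labels

for classes in ([0, 1], [2, 3]):
    X, labels = make_batch(classes, 200)
    rls_update(X, np.eye(N_CLASSES)[labels])    # one-hot targets

# Retention check on task 0, which was never revisited after task 1.
X0, y0 = make_batch([0, 1], 200)
acc0 = ((project(X0) @ W).argmax(axis=1) == y0).mean()
```

Because RLS with these rank-one updates is algebraically equivalent to batch ridge regression on everything seen so far, the classifier retains the first task after learning the second, which is the low-forgetting property the abstract claims.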
Problem

Research questions and friction points this paper is trying to address.

Mitigate catastrophic forgetting in continual learning
Reduce computational intensity of gradient-based optimization
Enable efficient learning in resource-constrained environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Forward-only gradient-free continual learning method
CMA-ES optimized prompt embeddings without model modification
Nonlinear random projection for incremental classifier updates
Jiao Chen
Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, Guangdong, China
Jiayi He
Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, Guangdong, China
Fangfang Chen
Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, Guangdong, China
Zuohong Lv
Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, Guangdong, China
Jianhua Tang
Shien-Ming Wu School of Intelligent Engineering, South China University of Technology
6G, Edge Computing, Network Slicing, Industrial Internet of Things, Industrial AI