Forward-Only Continual Learning

📅 2025-09-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Catastrophic forgetting severely hinders the application of pretrained models in continual learning. Existing approaches rely on gradient-based backpropagation, entailing high computational overhead and resource consumption. This paper proposes FoRo, a forward-only, gradient-free continual learning framework that enables efficient incremental learning via lightweight prompt tuning and knowledge encoding, without modifying the backbone model. FoRo introduces, for the first time, a gradient-free optimization paradigm that integrates the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) with nonlinear random projection and recursive least squares. It supports both incremental knowledge encoding and low-forgetting parameter updates. Experiments demonstrate that FoRo significantly reduces average forgetting rates and improves accuracy, while substantially decreasing memory footprint and runtime. Moreover, it maintains superior knowledge retention over long task sequences.
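To make the gradient-free prompt tuning described above concrete, here is a minimal sketch in which a simple (μ, λ) evolution strategy stands in for full CMA-ES (no covariance adaptation; FoRo uses the complete algorithm). The frozen "backbone" is just a fixed random linear map, and all dimensions and names are invented for illustration; in FoRo the prompt is prepended at the input layer of a frozen pre-trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen backbone: a fixed random linear map from
# (prompt + input) to class logits. Purely illustrative.
D_PROMPT, D_IN, N_CLASSES = 8, 16, 3
W_frozen = rng.standard_normal((N_CLASSES, D_PROMPT + D_IN))

def loss(prompt, X, y):
    """Cross-entropy of the frozen model with `prompt` prepended to each input."""
    Z = np.hstack([np.tile(prompt, (len(X), 1)), X]) @ W_frozen.T
    Z -= Z.max(axis=1, keepdims=True)            # numerical stability
    p = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()

# Synthetic task data for one incremental task.
X = rng.standard_normal((64, D_IN))
y = rng.integers(0, N_CLASSES, size=64)

# Simple (mu, lambda) evolution strategy: sample candidate prompts, keep the
# elites, recombine. Forward passes only -- no backpropagation anywhere.
mean, sigma, lam, mu = np.zeros(D_PROMPT), 0.5, 16, 4
for gen in range(50):
    pop = mean + sigma * rng.standard_normal((lam, D_PROMPT))
    fitness = np.array([loss(p, X, y) for p in pop])
    elite = pop[np.argsort(fitness)[:mu]]
    mean = elite.mean(axis=0)                    # move toward the best prompts

# `mean` is now the tuned prompt; the backbone was never modified.
```

Only the prompt vector is optimized, so the per-candidate cost is one forward pass over the batch, which is what makes the approach attractive under resource constraints.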

📝 Abstract
Catastrophic forgetting remains a central challenge in continual learning (CL) with pre-trained models. While existing approaches typically freeze the backbone and fine-tune a small number of parameters to mitigate forgetting, they still rely on iterative error backpropagation and gradient-based optimization, which can be computationally intensive and less suitable for resource-constrained environments. To address this, we propose FoRo, a forward-only, gradient-free continual learning method. FoRo consists of a lightweight prompt tuning strategy and a novel knowledge encoding mechanism, both designed without modifying the pre-trained model. Specifically, prompt embeddings are inserted at the input layer and optimized using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which mitigates distribution shifts and extracts high-quality task representations. Subsequently, task-specific knowledge is encoded into a knowledge encoding matrix via nonlinear random projection and recursive least squares, enabling incremental updates to the classifier without revisiting prior data. Experiments show that FoRo significantly reduces average forgetting and improves accuracy. Thanks to forward-only learning, FoRo reduces memory usage and run time while maintaining high knowledge retention across long task sequences. These results suggest that FoRo could serve as a promising direction for exploring continual learning with pre-trained models, especially in real-world multimedia applications where both efficiency and effectiveness are critical.
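The knowledge-encoding step in the abstract (nonlinear random projection plus recursive least squares, with no revisiting of prior data) can be sketched as follows. This is an assumption-laden illustration: the ReLU projection, all dimensions, the synthetic Gaussian tasks, and the helper names are made up for the example, and the paper's exact nonlinearity and update schedule may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
D_FEAT, D_PROJ, N_CLASSES = 10, 64, 4

# Fixed nonlinear random projection (assumed form: random weights + ReLU).
P = rng.standard_normal((D_PROJ, D_FEAT)) / np.sqrt(D_FEAT)
def project(X):
    return np.maximum(X @ P.T, 0.0)

# Recursive least squares (RLS) state: classifier weights W and the inverse
# regularized covariance R, updated per sample via Sherman-Morrison.
W = np.zeros((D_PROJ, N_CLASSES))
R = np.eye(D_PROJ) / 1e-2                       # ridge term lambda = 0.01

def rls_update(X, Y):
    """Fold a batch into (W, R) one sample at a time; old data never revisited."""
    global W, R
    for h, t in zip(project(X), Y):
        h = h[:, None]                          # column vector
        k = R @ h / (1.0 + h.T @ R @ h)         # gain vector
        W += k @ (t[None, :] - h.T @ W)         # correct toward new target
        R -= k @ (h.T @ R)                      # downdate inverse covariance

# Two sequential synthetic tasks, each a pair of well-separated Gaussian classes.
means = 2.0 * rng.standard_normal((N_CLASSES, D_FEAT))
def make_batch(classes, n):
    labels = rng.choice(classes, size=n)
    return means[labels] + 0.3 * rng.standard_normal((n, D_FEAT)), labels

for classes in ([0, 1], [2, 3]):
    X, labels = make_batch(classes, 200)
    rls_update(X, np.eye(N_CLASSES)[labels])    # one-hot targets

# Retention check on task 0, which was never revisited after task 1.
X0, y0 = make_batch([0, 1], 200)
acc0 = ((project(X0) @ W).argmax(axis=1) == y0).mean()
```

Because RLS with these rank-one updates is algebraically equivalent to batch ridge regression on everything seen so far, the classifier retains the first task after learning the second, which is the low-forgetting property the abstract claims.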
Problem

Research questions and friction points this paper is trying to address.

Mitigate catastrophic forgetting in continual learning
Reduce computational intensity of gradient-based optimization
Enable efficient learning in resource-constrained environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Forward-only gradient-free continual learning method
CMA-ES optimized prompt embeddings without model modification
Nonlinear random projection for incremental classifier updates
Jiao Chen
Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, Guangdong, China
Jiayi He
Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, Guangdong, China
Fangfang Chen
Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, Guangdong, China
Zuohong Lv
Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, Guangdong, China
Jianhua Tang
Shien-Ming Wu School of Intelligent Engineering, South China University of Technology
6G, Edge Computing, Network Slicing, Industrial Internet of Things, Industrial AI