More Than Memory Savings: Zeroth-Order Optimization Mitigates Forgetting in Continual Learning

📅 2025-10-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Continual learning faces a fundamental trade-off among plasticity, stability, and computational efficiency, a tension that is especially acute on memory-constrained edge devices. To address it, the authors propose ZO-FC, which trains a single adapter-based PEFT module with zeroth-order (ZO) optimization while keeping a lightweight first-order (FO) classifier for rapid adaptation to new tasks. Because ZO optimization tends to converge to flatter loss landscapes, ZO-FC mitigates catastrophic forgetting, and since only the adapter and classifier are updated, it incurs negligible memory overhead. The paper provides theoretical analysis characterizing ZO's inherent stability-plasticity trade-off, and experiments on standard continual learning benchmarks show that ZO-FC strikes a better balance than pure ZO, pure FO, and existing parameter-efficient fine-tuning (PEFT) baselines, making it well suited to resource-limited deployment.
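
For readers unfamiliar with ZO optimization, below is a minimal sketch of the standard two-point (SPSA-style) gradient estimator that methods of this kind rely on. The perturbation scale `mu` and the use of a single shared Gaussian query direction are illustrative assumptions; the paper's exact estimator is not reproduced here.

```python
import torch

@torch.no_grad()
def zo_grad_estimate(params, loss_fn, mu=1e-3):
    """Two-point (SPSA-style) zeroth-order gradient estimate.

    Approximates grad L(theta) by
        (L(theta + mu*z) - L(theta - mu*z)) / (2*mu) * z
    using two forward passes and one shared random direction z.
    No backpropagation is performed, so no activation memory is kept.
    """
    zs = [torch.randn_like(p) for p in params]   # shared Gaussian direction

    for p, z in zip(params, zs):                 # move to theta + mu*z
        p.add_(mu * z)
    loss_plus = float(loss_fn())

    for p, z in zip(params, zs):                 # move to theta - mu*z
        p.sub_(2 * mu * z)
    loss_minus = float(loss_fn())

    for p, z in zip(params, zs):                 # restore theta
        p.add_(mu * z)

    coeff = (loss_plus - loss_minus) / (2 * mu)
    return [coeff * z for z in zs]
```

Because only the two perturbed losses are needed, the memory footprint is essentially that of inference, which is what makes ZO attractive on edge devices.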

📝 Abstract
Zeroth-order (ZO) optimization has gained attention as a memory-efficient alternative to first-order (FO) methods, particularly in settings where gradient computation is expensive or even impractical. Beyond this memory efficiency, in this work we investigate ZO optimization for continual learning (CL) as a novel approach to address the plasticity-stability-efficiency trilemma. Through theoretical analysis and empirical evidence, we show that ZO optimization naturally leads to flatter loss landscapes, which in turn reduce forgetting in CL. However, this stability comes at the cost of plasticity: due to its imprecise gradient estimates and slower convergence, ZO optimization tends to be less effective than FO in acquiring new task-specific knowledge, particularly under constrained training budgets. To better understand this trade-off, we conduct a holistic evaluation of ZO optimization applied to various existing CL methods. Our findings reveal that ZO optimization enhances stability but often undermines plasticity, particularly when used with learnable classifiers. Motivated by this insight, we propose ZO-FC, a simple but effective approach that applies ZO optimization to a single adapter-based PEFT module with an FO-optimized classifier. This design leverages the stability benefits of ZO while preserving the adaptability of FO updates with negligible memory overhead. Experiments demonstrate that ZO-FC achieves an effective balance between stability and plasticity, offering a practical and memory-efficient solution for on-device CL.
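
Building on the estimator above, one hypothetical ZO-FC-style training step is sketched below. The module names (`backbone`, `adapter_params`, `classifier`) and the single-step update order are assumptions for illustration, not the authors' exact implementation.

```python
import torch

def zo_fc_step(backbone, adapter_params, classifier, fo_optimizer,
               batch, criterion, lr_zo=1e-4, mu=1e-3):
    """One ZO-FC-style update: ZO on the adapter, FO on the classifier head."""
    x, y = batch

    # ZO step on adapter weights: two extra forward passes, no backprop memory.
    def adapter_loss():
        return criterion(classifier(backbone(x)), y).item()

    grads = zo_grad_estimate(adapter_params, adapter_loss, mu=mu)
    with torch.no_grad():
        for p, g in zip(adapter_params, grads):
            p.sub_(lr_zo * g)

    # FO step on the lightweight classifier only (backbone features detached).
    with torch.no_grad():
        feats = backbone(x)            # no graph built through the backbone
    fo_optimizer.zero_grad()
    loss = criterion(classifier(feats), y)
    loss.backward()
    fo_optimizer.step()
    return loss.item()
```

This split mirrors the abstract's design rationale: the adapter inherits ZO's flat-minima stability and forward-only memory profile, while the small classifier head keeps precise FO gradients for fast adaptation to each new task.
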
Problem

Research questions and friction points this paper is trying to address.

ZO optimization mitigates forgetting in continual learning
ZO methods enhance stability but reduce plasticity
Proposed ZO-FC balances stability-plasticity trade-off efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zeroth-order optimization reduces forgetting via flatter loss landscapes
ZO-FC combines ZO optimization with first-order classifier updates
Adapter-based PEFT module enables memory-efficient continual learning
Wanhao Yu
Department of Computer Science at University of North Carolina at Charlotte
Zheng Wang
Department of Computer Science at University of Houston
Shuteng Niu
Department of Artificial Intelligence & Informatics, Mayo Clinic
Transfer Learning · Graph Representation Learning · Biomedical Informatics
Sen Lin
Department of Computer Science at University of Houston
Li Yang
Department of Computer Science at University of North Carolina at Charlotte