More Than Memory Savings: Zeroth-Order Optimization Mitigates Forgetting in Continual Learning

📅 2025-10-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Continual learning faces a fundamental trade-off among plasticity, stability, and computational efficiency, a tension that is especially acute on memory-constrained edge devices. To address it, the authors propose ZO-FC, which trains a single adapter-based PEFT module with zeroth-order (ZO) optimization while keeping a lightweight first-order (FO) classifier for rapid adaptation to new tasks. Because ZO optimization tends to converge to flatter loss landscapes, ZO-FC mitigates catastrophic forgetting, and since only the adapter and classifier are updated, it incurs negligible memory overhead. The paper provides theoretical analysis characterizing ZO's inherent stability-plasticity trade-off, and experiments on standard continual learning benchmarks show that ZO-FC strikes a better balance than pure ZO, pure FO, and existing parameter-efficient fine-tuning (PEFT) baselines, making it well suited to resource-limited deployment.
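
For readers unfamiliar with ZO optimization, below is a minimal sketch of the standard two-point (SPSA-style) gradient estimator that methods of this kind rely on. The perturbation scale `mu` and the use of a single shared Gaussian query direction are illustrative assumptions; the paper's exact estimator is not reproduced here.

```python
import torch

@torch.no_grad()
def zo_grad_estimate(params, loss_fn, mu=1e-3):
    """Two-point (SPSA-style) zeroth-order gradient estimate.

    Approximates grad L(theta) by
        (L(theta + mu*z) - L(theta - mu*z)) / (2*mu) * z
    using two forward passes and one shared random direction z.
    No backpropagation is performed, so no activation memory is kept.
    """
    zs = [torch.randn_like(p) for p in params]   # shared Gaussian direction

    for p, z in zip(params, zs):                 # move to theta + mu*z
        p.add_(mu * z)
    loss_plus = float(loss_fn())

    for p, z in zip(params, zs):                 # move to theta - mu*z
        p.sub_(2 * mu * z)
    loss_minus = float(loss_fn())

    for p, z in zip(params, zs):                 # restore theta
        p.add_(mu * z)

    coeff = (loss_plus - loss_minus) / (2 * mu)
    return [coeff * z for z in zs]
```

Because only the two perturbed losses are needed, the memory footprint is essentially that of inference, which is what makes ZO attractive on edge devices.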

📝 Abstract
Zeroth-order (ZO) optimization has gained attention as a memory-efficient alternative to first-order (FO) methods, particularly in settings where gradient computation is expensive or even impractical. Beyond this memory efficiency, in this work we investigate ZO optimization for continual learning (CL) as a novel approach to address the plasticity-stability-efficiency trilemma. Through theoretical analysis and empirical evidence, we show that ZO optimization naturally leads to flatter loss landscapes, which in turn reduce forgetting in CL. However, this stability comes at the cost of plasticity: due to its imprecise gradient estimates and slower convergence, ZO optimization tends to be less effective than FO in acquiring new task-specific knowledge, particularly under constrained training budgets. To better understand this trade-off, we conduct a holistic evaluation of ZO optimization applied to various existing CL methods. Our findings reveal that ZO optimization enhances stability but often undermines plasticity, particularly when used with learnable classifiers. Motivated by this insight, we propose ZO-FC, a simple but effective approach that applies ZO optimization to a single adapter-based PEFT module with an FO-optimized classifier. This design leverages the stability benefits of ZO while preserving the adaptability of FO updates with negligible memory overhead. Experiments demonstrate that ZO-FC achieves an effective balance between stability and plasticity, offering a practical and memory-efficient solution for on-device CL.
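
Building on the estimator above, one hypothetical ZO-FC-style training step is sketched below. The module names (`backbone`, `adapter_params`, `classifier`) and the single-step update order are assumptions for illustration, not the authors' exact implementation.

```python
import torch

def zo_fc_step(backbone, adapter_params, classifier, fo_optimizer,
               batch, criterion, lr_zo=1e-4, mu=1e-3):
    """One ZO-FC-style update: ZO on the adapter, FO on the classifier head."""
    x, y = batch

    # ZO step on adapter weights: two extra forward passes, no backprop memory.
    def adapter_loss():
        return criterion(classifier(backbone(x)), y).item()

    grads = zo_grad_estimate(adapter_params, adapter_loss, mu=mu)
    with torch.no_grad():
        for p, g in zip(adapter_params, grads):
            p.sub_(lr_zo * g)

    # FO step on the lightweight classifier only (backbone features detached).
    with torch.no_grad():
        feats = backbone(x)            # no graph built through the backbone
    fo_optimizer.zero_grad()
    loss = criterion(classifier(feats), y)
    loss.backward()
    fo_optimizer.step()
    return loss.item()
```

This split mirrors the abstract's design rationale: the adapter inherits ZO's flat-minima stability and forward-only memory profile, while the small classifier head keeps precise FO gradients for fast adaptation to each new task.
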
Problem

Research questions and friction points this paper is trying to address.

ZO optimization mitigates forgetting in continual learning
ZO methods enhance stability but reduce plasticity
Proposed ZO-FC balances stability-plasticity trade-off efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zeroth-order optimization reduces forgetting via flatter loss landscapes
ZO-FC combines ZO optimization with first-order classifier updates
Adapter-based PEFT module enables memory-efficient continual learning
Wanhao Yu
Department of Computer Science at University of North Carolina at Charlotte
Zheng Wang
Department of Computer Science at University of Houston
Shuteng Niu
Department of Artificial Intelligence & Informatics, Mayo Clinic
Transfer Learning · Graph Representation Learning · Biomedical Informatics
Sen Lin
Department of Computer Science at University of Houston
Li Yang
Department of Computer Science at University of North Carolina at Charlotte