🤖 AI Summary
In continual learning, models face a fundamental trade-off between adaptability to new tasks and stability of previously acquired knowledge; moreover, conventional zeroth-order sharpness-based optimization can converge to sharp minima, degrading generalization robustness. To address this, we propose C-Flat—the first loss-landscape flatness optimization framework explicitly designed for continual learning—and its lightweight, efficient variant C-Flat++, compatible with mainstream paradigms including replay, regularization, and parameter isolation for plug-and-play integration. Our method combines sharpness-aware principles with a selective flatness-driven update mechanism, requiring no architectural modifications and significantly reducing computational overhead. Extensive experiments demonstrate that C-Flat consistently improves both accuracy and stability across diverse benchmarks, algorithms, and continual learning scenarios, while C-Flat++ achieves comparable performance at substantially reduced training cost, offering both theoretical rigor and practical deployability.
📝 Abstract
Balancing sensitivity to new tasks and stability for retaining past knowledge is crucial in continual learning (CL). Recently, sharpness-aware minimization has proven effective in transfer learning and has also been adopted in CL to improve memory retention and learning efficiency. However, relying on zeroth-order sharpness alone may favor sharper minima over flatter ones in certain settings, leading to less robust and potentially suboptimal solutions. In this paper, we propose **C**ontinual **Flat**ness (**C-Flat**), a method that promotes flatter loss landscapes tailored for CL. C-Flat offers plug-and-play compatibility, enabling easy integration with minimal modifications to the code pipeline. Besides, we present a general framework that integrates C-Flat into all major CL paradigms and conduct comprehensive comparisons with loss-minima optimizers and flat-minima-based CL methods. Our results show that C-Flat consistently improves performance across a wide range of settings. In addition, we introduce C-Flat++, an efficient yet effective framework that leverages selective flatness-driven promotion, significantly reducing the update cost required by C-Flat. Extensive experiments across multiple CL methods, datasets, and scenarios demonstrate the effectiveness and efficiency of our proposed approaches. Code is available at https://github.com/WanNaa/C-Flat.
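To make the zeroth-order sharpness idea the abstract builds on concrete, here is a minimal sketch of a generic sharpness-aware minimization (SAM) step on a toy quadratic loss. This is *not* the authors' C-Flat algorithm (which adds flatness-promoting terms and selective updates on top); the function name `sam_step` and the hyperparameter values are illustrative assumptions.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One generic SAM update (illustrative, not the C-Flat method):
    1) ascend to the worst-case neighbor within an L2 ball of radius rho,
    2) descend using the gradient measured at that perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case ascent direction
    g_sharp = grad_fn(w + eps)                   # gradient at perturbed weights
    return w - lr * g_sharp                      # descend with the "sharp" gradient

# Toy loss L(w) = ||w||^2 / 2, whose gradient is simply w.
grad = lambda w: w
w = np.array([2.0, -1.0])
for _ in range(50):
    w = sam_step(w, grad)
```

Because the descent gradient is taken at the perturbed point, minima whose neighborhoods have large gradients (sharp minima) are penalized, biasing optimization toward flatter regions; this is the "zeroth-order sharpness" behavior whose limitations C-Flat is designed to address.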