A Faster Path to Continual Learning

📅 2026-04-13
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the high computational cost and low training efficiency of optimization-based methods in continual learning by proposing the C-Flat Turbo optimizer. It reveals, for the first time, a directional invariance inherent in first-order gradient flatness, and leverages inter-task gradient stability to design an adaptively triggered, linearly scheduled mechanism that cuts redundant gradient computations. Without compromising model performance, C-Flat Turbo achieves a 1.0–1.25× training speedup across multiple continual learning benchmarks while maintaining comparable or superior accuracy.

πŸ“ Abstract
Continual Learning (CL) aims to train neural networks on a dynamic stream of tasks without forgetting previously learned knowledge. Among optimization-based approaches, C-Flat has emerged as a promising solution due to its plug-and-play nature and its ability to encourage uniformly low-loss regions for both new and old tasks. However, C-Flat requires three additional gradient computations per iteration, imposing substantial overhead on the optimization process. In this work, we propose C-Flat Turbo, a faster yet stronger optimizer that significantly reduces the training cost. We show that the gradients associated with first-order flatness contain direction-invariant components relative to the proxy-model gradients, enabling us to skip redundant gradient computations in the perturbed ascent steps. Moreover, we observe that these flatness-promoting gradients progressively stabilize across tasks, which motivates a linear scheduling strategy with an adaptive trigger to allocate larger turbo steps for later tasks. Experiments show that C-Flat Turbo is 1.0$\times$ to 1.25$\times$ faster than C-Flat across a wide range of CL methods, while achieving comparable or even improved accuracy.
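The paper's actual implementation is not shown here, but the two ideas named in the abstract, reusing a cached ascent direction instead of recomputing the perturbed gradient, and linearly scheduling more skips for later tasks, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the toy quadratic loss, the SAM-style single-perturbation step, and the names `turbo_fraction` and `turbo_step` are hypothetical stand-ins, not the authors' code.

```python
import numpy as np

def grad(w):
    # Gradient of a toy quadratic proxy loss L(w) = 0.5 * ||w||^2;
    # a stand-in for the real model gradient (hypothetical).
    return w

def turbo_fraction(task_idx, num_tasks, max_skip=0.8):
    # Hypothetical linear schedule: later tasks skip a larger fraction of
    # perturbed-ascent gradient recomputations, mirroring the abstract's
    # observation that flatness-promoting gradients stabilize across tasks.
    return max_skip * task_idx / max(num_tasks - 1, 1)

def turbo_step(w, cached_dir, skip, rho=0.05, lr=0.1):
    """One sharpness-aware (SAM-style) step with an optional shortcut.

    When `skip` is True, the cached ascent direction is reused instead of
    being recomputed from a fresh gradient -- a simplified analogue of
    exploiting direction-invariant flatness components.
    """
    if skip and cached_dir is not None:
        ascent_dir = cached_dir  # reuse: one gradient evaluation saved
    else:
        g = grad(w)
        ascent_dir = g / (np.linalg.norm(g) + 1e-12)
    # Descend using the gradient taken at the perturbed (ascended) point.
    w_new = w - lr * grad(w + rho * ascent_dir)
    return w_new, ascent_dir
```

A usage loop would draw a skip decision per iteration, e.g. `skip = rng.random() < turbo_fraction(task_idx, num_tasks)`, so early tasks pay the full gradient cost while later tasks run mostly on cached directions.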
Problem

Research questions and friction points this paper is trying to address.

Continual Learning
C-Flat
gradient computation
training overhead
optimization efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continual Learning
C-Flat Turbo
gradient efficiency
flatness optimization
adaptive scheduling