🤖 AI Summary
Existing prompt-based continual learning methods suffer from limited prompt representational diversity: fixed prompts cannot adapt when new tasks arrive, while prompts drawn from a shared space are entangled across tasks. To address this, the authors propose a prompt-evolving mechanism that adaptively aggregates task-specific base prompts, both previously learned and newly introduced, into a unified prompt, without rehearsal and with the pre-trained backbone kept frozen. The base prompts are transformed and aligned before aggregation, and a learnable probabilistic gate determines which layers are activated during the evolution process. On class-incremental image classification and video action recognition, the method reports average gains of 9.07% and 7.40%, respectively, over existing methods across all scenarios. The authors attribute these gains to a more diverse integrated prompt and to accumulated knowledge that continues to evolve as new tasks are learned.
📝 Abstract
Prompt-based continual learning provides a rehearsal-free solution by tuning small sets of parameters while keeping pre-trained models frozen. To meet the complex demands of sequential tasks, it is crucial to integrate task-specific knowledge within prompts effectively. However, existing works rely either on fixed learned prompts (i.e., prompts whose representations remain unchanged during new task learning) or on prompts generated from an entangled task-shared space, which limits the representational diversity of the integrated prompt. To address this issue, we propose a novel prompt-evolving mechanism to adaptively aggregate base prompts (i.e., task-specific prompts) into a unified prompt while ensuring diversity. By transforming and aligning base prompts, both previously learned and newly introduced, our approach continuously evolves accumulated knowledge to facilitate learning new tasks. We further introduce a learnable probabilistic gate that adaptively determines which layers to activate during the evolution process. We validate our method on image classification and video action recognition tasks in class-incremental learning, achieving average gains of 9.07% and 7.40% over existing methods across all scenarios.
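To make the data flow concrete, the following is a minimal sketch of the idea described above, not the authors' implementation. The abstract does not specify the form of the transformation, the aggregation rule, or the gate, so everything here is an assumption chosen for illustration: the alignment is an element-wise affine map, aggregation is a softmax-weighted sum, the gate is an independent Bernoulli draw per layer, and all names (`align`, `fuse_prompts`, `sample_layer_gates`) are hypothetical. In a real system the scales, shifts, mixing logits, and gate logits would be learned by gradient descent; here they are fixed constants.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align(prompt, scale, shift):
    # ASSUMPTION: alignment is an element-wise affine map of one base prompt.
    return [s * p + b for p, s, b in zip(prompt, scale, shift)]


def fuse_prompts(base_prompts, scales, shifts, mix_logits):
    # Align every base prompt (old and new tasks alike), then combine them
    # with a softmax-weighted sum (ASSUMPTION) into one unified prompt.
    aligned = [align(p, s, b) for p, s, b in zip(base_prompts, scales, shifts)]
    weights = softmax(mix_logits)
    dim = len(aligned[0])
    return [sum(w * a[i] for w, a in zip(weights, aligned)) for i in range(dim)]


def sample_layer_gates(gate_logits, rng):
    # ASSUMPTION: one Bernoulli gate per layer, active with probability
    # sigmoid(logit). A gate of 1 means the fused prompt is used at that layer.
    probs = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    return [1 if rng.random() < p else 0 for p in probs]


# Three base prompts (two old tasks, one new task), each a 2-dim vector.
base_prompts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scales = [[1.0, 1.0] for _ in base_prompts]   # identity alignment for the demo
shifts = [[0.0, 0.0] for _ in base_prompts]
mix_logits = [0.0, 0.0, 0.0]                  # equal mixing weights of 1/3

fused = fuse_prompts(base_prompts, scales, shifts, mix_logits)

# Gate logits for a hypothetical 3-layer backbone.
gates = sample_layer_gates([10.0, -10.0, 0.0], random.Random(0))

# Layers with a closed gate receive no prompt in this sketch.
prompts_per_layer = [fused if g else None for g in gates]
```

With identity alignment and equal mixing weights, `fused` is simply the mean of the three base prompts. The frozen backbone is deliberately left out: only the small prompt-side parameters would be trained, which is what makes the approach rehearsal-free and parameter-efficient.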