🤖 AI Summary
To address latent forgetting and prompt memory explosion, two critical bottlenecks in prompt-based continual learning (CL) under task-agnostic inference, this paper proposes GRID, a novel framework for efficient lifelong adaptation of large language models. Methodologically, GRID introduces (1) a task-aware decoding mechanism that combines representative inputs, automatic task identification, and constrained decoding to align generation with task semantics at inference time and mitigate forgetting, and (2) a gradient-based prompt selection and aggregation strategy that uses gradient similarity to retain informative prompts and compress the rest into a single aggregated representation. Empirically, GRID substantially improves backward transfer across multiple benchmarks, reducing forgotten tasks by up to 80%, while preserving competitive forward transfer. Crucially, prompt memory scales sublinearly with the number of tasks, yielding a scalable, memory-efficient paradigm for lifelong adaptation of LLMs.
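The constrained-decoding step of the task identification mechanism can be illustrated with a minimal sketch. The paper does not specify the implementation; the sketch below assumes the common approach of masking every vocabulary position outside an allowed label set to negative infinity before taking the argmax, so the decoder can only emit a valid task label.

```python
import numpy as np

def constrained_decode(logits, allowed_token_ids):
    """Greedy decoding restricted to an allowed token set.

    Positions outside `allowed_token_ids` are masked to -inf,
    so the argmax is guaranteed to name a known task.
    """
    mask = np.full_like(logits, -np.inf)
    mask[allowed_token_ids] = 0.0
    return int(np.argmax(logits + mask))

# Toy vocabulary of 10 tokens; only ids 2, 5, and 7 name known tasks.
logits = np.array([0.1, 3.0, 0.5, 2.0, 0.0, 1.5, 0.2, 0.9, 0.3, 0.4])
task_id = constrained_decode(logits, [2, 5, 7])
# Token 1 has the highest raw logit but is not a task label,
# so decoding falls back to the best allowed token (id 5).
```

The same masking trick extends to multi-token labels by reapplying the mask at each decoding step over the set of labels still consistent with the prefix.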
📝 Abstract
Prompt-based continual learning (CL) offers a parameter-efficient way to adapt large language models (LLMs) across task sequences. However, most existing methods assume task-aware inference and maintain a growing list of task-specific prompts, which limits scalability and hides latent forgetting. In this work, we introduce GRID, a unified framework that addresses two key limitations: (1) latent forgetting under task-agnostic inference, and (2) prompt memory explosion as task sequences grow. GRID integrates a task-aware decoding mechanism that improves backward transfer by leveraging representative inputs, automatic task identification, and constrained decoding. Additionally, we propose a gradient-based prompt selection strategy that compresses less informative prompts into a single aggregated representation, enabling scalable and memory-efficient lifelong learning. Extensive experiments across short-sequence, long-sequence, and negative transfer benchmarks show that GRID significantly improves backward transfer, achieves competitive forward transfer, and reduces forgotten tasks by up to 80%, outperforming state-of-the-art methods on T5 and Flan-T5 backbones.
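The gradient-based compression step described above can be sketched as follows. The scoring rule, the NumPy representation, and the names `select_and_aggregate` and `keep_k` are illustrative assumptions, not the paper's implementation: prompts are ranked by the mean pairwise cosine similarity of their task gradients, the top-k are kept as-is, and the remainder are averaged into one aggregated prompt, so memory no longer grows by one prompt per task.

```python
import numpy as np

def select_and_aggregate(prompt_grads, prompts, keep_k):
    """Keep the keep_k prompts judged most informative by gradient
    similarity; mean-pool the rest into a single aggregated prompt.

    prompt_grads, prompts: arrays of shape (num_tasks, prompt_dim).
    Note: using mean similarity as the informativeness proxy is an
    assumption made for this sketch.
    """
    g = prompt_grads / np.linalg.norm(prompt_grads, axis=1, keepdims=True)
    sim = g @ g.T                    # pairwise cosine similarity
    score = sim.mean(axis=1)         # per-prompt informativeness proxy
    order = np.argsort(-score)       # most informative first
    keep, merge = order[:keep_k], order[keep_k:]
    aggregated = prompts[merge].mean(axis=0)
    return prompts[keep], aggregated

# Five task prompts of dimension 8, compressed down to 2 + 1 aggregate.
rng = np.random.default_rng(0)
prompts = rng.normal(size=(5, 8))
grads = rng.normal(size=(5, 8))
kept, aggregated = select_and_aggregate(grads, prompts, keep_k=2)
```

After compression, storage is `keep_k + 1` prompt vectors regardless of how many tasks have been seen, which is the memory behavior the abstract attributes to GRID.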