Progressive Homeostatic and Plastic Prompt Tuning for Audio-Visual Multi-Task Incremental Learning

📅 2025-07-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Audio-visual multi-task continual learning faces dual challenges of catastrophic forgetting and effective cross-modal collaborative modeling. To address these, we propose a progressive three-stage prompt tuning framework that hierarchically designs shared–specific prompt mechanisms to preserve knowledge stability for previously learned tasks while enhancing plasticity for new ones. Our key contributions include: (i) a task-shared modality-aggregation adapter for cross-modal alignment; (ii) a task-specific yet modality-shared dynamic generation adapter for adaptive feature modulation; and (iii) a task-specific and modality-independent prompt module enabling modality-agnostic knowledge extraction and cross-task transfer. Evaluated on four benchmarks—AVE, AVVP, AVS, and AVQA—our method achieves state-of-the-art performance across diverse task sequences, demonstrating significant improvements in generalization and robustness for audio-visual multi-task continual learning.

📝 Abstract
Audio-visual multi-task incremental learning aims to continuously learn from multiple audio-visual tasks without the need for joint training on all tasks. The core challenge is to preserve knowledge of old tasks while facilitating the learning of new tasks with previous experience. To address this, we introduce a three-stage Progressive Homeostatic and Plastic audio-visual prompt (PHP) method. In the shallow phase, we design a task-shared modality aggregating adapter that fosters cross-task and cross-modal audio-visual representation learning, enhancing shared understanding between tasks. In the middle phase, we propose a task-specific modality-shared dynamic generating adapter, which constructs prompts tailored to individual tasks while remaining general across modalities, balancing the model's ability to retain knowledge against forgetting with its potential for versatile multi-task transferability. In the deep phase, we introduce task-specific modality-independent prompts to further refine understanding by targeting individual information for each task and modality. By incorporating these three phases, PHP retains task-specific prompts while adapting shared parameters for new tasks, effectively balancing knowledge sharing and specificity. Our method achieves SOTA performance across different orderings of four tasks (AVE, AVVP, AVS, and AVQA). Our code is available at https://github.com/ENJOY-Yin-jiong/PHP.
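The three phases described in the abstract can be sketched schematically. The following is a minimal NumPy illustration, not the authors' implementation: all dimensions, generator shapes, and the way prompts are prepended to the token streams are assumptions for clarity, standing in for the paper's learned transformer adapters.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                      # illustrative embedding dimension
TASKS = ["AVE", "AVVP"]     # two of the four benchmark tasks

# Stage 1 (shallow): task-shared modality-aggregation adapter.
# One projection, shared by all tasks, fuses audio and visual tokens.
W_shared = rng.normal(size=(2 * D, D)) * 0.1

def aggregate(audio, visual):
    """Fuse per-token audio/visual features through the shared adapter."""
    return np.concatenate([audio, visual], axis=-1) @ W_shared

# Stage 2 (middle): task-specific but modality-shared dynamic generation.
# Each task owns a small generator mapping fused features to one prompt
# vector that is reused for both modalities (hypothetical form).
W_task = {t: rng.normal(size=(D, D)) * 0.1 for t in TASKS}

def dynamic_prompt(task, fused):
    return np.tanh(fused.mean(axis=0) @ W_task[task])

# Stage 3 (deep): task-specific, modality-independent prompts —
# one learned (here: fixed random) vector per (task, modality) pair.
deep_prompts = {(t, m): rng.normal(size=(D,)) * 0.1
                for t in TASKS for m in ("audio", "visual")}

# Forward pass for one task: prepend stage-2/3 prompts to each stream.
audio = rng.normal(size=(5, D))     # 5 audio tokens
visual = rng.normal(size=(5, D))    # 5 visual tokens
fused = aggregate(audio, visual)
task = "AVE"
p_mid = dynamic_prompt(task, fused)
audio_in = np.vstack([p_mid, deep_prompts[(task, "audio")], audio])
visual_in = np.vstack([p_mid, deep_prompts[(task, "visual")], visual])
print(audio_in.shape, visual_in.shape)  # (7, 16) (7, 16)
```

The point of the staging: `W_shared` is updated across tasks (plasticity with shared knowledge), while `W_task` and `deep_prompts` are frozen once their task is learned (homeostasis against forgetting).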
Problem

Research questions and friction points this paper is trying to address.

Enables continuous learning from multiple audio-visual tasks without joint training
Balances old task retention and new task learning with shared experiences
Enhances cross-task and cross-modal representation through progressive prompt tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive Homeostatic and Plastic prompt tuning
Task-shared modality aggregating adapter
Task-specific modality-independent prompts
Jiong Yin
Hangzhou Dianzi University, Institute of Computing Technology, Chinese Academy of Sciences
Liang Li
Institute of Computing Technology, Chinese Academy of Sciences
Jiehua Zhang
University of Oulu
Deep learning · Object detection · Model quantization
Yuhan Gao
Hangzhou Dianzi University
Chenggang Yan
Hangzhou Dianzi University
Xichun Sheng
Macao Polytechnic University