Progressive Homeostatic and Plastic Prompt Tuning for Audio-Visual Multi-Task Incremental Learning

📅 2025-07-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Audio-visual multi-task continual learning faces dual challenges of catastrophic forgetting and effective cross-modal collaborative modeling. To address these, we propose a progressive three-stage prompt tuning framework that hierarchically designs shared–specific prompt mechanisms to preserve knowledge stability for previously learned tasks while enhancing plasticity for new ones. Our key contributions include: (i) a task-shared modality-aggregation adapter for cross-modal alignment; (ii) a task-specific yet modality-shared dynamic generation adapter for adaptive feature modulation; and (iii) a task-specific and modality-independent prompt module enabling modality-agnostic knowledge extraction and cross-task transfer. Evaluated on four benchmarks—AVE, AVVP, AVS, and AVQA—our method achieves state-of-the-art performance across diverse task sequences, demonstrating significant improvements in generalization and robustness for audio-visual multi-task continual learning.

📝 Abstract
Audio-visual multi-task incremental learning aims to continuously learn from multiple audio-visual tasks without the need for joint training on all tasks. The core challenge is to preserve knowledge of old tasks while facilitating the learning of new tasks with previous experience. To address this, we introduce a three-stage Progressive Homeostatic and Plastic audio-visual prompt (PHP) method. In the shallow phase, we design a task-shared modality aggregating adapter that fosters cross-task and cross-modal audio-visual representation learning, enhancing shared understanding between tasks. In the middle phase, we propose a task-specific modality-shared dynamic generating adapter, which constructs prompts tailored to individual tasks while remaining general across modalities, balancing the model's ability to retain knowledge against forgetting with its potential for versatile multi-task transferability. In the deep phase, we introduce task-specific modality-independent prompts to further refine understanding by targeting individual information for each task and modality. By incorporating these three phases, PHP retains task-specific prompts while adapting shared parameters for new tasks, effectively balancing knowledge sharing and specificity. Our method achieves SOTA performance across different orderings of four tasks (AVE, AVVP, AVS, and AVQA). Our code is available at https://github.com/ENJOY-Yin-jiong/PHP.
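The three phases described in the abstract can be sketched schematically. The following is a minimal NumPy illustration, not the authors' implementation: all dimensions, generator shapes, and the way prompts are prepended to the token streams are assumptions for clarity, standing in for the paper's learned transformer adapters.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                      # illustrative embedding dimension
TASKS = ["AVE", "AVVP"]     # two of the four benchmark tasks

# Stage 1 (shallow): task-shared modality-aggregation adapter.
# One projection, shared by all tasks, fuses audio and visual tokens.
W_shared = rng.normal(size=(2 * D, D)) * 0.1

def aggregate(audio, visual):
    """Fuse per-token audio/visual features through the shared adapter."""
    return np.concatenate([audio, visual], axis=-1) @ W_shared

# Stage 2 (middle): task-specific but modality-shared dynamic generation.
# Each task owns a small generator mapping fused features to one prompt
# vector that is reused for both modalities (hypothetical form).
W_task = {t: rng.normal(size=(D, D)) * 0.1 for t in TASKS}

def dynamic_prompt(task, fused):
    return np.tanh(fused.mean(axis=0) @ W_task[task])

# Stage 3 (deep): task-specific, modality-independent prompts —
# one learned (here: fixed random) vector per (task, modality) pair.
deep_prompts = {(t, m): rng.normal(size=(D,)) * 0.1
                for t in TASKS for m in ("audio", "visual")}

# Forward pass for one task: prepend stage-2/3 prompts to each stream.
audio = rng.normal(size=(5, D))     # 5 audio tokens
visual = rng.normal(size=(5, D))    # 5 visual tokens
fused = aggregate(audio, visual)
task = "AVE"
p_mid = dynamic_prompt(task, fused)
audio_in = np.vstack([p_mid, deep_prompts[(task, "audio")], audio])
visual_in = np.vstack([p_mid, deep_prompts[(task, "visual")], visual])
print(audio_in.shape, visual_in.shape)  # (7, 16) (7, 16)
```

The point of the staging: `W_shared` is updated across tasks (plasticity with shared knowledge), while `W_task` and `deep_prompts` are frozen once their task is learned (homeostasis against forgetting).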
Problem

Research questions and friction points this paper is trying to address.

Enables continuous learning from multiple audio-visual tasks without joint training
Balances old task retention and new task learning with shared experiences
Enhances cross-task and cross-modal representation through progressive prompt tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive Homeostatic and Plastic prompt tuning
Task-shared modality aggregating adapter
Task-specific modality-independent prompts
Jiong Yin
Hangzhou Dianzi University, Institute of Computing Technology, Chinese Academy of Sciences
Liang Li
Institute of Computing Technology, Chinese Academy of Sciences
Jiehua Zhang
University of Oulu
Deep learning · Object detection · Model quantization
Yuhan Gao
Hangzhou Dianzi University
Chenggang Yan
Hangzhou Dianzi University
Xichun Sheng
Macao Polytechnic University