🤖 AI Summary
To address the generalization degradation caused by task interference and negative transfer in multi-task learning, this paper proposes Progressive Task-Specific Adaptation (PTSA). PTSA inserts lightweight adapter modules into a shared pre-trained backbone, sharing them across all tasks in the early layers and making them progressively more task-specific toward the later layers, and uses a gradient-similarity-based dynamic task clustering mechanism to adaptively allocate shared versus task-specific parameters, enabling parameter-efficient fine-tuning. The gradient-similarity measurement adds only minimal overhead to the training pipeline. Evaluated by adapting the Swin Transformer for dense prediction, PTSA achieves superior performance on the PASCAL-Context and NYUD-v2 multi-task benchmarks while using only 20% of the trainable parameters required by full fine-tuning, outperforming both the full fine-tuning baseline and existing state-of-the-art parameter-efficient multi-task methods. The approach simultaneously improves model efficiency, task decoupling, and cross-task generalization.
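Conceptually, the gradient-based task similarity compares how each task's loss gradient points in the space of shared parameters. Below is a minimal sketch of that idea in PyTorch, assuming per-task losses computed on a shared mini-batch; the function name `task_gradient_similarity` and the variable names are illustrative, not the paper's implementation. Task pairs with high cosine similarity would then be grouped onto the same shared adapters.

```python
import torch
import torch.nn.functional as F

def task_gradient_similarity(task_losses, shared_params):
    """Pairwise cosine similarity between per-task gradients on shared parameters.

    Illustrative sketch only; not the authors' code.
    """
    grads = []
    for loss in task_losses:
        # Gradient of this task's loss w.r.t. the shared (adapter) parameters.
        g = torch.autograd.grad(loss, shared_params,
                                retain_graph=True, allow_unused=True)
        # Flatten into one vector; parameters unused by this task contribute zeros.
        g = torch.cat([gi.reshape(-1) if gi is not None else torch.zeros_like(p).reshape(-1)
                       for gi, p in zip(g, shared_params)])
        grads.append(g)
    grads = torch.stack(grads)          # shape: (num_tasks, num_params)
    grads = F.normalize(grads, dim=1)   # unit-normalize each task's gradient
    return grads @ grads.T              # (num_tasks, num_tasks) similarity matrix
```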
📝 Abstract
Parameter-efficient fine-tuning methods have emerged as a promising solution for adapting pre-trained models to various downstream tasks. While these methods perform well in single-task learning, extending them to multi-task learning exacerbates common challenges, such as task interference and negative transfer, due to the limited number of trainable parameters. To address these issues, we introduce progressive task-specific multi-task adaptation, a novel parameter-efficient approach for multi-task learning. This approach introduces adapter modules into a pre-trained model such that these modules are shared across all tasks in the initial layers and become progressively more task-specific in the later layers. The motivation is to reduce conflicts among tasks by allowing transfer learning across all tasks in the initial layers and enabling task-specific learning toward the prediction heads. Additionally, we propose a gradient-based approach for computing task similarity and use this measure to allocate similar tasks to the shared adapter modules. Our task similarity method introduces minimal overhead in the pipeline. We evaluate our approach by adapting the Swin Transformer for dense prediction tasks. Experiments on the PASCAL and NYUD-v2 datasets demonstrate that our approach outperforms a fully fine-tuned multi-task model while requiring only one-fifth of the trainable parameters. This approach achieves better relative improvement over single-task fine-tuning while reducing the number of trainable parameters, and it surpasses the current state-of-the-art methods for parameter-efficient multi-task learning.
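To make the layer-wise sharing pattern concrete, here is a minimal sketch, assuming PyTorch, of adapters that are shared across all tasks in the early backbone stages and become task-specific in the later ones. The class names (`Adapter`, `ProgressiveAdapterStage`), the bottleneck adapter design, and the stage dimensions (chosen to mirror a Swin-Tiny-like backbone) are illustrative assumptions, not the authors' actual modules.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Lightweight bottleneck adapter with a residual connection (illustrative)."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.down = nn.Linear(dim, dim // reduction)
        self.up = nn.Linear(dim // reduction, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

class ProgressiveAdapterStage(nn.Module):
    """One backbone stage: a single adapter shared by all tasks if `share` is True,
    otherwise one adapter per task (task-specific)."""
    def __init__(self, dim, n_tasks, share):
        super().__init__()
        n_adapters = 1 if share else n_tasks
        self.adapters = nn.ModuleList([Adapter(dim) for _ in range(n_adapters)])
        self.share = share

    def forward(self, x, task_id):
        adapter = self.adapters[0] if self.share else self.adapters[task_id]
        return adapter(x)

# Early stages share adapters across tasks; later stages are task-specific,
# moving from transfer learning toward the task-specific prediction heads.
stages = nn.ModuleList([
    ProgressiveAdapterStage(dim=96,  n_tasks=4, share=True),   # shared
    ProgressiveAdapterStage(dim=192, n_tasks=4, share=True),   # shared
    ProgressiveAdapterStage(dim=384, n_tasks=4, share=False),  # task-specific
    ProgressiveAdapterStage(dim=768, n_tasks=4, share=False),  # task-specific
])
```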