Parameter-Efficient Multi-Task Learning via Progressive Task-Specific Adaptation

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the generalization degradation caused by task interference and negative transfer in multi-task learning, this paper proposes Progressive Task-Specific Adaptation (PTSA). PTSA introduces lightweight adapter modules atop a shared pre-trained backbone: the adapters are shared across all tasks in the early layers and become progressively more task-specific toward the prediction heads. A gradient-similarity-based task clustering mechanism adaptively allocates shared versus task-specific adapters, enabling parameter-efficient fine-tuning while adding minimal overhead to the training pipeline. Evaluated by adapting the Swin Transformer for dense prediction on the PASCAL-Context and NYUD-v2 multi-task benchmarks, PTSA outperforms both the fully fine-tuned multi-task baseline and existing state-of-the-art parameter-efficient methods while training only 20% of the parameters, simultaneously improving model efficiency, task decoupling, and cross-task generalization.

📝 Abstract
Parameter-efficient fine-tuning methods have emerged as a promising solution for adapting pre-trained models to various downstream tasks. While these methods perform well in single-task learning, extending them to multi-task learning exacerbates common challenges, such as task interference and negative transfer, due to the limited number of trainable parameters. To address these issues, we introduce progressive task-specific multi-task adaptation, a novel parameter-efficient approach for multi-task learning. This approach introduces adapter modules in a pre-trained model such that these modules are shared across all tasks in the initial layers and become progressively more task-specific in the later layers. The motivation is to reduce the conflicts among tasks by allowing transfer learning across all tasks in the initial layers and enabling task-specific learning toward the prediction heads. Additionally, we propose a gradient-based approach for computing task similarity and use this measure to allocate similar tasks to the shared adapter modules. Our task similarity method introduces minimal overhead in the pipeline. We evaluate our approach by adapting the Swin Transformer for dense prediction tasks. Experiments on the PASCAL and NYUD-v2 datasets demonstrate that our approach outperforms a fully fine-tuned multi-task model while requiring only one-fifth of the trainable parameters. It achieves a better relative improvement over single-task fine-tuning while reducing the number of trainable parameters, and it surpasses the current state-of-the-art methods for parameter-efficient multi-task learning.
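The gradient-based task similarity idea in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function names (`task_gradient_similarity`, `group_tasks`), the cosine measure over flattened per-task gradients, and the greedy union-find grouping with a zero threshold are all assumptions made for the sketch.

```python
import numpy as np

def task_gradient_similarity(task_grads):
    """Pairwise cosine similarity between per-task gradient vectors.

    task_grads: dict mapping task name -> flattened gradient (1-D array)
    w.r.t. the shared parameters. Hypothetical helper, not the paper's code.
    """
    names = sorted(task_grads)
    sims = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ga, gb = task_grads[a], task_grads[b]
            denom = np.linalg.norm(ga) * np.linalg.norm(gb) + 1e-12
            sims[(a, b)] = float(ga @ gb / denom)
    return sims

def group_tasks(sims, threshold=0.0):
    """Greedy grouping: tasks whose gradients align (similarity > threshold)
    share an adapter; misaligned tasks end up in separate groups.
    Implemented as union-find over positively aligned pairs."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for (a, b), s in sims.items():
        if s > threshold:
            parent[find(a)] = find(b)
    groups = {}
    for t in {t for pair in sims for t in pair}:
        groups.setdefault(find(t), []).append(t)
    return [sorted(g) for g in groups.values()]
```

For example, two dense-prediction tasks whose gradients point in similar directions would be assigned to the same shared adapter, while a task with conflicting gradients would receive its own.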
Problem

Research questions and friction points this paper is trying to address.

Addressing task interference and negative transfer in multi-task learning
Developing parameter-efficient adaptation for multiple downstream tasks
Optimizing task-specific learning while minimizing trainable parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive task-specific adapter modules in later layers
Gradient-based task similarity for shared adapter allocation
Parameter-efficient multi-task learning with minimal trainable parameters
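The progressive sharing scheme behind these contributions can be sketched as a simple layer-to-adapter assignment rule. The thirds-based thresholds and the function name `adapter_for` are illustrative choices for this sketch, not the paper's actual schedule.

```python
def adapter_for(task, layer, num_layers, clusters):
    """Which adapter a task uses at a given layer.

    Sketch of progressive task-specific adaptation: one adapter shared by all
    tasks in the first third of the network, one adapter per task cluster
    (from gradient similarity) in the middle third, and a fully task-specific
    adapter near the prediction heads. Thresholds are illustrative only.
    """
    frac = layer / num_layers
    if frac < 1 / 3:
        return "shared"
    if frac < 2 / 3:
        for i, group in enumerate(clusters):
            if task in group:
                return f"cluster{i}"
    return f"task:{task}"
```

Under this rule, parameters stay maximally shared where transfer helps most (early, generic features) and decouple only where task conflicts arise, which is how the approach keeps the trainable-parameter count low.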
Neeraj Gangwar
University of Illinois Urbana-Champaign
Anshuka Rangi
Applied Scientist at Amazon, PhD from University of California, San Diego
Online Learning, Reinforcement Learning, Distributed Testing, Distributed Estimation, Information Theory
Rishabh Deshmukh
Amazon
Holakou Rahmanian
Amazon
Yesh Dattatreya
Amazon
Nickvash Kani
University of Illinois Urbana-Champaign