TADFormer : Task-Adaptive Dynamic Transformer for Efficient Multi-Task Learning

📅 2025-01-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Full fine-tuning incurs prohibitive computational overhead in multi-task image recognition, while existing parameter-efficient fine-tuning (PEFT) methods struggle to capture fine-grained task distinctions. Method: This paper proposes TADFormer, a task-adaptive dynamic Transformer architecture that combines (1) lightweight task-conditioned prompt embeddings for parameter-efficient task adaptation and (2) a Dynamic Task Filter (DTF) that performs context-aware, input-conditioned, task-specific feature modulation. Contribution/Results: On the PASCAL-Context dense scene understanding benchmark, the method achieves higher accuracy than strong baselines while training only about 11.9% (i.e., 1/8.4) of the parameters required by full fine-tuning, and it consistently outperforms recent state-of-the-art PEFT methods across multiple metrics.

📝 Abstract
The transfer learning paradigm has driven substantial advances in various vision tasks. However, as state-of-the-art models continue to grow, classical full fine-tuning often becomes computationally impractical, particularly in the multi-task learning (MTL) setup, where training complexity increases in proportion to the number of tasks. Consequently, recent studies have explored Parameter-Efficient Fine-Tuning (PEFT) for MTL architectures. Despite some progress, these approaches still struggle to capture the fine-grained, task-specific features that are crucial to MTL. In this paper, we introduce the Task-Adaptive Dynamic transFormer, termed TADFormer, a novel PEFT framework that performs task-aware feature adaptation in a fine-grained manner by dynamically considering task-specific input contexts. TADFormer introduces parameter-efficient prompting for task adaptation and a Dynamic Task Filter (DTF) that captures task information conditioned on input contexts. Experiments on the PASCAL-Context benchmark demonstrate that the proposed method achieves higher accuracy on dense scene understanding tasks while reducing the number of trainable parameters by up to 8.4 times compared with full fine-tuning of MTL models. TADFormer also shows superior parameter efficiency and accuracy compared to recent PEFT methods.
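To make the idea of input-conditioned, task-specific filtering concrete, here is a minimal PyTorch sketch of a DTF-style module. This is an illustrative assumption about the mechanism, not the paper's actual code: all class and variable names (`DynamicTaskFilter`, `filter_gen`, `task_prompt`) are hypothetical, and the exact filter-generation design in TADFormer may differ.

```python
# Hedged sketch: a dynamic filter generated per sample and per task,
# conditioned on a learned task prompt plus pooled input context.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicTaskFilter(nn.Module):
    """Illustrative DTF-style module (names are assumptions, not the paper's).

    A small generator maps [task prompt ; global input context] to the
    weights of a depthwise convolution, so the filter applied to the
    features depends on both the task and the specific input image.
    """

    def __init__(self, channels: int, prompt_dim: int, kernel_size: int = 3):
        super().__init__()
        self.channels = channels
        self.kernel_size = kernel_size
        # Only this small generator is trained: it produces one depthwise
        # k x k kernel per channel, per sample.
        self.filter_gen = nn.Linear(
            prompt_dim + channels, channels * kernel_size * kernel_size
        )

    def forward(self, feats: torch.Tensor, task_prompt: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W); task_prompt: (prompt_dim,) for one task.
        b, c, h, w = feats.shape
        context = feats.mean(dim=(2, 3))                 # (B, C) global context
        prompt = task_prompt.unsqueeze(0).expand(b, -1)  # (B, prompt_dim)
        kernels = self.filter_gen(torch.cat([prompt, context], dim=1))
        kernels = kernels.view(b * c, 1, self.kernel_size, self.kernel_size)
        # Grouped conv trick: fold the batch into channels so each sample
        # is convolved with its own dynamically generated depthwise filter.
        out = F.conv2d(
            feats.reshape(1, b * c, h, w),
            kernels,
            padding=self.kernel_size // 2,
            groups=b * c,
        )
        return out.view(b, c, h, w)
```

In an MTL setup, one such module per task (each with its own prompt) would let a frozen shared backbone produce task-adapted features while keeping the number of trainable parameters small, which is the efficiency trade-off the abstract describes.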
Problem

Research questions and friction points this paper is trying to address.

Multi-task Learning
Parameter-Efficient Fine-Tuning (PEFT)
Image Recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

TADFormer
Dynamic Task Filter (DTF)
Parameter-Efficient Fine-Tuning