🤖 AI Summary
This work addresses the limitations of conventional full fine-tuning and existing parameter-efficient fine-tuning (PEFT) methods in multi-task learning, which either incur high computational costs or employ fixed low-rank adapters that fail to capture task- and layer-specific rank requirements and inter-task spatial relationships. To overcome these issues, the authors propose a frequency-aware, adaptive-rank multi-task fine-tuning framework. It dynamically allocates adapter ranks per task and layer through a performance-driven rank compression mechanism and introduces a task spectral pyramid decoder that integrates image frequency-domain features into spatial bias modeling to enhance cross-task collaboration. The method achieves significant performance gains over current PEFT approaches on dense visual tasks while reducing parameter counts by up to 9×.
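The summary mentions integrating image frequency-domain features into the decoder. The paper's actual TS-PD architecture is not described here; as a toy illustration of the underlying idea of decomposing an image into frequency bands, the NumPy sketch below splits an image into a small "spectral pyramid" via FFT masking. The function name, the radial band partition, and the band count are assumptions for illustration only.

```python
import numpy as np

def spectral_pyramid(image, num_bands=3):
    """Split a 2-D image into radial frequency bands via FFT masking.

    A rough stand-in for frequency-domain feature extraction; the bands
    sum back to the original image because the masks partition the
    shifted spectrum.
    """
    F = np.fft.fftshift(np.fft.fft2(image))           # centered spectrum
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)         # distance from DC
    edges = np.linspace(0.0, radius.max(), num_bands + 1)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (radius >= lo) & (radius < hi)
        if hi == edges[-1]:                           # close the last band
            mask |= radius == radius.max()
        band = np.fft.ifft2(np.fft.ifftshift(F * mask)).real
        bands.append(band)
    return bands                                      # low → high frequency
```

Because the FFT is linear and the masks are disjoint and exhaustive, summing the bands reconstructs the input exactly (up to floating-point error), so no information is lost by the decomposition.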
📄 Abstract
Adapting models pre-trained on large-scale datasets is a proven way to reach strong performance quickly on downstream tasks. However, the growing size of state-of-the-art models makes traditional full fine-tuning impractical, especially for multi-task learning (MTL), where cost scales with the number of tasks. As a result, recent studies investigate parameter-efficient fine-tuning (PEFT) using low-rank adaptation to significantly reduce the number of trainable parameters. However, these existing methods use a single, fixed rank, which may not be optimal for different tasks or positions in the MTL architecture. Moreover, these methods fail to learn spatial information that captures inter-task relationships and helps to improve diverse task predictions. This paper introduces Frequency-Aware and Automatic Rank (FAAR) for efficient MTL fine-tuning. Our method introduces Performance-Driven Rank Shrinking (PDRS) to allocate the optimal rank per adapter location and per task. Moreover, by analyzing the image frequency spectrum, FAAR proposes a Task-Spectral Pyramidal Decoder (TS-PD) that injects input-specific context into spatial bias learning to better reflect cross-task relationships. Experiments on dense visual task benchmarks show the superiority of our method in both accuracy and efficiency compared to other PEFT methods in MTL. FAAR reduces the number of parameters by up to 9 times compared to traditional MTL fine-tuning while improving overall performance. Our code is available.
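The abstract only names the PDRS mechanism; its actual procedure is not given here. As a minimal sketch of the general idea, assuming a LoRA-style adapter and a simple halving schedule, the code below shrinks an adapter's rank via SVD truncation while a task score stays within a tolerance of its current value. `LowRankAdapter`, `performance_driven_shrink`, and the halving loop are illustrative assumptions, not the paper's method.

```python
import copy
import numpy as np

class LowRankAdapter:
    """LoRA-style adapter: the effective weight update is B @ A, rank r."""
    def __init__(self, d_in, d_out, rank, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(scale=0.01, size=(rank, d_in))  # down-projection
        self.B = np.zeros((d_out, rank))                    # up-projection

    @property
    def rank(self):
        return self.A.shape[0]

    def delta(self):
        return self.B @ self.A

    def shrink(self, new_rank):
        """Truncate to the top-`new_rank` singular components of B @ A."""
        U, s, Vt = np.linalg.svd(self.delta(), full_matrices=False)
        k = min(new_rank, len(s))
        self.B = U[:, :k] * s[:k]
        self.A = Vt[:k]

def performance_driven_shrink(adapter, score_fn, tol=0.01, min_rank=1):
    """Halve the rank while the score drop stays within `tol` (assumed
    schedule; higher score_fn is better)."""
    base = score_fn(adapter)
    while adapter.rank > min_rank:
        trial = copy.deepcopy(adapter)
        trial.shrink(max(min_rank, adapter.rank // 2))
        if score_fn(trial) >= base - tol:
            adapter, base = trial, score_fn(trial)  # accept smaller rank
        else:
            break                                   # performance degraded
    return adapter
```

In this sketch `score_fn` stands in for whatever per-task validation metric drives the compression; applied per adapter location and per task, such a loop would yield the heterogeneous rank allocation the abstract describes.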