🤖 AI Summary
High fine-tuning costs of large vision foundation models and the suboptimal performance-efficiency trade-off of task-agnostic parameter-efficient fine-tuning (PEFT) methods motivate this work. We propose a task-aware parameter-and-token selection framework that jointly optimizes parameter updates and token retention based on task relevance. Specifically, we hierarchically identify critical trainable parameters via the Fisher Information Matrix (FIM), dynamically prune and merge semantically salient image tokens, and perform fine-grained, task-adaptive lightweight adaptation while freezing the backbone. Evaluated on FGVC and VTAB-1k benchmarks, our method surpasses full-parameter fine-tuning by 3.40% and 10.35%, respectively, achieving state-of-the-art accuracy. Crucially, it significantly reduces computational overhead and GPU memory consumption, demonstrating superior efficiency without sacrificing performance.
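The layer-wise FIM-based parameter selection described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it uses the common diagonal Fisher approximation (averaged squared gradients) and a gradient-mask trick to freeze non-selected parameters; function names and the `keep_ratio` parameter are our own.

```python
import torch
import torch.nn as nn

def fisher_scores(model, loss_fn, data_loader, n_batches=8):
    """Approximate the diagonal of the Fisher Information Matrix by
    averaging squared gradients over a few batches (a standard proxy)."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    seen = 0
    for i, (x, y) in enumerate(data_loader):
        if i >= n_batches:
            break
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.detach() ** 2
        seen += 1
    return {n: s / max(seen, 1) for n, s in scores.items()}

def select_task_relevant_params(model, scores, keep_ratio=0.1):
    """Per layer, keep only the top `keep_ratio` fraction of parameters
    trainable via a gradient mask; the rest are effectively frozen."""
    masks = {}
    for n, p in model.named_parameters():
        flat = scores[n].flatten()
        k = max(1, int(keep_ratio * flat.numel()))
        thresh = flat.topk(k).values.min()
        masks[n] = (scores[n] >= thresh).float()
        # Zero out gradients of non-selected parameters on every backward pass.
        p.register_hook(lambda g, m=masks[n]: g * m)
    return masks
```

In practice one would accumulate Fisher scores on the downstream task's training data before fine-tuning, so the mask reflects task relevance rather than pre-training statistics.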
📝 Abstract
Large pre-trained models achieve remarkable performance on vision tasks but are costly to fine-tune due to their high computational and storage demands. Parameter-Efficient Fine-Tuning (PEFT) methods mitigate this issue by updating only a subset of parameters; however, most existing approaches are task-agnostic and fail to fully exploit task-specific adaptations, leading to suboptimal efficiency and performance. To address this limitation, we propose Task-Relevant Parameter and Token Selection (TR-PTS), a task-driven framework that improves both computational efficiency and accuracy. Specifically, we introduce Task-Relevant Parameter Selection, which uses the Fisher Information Matrix (FIM) to identify and fine-tune only the most informative parameters in a layer-wise manner while keeping the rest frozen. Simultaneously, Task-Relevant Token Selection dynamically preserves the most informative tokens and merges redundant ones, reducing computational overhead. By jointly optimizing parameters and tokens, TR-PTS enables the model to concentrate on task-discriminative information. We evaluate TR-PTS on benchmarks including FGVC and VTAB-1k, where it achieves state-of-the-art performance, surpassing full fine-tuning by 3.40% and 10.35%, respectively. The code is available at https://github.com/synbol/TR-PTS.
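The token-selection step ("preserve the most informative tokens and merge redundant ones") can be sketched as below. This is an illustrative simplification, not the paper's exact rule: we assume a per-token saliency score (e.g. [CLS] attention), keep the top fraction, and merge the remainder into one saliency-weighted token; the function name and `keep_ratio` are our own.

```python
import torch

def prune_and_merge_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the most salient image tokens and merge the rest into a
    single score-weighted token, so discarded tokens still contribute.

    tokens: (B, N, D) token embeddings
    scores: (B, N) per-token saliency (higher = more task-relevant)
    Returns: (B, k+1, D) with k = keep_ratio * N kept tokens + 1 merged.
    """
    B, N, D = tokens.shape
    k = max(1, int(keep_ratio * N))
    idx = scores.topk(k, dim=1).indices                           # (B, k)
    keep = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, D))
    # Mask out the kept tokens, then merge the remainder weighted by saliency.
    mask = torch.ones(B, N, dtype=torch.bool, device=tokens.device)
    mask.scatter_(1, idx, False)
    w = scores * mask
    w = w / w.sum(dim=1, keepdim=True).clamp_min(1e-6)
    merged = (tokens * w.unsqueeze(-1)).sum(dim=1, keepdim=True)  # (B, 1, D)
    return torch.cat([keep, merged], dim=1)
```

Reducing N tokens to k+1 shrinks the quadratic attention cost in every subsequent transformer layer, which is the source of the reported compute and memory savings.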