🤖 AI Summary
This work proposes FreqAdapter, a parameter-efficient fine-tuning method that operates in the frequency domain. It targets two limitations of existing spatial-domain approaches: they often introduce information redundancy, and they struggle to capture multi-scale features. FreqAdapter introduces what the authors describe as the first text-guided multi-scale adaptation mechanism in the frequency domain, optimizing receptive fields across different frequency bands to enhance representational capacity while keeping parameter overhead extremely low. By moving adaptation from the spatial to the spectral domain, the method achieves significant performance gains on multimodal models such as CLIP and LLaVA and converges within a single training epoch.
📝 Abstract
Parameter-efficient fine-tuning methods introduce a small number of trainable parameters, enabling pre-trained models to adapt rapidly to new data distributions. While these methods have shown promising results, they have two notable limitations. First, most operate in the spatial (signal) domain, which results in substantial information redundancy. Second, most rely on fixed prompts or adaptation layers and therefore fail to account fully for the multi-scale characteristics of signals. To address these limitations, we propose the Multi-Scale Frequency Adapter (FreqAdapter), which integrates textual information and fine-tunes signals at multiple scales in the frequency domain. We further introduce a multi-scale adaptation strategy that optimizes receptive fields across different frequency ranges, enhancing the model's representational capacity. Extensive experiments on multimodal models, including CLIP and LLaVA, demonstrate that FreqAdapter significantly improves both performance and efficiency, at minimal cost and with convergence within one epoch. Code is available at https://github.com/Kelvin-ywc/FreqAdapter.
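To make the idea of band-wise adaptation in the frequency domain concrete, the following is a minimal NumPy sketch. It is not the authors' implementation, which is in the repository linked above. It assumes a single 2-D feature map, concentric radial frequency bands, and one scalar gain per band produced from a text embedding by a linear projection. The names `radial_band_masks`, `text_to_gains`, and `freq_adapt`, as well as the band layout and the `tanh` squashing, are hypothetical choices made for illustration.

```python
import numpy as np


def radial_band_masks(h, w, num_bands):
    """Partition the 2-D frequency plane into concentric rings (low to high)."""
    fy = np.fft.fftfreq(h)[:, None]          # vertical frequencies, shape (h, 1)
    fx = np.fft.fftfreq(w)[None, :]          # horizontal frequencies, shape (1, w)
    radius = np.sqrt(fy ** 2 + fx ** 2)      # distance from the DC component
    edges = np.linspace(0.0, radius.max(), num_bands + 1)
    masks = []
    for i in range(num_bands):
        lo, hi = edges[i], edges[i + 1]
        if i == num_bands - 1:
            mask = (radius >= lo) & (radius <= hi)   # last ring includes the edge
        else:
            mask = (radius >= lo) & (radius < hi)
        masks.append(mask.astype(np.float64))
    return np.stack(masks)                   # shape (num_bands, h, w)


def text_to_gains(text_embedding, projection):
    """Hypothetical text guidance: map a text embedding to one gain per band."""
    return np.tanh(text_embedding @ projection)      # shape (num_bands,)


def freq_adapt(feature_map, band_gains):
    """Rescale each frequency band of a real (h, w) map by (1 + gain)."""
    band_gains = np.asarray(band_gains, dtype=np.float64)
    h, w = feature_map.shape
    masks = radial_band_masks(h, w, len(band_gains))
    spectrum = np.fft.fft2(feature_map)
    # Combine per-band gains into one (h, w) multiplier on the spectrum.
    scale = 1.0 + np.tensordot(band_gains, masks, axes=1)
    # The multiplier is radially symmetric, so the inverse transform of a
    # real input stays real up to floating-point error.
    return np.fft.ifft2(spectrum * scale).real


# Usage: a zero-initialized projection leaves the features unchanged,
# so adaptation starts from the pre-trained behaviour.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8))
text_embedding = rng.standard_normal(16)
projection = np.zeros((16, 3))               # the only trainable parameters here
adapted = freq_adapt(features, text_to_gains(text_embedding, projection))
```

In this sketch the only trainable parameters are the entries of `projection`, which is one way the parameter count of a frequency-domain adapter can stay small. Using several band layouts with different numbers of rings would be one possible reading of "multi-scale", but that is an assumption rather than something the abstract specifies.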