1%>100%: High-Efficiency Visual Adapter with Complex Linear Projection Optimization

📅 2026-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high cost of fine-tuning vision foundation models and the limited effectiveness of existing delta-tuning methods in visual tasks. To this end, the authors propose CoLin, a low-rank complex-valued adapter that introduces only approximately 1% trainable parameters. CoLin leverages complex linear projection optimization together with a tailored loss function to enable efficient model adaptation. Theoretical analysis elucidates the convergence properties of low-rank complex matrices and informs the proposed solution. Extensive experiments across diverse benchmarks—including image classification, object detection, segmentation, and rotated object detection in remote sensing—demonstrate that CoLin significantly outperforms both full fine-tuning and state-of-the-art parameter-efficient fine-tuning approaches, achieving superior performance for the first time under extremely low trainable parameter budgets.
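To make the idea of a "low-rank complex-valued adapter" concrete, the toy sketch below applies a rank-r complex update on top of a frozen real feature vector and takes the real part of the result. It is purely illustrative: the residual form, the factor shapes, and the choice to keep the real part are assumptions for exposition, not the authors' actual CoLin design.

```python
# Illustrative sketch of a residual low-rank complex adapter on a real
# feature vector x. NOT the paper's implementation; shapes and the
# real-part projection are assumptions made for this toy example.

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector; works for complex entries."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def complex_lowrank_adapt(x, A, B):
    """y = x + Re(B @ (A @ x)): a rank-r complex correction added to the
    frozen backbone feature x."""
    z = matvec(A, x)      # down-project into an r-dimensional complex space
    delta = matvec(B, z)  # up-project back to the feature dimension
    return [xi + d.real for xi, d in zip(x, delta)]

# Tiny example: feature dimension d = 3, rank r = 1.
x = [1.0, 2.0, 3.0]
A = [[1 + 1j, 0j, 0j]]    # (r x d) complex down-projection
B = [[0.5j], [0j], [0j]]  # (d x r) complex up-projection
y = complex_lowrank_adapt(x, A, B)
```

Only `A` and `B` (2·d·r complex entries) would be trained, while the backbone producing `x` stays frozen, which is what keeps the trainable-parameter budget small.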

📝 Abstract
Deploying vision foundation models typically relies on efficient adaptation strategies, as conventional full fine-tuning suffers from prohibitive costs and low efficiency. While delta-tuning has proven effective in boosting the performance and efficiency of LLMs during adaptation, its advantages cannot be directly transferred to the fine-tuning pipeline of vision foundation models. To push the boundaries of adaptation efficiency for vision tasks, we propose an adapter with Complex Linear Projection Optimization (CoLin). For architecture, we design a novel low-rank complex adapter that introduces only about 1% additional parameters to the backbone. For efficiency, we theoretically prove that low-rank composite matrices suffer from severe convergence issues during training, and address this challenge with a tailored loss. Extensive experiments on object detection, segmentation, image classification, and rotated object detection (remote sensing scenario) demonstrate that CoLin outperforms both full fine-tuning and classical delta-tuning approaches with merely 1% of the parameters for the first time, providing a novel and efficient solution for deployment of vision foundation models. We release the code at https://github.com/DongshuoYin/CoLin.
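The "about 1% parameters" budget claimed above can be sanity-checked with simple arithmetic: a dense d×d layer versus two complex low-rank factors, each complex entry storing a real and an imaginary part. The hidden size and rank below are hypothetical values chosen for illustration, not the paper's configuration.

```python
# Back-of-the-envelope parameter budget for a low-rank complex adapter.
# d = 768 (a common ViT hidden size) and r = 2 are illustrative
# assumptions, not CoLin's actual settings.

def full_linear_params(d_in, d_out):
    """Parameters of a dense real linear layer (weights + bias)."""
    return d_in * d_out + d_out

def complex_lowrank_params(d_in, d_out, rank):
    """Complex factors A (d_in x r) and B (r x d_out); each complex
    entry contributes a real and an imaginary parameter."""
    return 2 * (d_in * rank + rank * d_out)

d, r = 768, 2
full = full_linear_params(d, d)
adapter = complex_lowrank_params(d, d, r)
print(f"adapter/full = {adapter / full:.3%}")  # roughly 1% for this rank
```

Even at rank 2 the complex factors cost only a few thousand parameters per layer, which is how an adapter of this shape can stay near a 1% trainable-parameter budget relative to the backbone.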
Problem

Research questions and friction points this paper is trying to address.

vision foundation models
efficient adaptation
parameter-efficient fine-tuning
adapter
low-rank optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

visual adapter
low-rank adaptation
complex linear projection
efficient fine-tuning
vision foundation models
Dongshuo Yin
BNRist, Department of Computer Science and Technology, Tsinghua University, Beijing, China
Xue Yang
School of Automation and Intelligent Sensing, Shanghai Jiao Tong University, Shanghai, China
Deng-Ping Fan
Nankai International Advanced Research Institute (Shenzhen Futian) & SLAI, Shenzhen, China
Shi-Min Hu
Tsinghua University
Geometry Processing · Geometric Modeling · Computer Graphics · Image and Video Processing · Computer-Aided Design