One Head Eight Arms: Block Matrix based Low Rank Adaptation for CLIP-based Few-Shot Learning

๐Ÿ“… 2025-01-28
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address the high parameter count and computational overhead of fine-tuning CLIP for few-shot learning, this paper proposes Block-LoRA, a block-wise low-rank adaptation framework. Its core ideas are block-wise low-rank matrix decomposition and sharing the down-projection matrix across blocks, which turns certain matrix multiplications into simple additions. The authors theoretically establish that Block-LoRA achieves a tighter generalization error bound than standard LoRA. To their knowledge, it is the first method enabling full ImageNet few-shot fine-tuning on a single 24GB GPU. Block-LoRA retains state-of-the-art (SOTA) accuracy while reducing trainable parameters by approximately 60% and significantly lowering GPU memory consumption and computational cost, achieving an effective balance between efficiency and practicality.

๐Ÿ“ Abstract
Recent advancements in fine-tuning Vision-Language Foundation Models (VLMs) have garnered significant attention for their effectiveness in downstream few-shot learning tasks. While these recent approaches exhibit some performance improvements, they often suffer from excessive training parameters and high computational costs. To address these challenges, we propose a novel block matrix-based low-rank adaptation framework, called Block-LoRA, for fine-tuning VLMs on downstream few-shot tasks. Inspired by recent work on Low-Rank Adaptation (LoRA), Block-LoRA partitions the original low-rank decomposition matrix of LoRA into a series of sub-matrices while sharing all down-projection sub-matrices. This structure not only reduces the number of training parameters, but also transforms certain complex matrix multiplication operations into simpler matrix additions, significantly lowering the computational cost of fine-tuning. Notably, Block-LoRA enables fine-tuning CLIP on the ImageNet few-shot benchmark using a single 24GB GPU. We also show that Block-LoRA has a tighter generalization error bound than vanilla LoRA. Without bells and whistles, extensive experiments demonstrate that Block-LoRA achieves competitive performance compared to state-of-the-art CLIP-based few-shot methods, while maintaining a low trainable parameter count and reduced computational overhead.
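The block structure described in the abstract can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: dimension names (`d`, `r`, `k`) and the specific initialization are assumptions. It shows how sharing one down-projection `A_shared` across `k` up-projection sub-blocks lets the `k` block products collapse into additions followed by a single multiply, and how the trainable parameter count drops relative to vanilla LoRA.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, k = 64, 8, 4        # feature dim, total LoRA rank, number of blocks (illustrative values)
rb = r // k               # rank of each sub-block

# Vanilla LoRA: delta_W = B @ A, with B in R^{d x r}, A in R^{r x d}
# -> 2 * d * r trainable parameters.
# Block-LoRA sketch: one shared down-projection A_shared plus k
# up-projection sub-blocks B_i (zero-initialized, as is standard for LoRA's B).
A_shared = rng.normal(size=(rb, d)) * 0.01
B_blocks = [np.zeros((d, rb)) for _ in range(k)]

def block_lora_delta(B_blocks, A_shared):
    # Because A_shared is common to every block,
    #   sum_i (B_i @ A_shared) == (sum_i B_i) @ A_shared,
    # so k matrix multiplications collapse into k-1 matrix
    # additions and a single multiplication.
    return sum(B_blocks) @ A_shared

x = rng.normal(size=(d,))
delta_W = block_lora_delta(B_blocks, A_shared)
y_adapter = delta_W @ x   # added to the frozen layer's output W @ x

# Trainable-parameter comparison (counts, not a claim about the paper's exact savings):
lora_params  = 2 * d * r            # B and A
block_params = rb * d + k * d * rb  # shared A_shared + k sub-blocks
print(lora_params, block_params)    # block_params < lora_params whenever k > 1
```

With these toy sizes the shared down-projection cuts the adapter's parameters from `2dr` to `dr(1 + 1/k)`; the roughly 60% reduction reported above additionally depends on where and at what ranks the adapters are inserted in CLIP.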
Problem

Research questions and friction points this paper is trying to address.

Few-shot Learning
Parameter Efficiency
Computational Cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Block-LoRA
CLIP-based Models
Low-resource Fine-tuning
๐Ÿ”Ž Similar Papers
No similar papers found.
Chunpeng Zhou
Zhejiang University
Qianqian Shen
Zhejiang University
Medical Image Analysis · Computer Vision
Zhi Yu
Zhejiang University
Jiajun Bu
Zhejiang University
Haishuai Wang
Harvard University
Data Mining · Machine Learning