MSLoRA: Multi-Scale Low-Rank Adaptation via Attention Reweighting

📅 2025-11-15
🤖 AI Summary
Existing low-rank adaptation methods are predominantly designed for Vision Transformers (ViTs) and generalize poorly across architectures, especially to Convolutional Neural Networks (CNNs). To address this limitation, we propose MSLoRA, the first parameter-efficient backbone adapter unifying support for both CNNs and ViTs. While keeping the backbone frozen, MSLoRA couples low-rank linear projections with multi-scale nonlinear transformations, jointly modulating spatial and channel attention to achieve lightweight feature reweighting. Its core innovation is an attention-modulation mechanism driven by pointwise-multiplication fusion and a residual connection, operating without updating pretrained weights. Evaluated on classification, detection, and segmentation tasks, MSLoRA introduces fewer than 5% additional backbone parameters yet delivers significant performance gains, faster convergence, and improved optimization stability, demonstrating strong cross-architecture generalization.

📝 Abstract
We introduce MSLoRA, a backbone-agnostic, parameter-efficient adapter that reweights feature responses rather than re-tuning the underlying backbone. Existing low-rank adaptation methods are mostly confined to vision transformers (ViTs) and struggle to generalize across architectures. MSLoRA unifies adaptation for both convolutional neural networks (CNNs) and ViTs by combining a low-rank linear projection with a multi-scale nonlinear transformation that jointly modulates spatial and channel attention. The two components are fused through pointwise multiplication and a residual connection, yielding a lightweight module that shifts feature attention while keeping pretrained weights frozen. Extensive experiments demonstrate that MSLoRA consistently improves transfer performance on classification, detection, and segmentation tasks with fewer than 5% of backbone parameters. The design further enables stable optimization, fast convergence, and strong cross-architecture generalization. By reweighting rather than re-tuning, MSLoRA provides a simple and universal approach for efficient adaptation of frozen vision backbones.
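The abstract's mechanism — a low-rank channel projection and a multi-scale spatial transformation, fused by pointwise multiplication and added back through a residual connection — can be sketched as below. This is a minimal NumPy illustration under assumed shapes and gating choices (sigmoid gates, two pooling scales, the function name `mslora_reweight`); it is not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mslora_reweight(x, A, B):
    """Hypothetical MSLoRA-style feature reweighting.

    x: features (C, H, W) from a frozen backbone layer (H, W even here).
    A: (C, r) and B: (r, C) low-rank factors with rank r << C.
    Returns reweighted features of the same shape; x itself is unchanged,
    mimicking adaptation without updating pretrained weights.
    """
    C, H, W = x.shape
    # Channel descriptor via global average pooling: shape (C,)
    desc = x.mean(axis=(1, 2))
    # Low-rank linear projection -> channel attention gate in (0, 1)
    channel_gate = sigmoid(desc @ A @ B)                      # (C,)
    # Multi-scale spatial transformation: channel-mean map at two scales
    fine = x.mean(axis=0)                                     # (H, W)
    pooled = fine.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))
    coarse = np.repeat(np.repeat(pooled, 2, axis=0), 2, axis=1)
    spatial_gate = sigmoid(0.5 * (fine + coarse))             # (H, W)
    # Pointwise-multiplication fusion of the two gates, then residual add
    modulation = channel_gate[:, None, None] * spatial_gate[None, :, :]
    return x + x * modulation
```

Because both gates lie in (0, 1), the residual form only rescales responses (for nonnegative inputs, each output value stays between x and 2x), which matches the paper's framing of reweighting rather than re-tuning.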
Problem

Research questions and friction points this paper is trying to address.

Unifying adaptation for CNNs and ViTs across architectures
Reweighting feature attention while keeping pretrained weights frozen
Achieving parameter-efficient transfer learning with under 5% additional backbone parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reweights feature responses via attention modulation
Combines low-rank projection with multi-scale transformation
Unifies adaptation for both CNNs and vision transformers
Xu Yang
Department of Computer Science, Illinois Institute of Technology
Gady Agam