MSLoRA: Multi-Scale Low-Rank Adaptation via Attention Reweighting

📅 2025-11-15
🤖 AI Summary
Existing low-rank adaptation methods are predominantly designed for Vision Transformers (ViTs) and generalize poorly across architectures, especially to Convolutional Neural Networks (CNNs). To address this limitation, we propose MSLoRA, the first parameter-efficient backbone adapter unifying support for both CNNs and ViTs. While keeping the backbone frozen, MSLoRA couples low-rank linear projections with multi-scale nonlinear transformations, jointly modulating spatial and channel attention to achieve lightweight feature reweighting. Its core innovation is an attention-modulation mechanism driven by pointwise-multiplication fusion and a residual connection, operating without updating pretrained weights. Evaluated on classification, detection, and segmentation tasks, MSLoRA introduces fewer than 5% additional backbone parameters yet delivers significant performance gains, faster convergence, and improved optimization stability, demonstrating strong cross-architecture generalization.

📝 Abstract
We introduce MSLoRA, a backbone-agnostic, parameter-efficient adapter that reweights feature responses rather than re-tuning the underlying backbone. Existing low-rank adaptation methods are mostly confined to vision transformers (ViTs) and struggle to generalize across architectures. MSLoRA unifies adaptation for both convolutional neural networks (CNNs) and ViTs by combining a low-rank linear projection with a multi-scale nonlinear transformation that jointly modulates spatial and channel attention. The two components are fused through pointwise multiplication and a residual connection, yielding a lightweight module that shifts feature attention while keeping pretrained weights frozen. Extensive experiments demonstrate that MSLoRA consistently improves transfer performance on classification, detection, and segmentation tasks with fewer than 5% of backbone parameters. The design further enables stable optimization, fast convergence, and strong cross-architecture generalization. By reweighting rather than re-tuning, MSLoRA provides a simple and universal approach for efficient adaptation of frozen vision backbones.
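The abstract's mechanism — a low-rank channel projection and a multi-scale spatial transformation, fused by pointwise multiplication and added back through a residual connection — can be sketched as below. This is a minimal NumPy illustration under assumed shapes and gating choices (sigmoid gates, two pooling scales, the function name `mslora_reweight`); it is not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mslora_reweight(x, A, B):
    """Hypothetical MSLoRA-style feature reweighting.

    x: features (C, H, W) from a frozen backbone layer (H, W even here).
    A: (C, r) and B: (r, C) low-rank factors with rank r << C.
    Returns reweighted features of the same shape; x itself is unchanged,
    mimicking adaptation without updating pretrained weights.
    """
    C, H, W = x.shape
    # Channel descriptor via global average pooling: shape (C,)
    desc = x.mean(axis=(1, 2))
    # Low-rank linear projection -> channel attention gate in (0, 1)
    channel_gate = sigmoid(desc @ A @ B)                      # (C,)
    # Multi-scale spatial transformation: channel-mean map at two scales
    fine = x.mean(axis=0)                                     # (H, W)
    pooled = fine.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))
    coarse = np.repeat(np.repeat(pooled, 2, axis=0), 2, axis=1)
    spatial_gate = sigmoid(0.5 * (fine + coarse))             # (H, W)
    # Pointwise-multiplication fusion of the two gates, then residual add
    modulation = channel_gate[:, None, None] * spatial_gate[None, :, :]
    return x + x * modulation
```

Because both gates lie in (0, 1), the residual form only rescales responses (for nonnegative inputs, each output value stays between x and 2x), which matches the paper's framing of reweighting rather than re-tuning.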
Problem

Research questions and friction points this paper is trying to address.

Unifying adaptation for CNNs and ViTs across architectures
Reweighting feature attention while keeping pretrained weights frozen
Achieving parameter-efficient transfer learning with under 5% additional backbone parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reweights feature responses via attention modulation
Combines low-rank projection with multi-scale transformation
Unifies adaptation for both CNNs and vision transformers
Xu Yang
Department of Computer Science, Illinois Institute of Technology
Gady Agam