🤖 AI Summary
Existing parameter-efficient fine-tuning methods, such as LoRA, rely on local low-rank weight perturbations and struggle to optimize representations consistently across the whole network. This work proposes a centralized adaptation framework that moves adaptation from weight space to layer space: a depth-shared shadow module maintains a shadow state in parallel with the Transformer layers and evolves it to refine the hidden states. The shadow network is pretrainable, reusable across layers, and decoupled from the backbone, combining low-rank substitution with optional detached inference. Under comparable trainable-parameter budgets, the method matches or outperforms LoRA and DoRA on both generation and understanding tasks, and additional analyses point to benefits in transferability, scalability, and edge deployment efficiency.
📝 Abstract
Parameter-efficient fine-tuning (PEFT) reduces the training cost of full-parameter fine-tuning for large language models (LLMs) by training only a small set of task-specific parameters while freezing the pretrained backbone. However, existing approaches, such as Low-Rank Adaptation (LoRA), achieve adaptation by inserting independent low-rank perturbations directly into individual weights, resulting in a local parameterization of adaptation. We propose ShadowPEFT, a centralized PEFT framework that instead performs layer-level refinement through a depth-shared shadow module. At each Transformer layer, ShadowPEFT maintains a parallel shadow state and evolves it repeatedly to produce progressively richer hidden states. This design shifts adaptation from distributed weight-space perturbations to a shared layer-space refinement process. Because the shadow module is decoupled from the backbone, it can be reused across depth, pretrained independently, and optionally deployed in a detached mode, which benefits edge computing scenarios. Experiments on generation and understanding benchmarks show that ShadowPEFT matches or outperforms LoRA and DoRA under comparable trainable-parameter budgets. Additional analyses on shadow pretraining, cross-dataset transfer, parameter scaling, inference latency, and system-level evaluation suggest that centralized layer-space adaptation is a competitive and flexible alternative to conventional low-rank PEFT.
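The contrast between per-weight perturbations and a depth-shared module can be illustrated with a toy sketch. The abstract does not specify the shadow module's internals, so everything below is an assumption made for illustration: the `SharedShadow` class, its bottleneck shape, the zero-initialized up projection, the additive way the shadow state refines the hidden state, and the `tanh` stand-in for a frozen Transformer layer are all hypothetical and are not the paper's implementation. Only `lora_param_count` reflects a standard fact, namely that LoRA adds a `rank x d_in` and a `d_out x rank` matrix per adapted weight.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian matrix, used for both frozen and trainable weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def lora_param_count(num_layers, adapted_per_layer, d_in, d_out, rank):
    """Standard LoRA count: each adapted weight gets A (rank x d_in) and B (d_out x rank)."""
    return num_layers * adapted_per_layer * rank * (d_in + d_out)


class SharedShadow:
    """Hypothetical depth-shared module: ONE bottleneck reused at every layer."""

    def __init__(self, d, r, rng):
        # Assumption: the module reads the concatenation [hidden; shadow].
        self.down = random_matrix(r, 2 * d, rng)
        # Assumption: zero-initialized up projection, so the untrained
        # module leaves the frozen backbone's output unchanged.
        self.up = [[0.0] * r for _ in range(d)]

    def num_params(self):
        return sum(len(row) for row in self.down) + sum(len(row) for row in self.up)

    def step(self, hidden, shadow):
        """Evolve the shadow state once, using the current hidden state."""
        z = [max(0.0, v) for v in matvec(self.down, hidden + shadow)]  # ReLU bottleneck
        delta = matvec(self.up, z)
        return [s + dv for s, dv in zip(shadow, delta)]


def backbone_forward(frozen_layers, x):
    """Frozen backbone only; tanh(Wx) stands in for a Transformer layer."""
    hidden = x
    for W in frozen_layers:
        hidden = [math.tanh(v) for v in matvec(W, hidden)]
    return hidden


def shadow_forward(frozen_layers, shadow_module, x):
    """Backbone plus a parallel shadow state evolved by the same module at each layer."""
    hidden = x
    shadow = [0.0] * len(x)
    for W in frozen_layers:
        hidden = [math.tanh(v) for v in matvec(W, hidden)]   # frozen layer
        shadow = shadow_module.step(hidden, shadow)           # shared refinement step
        hidden = [h + s for h, s in zip(hidden, shadow)]      # assumed additive refinement
    return hidden


rng = random.Random(0)
d, r, depth = 8, 2, 6
layers = [random_matrix(d, d, rng, scale=0.5) for _ in range(depth)]
shadow = SharedShadow(d, r, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]

print("shared shadow params:", shadow.num_params())
print("LoRA params at this depth:", lora_param_count(depth, 1, d, d, r))
```

In this toy setup the shared module's parameter count does not depend on `depth`, whereas the LoRA count grows linearly with the number of adapted layers. Because the module holds no backbone weights, it could also be stored and loaded separately from the backbone, which is one plausible reading of the detached deployment mode described in the abstract.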