🤖 AI Summary
To address the limited generalization and efficiency of parameter-efficient fine-tuning (PEFT) on multimodal tasks, this paper proposes Synapse and Neuron (SAN), a PEFT framework that formally incorporates the neuroscientific long-term potentiation/depression (LTP/LTD) mechanisms, establishing a theoretical mapping between low-rank updates and synaptic plasticity. SAN introduces a neuron–synapse collaborative decomposition: it decomposes scaling components from anterior feature-adjusting vectors and propagates them toward posterior weight matrices in a learnable fashion, integrated with neural-dynamics-based gating and multi-stage gradient modulation. Evaluated on 25 vision, 8 commonsense-reasoning, and 7 vision-language datasets, SAN consistently outperforms full fine-tuning (by up to 2.4%–8.7%) and LoRA (by up to 1.9%–4.7%), achieving superior parameter efficiency and cross-modal generalization.
📝 Abstract
Advances in Parameter-Efficient Fine-Tuning (PEFT) have bridged the performance gap with Full Fine-Tuning (FFT) through sophisticated analysis of pre-trained parameter spaces. Drawing insights from Neural Engrams (NE) in Biological Neural Networks (BNNs), we establish a connection between the low-rank property observed during PEFT's parameter-space shifting and neurobiological mechanisms. This observation leads to our proposed method, Synapse and Neuron (SAN), which decomposes scaling components from anterior feature-adjusting vectors and propagates them toward posterior weight matrices. Our approach is theoretically grounded in the Long-Term Potentiation/Depression (LTP/LTD) phenomena, which govern synapse development through modulation of neurotransmitter release. Extensive experiments demonstrate its effectiveness: on **vision tasks** across VTAB, FGVC, and GIC (25 datasets) using ViT, SwinT, and ConvNeXt, SAN outperforms FFT by up to 8.7% and LoRA by 3.2%; on **language tasks** using Commonsense Reasoning (8 datasets) with LLaMA models (all generations), it surpasses ChatGPT by up to 8.5% and LoRA by 4.7%; on **visual-language tasks** using Mixed Visual Instruction (7 datasets) with LLaVA models, it exceeds FFT by up to 2.4% and LoRA by 1.9%. Our code and W&B logs will be released at https://github.com/daviddaiiiii/SAN-PEFT
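The core decompose-and-propagate idea can be illustrated with a minimal numpy sketch. This is a hypothetical toy (variable names and structure are our own, not the paper's implementation): a learned per-feature scaling vector adjusts the features entering a layer, and because `W @ (s * x) == (W * s) @ x`, that anterior scaling can be propagated into (absorbed by) the posterior weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 4, 3
W = rng.standard_normal((d_out, d_in))   # frozen pre-trained weight matrix
s = rng.standard_normal(d_in)            # learned anterior feature-adjusting vector
x = rng.standard_normal(d_in)            # input features

# Apply the scaling to the anterior features, then the posterior weights.
y_scaled_features = W @ (s * x)

# Propagate the scaling into the posterior weight matrix instead:
# broadcasting s over W scales each column of W.
W_propagated = W * s
y_propagated = W_propagated @ x

# The two forward passes are mathematically equivalent.
assert np.allclose(y_scaled_features, y_propagated)
```

This identity is only the starting point; the paper's full method additionally makes the propagation learnable rather than a fixed algebraic absorption.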