🤖 AI Summary
This work addresses the limited contextual awareness of existing parameter-efficient fine-tuning methods in multi-task adaptation, which often leads to task interference. Inspired by biological neuromodulation, the authors propose a Mixture-of-Experts LoRA framework that employs a lightweight, context-aware gating mechanism to dynamically select expert subspaces, together with a contrastive orthogonality loss that enforces subspace disentanglement. By combining low-rank adaptation, sparse projection, and neuromodulatory gating, the method outperforms strong baselines such as FlyLoRA on MMLU, GSM8K, and ScienceQA, achieving superior performance across single-task learning, multi-task model merging, and continual learning while maintaining high parameter efficiency.
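The gating idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the shapes, the sigmoid gate, the contiguous per-expert slicing of the projected space, and the energy-based scoring are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_proj, n_experts, top_k = 16, 8, 4, 2

# Frozen sparse random projection (FlyLoRA-style; kept fixed during training).
W_proj = rng.standard_normal((d_model, d_proj))
W_proj *= (rng.random((d_model, d_proj)) < 0.1)  # sparsify the projection

# Lightweight learnable gate: rescales the projected features
# conditioned on the input, before expert routing (hypothetical form).
W_gate = rng.standard_normal((d_model, d_proj)) * 0.01

def route(x):
    """Return the indices of the top-k experts selected for input x."""
    h = x @ W_proj                            # frozen random projection
    g = 1.0 / (1.0 + np.exp(-(x @ W_gate)))   # sigmoid gate in (0, 1)
    h_mod = g * h                             # context-dependent rescaling
    # Assume each expert owns a contiguous slice of the projected space
    # and score experts by the energy of their slice (illustrative choice).
    slices = h_mod.reshape(n_experts, d_proj // n_experts)
    energy = (slices ** 2).sum(axis=1)
    return np.argsort(energy)[-top_k:]

x = rng.standard_normal(d_model)
print(sorted(route(x).tolist()))
```

The key contrast with magnitude-based routing is that `W_gate` makes the expert scores a function of the input, while `W_proj` stays frozen, preserving the efficiency of fixed random projections.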
📝 Abstract
Parameter-Efficient Fine-Tuning (PEFT) techniques, particularly Low-Rank Adaptation (LoRA), have become essential for adapting Large Language Models (LLMs) to downstream tasks. While the recent FlyLoRA framework successfully leverages bio-inspired sparse random projections to mitigate parameter interference, it relies on a static, magnitude-based routing mechanism that is agnostic to input context. In this paper, we propose NeuroLoRA, a novel Mixture-of-Experts (MoE) based LoRA framework inspired by biological neuromodulation -- the dynamic regulation of neuronal excitability based on context. NeuroLoRA retains the computational efficiency of frozen random projections while introducing a lightweight, learnable neuromodulation gate that contextually rescales the projection space prior to expert selection. We further propose a Contrastive Orthogonality Loss to explicitly enforce separation between expert subspaces, enhancing both task decoupling and continual learning capacity. Extensive experiments on MMLU, GSM8K, and ScienceQA demonstrate that NeuroLoRA consistently outperforms FlyLoRA and other strong baselines across single-task adaptation, multi-task model merging, and sequential continual learning scenarios, while maintaining comparable parameter efficiency.