🤖 AI Summary
Efficient fine-tuning of large language models (LLMs) for multi-task scenarios remains challenging under parameter and memory constraints, particularly in balancing cross-task knowledge sharing with task-specific adaptation.
Method: We propose Bayesian Hierarchical Low-Rank Adaptation (BoRA), a novel framework that places a Bayesian hierarchical prior over low-rank adaptation modules. The prior lets few-shot tasks borrow structural knowledge from related tasks, while data-rich tasks retain enough freedom to specialize. BoRA combines Bayesian hierarchical modeling, low-rank matrix decomposition, and multi-task learning, using scalable variational inference for posterior estimation.
Contribution/Results: BoRA resolves the trade-off inherent in conventional LoRA, which forces a choice between single-task fine-tuning and uniform joint fine-tuning, by enabling adaptive, task-aware parameter sharing. On standard multi-task benchmarks, it achieves significantly lower perplexity and better generalization than both independent and joint fine-tuning baselines, while keeping parameter count and GPU memory overhead comparable to standard LoRA.
📝 Abstract
This paper introduces Bayesian Hierarchical Low-Rank Adaptation (BoRA), a novel method for fine-tuning multi-task Large Language Models (LLMs). Current fine-tuning approaches, such as Low-Rank Adaptation (LoRA), perform exceptionally well in reducing training parameters and memory usage but face limitations when applied to multiple similar tasks. Practitioners usually have to choose between training separate models for each task or a single model for all tasks, both of which come with trade-offs in specialization and data utilization. BoRA addresses these trade-offs by leveraging a Bayesian hierarchical model that allows tasks to share information through global hierarchical priors. This enables tasks with limited data to benefit from the overall structure derived from related tasks, while allowing tasks with more data to specialize. Our experimental results show that BoRA outperforms both individual and unified model approaches, achieving lower perplexity and better generalization across tasks. The method provides a scalable and efficient solution for multi-task LLM fine-tuning, with significant practical implications for diverse applications.
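To make the core idea concrete, here is a minimal NumPy sketch (not the authors' implementation) of a hierarchical prior over LoRA factors. Each task keeps its own low-rank factors `A_t, B_t`, and a Gaussian prior centered on shared global factors pulls them toward common structure; under MAP estimation this prior becomes an L2 penalty added to each task's loss. All names, shapes, and the precision value `tau` are illustrative assumptions; the actual paper uses variational inference rather than this MAP simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_tasks = 16, 4, 3  # hidden dim, LoRA rank, number of tasks (toy sizes)

# Shared global factors acting as the hierarchical prior mean.
A_global = rng.normal(scale=0.02, size=(d, r))
B_global = np.zeros((r, d))  # zero init, as in standard LoRA

# Task-specific factors, initialized at the global mean.
A = [A_global.copy() for _ in range(n_tasks)]
B = [B_global.copy() for _ in range(n_tasks)]

def hierarchical_penalty(A_t, B_t, tau=10.0):
    """Gaussian prior N(global, 1/tau) on a task's factors.

    Under MAP estimation the prior reduces to an L2 pull toward the
    global factors; larger tau means stronger cross-task sharing."""
    return 0.5 * tau * (np.sum((A_t - A_global) ** 2)
                        + np.sum((B_t - B_global) ** 2))

def adapted_weight(W0, A_t, B_t):
    """LoRA-style adapted weight: W = W0 + A_t @ B_t (frozen base W0)."""
    return W0 + A_t @ B_t

W0 = rng.normal(size=(d, d))          # frozen pretrained weight (toy)
penalty = hierarchical_penalty(A[0], B[0])  # zero at initialization
W = adapted_weight(W0, A[0], B[0])          # equals W0 while B is zero
```

In training, tasks with little data would be dominated by the penalty and stay close to the global factors, while data-rich tasks can afford to drift away and specialize, which is the trade-off BoRA is designed to balance.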