🤖 AI Summary
This work addresses cross-task interference in multi-task instruction tuning, where conflicting gradients over shared parameters degrade the performance of large language models. To mitigate this, the authors propose Basic Abilities Decomposition for multi-task Instruct-Tuning (BADIT), which decomposes model parameters into a set of orthogonal high-singular-value LoRA experts. Orthogonality among these experts is enforced dynamically during training via spherical clustering of rank-1 components, so that each task can be represented as a linear combination of orthogonal basic abilities. Combining LoRA-based fine-tuning, singular value decomposition, and multi-task learning, BADIT is evaluated with six large language models on the SuperNI benchmark, where it outperforms state-of-the-art methods and reduces cross-task interference.
📝 Abstract
The strong recent performance of large language models (LLMs) has been driven largely by multi-task instruction tuning. Unfortunately, this training paradigm suffers from a key issue, cross-task interference, caused by conflicting gradients over parameters shared among different tasks. Previous methods mitigate this issue by isolating task-specific parameters, e.g., through task-specific neuron selection or mixture-of-experts. In this paper, we show empirically that cross-task interference persists in these existing solutions because many parameters are still shared across tasks, and we accordingly propose a novel solution, Basic Abilities Decomposition for multi-task Instruct-Tuning (BADIT). Specifically, we find empirically that certain parameters are consistently co-activated, and that co-activated parameters naturally organize into base groups. This motivates the view that LLMs encode several orthogonal basic abilities, and that any task can be represented as a linear combination of these abilities. BADIT therefore decomposes LLM parameters into orthogonal high-singular-value LoRA experts representing basic abilities, and dynamically enforces their orthogonality during training via spherical clustering of rank-1 components. We conduct extensive experiments on the SuperNI benchmark with six LLMs, and the results demonstrate that BADIT outperforms state-of-the-art methods and reduces cross-task interference.
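To make the decomposition idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that a weight matrix is split by SVD, that the top singular components are grouped into experts in contiguous blocks, and that a task-specific update is a weighted sum of expert products; the function names `svd_lora_experts` and `combine`, the grouping rule, and the mixing weights are all hypothetical. The spherical clustering step is not shown, since the abstract does not specify which features are clustered.

```python
import numpy as np


def svd_lora_experts(W, rank, num_experts):
    """Split the top-`rank` singular components of W into LoRA-style experts.

    Each expert is a pair (B, A) whose product B @ A equals the sum of the
    rank-1 components assigned to it. Contiguous grouping is an assumption
    made for illustration only.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    groups = np.array_split(np.arange(rank), num_experts)
    experts = []
    for idx in groups:
        root = np.sqrt(S[idx])           # split each singular value evenly
        B = U[:, idx] * root             # shape (out_dim, r_e)
        A = root[:, None] * Vt[idx, :]   # shape (r_e, in_dim)
        experts.append((B, A))
    return experts


def combine(experts, weights):
    """Weighted sum of expert updates (hypothetical task-specific mixing)."""
    return sum(w * (B @ A) for w, (B, A) in zip(weights, experts))


rng = np.random.default_rng(0)
W = rng.standard_normal((16, 12))        # stand-in for one weight matrix
experts = svd_lora_experts(W, rank=8, num_experts=4)

# Experts built from disjoint singular vectors are mutually orthogonal:
# the row spaces of A_i and A_j (and column spaces of B_i and B_j) do not overlap.
(B0, A0), (B1, A1) = experts[0], experts[1]
cross = max(np.abs(A0 @ A1.T).max(), np.abs(B0.T @ B1).max())

# With all weights equal to 1, the experts sum to the rank-8 truncation of W.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
truncated = (U[:, :8] * S[:8]) @ Vt[:8, :]
recon = combine(experts, np.ones(len(experts)))

# A task update as a linear combination of the expert "abilities".
delta = combine(experts, [0.7, 0.2, 0.1, 0.0])
```

Because the experts here come from disjoint sets of singular vectors, they start out exactly orthogonal; gradient updates would generally break this property, which is presumably why the method re-enforces orthogonality during training.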