🤖 AI Summary
Multi-task low-rank adaptation (LoRA) in large language models (LLMs) incurs substantial memory overhead due to task-specific adapter storage, while existing adapter merging methods suffer from significant performance degradation.
Method: This paper proposes HydraOpt, a tunable adapter-merging framework grounded in matrix similarity: it quantifies the intrinsic similarity among LoRA weight matrices and formulates an optimization-driven merging strategy.
Contribution/Results: The approach enables continuous, controllable trade-offs between storage compression and task performance, overcoming the inflexibility of conventional schemes that commit to a single fixed compromise. Empirical evaluation shows that, at just 52% of the original storage cost, the average performance drop is only 0.2–1.8%, substantially outperforming prior merging techniques. The method thus offers both high efficiency and practical utility in resource-constrained deployment scenarios.
📝 Abstract
Large language models (LLMs) often leverage adapters, such as low-rank adapters (LoRA), to achieve strong performance on downstream tasks. However, storing a separate adapter for each task significantly increases memory requirements, posing a challenge for resource-constrained environments such as mobile devices. Although model merging techniques can reduce storage costs, they typically result in substantial performance degradation. In this work, we introduce HydraOpt, a new model merging technique that capitalizes on the inherent similarities between the matrices of low-rank adapters. Unlike existing methods that produce a fixed trade-off between storage size and performance, HydraOpt allows us to navigate this spectrum of efficiency and performance. Our experiments show that HydraOpt significantly reduces storage size (48% reduction) compared to storing all adapters, while achieving competitive performance (0.2–1.8% drop). Furthermore, it outperforms existing merging techniques in terms of performance at the same or slightly worse storage efficiency.
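The abstract does not spell out HydraOpt's merging procedure, but the core idea, exploiting similarity between LoRA weight matrices to share storage across tasks, can be illustrated with a minimal sketch. Below, each task's adapter is a pair of matrices (A, B); adapters whose A matrices are nearly parallel (by cosine similarity) share a single stored representative, and the `threshold` plays the role of a tunable knob trading storage against fidelity. The grouping rule and threshold here are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def cosine_sim(x, y):
    """Cosine similarity between two matrices, treated as flat vectors."""
    x, y = x.ravel(), y.ravel()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def merge_similar_loras(adapters, threshold=0.9):
    """Greedily assign each task's A matrix to the first stored
    representative it resembles (cosine similarity >= threshold);
    otherwise store it as a new representative.
    Hypothetical illustration of similarity-guided merging,
    not HydraOpt itself.

    adapters: list of (A, B) weight-matrix pairs, one per task.
    Returns (groups, merged): task-index groups and the shared
    A matrices actually kept in storage.
    """
    groups, merged = [], []
    for i, (A, B) in enumerate(adapters):
        for group, rep in zip(groups, merged):
            if cosine_sim(A, rep) >= threshold:
                group.append(i)  # reuse the stored representative
                break
        else:
            groups.append([i])
            merged.append(A.copy())  # new representative must be stored
    return groups, merged

# Three tasks: tasks 0 and 1 have proportional (hence cosine-identical)
# A matrices; task 2 has an unrelated one.
rng = np.random.default_rng(0)
A_shared = rng.normal(size=(8, 4))
adapters = [
    (A_shared,        rng.normal(size=(4, 8))),
    (1.01 * A_shared, rng.normal(size=(4, 8))),
    (rng.normal(size=(8, 4)), rng.normal(size=(4, 8))),
]
groups, merged = merge_similar_loras(adapters)
print(groups, len(merged))  # tasks 0 and 1 share one stored A matrix
```

Raising `threshold` toward 1.0 stores more distinct matrices (better per-task fidelity, less compression); lowering it merges more aggressively, mirroring the continuous storage/performance spectrum the summary describes. The B matrices are kept per-task in this sketch and are not consulted for grouping.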