Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of a systematic understanding of how to efficiently fuse large language models (LLMs) fine-tuned with lightweight adapters for multi-task learning, focusing on the trade-offs among ensembling, merging, and routing. The study evaluates three parameter-efficient fusion approaches (output ensembling, parameter averaging, and input-dependent routing) and shows that non-uniform fusion consistently outperforms uniform methods, with routing yielding the largest gains despite its higher computational cost. To reconcile this efficiency–performance trade-off, the authors propose a low-overhead expert selection mechanism that combines clustering with greedy subset selection, retaining most of routing's performance while substantially reducing computational cost.
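The three fusion strategies named in the summary can be sketched concretely. The following is an illustrative toy example, not the paper's implementation: it assumes LoRA-style low-rank adapters over a shared frozen weight, and all shapes, the random router, and uniform weights are assumptions made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, r = 8, 3, 2           # hidden dim, number of experts, adapter rank

W0 = rng.normal(size=(d, d))        # frozen base weight, shared by all experts
# Each expert i contributes a low-rank update delta_i = B_i @ A_i (LoRA-style)
deltas = [rng.normal(size=(d, r)) @ rng.normal(size=(r, d))
          for _ in range(n_experts)]

x = rng.normal(size=d)              # a single input vector

def expert_out(i, x):
    return (W0 + deltas[i]) @ x     # base model plus expert i's adapter

# 1) Output ensembling: run every expert, combine outputs (uniform here).
y_ens = np.mean([expert_out(i, x) for i in range(n_experts)], axis=0)

# 2) Merging: average the adapter *parameters*, then one forward pass.
delta_avg = np.mean(deltas, axis=0)
y_merge = (W0 + delta_avg) @ x

# 3) Routing: input-dependent weights over experts (a random gate here).
Wr = rng.normal(size=(n_experts, d))
gate = np.exp(Wr @ x)
gate /= gate.sum()                  # softmax gating weights, sum to 1
y_route = sum(gate[i] * expert_out(i, x) for i in range(n_experts))
```

For purely linear experts, uniform output ensembling and uniform parameter merging coincide; the distinction only matters once nonlinearities sit between adapted layers, which is the realistic LLM setting the paper studies.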

📝 Abstract
While large language models (LLMs) fine-tuned with lightweight adapters achieve strong performance across diverse tasks, their performance on individual tasks depends on the fine-tuning strategy. Fusing independently trained models with different strengths has shown promise for multi-task learning through three main strategies: ensembling, which combines outputs from independent models; merging, which fuses model weights via parameter averaging; and routing, which integrates models in an input-dependent fashion. However, many design decisions in these approaches remain understudied, and the relative benefits of more sophisticated ensembling, merging and routing techniques are not fully understood. We empirically evaluate their trade-offs, addressing two key questions: What are the advantages of going beyond uniform ensembling or merging? And does the flexibility of routing justify its complexity? Our findings indicate that non-uniform ensembling and merging improve performance, but routing offers even greater gains. To mitigate the computational cost of routing, we analyze expert selection techniques, showing that clustering and greedy subset selection can maintain reasonable performance with minimal overhead. These insights advance our understanding of model fusion for multi-task learning.
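The greedy subset selection mentioned in the abstract can be sketched as forward selection: starting from an empty set, repeatedly add the expert that most improves a validation score, stopping when no candidate helps. This is a hedged sketch of the generic technique, not the paper's exact procedure; the `toy_score` function is a made-up stand-in for a real validation metric.

```python
def greedy_select(n_experts, k, score):
    """Greedy forward selection: pick up to k experts, each step adding
    the single expert that most improves score(subset)."""
    selected = []
    remaining = set(range(n_experts))
    best = score(selected)
    for _ in range(k):
        # Evaluate the score of adding each remaining expert.
        gains = {i: score(selected + [i]) for i in remaining}
        i_best = max(gains, key=gains.get)
        if gains[i_best] <= best:   # stop early: no expert improves the score
            break
        best = gains[i_best]
        selected.append(i_best)
        remaining.remove(i_best)
    return selected, best

# Toy validation score (an assumption for the demo): experts 1 and 3 are
# useful, and every extra expert adds a small inference-cost penalty.
def toy_score(subset):
    return len(set(subset) & {1, 3}) - 0.1 * len(subset)

subset, val = greedy_select(n_experts=5, k=3, score=toy_score)
```

In practice each `score` call means evaluating a candidate expert subset on held-out data, which is why the paper pairs greedy selection with clustering: clustering first shrinks the candidate pool so far fewer subsets need to be scored.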
Problem

Research questions and friction points this paper is trying to address.

ensembling
merging
routing
parameter-efficient experts
multi-task learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

parameter-efficient fine-tuning
model fusion
routing
ensembling
multi-task learning