🤖 AI Summary
This work proposes RouteMoA, a framework that addresses the high computational cost and latency of traditional Mixture-of-Agents (MoA) architectures. Conventional MoA relies on dense topologies and mandatory pre-inference across all models, which hinders scaling to large model pools. RouteMoA introduces the first dynamic routing mechanism that operates without pre-inference: a lightweight scorer performs coarse-grained performance prediction to select a high-potential subset of models, and posterior refinement then corrects these scores through self- and cross-evaluations based on the generated outputs. A multi-objective ranking strategy balances performance, cost, and latency. Extensive experiments demonstrate that RouteMoA consistently outperforms conventional MoA across diverse tasks and model scales, achieving up to 89.8% cost reduction and 63.6% lower latency in large-scale settings.
📝 Abstract
Mixture-of-Agents (MoA) improves LLM performance through layered collaboration, but its dense topology raises costs and latency. Existing methods employ LLM judges to filter responses, yet still require all models to perform inference before judging, failing to cut costs effectively. They also lack model selection criteria and struggle with large model pools, where full inference is costly and can exceed context limits. To address this, we propose RouteMoA, an efficient mixture-of-agents framework with dynamic routing. It employs a lightweight scorer to perform initial screening by predicting coarse-grained performance from the query, narrowing candidates to a high-potential subset without inference. A mixture of judges then refines these scores through lightweight self- and cross-assessment based on existing model outputs, providing posterior correction without additional inference. Finally, a model ranking mechanism selects models by balancing performance, cost, and latency. RouteMoA outperforms MoA across varying tasks and model pool sizes, reducing cost by 89.8% and latency by 63.6% in the large-scale setting.
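The three-stage pipeline the abstract describes (pre-inference coarse scoring, judge-based posterior refinement, multi-objective ranking) can be sketched as follows. Everything here is an illustrative assumption, not the paper's implementation: the scorer, the judge heuristic, the utility weights, and all agent names are stand-ins.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    cost: float     # cost per call, arbitrary units (assumed)
    latency: float  # seconds per call (assumed)
    skill: float    # stand-in for true answer quality (assumed)

def coarse_score(query: str, agents: list[Agent]) -> dict[str, float]:
    """Stage 1 (hypothetical): predict performance from the query alone,
    with no model inference. The paper uses a learned lightweight scorer;
    this toy version just returns a fixed skill prior."""
    return {a.name: a.skill for a in agents}

def select_subset(agents: list[Agent], scores: dict[str, float], k: int) -> list[Agent]:
    """Narrow the pool to the top-k high-potential candidates;
    only these ever run inference."""
    return sorted(agents, key=lambda a: scores[a.name], reverse=True)[:k]

def judge_refine(scores: dict[str, float], outputs: dict[str, str]) -> dict[str, float]:
    """Stage 2 (hypothetical): posterior correction via self-/cross-assessment
    of outputs already generated, without extra inference. Toy proxy:
    a small bonus proportional to answer length."""
    return {name: scores[name] + 0.01 * len(out) for name, out in outputs.items()}

def rank_multi_objective(agents, refined, w_perf=1.0, w_cost=0.5, w_lat=0.2):
    """Stage 3: trade predicted performance off against cost and latency."""
    def utility(a: Agent) -> float:
        return w_perf * refined[a.name] - w_cost * a.cost - w_lat * a.latency
    return sorted(agents, key=utility, reverse=True)

# Toy pool: a strong-but-expensive model, a balanced one, a cheap weak one.
pool = [
    Agent("big", cost=5.0, latency=2.0, skill=0.9),
    Agent("mid", cost=1.0, latency=0.5, skill=0.8),
    Agent("tiny", cost=0.1, latency=0.1, skill=0.4),
]
query = "Explain dynamic routing."
scores = coarse_score(query, pool)
subset = select_subset(pool, scores, k=2)                 # "tiny" is pruned pre-inference
outputs = {a.name: f"answer from {a.name}" for a in subset}
refined = judge_refine(scores, outputs)
ranked = rank_multi_objective(subset, refined)
```

With these toy weights, the cheaper "mid" agent outranks "big" after the cost and latency penalties, illustrating how the ranking stage can prefer efficient models even when a larger model scores slightly higher on predicted quality.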