🤖 AI Summary
To address the cost-quality trade-off in LLM-based reasoning, this paper proposes the Fleet of Agents (FoA) framework. FoA uses LLMs as agents that navigate a dynamic tree search driven by genetic-type particle filtering: a fleet of agents explores the search space independently, and a selection phase then resamples them according to a heuristic value function, balancing exploration and exploitation. This resampling yields dynamic branching, so the search adapts to the solutions discovered so far. Evaluated on Game of 24, Mini-Crosswords, and WebShop with four LLMs (GPT-3.5, GPT-4, LLaMA3.2-11B, and LLaMA3.2-90B), FoA improves quality by ~5% on average while requiring only ~40% of the cost of previous state-of-the-art methods, and it achieves the best cost-quality trade-off among all benchmarked methods. Notably, FoA with LLaMA3.2-11B surpasses the LLaMA3.2-90B model.
📝 Abstract
While numerous frameworks have been developed to enhance the reasoning abilities of large language models (LLMs), there is a scarcity of methods that effectively balance the trade-off between cost and quality. In this paper, we introduce Fleet of Agents (FoA), a novel and intuitive yet principled framework utilizing LLMs as agents to navigate through dynamic tree searches, employing a genetic-type particle filtering approach. FoA spawns a multitude of agents, each exploring the search space autonomously, followed by a selection phase where resampling based on a heuristic value function optimizes the balance between exploration and exploitation. This mechanism enables dynamic branching, adapting the exploration strategy based on discovered solutions. We conduct extensive experiments on three benchmark tasks, "Game of 24", "Mini-Crosswords", and "WebShop", utilizing four different LLMs, "GPT-3.5", "GPT-4", "LLaMA3.2-11B", and "LLaMA3.2-90B". On average across all tasks and LLMs, FoA obtains a quality improvement of ~5% while requiring only ~40% of the cost of previous SOTA methods. Notably, our analyses reveal that (1) FoA achieves the best cost-quality trade-off among all benchmarked methods and (2) FoA + LLaMA3.2-11B surpasses the LLaMA3.2-90B model. FoA is publicly available at https://github.com/au-clan/FoA.
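The explore-then-resample loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository for that); the names `fleet_search`, `resample`, `toy_step`, and `toy_value` are hypothetical, and a toy integer task stands in for the LLM call that would generate each agent's next step. The sketch assumes a non-negative value function and resampling with replacement, with probability proportional to value.

```python
import random
from typing import Callable, List, TypeVar

S = TypeVar("S")


def resample(states: List[S], value_fn: Callable[[S], float],
             rng: random.Random) -> List[S]:
    """Selection phase: draw len(states) states with replacement,
    weighted by a heuristic value (assumed non-negative)."""
    weights = [value_fn(s) for s in states]
    if sum(weights) <= 0:
        # No state looks promising: leave the fleet unchanged.
        return list(states)
    return rng.choices(states, weights=weights, k=len(states))


def fleet_search(init: S,
                 step_fn: Callable[[S, random.Random], S],
                 value_fn: Callable[[S], float],
                 n_agents: int = 8,
                 n_steps: int = 10,
                 resample_every: int = 1,
                 seed: int = 0) -> S:
    """Run a fleet of agents: independent exploration steps,
    interleaved with value-based resampling."""
    rng = random.Random(seed)
    fleet = [init] * n_agents
    for t in range(1, n_steps + 1):
        # Exploration: every agent advances its own state independently.
        fleet = [step_fn(s, rng) for s in fleet]
        # Selection: promising states are duplicated, weak ones tend to
        # die out, so branching adapts to what has been found so far.
        if t % resample_every == 0:
            fleet = resample(fleet, value_fn, rng)
    return max(fleet, key=value_fn)


# Toy stand-ins (hypothetical): the state is an integer, a step adds a
# random offset, and states closer to 24 receive a higher value.
def toy_step(state: int, rng: random.Random) -> int:
    return state + rng.randint(-2, 5)


def toy_value(state: int) -> float:
    return 1.0 / (1.0 + abs(24 - state))


if __name__ == "__main__":
    best = fleet_search(0, toy_step, toy_value, n_agents=8, n_steps=20)
    print("best state:", best, "value:", toy_value(best))
```

Because high-value states are copied into several agents, subsequent steps branch more heavily from promising regions without any fixed branching factor. In an LLM setting, `step_fn` would wrap a model call and `value_fn` a task-specific heuristic, both of which are assumptions of this sketch.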