🤖 AI Summary
Large language models (LLMs) struggle to explicitly model the exploration–exploitation trade-off in black-box optimization, leading to unstable and hard-to-control search behavior. This work proposes a multi-agent framework that decouples the trade-off into two distinct roles: policy coordination and candidate generation. The coordinator dynamically assigns interpretable weights to acquisition criteria, while the generator produces candidate solutions conditioned on these weights. This collaborative multi-agent mechanism makes the exploration–exploitation process explicitly controllable for the first time, overcoming the cognitive limitations of single-agent prompting and improving both interpretability and stability. Experiments show that the proposed framework significantly outperforms existing LLM-based methods across multiple continuous black-box optimization benchmarks, avoiding premature convergence and achieving more efficient and robust search.
📝 Abstract
The exploration-exploitation trade-off is central to sequential decision-making and black-box optimization, yet how Large Language Models (LLMs) reason about and manage this trade-off remains poorly understood. Unlike Bayesian optimization, where exploration and exploitation are explicitly encoded through acquisition functions, LLM-based optimization relies on implicit, prompt-based reasoning over historical evaluations, making search behavior difficult to analyze or control. In this work, we present a metric-level study of LLM-mediated search policy learning, examining how LLMs construct and adapt exploration-exploitation strategies under multiple operational definitions of exploration, including informativeness, diversity, and representativeness. We show that single-agent LLM approaches, which jointly perform strategy selection and candidate generation within a single prompt, suffer from cognitive overload, leading to unstable search dynamics and premature convergence. To address this limitation, we propose a multi-agent framework that decomposes exploration-exploitation control into strategic policy mediation and tactical candidate generation. A strategy agent assigns interpretable weights to multiple search criteria, while a generation agent produces candidates conditioned on the resulting weight-defined search policy. This decomposition renders exploration-exploitation decisions explicit, observable, and adjustable. Empirical results across diverse continuous optimization benchmarks indicate that separating strategic control from candidate generation substantially improves the effectiveness of LLM-mediated search.
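The strategy/generation decomposition described in the abstract can be sketched in code. In the minimal sketch below, `strategy_agent` and `generation_agent` are hypothetical heuristic stand-ins for the paper's LLM agents (the annealing schedule, the diversity and incumbent-proximity criteria, and all names are illustrative assumptions, not the authors' implementation); the point is that the weights are explicit, observable values passed from the strategic layer to the tactical layer, rather than implicit prompt state.

```python
import random

def strategy_agent(history, horizon=20):
    """Stand-in for the LLM strategy agent: assign interpretable weights
    to search criteria. Illustrative policy: anneal from exploration
    toward exploitation as evaluations accumulate."""
    progress = min(len(history) / horizon, 1.0)
    return {"explore": 1.0 - progress, "exploit": progress}  # sums to 1

def generation_agent(history, weights, bounds, n_samples=64, rng=None):
    """Stand-in for the LLM generation agent: propose the candidate that
    scores highest under the weighted search policy."""
    rng = rng or random.Random(0)
    lo, hi = bounds
    candidates = [rng.uniform(lo, hi) for _ in range(n_samples)]
    if not history:
        return candidates[0]  # nothing observed yet: any point is fine
    xs = [x for x, _ in history]
    best_x, _ = max(history, key=lambda h: h[1])
    span = hi - lo

    def score(x):
        explore = min(abs(x - xi) for xi in xs) / span  # diversity: distance to data
        exploit = 1.0 - abs(x - best_x) / span          # closeness to incumbent best
        return weights["explore"] * explore + weights["exploit"] * exploit

    return max(candidates, key=score)

def optimize(objective, bounds, n_rounds=20, seed=0):
    """Explicit coordination loop: weights flow from the strategic layer
    to the tactical layer at every round."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_rounds):
        weights = strategy_agent(history, horizon=n_rounds)      # strategic policy mediation
        x = generation_agent(history, weights, bounds, rng=rng)  # tactical generation
        history.append((x, objective(x)))
    return max(history, key=lambda h: h[1]), history

# Toy maximization of f(x) = -(x - 2)^2 on [-5, 5].
(best_x, best_y), history = optimize(lambda x: -(x - 2.0) ** 2, bounds=(-5.0, 5.0))
```

Because the weights are plain numbers rather than latent prompt state, each round's exploration-exploitation decision can be logged, inspected, or overridden, which is the observability property the paper argues for.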