🤖 AI Summary
This work proposes the first fully unsupervised prompt agent framework, addressing the challenge of exploring high-quality prompts in the absence of supervised reward signals. The approach formulates prompt optimization as a sequential decision-making problem within a structured prompt space, integrating tree search with a two-stage selection mechanism. It first employs large language models to perform fine-grained, order-invariant pairwise comparisons, then resolves the inconsistency of these local comparisons through path-level Bayesian aggregation and a global tournament ranking based on the Bradley–Terry–Luce model, thereby enabling globally coherent prompt scoring. Experimental results demonstrate that the method significantly outperforms existing prompt optimization techniques across multiple tasks, validating the effectiveness of agent-based prompt optimization in unsupervised settings.
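The "path-level Bayesian aggregation" step can be illustrated with a minimal sketch. The idea, under assumptions not detailed in the summary, is to treat each search path's pairwise-comparison outcomes as Bernoulli trials, place a Beta posterior over the path's win rate, and filter candidates by a pessimistic (lower-confidence) score so that paths with few comparisons are not over-trusted. The function names, the `z` penalty hyperparameter, and the top-`keep` filtering rule are illustrative, not the paper's actual formulation:

```python
import math

def path_score(wins, losses, z=1.0):
    """Pessimistic score for one search path: the mean of the
    Beta(1 + wins, 1 + losses) posterior minus z posterior standard
    deviations. z is an assumed hyperparameter controlling how strongly
    uncertainty (few observed comparisons) is penalized."""
    a, b = 1 + wins, 1 + losses
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean - z * math.sqrt(var)

def filter_paths(path_records, keep=2, z=1.0):
    """path_records maps path_id -> (wins, losses) aggregated along the path.
    Returns the `keep` path ids with the highest pessimistic scores."""
    ranked = sorted(path_records.items(),
                    key=lambda kv: path_score(*kv[1], z=z),
                    reverse=True)
    return [pid for pid, _ in ranked[:keep]]

# A path with many wins ("A") should beat one with fewer observations ("B")
# and one that mostly loses ("C").
survivors = filter_paths({"A": (9, 1), "B": (2, 0), "C": (3, 7)}, keep=2)
```

Here the uncertainty penalty is what makes the aggregation "Bayesian" in spirit: a path with 2 wins and 0 losses has a high posterior mean but a wide posterior, so it scores below a path with 9 wins and 1 loss.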
📝 Abstract
Prompt agents have recently emerged as a promising paradigm for automated prompt optimization, framing refinement as a sequential decision-making problem over a structured prompt space. While this formulation enables the use of advanced planning algorithms, these methods typically assume access to supervised reward signals, which are often unavailable in practical scenarios. In this work, we propose UPA, an Unsupervised Prompt Agent that realizes structured search and selection without relying on supervised feedback. Specifically, during search, UPA iteratively constructs an evolving tree structure to navigate the prompt space, guided by fine-grained and order-invariant pairwise comparisons from Large Language Models (LLMs). Crucially, as these local comparisons do not inherently yield a consistent global scale, we decouple systematic prompt exploration from final selection, introducing a two-stage framework grounded in the Bradley–Terry–Luce (BTL) model. This framework first performs path-wise Bayesian aggregation of local comparisons to filter candidates under uncertainty, followed by global tournament-style comparisons to infer latent prompt quality and identify the optimal prompt. Experiments across multiple tasks demonstrate that UPA consistently outperforms existing prompt optimization methods, showing that agent-style optimization remains highly effective even in fully unsupervised settings.
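The final tournament stage rests on the Bradley–Terry–Luce model, which assigns each prompt a latent strength p_i such that prompt i beats prompt j with probability p_i / (p_i + p_j). Given a pool of pairwise win/loss outcomes, these strengths can be fit by maximum likelihood; the sketch below uses the standard minorization-maximization (MM) update for the Bradley–Terry model. This is a generic BTL fit under assumed interfaces, not the paper's implementation:

```python
def fit_btl(num_items, comparisons, iters=200):
    """Fit Bradley-Terry-Luce strengths from pairwise outcomes.

    comparisons: list of (winner_index, loser_index) pairs, e.g. produced
    by LLM judgments between prompts. Uses the MM update
    p_i <- W_i / sum_j n_ij / (p_i + p_j), where W_i is i's total win
    count and n_ij is the number of comparisons between i and j.
    Returns a list of strengths normalized to sum to num_items.
    """
    wins = [[0] * num_items for _ in range(num_items)]
    for w, l in comparisons:
        wins[w][l] += 1

    p = [1.0] * num_items
    for _ in range(iters):
        new_p = []
        for i in range(num_items):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(num_items)
                        if j != i and wins[i][j] + wins[j][i] > 0)
            new_p.append(total_wins / denom if denom > 0 else p[i])
        scale = num_items / sum(new_p)
        p = [x * scale for x in new_p]
    return p

# Three prompts; prompt 0 wins most of its comparisons, prompt 2 the fewest.
outcomes = ([(0, 1)] * 8 + [(1, 0)] * 2 +
            [(0, 2)] * 9 + [(2, 0)] * 1 +
            [(1, 2)] * 7 + [(2, 1)] * 3)
strengths = fit_btl(3, outcomes)
best_prompt = max(range(3), key=lambda i: strengths[i])
```

Fitting a single global strength vector is what resolves the inconsistency of local judgments: even if the raw comparisons contain cycles or noise, the maximum-likelihood strengths induce one coherent ranking over all candidate prompts.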