🤖 AI Summary
This work proposes the first fully unsupervised prompt agent framework, addressing the challenge of exploring high-quality prompts in the absence of supervised reward signals. The approach formulates prompt optimization as a sequential decision-making problem within a structured prompt space, integrating tree search with a two-stage selection mechanism. It first employs large language models to perform fine-grained, order-invariant pairwise comparisons, then resolves the inconsistency of these local comparisons through path-level Bayesian aggregation and a global tournament ranking based on the Bradley–Terry–Luce model, thereby enabling globally coherent prompt scoring. Experimental results demonstrate that the method significantly outperforms existing prompt optimization techniques across multiple tasks, validating the effectiveness of agent-based prompt optimization in unsupervised settings.
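The "path-level Bayesian aggregation" step can be illustrated with a minimal sketch. The idea, under assumptions not detailed in the summary, is to treat each search path's pairwise-comparison outcomes as Bernoulli trials, place a Beta posterior over the path's win rate, and filter candidates by a pessimistic (lower-confidence) score so that paths with few comparisons are not over-trusted. The function names, the `z` penalty hyperparameter, and the top-`keep` filtering rule are illustrative, not the paper's actual formulation:

```python
import math

def path_score(wins, losses, z=1.0):
    """Pessimistic score for one search path: the mean of the
    Beta(1 + wins, 1 + losses) posterior minus z posterior standard
    deviations. z is an assumed hyperparameter controlling how strongly
    uncertainty (few observed comparisons) is penalized."""
    a, b = 1 + wins, 1 + losses
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean - z * math.sqrt(var)

def filter_paths(path_records, keep=2, z=1.0):
    """path_records maps path_id -> (wins, losses) aggregated along the path.
    Returns the `keep` path ids with the highest pessimistic scores."""
    ranked = sorted(path_records.items(),
                    key=lambda kv: path_score(*kv[1], z=z),
                    reverse=True)
    return [pid for pid, _ in ranked[:keep]]

# A path with many wins ("A") should beat one with fewer observations ("B")
# and one that mostly loses ("C").
survivors = filter_paths({"A": (9, 1), "B": (2, 0), "C": (3, 7)}, keep=2)
```

Here the uncertainty penalty is what makes the aggregation "Bayesian" in spirit: a path with 2 wins and 0 losses has a high posterior mean but a wide posterior, so it scores below a path with 9 wins and 1 loss.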
📝 Abstract
Prompt agents have recently emerged as a promising paradigm for automated prompt optimization, framing refinement as a sequential decision-making problem over a structured prompt space. While this formulation enables the use of advanced planning algorithms, these methods typically assume access to supervised reward signals, which are often unavailable in practical scenarios. In this work, we propose UPA, an Unsupervised Prompt Agent that realizes structured search and selection without relying on supervised feedback. Specifically, during search, UPA iteratively constructs an evolving tree structure to navigate the prompt space, guided by fine-grained and order-invariant pairwise comparisons from Large Language Models (LLMs). Crucially, as these local comparisons do not inherently yield a consistent global scale, we decouple systematic prompt exploration from final selection, introducing a two-stage framework grounded in the Bradley–Terry–Luce (BTL) model. This framework first performs path-wise Bayesian aggregation of local comparisons to filter candidates under uncertainty, followed by global tournament-style comparisons to infer latent prompt quality and identify the optimal prompt. Experiments across multiple tasks demonstrate that UPA consistently outperforms existing prompt optimization methods, showing that agent-style optimization remains highly effective even in fully unsupervised settings.
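The final tournament stage rests on the Bradley–Terry–Luce model, which assigns each prompt a latent strength p_i such that prompt i beats prompt j with probability p_i / (p_i + p_j). Given a pool of pairwise win/loss outcomes, these strengths can be fit by maximum likelihood; the sketch below uses the standard minorization-maximization (MM) update for the Bradley–Terry model. This is a generic BTL fit under assumed interfaces, not the paper's implementation:

```python
def fit_btl(num_items, comparisons, iters=200):
    """Fit Bradley-Terry-Luce strengths from pairwise outcomes.

    comparisons: list of (winner_index, loser_index) pairs, e.g. produced
    by LLM judgments between prompts. Uses the MM update
    p_i <- W_i / sum_j n_ij / (p_i + p_j), where W_i is i's total win
    count and n_ij is the number of comparisons between i and j.
    Returns a list of strengths normalized to sum to num_items.
    """
    wins = [[0] * num_items for _ in range(num_items)]
    for w, l in comparisons:
        wins[w][l] += 1

    p = [1.0] * num_items
    for _ in range(iters):
        new_p = []
        for i in range(num_items):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(num_items)
                        if j != i and wins[i][j] + wins[j][i] > 0)
            new_p.append(total_wins / denom if denom > 0 else p[i])
        scale = num_items / sum(new_p)
        p = [x * scale for x in new_p]
    return p

# Three prompts; prompt 0 wins most of its comparisons, prompt 2 the fewest.
outcomes = ([(0, 1)] * 8 + [(1, 0)] * 2 +
            [(0, 2)] * 9 + [(2, 0)] * 1 +
            [(1, 2)] * 7 + [(2, 1)] * 3)
strengths = fit_btl(3, outcomes)
best_prompt = max(range(3), key=lambda i: strengths[i])
```

Fitting a single global strength vector is what resolves the inconsistency of local judgments: even if the raw comparisons contain cycles or noise, the maximum-likelihood strengths induce one coherent ranking over all candidate prompts.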