🤖 AI Summary
This work addresses efficient prompt selection for large language models when prompt performance is evaluated against multiple objectives. To overcome the limitations of single-metric assessment, it applies, for the first time, a multi-objective pure-exploration multi-armed bandit framework to prompt optimization. The approach targets two tasks: recovering the Pareto-optimal set of prompts and identifying the best feasible prompt. It adapts existing multi-objective bandit algorithms and introduces a new design for best feasible arm identification under structured reward assumptions, with theoretical guarantees on the identification error in the linear reward setting. Experiments across multiple large language models show that the method significantly outperforms existing baselines, establishing an efficient and theoretically grounded framework for multi-objective prompt optimization.
📝 Abstract
Prompt engineering has become central to eliciting the capabilities of large language models (LLMs). At its core lies prompt selection: efficiently identifying the most effective prompts. However, most prior investigations overlook a key challenge: prompt performance is inherently multi-faceted and cannot be captured by a single metric. To fill this gap, we study the multi-objective prompt selection problem under two practical settings: Pareto prompt set recovery and best feasible prompt identification. Casting the problem into the pure-exploration bandit framework, we adapt provably efficient algorithms from multi-objective bandits and further introduce a novel design for best feasible arm identification in structured bandits, with theoretical guarantees on the identification error in the linear case. Extensive experiments across multiple LLMs show that the bandit-based approaches yield significant improvements over baselines, establishing a principled and efficient framework for multi-objective prompt optimization.
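To make the two settings concrete, here is a minimal Python sketch of the targets a selection procedure would try to identify, computed from known mean scores. It is not the paper's algorithm: it skips the bandit sampling entirely, and the function names, the example scores, and the specific feasibility rule (maximize the first objective subject to lower bounds on the others) are assumptions made for illustration.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every objective
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_set(means):
    """Indices of prompts whose mean score vector no other prompt dominates."""
    return [
        i
        for i, mi in enumerate(means)
        if not any(dominates(mj, mi) for j, mj in enumerate(means) if j != i)
    ]


def best_feasible(means, thresholds):
    """Index of the prompt with the highest first objective among prompts whose
    remaining objectives all meet their thresholds (assumed feasibility rule).
    Returns None when no prompt is feasible."""
    feasible = [
        i
        for i, m in enumerate(means)
        if all(v >= t for v, t in zip(m[1:], thresholds))
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda i: means[i][0])


# Hypothetical mean scores per prompt: (accuracy, conciseness).
prompt_means = [
    (0.90, 0.40),
    (0.80, 0.70),
    (0.60, 0.90),
    (0.50, 0.60),
]

pareto_prompts = pareto_set(prompt_means)
constrained_best = best_feasible(prompt_means, thresholds=[0.5])
```

In the pure-exploration setting the means are unknown, so an algorithm must estimate them from noisy evaluations and decide which prompts to sample next; the sketch only shows what a correct answer looks like once the means are known.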