🤖 AI Summary
This work addresses insufficient diversity and inconsistent quality in responses to open-ended prompts by introducing a dynamic routing mechanism that selects a large language model for each input query. The study makes two key contributions: first, it proposes “diversity coverage,” a novel metric that evaluates the comprehensiveness and quality of a generated response set; second, it presents the first query-aware multi-model router, which leverages prompt engineering and machine learning strategies for adaptive model selection. On NB-WildChat, the trained router scores 26.3%, outperforming the best single model (23.8%), and the approach also generalizes to the out-of-domain dataset NB-Curated.
📝 Abstract
When a prompt permits a large number of valid answers, generating them comprehensively is the first step towards satisfying a wide range of users. In this paper, we study methods to elicit a comprehensive set of valid responses. To evaluate this, we introduce diversity coverage, a metric that measures the total quality score of the unique answers in the predicted answer set relative to that of the best possible answer set with the same number of answers. Using this metric, we evaluate 18 LLMs and find that no single model dominates at generating diverse responses to a wide range of open-ended prompts. Yet for each prompt, there exists a model that significantly outperforms all others at generating a diverse answer set. Motivated by this finding, we introduce a router that predicts the best model for each query. On NB-WildChat, our trained router outperforms the single best model baseline (26.3% vs. 23.8%). We further show generalization to an out-of-domain dataset (NB-Curated) as well as to different answer-generation prompting strategies. Our work lays a foundation for studying how to generate comprehensive answers when a suite of models is available.
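The metric and the motivation for routing can be illustrated with a minimal Python sketch. This is one reading of the abstract's definition, not the paper's implementation. It makes several assumptions: each generated answer has already been mapped to a canonical answer label, a quality score is known for every valid answer, duplicates count toward the set size but earn credit only once, and the function names `diversity_coverage`, `best_single_model`, and `oracle_routing_mean` are hypothetical.

```python
from typing import Dict, List


def diversity_coverage(predicted: List[str], quality: Dict[str, float]) -> float:
    """Quality mass of the unique predicted answers, relative to the best
    possible answer set with the same number of answers.

    Assumptions (not from the paper): `predicted` holds canonical answer
    labels, and `quality` maps every known valid answer to its score.
    """
    if not predicted:
        return 0.0
    k = len(predicted)  # duplicates still use up a slot
    # Each unique answer is credited once; unknown answers score 0.
    achieved = sum(quality.get(answer, 0.0) for answer in set(predicted))
    # Best possible set of the same size: the k highest-quality valid answers.
    best = sum(sorted(quality.values(), reverse=True)[:k])
    return achieved / best if best > 0 else 0.0


def best_single_model(scores: Dict[str, List[float]]) -> str:
    """Model with the highest mean score across all prompts."""
    return max(scores, key=lambda m: sum(scores[m]) / len(scores[m]))


def oracle_routing_mean(scores: Dict[str, List[float]]) -> float:
    """Mean score if the best model could be picked for every prompt.

    This is an upper bound that a learned router tries to approach.
    """
    per_prompt_best = [max(column) for column in zip(*scores.values())]
    return sum(per_prompt_best) / len(per_prompt_best)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    quality = {"a": 1.0, "b": 0.5, "c": 0.5}
    print(diversity_coverage(["a", "a", "b"], quality))  # duplicate "a" wastes a slot
    print(diversity_coverage(["a", "b", "c"], quality))  # all valid answers covered

    scores = {"m1": [0.5, 0.25], "m2": [0.25, 0.75]}
    print(best_single_model(scores), oracle_routing_mean(scores))
```

In the toy data, no model wins on both prompts, so choosing a model per prompt gives a higher mean than committing to the best single model. This is the gap that a router predicting the best model for each query aims to close.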