🤖 AI Summary
To address the challenge of efficiently selecting the optimal large language model (LLM) for a specific task under limited annotation resources, this paper proposes LLM SELECTOR—the first active learning framework tailored for LLM selection. It employs an adaptive query selection strategy to identify the most discriminative input instances and introduces a lightweight, judge-style oracle model to replace human annotators, substantially reducing annotation costs. Its core contribution lies in the systematic integration of active learning into LLM evaluation and selection—departing from conventional static benchmarking approaches that rely on exhaustive human annotations. Extensive experiments across 6 benchmarks and 151 LLMs show that LLM SELECTOR identifies the best or a near-best model while cutting annotation requirements by up to 59.62%, trading little selection accuracy for large gains in annotation efficiency.
📝 Abstract
We introduce LLM SELECTOR, the first framework for active model selection of Large Language Models (LLMs). Unlike prior evaluation and benchmarking approaches that rely on fully annotated datasets, LLM SELECTOR efficiently identifies the best LLM with limited annotations. In particular, for any given task, LLM SELECTOR adaptively selects a small set of queries to annotate that are most informative about the best model for the task. To further reduce annotation cost, we leverage a judge-based oracle annotation model. Through extensive experiments on 6 benchmarks with 151 LLMs, we show that LLM SELECTOR reduces annotation costs by up to 59.62% when selecting the best and near-best LLM for the task.
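The adaptive loop the abstract describes — annotate the queries most informative about which model is best, then rank models on the annotated subset — can be sketched as follows. This is an illustrative disagreement-based acquisition strategy on toy data, not the paper's actual algorithm; the function names, the tie-breaking rule, and the simulated oracle (standing in for the judge-based annotator) are all assumptions for the sketch.

```python
import random

def active_model_selection(model_preds, oracle, budget, seed=0):
    """Illustrative active model selection loop (not LLM SELECTOR itself).

    model_preds: dict mapping model name -> list of predictions, one per query.
    oracle: callable(query_index) -> gold answer (stand-in for a judge annotator).
    budget: maximum number of queries to annotate.
    Returns the model with the most correct answers on annotated queries.
    """
    rng = random.Random(seed)
    models = list(model_preds)
    n_queries = len(next(iter(model_preds.values())))
    correct = {m: 0 for m in models}
    labeled = set()

    def disagreement(i):
        # Queries where candidate models disagree discriminate between them;
        # queries where all models agree reveal nothing about the ranking.
        return len({model_preds[m][i] for m in models})

    for _ in range(budget):
        candidates = [i for i in range(n_queries) if i not in labeled]
        if not candidates:
            break
        # Pick the most contested unlabeled query; break ties randomly.
        i = max(candidates, key=lambda q: (disagreement(q), rng.random()))
        gold = oracle(i)  # one annotation spent
        labeled.add(i)
        for m in models:
            if model_preds[m][i] == gold:
                correct[m] += 1

    return max(models, key=lambda m: correct[m])

# Toy usage: model "b" answers every query correctly, so any annotated
# subset suffices to identify it without labeling all five queries.
gold = ["x", "y", "z", "x", "y"]
preds = {
    "a": ["x", "y", "w", "w", "w"],
    "b": list(gold),
    "c": ["w", "w", "z", "w", "w"],
}
best = active_model_selection(preds, oracle=lambda i: gold[i], budget=3)
print(best)  # "b"
```

The acquisition rule here (maximize answer disagreement) is one simple proxy for "most informative about the best model"; the paper's adaptive strategy and judge-based oracle are more involved, but the budgeted annotate-then-rank structure is the same.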