Large Language Model Selection with Limited Annotations

📅 2026-05-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the high annotation cost and inefficiency of traditional evaluation methods for large language models (LLMs). It proposes SELECT-LLM, a framework that introduces, for the first time, an active model selection mechanism applicable to both open-weight and black-box models without requiring access to model weights or architectural assumptions. The approach uses pairwise output similarity among candidate models to design a query strategy driven by expected information gain, so that the best or a near-best model can be identified with minimal annotation effort. Experiments across 23 datasets and 156 models show that SELECT-LLM reduces annotation costs by up to 81.8% when selecting the best model and by up to 84.78% when identifying a near-best one, improving over the strongest baseline in every setting.
📝 Abstract
Choosing a Large Language Model (LLM) for a given task requires comparing many strong candidates, yet standard evaluation relies on costly annotations over fixed evaluation sets. To address this challenge, we develop SELECT-LLM, the first framework for active model selection of LLMs. SELECT-LLM aims to find a small set of queries whose annotations are most informative for identifying the best LLM for a given task. To this end, we introduce a query selection rule based on expected information gain, computed from pairwise similarities between candidate model outputs. Because this rule only uses generated model responses, SELECT-LLM can be applied across candidate models without assumptions about their architecture or access to model weights. This makes it suitable for both open-weight and black-box LLMs. We evaluate SELECT-LLM across 23 datasets, 156 evaluated models, diverse task families, and multiple text evaluation metrics. Across all experiments, SELECT-LLM improves over the strongest baseline in every setting, with annotation cost reductions up to 81.8% for best model selection and up to 84.78% for near-best model selection.
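The abstract describes the query selection rule only at a high level, so the following is a minimal illustrative sketch rather than the SELECT-LLM algorithm itself. It assumes a belief distribution over which candidate model is best and, for each unlabeled query, a pairwise similarity matrix between the candidates' outputs. The likelihood model (a softmax over similarities with a hypothetical `temperature` parameter) and all function names are assumptions made for illustration; they do not come from the paper.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def likelihood(sim_row, temperature=0.1):
    """Assumed likelihood: if model h is best, the annotation most resembles
    the output of model r with probability softmax(sim[h][r] / temperature)."""
    m = max(sim_row)
    weights = [math.exp((s - m) / temperature) for s in sim_row]
    z = sum(weights)
    return [w / z for w in weights]

def expected_information_gain(prior, sim, temperature=0.1):
    """Mutual information between 'which model is best' and the hypothetical
    annotation outcome for one query, given its similarity matrix `sim`."""
    n = len(prior)
    lik = [likelihood(sim[h], temperature) for h in range(n)]  # lik[h][r]
    gain = entropy(prior)
    for r in range(n):
        p_r = sum(prior[h] * lik[h][r] for h in range(n))
        if p_r == 0:
            continue
        posterior = [prior[h] * lik[h][r] / p_r for h in range(n)]
        gain -= p_r * entropy(posterior)
    return gain

def select_query(prior, sims_by_query, temperature=0.1):
    """Pick the unlabeled query whose annotation is expected to be most informative."""
    return max(
        sims_by_query,
        key=lambda q: expected_information_gain(prior, sims_by_query[q], temperature),
    )

# Toy example with three candidate models and a uniform belief over them.
prior = [1 / 3, 1 / 3, 1 / 3]
sims = {
    # All models produce the same output: annotating this query reveals nothing.
    "q_agree": [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
    # All models disagree: the annotation separates the candidates.
    "q_disagree": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
}
chosen = select_query(prior, sims)
```

In this toy setting the query on which all candidates agree has zero expected gain, while the query on which they all disagree is selected. Because the sketch touches only generated outputs and their similarities, it needs no access to model weights, which matches the black-box setting described above.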
Problem

Research questions and friction points this paper is trying to address.

Large Language Model Selection
Limited Annotations
Active Model Selection
Evaluation Efficiency
Query Selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

active model selection
large language models
expected information gain
annotation efficiency
query selection