🤖 AI Summary
This work addresses the high cost of acquiring human preference data for aligning large language models by introducing active learning methods tailored to the structure of preference learning. Moving beyond conventional G- or D-optimality criteria, the paper proposes two novel algorithms: the first comes with an instance-dependent theoretical guarantee on label complexity, the first such guarantee for preference learning, and the second offers an efficient greedy query strategy suitable for practical deployment. Experiments on real-world preference datasets demonstrate that the proposed approaches substantially improve sample efficiency over existing methods while retaining rigorous theoretical foundations.
📝 Abstract
Aligning large language models (LLMs) depends on high-quality datasets of human preference labels, which are costly to collect. Although active learning has been studied as a way to improve sample efficiency relative to passive collection, many existing approaches adopt classical experimental design criteria such as G- or D-optimality. These objectives are not tailored to the structure of preference learning, leaving open the design of problem-specific algorithms. In this work, we identify a simple intuition specific to preference learning that calls into question the suitability of these existing design objectives. Motivated by this insight, we propose two active learning algorithms. The first admits an instance-dependent label complexity guarantee, the first such guarantee for this setting, and the second is a simple, practical greedy method. We evaluate our algorithms on real-world preference datasets and observe improved sample efficiency compared to existing methods.
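The abstract does not spell out the greedy method, but the general shape of a greedy active loop for preference learning can be illustrated. The sketch below is a generic uncertainty-based baseline, not the paper's algorithm: it assumes a standard Bradley-Terry preference model with a hidden parameter `theta_star`, and at each round queries the candidate pair whose predicted outcome is most uncertain, then refits the model by maximum likelihood. All names (`bt_prob`, `greedy_query`, `fit`) and the simulated annotator are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bt_prob(theta, diff):
    """Bradley-Terry probability that the first item of a pair is preferred,
    given the feature difference of the pair."""
    return 1.0 / (1.0 + np.exp(-theta @ diff))

# Hypothetical instance: n items with d-dimensional features; simulated
# preferences follow a Bradley-Terry model with hidden theta_star.
d, n = 4, 20
X = rng.normal(size=(n, d))
theta_star = rng.normal(size=d)
candidates = [(i, j) for i in range(n) for j in range(i + 1, n)]

def greedy_query(theta_hat):
    """Greedily pick (and remove) the unlabeled pair whose predicted
    outcome is most uncertain, i.e. win probability closest to 1/2."""
    scores = [abs(bt_prob(theta_hat, X[i] - X[j]) - 0.5) for i, j in candidates]
    return candidates.pop(int(np.argmin(scores)))

def fit(labeled, steps=200, lr=0.5):
    """Refit theta by gradient ascent on the Bradley-Terry
    log-likelihood of the labeled pairs."""
    theta = np.zeros(d)
    for _ in range(steps):
        grad = np.zeros(d)
        for (i, j), y in labeled:
            diff = X[i] - X[j]
            grad += (y - bt_prob(theta, diff)) * diff
        theta += lr * grad / max(len(labeled), 1)
    return theta

# Active loop: query one preference label per round.
labeled, theta_hat = [], np.zeros(d)
for _ in range(15):
    i, j = greedy_query(theta_hat)
    y = float(rng.random() < bt_prob(theta_star, X[i] - X[j]))  # simulated annotator
    labeled.append(((i, j), y))
    theta_hat = fit(labeled)
```

The selection rule here is plain predictive uncertainty; the paper's contribution is a criterion specifically motivated by the structure of preference learning, which this generic baseline does not capture.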