🤖 AI Summary
This work addresses the problem of selecting a small, high-performing ensemble of AI systems when tasks are drawn from an unknown distribution, while limiting costly model calls, benchmark runs, and human evaluation. The problem is formulated as a distributional variant of multiwinner voting, in which a committee's value on a task is determined by its best-performing member. Two feedback settings are considered: binary (correct/incorrect) and pairwise (preference comparisons). For binary feedback, where the objective reduces to coverage, a failure-conditioned greedy algorithm preserves the standard (1−1/e) approximation guarantee while obtaining instance-dependent query savings. For pairwise feedback, the work studies θ-winning committees and introduces a submodular weighted ordinal coverage relaxation, combined with either finite-family auditing or a minimax wrapper to recover θ-type guarantees. The theoretical analysis gives exhaustive-elicitation baselines with matching worst-case query lower bounds, and small-scale LLM experiments illustrate the predicted query savings and the role of complementarity in committee selection.
📝 Abstract
Organizations increasingly deploy multiple AI systems across task domains, but selecting a small, high-performing ensemble can require costly model calls, benchmark runs, and human evaluation. We study this selection problem as a distributional variant of multiwinner voting: tasks are drawn from an unknown domain distribution, each task induces feedback over candidate experts, and a committee's value on a task is determined by its best-performing member. We analyze both binary feedback, for tasks with correct/incorrect outcomes, and pairwise feedback, for tasks where candidate outputs are compared by preference. In the binary setting, the induced objective is coverage. We give exhaustive-elicitation baselines and matching worst-case query lower bounds, and we design a failure-conditioned greedy algorithm that preserves the standard $(1-1/e)$ guarantee while obtaining instance-dependent query savings. In the pairwise setting, we study $\theta$-winning committees. We show that full-information optimization admits a PTAS but no EPTAS under Gap-ETH, and that the objective is monotone but not submodular. This motivates a weighted ordinal coverage relaxation, which is submodular and supports a failure-conditioned greedy oracle under pairwise feedback. We then convert this oracle back into $\theta$-type guarantees through finite-family auditing or a minimax wrapper. We also provide small-scale LLM experiments illustrating the predicted query savings and the role of complementarity in committee selection.
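To make the binary-feedback objective concrete, the following is a minimal sketch of the coverage objective and the standard full-information greedy rule that carries the $(1-1/e)$ guarantee. It is not the paper's failure-conditioned algorithm, whose querying strategy the abstract does not specify. The function names (`coverage`, `greedy_committee`), the `solves` mapping, and the toy data are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def coverage(committee: List[str], solves: Dict[str, Set[int]], n_tasks: int) -> float:
    """Fraction of tasks answered correctly by at least one committee member."""
    covered: Set[int] = set()
    for expert in committee:
        covered |= solves[expert]
    return len(covered) / n_tasks


def greedy_committee(solves: Dict[str, Set[int]], n_tasks: int, k: int) -> List[str]:
    """Standard greedy for coverage: repeatedly add the expert with the largest marginal gain."""
    uncovered = set(range(n_tasks))  # tasks on which the current committee fails
    committee: List[str] = []
    for _ in range(k):
        remaining = [e for e in solves if e not in committee]
        if not remaining or not uncovered:
            break
        # The marginal gain of an expert depends only on the still-uncovered tasks.
        best = max(remaining, key=lambda e: len(solves[e] & uncovered))
        if not solves[best] & uncovered:
            break  # no remaining expert improves coverage
        committee.append(best)
        uncovered -= solves[best]
    return committee


# Toy data (hypothetical): expert -> ids of the tasks it answers correctly.
solves = {
    "A": {0, 1, 2, 3},
    "B": {0, 1, 2, 4},
    "C": {4, 5},
}
committee = greedy_committee(solves, n_tasks=6, k=2)
print(committee, coverage(committee, solves, 6))
```

On this toy instance, the two individually strongest experts, A and B, overlap heavily and together cover 5 of 6 tasks, whereas greedy pairs A with the weaker but complementary C and covers all 6. Because each marginal gain depends only on tasks the current committee fails, feedback on already-covered tasks is never needed after the first round, which is one plausible reading of where a failure-conditioned method could save queries.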