🤖 AI Summary
This work addresses the limitation of existing in-context example selection methods, which optimize solely for prediction accuracy while neglecting model calibration. We propose the first multi-objective optimization framework that jointly enhances both accuracy and calibration. Methodologically, we formulate example selection as a joint optimization problem and introduce COM-BOM—a combinatorial Bayesian optimization algorithm—designed to simultaneously minimize expected calibration error (ECE) and maximize accuracy. COM-BOM efficiently approximates the accuracy–calibration Pareto frontier while drastically reducing LLM API calls. On benchmarks including MMLU-Pro, our approach achieves superior or on-par trade-off performance with significantly fewer API invocations compared to state-of-the-art baselines. To our knowledge, this is the first method to achieve synergistic improvement in both trustworthiness (via calibration) and effectiveness (via accuracy) of in-context example selection.
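The calibration objective above relies on expected calibration error (ECE). As a minimal sketch (not the paper's own code), the standard binned ECE takes a model's confidences and correctness indicators, partitions predictions into confidence bins, and averages the per-bin gap between mean confidence and empirical accuracy; the bin count here is an illustrative choice:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-weight-averaged |accuracy - confidence| per bin.
    Illustrative implementation; the paper's exact binning is not given here."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)  # half-open bins (lo, hi]
        if not mask.any():
            continue
        # weight = fraction of samples in this bin
        ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece
```

A perfectly calibrated, always-correct predictor with confidence 1.0 yields ECE 0; an always-confident predictor that is right half the time yields ECE 0.5.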
📝 Abstract
Selecting an optimal set of exemplars is critical to the performance of in-context learning. However, prior exemplar search methods narrowly optimize for predictive accuracy, critically neglecting model calibration, a key determinant of trustworthiness and safe deployment. In this paper, we formulate exemplar selection as a multi-objective optimization problem, explicitly targeting both the maximization of predictive accuracy and the minimization of expected calibration error. We solve this problem with a sample-efficient Combinatorial Bayesian Optimization algorithm (COM-BOM) to find the Pareto front that optimally trades off the two objectives of accuracy and calibration. We evaluate COM-BOM on multiple tasks from the unsaturated MMLU-Pro benchmark and find that COM-BOM beats or matches the baselines at jointly optimizing the two objectives, while requiring a minimal number of LLM API calls.
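The Pareto front mentioned above is the set of exemplar configurations that no other configuration simultaneously beats on both objectives. A hypothetical helper (not COM-BOM itself) that extracts the non-dominated (accuracy, ECE) pairs from a list of evaluated candidates, maximizing accuracy and minimizing ECE, could look like:

```python
def pareto_front(points):
    """Return non-dominated (accuracy, ece) pairs.
    A point is dominated if another point has accuracy >= and ece <=,
    with strict improvement in at least one objective.
    Illustrative sketch, not the paper's implementation."""
    front = []
    for acc, ece in points:
        dominated = any(
            a >= acc and e <= ece and (a > acc or e < ece)
            for a, e in points
        )
        if not dominated:
            front.append((acc, ece))
    return front
```

For example, among candidates `[(0.80, 0.10), (0.70, 0.05), (0.60, 0.20), (0.75, 0.15)]`, the last two are dominated by `(0.80, 0.10)`, leaving a two-point front that trades a little accuracy for better calibration.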