🤖 AI Summary
This work addresses the limitation of existing in-context example selection methods, which optimize solely for prediction accuracy while neglecting model calibration. We propose the first multi-objective optimization framework that jointly enhances both accuracy and calibration. Methodologically, we formulate example selection as a joint optimization problem and introduce COM-BOM—a combinatorial Bayesian optimization algorithm—designed to simultaneously minimize expected calibration error (ECE) and maximize accuracy. COM-BOM efficiently approximates the accuracy–calibration Pareto frontier while drastically reducing LLM API calls. On benchmarks including MMLU-Pro, our approach achieves superior or on-par trade-off performance with significantly fewer API invocations compared to state-of-the-art baselines. To our knowledge, this is the first method to achieve synergistic improvement in both trustworthiness (via calibration) and effectiveness (via accuracy) of in-context example selection.
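The calibration objective above relies on expected calibration error (ECE). As a minimal sketch (not the paper's own code), the standard binned ECE takes a model's confidences and correctness indicators, partitions predictions into confidence bins, and averages the per-bin gap between mean confidence and empirical accuracy; the bin count here is an illustrative choice:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-weight-averaged |accuracy - confidence| per bin.
    Illustrative implementation; the paper's exact binning is not given here."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)  # half-open bins (lo, hi]
        if not mask.any():
            continue
        # weight = fraction of samples in this bin
        ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece
```

A perfectly calibrated, always-correct predictor with confidence 1.0 yields ECE 0; an always-confident predictor that is right half the time yields ECE 0.5.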
📝 Abstract
Selecting an optimal set of exemplars is critical to the performance of in-context learning. However, prior exemplar search methods narrowly optimize for predictive accuracy, critically neglecting model calibration, a key determinant of trustworthiness and safe deployment. In this paper, we formulate exemplar selection as a multi-objective optimization problem, explicitly targeting both the maximization of predictive accuracy and the minimization of expected calibration error. We solve this problem with a sample-efficient Combinatorial Bayesian Optimization algorithm (COM-BOM) to find the Pareto front that optimally trades off the two objectives of accuracy and calibration. We evaluate COM-BOM on multiple tasks from the unsaturated MMLU-Pro benchmark and find that COM-BOM beats or matches the baselines at jointly optimizing the two objectives, while requiring a minimal number of LLM API calls.
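The Pareto front mentioned above is the set of exemplar configurations that no other configuration simultaneously beats on both objectives. A hypothetical helper (not COM-BOM itself) that extracts the non-dominated (accuracy, ECE) pairs from a list of evaluated candidates, maximizing accuracy and minimizing ECE, could look like:

```python
def pareto_front(points):
    """Return non-dominated (accuracy, ece) pairs.
    A point is dominated if another point has accuracy >= and ece <=,
    with strict improvement in at least one objective.
    Illustrative sketch, not the paper's implementation."""
    front = []
    for acc, ece in points:
        dominated = any(
            a >= acc and e <= ece and (a > acc or e < ece)
            for a, e in points
        )
        if not dominated:
            front.append((acc, ece))
    return front
```

For example, among candidates `[(0.80, 0.10), (0.70, 0.05), (0.60, 0.20), (0.75, 0.15)]`, the last two are dominated by `(0.80, 0.10)`, leaving a two-point front that trades a little accuracy for better calibration.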