Pandora's Regret: A Proper Scoring Rule for Evaluating Sequential Search

📅 2026-05-03

🤖 AI Summary

Standard scoring rules such as log loss ignore how competing classes are ranked, so model evaluation can diverge from practical utility in sequential search tasks. This work proposes Pandora's Regret, a closed-form, pairwise-additive, strictly proper scoring rule for multi-class sequential search. Derived from the expected cost of optimal search, it comes with a one-parameter Beta family that adjusts the balance between penalties for rank swaps and penalties for errors in probability magnitude. Across 597 models on MedMNIST, Pandora-based metrics predict clinical diagnostic costs better than conventional metrics, aligning evaluation with the demands of search-based decision-making.

📝 Abstract

In sequential search, alternatives are tested until the true class is found. Standard proper scoring rules like log loss are local, ignoring the ranking of competitors and misaligning model evaluation with search utility. We show that sequential search induces a pairwise structure that overcomes this. By analyzing the expected cost of optimal search under varying testing costs, we derive Pandora's Regret: a closed-form, pairwise-additive, and strictly proper scoring rule. Pandora's Regret both elicits true probabilities and penalizes rank-reversing miscalibrations where distractors outrank the true class. Our construction yields a one-parameter Beta family that balances penalties for rank-swapping versus probability magnitude, while retaining a grounded interpretation as expected search cost. We prove that log loss, accuracy, and macro-F1 rely on implicit decision models misaligned with sequential search. Across 597 MedMNIST models, Pandora-based metrics better predict clinical diagnostic costs than standard alternatives, extending decision-theoretic scoring rule construction to the multiclass setting.

Problem

Research questions and friction points this paper is trying to address.

sequential search

proper scoring rule

rank calibration

expected search cost

multiclass evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pandora's Regret

proper scoring rule

sequential search