AI Summary
This work addresses a practical limitation of conventional distributionally robust optimization (DRO) under distributional shift: under the Huber contamination model, the worst-case risk can become unbounded and the DRO objective vacuous. To overcome this, the authors propose a data-driven approach that learns a high-probability "bulk set", accounting for contamination inside the bulk and bounding the tail contribution separately, which yields a finite and computable robust objective. They introduce bulk-calibrated credal ambiguity sets, establishing an equivalence between the imprecise-probability notion of upper expectation and the worst-case risk, thereby endowing DRO with a clear tolerance-based interpretation. The method bridges imprecise probability theory and DRO, accommodates Bayesian, frequentist, and empirical reference distributions, and is efficiently solvable via linear or second-order cone programming. Experiments on heavy-tailed inventory control, geographically shifted housing-price prediction, and demographically shifted text classification demonstrate substantial improvements in balancing robustness and accuracy.
Abstract
Distributionally robust optimisation (DRO) minimises the worst-case expected loss over an ambiguity set that can capture distributional shifts in out-of-sample environments. While Huber (linear-vacuous) contamination is a classical minimal-assumption model for an $\varepsilon$-fraction of arbitrary perturbations, including it in an ambiguity set can make the worst-case risk infinite and the DRO objective vacuous unless one imposes strong boundedness or support assumptions. We address these challenges by introducing bulk-calibrated credal ambiguity sets: we learn a high-mass bulk set from data while considering contamination inside the bulk and bounding the remaining tail contribution separately. This leads to a closed-form, finite $\mathrm{mean}+\sup$ robust objective and tractable linear or second-order cone programs for common losses and bulk geometries. Through this framework, we highlight and exploit the equivalence between the imprecise probability (IP) notion of upper expectation and the worst-case risk, demonstrating how IP credal sets translate into DRO objectives with interpretable tolerance levels. Experiments on heavy-tailed inventory control, geographically shifted house-price regression, and demographically shifted text classification show competitive robustness-accuracy trade-offs and efficient optimisation times, using Bayesian, frequentist, or empirical reference distributions.
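The mean+sup objective described above can be illustrated numerically. The sketch below is not the authors' implementation: it uses the standard linear-vacuous (Huber $\varepsilon$-contamination) upper expectation $(1-\varepsilon)\,\mathbb{E}_P[\ell] + \varepsilon \sup_{z \in \text{bulk}} \ell(z)$, an empirical bulk interval taken from demand quantiles, and a grid search in place of the paper's LP/SOCP reformulation. The newsvendor costs, quantile levels, and grid resolution are illustrative assumptions.

```python
import numpy as np

def robust_objective(q, samples, bulk_lo, bulk_hi, eps, loss):
    """mean + sup objective: (1-eps) * mean loss under the reference
    distribution, plus eps * worst-case loss over the learned bulk set.
    Restricting the sup to the bulk interval keeps the objective finite."""
    mean_part = (1.0 - eps) * loss(q, samples).mean()
    grid = np.linspace(bulk_lo, bulk_hi, 1001)   # discretise the bulk
    sup_part = eps * loss(q, grid).max()
    return mean_part + sup_part

# Newsvendor loss with illustrative overage/underage costs.
c_o, c_u = 1.0, 3.0
def newsvendor_loss(q, d):
    return c_o * np.maximum(q - d, 0.0) + c_u * np.maximum(d - q, 0.0)

rng = np.random.default_rng(0)
demand = rng.lognormal(mean=2.0, sigma=0.5, size=5000)  # heavy-ish tail
lo, hi = np.quantile(demand, [0.005, 0.995])            # empirical 99% bulk

# Grid search over order quantities; the paper instead solves an LP/SOCP.
qs = np.linspace(lo, hi, 500)
vals = [robust_objective(q, demand, lo, hi, eps=0.05, loss=newsvendor_loss)
        for q in qs]
q_star = qs[int(np.argmin(vals))]
```

Because the supremum runs only over the bulk interval rather than the whole support, the objective stays finite even though the lognormal reference distribution is unbounded; raising `eps` pushes `q_star` toward the decision that hedges the bulk's worst case.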