The Sample Complexity of Multicalibration

📅 2026-04-23

🤖 AI Summary

This work studies the minimax sample complexity of multicalibration in the batch setting: the minimum number of i.i.d. samples needed to output a predictor whose population Expected Calibration Error (ECE) is at most ε on every group in a given family G. The authors build randomized predictors via an online-to-batch reduction and pair them with minimax lower bounds, giving a unified treatment of a weighted Lₚ multicalibration metric (1 ≤ p ≤ 2) and of elicitable properties such as expectiles and bounded-density quantiles. The main result is a Θ̃(ε⁻³) sample complexity in the regime |G| ≤ ε^(-κ) for every fixed κ > 0, compared with Θ̃(ε⁻²) for marginal calibration. At κ = 0 the complexity of multicalibration stays at Θ̃(ε⁻²), so the rate jumps from ε⁻² to ε⁻³ as soon as κ is positive. For the weighted Lₚ metric, the upper and lower bounds match up to polylogarithmic factors with exponent 3/p, and mean-ECE multicalibration turns out to be as hard in the batch setting as in the online setting.
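
To get a feel for the gap between the two rates, the following minimal sketch evaluates only the leading-order terms ε^(-3/p) and ε^(-2). It drops all constants and polylogarithmic factors, so the numbers are orders of magnitude and not sample sizes taken from the paper. The function names `multicalibration_rate` and `marginal_rate` are hypothetical.

```python
def multicalibration_rate(eps: float, p: float = 1.0) -> float:
    """Leading-order term eps^(-3/p) for the weighted L_p metric.

    Assumption: constants and polylog factors are ignored, and the
    group family satisfies |G| <= eps^(-kappa) for some fixed kappa > 0.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if not 1.0 <= p <= 2.0:
        raise ValueError("the stated bounds cover 1 <= p <= 2 only")
    return eps ** (-3.0 / p)


def marginal_rate(eps: float) -> float:
    """Leading-order term eps^(-2) for marginal calibration (also kappa = 0)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return eps ** -2.0


if __name__ == "__main__":
    for eps in (0.1, 0.05, 0.01):
        ratio = multicalibration_rate(eps) / marginal_rate(eps)
        print(f"eps={eps}: mean-ECE rate is about {ratio:.0f}x the marginal rate")
```

For p = 1 the ratio between the two terms is 1/ε, so halving the target error doubles the relative cost of multicalibration over marginal calibration at leading order.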

📝 Abstract

We study the minimax sample complexity of multicalibration in the batch setting. A learner observes $n$ i.i.d. samples from an unknown distribution and must output a (possibly randomized) predictor whose population multicalibration error, measured by Expected Calibration Error (ECE), is at most $\varepsilon$ with respect to a given family of groups. For every fixed $\kappa > 0$, in the regime $|G| \le \varepsilon^{-\kappa}$, we prove that $\widetilde{\Theta}(\varepsilon^{-3})$ samples are necessary and sufficient, up to polylogarithmic factors. The lower bound holds even for randomized predictors, and the upper bound is realized by a randomized predictor obtained via an online-to-batch reduction. This separates the sample complexity of multicalibration from that of marginal calibration, which scales as $\widetilde{\Theta}(\varepsilon^{-2})$, and shows that mean-ECE multicalibration is as difficult in the batch setting as it is in the online setting, in contrast to marginal calibration, which is strictly more difficult in the online setting. In contrast, we observe that for $\kappa = 0$, the sample complexity of multicalibration remains $\widetilde{\Theta}(\varepsilon^{-2})$, exhibiting a sharp threshold phenomenon. More generally, we establish matching upper and lower bounds, up to polylogarithmic factors, for a weighted $L_p$ multicalibration metric for all $1 \le p \le 2$, with optimal exponent $3/p$. We also extend the lower-bound template to a regular class of elicitable properties, and combine it with the online upper bounds of Hu et al. (2025) to obtain matching bounds for calibrating properties including expectiles and bounded-density quantiles.
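
As a concrete illustration of the quantity being bounded, here is a minimal sketch of an empirical (plug-in) multicalibration ECE. It assumes one common convention that the abstract does not spell out: for a predictor with finitely many output values, the error on a group is the sum over prediction values v of |mean of (y - v) restricted to samples in the group with prediction v|, where the mean is taken over all n samples, and the multicalibration error is the maximum over groups. The paper's exact definition and weighting may differ, and the names `group_ece` and `multicalibration_ece` are hypothetical.

```python
from collections import defaultdict


def group_ece(preds, labels, member):
    """Empirical ECE of `preds` restricted to one group.

    Assumed convention: sum over prediction values v of
    |(1/n) * sum of (y_i - v) over samples with pred == v and in the group|.
    """
    n = len(preds)
    residual = defaultdict(float)  # prediction value -> summed residual
    for v, y, in_group in zip(preds, labels, member):
        if in_group:
            residual[v] += y - v
    return sum(abs(r) for r in residual.values()) / n


def multicalibration_ece(preds, labels, groups):
    """Worst-case group ECE over a family of boolean membership masks."""
    return max(group_ece(preds, labels, m) for m in groups)


# Toy data: a constant predictor that is calibrated on the whole
# population but badly miscalibrated on a subgroup.
preds = [0.5, 0.5, 0.5, 0.5]
labels = [1, 0, 1, 0]
everyone = [True, True, True, True]
subgroup = [True, False, True, False]  # contains only the positive labels

marginal_error = group_ece(preds, labels, everyone)
multi_error = multicalibration_ece(preds, labels, [everyone, subgroup])
```

In this toy example the marginal error is 0 while the multicalibration error is 0.25, which shows why controlling the error on every group simultaneously is a stronger requirement than marginal calibration. The sketch measures error on a fixed sample, whereas the bounds in the abstract concern the population error of the learned predictor.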

Problem

Research questions and friction points this paper is trying to address.

multicalibration

sample complexity

Expected Calibration Error

batch learning

minimax

Innovation

Methods, ideas, or system contributions that make the work stand out.

multicalibration

sample complexity

expected calibration error