🤖 AI Summary
This work addresses the problem of selecting a subset of options in online settings under extremely limited feedback, such as bandit or semi-bandit observations. The goal is to guarantee that at least one "successful" option is identified with probability no less than a pre-specified target, while minimizing resource consumption. To this end, the authors propose a unified algorithmic framework based on Adaptive Conformal Inference (ACI), which dynamically updates control parameters or dual variables. The framework ensures average validity over arbitrary input sequences and achieves sublinear efficiency regret under i.i.d. assumptions. Notably, this approach is the first to simultaneously attain adversarial robustness and stochastic efficiency under minimal feedback. It provides guaranteed success rates and resource savings without requiring strong distributional assumptions, thereby outperforming existing methods both theoretically and empirically.
📝 Abstract
We address the problem of conformal selection, where an agent must select a minimal subset of options to ensure that at least one "success" is identified with a pre-specified target probability $φ$. While traditional online conformal prediction focuses on maintaining validity for the observed sequence, minimizing the resource cost (efficiency) of such selections, especially under limited feedback, remains a significant challenge. In this work, we consider settings with the most limited "bandit" feedback, and demonstrate that the simple Adaptive Conformal Inference (ACI) update rule, when applied to the appropriate control parameter or dual variable, is both adversarially valid and stochastically efficient. Adversarial validity ensures that the success target is met on average for any input sequence (and hence under distribution shifts); stochastic efficiency means achieving sublinear efficiency regret for i.i.d. inputs against an appropriate stochastic benchmark. We show such guarantees under canonical models capturing bandit and semi-bandit feedback to the agent, via a unifying algorithmic technique and an analytic framework involving Lyapunov functions. Our approach handles more complex settings than prior work, while requiring significantly less feedback, and our results provide a new theoretical bridge between efficient online learning with limited feedback and distribution-free uncertainty quantification.
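As a rough illustration of the kind of update the abstract describes, the sketch below applies an ACI-style step to a single control parameter using only one bit of feedback per round. It is not the paper's algorithm: the names `lam`, `eta`, `aci_update`, `select`, and `simulate`, the score-threshold selection rule, and the toy environment (exactly one successful option per round) are all assumptions made for illustration. The update is lam ← lam + eta · (φ − success), so the selection grows after a miss and shrinks after a hit.

```python
import random


def aci_update(lam, success, phi, eta):
    """One ACI-style step: raise lam after a miss, lower it after a hit."""
    return lam + eta * (phi - success)


def select(scores, lam):
    """Hypothetical selection rule: keep every option with score >= 1 - lam."""
    return [i for i, s in enumerate(scores) if s >= 1.0 - lam]


def simulate(T=5000, k=10, phi=0.9, eta=0.05, seed=0):
    """Run the update in a toy environment; returns (success rate, mean cost, final lam)."""
    rng = random.Random(seed)
    lam, hits, cost = 0.5, 0, 0
    for _ in range(T):
        scores = [rng.random() for _ in range(k)]  # scores lie in [0, 1)
        # Assumed environment: exactly one option succeeds, and
        # higher-scored options are more likely to be that one.
        weights = [s + 1e-9 for s in scores]
        winner = rng.choices(range(k), weights=weights)[0]
        chosen = select(scores, lam)
        # Bandit feedback: a single bit saying whether the selection hit.
        success = int(winner in chosen)
        hits += success
        cost += len(chosen)
        lam = aci_update(lam, success, phi, eta)
    return hits / T, cost / T, lam


if __name__ == "__main__":
    rate, avg_cost, lam = simulate()
    print(f"success rate={rate:.4f}  mean subset size={avg_cost:.2f}  lam={lam:.3f}")
```

In this toy setup `lam` cannot drift far: selecting everything always hits, and selecting nothing always misses. Summing the updates gives φ − (empirical success rate) = (lam_T − lam_0) / (eta · T), so the gap from the target shrinks at rate O(1/T) regardless of how the scores are generated. This illustrates the averaged validity argument only; it says nothing about the efficiency regret results.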