🤖 AI Summary
This paper studies online decision-making under resource constraints where reward and cost distributions may change adversarially over time, a setting in which sublinear regret against standard benchmarks is impossible. To address this, the authors propose a learning paradigm in which a *prescribed spending plan* defines the benchmark. Their method leverages a primal-dual framework with three key innovations: (i) budget-balance awareness, (ii) robust perturbation compensation, and (iii) dynamic step-size adaptation. Under both full-information and bandit-feedback settings, the algorithms achieve $O(\sqrt{T})$ regret relative to the plan-following benchmark. The analysis further establishes robustness under highly imbalanced spending plans, and empirical evaluation demonstrates substantial improvements over plan-agnostic baselines. The core contribution is the formalization of plan-driven regulation mechanisms and a systematic characterization of regret bounds when competing against benchmarks that deviate from the prescribed spending plan.
📝 Abstract
We study online decision-making problems under resource constraints, where both reward and cost functions are drawn from distributions that may change adversarially over time. We focus on two canonical settings: $(i)$ online resource allocation, where rewards and costs are observed before action selection, and $(ii)$ online learning with resource constraints, where they are observed after action selection, under full feedback or bandit feedback. It is well known that achieving sublinear regret in these settings is impossible when reward and cost distributions may change arbitrarily over time. To address this challenge, we analyze a framework in which the learner is guided by a spending plan, i.e., a sequence prescribing expected resource usage across rounds. We design general (primal-)dual methods that achieve sublinear regret with respect to baselines that follow the spending plan. Crucially, the performance of our algorithms improves when the spending plan ensures a well-balanced distribution of the budget across rounds. We additionally provide a robust variant of our methods to handle worst-case scenarios in which the spending plan is highly imbalanced. Finally, we study the regret of our algorithms when competing against benchmarks that deviate from the prescribed spending plan.
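To make the plan-guided primal-dual idea concrete, here is a minimal sketch of one plausible instantiation in the full-information setting. All names (`plan_guided_primal_dual`, the step size `eta`, and the specific dual update) are illustrative assumptions, not the paper's actual algorithm: the sketch only shows the generic pattern of choosing actions via a Lagrangian and nudging a dual variable so per-round spending tracks the prescribed plan.

```python
import numpy as np

def plan_guided_primal_dual(rewards, costs, plan, eta=0.1):
    """Hedged sketch of a plan-guided primal-dual loop (assumed form).

    rewards, costs: (T, K) arrays observed each round (full information).
    plan: length-T array of target expected spending per round.
    eta: dual step size (illustrative choice, not from the paper).
    """
    T, K = rewards.shape
    lam = 0.0  # dual variable penalizing spending above the plan
    total_reward = 0.0
    total_cost = 0.0
    for t in range(T):
        # Primal step: pick the action maximizing the Lagrangian
        # reward minus lam-weighted cost.
        a = int(np.argmax(rewards[t] - lam * costs[t]))
        total_reward += rewards[t, a]
        total_cost += costs[t, a]
        # Dual step: raise lam when spending exceeds the plan's
        # target for this round, lower it (toward 0) otherwise.
        lam = max(0.0, lam + eta * (costs[t, a] - plan[t]))
    return total_reward, total_cost

# Toy usage: two rounds, two actions, a flat spending plan.
rewards = np.array([[1.0, 0.0], [0.0, 1.0]])
costs = np.array([[1.0, 0.0], [0.0, 1.0]])
plan = np.array([0.5, 0.5])
r, c = plan_guided_primal_dual(rewards, costs, plan)
```

A well-balanced plan keeps the per-round targets `plan[t]` close to uniform, which is exactly the regime where the paper reports improved guarantees; a highly imbalanced plan would require the robust variant mentioned in the abstract.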