Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs

📅 2025-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies online learning with delayed feedback under a capacity constraint: the learner can track the pending feedback of at most $C$ past rounds at any time. It introduces a unified capacity-constrained model that subsumes delayed multi-armed bandits, label-efficient learning, and online scheduling. The upper bounds are achieved by scheduling policies built on Pareto-distributed proxy delays and batching, under clairvoyance (delays revealed upfront) and/or preemptibility (tracking of a past round can be stopped), and they are matched by lower bounds up to logarithmic factors. The paper characterizes the minimal capacity needed to match the classical (unlimited-capacity) minimax rates as a function of the delay structure. For $K$ actions, $T$ rounds, and total delay $D$, under clairvoyance and assuming $C = \Omega(\log T)$, the regret is $\widetilde{\Theta}\big(\sqrt{TK + DK/C + D\log K}\big)$ with bandit feedback and $\widetilde{\Theta}\big(\sqrt{(D+T)\log K}\big)$ with full-information feedback, so bandit regret degrades gracefully as the capacity $C$ shrinks.
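To see how the two rates behave, the expressions inside the $\widetilde{\Theta}(\cdot)$ bounds can be evaluated directly. The sketch below uses hypothetical helper names (`bandit_rate`, `full_info_rate`) and drops all constants and logarithmic factors, so only the relative growth is meaningful, not the absolute values.

```python
import math


def bandit_rate(T: int, K: int, D: float, C: float) -> float:
    """sqrt(TK + DK/C + D log K): bandit rate, constants and log factors dropped."""
    return math.sqrt(T * K + D * K / C + D * math.log(K))


def full_info_rate(T: int, K: int, D: float) -> float:
    """sqrt((D + T) log K): full-information rate; independent of C."""
    return math.sqrt((D + T) * math.log(K))


if __name__ == "__main__":
    T, K, d = 10_000, 100, 50
    D = T * d  # a fixed delay d in every round gives total delay D = T * d
    # The bounds are stated for C = Omega(log T); log(10_000) is about 9.2.
    for C in (10, 20, 50, 100):
        print(f"C={C:>3}  bandit ~ {bandit_rate(T, K, D, C):9.1f}  "
              f"full-info ~ {full_info_rate(T, K, D):7.1f}")
```

As $C$ grows, the $DK/C$ term shrinks and the bandit value approaches $\sqrt{TK + D\log K}$, while the full-information value is the same for every $C$.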


📝 Abstract
We study online learning with oblivious losses and delays under a novel "capacity constraint" that limits how many past rounds can be tracked simultaneously for delayed feedback. Under "clairvoyance" (i.e., delay durations are revealed upfront each round) and/or "preemptibility" (i.e., we have the ability to stop tracking the feedback of previously chosen rounds), we establish matching upper and lower bounds (up to logarithmic terms) on achievable regret, characterizing the "optimal capacity" needed to match the minimax rates of classical delayed online learning, which implicitly assume unlimited capacity. Our algorithms achieve minimax-optimal regret across all capacity levels, with performance degrading gracefully under suboptimal capacity. For $K$ actions and total delay $D$ over $T$ rounds, under clairvoyance and assuming capacity $C = \Omega(\log(T))$, we achieve regret $\widetilde{\Theta}(\sqrt{TK + DK/C + D\log(K)})$ for bandits and $\widetilde{\Theta}(\sqrt{(D+T)\log(K)})$ for full-information feedback. When replacing clairvoyance with preemptibility, we require a known maximum delay bound $d_{\max}$, adding $\widetilde{O}(d_{\max})$ to the regret. For fixed delays $d$ (i.e., $D=Td$), the minimax regret is $\Theta\bigl(\sqrt{TK(1+d/C)+Td\log(K)}\bigr)$ and the optimal capacity is $\Theta\bigl(\min\{K/\log(K),d\}\bigr)$ in the bandit setting, while in the full-information setting, the minimax regret is $\Theta\bigl(\sqrt{T(d+1)\log(K)}\bigr)$ and the optimal capacity is $\Theta(1)$. For round-dependent and fixed delays, our upper bounds are achieved using novel scheduling policies based on Pareto-distributed proxy delays and batching techniques. Crucially, our work unifies delayed bandits, label-efficient learning, and online scheduling frameworks, demonstrating that robust online learning under delayed feedback is possible with surprisingly modest tracking capacity.
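Substituting $D = Td$ into the general rates recovers the fixed-delay expressions, and comparing the capacity-dependent term with the remaining terms gives a quick consistency check on the stated bandit optimal capacity. This is a back-of-the-envelope sketch that ignores constants and logarithmic factors (and assumes $K \ge 2$ so that $\log K > 0$); it is not the paper's proof.

```latex
\begin{align*}
\Bigl(TK + \tfrac{DK}{C} + D\log K\Bigr)\Big|_{D = Td}
  &= TK\Bigl(1 + \tfrac{d}{C}\Bigr) + Td\log K, \\
\bigl((D + T)\log K\bigr)\big|_{D = Td} &= T(d+1)\log K.
\end{align*}
% The capacity term TdK/C is at most the classical terms TK + Td log K exactly when
\[
  C \;\ge\; \frac{dK}{K + d\log K},
  \qquad
  \tfrac{1}{2}\min\Bigl\{d, \tfrac{K}{\log K}\Bigr\}
  \;\le\; \frac{dK}{K + d\log K}
  \;\le\; \min\Bigl\{d, \tfrac{K}{\log K}\Bigr\},
\]
% because dK/(K + d log K) = dK/(a + b) with a = K, b = d log K, and max(a,b) <= a + b <= 2 max(a,b).
```

So the threshold is $\Theta(\min\{K/\log K, d\})$, in line with the optimal bandit capacity stated above.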
Problem

Research questions and friction points this paper is trying to address.

Study online learning with capacity constraints and delays
Establish optimal regret bounds under clairvoyance and preemptibility
Unify delayed bandits, label-efficient learning, and scheduling frameworks
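The capacity constraint, and why the ability to stop tracking a round matters, can be illustrated with a toy simulation. The function `observed_rounds` and both policies in it are hypothetical illustrations, not the paper's scheduling algorithms: the non-preemptive policy greedily tracks a round whenever a slot is free, while the preemptive policy also uses the revealed delay (i.e., it combines preemptibility with clairvoyance) to drop the tracked round whose feedback would arrive latest.

```python
from typing import List, Tuple


def observed_rounds(delays: List[int], C: int, preemptible: bool) -> List[int]:
    """Toy model of capacity-constrained feedback tracking (hypothetical policies).

    Round t is chosen at step t; if tracked, it occupies one of at most C slots
    during steps t, ..., t + delays[t], and its feedback is received at step
    t + delays[t]. Untracked rounds never deliver feedback.

    - preemptible=False: greedy, track round t only if a slot is free.
    - preemptible=True: when all slots are busy, use the revealed delay to
      drop the tracked round whose feedback would arrive latest, but only if
      round t's feedback would arrive sooner.
    """
    tracked: List[Tuple[int, int]] = []  # (arrival_step, round) per occupied slot
    received: List[int] = []
    for t, d in enumerate(delays):
        # Collect feedback that arrived before step t and free those slots.
        received.extend(s for arrival, s in tracked if arrival < t)
        tracked = [(arrival, s) for arrival, s in tracked if arrival >= t]
        if len(tracked) < C:
            tracked.append((t + d, t))
        elif preemptible:
            latest = max(tracked)  # tracked round with the latest arrival step
            if t + d < latest[0]:
                tracked.remove(latest)  # preempt: stop tracking that round
                tracked.append((t + d, t))
        # Otherwise round t is not tracked and its feedback is lost.
        assert len(tracked) <= C
    # Assume tracking continues past the horizon until pending feedback arrives.
    received.extend(s for _, s in tracked)
    return sorted(received)


if __name__ == "__main__":
    delays = [5, 0, 0, 0]
    print(observed_rounds(delays, C=1, preemptible=False))  # [0]
    print(observed_rounds(delays, C=1, preemptible=True))   # [1, 2, 3]
```

With a single slot, the greedy policy stays stuck on the long-delayed round 0 and loses the feedback of rounds 1 to 3, whereas the preemptive variant drops round 0 and observes the three short-delay rounds. With `C=2`, even the greedy policy observes all four rounds.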
Innovation

Methods, ideas, or system contributions that make the work stand out.

Capacity-constrained online learning with delays
Minimax-optimal regret with limited tracking capacity
Novel scheduling policies using proxy delays