Learning to Cover: Online Learning and Optimization with Irreversible Decisions

📅 2024-06-20
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies a discrete, irreversible online facility location problem: over a finite horizon, facilities must be opened period by period to cover stochastic demands while satisfying probabilistic coverage constraints, minimizing the total number of openings. The authors propose an asymptotically optimal algorithm integrating online classification learning with chance-constrained optimization, and theoretically establish the superiority of a "limited exploration, rapid exploitation" strategy. This work delivers the first precise scaling laws linking learning rate and regret in irreversible coverage decisions: the classifier converges to the Bayes-optimal rule at rate $O(1/\sqrt{n})$; the regret scales sublinearly as $\Theta\big(m^{(1-r)/(1-r^T)}\big)$; and under customer expansion, convergence to the infinite-horizon limit is exponential. These results provide tight theoretical guarantees and a practical algorithmic framework for large-scale, low-latency coverage decisions.

📝 Abstract
We define an online learning and optimization problem with discrete and irreversible decisions contributing toward a coverage target. In each period, a decision-maker selects facilities to open, receives information on the success of each one, and updates a classification model to guide future decisions. The goal is to minimize facility openings under a chance constraint reflecting the coverage target, in an asymptotic regime characterized by a large target number of facilities $m \to \infty$ but a finite horizon $T \in \mathcal{Z}_+$. We prove that, under statistical conditions, the online classifier converges to the Bayes-optimal classifier at a rate of at best $\mathcal{O}(1/\sqrt{n})$. Thus, we formulate our online learning and optimization problem with a generalized learning rate $r>0$ and a residual error $1-p$. We derive an asymptotically optimal algorithm and an asymptotically tight lower bound. The regret grows in $\Theta\left(m^{\frac{1-r}{1-r^T}}\right)$ if $p=1$ (perfect learning) or in $\Theta\left(\max\left\{m^{\frac{1-r}{1-r^T}},\sqrt{m}\right\}\right)$ otherwise; in particular, the regret rate is sub-linear and converges exponentially fast to its infinite-horizon limit. We extend this result to a more complicated facility location setting in a bipartite facility-customer graph with a target on customer coverage. Throughout, constructive proofs identify a policy featuring limited exploration initially and fast exploitation later on once uncertainty gets mitigated. These results uncover the benefits of limited online learning and optimization through pilot programs prior to full-fledged expansion.
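The regret bound above can be made concrete numerically. The sketch below (a hypothetical illustration, not code from the paper) evaluates the exponent $\frac{1-r}{1-r^T}$ of $m$ in the regret bound for a fixed learning rate $r<1$ and growing horizon $T$, showing that the exponent stays below 1 (sub-linear regret) and approaches its infinite-horizon limit $1-r$ exponentially fast:

```python
def regret_exponent(r: float, T: int) -> float:
    """Exponent of m in the regret bound Theta(m^{(1-r)/(1-r^T)})
    for learning rate r != 1 and horizon T >= 2."""
    return (1.0 - r) / (1.0 - r ** T)

# Example with r = 0.5: the exponent decreases toward the
# infinite-horizon limit 1 - r = 0.5 as the horizon T grows.
for T in (2, 3, 5, 10, 20):
    print(T, regret_exponent(0.5, T))
```

For $r = 0.5$ the exponent is $2/3$ at $T=2$ and is already within $10^{-3}$ of the limit $0.5$ by $T=10$, consistent with the abstract's claim of exponentially fast convergence to the infinite-horizon limit.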
Problem

Research questions and friction points this paper is trying to address.

Minimize facility openings under coverage target constraints
Optimize irreversible decisions with online learning convergence
Balance exploration and exploitation in facility location optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online learning with irreversible coverage decisions
Asymptotically optimal algorithm with learning rate
Limited exploration followed by fast exploitation