Learning to Cover: Online Learning and Optimization with Irreversible Decisions

📅 2024-06-20
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies a discrete, irreversible online facility location problem: over a finite horizon, facilities must be opened period by period to cover stochastic demands while satisfying probabilistic coverage constraints, minimizing the total number of openings. The authors propose an asymptotically optimal algorithm integrating online classification learning with chance-constrained optimization, and theoretically establish the superiority of a "limited exploration, rapid exploitation" strategy. This work delivers the first precise scaling laws linking learning rate and regret in irreversible coverage decisions: the classifier converges to the Bayes-optimal rule at rate $O(1/\sqrt{n})$; the regret scales sublinearly as $\Theta\big(m^{(1-r)/(1-r^T)}\big)$; and under customer expansion, convergence to the infinite-horizon limit is exponential. These results provide tight theoretical guarantees and a practical algorithmic framework for large-scale, low-latency coverage decisions.

📝 Abstract
We define an online learning and optimization problem with discrete and irreversible decisions contributing toward a coverage target. In each period, a decision-maker selects facilities to open, receives information on the success of each one, and updates a classification model to guide future decisions. The goal is to minimize facility openings under a chance constraint reflecting the coverage target, in an asymptotic regime characterized by a large target number of facilities $m \to \infty$ but a finite horizon $T \in \mathcal{Z}_+$. We prove that, under statistical conditions, the online classifier converges to the Bayes-optimal classifier at a rate of at best $\mathcal{O}(1/\sqrt{n})$. Thus, we formulate our online learning and optimization problem with a generalized learning rate $r>0$ and a residual error $1-p$. We derive an asymptotically optimal algorithm and an asymptotically tight lower bound. The regret grows in $\Theta\left(m^{\frac{1-r}{1-r^T}}\right)$ if $p=1$ (perfect learning) or in $\Theta\left(\max\left\{m^{\frac{1-r}{1-r^T}},\sqrt{m}\right\}\right)$ otherwise; in particular, the regret rate is sub-linear and converges exponentially fast to its infinite-horizon limit. We extend this result to a more complicated facility location setting in a bipartite facility-customer graph with a target on customer coverage. Throughout, constructive proofs identify a policy featuring limited exploration initially and fast exploitation later on once uncertainty gets mitigated. These results uncover the benefits of limited online learning and optimization through pilot programs prior to full-fledged expansion.
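The regret bound above can be made concrete numerically. The sketch below (a hypothetical illustration, not code from the paper) evaluates the exponent $\frac{1-r}{1-r^T}$ of $m$ in the regret bound for a fixed learning rate $r<1$ and growing horizon $T$, showing that the exponent stays below 1 (sub-linear regret) and approaches its infinite-horizon limit $1-r$ exponentially fast:

```python
def regret_exponent(r: float, T: int) -> float:
    """Exponent of m in the regret bound Theta(m^{(1-r)/(1-r^T)})
    for learning rate r != 1 and horizon T >= 2."""
    return (1.0 - r) / (1.0 - r ** T)

# Example with r = 0.5: the exponent decreases toward the
# infinite-horizon limit 1 - r = 0.5 as the horizon T grows.
for T in (2, 3, 5, 10, 20):
    print(T, regret_exponent(0.5, T))
```

For $r = 0.5$ the exponent is $2/3$ at $T=2$ and is already within $10^{-3}$ of the limit $0.5$ by $T=10$, consistent with the abstract's claim of exponentially fast convergence to the infinite-horizon limit.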
Problem

Research questions and friction points this paper is trying to address.

Minimize facility openings under coverage target constraints
Optimize irreversible decisions with online learning convergence
Balance exploration and exploitation in facility location optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online learning with irreversible coverage decisions
Asymptotically optimal algorithm with learning rate
Limited exploration followed by fast exploitation