🤖 AI Summary
We address sequential decision-making problems—such as ride-hailing dispatch, wireless communication scheduling, and content recommendation—where both resource availability and rewards are unknown a priori and probing incurs high cost. To this end, we propose the Probe-Enhanced User-Centric Selection (PUCS) framework, enabling a two-phase "probe-then-assign" policy. First, we formulate PUCS as a unified optimization model and design an offline greedy probing algorithm achieving a constant approximation ratio ζ = (e−1)/(2e−1). Second, we develop the online learning algorithm OLPA, attaining a regret bound of O(√T + ln²T), together with an Ω(√T) lower bound showing this is tight up to logarithmic factors. Leveraging combinatorial stochastic bandit learning, adaptive information probing strategies, and rigorous probabilistic analysis, OLPA significantly outperforms state-of-the-art baselines on real-world datasets, empirically validating both the efficacy of proactive probing and the user-centric assignment paradigm.
📝 Abstract
We formalize sequential decision-making with information acquisition as the probing-augmented user-centric selection (PUCS) framework, where a learner first probes a subset of arms to obtain side information on resources and rewards, and then assigns $K$ plays to $M$ arms. PUCS covers applications such as ridesharing, wireless scheduling, and content recommendation, in which both resources and payoffs are initially unknown and probing is costly. For the offline setting with known distributions, we present a greedy probing algorithm with a constant-factor approximation guarantee $\zeta = (e-1)/(2e-1)$. For the online setting with unknown distributions, we introduce OLPA, a stochastic combinatorial bandit algorithm that achieves a regret bound $\mathcal{O}(\sqrt{T} + \ln^{2} T)$. We also prove a lower bound $\Omega(\sqrt{T})$, showing that the upper bound is tight up to logarithmic factors. Experiments on real-world data demonstrate the effectiveness of our solutions.
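To make the offline "probe-then-assign" idea concrete, here is a minimal sketch of the greedy probing step: repeatedly add the arm with the largest marginal gain to the probe set until the probing budget is exhausted. The function names (`greedy_probe_set`, `marginal_value`) are illustrative placeholders, not the paper's actual implementation, and the value oracle here is an assumption; the paper's guarantee $\zeta = (e-1)/(2e-1)$ applies to its full formulation, not to this toy.

```python
def greedy_probe_set(arms, budget, marginal_value):
    """Greedily build a probe set under a cardinality budget (illustrative).

    arms           -- iterable of arm identifiers
    budget         -- maximum number of arms to probe
    marginal_value -- oracle: marginal_value(chosen, a) estimates the gain
                      in expected assignment value from also probing arm `a`
                      given the set `chosen` (hypothetical interface)
    """
    chosen = []
    remaining = set(arms)
    for _ in range(budget):
        best_arm, best_gain = None, 0.0
        for a in remaining:
            gain = marginal_value(chosen, a)
            if gain > best_gain:
                best_arm, best_gain = a, gain
        if best_arm is None:  # no arm adds positive value; stop early
            break
        chosen.append(best_arm)
        remaining.remove(best_arm)
    return chosen
```

With a simple additive value oracle (each arm contributes a fixed estimated gain), the greedy loop just picks the top-`budget` arms; the interesting behavior arises when gains are diminishing, which is the regime the paper's approximation analysis addresses.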