Scheduling Servers with Stochastic Bilinear Rewards

📅 2021-12-13
🏛️ arXiv.org
📈 Citations: 2
Influential: 0

🤖 AI Summary
This paper studies online scheduling in a multi-class, multi-server queueing system where job-server assignments yield stochastic rewards with unknown means that follow a bilinear model in job and server features. The goal is to maximize cumulative reward (equivalently, minimize regret) while keeping the total job holding cost bounded, which ensures queueing system stability. The proposed algorithm combines three components: (i) a bandit strategy for learning the bilinear rewards, (ii) weighted proportional fair allocation, and (iii) marginal costs for reward maximization. It achieves sub-linear regret and sub-linear mean holding cost (with a queue length bound) in the time horizon. The paper also establishes stability conditions for distributed iterative algorithms that compute the allocations, which matter for large-scale systems, and validates the algorithm with numerical experiments motivated by computing services and online platforms.
📝 Abstract
We address a control system optimization problem that arises in multi-class, multi-server queueing system scheduling with uncertainty. In this scenario, jobs incur holding costs while awaiting completion, and job-server assignments yield observable stochastic rewards with unknown mean values. The rewards for job-server assignments are assumed to follow a bilinear model with respect to features characterizing jobs and servers. Our objective is regret minimization, aiming to maximize the cumulative reward of job-server assignments over a time horizon while maintaining a bounded total job holding cost, thus ensuring queueing system stability. This problem is motivated by applications in computing services and online platforms. To address this problem, we propose a scheduling algorithm based on weighted proportional fair allocation criteria augmented with marginal costs for reward maximization, incorporating a bandit strategy. Our algorithm achieves sub-linear regret and sub-linear mean holding cost (and queue length bound) with respect to the time horizon, thus guaranteeing queueing system stability. Additionally, we establish stability conditions for distributed iterative algorithms for computing allocations, which are relevant to large-scale system applications. Finally, we validate the efficiency of our algorithm through numerical experiments.
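To make the bilinear reward model concrete, here is a minimal Python sketch. It assumes the mean reward of a job-server pair has the form x^T Θ y, where x is a job feature vector, y is a server feature vector, and Θ is an unknown parameter matrix. The names `bilinear_reward`, `outer_flat`, and `flatten`, and all numeric values, are illustrative assumptions and are not taken from the paper. The sketch also checks the standard identity that a bilinear form is linear in the flattened outer product of x and y, which is why linear bandit techniques are a natural fit for this setting.

```python
def bilinear_reward(job_features, theta, server_features):
    """Mean reward x^T Theta y for one job-server pair (assumed model form)."""
    return sum(
        job_features[a] * theta[a][b] * server_features[b]
        for a in range(len(job_features))
        for b in range(len(server_features))
    )


def outer_flat(job_features, server_features):
    """Flatten the outer product x y^T row by row."""
    return [xa * yb for xa in job_features for yb in server_features]


def flatten(theta):
    """Flatten the parameter matrix row by row, matching outer_flat."""
    return [entry for row in theta for entry in row]


# Hypothetical features and parameters, chosen only for illustration.
x = [1.0, 2.0]                      # job class features
y = [0.5, -1.0, 3.0]                # server features
theta = [[0.2, 0.0, 0.1],
         [0.4, -0.3, 0.0]]          # unknown in practice; must be learned

direct = bilinear_reward(x, theta, y)
as_linear = sum(t * z for t, z in zip(flatten(theta), outer_flat(x, y)))
print(direct, as_linear)
```

Both computations give the same value (up to floating-point rounding), so estimating Θ can be viewed as a linear regression on the flattened outer-product features.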
Problem

Research questions and friction points this paper is trying to address.

Scheduling in multi-class parallel-server queues with uncertain rewards
Minimizing regret while keeping job holding costs bounded
Balancing reward maximization and fair allocation for system stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

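Learns the unknown rewards with a bilinear bandit strategy
Allocates servers using weighted proportional fair criteria augmented with marginal costs
Achieves sub-linear regret and sub-linear mean holding cost, which guarantees stability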
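As a rough illustration of weighted proportional fairness on its own, the Python sketch below splits one server's capacity among job classes by maximizing the sum of w_i * log(x_i) subject to the shares x_i summing to the capacity. The closed-form solution is x_i = capacity * w_i / sum(w). Using queue lengths as the weights is an assumption made here for illustration. The paper's actual rule also includes marginal-cost terms and learned reward estimates, which this summary does not specify and which are not reproduced here.

```python
def weighted_proportional_fair(weights, capacity=1.0):
    """Shares maximizing sum_i w_i * log(x_i) subject to sum_i x_i = capacity.

    Assumes nonnegative weights. Returns all-zero shares if every weight is zero.
    """
    total = sum(weights)
    if total <= 0:
        return [0.0 for _ in weights]
    return [capacity * w / total for w in weights]


# Hypothetical queue lengths for three job classes, used as weights.
queue_lengths = [6, 3, 1]
shares = weighted_proportional_fair(queue_lengths)
print(shares)
```

Classes with longer queues receive proportionally larger shares, and every class with a positive weight receives a positive share.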