Stochastic Bandits for Crowdsourcing and Multi-Platform Autobidding

📅 2025-08-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies budget allocation in crowdsourcing and multi-platform automated bidding: given a fixed total budget, how should funds be dynamically allocated across $K$ tasks (or ad auctions) to maximize cumulative reward? The key challenge is that each task's completion probability depends on its share of the budget and exhibits diminishing returns. The paper formulates budget allocation as a stochastic multi-armed bandit problem whose arms lie on the $K$-dimensional probability simplex—presented as the first such formulation—and proposes an upper confidence bound algorithm. The analysis establishes a regret bound of $\mathcal{O}(K\sqrt{T})$ together with a matching lower bound; under an additional diminishing-returns assumption, the bound improves to $\mathcal{O}(K(\log T)^2)$. This yields new theory for stochastic bandits under simplex constraints and a framework for budget-sensitive sequential decision-making.

📝 Abstract
Motivated by applications in crowdsourcing, where a fixed sum of money is split among $K$ workers, and autobidding, where a fixed budget is used to bid in $K$ simultaneous auctions, we define a stochastic bandit model where arms belong to the $K$-dimensional probability simplex and represent the fraction of budget allocated to each task/auction. The reward in each round is the sum of $K$ stochastic rewards, where each of these rewards is unlocked with a probability that varies with the fraction of the budget allocated to that task/auction. We design an algorithm whose expected regret after $T$ steps is of order $K\sqrt{T}$ (up to log factors) and prove a matching lower bound. Improved bounds of order $K(\log T)^2$ are shown when the function mapping budget to probability of unlocking the reward (i.e., terminating the task or winning the auction) satisfies additional diminishing-returns conditions.
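The abstract's reward model can be sketched in a few lines. The simulation below is a minimal illustration, not the paper's construction: the concave unlock curves `1 - exp(-c_i * b)` and the Gaussian per-task rewards are hypothetical choices standing in for the paper's general budget-to-probability functions.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3
# Hypothetical concave "unlock" curves p_i(b) = 1 - exp(-c_i * b), one per
# task; the paper only assumes p_i depends on the allocated budget fraction.
c = np.array([2.0, 4.0, 8.0])
mean_reward = np.array([1.0, 0.7, 0.5])  # assumed mean reward per task

def play(allocation):
    """One round: allocation lies on the K-simplex; each task unlocks its
    reward independently with probability p_i(b_i)."""
    p = 1.0 - np.exp(-c * allocation)
    unlocked = rng.random(K) < p
    rewards = rng.normal(mean_reward, 0.1)  # noisy per-task rewards
    return float(np.sum(rewards * unlocked))

# Average round reward under the uniform allocation (1/K to each task).
uniform = np.full(K, 1.0 / K)
avg = np.mean([play(uniform) for _ in range(10000)])
```

A learner observing only the realized round rewards must trade off exploring different points of the simplex against exploiting the empirically best allocation.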
Problem

Research questions and friction points this paper is trying to address.

Optimizing budget allocation across multiple tasks or auctions
Maximizing total reward under stochastic unlocking probabilities
Developing efficient algorithms with provable regret bounds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic bandit model with budget allocation on simplex
Algorithm achieves order K√T regret with a matching lower bound
Improved K(log T)² regret under diminishing returns