Stochastic Bandits for Crowdsourcing and Multi-Platform Autobidding

📅 2025-08-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies budget allocation in crowdsourcing and multi-platform automated bidding: given a fixed total budget, how should funds be dynamically allocated across $K$ tasks (or ad auctions) to maximize cumulative reward? The key challenge is that each task's completion probability depends on its share of the budget and exhibits diminishing returns. The paper formulates budget allocation as a stochastic multi-armed bandit problem whose arms lie on the $K$-dimensional probability simplex—presented as the first such formulation—and proposes an upper confidence bound algorithm. The analysis establishes a regret bound of $\mathcal{O}(K\sqrt{T})$ together with a matching lower bound; under an additional diminishing-returns assumption, the bound improves to $\mathcal{O}(K(\log T)^2)$. This yields new theory for stochastic bandits under simplex constraints and a framework for budget-sensitive sequential decision-making.

📝 Abstract
Motivated by applications in crowdsourcing, where a fixed sum of money is split among $K$ workers, and autobidding, where a fixed budget is used to bid in $K$ simultaneous auctions, we define a stochastic bandit model where arms belong to the $K$-dimensional probability simplex and represent the fraction of budget allocated to each task/auction. The reward in each round is the sum of $K$ stochastic rewards, where each of these rewards is unlocked with a probability that varies with the fraction of the budget allocated to that task/auction. We design an algorithm whose expected regret after $T$ steps is of order $K\sqrt{T}$ (up to log factors) and prove a matching lower bound. Improved bounds of order $K(\log T)^2$ are shown when the function mapping budget to probability of unlocking the reward (i.e., terminating the task or winning the auction) satisfies additional diminishing-returns conditions.
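The abstract's reward model can be sketched in a few lines. The simulation below is a minimal illustration, not the paper's construction: the concave unlock curves `1 - exp(-c_i * b)` and the Gaussian per-task rewards are hypothetical choices standing in for the paper's general budget-to-probability functions.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3
# Hypothetical concave "unlock" curves p_i(b) = 1 - exp(-c_i * b), one per
# task; the paper only assumes p_i depends on the allocated budget fraction.
c = np.array([2.0, 4.0, 8.0])
mean_reward = np.array([1.0, 0.7, 0.5])  # assumed mean reward per task

def play(allocation):
    """One round: allocation lies on the K-simplex; each task unlocks its
    reward independently with probability p_i(b_i)."""
    p = 1.0 - np.exp(-c * allocation)
    unlocked = rng.random(K) < p
    rewards = rng.normal(mean_reward, 0.1)  # noisy per-task rewards
    return float(np.sum(rewards * unlocked))

# Average round reward under the uniform allocation (1/K to each task).
uniform = np.full(K, 1.0 / K)
avg = np.mean([play(uniform) for _ in range(10000)])
```

A learner observing only the realized round rewards must trade off exploring different points of the simplex against exploiting the empirically best allocation.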
Problem

Research questions and friction points this paper is trying to address.

Optimizing budget allocation across multiple tasks or auctions
Maximizing total reward under stochastic unlocking probabilities
Developing efficient algorithms with provable regret bounds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic bandit model with budget allocation on simplex
Algorithm achieves order K√T regret with a matching lower bound
Improved K(log T)² regret under diminishing returns