AI Summary
This paper studies the dynamic allocation of reusable resources to strategic agents with private valuations under multi-dimensional long-term budget constraints, aiming to simultaneously maximize social welfare, strictly satisfy all cost constraints, and incentivize truthful reporting. We propose the first incentive-compatible primal-dual online mechanism for this setting, integrating epoch-based delayed dual updates, lazy updates, randomized exploration, and an online learning subroutine to restore the robustness of primal-dual methods in strategic environments. Theoretically, our mechanism achieves a social welfare regret of $\tilde{\mathcal{O}}(\sqrt{T})$, satisfies all long-term budget constraints strictly at every time step, and is Bayesian incentive compatible (BIC). Its performance asymptotically approaches that of the optimal offline benchmark in the non-strategic setting, marking the first result that attains near-optimal long-term resource allocation while provably guaranteeing incentive compatibility.
Abstract
Motivated by applications such as cloud platforms allocating GPUs to users or governments deploying mobile health units across competing regions, we study the dynamic allocation of a reusable resource to strategic agents with private valuations. Our objective is to simultaneously (i) maximize social welfare, (ii) satisfy multi-dimensional long-term cost constraints, and (iii) incentivize truthful reporting. We begin by numerically evaluating primal-dual methods widely used in constrained online optimization and find them to be highly fragile in strategic settings -- agents can easily manipulate their reports to distort subsequent dual updates for future gain.
To address this vulnerability, we develop an incentive-aware framework that makes primal-dual methods robust to strategic behavior. Our design combines epoch-based lazy updates -- where dual variables remain fixed within each epoch -- with randomized exploration rounds that extract approximately truthful signals for learning. Leveraging carefully designed online learning subroutines for the dual updates, which may be of independent interest, our mechanism achieves $\tilde{\mathcal{O}}(\sqrt{T})$ social welfare regret, satisfies all cost constraints, and ensures incentive alignment. This matches the performance of non-strategic allocation approaches while remaining robust to strategic agents.
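The idea of epoch-based lazy dual updates combined with randomized exploration can be illustrated with a minimal Python sketch. This is not the paper's mechanism: payments, the online learning subroutine, and the exact exploration design are omitted, and incentive compatibility is not claimed for it. The function name `lazy_primal_dual`, the step size `eta`, the exploration probability `explore_prob`, the per-round budget rate `rho`, and the simple projected-subgradient dual step are all assumptions introduced here for illustration.

```python
import random


def lazy_primal_dual(reports, costs, rho, epoch_len,
                     eta=0.5, explore_prob=0.1, seed=0):
    """Illustrative epoch-based lazy primal-dual allocation (hypothetical sketch).

    reports[t][i] : value reported by agent i in round t
    costs[i][k]   : cost of serving agent i in budget dimension k
    rho[k]        : assumed per-round budget rate in dimension k
    Returns the per-round winner (or None) and the dual vector in force each round.
    """
    rng = random.Random(seed)
    n_dims = len(rho)
    lam = [0.0] * n_dims           # dual prices, frozen within an epoch
    epoch_spend = [0.0] * n_dims   # consumption accumulated in the current epoch
    allocations, lam_history = [], []

    for t, round_reports in enumerate(reports):
        lam_history.append(tuple(lam))

        if rng.random() < explore_prob:
            # Exploration round: the winner does not depend on any report.
            winner = rng.randrange(len(round_reports))
        else:
            # Exploitation round: maximize reported value minus priced cost.
            scores = [
                v - sum(l * c for l, c in zip(lam, costs[i]))
                for i, v in enumerate(round_reports)
            ]
            best = max(range(len(scores)), key=scores.__getitem__)
            winner = best if scores[best] > 0 else None

        allocations.append(winner)
        if winner is not None:
            for k in range(n_dims):
                epoch_spend[k] += costs[winner][k]

        # Lazy update: duals change only at epoch boundaries.
        if (t + 1) % epoch_len == 0:
            for k in range(n_dims):
                avg_spend = epoch_spend[k] / epoch_len
                lam[k] = max(0.0, lam[k] + eta * (avg_spend - rho[k]))
            epoch_spend = [0.0] * n_dims

    return allocations, lam_history
```

Because the dual vector is constant inside each epoch, a report made in a given round cannot change the prices used in the rest of that epoch; it can only influence later epochs through the aggregated consumption. The sketch makes that structure visible but leaves out everything the abstract credits for the formal guarantees.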