🤖 AI Summary
This study investigates the learning dynamics and equilibrium convergence of agents in repeated proportional allocation (Kelly) auctions under logarithmic utility, a form derived from the fairness–throughput trade-off in wireless network slicing. The paper establishes the existence and uniqueness of a Nash equilibrium under this utility structure. Theoretical analysis shows that Online Gradient Descent (OGD), Dual Averaging with a quadratic regularizer (DAQ), and myopic Best Response (BR) all converge to this unique equilibrium, even when agents use heterogeneous learning rates. Extensive simulations suggest that BR achieves the fastest convergence and the highest time-averaged utility, and that convergence may fail when agents mix different update rules, underscoring the impact of algorithmic choice on system performance.
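For concreteness, here is a minimal sketch of the stage game. The allocation rule is the standard Kelly mechanism; the quasilinear log-utility form and the weights w_i are illustrative assumptions, since the paper derives its exact utility from the network-slicing trade-off.

```latex
% Assumed stage game (illustrative): agent i bids b_i > 0, receives
% the fraction x_i of the divisible resource, and pays its bid.
x_i = \frac{b_i}{\sum_{j=1}^{n} b_j},
\qquad
u_i(b_i, b_{-i}) = w_i \log x_i - b_i, \quad w_i > 0.
```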
📝 Abstract
The Kelly or proportional allocation mechanism is a simple and efficient auction-based scheme that distributes an infinitely divisible resource proportionally to the agents' bids. When agents are aware of the allocation rule, their interactions form a game that has been extensively studied in the literature. This paper examines the less explored repeated Kelly game, focusing mainly on utilities that are logarithmic in the allocated resource fraction. We first derive this logarithmic form from fairness–throughput trade-offs in wireless network slicing, and then prove that the induced stage game admits a unique Nash equilibrium (NE). For repeated play, we prove convergence to this NE under three behavioral models: (i) all agents use Online Gradient Descent (OGD), (ii) all agents use Dual Averaging with a quadratic regularizer (DAQ), a variant of the Follow-the-Regularized-Leader algorithm, and (iii) all agents play myopic best responses (BR). Our convergence results hold even when agents use personalized learning rates in OGD and DAQ (e.g., tuned to optimize individual regret bounds), and they extend to a broader class of utilities that satisfy a certain sufficient condition. Finally, we complement our theoretical results with extensive simulations of the repeated Kelly game under several behavioral models, comparing convergence speed to the NE and per-agent time-average utility. The results suggest that BR achieves the fastest convergence and the highest time-average utility, and that convergence to the stage-game NE may fail under heterogeneous update rules.
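To illustrate the behavioral models being compared, the sketch below simulates repeated play under OGD with personalized learning rates and under myopic BR, using the quasilinear log utility assumed in the sketch above. All parameters (weights, learning rates, the sequential update order) are illustrative choices, not the paper's setup.

```python
import numpy as np

def utility_grad(b, w, i):
    """Gradient of the assumed u_i(b) = w_i*log(b_i / sum(b)) - b_i w.r.t. b_i."""
    S = b.sum()
    return w[i] * (1.0 / b[i] - 1.0 / S) - 1.0

def best_response(b, w, i):
    """Closed-form maximizer of u_i given opponents' total bid B:
    the positive root of b_i^2 + B*b_i - w_i*B = 0."""
    B = b.sum() - b[i]
    return 0.5 * (-B + np.sqrt(B * B + 4.0 * w[i] * B))

def simulate(n=4, T=500, mode="ogd", eps=1e-6, seed=0):
    """Repeated Kelly game in which all agents use OGD or all use myopic BR."""
    rng = np.random.default_rng(seed)
    w = np.linspace(1.0, 2.0, n)                   # illustrative per-agent weights
    eta = 0.05 / np.arange(1, n + 1)               # personalized learning rates
    b = rng.uniform(0.5, 1.5, n)                   # random initial bids
    for _ in range(T):
        for i in range(n):                         # sequential updates for simplicity
            if mode == "ogd":
                b[i] = max(eps, b[i] + eta[i] * utility_grad(b, w, i))
            else:                                  # myopic best response
                b[i] = max(eps, best_response(b, w, i))
    return b

print("OGD bids:", simulate(mode="ogd"))
print("BR  bids:", simulate(mode="br"))
```

Under this assumed utility, both dynamics drive the bid vector toward the same fixed point, with BR typically settling in far fewer rounds, which is consistent with the paper's reported simulation findings.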